Differentially Expressed Genes in Alzheimer's

Alzheimer's disease (AD) is a progressive neurodegenerative disorder in which patients suffer loss of memory and cognitive function, which can eventually lead to loss of the ability to respond to the environment and perform daily tasks. It is named after the German psychiatrist Alois Alzheimer, who first described it in 1907 [McGirr S, Venegas C, Swaminathan A. Alzheimer's Disease: A Brief Review. J Exp Neurol. 2020;1(3):89-98]. It is the main cause of dementia worldwide.

Fig. The physiological structure of the brain and neurons in (a) healthy brain and (b) Alzheimer’s disease (AD) brain. (Source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7764106/ )

The causes of Alzheimer's disease are not fully understood. It is a multifactorial disease whose risk factors include ageing, genetics, environmental factors, and other medical conditions.

Several hypotheses have been proposed to explain the cause and mechanism of AD: the cholinergic hypothesis, the tau hypothesis, and the most widely accepted, the amyloid cascade hypothesis. The cholinergic hypothesis is based on the observation that AD patients show reduced activity of the enzymes choline acetyltransferase (ChAT) and acetylcholinesterase (AChE) in the cerebral cortex. The tau hypothesis is based on the intraneuronal neurofibrillary tangles (NFTs) that form in the brains of AD patients. These tangles are composed of tau proteins, which are normally present in neurons and function in the assembly of the neuronal microtubule network. When tau proteins become hyperphosphorylated, they polymerize into filaments that form neurofibrillary tangles. The amyloid cascade hypothesis points to overproduction or reduced clearance of amyloid beta (Aβ) peptides in the brain, which results in neuronal damage and the formation of senile plaques (SPs), as observed in AD patients.

Objective

Previous literature has reported alterations in signaling pathways in AD patients. In this project, we will explore transcriptomics (RNA-seq) data from advanced-stage AD patients to identify which pathways are altered. First, we will explore the transcriptomics data of advanced-stage AD patients and normal samples using principal component analysis (PCA). Next, we will identify significantly differentially expressed genes using differential gene expression analysis. Finally, we will examine which pathways are altered in advanced AD compared with normal samples using gene enrichment analysis and gene set enrichment analysis (GSEA).
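To make the PCA step more concrete, here is a minimal pure-Python sketch that finds the first principal component of a small samples-by-genes matrix using power iteration on the covariance matrix. The toy values and the function names (center_columns, covariance, pca_first_axis) are hypothetical and chosen only for illustration. With thousands of genes, a numerical library would normally be used instead.

```python
import math

def center_columns(matrix):
    """Subtract each column (gene) mean so every gene has mean zero."""
    n, p = len(matrix), len(matrix[0])
    means = [sum(row[j] for row in matrix) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in matrix]

def covariance(centered):
    """Sample covariance matrix (genes x genes) of centered data."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]

def pca_first_axis(samples, iterations=200):
    """Return (axis, scores, explained_ratio) for the first principal component."""
    centered = center_columns(samples)
    cov = covariance(centered)
    p = len(cov)
    axis = [1.0 / math.sqrt(p)] * p  # arbitrary unit-length starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * axis[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            break  # no variance along this direction
        axis = [x / norm for x in w]
    # Variance captured along the axis (Rayleigh quotient of a unit vector).
    eigenvalue = sum(axis[i] * sum(cov[i][j] * axis[j] for j in range(p))
                     for i in range(p))
    total_variance = sum(cov[i][i] for i in range(p))
    ratio = eigenvalue / total_variance if total_variance > 0 else 0.0
    # Project each sample onto the axis to get its PC1 score.
    scores = [sum(r[j] * axis[j] for j in range(p)) for r in centered]
    return axis, scores, ratio

# Hypothetical toy data: 4 samples (rows) x 2 genes (columns).
toy_samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pc1_axis, pc1_scores, pc1_ratio = pca_first_axis(toy_samples)
```

In a real analysis, the PC1 and PC2 scores of each sample would be plotted and colored by group (AD or control) to check whether the groups separate.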

Dataset

We downloaded processed RNA-seq data from GEO under accession GSE53697 (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE53697). The dataset contains 8 control samples (no neurofibrillary tangles or plaque pathology) and 9 advanced-stage AD samples (CDR between 4 and 5), matched for age and gender, with short post-mortem intervals (PMI). To obtain the metadata, we downloaded the series matrix file for the dataset; the metadata table is shown in the figure below. The raw RNA-seq data had been processed with an RNA-seq pipeline: briefly, reads were aligned to hg18 using TopHat (unambiguous mapping; mean inner distance: 20; st. dev.: 20), and exon junctions were inferred using a Bayesian method. The dataset provides gene expression as RPKM values and raw read counts for a total of 19,185 protein-coding genes.
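As a rough illustration of how RPKM (reads per kilobase of transcript per million mapped reads) relates to raw read counts, the sketch below applies the standard formula to one sample. The gene names, counts, and lengths are made up. The library size is approximated here by the sum of the counts in the table, which is an assumption; the original pipeline may have used the total number of mapped reads instead.

```python
def rpkm(counts, lengths_bp):
    """Convert raw read counts for one sample to RPKM.

    counts: dict of gene -> raw read count
    lengths_bp: dict of gene -> gene (exon) length in base pairs
    """
    # Assumption: library size is the sum of counts over the listed genes.
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("sample has no reads")
    # RPKM = count * 1e9 / (length in bp * total reads)
    return {gene: count * 1e9 / (lengths_bp[gene] * total_reads)
            for gene, count in counts.items()}

# Hypothetical example values, not taken from GSE53697.
example_counts = {"GENE_A": 500, "GENE_B": 1500}
example_lengths = {"GENE_A": 2000, "GENE_B": 1000}
example_rpkm = rpkm(example_counts, example_lengths)
```

RPKM values are convenient for exploration such as PCA, while count-based tools such as DESeq2 expect the raw read counts as input.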

To identify genes that differ significantly between advanced Alzheimer's disease and normal samples, we will perform differential gene expression analysis using the DESeq2 module on the T-Bioinfo server. For pathway annotation, we will use enrichment analysis and GSEA. To learn how to run the pipeline and interpret the results, visit: https://learn.omicslogic.com/courses/course/project-12-differentially-expressed-genes-in-alzheimer
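One common way to perform gene enrichment (over-representation) analysis is a one-sided hypergeometric test: given a list of differentially expressed genes, it asks how likely it is to see at least the observed overlap with a pathway by chance. The sketch below is a generic illustration of that test, not the method used by the T-Bioinfo server; the gene identifiers and the function name enrichment_p_value are hypothetical.

```python
from math import comb

def enrichment_p_value(universe_size, pathway_size, n_de_genes, overlap):
    """One-sided hypergeometric p-value: P(X >= overlap).

    universe_size: number of genes tested (the background)
    pathway_size: number of background genes annotated to the pathway
    n_de_genes: number of differentially expressed genes
    overlap: number of differentially expressed genes in the pathway
    """
    max_overlap = min(pathway_size, n_de_genes)
    all_draws = comb(universe_size, n_de_genes)
    tail = sum(
        comb(pathway_size, i) * comb(universe_size - pathway_size, n_de_genes - i)
        for i in range(overlap, max_overlap + 1)
    )
    return tail / all_draws

# Hypothetical toy example: 10 background genes, a 3-gene pathway,
# and 4 differentially expressed genes, 2 of which are in the pathway.
background = {f"g{i}" for i in range(1, 11)}
pathway = {"g1", "g2", "g3"}
de_genes = {"g1", "g2", "g7", "g8"}
p_value = enrichment_p_value(
    len(background), len(pathway), len(de_genes), len(pathway & de_genes)
)
```

When many pathways are tested, the resulting p-values are usually corrected for multiple testing before being interpreted.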

For any questions, please email us at support@pine.bio