Differential Expression on T-Bioinfo Server

Differential expression analysis identifies genes whose expression levels differ between conditions, for example between diseased and healthy samples. In pharmaceutical and clinical research, differentially expressed genes (DEGs) can help pinpoint candidate biomarkers, therapeutic targets, and gene signatures for diagnostics.

The three postulates of differential gene expression are as follows:

1.) Every cell nucleus contains the complete genome established in the fertilized egg. In molecular terms, the DNAs of all differentiated cells are identical.
2.) The unused genes in differentiated cells are not destroyed or mutated, and they retain the potential for being expressed.
3.) Only a small percentage of the genome is expressed in each cell, and a portion of the RNA synthesized in the cell is specific for that cell type.

To run the differential expression analysis pipeline, the input data must be raw read counts, must include gene IDs or symbols with no duplicate genes, and must have no blank line at the end of the file.

Several methods are available for differential expression analysis, such as edgeR and DESeq. The DESeq2 package is designed for normalization, visualization, and differential analysis of high-dimensional count data. It uses empirical Bayes techniques to estimate priors for log fold change and dispersion, and to calculate posterior estimates for these quantities. The analysis requires setting parameters for the count filter, the volcano plot threshold, and the reference database (human or mouse).

To understand the biological implications of the significant genes, we will perform enrichment analysis and Gene Set Enrichment Analysis (GSEA). The enrichment analysis shows which important pathways the significant genes are enriched in, and which Gene Ontology (GO) terms are significantly associated with the set of significant genes.

Users can adjust the parameters above. However, 0.05 is the key threshold for selecting the most significant genes and the significantly enriched terms and pathways.
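The input requirements listed above (raw counts, gene IDs with no duplicates, no blank line at the end of the file) can be checked before uploading. The following is a minimal sketch, not part of the T-Bioinfo pipeline itself. It assumes a tab-delimited table with one header row, the gene ID or symbol in the first column, and one count per sample in the remaining columns. The function name `validate_count_table` and that layout are assumptions made for illustration.

```python
def validate_count_table(text, delimiter="\t"):
    """Return a list of problems found in a gene-by-sample count table.

    Assumed layout (hypothetical, not specified by the tutorial): one header
    row, then one row per gene with the gene ID in the first column and one
    raw count per sample in the remaining columns.
    """
    problems = []

    lines = text.split("\n")
    # A normal terminating newline leaves one empty string at the end; drop it.
    if lines and lines[-1] == "":
        lines.pop()
    # Anything still empty at the end is a genuine trailing blank line.
    if lines and lines[-1].strip() == "":
        problems.append("blank line at end of file")

    seen = set()
    for line_no, line in enumerate(lines[1:], start=2):  # skip the header row
        if line.strip() == "":
            continue  # trailing blank lines are already reported above
        fields = line.rstrip("\r").split(delimiter)
        gene = fields[0].strip()
        if not gene:
            problems.append(f"line {line_no}: missing gene ID")
        elif gene in seen:
            problems.append(f"line {line_no}: duplicate gene '{gene}'")
        else:
            seen.add(gene)
        for value in fields[1:]:
            # Raw counts are non-negative integers; decimals or negative
            # values suggest the data were already normalized or transformed.
            if not value.strip().isdecimal():
                problems.append(f"line {line_no}: non-integer count '{value}'")
                break  # one message per row is enough
    return problems


# Example: a table with a repeated gene, a decimal value, and a trailing blank line.
example = "gene\ts1\ts2\nA\t10\t0\nA\t3\t7.5\n\n"
for problem in validate_count_table(example):
    print(problem)
```

An empty list means the table passed all three checks. The integer test is deliberately strict, since normalized values such as TPM or FPKM are not suitable where raw counts are required.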