
Basic PCA: making a scatterplot of Principal Component Analysis results in Excel

After we run a pipeline to process raw reads from the FASTQ file, we can study the gene expression table. Working with this table means understanding its row and column names, as well as the numbers that correspond to gene expression levels. We can filter the table to select specific values or genes, or we can run a comprehensive analysis of the full table. We will learn about both by using quantile normalization to put the values on a consistent scale, so that every sample has the same distribution, and by using principal component analysis (PCA) to study variability between samples. Multi-sample normalization is a standard and necessary part of every RNA-seq dataset analysis.
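To make the normalization step concrete, here is a minimal sketch of quantile normalization in plain Python. It is not the code used by the lesson's pipeline. The function name `quantile_normalize` and the small example table are hypothetical, and ties are broken by row order instead of being averaged, which a production implementation would usually handle more carefully.

```python
def quantile_normalize(table):
    """Quantile-normalize a gene expression table.

    table: list of rows (genes); each row is a list of values, one per sample.
    Returns a new table in which every sample (column) has the same
    distribution of values.
    """
    n_genes = len(table)
    n_samples = len(table[0])

    # Pull out each sample as its own column and sort it independently.
    columns = [[row[j] for row in table] for j in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]

    # Reference distribution: the mean across samples at each rank.
    reference = [
        sum(col[rank] for col in sorted_cols) / n_samples
        for rank in range(n_genes)
    ]

    # Replace each value with the reference value at its rank in its sample.
    # Assumption: ties are broken by row order (stable sort), not averaged.
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, gene in enumerate(order):
            normalized[gene][j] = reference[rank]
    return normalized


# Hypothetical table: 4 genes (rows) x 3 samples (columns).
example = [
    [5, 4, 3],
    [2, 1, 4],
    [3, 6, 6],
    [4, 2, 8],
]
for row in quantile_normalize(example):
    print([round(value, 3) for value in row])
```

After this step, sorting any column of the output gives the same list of values, which is what makes samples directly comparable before PCA.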

 

In the resulting scatter plot, the red points represent ER+ samples and the green points represent TN samples. The two groups separate clearly along PC1 (the X axis).
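The coordinates behind such a plot can be computed with a short PCA sketch like the one below, which uses NumPy's singular value decomposition. This is an illustration under stated assumptions, not the lesson's own procedure. The function name `pca_scores` is hypothetical, and the data are synthetic random numbers with an artificial shift added to one group, labelled "ER+" and "TN" only to mirror the plot.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Compute PCA sample scores from a genes x samples table.

    Returns (scores, explained): scores has one row per sample and one
    column per component; explained is the fraction of total variance
    captured by each returned component.
    """
    X = np.asarray(expr, dtype=float).T   # rows = samples, columns = genes
    X = X - X.mean(axis=0)                # center each gene across samples
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S**2 / np.sum(S**2)
    return scores, explained[:n_components]


# Synthetic data (an assumption for illustration): 50 genes, 6 samples.
rng = np.random.default_rng(0)
expr = rng.normal(loc=0.0, scale=0.3, size=(50, 6))
expr[:10, 3:] += 3.0                      # shift 10 genes in the last 3 samples
labels = ["ER+", "ER+", "ER+", "TN", "TN", "TN"]

scores, explained = pca_scores(expr)
print("PC1 share of variance:", round(float(explained[0]), 3))
for label, (pc1, pc2) in zip(labels, scores):
    # These two columns are the X and Y values for the scatter plot.
    print(f"{label}\t{pc1:.3f}\t{pc2:.3f}")
```

The printed PC1 and PC2 columns can be pasted into a spreadsheet and plotted as an XY scatter chart, with one series per sample group to get the two colors.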


For a fuller explanation, see the lesson on Principal Component Analysis in the Transcriptomics course on the Omicslogic Learn Portal: https://learn.omicslogic.com/Learn/course-5-transcriptomics/lesson/07-t2-practical-normalization-and-pca-of-gene-expression-data