Principal Component Analysis (PCA): Tutorial

Let's begin our journey into the cell line data with a review of Principal Component Analysis (PCA). PCA is a statistical method built on linear algebra: it uses the covariance matrix of the data to find a compact summary that preserves as much of its variance as possible. There is some debate about whether this method counts as “learning” per se. Whether you think so or not, we will explain how the method works and then demonstrate how it can help us describe the variability between samples in this breast cancer cell line dataset.

PCA is a useful approach for exploratory analysis and data preparation. PCA distills, or compresses, the data from a large number of variables (in our case, tens of thousands of genes) into a small number of variables that help make sense of the data. The output from PCA is a set of components, PC1, PC2, …, PCN, where the first component explains the largest share of the variability, the second the next largest share, and so on. In addition, each sample gets a component score for each PC.
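To make the idea of components, explained variability, and component scores concrete, here is a minimal sketch in Python with NumPy. It is not the PC_R_Library pipeline used in the lesson. The function name pca, the toy 6 x 4 matrix, and the two simulated sample groups are assumptions made purely for illustration.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via eigendecomposition of the covariance matrix.

    X: array of shape (n_samples, n_variables).
    Returns (components, explained_ratio, scores).
    """
    X = np.asarray(X, dtype=float)
    # Center each variable (column) at zero mean.
    Xc = X - X.mean(axis=0)
    # Covariance between variables, shape (n_variables, n_variables).
    cov = np.cov(Xc, rowvar=False)
    # eigh handles symmetric matrices and returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Fraction of the total variance explained by each component.
    explained_ratio = eigvals / eigvals.sum()
    components = eigvecs[:, :n_components]
    # Component scores: one row per sample, one column per PC.
    scores = Xc @ components
    return components, explained_ratio[:n_components], scores

# Hypothetical toy data standing in for an expression matrix:
# 6 samples x 4 "genes", with two groups that differ in the first two genes.
rng = np.random.default_rng(0)
group = np.array([0, 0, 0, 1, 1, 1])
shift = np.array([1.0, 1.0, 0.0, 0.0])
X = rng.normal(size=(6, 4)) + 5.0 * group[:, None] * shift

components, explained_ratio, scores = pca(X, n_components=2)
print(explained_ratio)  # share of variance for PC1 and PC2
print(scores)           # one (PC1, PC2) pair per sample
```

With tens of thousands of genes the covariance matrix becomes very large, so practical implementations usually work from a singular value decomposition of the centered data instead; the sketch above favors readability over scale.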

These scores can then be displayed in a 2D or 3D plot that shows how the analyzed samples group together. Each of the original variables usually correlates strongly with one specific component.
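One way to check how strongly each variable correlates with each component is to correlate the original columns with the component scores. The sketch below assumes a hypothetical toy matrix of 20 samples and 3 variables, in which the first two variables share a common signal; it computes the scores through a singular value decomposition rather than through the lesson's own pipeline.

```python
import numpy as np

# Hypothetical toy matrix: 20 samples x 3 variables.
# Variables 0 and 1 share a strong common signal; variable 2 is independent noise.
rng = np.random.default_rng(1)
signal = 3.0 * rng.normal(size=20)
X = np.column_stack([
    signal + 0.1 * rng.normal(size=20),
    signal + 0.1 * rng.normal(size=20),
    rng.normal(size=20),
])

# Center the data, then take the SVD: the rows of Vt are the principal directions.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = U * S  # same as Xc @ Vt.T

# Correlation of every variable (rows) with every component's scores (columns).
n_vars = X.shape[1]
corr = np.corrcoef(Xc, scores, rowvar=False)[:n_vars, n_vars:]
print(np.round(corr, 2))
```

In this toy setup the first two variables should correlate strongly with PC1, because PC1 picks up the signal they share. The sign of each column is arbitrary, since a component can point in either direction.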

To learn how to set the parameters when you select PC_R_Library, visit the lesson Principal Component Analysis (PCA): Tutorial, where you will find step-by-step instructions for the PCA pipeline and learn to interpret the plots it produces:

https://learn.omicslogic.com/Learn/course-5-transcriptomics/lesson/11-t3-principal-component-analysis-pca-tutorial