Data Mining & Machine Learning

Using PCA Draw - a different PCA on T-BioInfo

PCA identifies linear combinations of genes such that each combination (called a Principal Component) explains the maximum variance. It's often used to make data easy to explore and visualize. We l...

Supervised Machine Learning: Feature Selection

Feature Selection Methods: swLDA and RF. Feature selection starts by testing all individual features (i.e., genes) and selects the one that provides the best classification quality (for the train...

Supervised Machine Learning (Support Vector Machine (SVM))

Often, no linear discrimination between groups is possible, and finding a quadratic function to delineate them is practically impossible, which reduces prediction accuracy. In those cases, w...

Supervised Machine Learning (Decision Tree and Random Forest)

Supervised machine learning refers to algorithms that take in labeled data, typically prepared by people who annotate the dataset. In biomedical projects, the annotation could be...

Unsupervised Machine Learning (K-Means Clustering)

Another conventional clustering method is called k-means. In this clustering method, we take a number of clusters k as an input parameter, then randomly select k initial “centroids” in ...
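The procedure this excerpt begins to describe (choose k, pick k random initial centroids) can be sketched as follows. This is a minimal illustration of standard k-means (Lloyd's algorithm), not the T-BioInfo implementation. The function name `kmeans`, its parameters, and the toy points are assumptions made for the example.

```python
import random

def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples (hypothetical helper)."""
    rng = random.Random(seed)
    # Randomly pick k distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels

# Toy data: two well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
```

Because the starting centroids are random, results on real expression data can differ between runs. Repeating the clustering with several seeds and comparing the outcomes is a common precaution.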

Unsupervised Machine Learning (Hierarchical Clustering)

Complex patterns in large datasets are hard to find manually. These types of data show non-linear dependencies and contain noise that makes it hard to find statistically significant differences. Th...

Principal Component Analysis (PCA): Tutorial

Let's begin our journey into the cell line data with a review of Principal Component Analysis (PCA). PCA is a statistical approach from linear algebra that uses a covariance matrix to find an ef...
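The covariance-matrix idea mentioned in this excerpt can be illustrated with a short sketch. It assumes NumPy is available and that samples are in rows with features (e.g., genes) in columns. The function name `pca` and the toy matrix are hypothetical and are not taken from the tutorial itself.

```python
import numpy as np

def pca(X, n_components=2):
    """PCA via eigendecomposition of the covariance matrix (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    # Center each feature so the covariance is computed around the mean.
    Xc = X - X.mean(axis=0)
    # Covariance matrix of shape (features x features).
    cov = np.cov(Xc, rowvar=False)
    # eigh is meant for symmetric matrices; eigenvalues come back in ascending order.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Keep the directions with the largest variance.
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order]
    # Fraction of total variance explained by each kept component.
    explained = eigvals[order] / eigvals.sum()
    # Project the centered samples onto the principal components.
    scores = Xc @ components
    return scores, components, explained

# Toy data: the second feature is exactly twice the first,
# so a single component should capture essentially all the variance.
X = [[1, 2], [2, 4], [3, 6], [4, 8]]
scores, components, explained = pca(X, n_components=1)
```

For a real gene expression table with many more genes than samples, a library routine based on singular value decomposition is usually the more practical choice. The explicit covariance matrix is used here only because it mirrors the description above.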

Basic PCA: making a scatterplot of Principal Component Analysis results in Excel

After we run a pipeline to process raw reads from the FASTQ file, we can study the gene expression table. Working with the gene expression table includes understanding our column and row names, as ...

Results from the RNA-seq Pipeline

The real research and struggle begin when the pipeline is complete. The results obtained from the pipeline must be processed, normalized, and then analyzed to extract biological insights. It is...