
Supervised Machine Learning (Decision Tree and Random Forest)


Supervised machine learning refers to algorithms that learn from labeled data. The labels are typically prepared by people who annotate the dataset. In biomedical projects, the annotation could be clinical data or other “phenotypic” data that describes which class each sample belongs to. When the labels are discrete classes like these, the task is called “classification”.

 

Binary Decision Trees

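One such classification algorithm is the “binary decision tree”. A decision tree is built from a series of rules, and each rule splits the samples into two branches. Binary means that each rule has only two outcomes, yes or no: for example, if the expression of a gene is higher than a threshold X, a sample follows one branch, otherwise it follows the other. A useful rule is one where the classes separate across the two branches.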
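The idea of a single yes/no rule on one gene can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the expression values and labels are made up, `best_threshold` is a hypothetical helper, and the cutoff is scored by plain accuracy, whereas tree implementations usually score splits with an impurity measure such as Gini or entropy.

```python
def best_threshold(expression, labels):
    """Find the cutoff X on one gene that best separates two classes.

    The rule tested is: expression > X means class 1, otherwise class 0.
    Returns (threshold, accuracy); threshold is None if no cutoff can be tried.
    """
    values = sorted(set(expression))
    # Candidate cutoffs: midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = (None, 0.0)
    for t in candidates:
        predicted = [1 if x > t else 0 for x in expression]
        correct = sum(p == y for p, y in zip(predicted, labels))
        accuracy = correct / len(labels)
        if accuracy > best[1]:
            best = (t, accuracy)
    return best


# Hypothetical expression values of one gene across six samples.
expression = [2.1, 2.4, 3.0, 6.8, 7.5, 8.2]
labels = [0, 0, 0, 1, 1, 1]  # 0 = control, 1 = disease (made-up annotation)

threshold, accuracy = best_threshold(expression, labels)
print(threshold, accuracy)
```

With these toy values the chosen cutoff falls in the gap between the two groups, so the single rule classifies every sample correctly. A full decision tree repeats this search on each branch until the classes are separated or a stopping condition is reached.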

 

Random Forest

A natural extension of the decision tree is an algorithm called “Random Forest”. Random Forest builds many decision trees, each trained on a random portion of the full dataset (“bagging”, short for bootstrap aggregating: random selection of a subset of samples, usually with replacement), and combines their predictions by voting. Different portions of the data (samples, in our case) can be separated using different thresholds of various genes.
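Bagging and voting can be illustrated with a deliberately simplified sketch. The assumptions here are many: the data are invented, every "tree" is a single-rule stump on one randomly chosen gene, and the helper names (`train_stump`, `train_forest`, `predict`) are hypothetical. Real Random Forest implementations grow deeper trees and choose among random gene subsets at every split.

```python
import random
from collections import Counter


def train_stump(rows, labels, gene):
    """Fit a one-rule tree on one gene: value > threshold predicts class 1."""
    values = sorted({row[gene] for row in rows})
    if len(set(labels)) == 1 or len(values) < 2:
        # Nothing to split on: always predict the most common class.
        majority = Counter(labels).most_common(1)[0][0]
        return lambda row: majority
    best_t, best_correct = None, -1
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2  # candidate cutoff between two observed values
        correct = sum((row[gene] > t) == bool(y) for row, y in zip(rows, labels))
        if correct > best_correct:
            best_t, best_correct = t, correct
    return lambda row: int(row[gene] > best_t)


def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_genes = len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bagging: draw a bootstrap sample of the same size, with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        # Each tree uses one randomly chosen gene.
        gene = rng.randrange(n_genes)
        forest.append(train_stump(sample_rows, sample_labels, gene))
    return forest


def predict(forest, row):
    # Voting: every tree predicts a class and the most common answer wins.
    votes = Counter(tree(row) for tree in forest)
    return votes.most_common(1)[0][0]


# Made-up expression of three genes (columns) in eight samples (rows).
rows = [
    [1.2, 2.0, 1.5], [2.1, 1.1, 2.8], [1.8, 2.9, 1.0], [2.6, 1.7, 2.2],
    [7.4, 8.1, 7.0], [8.3, 7.2, 8.8], [7.9, 8.9, 7.6], [8.6, 7.5, 8.2],
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = train_forest(rows, labels)
predictions = [predict(forest, row) for row in rows]
print(predictions)
```

Because each tree sees a different bootstrap sample and a different gene, the trees settle on different thresholds. Voting then averages out the occasional tree that was trained on an unrepresentative sample.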

 

To learn more about these concepts, with stepwise pipeline instructions and parameters, visit Lesson 14: Transcriptomics on the OmicsLogic Learn Portal: https://learn.omicslogic.com/Learn/course-5-transcriptomics/lesson/14-t3-supervised-machine-learning

 

In this lesson, you will learn about supervised machine learning methods such as LDA, Random Forest, SVM, and Naive Bayes.