Unsupervised Machine Learning (K-Means Clustering)

Another conventional clustering method is called k-means. This method takes the number of clusters k as an input parameter and randomly selects k initial “centroids” in the sample space (since we are clustering samples, this is the feature space). Each sample is assigned to the cluster of its closest centroid. Then each cluster's initial centroid (the one used to separate the samples into clusters) is replaced by the real centroid (mean) of the samples currently forming that cluster, and the next iteration starts with the new set of centroids. These two steps typically repeat until the cluster assignments stop changing or an iteration limit is reached. Using the same dataset with 52 samples and approximately 6900 genes, we will create this simple pipeline in the Unsupervised Learning section of the T-BioInfo Platform.
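To make the assign-and-update loop described above concrete, here is a minimal Python sketch using NumPy. It is not the T-BioInfo implementation. The function name `kmeans`, its parameters, the choice of picking random samples as the initial centroids, and the small synthetic dataset are all assumptions made for illustration.

```python
import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Plain k-means on the rows of X (samples x features). Illustrative sketch."""
    rng = np.random.default_rng(seed)
    # Assumption: use k distinct random samples as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each sample goes to its closest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Stop once the assignments no longer change.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its current members.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Synthetic stand-in data (not the cell line dataset): two separated groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (10, 5)), rng.normal(10, 0.5, (10, 5))])
labels, centroids = kmeans(X, k=2)
print(labels)
```

The result has the same shape as the platform output described in this tutorial: one cluster label per sample. Because the starting centroids are random, different seeds can give different cluster numbering and, on less clearly separated data, different clusters.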

Navigate to the “Unsupervised learning” area under “data mining” and upload CellLines_ExprData.txt as before; then create your pipeline as shown below:

k-means does not produce a visual output; it returns a table that assigns a cluster to each sample. One way to analyze this table is to draw a PCA plot and look at how the clusters separate. For a better understanding, visit lesson 13: Unsupervised Machine Learning (Clustering)

https://learn.omicslogic.com/Learn/course-5-transcriptomics/lesson/13-t3-unsupervised-machine-learning-clustering on https://learn.omicslogic.com/ and get started with clustering your data for a better analysis.
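As a rough illustration of the PCA plot suggested above, the following Python sketch projects samples onto their first two principal components and colors them by cluster. The helper names `pca_2d` and `plot_clusters`, the random stand-in expression matrix, and the random cluster labels are hypothetical; the layout of the table exported by the platform is not assumed here, so loading it is left out.

```python
import numpy as np

def pca_2d(X):
    """Project the rows of X (samples x features) onto the first two PCs."""
    Xc = X - X.mean(axis=0)                      # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :2] * S[:2]                      # same as Xc @ Vt[:2].T

def plot_clusters(coords, labels):
    """Scatter plot of PC1 vs PC2, colored by cluster (requires matplotlib)."""
    import matplotlib.pyplot as plt
    plt.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="tab10")
    plt.xlabel("PC1")
    plt.ylabel("PC2")
    plt.show()

# Hypothetical stand-ins: 52 samples, 100 features, 3 made-up clusters.
rng = np.random.default_rng(0)
X = rng.normal(size=(52, 100))
labels = rng.integers(0, 3, size=52)
coords = pca_2d(X)
print(coords.shape)
```

Calling `plot_clusters(coords, labels)` with the real expression matrix and the cluster column from the k-means table would show whether samples in the same cluster also sit close together in the PCA plot.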