Precision Medicine promises to change the way patients are treated by using precise molecular information that provides reliable indicators of treatment effectiveness. This is useful both in drug development and in patient diagnosis, where it helps identify precise subgroups of patients. In this project, we will work with cell lines derived from human breast tumors and with a study in which multi-omics data from those cell lines were integrated with their response to various cancer treatments. You will learn to analyze several omics data types, integrate them, and associate them with a phenotype (response to treatment) using machine learning algorithms. By studying informative multi-omics features in an integrated way, this project gives a perspective on drug screening and on precise diagnosis of patients.
Cell lines have been used for decades to study the function of specific cells, and in drug discovery, cell line panels are regularly used to screen compounds. Many candidate therapeutic compounds must be tested for efficacy and toxicity to achieve the most benefit while limiting the side effects of treatment.
Key Concepts:
Molecular Profiling: Gene panels are used to determine cancer subtype; for example, PAM50 is used to determine breast cancer subtypes from gene expression data. This profiling sometimes relies on genomics data (mutations) and sometimes on transcriptomics data (gene/isoform expression). This project will look at those types of features, as well as at combinations of features that may have more predictive power for therapeutic efficacy.
Precision Medicine: An emerging approach to disease treatment and prevention that takes into account each person's individual variability in genes, environment, and lifestyle.
Predicting Therapeutic Response: More traditional therapeutics are designed to work for anyone with a broadly defined disease, whereas precision therapeutics are expected to benefit only a subset of patients. Precise identification of the patients who will respond to a therapeutic has therefore been a major challenge in clinical trials, which are a major step in getting a therapeutic into clinical use. Methods of predicting response are critical for getting more precision therapeutics to patients.
Levels of molecular regulation: Multi-omics is a biological analysis approach in which the data sets come from multiple “omes”, such as the genome, proteome, transcriptome, epigenome, and microbiome. Analyzing these “-omes” together, rather than one at a time, can help identify biomarkers in complex, high-dimensional data.
Machine Learning for Biomedical Data: Machine learning grew out of pattern recognition and the idea that computers can learn without being explicitly programmed to perform specific tasks. The iterative aspect of machine learning is important because models can adapt as they are exposed to new data. They learn from previous computations to produce reliable, repeatable decisions and results.
This project uses data presented by Daemen et al. (Modeling precision treatment of breast cancer, Genome Biol. 2013). The researchers tested 90 therapeutic compounds on 70 breast cancer cell lines (out of the 84 lines in their collection) and determined GI50, the concentration required to inhibit cell growth by 50%. GI50 can be treated as a measure of the efficacy of a given compound for a given breast cancer cell line, with lower values indicating greater sensitivity. In addition to a table of GI50 values, Daemen et al. deposited RNA-Seq results (GSE4821) for 56 cell lines, including 52 for which GI50 data were also available. Finally, the authors specified an associated breast cancer subtype for each cell line. For this course, we will use data from the 52 cell lines that have both RNA-Seq and GI50 data.
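To make the GI50 definition concrete, the sketch below estimates GI50 from a handful of dose-response measurements by interpolating on a log-concentration scale. This is only an illustration: the function name `estimate_gi50` and the example numbers are hypothetical, and simple interpolation is a stand-in for illustration, not the curve-fitting procedure used by Daemen et al.

```python
import math

def estimate_gi50(concentrations, growth_fractions):
    """Estimate GI50 by log-linear interpolation (illustrative only).

    concentrations: increasing doses (for example, in micromolar)
    growth_fractions: growth relative to untreated control (1.0 = no inhibition)
    Returns the interpolated concentration at 50% growth, or None if the
    curve never crosses 50% within the tested range.
    """
    points = list(zip(concentrations, growth_fractions))
    for (c_lo, g_lo), (c_hi, g_hi) in zip(points, points[1:]):
        if g_lo >= 0.5 >= g_hi and g_lo != g_hi:
            # Fraction of the way from g_lo down to 0.5 within this interval.
            t = (g_lo - 0.5) / (g_lo - g_hi)
            log_c = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None

# Hypothetical measurements for one compound on one cell line.
doses = [0.01, 0.1, 1.0, 10.0]
growth = [0.95, 0.80, 0.20, 0.05]
gi50 = estimate_gi50(doses, growth)
print(f"Estimated GI50: {gi50:.3f}")
```

A cell line whose growth never drops to 50% within the tested range returns `None` here, which is one simple way to flag a compound as ineffective at the tested doses.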
To learn how to perform unsupervised analysis on the cell lines data, please visit: https://learn.omicslogic.com/Learn/project-05-modeling-cancer-precision-medicine/lesson/02-unsupervised-machine-learning-analysis
In this lesson, we will apply several clustering methods to group the cell lines based on similarities and dissimilarities in their data.
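As a rough idea of what clustering by similarity means, here is a minimal sketch of average-linkage agglomerative (hierarchical) clustering written in plain Python. The sample names and expression values are made up for illustration, the helper names `euclidean` and `agglomerative` are hypothetical, and the choice of Euclidean distance with average linkage is an assumption; the lesson itself may use other methods and tools.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(profiles, n_clusters):
    """Average-linkage agglomerative clustering (illustrative only).

    profiles: dict mapping sample name -> list of feature values
    Returns a sorted list of clusters, each a sorted list of sample names.
    """
    clusters = [[name] for name in profiles]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        dists = [euclidean(profiles[a], profiles[b]) for a in c1 for b in c2]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return sorted(sorted(c) for c in clusters)

# Hypothetical log-expression values for three genes in four cell lines.
expression = {
    "line_A": [8.1, 2.0, 5.5],
    "line_B": [7.9, 2.2, 5.4],
    "line_C": [3.0, 9.1, 1.2],
    "line_D": [3.2, 8.8, 1.0],
}
groups = agglomerative(expression, n_clusters=2)
print(groups)
```

With real data, each cell line would have thousands of expression features rather than three, and the resulting groups could be compared with the breast cancer subtypes assigned by the authors.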