COVID-19 Origin & Pathogenesis of SARS-COV2

COVID-19 stands for “coronavirus disease 2019”; the 19 refers to 2019, the year the disease was first identified. The pathogen that causes it is SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2). In this project we will explore both the coronavirus itself and the disease it causes. We will look into its origin and how closely it is related to previously known human coronavirus strains, examine the viral components that allow it to enter human cells, and measure how much it differs, in terms of mutations, from other coronavirus strains. Finally, we will see how to identify important signatures whose expression differs significantly between therapeutic (drug) treatment and untreated conditions.

Overview

The first question to address during any viral outbreak is the origin and identity of the virus. The answer provides the foundation for immediate practical measures and longer-term planning to manage the outbreak: it allows the virus to be characterized and detected quickly, so that the pandemic can be contained while drugs and vaccines are developed. Among the available techniques, phylogenetic analysis is one of the most useful for determining the relationship between a newly sequenced virus and previously sequenced ones. Multiple sequence alignment (MSA) further helps us understand the variation between genomes, and exploratory analysis of transcriptomic profiles from infected patients helps us identify transcriptomic biomarkers.

This project is therefore designed to explore key bioinformatics methods for studying the origin and pathogenesis of a viral pandemic such as COVID-19 and for identifying important molecular signatures. We will explore publicly available data associated with the COVID-19 global pandemic, including genomic databases, data formats, and the key experimental and clinical terminology used in data collection. We will discuss how analytical methods such as MSA and phylogenetic analysis can be used to study evolutionary relationships between viral genomes and to interpret genomic variability. Next, we will learn about the key components of SARS-CoV-2 pathogenesis (disease development), including the organization of the SARS-CoV-2 genome, its genes, and the viral proteins it produces with the help of the host cell's translation machinery. Finally, we will look at how to identify transcriptomic signatures that distinguish samples treated with the drug Ruxolitinib from untreated samples.

With this background on the COVID-19 pandemic in mind, we designed the following five objectives for this project and will address them using various bioinformatics techniques:

Objective 1: SARS-CoV-2 Genomics

  • Data availability, collection and sharing
  • Types of data and how is it generated/produced
  • Resources: NGS (Next Generation Sequencing) of Viral samples

Objective 2: Zoonotic Origin and Analysis of Genomic Variation

  • Phylogenetic trees - evolutionary analysis
  • Nucleotide sequence similarity and variability
  • Mutation and Recombination
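To make "sequence similarity" and "mutation" concrete, here is a minimal Python sketch that compares two sequences which have already been aligned (for example by an MSA tool) and reports percent identity together with the list of substitutions. The function name `compare_aligned` and the toy sequences are hypothetical. Gap columns are simply skipped, so insertions and deletions are not counted as substitutions here, and ambiguous bases such as `N` are compared like any other letter.

```python
from __future__ import annotations


def compare_aligned(ref: str, query: str) -> tuple[float, list[tuple[int, str, str]]]:
    """Percent identity and substitutions between two pre-aligned sequences.

    Columns where either sequence has a gap ('-') are skipped; substitutions
    are reported as (1-based alignment column, reference base, query base).
    """
    if len(ref) != len(query):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    substitutions: list[tuple[int, str, str]] = []
    for column, (r, q) in enumerate(zip(ref.upper(), query.upper()), start=1):
        if r == "-" or q == "-":
            continue  # indel column: not counted as a substitution
        compared += 1
        if r == q:
            matches += 1
        else:
            substitutions.append((column, r, q))
    identity = 100.0 * matches / compared if compared else 0.0
    return identity, substitutions


# Hypothetical toy alignment
ref = "ACGTACGTAC"
query = "ACGAACG-AC"
identity, subs = compare_aligned(ref, query)
print(f"{identity:.1f}% identity, substitutions: {subs}")
# → 88.9% identity, substitutions: [(4, 'T', 'A')]
```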

Objective 3: Comparison with Existing Coronavirus Strains

  • Phylogenetic trees - evolutionary analysis
  • Nucleotide sequence similarity and variability

Objective 4: SARS-CoV-2 Pathogenesis

  • COVID-19 pathogenesis: cell entry and replication
  • SARS-CoV-2 genome structure: genes, structural proteins and polyproteins
  • Spike protein and its functional domains
  • Genomic variation and protein function
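The link between genomic variation and protein function starts at the codon level: a single-nucleotide change may leave the amino acid unchanged (synonymous), replace it with a different one (missense), or create a stop codon (nonsense). The sketch below classifies such a change within an in-frame coding sequence using the standard genetic code. The helper names `translate` and `classify_snv` and the toy sequence are hypothetical. The output uses the usual reference-position-alternative notation, the same convention behind spike mutation names such as D614G (aspartate replaced by glycine at residue 614).

```python
from itertools import product

# Standard genetic code (NCBI translation table 1); codons enumerated in
# TCAG order: TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate an in-frame coding sequence; '*' marks a stop, 'X' an unknown codon."""
    cds = cds.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3))


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify a single-nucleotide change at 1-based position `pos` of a coding sequence."""
    cds = cds.upper().replace("U", "T")
    start = (pos - 1) // 3 * 3  # first base of the affected codon
    offset = (pos - 1) % 3      # position of the change within that codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    label = f"{ref_aa}{start // 3 + 1}{alt_aa}"  # e.g. D2G
    if ref_aa == alt_aa:
        return f"synonymous ({label})"
    if alt_aa == "*":
        return f"nonsense ({label})"
    return f"missense ({label})"


# Hypothetical toy coding sequence: ATG GAT TTT TAA -> M D F *
cds = "ATGGATTTTTAA"
print(translate(cds))             # → MDF*
print(classify_snv(cds, 5, "G"))  # → missense (D2G)
print(classify_snv(cds, 6, "C"))  # → synonymous (D2D)
```

Positions here are relative to the coding sequence, not the full genome; mapping a genome coordinate onto a gene would additionally require that gene's start coordinate from a genome annotation.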

Objective 5: Elucidation of a Transcriptomic Signature for Ruxolitinib

  • Overview of the drug Ruxolitinib
  • Transcriptomic dataset
  • Exploratory data analysis: principal component analysis (PCA)
  • Differential gene expression analysis
  • Unsupervised machine learning methods: PCA and hierarchical clustering (H-clust)
  • Biological interpretation of signatures
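The first analysis steps of Objective 5 can be illustrated with NumPy. This sketch assumes a samples x genes matrix of raw counts: it log-transforms the counts, runs PCA through a singular value decomposition, and computes a simple per-gene log2 fold change between treated and untreated samples. The function names and the count values are hypothetical toy data. A real differential expression analysis would also model count variance and test for statistical significance (for example with packages such as DESeq2 or edgeR), which this sketch does not do.

```python
from __future__ import annotations

import numpy as np


def pca(expr: np.ndarray, n_components: int = 2) -> tuple[np.ndarray, np.ndarray]:
    """PCA of a samples x genes matrix via singular value decomposition.

    Returns the sample scores on the first `n_components` principal components
    and the fraction of total variance each of those components explains.
    """
    centered = expr - expr.mean(axis=0)  # center every gene (column) at zero
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s**2 / np.sum(s**2)
    return scores, explained[:n_components]


def log2_fold_change(counts: np.ndarray, treated: np.ndarray,
                     pseudocount: float = 1.0) -> np.ndarray:
    """Per-gene log2 ratio of mean treated to mean untreated expression."""
    treated_mean = counts[treated].mean(axis=0)
    control_mean = counts[~treated].mean(axis=0)
    return np.log2((treated_mean + pseudocount) / (control_mean + pseudocount))


# Hypothetical toy data: 4 samples (rows) x 3 genes (columns)
counts = np.array([
    [10, 100, 50],
    [12, 110, 48],
    [40, 100, 51],
    [38, 95, 49],
], dtype=float)
treated = np.array([False, False, True, True])  # last two samples "treated"

scores, explained = pca(np.log2(counts + 1))  # log-transform before PCA
lfc = log2_fold_change(counts, treated)
print("PC1 scores:", np.round(scores[:, 0], 2))
print("log2 fold changes:", np.round(lfc, 2))
```

The sign of a principal component is arbitrary, so only the separation of groups along PC1, not which side each group lands on, is meaningful.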

Finding Data on the SARS-CoV-2 Genome

Several publicly available resources and repositories provide genomic data for SARS-CoV-2, and various software tools can be used to visualize and analyze these data. Some of them are described below.

NCBI Virus

NCBI Virus is one of the major repositories for obtaining SARS-CoV-2 sequence data as well as genomic sequences of other coronavirus strains. Users can obtain complete and partial nucleotide and protein sequences, and can apply filters such as host, source, geographic region, and sequence completeness.

URL for SARS-COV-2 Data: https://www.ncbi.nlm.nih.gov/labs/virus/vssi/#/virus?SeqType_s=Nucleotide&VirusLineage_ss=SARS-CoV-2,%20taxid:2697049
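Assuming the sequences have been downloaded from NCBI Virus as a nucleotide FASTA file, a minimal Python sketch for reading the records and applying a simple completeness filter could look like this. The file name, the function names `read_fasta` and `passes_filters`, and the thresholds (at least 29,000 nt, at most 1% ambiguous `N` bases) are illustrative assumptions; for reference, the SARS-CoV-2 reference genome is roughly 29,900 nt long.

```python
from __future__ import annotations

import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header: str | None = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def passes_filters(seq: str, min_length: int = 29_000, max_n_fraction: float = 0.01) -> bool:
    """Keep long sequences with few ambiguous 'N' bases (default thresholds are assumptions)."""
    if not seq:
        return False
    return len(seq) >= min_length and seq.count("N") / len(seq) <= max_n_fraction


# With a real download (hypothetical file name):
# with open("sars_cov_2_sequences.fasta") as fh:
#     kept = [h for h, s in read_fasta(fh) if passes_filters(s)]

# Toy in-memory example with a relaxed length threshold
toy = io.StringIO(">seq1 toy record\nACGTACGT\nACGT\n>seq2 toy record\nNNNNACGT\n")
records = list(read_fasta(toy))
kept = [h for h, s in records if passes_filters(s, min_length=10)]
print(kept)  # → ['seq1 toy record']
```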

Important Tools for Visualization/Analysis

The following tools must be installed for the visualization tasks in this project.

1. FigTree: For Phylogenetic Trees

FigTree is designed as a graphical viewer of phylogenetic trees and as a program for producing publication-ready figures.

2. UGENE: For Multiple Sequence Alignment

Unipro UGENE is multiplatform, open-source software that lets users manage, analyze, and visualize their biological data. UGENE integrates widely used bioinformatics tools within a common user interface. It provides visualization modules for biological objects such as annotated genome sequences, next-generation sequencing (NGS) assembly data, multiple sequence alignments, phylogenetic trees, and 3D structures.

3. ChimeraX: For Structure Visualization

UCSF ChimeraX (or simply ChimeraX) is the next-generation molecular visualization program from the Resource for Biocomputing, Visualization, and Informatics (RBVI).
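Progressive multiple sequence alignment methods such as ClustalW begin by aligning sequences in pairs. As background for the alignment step, the sketch below implements Needleman-Wunsch global alignment of two sequences with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) and the function name are assumptions chosen for illustration; production aligners use substitution matrices, affine gap penalties, and heuristics for speed.

```python
from __future__ import annotations


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of two sequences with a linear gap penalty.

    Returns the optimal score and the two aligned strings ('-' marks gaps).
    """
    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # match/mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Trace back from the bottom-right cell to recover one optimal alignment
    top: list[str] = []
    bottom: list[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


result = needleman_wunsch("ACGT", "AGT")
print(result)  # → (1, 'ACGT', 'A-GT')
```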

To learn more about the project, you can visit https://learn.omicslogic.com/courses/course/project-01-covid-19-origin-and-pathogenesis-of-sars-cov2