Ebolavirus: Deadly Mutations

We will study the evolutionary distance between different Ebola virus strains using NCBI Virus, a sequence repository. Search by virus for Ebolavirus (taxid: 186536); the results include tables of nucleotide sequences, protein sequences, and so on, which can be arranged by accession number, release date, geolocation, sequence length, host, and collection date. Choose 9 FASTA sequences on NCBI Virus (full genomes, Ebola virus) from 3 different locations and years, with 3 samples per site. Also find a reference genome, as well as references with known dates of collection. Download the FASTA files and make sure each genome sequence title is in GenBank (GB) format.

Search on NCBI Virus for the following accession numbers: KC242795, KC242797, KC242798, KP178538, MH425138, KP240932, KP184503, KP120616, KR025228. NCBI Virus offers a multiple sequence alignment option: clicking Align aligns the selected samples against a consensus reference genome. The first three sequences (KC242795, KC242797, KC242798) differ more from the consensus reference genome than the other samples, which were obtained more recently. This reflects the genetic variation accumulated over the evolutionary timescale. The significance of a nucleotide change is best understood through its effect on the codon and the amino acid it encodes. A change in the amino acid sequence can cause a conformational change in the protein, which directly or indirectly affects protein function.
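To make the link between nucleotide changes and amino acid changes concrete, here is a minimal Python sketch that compares two aligned coding sequences codon by codon and labels each difference as synonymous (same amino acid) or nonsynonymous (different amino acid). It is only an illustration, not part of the NCBI Virus platform: the function name classify_codon_changes and the short example sequences are made up, and the sketch assumes the input is an in-frame coding region of equal length in both sequences.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_changes(ref_cds, sample_cds):
    """Compare two aligned, in-frame coding sequences codon by codon.

    Returns a list of tuples:
    (codon_number, ref_codon, sample_codon, ref_aa, sample_aa, kind)
    """
    if len(ref_cds) != len(sample_cds) or len(ref_cds) % 3 != 0:
        raise ValueError("sequences must have equal length, a multiple of 3")
    changes = []
    for i in range(0, len(ref_cds), 3):
        ref_codon = ref_cds[i:i + 3].upper()
        new_codon = sample_cds[i:i + 3].upper()
        if ref_codon == new_codon:
            continue
        # Skip codons with gaps or ambiguous bases (e.g. '-' or 'N').
        if set(ref_codon + new_codon) - set(BASES):
            continue
        ref_aa, new_aa = CODON_TABLE[ref_codon], CODON_TABLE[new_codon]
        kind = "synonymous" if ref_aa == new_aa else "nonsynonymous"
        changes.append((i // 3 + 1, ref_codon, new_codon, ref_aa, new_aa, kind))
    return changes


# Made-up example sequences (not real Ebola virus data).
reference = "ATGGGTCTGAAA"
sample = "ATGGGCCTGGAA"
for change in classify_codon_changes(reference, sample):
    print(change)
```

In this toy example, the change GGT to GGC leaves glycine unchanged, while AAA to GAA replaces lysine with glutamate, which also changes the charge of the residue.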

In the next step of the analysis, we will sort the 9 sequences into three FASTA files, each containing the 3 viral sequences from one location.
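If you prefer to split a combined download with a script rather than by hand, the sorting step could be sketched in Python as follows. This is an assumption-laden sketch: the accession-to-location mapping must be supplied by you from the geolocation column in NCBI Virus, and the accessions and site names shown here (ACC0001, SiteA, and so on) are placeholders, not the real assignments for this project.

```python
from collections import defaultdict


def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def group_by_location(records, location_of):
    """Group records by location, keyed on the accession in the header.

    The accession is taken as the first word of the header with any
    version suffix (".1") removed. Unknown accessions raise KeyError.
    """
    groups = defaultdict(list)
    for header, seq in records:
        accession = header.split()[0].split(".")[0]
        groups[location_of[accession]].append((header, seq))
    return dict(groups)


def to_fasta(records, width=70):
    """Format (header, sequence) tuples back into FASTA text."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Placeholder data: replace with your own download and mapping.
combined = ">ACC0001.1 sample one\nACGTAC\n>ACC0002.1 sample two\nGGGGTT\n"
location_of = {"ACC0001": "SiteA", "ACC0002": "SiteB"}  # hypothetical
for location, recs in group_by_location(parse_fasta(combined), location_of).items():
    print(location, to_fasta(recs), sep="\n")
```

Each group can then be written to its own file, for example one file per location, before uploading.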

 

In addition, we will use the reference genome NC_002549 (GenBank file), which is the Ebola virus genome sequence obtained in 1976. We also need two outgroups (GenBank files) for evolutionary time approximation: the Gabon Ebola virus sequence obtained in 2002 and the Liberia Ebola virus sequence obtained in 2014.

Dataset

Reference Genome

https://raw.githubusercontent.com/pine-bio-support/Ebola_project/main/Reference_sequence.gb

Samples

UK sequences

https://raw.githubusercontent.com/pine-bio-support/Ebola_project/main/UK_sequences.fasta

Gabon sequences

https://raw.githubusercontent.com/pine-bio-support/Ebola_project/main/Gabon_sequences.fasta

Liberia sequences

https://raw.githubusercontent.com/pine-bio-support/Ebola_project/main/Liberia_sequences.fasta

 

Outgroups

https://raw.githubusercontent.com/pine-bio-support/Ebola_project/main/MH425138%20_sequence.gb

https://raw.githubusercontent.com/pine-bio-support/Ebola_project/main/KC242800_sequence.gb

 

After uploading the necessary files, click on “Start” and run the Multiple Sequence Alignment of the nucleotide sequences. After alignment, incorporate the changes in amino acid charge into the alignment using the Multiple Sequence Alignment AA (Amino Acids) algorithm. The next algorithm determines the codon positions used for the phylogenetic analysis. When the 'third codon position' is chosen, the MSA is generated for sequences consisting only of the third position of each codon. When the '4D position' is chosen, the MSA is generated only for those codons in which a change of the third nucleotide does not alter the amino acid (fourfold degenerate sites). The reasoning behind this is explained in the respective publication (doi:10.1371/journal.pgen.1003527).
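The two codon position options can be illustrated with a short Python sketch. It is not the platform's implementation: the function names are hypothetical, the input is assumed to be an in-frame alignment of a coding region (a dictionary of name to aligned sequence), and the 4D rule used here, keeping a codon only when it is fourfold degenerate in every sequence, is one reasonable reading of the description above.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def is_fourfold(codon):
    """True if any base at the third position encodes the same amino acid."""
    if len(codon) != 3 or any(b not in BASES for b in codon):
        return False  # gaps and ambiguous bases are never counted
    return len({CODON_TABLE[codon[:2] + b] for b in BASES}) == 1


def third_codon_positions(aligned):
    """Keep only the third base of every codon in each sequence."""
    return {name: seq.upper()[2::3] for name, seq in aligned.items()}


def fourfold_positions(aligned):
    """Keep the third base of codons that are fourfold degenerate in
    every sequence of the alignment (assumed rule, see text)."""
    seqs = {name: seq.upper() for name, seq in aligned.items()}
    length = min(len(s) for s in seqs.values())
    keep = [i for i in range(0, length - length % 3, 3)
            if all(is_fourfold(s[i:i + 3]) for s in seqs.values())]
    return {name: "".join(s[i + 2] for i in keep) for name, s in seqs.items()}


# Made-up in-frame alignment (not real Ebola virus data).
alignment = {"ref": "ATGGCTCTAAAA", "s1": "ATGGCCCTGAAG"}
print(third_codon_positions(alignment))
print(fourfold_positions(alignment))
```

In the toy alignment, only the alanine (GCN) and leucine (CTN) codons are fourfold degenerate, so the 4D output is shorter than the third-position output. Changes at such sites do not alter the protein, which is why they are often treated as a closer proxy for the neutral mutation rate.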

Phylogenetic analysis relies on a mathematical model of the evolutionary process and relates it to the available data. These are statistical approaches that estimate how likely a change is to occur. Here, we use the BEAST Speciation Birth-Death Process, which is based on the birth-death model, where both a birth rate and a death rate (of organisms) are present. Afterwards, click on “End” and run the pipeline under a suitable name. Finally, in the output of this pipeline, you can identify groups of samples and see how they relate to a “phenotype” (in this case, geolocation or country name).
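To build some intuition for the birth-death model, the following Python sketch simulates the number of lineages over time when each lineage splits at a constant birth rate and dies out at a constant death rate. This is only a toy forward simulation under assumed rates; it is not what BEAST does internally, where the birth-death process serves as a prior over trees during Bayesian inference. The function name and parameter values are hypothetical.

```python
import random


def simulate_birth_death(birth_rate, death_rate, t_max, n0=1, seed=None):
    """Simulate lineage counts under a constant-rate birth-death process.

    Returns a list of (time, number_of_lineages) events, starting at
    (0.0, n0) and stopping at extinction or when t_max is reached.
    """
    rng = random.Random(seed)
    t, n = 0.0, n0
    history = [(t, n)]
    per_lineage_rate = birth_rate + death_rate
    while n > 0 and per_lineage_rate > 0:
        # Waiting time to the next event is exponential in the total rate.
        t += rng.expovariate(n * per_lineage_rate)
        if t > t_max:
            break
        # The event is a birth with probability birth / (birth + death).
        if rng.random() < birth_rate / per_lineage_rate:
            n += 1
        else:
            n -= 1
        history.append((t, n))
    return history


# Hypothetical rates, chosen only for illustration.
for time, lineages in simulate_birth_death(1.0, 0.5, t_max=3.0, seed=1):
    print(f"{time:.3f}\t{lineages}")
```

Running the simulation with different seeds shows how variable the outcome is: with the same rates, some runs go extinct early while others grow quickly, which is why the model is used statistically rather than as a deterministic prediction.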

To learn more about the project, you can visit: https://learn.omicslogic.com/courses/course/project-02-ebolavirus-deadly-mutations