Metagenomics data analysis is the study of microbial communities in their original habitats. Approaches to studying these communities offer various levels of resolution and functional annotation, spanning amplicon, whole-metagenome, metatranscriptome, and metaproteome/metabolome analyses.
16S rRNA gene sequencing, the most commonly used approach, can be used to study microbiome composition and identify over-represented species of microorganisms.
Processing Metagenomic Analysis
About DADA2 and QIIME2 Methods:
The DADA2 pipeline runs a number of programs, including the DADA2, Ape, and Phyloseq R packages. DADA2 generates amplicon sequence variant (ASV) tables, which are similar to OTU tables but more detailed, in that they tabulate how many times each exact amplicon sequence variant is observed in each sample. Microbial studies that use DADA2 obtain high-resolution, accurately reconstructed amplicon sequences, which improves the detection of sample diversity and biological variants.
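To make the idea of an ASV table concrete, here is a minimal Python sketch that tabulates how often each exact sequence occurs in each sample. It is only an illustration of the table layout, not DADA2's implementation: the function name `build_asv_table`, the sample names, and the toy reads are all hypothetical, and the reads are assumed to be already denoised.

```python
from collections import Counter


def build_asv_table(samples):
    """Count each exact sequence per sample.

    `samples` maps a sample name to a list of reads (assumed already denoised).
    Returns the sorted list of distinct sequences and, for each sample,
    a list of counts in that same sequence order.
    """
    counts = {name: Counter(reads) for name, reads in samples.items()}
    sequences = sorted({seq for c in counts.values() for seq in c})
    table = {name: [c[seq] for seq in sequences] for name, c in counts.items()}
    return sequences, table


# Toy reads, invented for illustration only.
samples = {
    "sample_1": ["ACGT", "ACGT", "ACGA"],
    "sample_2": ["ACGA", "TTGA"],
}
sequences, table = build_asv_table(samples)
print(sequences)          # ['ACGA', 'ACGT', 'TTGA']
print(table["sample_1"])  # [1, 2, 0]
print(table["sample_2"])  # [1, 0, 1]
```

Note that "ACGT" and "ACGA" differ by a single base yet stay in separate columns, whereas identity-based OTU clustering would typically group such sequences together.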
QIIME2 is an open-source bioinformatics pipeline for performing microbiome analysis from raw DNA sequencing data. QIIME is designed to take users from raw sequencing data generated on Illumina or other platforms through to publication-quality graphics and statistics. This includes demultiplexing and quality filtering, OTU picking, taxonomic assignment, phylogenetic reconstruction, and diversity analyses and visualizations. QIIME has been applied to studies based on billions of sequences from tens of thousands of samples.
DADA2 Pipeline
The DADA2 pipeline implements a complete workflow that turns paired-end FASTQ files from the sequencer into merged, denoised, chimera-free, inferred sample sequences. In this analysis, we use all of the steps that together form the DADA2 pipeline. The pipeline includes several features that correct amplicon sequencing errors in the Illumina reads (“reads” are the nucleotide sequences as called, or “read”, by the Illumina sequencing machine and its software). After cleaning the reads, DADA2 merges each pair of overlapping forward and reverse reads into a single sequence, and these merged sequences serve as ASVs. The ASVs are then compared with a reference database to assign taxonomy, that is, the bacterial name and its higher-level classification. The DADA2 algorithm builds a quality-based error model and uses it to identify fine-scale sequence variation. In summary, the DADA2 analysis filters the input data, dereplicates it to obtain the abundance of each unique sequence, and produces an ASV table, a denoised, high-resolution output file.
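The pair-merging step can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not DADA2's own merging code: the function names, the `min_overlap` value, and the toy reads are hypothetical, and the sketch requires an exact match in the overlap and ignores quality scores.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(complement[base] for base in reversed(seq))


def merge_pair(forward, reverse, min_overlap=4):
    """Merge a forward read with the reverse complement of its mate.

    Tries the longest possible overlap first and requires an exact match.
    Returns None when no overlap of at least `min_overlap` bases exists.
    """
    rc = reverse_complement(reverse)
    for k in range(min(len(forward), len(rc)), min_overlap - 1, -1):
        if forward[-k:] == rc[:k]:
            return forward + rc[k:]
    return None


# Toy reads, invented for illustration only.
print(merge_pair("ACGTTGCA", "GGATTGCA"))  # ACGTTGCAATCC
print(merge_pair("AAAAAAAA", "CCCCCCCC"))  # None
```

In the first example the reverse read becomes "TGCAATCC" after reverse-complementing, so the last four bases of the forward read ("TGCA") overlap its start and the two reads collapse into one 12-base sequence.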
QIIME 2 Pipeline
Advances in the analysis of amplicon sequence datasets have introduced a methodological shift in how research teams investigate microbial biodiversity, away from sequence identity-based clustering (producing Operational Taxonomic Units, OTUs) and toward denoising methods (producing amplicon sequence variants, ASVs). This shift enables bacterial communities to be studied at higher resolution. QIIME2 is a computing environment for processing and analyzing amplicon library data. Typically, these amplicon libraries are generated by targeting the 16S rRNA gene in a prokaryotic community in order to gain insight into the taxa present and their relative abundances. The QIIME2 pipeline on the T-Bioinfo server includes plugins for latest-generation tools for sequence quality control from different sequencing platforms (DADA2 and Deblur), taxonomy assignment, and phylogenetic insertion, which quantitatively improve the results. The platform also provides many interactive visualization tools that facilitate exploratory analyses and result reporting.
DADA2 Results
Results obtained from the DADA2 pipeline include:
- Microbial Abundance Taxonomy Barplot (Relative & Proportionate) – Indicating the richness & abundance of taxa identified (The view can be filtered based on different taxonomy classification)
- 2D & 3D PCA Plots to visualize the cluster separation between the samples of the dataset.
- Alpha Diversity Measure: Summarizes the structure of an ecological community with respect to its richness (number of taxonomic groups), evenness (distribution of abundances of the groups), or both. Two measures are reported: Chao1 (estimates the number of species) and Shannon (a diversity index reflecting both richness and evenness, from which the effective number of species can be derived). These are shown in a Phyloseq plot based on OTU abundance for the different samples in the groups under study.
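As a rough illustration of the two alpha diversity measures in the list above, the following Python sketch computes them from a single sample's abundance counts. It assumes the natural-log Shannon index and the bias-corrected form of Chao1; the function names and counts are hypothetical, and the pipeline's own packages may use different variants.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)).

    F1 and F2 are the numbers of taxa seen exactly once and exactly twice.
    """
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed + f1 * (f1 - 1) / (2 * (f2 + 1))


# Toy abundance vectors, invented for illustration only.
even_sample = [10, 10, 10, 10]
skewed_sample = [5, 2, 1, 1, 1]

# exp(H) is the effective number of species; close to 4 for the even sample.
print(math.exp(shannon(even_sample)))
print(chao1(skewed_sample))  # 6.5
```

The skewed sample has five observed taxa, but its three singletons suggest that some taxa were missed, so Chao1 estimates a richness of 6.5.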
QIIME 2 Results
Along with the DADA2 pipeline results (taxonomic barplots, PCA, and alpha diversity measures), the QIIME2 pipeline also provides the results listed below:
- Alpha Diversity Boxplot: Shows how alpha diversity (evenness and Faith's phylogenetic diversity) differs between sample groups, through the evenness-group-significance and faith-pd-group-significance plots.
- QIIME2 Remove Eukaryota: Generates an interactive bar plot of the taxa present in the samples, as determined by the taxonomic classification algorithm and reference sequence set used earlier. Bars can be aggregated at the desired taxonomic level and sorted by the abundance of a specific taxonomic group or by metadata groupings. Color schemes can also be changed interactively, and plots and legends can be saved in vector graphic format.
- Group Significance Plots
- Alpha Group Significance: Boxplots show the alpha-diversity values for each group, and significant differences between groups are assessed with the Kruskal-Wallis test.
- Beta Group Significance: The beta-diversity command uses boxplots to visualize the distance between samples aggregated by groups specified in the metadata table file. Significant differences are assessed using a PERMANOVA analysis or optionally with ANOSIM.
- QIIME2 Coremetrics: An array of alpha- and beta-diversity measures can be generated with a single QIIME2 command. The qiime diversity core-metrics-phylogenetic command produces both phylogenetic and non-phylogenetic measures. Its beta-diversity outputs are: Jaccard distance (qualitative measure of community dissimilarity); Bray-Curtis distance (quantitative measure of community dissimilarity); unweighted UniFrac distance (qualitative measure of community dissimilarity that incorporates phylogenetic relationships between the features); and weighted UniFrac distance (quantitative measure of community dissimilarity that incorporates phylogenetic relationships between the features).
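The difference between a qualitative and a quantitative dissimilarity measure can be shown with a short Python sketch of the two non-phylogenetic distances. This is an illustration only, not QIIME2's implementation: the function names and abundance vectors are hypothetical, and the counts are assumed to have already been brought to an equal sampling depth. The UniFrac distances are omitted because they also need a phylogenetic tree.

```python
def jaccard_distance(a, b):
    """Qualitative: 1 - (shared features / features present in either sample)."""
    present_a = {i for i, count in enumerate(a) if count > 0}
    present_b = {i for i, count in enumerate(b) if count > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1 - len(present_a & present_b) / len(union)


def bray_curtis_distance(a, b):
    """Quantitative: sum of absolute count differences over the total count."""
    total = sum(a) + sum(b)
    if total == 0:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / total


# Toy feature counts for two samples, invented for illustration only.
sample_a = [4, 0, 6]
sample_b = [2, 2, 6]

print(jaccard_distance(sample_a, sample_b))      # 1 - 2/3, about 0.33
print(bray_curtis_distance(sample_a, sample_b))  # 4/20 = 0.2
```

Jaccard only registers that the second feature is absent from one sample, while Bray-Curtis also reflects how much the abundances of the shared features differ.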