Pipelines
These are the bioinformatics pipelines I maintain and develop.
- High Resolution Shotgun Metagenomics: Starting from shotgun metagenomics sequence libraries, this pipeline follows the procedures described in Tremblay, Schreiber & Greer, 2022 (Briefings in Bioinformatics). Briefly, raw reads are quality-controlled and co-assembled. Genes are predicted from the co-assembly and annotated taxonomically and functionally. The quality-controlled (QCed) reads are then mapped back to the co-assembly to estimate contig and gene abundances for each library/sample. Metagenome-Assembled Genomes (MAGs) are also generated.
- rRNA amplicons (16S/18S/ITS): Starting from amplicon reads, this pipeline follows the procedures described in our rRNA paper (Tremblay & Yergeau, 2019).
- Whole genome assembly (PacBio): Starting from long PacBio reads, this pipeline generates a genome assembly. Note that PacBio assemblies of bacterial genomes typically reach ~99.999% accuracy and completeness.
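To put the ~99.999% accuracy figure in perspective, the short sketch below converts an assembly accuracy into an expected number of residual base errors and a Phred-scaled quality value (QV = -10 log10 of the error rate). It only covers accuracy, not completeness, and the 5 Mbp genome size is an assumed, typical bacterial value rather than a number taken from the pipeline.

```python
import math


def expected_errors(genome_size_bp: int, accuracy: float) -> float:
    """Expected number of erroneous bases in an assembly of the given size."""
    return genome_size_bp * (1.0 - accuracy)


def phred_quality(accuracy: float) -> float:
    """Phred-scaled quality value: QV = -10 * log10(error rate)."""
    return -10.0 * math.log10(1.0 - accuracy)


# Assumed example: a 5 Mbp bacterial genome assembled at 99.999% accuracy.
genome_size = 5_000_000
accuracy = 0.99999
print(f"~{expected_errors(genome_size, accuracy):.0f} residual base errors")  # ~50
print(f"QV ~ {phred_quality(accuracy):.0f}")  # QV ~ 50
```

In other words, 99.999% accuracy corresponds to roughly one error every 100,000 bases (about QV50), or on the order of 50 errors across a 5 Mbp genome.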
We need metadata in order to run these pipelines. For the 16S/18S/ITS amplicon pipeline, this metadata comes in the form of what we call a mapping file; an example can be found here. It is a simple tab-delimited file with one row per sample: the first column holds your unique sample IDs, and each following column is a variable (for example a treatment or sampling group) holding that sample's value for the variable. You can have as many columns (variables) as you see fit. These files can easily be created or edited with a plain text editor, Excel or similar software.
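As an illustration of the mapping file layout described above, here is a minimal sketch that parses a tab-delimited mapping file and checks two basic rules: sample IDs in the first column must be unique, and every row must have as many columns as the header. This is not the pipeline's own validation code; the function name `read_mapping_file` and the column names and values in the example (`#SampleID`, `Treatment`, `Timepoint`) are hypothetical, so refer to the example mapping file for the exact header convention the pipeline expects.

```python
import csv
import io


def read_mapping_file(handle):
    """Parse a tab-delimited mapping file into {sample_id: {variable: value}}.

    Assumes the first row is a header, the first column holds unique sample
    IDs, and every remaining column is one variable.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    variables = header[1:]
    samples = {}
    for line_no, row in enumerate(reader, start=2):
        if not row:  # tolerate blank lines, e.g. a trailing empty line
            continue
        if len(row) != len(header):
            raise ValueError(
                f"line {line_no}: expected {len(header)} columns, got {len(row)}"
            )
        sample_id = row[0]
        if sample_id in samples:
            raise ValueError(f"line {line_no}: duplicate sample ID {sample_id!r}")
        samples[sample_id] = dict(zip(variables, row[1:]))
    return samples


# Hypothetical mapping file with two variables (Treatment, Timepoint).
example = (
    "#SampleID\tTreatment\tTimepoint\n"
    "S1\tcontrol\tT0\n"
    "S2\tcontrol\tT1\n"
    "S3\tfertilized\tT0\n"
)
mapping = read_mapping_file(io.StringIO(example))
print(mapping["S3"])  # {'Treatment': 'fertilized', 'Timepoint': 'T0'}
```

To check a real file, open it with `open("mapping_file.tsv", newline="")` (hypothetical file name) and pass the handle to the same function. Using the `csv` module rather than a plain `split("\t")` also copes with the Windows line endings that Excel typically writes.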