A Comprehensive Guide to a Hybrid Assembly Pipeline for Genomic Sequencing

Microbes, life’s unseen workhorses, hold immense potential for bioremediation, medicine, and understanding our planet. Yet, their intricate workings remain largely a mystery. This is where microbial genomics steps in, offering a powerful tool to decode their genetic language.

1. Introduction

Genomic sequencing has revolutionized our understanding of microbial diversity and function. In this guide, we’ll walk through a hybrid assembly pipeline, combining long reads (ONT) and short reads (Illumina), using various tools for quality control, assembly, polishing, and assessment.

2. Requirements

1. Using Anaconda for Package Management

Anaconda is a software distribution built around conda, a package and environment manager that simplifies installing, updating, and isolating software packages. It is particularly useful for managing bioinformatics tools and their dependencies.

Download and install Anaconda from the official website.

Create a new conda environment for your project:

conda create --name myenv
conda activate myenv

Before diving into the pipeline, ensure you have the necessary tools installed:

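2. Quality Control: NanoPlot (long-read statistics) and Filtlong (long-read filtering)

3. Assembly: Flye

4. Polishing: Medaka (long reads), then BWA and Polypolish (short reads)

5. Assembly Assessment: BUSCO and QUAST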
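
Before starting a long run, it can help to confirm that each executable is actually on your PATH. The following is a minimal sketch, not part of any of the tools: `check_tools` is a hypothetical helper name, and it relies only on the shell built-in `command -v`.

```shell
# Hypothetical helper: report any named executable that is not on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_tools NanoPlot filtlong bwa polypolish quast.py` checks the executables called in the main environment. In this guide, `medaka_consensus` and `busco` are run from their own conda environments, so check those after activating each one.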

3. Hybrid Assembly Pipeline

Step 1: Quality Control

For Long Reads (LR - ONT):

NanoPlot --fastq LR_input.fastq --N50 --verbose --outdir 1-NanoPlot_LR_raw/ -t 8

For Short Reads (SR - Illumina):
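
One basic check for paired short reads, independent of any dedicated QC tool, is that both files contain the same number of records. The sketch below assumes uncompressed FASTQ files with exactly four lines per record; `count_fastq_reads` and `check_pair` are illustrative names, not commands from any package.

```shell
# Count records in an uncompressed FASTQ file (assumes four lines per record).
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Print both counts and succeed only if the two files hold the same number of records.
check_pair() {
    n1=$(count_fastq_reads "$1")
    n2=$(count_fastq_reads "$2")
    echo "R1: $n1 reads, R2: $n2 reads"
    [ "$n1" -eq "$n2" ]
}
```

With the file names used in this guide, the call would be `check_pair SR_input_1.fastq SR_input_2.fastq`. A mismatch usually points to a truncated download or an upstream step that processed only one file of the pair.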

Step 2: Filter Long Reads

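# Drop reads shorter than 1 kb and keep the best 90% of read bases; the Illumina reads (-1/-2) act as an external reference for scoring long-read quality
filtlong -1 SR_input_1.fastq -2 SR_input_2.fastq --min_length 1000 --keep_percent 90 LR_input.fastq > LR_filtered.fastq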
NanoPlot --fastq LR_filtered.fastq --N50 --verbose --outdir 2-NanoPlot_LR_filtered/ -t 8
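
The N50 that NanoPlot reports is the read length at which reads of that length or longer contain at least half of all sequenced bases. As a rough cross-check of the effect of filtering, the value can be computed directly with standard tools. This is a sketch under the assumption of uncompressed, four-line FASTQ records; `fastq_n50` is a hypothetical helper name.

```shell
# Print the read-length N50 of an uncompressed FASTQ file.
# Assumes four lines per record, so sequences sit on lines 2, 6, 10, ...
fastq_n50() {
    awk 'NR % 4 == 2 { print length($0) }' "$1" |
        sort -rn |
        awk '{ len[NR] = $1; total += $1 }
             END {
                 # Walk from the longest read down until half of all bases are covered.
                 for (i = 1; i <= NR; i++) {
                     running += len[i]
                     if (running * 2 >= total) { print len[i]; exit }
                 }
             }'
}
```

Running `fastq_n50 LR_input.fastq` and `fastq_n50 LR_filtered.fastq` side by side should show the N50 staying the same or rising after filtering, since short and low-quality reads are removed first.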

Step 3: Long Reads Assembly using Flye

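# --nano-raw: Filtlong filters reads but does not error-correct them, so they are still raw ONT reads
python3 Flye/bin/flye --nano-raw LR_filtered.fastq --out-dir Flye/ --threads 8 --scaffold -g 6m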

Step 4: First Polishing using Medaka

conda activate medaka
medaka_consensus -i LR_filtered.fastq -d Flye/assembly.fasta -o Polish1/ -m r941_min_fast_g303 -t 8

Step 5: Second Polishing using Polypolish

mkdir -p Polish2
bwa index Polish1/consensus.fasta
# Align each short-read file separately; -a reports all alignments, which Polypolish needs for repeat regions
bwa mem -t 8 -a Polish1/consensus.fasta SR_input_1.fastq > Polish2/alignments_1.sam
bwa mem -t 8 -a Polish1/consensus.fasta SR_input_2.fastq > Polish2/alignments_2.sam
# Remove alignments with inconsistent insert sizes, then polish with the filtered alignments
polypolish_insert_filter.py --in1 Polish2/alignments_1.sam --in2 Polish2/alignments_2.sam --out1 Polish2/filtered_1.sam --out2 Polish2/filtered_2.sam
polypolish Polish1/consensus.fasta Polish2/filtered_1.sam Polish2/filtered_2.sam > final_assembly.fasta
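
Short-read polishing can only correct positions that the short reads actually cover, so it may be worth checking how many alignment records are mapped before trusting the polished output. The sketch below reads a SAM file with awk alone; `sam_mapped_counts` is an illustrative name. Because `bwa mem -a` writes every alignment, the totals count alignment records rather than unique reads.

```shell
# Print "<mapped> <total>" alignment records for a SAM file.
# Header lines start with "@"; bit 0x4 of the FLAG column (field 2) marks an unmapped read.
sam_mapped_counts() {
    awk -F '\t' '
        /^@/ { next }
        { total++; if (int($2 / 4) % 2 == 0) mapped++ }
        END { print mapped + 0, total + 0 }
    ' "$1"
}
```

For instance, `sam_mapped_counts Polish2/alignments_1.sam` prints the two counts for the first read file. A very low mapped count may suggest that the short reads come from a different sample or that the draft assembly is badly fragmented.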

Step 6: Assembly Quality Assessment

conda activate busco
busco -i final_assembly.fasta -l bacteria_odb10 -m genome -o busco_final_assembly
quast.py -o quast_final_assembly -t 8 final_assembly.fasta
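
QUAST reports contig counts and total length in detail, but a quick look at the same two numbers is useful as soon as an assembly file exists, for example to compare the total length with the 6 Mb genome size passed to Flye. A minimal sketch using awk only; `fasta_summary` is a hypothetical helper name.

```shell
# Print "<contigs> <total_bases>" for a FASTA file.
# Header lines start with ">"; every other line is counted as sequence.
fasta_summary() {
    awk '
        /^>/ { contigs++; next }
        { bases += length($0) }
        END { print contigs + 0, bases + 0 }
    ' "$1"
}
```

Calling `fasta_summary final_assembly.fasta` prints the contig count followed by the total number of bases. A total far from the expected genome size, or a much larger contig count than expected, is a cue to revisit the earlier steps before annotation.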

4. Annotation for 16S rRNA using Prokka and RAST

For further annotation, consider using tools like Prokka and RAST to identify and annotate features such as 16S rRNA genes. These tools can provide additional insights into the functional elements of your assembled genome.

Prokka

Prokka is a versatile tool for bacterial genome annotation. It predicts protein-coding genes, rRNAs, tRNAs, and other features.

conda install -c conda-forge -c bioconda prokka
prokka --outdir prokka_annotation --prefix final_assembly final_assembly.fasta

Prokka will generate annotation files in the specified directory, providing detailed information about the genomic features.
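
To locate the 16S rRNA genes in the Prokka output, the GFF file can be filtered for rRNA features. This sketch assumes the annotation is at `prokka_annotation/final_assembly.gff` (following the `--outdir` and `--prefix` values above) and that 16S features carry the product text "16S ribosomal RNA"; both are assumptions to verify against your own output, and `list_16s` is an illustrative name.

```shell
# Print contig, start, end and strand for each rRNA feature whose
# attributes mention "16S ribosomal RNA" in a GFF3 file.
# Assumption: the product text matches what the annotation tool wrote.
list_16s() {
    awk -F '\t' '
        /^##FASTA/ { exit }   # stop at the embedded sequence section, if present
        /^#/ { next }         # skip other comment and directive lines
        $3 == "rRNA" && $9 ~ /16S ribosomal RNA/ { print $1, $4, $5, $7 }
    ' "$1"
}
```

A call such as `list_16s prokka_annotation/final_assembly.gff` lists one line per 16S copy, and the coordinates can then be used to extract the sequences for taxonomic comparison.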

RAST

Visit the RAST website to submit your genome for annotation. RAST offers a user-friendly interface for functional annotation of bacterial and archaeal genomes.

5. Bottom line

This hybrid assembly pipeline, coupled with annotation tools like Prokka and RAST, gives researchers a practical route to high-quality microbial genomes. Long reads provide contiguity, short reads correct residual base-level errors, and quality control and assessment at each stage help confirm that the final assembly is a reliable representation of the genome.