International Conference on Advances in Next Generation Sequencing


Hybrid Genome Assembly with Multi-Platform NGS Data

Farhat Habib, Scientist, IISER

Hybrid genome assembly uses data from multiple sequencing platforms to assemble a genome from the short and long reads produced by shotgun sequencing of fragmented DNA. Assembly is one of the most challenging tasks in genome sequencing because most modern DNA sequencing technologies produce reads that average only ~100-1000 base pairs in length. Genomes are orders of magnitude larger than even the longest reads that long-read technologies can generate, so putting the reads back together into a genome is complex. One challenge is that genomes often contain repeats, which in eukaryotic organisms can account for a significant part of the genome. These repeats can be longer than second-generation sequencing reads, which then cannot bridge them, making the repeat structure of the genome difficult to resolve. A hybrid approach also allows the biases and errors of one platform to be corrected with information from a complementary platform. In this talk, I will illustrate some hybrid approaches and the approach taken in assembling the Hydra vulgaris genome, a complex genome with a large repeat content.
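The repeat-bridging argument above can be made concrete with a minimal sketch. It assumes a simplified model in which a read resolves a repeat only if it covers the whole repeat plus some unique flanking sequence on each side. The function names, the `min_anchor` parameter, and the example read and repeat lengths are hypothetical illustrations, not values or methods from the talk.

```python
def can_bridge(read_len: int, repeat_len: int, min_anchor: int = 20) -> bool:
    """Return True if a read of read_len bases could span a repeat of
    repeat_len bases while keeping at least min_anchor unique bases on
    each side (a simplifying assumption, not a real aligner's rule)."""
    return read_len >= repeat_len + 2 * min_anchor


def bridging_platforms(read_lengths: dict, repeat_len: int,
                       min_anchor: int = 20) -> list:
    """List the platforms whose typical read length could bridge the repeat."""
    return [
        name
        for name, length in read_lengths.items()
        if can_bridge(length, repeat_len, min_anchor)
    ]


# Illustrative read lengths only; real platforms vary widely.
example_reads = {"short_read": 150, "long_read": 10_000}
usable = bridging_platforms(example_reads, repeat_len=5_000)
```

Under these assumed numbers only the long reads can span the 5,000-base repeat, which is the situation in which a hybrid strategy could pair long reads for repeat resolution with short reads from a complementary platform.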


