Hybrid Genome Assembly with Multi-Platform NGS Data
Farhat Habib, Scientist, IISER
Hybrid genome assembly uses data from multiple sequencing platforms to
assemble a genome from the short and long reads of fragmented DNA
produced by shotgun sequencing. Genome assembly is one
of the most challenging tasks in genome sequencing as most modern DNA
sequencing technologies can only produce reads that are, on average, ~100-1000
base pairs in length. Genomes are orders of magnitude larger than even the longest
reads that long-read technologies can generate, and the task of putting
the reads back together into a genome is complex. One challenge is that
genomes often contain repeats, which can account for a significant
part of the genome in eukaryotic organisms. These repeats can be longer
than second-generation sequencing reads, so the reads cannot bridge the
repeat and resolving the repeat structure of the genome is difficult;
longer reads from another platform can span them. A hybrid approach also allows one to correct for the biases
and errors of one platform with information from a complementary platform. In
this talk, I will illustrate some hybrid approaches and the approach taken in
the assembly of the Hydra vulgaris genome, a complex genome with a
large repeat content.
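
As a rough illustration of the repeat problem described above (not taken from the talk), the Python sketch below counts how many read start positions let a single read cover a whole repeat plus some unique sequence on each side. The function name, the parameters, and the numbers in the usage lines are hypothetical and chosen only for illustration; real assemblers work with coverage, sequencing errors, and paired-end information that this toy model ignores.

```python
def count_spanning_starts(genome_len, repeat_start, repeat_len, read_len, flank=10):
    """Count read start positions whose read fully spans a repeat.

    A read starting at s covers positions [s, s + read_len). It "bridges"
    the repeat if it also covers at least `flank` unique bases on each side,
    which is what lets an assembler anchor the repeat unambiguously.
    All arguments are illustrative assumptions, not values from the talk.
    """
    # Latest start that still leaves `flank` bases before the repeat,
    # without running off the end of the genome.
    latest = min(repeat_start - flank, genome_len - read_len)
    # Earliest start whose read still reaches `flank` bases past the repeat.
    earliest = max(0, repeat_start + repeat_len + flank - read_len)
    return max(0, latest - earliest + 1)


# A 500 bp repeat at position 4000 in a 10 kb toy genome.
short_read_starts = count_spanning_starts(10_000, 4_000, 500, read_len=150)
long_read_starts = count_spanning_starts(10_000, 4_000, 500, read_len=1_000)
print(short_read_starts, long_read_starts)
```

Under these assumptions no 150 bp read can bridge the 500 bp repeat, while a 1000 bp read can do so from several hundred start positions, which is the gap a hybrid data set is meant to close.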