Reference
Last updated on 2025-02-16
Additional Reading
For those interested in further exploring genome assembly and bioinformatics, here are some recommended resources:
Bioinformatics and Functional Genomics - Jonathan Pevsner
A comprehensive textbook covering sequence alignment, genome assembly, and functional genomics. Suitable for both beginners and advanced learners.
Ten Steps to Get Started in Genome Assembly and Annotation - Keith Bradnam & Ian Korf
A practical guide that provides a step-by-step approach to genome assembly and annotation, discussing essential concepts, methodologies, and tools.
A Comprehensive Review of Scaffolding Methods in Genome Assembly - Xiang Ji et al.
This review article explores various scaffolding techniques used in genome assembly, comparing their effectiveness and discussing associated challenges.
Glossary
assembly
The process of reconstructing a genome sequence from sequencing reads.
contig
A contiguous sequence of DNA assembled from overlapping sequencing reads.
scaffold
A larger DNA sequence constructed by ordering and orienting contigs, usually with gaps of estimated size between them, using paired-end or long-read sequencing data.
long-read sequencing
A sequencing method that generates long reads, typically thousands of bases or more, useful for resolving repeats and other complex genomic regions.
short-read sequencing
A sequencing method that generates short reads, typically a few hundred bases or fewer, commonly used for high-accuracy genome sequencing.
PacBio
A long-read sequencing technology developed by Pacific Biosciences, known for high-accuracy HiFi reads.
Oxford Nanopore
A sequencing technology that uses nanopores to read DNA bases in real time, providing long reads with flexible throughput.
Illumina
A short-read sequencing technology known for high accuracy and low cost, widely used in genomics.
hybrid assembly
A genome assembly approach that combines long and short reads to improve contiguity and accuracy.
N50
A metric used to assess genome assembly contiguity: the contig length such that contigs of at least that length together contain 50% of the total assembly length.
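The N50 calculation can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used by any particular assembly tool: the function name `n50` and the contig lengths are made up for the example, and the 50% threshold is taken relative to the summed contig length.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (hypothetical helper)."""
    if not lengths:
        raise ValueError("lengths must not be empty")
    # Walk through contigs from longest to shortest.
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to at least half
        # of the total assembly length defines the N50.
        if running >= half_total:
            return length


# Made-up contig lengths (total = 300, so the threshold is 150).
contig_lengths = [80, 70, 50, 40, 30, 20, 10]
print(n50(contig_lengths))  # 80 + 70 = 150 reaches the threshold, so N50 = 70
```

Because N50 depends only on the length distribution, it says nothing about whether the contigs are correct; it is usually read alongside completeness measures such as BUSCO.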
BUSCO
A benchmarking tool that assesses the completeness of genome assemblies based on conserved single-copy orthologs.
error correction
A preprocessing step to correct sequencing errors before assembly.
polishing
A process of refining an assembly by correcting sequencing errors using high-accuracy reads.
base calling
The computational process of determining DNA bases from raw sequencing signal data.
barcode
A short, unique DNA sequence used to differentiate samples in multiplex sequencing experiments.
de novo assembly
Genome assembly performed without the use of a reference genome.
reference genome
A curated genome sequence used as a standard for alignment and comparison.
consensus sequence
A sequence derived from multiple reads that represents the most likely correct DNA sequence.
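As a rough illustration of the idea, the sketch below derives a consensus by majority vote at each position. It assumes the reads are already aligned and of equal length, and it ignores base qualities, gaps, and alignment uncertainty, all of which real assemblers and polishers account for. The function name `majority_consensus` and the reads are hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Majority-vote consensus over pre-aligned, equal-length reads (hypothetical helper)."""
    if not aligned_reads:
        raise ValueError("aligned_reads must not be empty")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("all reads must have the same length")
    consensus = []
    # zip(*reads) yields one tuple of bases per alignment column.
    for column in zip(*aligned_reads):
        # Ties are broken in favour of the base seen first in the column.
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Made-up reads: two of them each carry a single error at a different position.
reads = ["ACGTAC", "ACGTTC", "ACCTAC"]
print(majority_consensus(reads))  # ACGTAC
```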
variant calling
The identification of genetic variants such as SNPs and indels from sequencing data.
k-mer
A subsequence of fixed length k taken from a longer DNA sequence; k-mers are used in genome assembly and error correction.
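To make the term concrete, the following sketch counts every overlapping k-mer in a sequence. It is a simplified example: the function name `count_kmers` and the input sequence are made up, and it does not merge reverse complements (canonical k-mers), which dedicated k-mer counters typically do.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count all overlapping k-mers in a sequence (hypothetical helper)."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    # A sequence of length n contains n - k + 1 overlapping k-mers;
    # if the sequence is shorter than k, the range is empty.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ATGATG", 3)
print(counts["ATG"])  # 2: ATG occurs at positions 0 and 3
print(len(counts))    # 3 distinct 3-mers: ATG, TGA, GAT
```

In real data, k-mers that occur only once or twice across many reads are often sequencing errors, which is one reason k-mer counts are used for error correction.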
phasing
The process of resolving haplotypes in a diploid or polyploid genome.
haplotype
A set of genetic variants inherited together from a single parent.
structural variation
Large genomic differences, commonly defined as affecting 50 bp or more, such as insertions, deletions, duplications, inversions, and translocations.
optical genome mapping
A technique for creating high-resolution genome maps using fluorescently labeled DNA molecules.
Hi-C
A chromosome conformation capture technique used to improve genome scaffolding by linking distant DNA fragments.
HiFi reads
High-accuracy long reads generated by PacBio sequencing.
contiguity
The degree to which an assembly consists of long, uninterrupted sequences rather than many short fragments, commonly summarized by metrics such as N50.
heterozygosity
The presence of different alleles at a genetic locus within a genome.
assembly graph
A graph-based representation of sequencing reads (or k-mers) and their overlaps, in which paths correspond to candidate assembled sequences.