Reference

Last updated on 2025-02-16

Additional Reading


For those interested in further exploring genome assembly and bioinformatics, here are some recommended resources:

  1. Bioinformatics and Functional Genomics - Jonathan Pevsner
    A comprehensive textbook covering sequence alignment, genome assembly, and functional genomics. Suitable for both beginners and advanced learners.

  2. Ten Steps to Get Started in Genome Assembly and Annotation - Victoria Dominguez Del Angel et al.
    A practical guide that provides a step-by-step approach to genome assembly and annotation, discussing essential concepts, methodologies, and tools.

  3. A Comprehensive Review of Scaffolding Methods in Genome Assembly - Junwei Luo et al.
    This review article explores various scaffolding techniques used in genome assembly, comparing their effectiveness and discussing associated challenges.

Glossary


assembly

The process of reconstructing a genome sequence from sequencing reads.

contig

A contiguous sequence of DNA assembled from overlapping sequencing reads.

scaffold

A larger sequence built by ordering and orienting contigs, with gaps of estimated size between them, using linking information such as paired-end reads, long reads, or Hi-C data.

long-read sequencing

A sequencing method that generates long reads, typically thousands to tens of thousands of bases or more, useful for resolving repeats and other complex genomic regions.

short-read sequencing

A sequencing method that generates short reads, typically 50 to 300 bases, commonly used for high-accuracy, high-throughput sequencing.

PacBio

A long-read sequencing technology developed by Pacific Biosciences, known for high-accuracy HiFi reads.

Oxford Nanopore

A sequencing technology that uses nanopores to read DNA bases in real time, providing long reads with flexible throughput.

Illumina

A short-read sequencing technology known for high accuracy and low cost, widely used in genomics.

hybrid assembly

A genome assembly approach that combines long and short reads to improve contiguity and accuracy.

N50

A metric used to assess assembly contiguity: the contig length such that contigs of at least that length together contain 50% of the total assembly length.
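
As an illustration, N50 can be computed by sorting contig lengths from longest to shortest and accumulating them until half of the total assembled length is reached. The sketch below is a minimal example; the function name `n50` and the contig lengths are made up for illustration and do not come from any particular tool.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (in kb) for a toy assembly of 290 kb.
contig_lengths = [80, 70, 50, 40, 30, 20]
print(n50(contig_lengths))  # 80 + 70 = 150 kb covers half of 290 kb, so N50 is 70
```

Because N50 is computed from the assembly's own total length, it describes contiguity only; it says nothing about whether the contigs are correct or complete.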

BUSCO

A benchmarking tool that assesses the completeness of genome assemblies based on conserved single-copy orthologs.

error correction

A preprocessing step to correct sequencing errors before assembly.

polishing

A process of refining a draft assembly by aligning reads back to it and correcting the remaining base-level errors, often using high-accuracy reads.

base calling

The computational process of determining DNA bases from raw sequencing signal data.

barcode

A short, unique DNA sequence used to differentiate samples in multiplex sequencing experiments.

de novo assembly

Genome assembly performed without the use of a reference genome.

reference genome

A curated genome sequence used as a standard for alignment and comparison.

consensus sequence

A sequence derived from multiple reads that represents the most likely correct DNA sequence.
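
One simple way to picture a consensus is a per-position majority vote over reads that have already been aligned to the same coordinates. The sketch below assumes equal-length, pre-aligned reads with no gaps; real assemblers and polishers use more elaborate models, and the function name `majority_consensus` is hypothetical.

```python
from collections import Counter


def majority_consensus(aligned_reads):
    """Return the most common base at each column of pre-aligned reads."""
    if not aligned_reads:
        raise ValueError("need at least one read")
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must already be aligned to the same length")
    consensus = []
    # zip(*reads) yields one tuple of bases per alignment column.
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)


# Toy reads with two isolated sequencing errors (made-up data).
reads = ["ACGTAC", "ACGAAC", "ACGTAC", "TCGTAC"]
print(majority_consensus(reads))  # ACGTAC
```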

variant calling

The identification of genetic variants such as SNPs and indels from sequencing data.

k-mer

A subsequence of fixed length k taken from a read or genome; k-mer counts and overlaps are widely used in genome assembly and error correction.
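
To make this concrete, the k-mers of a sequence can be listed by sliding a window of length k along it one base at a time. The following is a minimal sketch using only the Python standard library; the function name `count_kmers` and the example sequence are illustrative assumptions.

```python
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # A sequence of length n contains n - k + 1 overlapping k-mers.
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


counts = count_kmers("ATGATGA", 3)
print(counts["ATG"], counts["TGA"], counts["GAT"])  # 2 2 1
```

K-mers that occur only once across a deeply sequenced dataset often contain sequencing errors, which is one reason k-mer counts are useful for error correction.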

phasing

The process of resolving haplotypes in a diploid or polyploid genome.

haplotype

A set of genetic variants inherited together from a single parent.

structural variation

Large genomic rearrangements, typically 50 bp or longer, such as insertions, deletions, duplications, inversions, and translocations.

optical genome mapping

A technique for creating high-resolution genome maps using fluorescently labeled DNA molecules.

Hi-C

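A chromosome conformation capture technique that records which DNA regions are physically close in the nucleus; the resulting long-range contact information is used to order and orient contigs during scaffolding.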

HiFi reads

High-accuracy long reads generated by PacBio sequencing.

contiguity

A measure of how much of an assembly consists of long, uninterrupted sequences rather than many short fragments, often summarized with N50.

heterozygosity

The presence of different alleles at a genetic locus within a genome.

assembly graph

A graph-based representation of sequencing reads and overlaps used in genome assembly.
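
One common form of assembly graph is the de Bruijn graph, in which each k-mer from the reads becomes an edge from its first k-1 bases to its last k-1 bases. The sketch below builds such a graph as a plain dictionary; it is a toy illustration under that assumption, not the data structure of any specific assembler, and the function name `de_bruijn_graph` is hypothetical.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each (k-1)-mer prefix to the sorted (k-1)-mer suffixes that follow it."""
    if k < 2:
        raise ValueError("k must be at least 2")
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            # Each k-mer contributes one edge: prefix -> suffix.
            graph[kmer[:-1]].add(kmer[1:])
    return {node: sorted(successors) for node, successors in graph.items()}


# Two overlapping toy reads (made-up data).
graph = de_bruijn_graph(["ATGGC", "TGGCA"], 3)
for node, successors in graph.items():
    print(node, "->", successors)
```

In this example the two reads share the k-mers TGG and GGC, so their edges merge into a single path AT, TG, GG, GC, CA, which spells the combined sequence ATGGCA.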