Hybrid Long Read Assembly (optional)
Last updated on 2025-02-19 | Edit this page
Estimated time: 12 minutes
Overview
Questions
- What is hybrid assembly, and how does it combine different sequencing technologies?
- How can you perform hybrid assembly using both types of long-read data?
- What are the key steps in hybrid assembly, including polishing and scaffolding?
- How do you evaluate the quality of a hybrid assembly using bioinformatics tools?
Objectives
- Understand the concept of hybrid assembly and its advantages in genome sequencing.
- Learn how to perform hybrid assembly using long-read sequencing data from PacBio and ONT platforms.
- Explore the key steps involved in hybrid assembly, including polishing and scaffolding.
- Evaluate the quality of a hybrid assembly using bioinformatics tools such as QUAST and Compleasm.
Flye for Hybrid Assembly
For hybrid assembly using flye
, first, run the pipeline
with all your reads in the –pacbio-raw mode (you can specify multiple
files, no need to merge all you reads into one). Also add –iterations 0
to stop the pipeline before polishing. Once the assembly finishes, run
polishing using either PacBio or ONT reads only. Use the same assembly
options, but add –resume-from polishing. Here is an example of a script
that should do the job:
BASH
ml --force purge
ml biocontainers
ml flye
# reads
PBREADS="At_pacbio-hifi-filtered.fastq"
ONTREADS="At_ont-reads-filtered.fastq"
# round 1
flye \
--pacbio-raw $PBREADS $ONTREADS \
--iterations 0 \
--out-dir hybrid_flye_out \
--genome-size 135m \
--threads ${SLURM_CPUS_ON_NODE}
# round 2
flye \
--pacbio-raw $PBREADS \
--resume-from polishing \
--out-dir hybrid_flye_out \
--genome-size 135m \
--threads ${SLURM_CPUS_ON_NODE}
Evaluating Assembly Quality
For quick evaluation, we will run quast
and
compleasm
on the hybrid assembly output.
Scaffolding with Bionano
To scaffold the assembly using Bionano data, we will use the
bionano solve
. We can run the following script to scaffold
the assembly:
BASH
ml --force purge
export PATH=$PATH:/apps/biocontainers/exported-wrappers/bionano/3.8.0
fasta="hybrid_flye_out/assembly.fasta"
run_hybridscaffold.sh \
-c /opt/Solve3.7_10192021_74_1/HybridScaffold/1.0/hybridScaffold_DLE1_config.xml\
-b workshop_assembly/col-0_bionano/Evry.OpticalMap.Col-0.cmap \
-n ${fasta} \
-u CTTAAG \
-z results_bionano_hybrid_scaffolding.zip \
-w log.txt \
-B 2 \
-N 2 \
-g \
-f \
-r /opt/Solve3.7_10192021_74_1/RefAligner/1.0/sse/RefAligner \
-p /opt/Solve3.7_10192021_74_1/Pipeline/1.0 \
-o bionano_hybrid_scaffolding
Once this completes, you can generate the final scaffold-level
assembly by merging placed and unplaced contigs in the
bionano_hybrid_scaffolding/hybrid_scaffolds
directory.
BASH
cd bionano_hybrid_scaffolding/hybrid_scaffolds
cat *HYBRID_SCAFFOLD.fasta *_HYBRID_SCAFFOLD_NOT_SCAFFOLDED.fasta \
> ../../assembly_scaffolds.fasta
cd ../..
You can evaluate the final assembly using quast
and
compleasm
as before.
Hybrid Assembly Summary
In this section you have learned how to perform a hybrid assembly using Flye, polish the assembly, scaffold it using Bionano, and evaluate the final assembly quality. This workflow combines the advantages of ONT and PacBio sequencing, improves structural accuracy with Bionano scaffolding, and ensures a high-quality genome assembly. The steps involved are:
-
Run Hybrid Assembly with Flye
- Use Flye in
--pacbio-raw
mode to assemble both PacBio and ONT reads. - Set
--iterations 0
to stop before polishing.
- Use Flye in
-
Polishing the Assembly
- Polish the assembly using either PacBio or ONT
reads by resuming Flye with
--resume-from polishing
.
- Polish the assembly using either PacBio or ONT
reads by resuming Flye with
-
Evaluate Assembly Quality
- Run QUAST for basic assembly metrics (contig count,
N50, genome size, misassemblies).
- Use Compleasm to assess genome completeness based on conserved single-copy genes.
- Run QUAST for basic assembly metrics (contig count,
N50, genome size, misassemblies).
-
Scaffold Assembly with Bionano Optical Mapping
- Use Bionano Solve to integrate optical maps and
scaffold the assembly.
- Run
run_hybridscaffold.sh
with the reference.cmap
optical map file.
- Use Bionano Solve to integrate optical maps and
scaffold the assembly.
-
Generate Final Scaffold-Level Assembly
- Merge placed and unplaced contigs from the Bionano scaffolding output to create the final genome assembly.
-
Final Evaluation of Scaffolds
- Re-run QUAST and Compleasm to validate improvements and ensure genome completeness after scaffolding.
Key Points
- Hybrid assembly with Flye combines ONT and PacBio reads to leverage
long-read continuity and high-accuracy sequencing, with separate
polishing steps to refine base-level errors.
- Assembly quality assessment using QUAST and Compleasm provides
critical insights into contiguity, completeness, and potential
misassemblies, ensuring reliability before scaffolding.
- Bionano Optical Genome Mapping (OGM) improves hybrid assemblies by scaffolding contigs, resolving misassemblies, and enhancing genome continuity, leading to chromosome-scale assemblies.
- Final scaffolding validation and quality assessment ensure the integrity of the genome assembly, with QUAST and Compleasm used to confirm improvements after Bionano integration.