Scaffolding using Optical Genome Mapping
Last updated on 2025-02-19
Overview
Questions
- What is Bionano optical genome mapping (OGM) and how does it improve genome assembly?
- How does Bionano Solve hybrid scaffolding integrate optical maps with sequence assemblies?
- What are the key steps involved in running the Bionano Solve pipeline for hybrid scaffolding?
- How can you assess the quality of hybrid scaffolds generated by Bionano Solve?
Objectives
- Understand the principles of Bionano optical genome mapping (OGM) and its role in genome assembly.
- Learn how to run the Bionano Solve hybrid scaffolding pipeline to improve genome assemblies.
- Explore the key steps involved in scaffolding HiFiasm and Flye assemblies using Bionano Solve.
- Evaluate the quality of hybrid scaffolds generated by Bionano Solve and interpret the results.
Introduction to Bionano optical genome mapping (OGM)
Bionano optical mapping is a high-resolution genome analysis technique that generates long-range structural information by labeling and imaging ultra-long DNA molecules. It provides genome-wide maps that can be used to scaffold contigs from sequencing-based assemblies, significantly improving contiguity and structural accuracy. By integrating Bionano maps with assemblies from PacBio HiFi and Oxford Nanopore Technologies (ONT), misassemblies can be corrected, chimeric contigs resolved, and scaffold N50s increased by orders of magnitude. This approach is particularly valuable for complex genomes, where repetitive sequences and structural variations pose challenges for traditional sequencing methods. Bionano hybrid scaffolding has become a standard for enhancing genome assemblies, enabling researchers to achieve high-quality, chromosome-level assemblies efficiently.
Bionano Solve Hybrid Scaffolding
Bionano Solve improves genome assembly by integrating optical genome mapping data with sequence assemblies, generating ultra-long hybrid scaffolds that enhance contiguity and accuracy. The pipeline identifies and resolves assembly conflicts, orders and orients sequence contigs, and estimates gap sizes between adjacent sequences.
The scaffolding workflow involves:
- In Silico Map Generation – Converting sequence assembly into a map format for alignment.
- Conflict Resolution – Aligning in silico maps to Bionano genome maps and identifying misassemblies.
- Hybrid Scaffolding – Merging high-confidence sequence and optical maps into a refined scaffold.
- Final Alignment – Mapping sequence contigs back to the hybrid scaffold for consistency validation.
- Output Generation – Producing final AGP and FASTA files with corrected genome structures.
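The in silico map generation step can be illustrated with a small sketch. It is not the implementation used by Bionano Solve, only an assumption-laden illustration of the idea: scan each sequence for the enzyme recognition motif (CTTAAG, the value passed with -u in the commands in this lesson) and report the label positions. The function name insilico_labels is hypothetical. Because CTTAAG is its own reverse complement, scanning the forward strand is enough. The repeated substring search is slow on chromosome-sized sequences, so this is intended for small test files.

```bash
# Hypothetical helper, for illustration only (not part of Bionano Solve).
# Usage: insilico_labels <fasta> <motif>
# Prints "<sequence name><TAB><1-based motif start>" for every motif hit.
insilico_labels() {
  awk -v motif="$2" '
    # Report every occurrence of motif in seq, allowing overlaps.
    function scan(name, seq,    off, p) {
      off = 0
      while ((p = index(substr(seq, off + 1), motif)) > 0) {
        print name "\t" (off + p)
        off += p
      }
    }
    # New FASTA record: flush the previous one, keep the first word of the header.
    /^>/ { if (name != "") scan(name, seq); name = substr($1, 2); seq = ""; next }
    # Sequence line: append, upper-casing soft-masked bases.
    { seq = seq toupper($0) }
    END { if (name != "") scan(name, seq) }
  ' "$1"
}
```

For example, insilico_labels genome.fasta CTTAAG lists candidate label sites per contig; contigs with very few sites carry little information for alignment to the optical maps.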
What is the source of this data?
- The optical genome mapping data was obtained from the project PRJEB50694.
- Data corresponds to Arabidopsis thaliana ecotype Col-0, generated using Bionano optical genome mapping technology.
- The dataset is publicly available on the European Nucleotide Archive (ENA) under the same project accession.
- The .cmap file contains high-resolution optical maps used for scaffolding and structural validation of genome assemblies.
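Once a .cmap file is on disk, a quick sanity check is to count the maps and sum their lengths. The sketch below assumes the usual CMAP layout: tab-delimited text, header lines starting with #, and the first two columns holding the map ID and the map length in bp. If your file differs, adjust the column numbers. The function name cmap_summary is hypothetical.

```bash
# Hypothetical helper: summarize a CMAP file.
# Assumes column 1 = map ID and column 2 = map length (bp); "#" lines are headers.
# Usage: cmap_summary <file.cmap>
cmap_summary() {
  awk -F'\t' '
    /^#/ || NF < 2 { next }                                  # skip header and blank lines
    !($1 in len) { len[$1] = $2; n++; total += $2 }          # count each map once
    END { printf "maps=%d total_mbp=%.3f\n", n, total / 1e6 }
  ' "$1"
}
```

The map count and total length printed here should agree with the "Original BioNano Genome Map" rows of the hybrid scaffold report.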
To download:
Installation and Setup
Bionano Solve is available on Bionano.com and can be installed on Linux-based systems. The software requires a valid license and access to Bionano data files for processing.
A custom container with only the hybrid scaffolding tools can be used to run the Bionano Solve pipeline. On Negishi, add it to your PATH with the export line that begins each command block below.
Running Bionano Solve
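To scaffold a genome using Bionano Solve, you need two input files: the Bionano optical map in CMAP format (-b) and the sequence assembly in FASTA format (-n). The general form of the command is: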
BASH
export PATH=$PATH:/apps/biocontainers/exported-wrappers/bionano/3.8.0
run_hybridscaffold.sh \
-c /opt/Solve3.7_10192021_74_1/HybridScaffold/1.0/hybridScaffold_DLE1_config.xml \
-b input.cmap \
-n genome.fasta \
-u CTTAAG \
-z results_output.zip \
-w log.txt \
-B 2 \
-N 2 \
-g \
-f \
-r /opt/Solve3.7_10192021_74_1/RefAligner/1.0/sse/RefAligner \
-p /opt/Solve3.7_10192021_74_1/Pipeline/1.0 \
-o output_dir
Options used
Option | Argument | Description |
---|---|---|
-c | /opt/Solve3.7_10192021_74_1/HybridScaffold/1.0/hybridScaffold_DLE1_config.xml | Specifies the hybrid scaffolding configuration file required for the pipeline. |
-b | input.cmap | Input Bionano CMAP file, which contains the optical genome map data. |
-n | genome.fasta | Input genome sequence in FASTA format from NGS assembly. |
-u | CTTAAG | Specifies the sequence of the enzyme recognition site, overriding the one in the config XML file. |
-z | results_output.zip | Generates a ZIP archive containing essential output files. |
-w | log.txt | Defines the name of the status text file needed for IrysView. |
-B | 2 | Conflict filter level for Bionano genome maps: 2 means cut the map at conflict points (required if not using -M). |
-N | 2 | Conflict filter level for sequencing contigs: 2 means cut the contig at conflict points (same levels as -B). |
-g | (No argument) | Enables trimming of overlapping NGS sequences during AGP and FASTA export. |
-f | (No argument) | Forces output generation and overwrites any existing files in the output directory. |
-r | /opt/Solve3.7_10192021_74_1/RefAligner/1.0/sse/RefAligner | Specifies the path to the RefAligner program, which is required for scaffolding. |
-p | /opt/Solve3.7_10192021_74_1/Pipeline/1.0 | Specifies the directory for the de novo assembly pipeline (optional, required for -x). |
-o | output_dir | Defines the output folder where scaffolded results will be stored. |
Scaffolding HiFiasm assembly with Bionano Solve
For the HiFiasm assembly, provide the HiFiasm primary contig FASTA file as the input sequence (-n). The command structure remains the same; only the input files and the output names change.
BASH
export PATH=$PATH:/apps/biocontainers/exported-wrappers/bionano/3.8.0
run_hybridscaffold.sh \
-c /opt/Solve3.7_10192021_74_1/HybridScaffold/1.0/hybridScaffold_DLE1_config.xml \
-b Evry.OpticalMap.Col-0.cmap \
-n hifiasm_60x/athaliana_hifi.asm.bp.p_ctg.fasta \
-u CTTAAG \
-z results_bionano_hifiasm_scaffolding.zip \
-w log.txt \
-B 2 \
-N 2 \
-g \
-f \
-r /opt/Solve3.7_10192021_74_1/RefAligner/1.0/sse/RefAligner \
-p /opt/Solve3.7_10192021_74_1/Pipeline/1.0 \
-o bionano_hifiasm_scaffolding
Scaffolding Flye assembly with Bionano Solve
For the Flye assembly, the process is the same as for HiFiasm, but you provide the Flye assembly FASTA file as the input sequence (-n). Again, only the input files and the output names change.
BASH
export PATH=$PATH:/apps/biocontainers/exported-wrappers/bionano/3.8.0
run_hybridscaffold.sh \
-c /opt/Solve3.7_10192021_74_1/HybridScaffold/1.0/hybridScaffold_DLE1_config.xml \
-b workshop_assembly/col-0_bionano/Evry.OpticalMap.Col-0.cmap \
-n flye_ont_60x/assembly.fasta \
-u CTTAAG \
-z results_bionano_flye_scaffolding.zip \
-w log.txt \
-B 2 \
-N 2 \
-g \
-f \
-r /opt/Solve3.7_10192021_74_1/RefAligner/1.0/sse/RefAligner \
-p /opt/Solve3.7_10192021_74_1/Pipeline/1.0 \
-o bionano_flye_scaffolding
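Since the two runs differ only in their inputs and output names, the shared options can be kept in one place. The sketch below is one possible way to do that, not part of Bionano Solve: the function name scaffold_cmd, the SOLVE and DRY_RUN variables, and the per-run log name are all assumptions made for illustration. By default it only prints the command so it can be inspected before anything is run.

```bash
# Hypothetical wrapper around the commands shown above (not part of Bionano Solve).
SOLVE=/opt/Solve3.7_10192021_74_1   # install prefix used inside the container

# Usage: scaffold_cmd <maps.cmap> <assembly.fasta> <label>
# Prints the command when DRY_RUN is 1 (the default); runs it when DRY_RUN=0.
scaffold_cmd() (
  cmap=$1 fasta=$2 label=$3
  set -- run_hybridscaffold.sh \
    -c "$SOLVE/HybridScaffold/1.0/hybridScaffold_DLE1_config.xml" \
    -b "$cmap" \
    -n "$fasta" \
    -u CTTAAG \
    -z "results_bionano_${label}_scaffolding.zip" \
    -w "log_${label}.txt" \
    -B 2 -N 2 -g -f \
    -r "$SOLVE/RefAligner/1.0/sse/RefAligner" \
    -p "$SOLVE/Pipeline/1.0" \
    -o "bionano_${label}_scaffolding"
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$*"     # show the assembled command without executing it
  else
    "$@"          # execute run_hybridscaffold.sh with the assembled options
  fi
)
```

For example, scaffold_cmd Evry.OpticalMap.Col-0.cmap flye_ont_60x/assembly.fasta flye prints the Flye command, and prefixing the call with DRY_RUN=0 executes it. The separate log name per label is a choice made here so that one run does not overwrite the other's status file.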
Understanding Hybrid Scaffolding Output
The output of the Bionano Solve pipeline includes scaffolded genome assemblies in AGP and FASTA formats, along with alignment and conflict resolution information. The hybrid scaffolds provide a more accurate representation of the genome structure, with improved contiguity and reduced misassemblies. The output files can be visualized using genome browsers or alignment viewers to assess the quality and completeness of the assembly.
Folder | Contents |
---|---|
agp_fasta/ | Final scaffolded genome assembly in FASTA, AGP format, alignment results, gap information, and logs. |
align0/ | Initial alignment of optical maps to the sequence assembly, including XMAP, CMAP, and error logs. |
align1/ | Secondary alignment refinement, similar to align0 but after resolving initial inconsistencies. |
align_final/ | Final alignment results of hybrid scaffolds to optical maps, including mapping rates and statistics. |
assignAlignType/ | Tracks conflicts between NGS contigs and optical maps, includes exclusion and trimming decisions. |
cut_conflicts/ | Stores files related to contig trimming and conflict resolution between NGS and optical maps. |
fa2cmap/ | Converts the sequence assembly into Bionano’s CMAP format before integration with optical maps. |
hybrid_scaffolds/ | Contains final scaffolded genome with CMAP, XMAP, AGP files, and a scaffolding report. |
mergeNGS_BN/ | Stores intermediate files merging NGS contigs with Bionano maps, including hybrid scaffold progress. |
results_output.zip | Compressed archive containing essential scaffolding results for easy sharing. |
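The AGP file records how contigs were ordered and where gaps were placed, so it can be summarized directly. The sketch below assumes the file follows the standard AGP layout (nine tab-separated columns, with column 5 set to N or U for gap lines and column 6 holding the gap length on those lines); the function name agp_gap_summary is hypothetical.

```bash
# Hypothetical helper: count placed components and gaps in an AGP file.
# Assumes standard AGP columns: 5 = component type (N/U = gap), 6 = gap length on gap lines.
# Usage: agp_gap_summary <file.agp>
agp_gap_summary() {
  awk -F'\t' '
    /^#/ || NF < 9 { next }                                   # skip comments and malformed lines
    $5 == "N" || $5 == "U" { gaps++; gap_bp += $6; next }     # gap line
    { comps++ }                                               # sequence component line
    END { printf "components=%d gaps=%d gap_bp=%d\n", comps, gaps, gap_bp }
  ' "$1"
}
```

A large gap_bp relative to the total scaffold length means much of the scaffold span is estimated from the optical map rather than covered by sequence.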
Within hybrid_scaffolds, the files ending with HYBRID_SCAFFOLD.fasta and HYBRID_SCAFFOLD_NOT_SCAFFOLDED.fasta represent the final scaffolded genome and unplaced contigs, respectively. You will need to merge these files to obtain your final scaffolded genome assembly.
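Merging the two FASTA files is a plain concatenation. A minimal sketch, assuming each hybrid_scaffolds directory contains exactly one file with each of the two suffixes; the function name merge_hybrid_fasta and the output file name in the usage example are hypothetical.

```bash
# Hypothetical helper: concatenate scaffolds and unplaced contigs into one FASTA.
# Usage: merge_hybrid_fasta <hybrid_scaffolds directory> <output.fasta>
# Write the output outside the hybrid_scaffolds directory, or under a name that
# does not end in HYBRID_SCAFFOLD.fasta, so it is never picked up as an input.
merge_hybrid_fasta() {
  cat "$1"/*HYBRID_SCAFFOLD.fasta \
      "$1"/*HYBRID_SCAFFOLD_NOT_SCAFFOLDED.fasta > "$2"
}
```

For example, merge_hybrid_fasta bionano_flye_scaffolding/hybrid_scaffolds flye_bionano_final.fasta followed by grep -c '^>' flye_bionano_final.fasta should report the sequence count listed under "Hybrid Scaffold FASTA + Not Scaffolded NGS" in the report.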
Quality Assessment of Hybrid Scaffolds
The hybrid scaffold report file is written to the hybrid_scaffolds directory and summarizes the scaffolding process, including alignment statistics, conflict resolution, and scaffold N50 values. This report is essential for evaluating the quality and completeness of the hybrid scaffolds and identifying any potential issues that need further investigation. The table below compares the reports for the two assemblies.
Category | PacBio HiFi (hifiasm) | ONT (Flye) |
---|---|---|
Original BioNano Genome Map | ||
Count | 18 | 18 |
Min length (Mbp) | 0.342 | 0.342 |
Median length (Mbp) | 3.956 | 3.956 |
Mean length (Mbp) | 7.396 | 7.396 |
N50 length (Mbp) | 15.529 | 15.529 |
Max length (Mbp) | 17.518 | 17.518 |
Total length (Mbp) | 133.124 | 133.124 |
Original NGS Sequences | ||
Count | 152 | 43 |
Min length (Mbp) | 0.027 | 0.008 |
Median length (Mbp) | 0.050 | 0.276 |
Mean length (Mbp) | 0.896 | 2.797 |
N50 length (Mbp) | 7.981 | 9.261 |
Max length (Mbp) | 13.758 | 14.609 |
Total length (Mbp) | 136.156 | 120.259 |
Conflict Resolution (BNG-NGS Alignment) | ||
Conflict cuts made to Bionano maps | 2 | 0 |
Conflict cuts made to NGS sequences | 30 | 0 |
Bionano maps to be cut | 2 | 0 |
NGS sequences to be cut | 18 | 0 |
NGS FASTA Sequence in Hybrid Scaffold | ||
Count | 40 | 26 |
Min length (Mbp) | 0.033 | 0.065 |
Median length (Mbp) | 0.945 | 1.681 |
Mean length (Mbp) | 2.689 | 4.558 |
N50 length (Mbp) | 8.437 | 9.261 |
Max length (Mbp) | 13.484 | 14.609 |
Total length (Mbp) | 107.558 | 118.508 |
Hybrid Scaffold FASTA | ||
Count | 11 | 12 |
Min length (Mbp) | 0.104 | 0.524 |
Median length (Mbp) | 11.824 | 12.426 |
Mean length (Mbp) | 10.518 | 9.891 |
N50 length (Mbp) | 14.479 | 14.886 |
Max length (Mbp) | 15.227 | 16.188 |
Total length (Mbp) | 115.698 | 118.689 |
Hybrid Scaffold FASTA + Not Scaffolded NGS | ||
Count | 161 | 33 |
Min length (Mbp) | 0.024 | 0.006 |
Median length (Mbp) | 0.051 | 0.159 |
Mean length (Mbp) | 0.896 | 3.650 |
N50 length (Mbp) | 14.083 | 14.886 |
Max length (Mbp) | 15.227 | 16.188 |
Total length (Mbp) | 144.295 | 120.440 |
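The N50 values in the table can be cross-checked independently on any of the FASTA files. The sketch below uses the common definition of N50: sort sequence lengths from longest to shortest and report the length at which the running sum first reaches half of the total. The function name fasta_n50 is hypothetical, and the result is printed in bp rather than Mbp.

```bash
# Hypothetical helper: compute N50 (in bp) of a FASTA file.
# Usage: fasta_n50 <file.fasta>
fasta_n50() {
  # Stage 1: one length per record. Stage 2: longest first. Stage 3: walk to half the total.
  awk '/^>/ { if (len) print len; len = 0; next }
       { len += length($0) }
       END { if (len) print len }' "$1" |
    sort -rn |
    awk '{ lens[NR] = $1; total += $1 }
         END {
           for (i = 1; i <= NR; i++) {
             run += lens[i]
             if (run * 2 >= total) { print lens[i]; exit }
           }
         }'
}
```

Dividing the printed value by 1,000,000 gives Mbp for comparison with the report.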
Which assembler and data performed better?
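- The Flye assembly with ONT data produced a slightly higher hybrid scaffold N50 (14.886 vs 14.479 Mbp) and total scaffolded length (118.689 vs 115.698 Mbp) than the HiFiasm assembly with PacBio HiFi data, with a similar scaffold count (12 vs 11).
- Conflict resolution made 30 cuts in 18 NGS sequences and 2 cuts in 2 Bionano maps for the HiFiasm assembly but none for the Flye assembly, indicating more alignment discrepancies between the HiFiasm contigs and the optical maps.
- Once unscaffolded contigs are included, the HiFiasm-based assembly has the larger total length (144.295 vs 120.440 Mbp), but it is spread over 161 sequences compared with 33 for Flye. The extra length sits in many short unplaced contigs (median 0.051 Mbp), so it does not indicate better contiguity.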
Key Points
- Bionano optical genome mapping (OGM) provides long-range structural information for scaffolding genome assemblies.
- Bionano Solve hybrid scaffolding integrates optical maps with sequence assemblies to improve contiguity and accuracy.
- The Bionano Solve pipeline involves in silico map generation, conflict resolution, hybrid scaffolding, and final alignment.
- The output of Bionano Solve includes scaffolded genome assemblies in AGP and FASTA formats, alignment results, and conflict resolution information.
- Quality assessment of hybrid scaffolds involves evaluating alignment statistics, conflict resolution, scaffold N50 values, and completeness of the assembly.