Scaffolding using Optical Genome Mapping

Last updated on 2025-02-19

Estimated time: 12 minutes

Overview

Questions

  • What is Bionano optical genome mapping (OGM) and how does it improve genome assembly?
  • How does Bionano Solve hybrid scaffolding integrate optical maps with sequence assemblies?
  • What are the key steps involved in running the Bionano Solve pipeline for hybrid scaffolding?
  • How can you assess the quality of hybrid scaffolds generated by Bionano Solve?

Objectives

  • Understand the principles of Bionano optical genome mapping (OGM) and its role in genome assembly.
  • Learn how to run the Bionano Solve hybrid scaffolding pipeline to improve genome assemblies.
  • Explore the key steps involved in scaffolding HiFiasm and Flye assemblies using Bionano Solve.
  • Evaluate the quality of hybrid scaffolds generated by Bionano Solve and interpret the results.

Introduction to Bionano optical genome mapping (OGM)


Bionano optical mapping is a high-resolution genome analysis technique that generates long-range structural information by labeling and imaging ultra-long DNA molecules. It provides genome-wide maps that can be used to scaffold contigs from sequencing-based assemblies, significantly improving contiguity and structural accuracy. By integrating Bionano maps with assemblies from PacBio HiFi and Oxford Nanopore Technologies (ONT), misassemblies can be corrected, chimeric contigs resolved, and scaffold N50s increased by orders of magnitude. This approach is particularly valuable for complex genomes, where repetitive sequences and structural variations pose challenges for traditional sequencing methods. Bionano hybrid scaffolding has become a standard for enhancing genome assemblies, enabling researchers to achieve high-quality, chromosome-level assemblies efficiently.

Overview of Bionano OGM

Bionano Solve Hybrid Scaffolding


Bionano Solve improves genome assembly by integrating optical genome mapping data with sequence assemblies, generating ultra-long hybrid scaffolds that enhance contiguity and accuracy. The pipeline identifies and resolves assembly conflicts, orders and orients sequence contigs, and estimates gap sizes between adjacent sequences.

The scaffolding workflow involves:

  1. In Silico Map Generation – Converting sequence assembly into a map format for alignment.
  2. Conflict Resolution – Aligning in silico maps to Bionano genome maps and identifying misassemblies.
  3. Hybrid Scaffolding – Merging high-confidence sequence and optical maps into a refined scaffold.
  4. Final Alignment – Mapping sequence contigs back to the hybrid scaffold for consistency validation.
  5. Output Generation – Producing final AGP and FASTA files with corrected genome structures.
Scaffolding using Bionano
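
To make step 1 more concrete: in silico map generation amounts to locating every occurrence of the enzyme recognition motif in each contig. The sketch below is not part of Bionano Solve (the pipeline does this itself and writes the result to fa2cmap/). It is only a rough illustration that counts CTTAAG sites, the motif passed with -u in the commands later in this lesson, for each sequence in a FASTA file. The function name count_label_sites is hypothetical, and the sequence is held in memory, so it is meant for small tests rather than production use.

```bash
# Hypothetical helper: print "name <TAB> length <TAB> motif count" per sequence.
# $1 = FASTA file, $2 = recognition motif (defaults to CTTAAG)
count_label_sites() {
  awk -v motif="${2:-CTTAAG}" '
    function report() {
      if (name != "") {
        # gsub returns the number of (non-overlapping) matches it replaced
        n = gsub(motif, motif, seq)
        printf "%s\t%d\t%d\n", name, length(seq), n
      }
    }
    /^>/ { report(); name = substr($1, 2); seq = ""; next }
    # join wrapped lines so motifs spanning a line break are still found
    { seq = seq toupper($0) }
    END { report() }
  ' "$1"
}

# Example call; genome.fasta is a placeholder file name
if [ -f genome.fasta ]; then
  count_label_sites genome.fasta
fi
```

Dividing the count by the sequence length gives a rough label density, which is the kind of information an in silico map encodes as label positions.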

What is the source of this data?

  • The optical genome mapping data was obtained from the project PRJEB50694.
  • Data corresponds to Arabidopsis thaliana ecotype Col-0, generated using Bionano optical genome mapping technology.
  • The dataset is publicly available on the European Nucleotide Archive (ENA).
  • The .cmap file contains high-resolution optical maps used for scaffolding and structural validation of genome assemblies.

To download:

BASH

wget ftp://ftp.sra.ebi.ac.uk/vol1/analysis/ERZ227/ERZ2272299/Evry.OpticalMap.Col-0.cmap.gz
gunzip Evry.OpticalMap.Col-0.cmap.gz
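
After downloading, a quick sanity check is to count how many genome maps the file contains. This is a minimal sketch that assumes the usual CMAP layout: header lines start with "#" and the first tab-separated column of each data row is the map ID. The function name count_cmap_maps is hypothetical.

```bash
# Hypothetical helper: count distinct map IDs (first column) in a CMAP file.
count_cmap_maps() {
  awk '!/^#/ && NF { ids[$1] = 1 }
       END { n = 0; for (k in ids) n++; print n }' "$1"
}

# Only runs if the file from the download step is present
if [ -f Evry.OpticalMap.Col-0.cmap ]; then
  grep -c '^#' Evry.OpticalMap.Col-0.cmap      # number of header lines
  count_cmap_maps Evry.OpticalMap.Col-0.cmap   # number of genome maps
fi
```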

Installation and Setup


Bionano Solve is available on Bionano.com and can be installed on Linux-based systems. The software requires a valid license and access to Bionano data files for processing.

A custom container with just the hybrid scaffolding tools can be used to run the Bionano Solve pipeline. On Negishi, you can add it to your PATH using the command below:

BASH

export PATH=$PATH:/apps/biocontainers/exported-wrappers/bionano/3.8.0
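
Before launching a long run, it can help to confirm that the wrapper script is actually visible on your PATH. A small sketch, where require_tool is a hypothetical helper name:

```bash
# Hypothetical helper: report whether a command can be found on PATH.
require_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "found: $(command -v "$1")"
  else
    echo "missing: $1" >&2
    return 1
  fi
}

require_tool run_hybridscaffold.sh || echo "Add the wrapper directory to PATH first."
```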

Running Bionano Solve


To scaffold a genome using Bionano Solve, you need two input files: the Bionano optical map (CMAP) and the sequence assembly (FASTA). The general form of the command is:

BASH

export PATH=$PATH:/apps/biocontainers/exported-wrappers/bionano/3.8.0
run_hybridscaffold.sh \
  -c /opt/Solve3.7_10192021_74_1/HybridScaffold/1.0/hybridScaffold_DLE1_config.xml \
  -b input.cmap \
  -n genome.fasta \
  -u CTTAAG \
  -z results_output.zip \
  -w log.txt \
  -B 2 \
  -N 2 \
  -g \
  -f \
  -r /opt/Solve3.7_10192021_74_1/RefAligner/1.0/sse/RefAligner \
  -p /opt/Solve3.7_10192021_74_1/Pipeline/1.0 \
  -o output_dir

Options used

Option | Argument | Description
-c | /opt/Solve3.7_10192021_74_1/HybridScaffold/1.0/hybridScaffold_DLE1_config.xml | Specifies the hybrid scaffolding configuration file required for the pipeline.
-b | input.cmap | Input Bionano CMAP file, which contains the optical genome map data.
-n | genome.fasta | Input genome sequence in FASTA format from the NGS assembly.
-u | CTTAAG | Specifies the sequence of the enzyme recognition site, overriding the one in the config XML file.
-z | results_output.zip | Generates a ZIP archive containing essential output files.
-w | log.txt | Defines the name of the status text file needed for IrysView.
-B | 2 | Conflict filter level for Bionano genome maps: 2 means cut the map at conflict points (required if not using -M).
-N | 2 | Conflict filter level for sequence contigs: 2 means cut the contig at conflict points (required if not using -M).
-g | (No argument) | Enables trimming of overlapping NGS sequences during AGP and FASTA export.
-f | (No argument) | Forces output generation and overwrites any existing files in the output directory.
-r | /opt/Solve3.7_10192021_74_1/RefAligner/1.0/sse/RefAligner | Specifies the path to the RefAligner program, which is required for scaffolding.
-p | /opt/Solve3.7_10192021_74_1/Pipeline/1.0 | Specifies the directory for the de novo assembly pipeline (optional, required for -x).
-o | output_dir | Defines the output folder where scaffolded results will be stored.

Scaffolding HiFiasm assembly with Bionano Solve

To scaffold the HiFiasm assembly, provide the HiFiasm assembly FASTA file as input to the Bionano Solve pipeline. The command structure remains the same; only the input and output paths change.

BASH

export PATH=$PATH:/apps/biocontainers/exported-wrappers/bionano/3.8.0
run_hybridscaffold.sh \
  -c /opt/Solve3.7_10192021_74_1/HybridScaffold/1.0/hybridScaffold_DLE1_config.xml \
  -b Evry.OpticalMap.Col-0.cmap \
  -n hifiasm_60x/athaliana_hifi.asm.bp.p_ctg.fasta \
  -u CTTAAG \
  -z results_bionano_hifiasm_scaffolding.zip \
  -w log.txt \
  -B 2 \
  -N 2 \
  -g \
  -f \
  -r /opt/Solve3.7_10192021_74_1/RefAligner/1.0/sse/RefAligner \
  -p /opt/Solve3.7_10192021_74_1/Pipeline/1.0 \
  -o bionano_hifiasm_scaffolding

Scaffolding Flye assembly with Bionano Solve

The process for the Flye assembly is the same as for HiFiasm, except that you provide the Flye assembly FASTA file as input. Again, only the input and output paths change.

BASH

export PATH=$PATH:/apps/biocontainers/exported-wrappers/bionano/3.8.0
run_hybridscaffold.sh \
  -c /opt/Solve3.7_10192021_74_1/HybridScaffold/1.0/hybridScaffold_DLE1_config.xml \
  -b workshop_assembly/col-0_bionano/Evry.OpticalMap.Col-0.cmap \
  -n flye_ont_60x/assembly.fasta \
  -u CTTAAG \
  -z results_bionano_flye_scaffolding.zip \
  -w log.txt \
  -B 2 \
  -N 2 \
  -g \
  -f \
  -r /opt/Solve3.7_10192021_74_1/RefAligner/1.0/sse/RefAligner \
  -p /opt/Solve3.7_10192021_74_1/Pipeline/1.0 \
  -o bionano_flye_scaffolding
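
Because the two runs differ only in the input FASTA and the output names, the command can also be generated from a small helper. The sketch below only prints each command so it can be reviewed before running. The names scaffold_cmd, SOLVE and CMAP are hypothetical, and the flags and paths are copied from the commands above.

```bash
# Hypothetical variables; adjust to your environment.
SOLVE=/opt/Solve3.7_10192021_74_1     # install prefix used in the commands above
CMAP=Evry.OpticalMap.Col-0.cmap       # path to the optical map

# Print (do not run) the hybrid scaffolding command.
# $1 = assembly FASTA, $2 = label used in output names (e.g. hifiasm, flye)
scaffold_cmd() {
  echo run_hybridscaffold.sh \
    -c "$SOLVE/HybridScaffold/1.0/hybridScaffold_DLE1_config.xml" \
    -b "$CMAP" \
    -n "$1" \
    -u CTTAAG \
    -z "results_bionano_${2}_scaffolding.zip" \
    -w log.txt \
    -B 2 -N 2 -g -f \
    -r "$SOLVE/RefAligner/1.0/sse/RefAligner" \
    -p "$SOLVE/Pipeline/1.0" \
    -o "bionano_${2}_scaffolding"
}

scaffold_cmd hifiasm_60x/athaliana_hifi.asm.bp.p_ctg.fasta hifiasm
scaffold_cmd flye_ont_60x/assembly.fasta flye
```

Printing rather than executing keeps the sketch safe to try anywhere; once the output looks right, the printed line can be pasted into the shell or a job script.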

Understanding Hybrid Scaffolding Output


The output of the Bionano Solve pipeline includes scaffolded genome assemblies in AGP and FASTA formats, along with alignment and conflict resolution information. The hybrid scaffolds provide a more accurate representation of the genome structure, with improved contiguity and reduced misassemblies. The output files can be visualized using genome browsers or alignment viewers to assess the quality and completeness of the assembly.

Folder | Contents
agp_fasta/ | Final scaffolded genome assembly in FASTA, AGP format, alignment results, gap information, and logs.
align0/ | Initial alignment of optical maps to the sequence assembly, including XMAP, CMAP, and error logs.
align1/ | Secondary alignment refinement, similar to align0 but after resolving initial inconsistencies.
align_final/ | Final alignment results of hybrid scaffolds to optical maps, including mapping rates and statistics.
assignAlignType/ | Tracks conflicts between NGS contigs and optical maps, includes exclusion and trimming decisions.
cut_conflicts/ | Stores files related to contig trimming and conflict resolution between NGS and optical maps.
fa2cmap/ | Converts the sequence assembly into Bionano’s CMAP format before integration with optical maps.
hybrid_scaffolds/ | Contains final scaffolded genome with CMAP, XMAP, AGP files, and a scaffolding report.
mergeNGS_BN/ | Stores intermediate files merging NGS contigs with Bionano maps, including hybrid scaffold progress.
results_output.zip | Compressed archive containing essential scaffolding results for easy sharing.

Within hybrid_scaffolds, the files ending with HYBRID_SCAFFOLD.fasta and HYBRID_SCAFFOLD_NOT_SCAFFOLDED.fasta represent the final scaffolded genome and unplaced contigs, respectively. You will need to merge these files to obtain your final scaffolded genome assembly.
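
One simple way to do the merge is to concatenate the two FASTA files. The sketch below assumes exactly the file name endings described above; merge_hybrid_scaffolds and the output name final_assembly.fasta are hypothetical.

```bash
# Hypothetical helper: concatenate scaffolded and unplaced sequences.
# $1 = hybrid_scaffolds directory, $2 = output FASTA
merge_hybrid_scaffolds() {
  cat "$1"/*HYBRID_SCAFFOLD.fasta \
      "$1"/*HYBRID_SCAFFOLD_NOT_SCAFFOLDED.fasta > "$2"
}

# Only runs if the HiFiasm scaffolding output directory exists
if [ -d bionano_hifiasm_scaffolding/hybrid_scaffolds ]; then
  merge_hybrid_scaffolds bionano_hifiasm_scaffolding/hybrid_scaffolds final_assembly.fasta
  grep -c '^>' final_assembly.fasta   # number of sequences in the merged file
fi
```

The first glob does not pick up the NOT_SCAFFOLDED file, because that name ends in _NOT_SCAFFOLDED.fasta rather than HYBRID_SCAFFOLD.fasta.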

Quality Assessment of Hybrid Scaffolds


The hybrid scaffold report file is written to the hybrid_scaffolds directory and summarizes the scaffolding process, including alignment statistics, conflict resolution, and scaffold N50 values. This report is essential for evaluating the quality and completeness of the hybrid scaffolds and for identifying any potential issues that need further investigation.

Category | PacBio HiFi (hifiasm) | ONT (Flye)

Original BioNano Genome Map
Count | 18 | 18
Min length (Mbp) | 0.342 | 0.342
Median length (Mbp) | 3.956 | 3.956
Mean length (Mbp) | 7.396 | 7.396
N50 length (Mbp) | 15.529 | 15.529
Max length (Mbp) | 17.518 | 17.518
Total length (Mbp) | 133.124 | 133.124

Original NGS Sequences
Count | 152 | 43
Min length (Mbp) | 0.027 | 0.008
Median length (Mbp) | 0.050 | 0.276
Mean length (Mbp) | 0.896 | 2.797
N50 length (Mbp) | 7.981 | 9.261
Max length (Mbp) | 13.758 | 14.609
Total length (Mbp) | 136.156 | 120.259

Conflict Resolution (BNG-NGS Alignment)
Conflict cuts made to Bionano maps | 2 | 0
Conflict cuts made to NGS sequences | 30 | 0
Bionano maps to be cut | 2 | 0
NGS sequences to be cut | 18 | 0

NGS FASTA Sequence in Hybrid Scaffold
Count | 40 | 26
Min length (Mbp) | 0.033 | 0.065
Median length (Mbp) | 0.945 | 1.681
Mean length (Mbp) | 2.689 | 4.558
N50 length (Mbp) | 8.437 | 9.261
Max length (Mbp) | 13.484 | 14.609
Total length (Mbp) | 107.558 | 118.508

Hybrid Scaffold FASTA
Count | 11 | 12
Min length (Mbp) | 0.104 | 0.524
Median length (Mbp) | 11.824 | 12.426
Mean length (Mbp) | 10.518 | 9.891
N50 length (Mbp) | 14.479 | 14.886
Max length (Mbp) | 15.227 | 16.188
Total length (Mbp) | 115.698 | 118.689

Hybrid Scaffold FASTA + Not Scaffolded NGS
Count | 161 | 33
Min length (Mbp) | 0.024 | 0.006
Median length (Mbp) | 0.051 | 0.159
Mean length (Mbp) | 0.896 | 3.650
N50 length (Mbp) | 14.083 | 14.886
Max length (Mbp) | 15.227 | 16.188
Total length (Mbp) | 144.295 | 120.440
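
The N50 values in the report can be cross-checked directly from a FASTA file. This is a minimal sketch, assuming standard FASTA input; fasta_n50 is a hypothetical helper and it reports base pairs rather than Mbp.

```bash
# Hypothetical helper: print the N50 (in bp) of the sequences in a FASTA file.
fasta_n50() {
  # 1) emit one length per sequence, 2) sort longest first,
  # 3) walk down the list until half of the total length is covered
  awk '/^>/ { if (len) print len; len = 0; next }
       { len += length($0) }
       END { if (len) print len }' "$1" |
  sort -rn |
  awk '{ lens[NR] = $1; total += $1 }
       END {
         for (i = 1; i <= NR; i++) {
           cum += lens[i]
           if (cum >= total / 2) { print lens[i]; exit }
         }
       }'
}

# Only runs on files that exist; the directory name comes from the -o option above
for f in bionano_hifiasm_scaffolding/hybrid_scaffolds/*HYBRID_SCAFFOLD.fasta; do
  if [ -f "$f" ]; then
    echo "$f N50 (bp): $(fasta_n50 "$f")"
  fi
done
```

Small differences from the report are possible, for example in how gaps are counted, so treat this as a consistency check rather than a replacement for the report.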

Which assembler and data performed better?

  • The Flye assembly with ONT data produced hybrid scaffolds with a slightly higher N50 (14.886 vs. 14.479 Mbp) and total length (118.689 vs. 115.698 Mbp) than the HiFiasm assembly with PacBio HiFi data.
  • Conflict resolution made 30 cuts in 18 NGS sequences (plus 2 cuts in Bionano maps) for the HiFiasm assembly and none for the Flye assembly, indicating more disagreement between the HiFiasm contigs and the optical maps.
  • Once unscaffolded sequences are included, the HiFiasm-based assembly has the larger total length (144.295 vs. 120.440 Mbp) but far more sequences (161 vs. 33) and a lower N50 (14.083 vs. 14.886 Mbp), so on contiguity the Flye-based scaffolds come out slightly ahead in this run.

Key Points

  • Bionano optical genome mapping (OGM) provides long-range structural information for scaffolding genome assemblies.
  • Bionano Solve hybrid scaffolding integrates optical maps with sequence assemblies to improve contiguity and accuracy.
  • The Bionano Solve pipeline involves in silico map generation, conflict resolution, hybrid scaffolding, and final alignment.
  • The output of Bionano Solve includes scaffolded genome assemblies in AGP and FASTA formats, alignment results, and conflict resolution information.
  • Quality assessment of hybrid scaffolds involves evaluating alignment statistics, conflict resolution, scaffold N50 values, and completeness of the assembly.