Summary and Schedule
Welcome to the Genome Assembly Workshop
This workshop provides a hands-on introduction to long-read genome assembly using HiFiasm and Flye, optimized for the RCAC cluster. You’ll learn best practices for assembly, polishing, and scaffolding with Bionano optical maps, along with strategies for quality assessment and troubleshooting.
Designed for researchers and bioinformaticians, this workshop will equip you with the skills to build high-quality, reproducible genome assemblies on RCAC HPC resources.
Let’s get started!
| Duration | Episode | Questions |
|---|---|---|
| | Setup Instructions | Download files required for the lesson |
| 00h 00m | 1. Introduction to Genome assembly | What is genome assembly, and why is it important? What sequencing technologies can be used for genome assembly? What are de novo and reference-guided assemblies? What challenges arise when generating high-quality assemblies? What software tools are used for assembling genomes? |
| 00h 30m | 2. Assembly Strategies | What factors influence the choice of genome assembly strategy? How do different assembly methods compare in terms of read length, accuracy, and computational requirements? What are the key steps in evaluating genome assemblies using BUSCO and QUAST? How do Bionano OGM and Hi-C sequencing improve genome continuity and organization? |
| 00h 50m | 3. Data Quality Control | What is data quality checking and filtering? Why is it necessary to assess the quality of raw sequencing data? What are the key steps in filtering long-read sequencing data? How can visualization tools like NanoPlot help in quality assessment? |
| 01h 50m | 4. PacBio HiFi Assembly using HiFiasm | What is HiFiasm, and how does it improve genome assembly using PacBio HiFi reads? What are the key steps in running HiFiasm for haplotype-resolved assembly? How does HiFiasm handle haplotype resolution and purging of duplications? What are the benefits of using HiFiasm for assembling complex and heterozygous genomes? |
| 02h 50m | 5. Oxford Nanopore Assembly using Flye | What are the key features of ONT reads? Why is Flye good for assembling ONT reads? What are the main steps in the Flye assembly workflow? How can you evaluate the quality of a Flye assembly? |
| 03h 50m | 6. Hybrid Long Read Assembly (optional) | What is hybrid assembly, and how does it combine different sequencing technologies? How can you perform hybrid assembly using both types of long-read data? What are the key steps in hybrid assembly, including polishing and scaffolding? How do you evaluate the quality of a hybrid assembly using bioinformatics tools? |
| 04h 35m | 7. Scaffolding using Optical Genome Mapping | What is Bionano optical genome mapping (OGM) and how does it improve genome assembly? How does Bionano Solve hybrid scaffolding integrate optical maps with sequence assemblies? What are the key steps involved in running the Bionano Solve pipeline for hybrid scaffolding? How can you assess the quality of hybrid scaffolds generated by Bionano Solve? |
| 05h 15m | 8. Assembly Assessment | Why is evaluating genome assembly quality important? What tools can be used to assess assembly completeness, accuracy, and structural integrity? How do you interpret key metrics from assembly evaluation tools? What are the main steps in evaluating a genome assembly using bioinformatics tools? |
| 06h 05m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Instructors
- Arun Seetharam, Ph.D.: Arun is a lead bioinformatics scientist at Purdue University’s Rosen Center for Advanced Computing. With extensive expertise in comparative genomics, genome assembly, annotation, single-cell genomics, NGS data analysis, metagenomics, proteomics, and metabolomics, Arun supports a diverse range of bioinformatics projects across various organisms, including human model systems.
Schedule
| Time | Session |
|---|---|
| 9:00 AM | Introduction to Genome Assembly – Sequencing technologies, assembly concepts, and workshop overview |
| 9:30 AM | Assembly Strategies – Comparing approaches, evaluation metrics, and resource planning |
| 9:50 AM | Data Quality Control – NanoPlot, Filtlong, KMC, and GenomeScope2 for read QC |
| 10:30 AM | Morning Break |
| 10:45 AM | PacBio HiFi Assembly – HiFiasm assembly, purge levels, GFA conversion, and Flye for HiFi |
| 12:00 PM | Lunch Break |
| 1:00 PM | Oxford Nanopore Assembly – Flye assembler, Medaka polishing, and HiFiasm for ONT |
| 1:45 PM | Hybrid Assembly – Combining ONT + HiFi reads with Flye, Bionano scaffolding |
| 2:30 PM | Afternoon Break |
| 2:45 PM | Scaffolding with Optical Genome Mapping – Bionano Solve for HiFiasm and Flye assemblies |
| 3:15 PM | Assembly Evaluation – QUAST, Compleasm, Merqury, Bandage, and comparative analysis |
| 3:50 PM | Wrap-Up & Discussion – Summary, Q&A, and next steps |
| 4:00 PM | Dismissal |
What is not covered
- Short read assembly
- Hi-C scaffolding
- Annotation
- Comparative analyses
Prerequisites
- Basic knowledge of genomics
- Basic knowledge of the command-line interface
- Basic knowledge of bioinformatics tools
Data Sets
To copy the data to your scratch space:
The worked-out results folder is also available on the training cluster at
/depot/workshop/data/genome-assembly/genome-assembly-data
Only use this if you are unable to finish the exercises in the workshop.
Software Setup
SSH key setup for different systems is detailed in the expandable sections below.