Summary and Schedule
Welcome to the Genome Assembly Workshop
This workshop provides a hands-on introduction to long-read genome assembly using HiFiasm and Flye, optimized for the RCAC cluster. You’ll learn best practices for assembly, polishing, and scaffolding with Bionano optical maps, along with strategies for quality assessment and troubleshooting.
Designed for researchers and bioinformaticians, this workshop will equip you with the skills to build high-quality, reproducible genome assemblies on RCAC HPC resources.
Let’s get started!
Duration | Episode | Questions |
---|---|---|
 | Setup Instructions | Download files required for the lesson |
00h 00m | 1. Introduction to Genome assembly | What is genome assembly, and why is it important? What sequencing technologies can be used for genome assembly? What are de novo and reference-guided assemblies? What challenges arise when generating high-quality assemblies? What software tools are used for assembling genomes? |
00h 12m | 2. Assembly Strategies | What factors influence the choice of genome assembly strategy? How do different assembly methods compare in terms of read length, accuracy, and computational requirements? What are the key steps in evaluating genome assemblies using BUSCO and QUAST? How do Bionano OGM and Hi-C sequencing improve genome continuity and organization? |
00h 24m | 3. Data Quality Control | What is data quality checking and filtering? Why is it necessary to assess the quality of raw sequencing data? What are the key steps in filtering long-read sequencing data? How can visualization tools like NanoPlot help in quality assessment? |
00h 36m | 4. PacBio HiFi Assembly using HiFiasm | What is HiFiasm, and how does it improve genome assembly using PacBio HiFi reads? What are the key steps in running HiFiasm for haplotype-resolved assembly? How does HiFiasm handle haplotype resolution and purging of duplications? What are the benefits of using HiFiasm for assembling complex and heterozygous genomes? |
00h 48m | 5. Oxford Nanopore Assembly using Flye | What are the key features of ONT reads? Why is Flye good for assembling ONT reads? What are the main steps in the Flye assembly workflow? How can you evaluate the quality of a Flye assembly? |
01h 00m | 6. Hybrid Long Read Assembly (optional) | What is hybrid assembly, and how does it combine different sequencing technologies? How can you perform hybrid assembly using both types of long-read data? What are the key steps in hybrid assembly, including polishing and scaffolding? How do you evaluate the quality of a hybrid assembly using bioinformatics tools? |
01h 12m | 7. Scaffolding using Optical Genome Mapping | What is Bionano optical genome mapping (OGM) and how does it improve genome assembly? How does Bionano Solve hybrid scaffolding integrate optical maps with sequence assemblies? What are the key steps involved in running the Bionano Solve pipeline for hybrid scaffolding? How can you assess the quality of hybrid scaffolds generated by Bionano Solve? |
01h 24m | 8. Assembly Assessment | Why is evaluating genome assembly quality important? What tools can be used to assess assembly completeness, accuracy, and structural integrity? How do you interpret key metrics from assembly evaluation tools? What are the main steps in evaluating a genome assembly using bioinformatics tools? |
01h 36m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Instructors
Arun Seetharam, Ph.D.: Arun is a lead bioinformatics scientist at Purdue University’s Rosen Center for Advanced Computing. With extensive expertise in comparative genomics, genome assembly, annotation, single-cell genomics, NGS data analysis, metagenomics, proteomics, and metabolomics, Arun supports a diverse range of bioinformatics projects across various organisms, including human model systems.
Charles Christoffer, Ph.D.: Charles is a Senior Computational Scientist at Purdue University’s Rosen Center for Advanced Computing. He has a Ph.D. in Computer Science in the area of structural bioinformatics and has extensive experience in protein structure prediction.
Schedule
Time | Session |
---|---|
8:30 AM | Arrival & Setup |
9:00 AM | Introduction & UNIX/HPC refresher – Cluster setup and essential UNIX commands for assembly workflows |
10:30 AM | Break |
10:40 AM | Introduction to Genome Assembly – Overview of long-read assembly strategies, challenges, and tools |
11:00 AM | Genome Assembly with HiFiasm/Flye – Running HiFiasm on RCAC clusters, parameter selection, and best practices |
12:00 PM | Lunch Break |
1:00 PM | Hybrid Assembly (ONT + PacBio) and scaffolding – Combining long-read technologies for improved assembly accuracy, and scaffolding with Bionano optical maps |
2:50 PM | Break |
3:10 PM | Assembly Evaluation & Visualization – QC metrics, polishing |
4:30 PM | Wrap-Up & Discussion – Troubleshooting, Q&A, and next steps |
What is not covered
- Short read assembly
- Hi-C scaffolding
- Annotation
- Comparative analyses
Pre-requisites
- Basic knowledge of genomics
- Basic knowledge of command line interface
- Basic knowledge of bioinformatics tools
Data Sets
To copy only data:
The worked-out folder is available at /depot/workshop/data/genome-assembly/genome-assembly-data on the training cluster. You can copy the data to your scratch space using the following command:
Only use this if you are unable to finish the exercises in the workshop.
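The copy described above can be sketched as follows. This is an illustration, not necessarily the workshop's official command. The source path is the one given in the text. The destination is an assumption: it uses the `RCAC_SCRATCH` environment variable if it is set and falls back to the current directory otherwise. The helper name `copy_workshop_data` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: copy the worked-out workshop folder into your scratch space.
set -euo pipefail

# Hypothetical helper: recursively copy the contents of $1 into $2.
copy_workshop_data() {
    local src="$1"
    local dest="$2"
    mkdir -p "$dest"
    # "src/." copies the folder contents, including hidden files
    cp -r "${src%/}/." "${dest%/}/"
}

# Source path from the text above; override with SRC=... if needed.
SRC="${SRC:-/depot/workshop/data/genome-assembly/genome-assembly-data}"
# Assumption: RCAC_SCRATCH points at your scratch space; otherwise use $PWD.
DEST="${DEST:-${RCAC_SCRATCH:-$PWD}/genome-assembly-data}"

if [ -d "$SRC" ]; then
    copy_workshop_data "$SRC" "$DEST"
    echo "Copied workshop data to $DEST"
else
    echo "Source folder not found: $SRC (are you on the training cluster?)" >&2
fi
```

To copy somewhere else, set `DEST` before running the script, for example `DEST=/path/of/your/choice bash copy_data.sh` (the script file name here is also just a placeholder).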
Software Setup
SSH key setup for different systems is detailed in the expandable sections below.