Summary and Setup
Welcome to the Genome Assembly Workshop
This workshop provides a hands-on introduction to long-read genome assembly using HiFiasm and Flye, optimized for the RCAC cluster. You’ll learn best practices for assembly, polishing, and scaffolding with Bionano optical maps, along with strategies for quality assessment and troubleshooting.
Designed for researchers and bioinformaticians, this workshop will equip you with the skills to build high-quality, reproducible genome assemblies on RCAC HPC resources.
Let’s get started!
Instructors
Arun Seetharam, Ph.D.: Arun is a lead bioinformatics scientist at Purdue University’s Rosen Center for Advanced Computing. With extensive expertise in comparative genomics, genome assembly, annotation, single-cell genomics, NGS data analysis, metagenomics, proteomics, and metabolomics, Arun supports a diverse range of bioinformatics projects across various organisms, including human model systems.
Charles Christoffer, Ph.D.: Charles is a Senior Computational Scientist at Purdue University’s Rosen Center for Advanced Computing. He has a Ph.D. in Computer Science in the area of structural bioinformatics and has extensive experience in protein structure prediction.
Schedule
Time | Session |
---|---|
8:30 AM | Arrival & Setup |
9:00 AM | Introduction & UNIX/HPC refresher – Cluster setup and essential UNIX commands for assembly workflows |
10:30 AM | Break |
10:40 AM | Introduction to Genome Assembly – Overview of long-read assembly strategies, challenges, and tools |
11:00 AM | Genome Assembly with HiFiasm/Flye – Running HiFiasm on RCAC clusters, parameter selection, and best practices |
12:00 PM | Lunch Break |
1:00 PM | Hybrid Assembly (ONT + PacBio) and scaffolding – Combining long-read technologies for improved assembly accuracy, and scaffolding with Bionano optical maps |
2:50 PM | Break |
3:10 PM | Assembly Evaluation & Visualization – QC metrics, polishing |
4:30 PM | Wrap-Up & Discussion – Troubleshooting, Q&A, and next steps |
What is not covered
- Short read assembly
- Hi-C scaffolding
- Annotation
- Comparative analyses
Prerequisites
- Basic knowledge of genomics
- Basic knowledge of the command-line interface
- Basic knowledge of bioinformatics tools
Data Sets
To copy only data:
The worked-out folder is available at
/depot/workshop/data/genome-assembly/genome-assembly-data
on the training cluster. You can copy the data to your scratch space
using the following command:
Only use this if you are unable to finish the exercises in the workshop.
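As a minimal sketch of such a copy, assuming a bash shell: the helper name `copy_workshop_data` is hypothetical and not part of the workshop materials, and the destination in the example call assumes that the `RCAC_SCRATCH` environment variable points at your scratch space (it falls back to your home directory if it is unset). Check the path before running it on the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the workshop materials): copy the contents
# of a data folder into a destination directory, creating it if needed.
copy_workshop_data() {
    local src="$1"
    local dest="$2"
    # Refuse to continue if the source folder is missing or not a directory.
    if [ ! -d "$src" ]; then
        echo "Source folder not found: $src" >&2
        return 1
    fi
    # "$src"/. copies the folder contents (including hidden files) into $dest.
    mkdir -p "$dest" && cp -r "$src"/. "$dest"/
}

# Example call on the training cluster (the destination is an assumption):
# copy_workshop_data /depot/workshop/data/genome-assembly/genome-assembly-data \
#     "${RCAC_SCRATCH:-$HOME}/genome-assembly-data"
```

The function returns a non-zero status when the source is missing, so it can be chained safely with `&&` in a job script.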
Software Setup
SSH key setup for different systems is detailed in the expandable sections below.