Summary and Setup

Welcome to the Genome Assembly Workshop


This workshop provides a hands-on introduction to long-read genome assembly using HiFiasm and Flye, optimized for the RCAC cluster. You’ll learn best practices for assembly, polishing, and scaffolding with Bionano optical maps, along with strategies for quality assessment and troubleshooting.

Designed for researchers and bioinformaticians, this workshop will equip you with the skills to build high-quality, reproducible genome assemblies on RCAC HPC resources.

Let’s get started!

Instructors


  1. Arun Seetharam, Ph.D.: Arun is a lead bioinformatics scientist at Purdue University’s Rosen Center for Advanced Computing. With extensive expertise in comparative genomics, genome assembly, annotation, single-cell genomics, NGS data analysis, metagenomics, proteomics, and metabolomics, Arun supports a diverse range of bioinformatics projects across various organisms, including human model systems.

  2. Charles Christoffer, Ph.D.: Charles is a Senior Computational Scientist at Purdue University’s Rosen Center for Advanced Computing. He has a Ph.D. in Computer Science in the area of structural bioinformatics and has extensive experience in protein structure prediction.

Schedule


Time       Session
8:30 AM    Arrival & Setup
9:00 AM    Introduction & UNIX/HPC refresher – Cluster setup and essential UNIX commands for assembly workflows
10:30 AM   Break
10:40 AM   Introduction to Genome Assembly – Overview of long-read assembly strategies, challenges, and tools
11:00 AM   Genome Assembly with HiFiasm/Flye – Running HiFiasm on RCAC clusters, parameter selection, and best practices
12:00 PM   Lunch Break
1:00 PM    Hybrid Assembly (ONT + PacBio) and scaffolding – Combining long-read technologies for improved assembly accuracy, and scaffolding with Bionano optical maps
2:50 PM    Break
3:10 PM    Assembly Evaluation & Visualization – QC metrics, polishing
4:30 PM    Wrap-Up & Discussion – Troubleshooting, Q&A, and next steps

What is not covered


  1. Short-read assembly
  2. Hi-C scaffolding
  3. Annotation
  4. Comparative analyses

Prerequisites


  1. Basic knowledge of genomics
  2. Basic knowledge of the command-line interface
  3. Basic knowledge of bioinformatics tools

Data Sets


To copy only the data:

BASH

rsync -avP /depot/workshop/data/genome-assembly/genome-assembly-data $RCAC_SCRATCH

The worked-out folder is available at /depot/workshop/data/genome-assembly/genome-assembly-data on the training cluster. You can copy it to your scratch space with the following command:

BASH

rsync -avP /depot/workshop/data/genome-assembly/genome-assembly-data $RCAC_SCRATCH

Only use this if you are unable to finish the exercises in the workshop.
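
After running the rsync command above, it can help to confirm that the folder actually landed in your scratch space. The following is a minimal sketch, not part of the workshop materials: the helper name `check_copy` is hypothetical, and it assumes `RCAC_SCRATCH` is set in your login environment, as the rsync command implies.

```bash
# Hypothetical helper: report whether a copied folder exists and how many files it holds.
check_copy() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"
    return 1
  fi
  # Count regular files anywhere under the folder
  local n
  n=$(find "$dir" -type f | wc -l | tr -d ' ')
  echo "found: $dir ($n files)"
}

# Assumes RCAC_SCRATCH is set; prints a hint instead of failing if the folder is absent.
check_copy "${RCAC_SCRATCH:-}/genome-assembly-data" || echo "Copy not found; rerun the rsync command."
```

If the helper reports the folder as missing, rerunning rsync is safe: files that were already transferred are skipped.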

Software Setup


SSH key setup for each operating system is described below. In each command, replace trainXX with your assigned training account.

On Windows, open a terminal and run:

SH

ssh-keygen -b 4096 -t rsa
type .ssh\id_rsa.pub | ssh trainXX@negishi.rcac.purdue.edu "mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys"

On macOS, open Terminal and run:

SH

ssh-keygen -b 4096 -t rsa
cat ~/.ssh/id_rsa.pub | ssh trainXX@negishi.rcac.purdue.edu "mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys"

On Linux, open a terminal and run:

SH

ssh-keygen -b 4096 -t rsa
cat ~/.ssh/id_rsa.pub | ssh trainXX@negishi.rcac.purdue.edu "mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys"
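
If you already have an RSA key pair, `ssh-keygen` will ask whether to overwrite it, so it may be worth checking first. The sketch below is for Unix-like shells (the `cat` variants above) and assumes the default key location used by the commands in this section; the helper name `has_key` is hypothetical.

```bash
# Hypothetical helper: succeed if a public key file exists at the given path
# (defaults to the location ssh-keygen uses for RSA keys).
has_key() {
  [ -f "${1:-$HOME/.ssh/id_rsa.pub}" ]
}

if has_key; then
  echo "Existing public key found; you can skip ssh-keygen and reuse it."
else
  echo "No public key found; run ssh-keygen as shown above."
fi
```

Reusing an existing key avoids invalidating logins to other systems that already trust it.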