Summary and Schedule

This is a new lesson built with The Carpentries Workbench.

Download files required for the lesson

00h 00m

What are the different choices to consider when planning an RNA-seq experiment?
How does one process the raw fastq files to generate a table with read counts per gene and sample?
Where does one find information about annotated genes for a given organism?
What are the typical steps in an RNA-seq analysis?

01h 40m

2. Downloading and organizing files

What files are required to process raw RNA-seq reads?
Where do we obtain raw reads, reference genomes, annotations, and transcript sequences?
How should project directories be organized to support a smooth workflow?
What practical considerations matter when downloading and preparing RNA-seq data?

02h 15m

3. Quality control of RNA-seq reads

How do we check the quality of raw RNA-seq reads?
What information do FastQC and MultiQC provide?
How do we decide if trimming is required?
What QC issues are common in Illumina RNA-seq data?

02h 50m

4. A. Genome-based quantification (STAR + featureCounts)

How do we map RNA-seq reads to a reference genome?
How do we determine library strandness before alignment?
How do we quantify reads per gene using featureCounts?
How do we submit mapping jobs on RCAC clusters with SLURM?

04h 00m

5. B. Transcript-based quantification (Salmon)

How do we quantify expression without genome alignment?
What inputs does Salmon require?
How do we run Salmon for paired-end RNA-seq data?
How do we interpret transcript-level outputs (TPM, NumReads)?
How do we summarize transcripts to gene-level counts using tximport?

05h 10m

6. Gene-level QC and differential expression (DESeq2)

How do we explore RNA-seq count data before running DESeq2?
How do we restrict analysis to protein-coding genes?
How do we perform differential expression with DESeq2?
How do we visualize sample relationships and DE genes?
How do we save a useful DE results table with annotation and expression values?

06h 35m

7. B. Differential expression using DESeq2 (Salmon/Kallisto pathway)

How do we import transcript-level quantification from Salmon or Kallisto into DESeq2?
What exploratory analyses should we perform before differential expression testing?
How do we perform differential expression analysis with DESeq2 using tximport data?
How do we visualize and interpret DE results from transcript-based quantification?
What are the key differences between genome-based and transcript-based DE workflows?

08h 00m

8. Gene set enrichment analysis

What is over-representation analysis and how does it work statistically?
How do we identify functional pathways and biological themes from DE genes?
How do we run enrichment analysis for GO, KEGG, and MSigDB gene sets?
How do we interpret and compare enrichment results across different databases?
What are the limitations and best practices for enrichment analysis?

09h 35m

Finish

The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.

Instructors

Arun Seetharam, Ph.D.: Arun is a lead bioinformatics scientist at Purdue University’s Rosen Center for Advanced Computing. With extensive expertise in comparative genomics, genome assembly, annotation, single-cell genomics, NGS data analysis, metagenomics, proteomics, and metabolomics, Arun supports a diverse range of bioinformatics projects across various organisms, including human model systems.
Michael Carlson, Ph.D.: Michael has a background in computational physics, specifically hypersonic materials. He also leads many introductory workshops in the High-Performance Computing domain.

Schedule

Time	Session
8:30 AM	Arrival & Setup
9:00 AM	Introduction to RNA-seq Analysis: Experimental design, biological replicates, sequencing depth, and overview of the analysis workflow (QC → alignment → quantification → DE)
9:45 AM	Data Preparation & Quality Control: Inspecting raw FASTQ files, running FastQC and MultiQC, trimming with fastp
10:30 AM	Break
10:45 AM	Read Alignment & Quantification: Mapping with STAR, building indices, generating gene-level counts with featureCounts (Salmon covered conceptually only)
12:00 PM	Lunch Break
1:00 PM	Differential Expression Analysis (DESeq2): Importing counts into R, normalization, exploratory plots (VST, distance heatmaps, PCA), and identifying significantly differentially expressed genes
2:15 PM	Break
2:30 PM	Visualization & Interpretation: Volcano plots, heatmaps, PCA review, summary tables. Introduction to gene set enrichment methods (ORA/GSEA) with pointers to explore independently.
3:30 PM	Wrap-Up & Discussion: Review of workflow, troubleshooting common issues, recommended next steps
4:00 PM	End of Workshop

What is not covered

Raw data generation, library preparation, or experimental design optimization
De novo transcriptome assembly (e.g., Trinity) or genome-guided transcript reconstruction
Single-cell RNA-seq or spatial transcriptomics analysis
Alternative splicing, isoform quantification, or long-read transcript analysis
Advanced visualization dashboards or interactive analysis tools (e.g., Shiny, iDEP)

Pre-requisites

Basic understanding of genomics concepts (genes, transcripts, and genome structure)
Familiarity with the command line interface (Linux/Unix shell)
Prior exposure to basic bioinformatics tools and file formats (FASTA, GFF, FASTQ)

Data sets

To copy only the training data:

BASH

rsync -avP /scratch/negishi/aseethar/rnaseq-workshop ${RCAC_SCRATCH}/

A completed version of the workshop data is available at:

/depot/workshop/data/rnaseq-workshop_results

You can copy it to your scratch space using:

BASH

rsync -avP /scratch/negishi/aseethar/rnaseq-workshop/rnaseq-workshop_results ${RCAC_SCRATCH}/

Use this folder only if you are unable to complete the exercises during the workshop.
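
After either copy finishes, it can help to confirm that the directory actually arrived in your scratch space. The following is a minimal sketch, not part of the workshop materials: `check_copy` is a hypothetical helper name, and it only tests that the directory exists and is not empty (it does not verify file contents).

```bash
# Hypothetical helper (an assumption, not provided by the workshop):
# succeeds only if the given directory exists and has at least one entry.
check_copy() {
    dir="$1"
    if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
        echo "OK: $dir"
    else
        echo "Missing or empty: $dir" >&2
        return 1
    fi
}
```

For example, `check_copy "${RCAC_SCRATCH}/rnaseq-workshop"` should print an `OK:` line once the first rsync command above has completed.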

Software setup

SSH key setup for different systems is provided in the expandable sections below. Follow the instructions for your operating system to configure passwordless access.

Windows

Open PowerShell or Git Bash and run:

BASH

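ssh-keygen -b 4096 -t rsa
# PowerShell syntax, run from your home directory. In Git Bash, use `cat ~/.ssh/id_rsa.pub` instead of `type .ssh\id_rsa.pub`.
type .ssh\id_rsa.pub | ssh boiler@scholar.rcac.purdue.edu "mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys"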

macOS

Open Terminal and run:

BASH

ssh-keygen -b 4096 -t rsa
cat ~/.ssh/id_rsa.pub | ssh boiler@scholar.rcac.purdue.edu "mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys"

Linux

Open a terminal and run:

BASH

ssh-keygen -b 4096 -t rsa
cat ~/.ssh/id_rsa.pub | ssh boiler@scholar.rcac.purdue.edu "mkdir -p ~/.ssh; cat >> ~/.ssh/authorized_keys"
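
Once the key has been copied (on any of the systems above), you may want to confirm that passwordless access works. The sketch below is an assumption rather than part of the official setup: `check_login` is a hypothetical helper name. It relies on the standard OpenSSH options `BatchMode=yes`, which makes `ssh` fail instead of prompting, and `ConnectTimeout`. Note that it will also report failure if your key has a passphrase that is not loaded in an SSH agent.

```bash
# Hypothetical helper (an assumption, not provided by the workshop).
# BatchMode=yes disables password and passphrase prompts, so a successful
# connection means key-based login is working.
check_login() {
    target="$1"   # user@host, in the same form used in the commands above
    if ssh -o BatchMode=yes -o ConnectTimeout=10 "$target" true; then
        echo "Key-based login works for $target"
    else
        echo "Key-based login failed for $target" >&2
        return 1
    fi
}
```

For example, `check_login boiler@scholar.rcac.purdue.edu` uses the same login and host as the key-copy commands above.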