Summary and Schedule
This is a new lesson built with The Carpentries Workbench.
| Time | Episode | Questions |
|---|---|---|
| | Setup Instructions | Download files required for the lesson |
| Duration: 00h 00m | 1. Introduction to RNA-seq | What are the different choices to consider when planning an RNA-seq experiment? How does one process the raw FASTQ files to generate a table with read counts per gene and sample? Where does one find information about annotated genes for a given organism? What are the typical steps in an RNA-seq analysis? |
| Duration: 01h 40m | 2. Downloading and organizing files | What files are required to process raw RNA-seq reads? Where do we obtain raw reads, reference genomes, annotations, and transcript sequences? How should project directories be organized to support a smooth workflow? What practical considerations matter when downloading and preparing RNA-seq data? |
| Duration: 02h 15m | 3. Quality control of RNA-seq reads | How do we check the quality of raw RNA-seq reads? What information do FastQC and MultiQC provide? How do we decide if trimming is required? What QC issues are common in Illumina RNA-seq data? |
| Duration: 02h 50m | 4. A. Genome-based quantification (STAR + featureCounts) | How do we map RNA-seq reads to a reference genome? How do we determine library strandness before alignment? How do we quantify reads per gene using featureCounts? How do we submit mapping jobs on RCAC clusters with SLURM? |
| Duration: 04h 00m | 5. B. Transcript-based quantification (Salmon) | How do we quantify expression without genome alignment? What inputs does Salmon require? How do we run Salmon for paired-end RNA-seq data? How do we interpret transcript-level outputs (TPM, NumReads)? How do we summarize transcripts to gene-level counts using tximport? |
| Duration: 05h 10m | 6. Gene-level QC and differential expression (DESeq2) | How do we explore RNA-seq count data before running DESeq2? How do we restrict analysis to protein-coding genes? How do we perform differential expression with DESeq2? How do we visualize sample relationships and DE genes? How do we save a useful DE results table with annotation and expression values? |
| Duration: 06h 35m | 7. Gene set enrichment analysis | How do we identify functional pathways and biological themes from DE genes? How do we run over-representation analysis for GO, KEGG, and MSigDB sets? How do we create simple visualizations of enriched terms? How do we interpret enrichment results in a biologically meaningful way? |
| Duration: 07h 55m | Finish | |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Instructors
Arun Seetharam, Ph.D.: Arun is a lead bioinformatics scientist at Purdue University’s Rosen Center for Advanced Computing. With extensive expertise in comparative genomics, genome assembly, annotation, single-cell genomics, NGS data analysis, metagenomics, proteomics, and metabolomics, Arun supports a diverse range of bioinformatics projects across various organisms, including human model systems.
Michael Carlson, Ph.D.: Michael has a background in computational physics, specifically hypersonic materials. He also leads many introductory workshops in the High-Performance Computing domain.
Schedule
| Time | Session |
|---|---|
| 8:30 AM | Arrival & Setup |
| 9:00 AM | Introduction to RNA-seq Analysis: Experimental design, biological replicates, sequencing depth, and overview of the analysis workflow (QC → alignment → quantification → DE) |
| 9:45 AM | Data Preparation & Quality Control: Inspecting raw FASTQ files, running FastQC and MultiQC, trimming with fastp |
| 10:30 AM | Break |
| 10:45 AM | Read Alignment & Quantification: Mapping with STAR, building indices, generating gene-level counts with featureCounts (Salmon covered conceptually only) |
| 12:00 PM | Lunch Break |
| 1:00 PM | Differential Expression Analysis (DESeq2): Importing counts into R, normalization, exploratory plots (VST, distance heatmaps, PCA), and identifying significantly differentially expressed genes |
| 2:15 PM | Break |
| 2:30 PM | Visualization & Interpretation: Volcano plots, heatmaps, PCA review, summary tables. Introduction to gene set enrichment methods (ORA/GSEA) with pointers to explore independently. |
| 3:30 PM | Wrap-Up & Discussion: Review of workflow, troubleshooting common issues, recommended next steps |
| 4:00 PM | End of Workshop |
What is not covered
- Raw data generation, library preparation, or experimental design optimization
- De novo transcriptome assembly (e.g., Trinity) or genome-guided transcript reconstruction
- Single-cell RNA-seq or spatial transcriptomics analysis
- Alternative splicing, isoform quantification, or long-read transcript analysis
- Advanced visualization dashboards or interactive analysis tools (e.g., Shiny, iDEP)
Pre-requisites
- Basic understanding of genomics concepts (genes, transcripts, and genome structure)
- Familiarity with the command line interface (Linux/Unix shell)
- Prior exposure to basic bioinformatics tools and file formats (FASTA, GFF, FASTQ)
Data sets
To copy only the training data:
A completed version of the workshop data is available at:
/depot/workshop/data/rnaseq-workshop_results
You can copy it to your scratch space using:
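The exact command is not reproduced on this page. As a minimal sketch, assuming a standard Linux shell on the cluster, the copy could look like the following. The source path is the one given above; the destination directory name `rnaseq-workshop`, the use of `$RCAC_SCRATCH` to locate your scratch space, and the helper function `copy_results` are all assumptions made for illustration, so adjust them to your own setup.

```shell
# Source path taken from the lesson text.
SRC=/depot/workshop/data/rnaseq-workshop_results

# ASSUMPTION: $RCAC_SCRATCH points at your scratch space; $HOME/scratch is a
# hypothetical fallback, and "rnaseq-workshop" is an illustrative folder name.
DEST="${RCAC_SCRATCH:-$HOME/scratch}/rnaseq-workshop"

# Hypothetical helper: copy a results directory into a destination directory.
# Usage: copy_results <source_dir> <dest_dir>
copy_results() {
    # Refuse to continue if the source is not visible from this machine.
    if [ ! -d "$1" ]; then
        echo "Source directory not found: $1" >&2
        return 1
    fi
    # Create the destination if needed, then copy the directory recursively.
    mkdir -p "$2" && cp -r "$1" "$2"/
}
```

With these definitions in place, running `copy_results "$SRC" "$DEST"` would place a copy of `rnaseq-workshop_results` inside the destination directory. The source check comes first so that a mistyped or unmounted path fails with a clear message instead of creating an empty folder.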
Use this folder only if you are unable to complete the exercises during the workshop.
Software setup
SSH key setup for different systems is provided in the expandable sections below. Follow the instructions for your operating system to configure passwordless access.