Prerequisites
- RCAC cluster account (apply here)
- Basic command-line familiarity (cd, ls, mkdir, nano/vim)
- Access to Negishi, Gautschi, or Bell
What you will learn
This guide covers the practical skills you need to run bioinformatics software on RCAC clusters. RCAC deploys bioinformatics tools as BioContainers (pre-built Apptainer containers) accessed through the Lmod module system. For tools not in the RCAC collection, you can use Conda environments or pull your own containers. Most production work runs as batch jobs through the SLURM scheduler.
By the end of this guide you will be able to find any bioinformatics tool on RCAC, run it correctly, and submit efficient batch jobs.
You need a terminal session on the cluster before running any commands. There are two options:
- Web gateway: open gateway.negishi.rcac.purdue.edu in a browser; replace negishi with the cluster name (e.g., gateway.gautschi.rcac.purdue.edu, gateway.bell.rcac.purdue.edu, gateway.gilbreth.rcac.purdue.edu).
- SSH:

```
ssh <boilerid>@negishi.rcac.purdue.edu
```

Replace <boilerid> with your Purdue career account username.
On Gautschi, use gautschi.rcac.purdue.edu.
RCAC uses the Lmod module system to manage software. Bioinformatics tools are deployed as pre-built BioContainers (Apptainer containers) and accessed through the biocontainers module.
First, load the biocontainers module to make bioinformatics tools visible:
```
module --force purge
module load biocontainers
module spider samtools
```

module spider searches all modules, including those not yet visible. It shows available versions and any prerequisite modules.
To get loading instructions for a specific version:
```
module spider samtools/1.21
```

To list all available biocontainer modules:

```
module avail
```

To load a tool:

```
module --force purge
module load biocontainers samtools/1.21
```

The biocontainers module unlocks all bioinformatics software. You must load it before any tool module becomes visible.
Use module --force purge (not just module purge) to remove sticky modules like xalt that use a newer glibc and conflict with containerized tools. Here is what happens if you skip --force:
```
module load biocontainers bwa
bwa
/bin/sh: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.33' not found (required by /apps/external/apps/xalt3/xalt/xalt/lib64/libxalt_init.so)
/bin/sh: /lib/x86_64-linux-gnu/libc.so.6: version `GLIBC_2.34' not found (required by /apps/external/apps/xalt3/xalt/xalt/lib64/libxalt_init.so)
```

The fix is to always start with module --force purge:
```
module --force purge
module load biocontainers bwa
bwa
```

After loading, run the tool as usual; the output will appear as if the tool is installed natively.
Behind the scenes, RCAC creates shell functions that wrap each tool in an apptainer/singularity container call. When you type bwa, the function runs singularity run <container.sif> bwa for you.
Because the tools are containerized, which and type will show the shell function that wraps the container call, not the actual executable:
```
which bwa
bwa ()
{
    /usr/bin/singularity run /apps/biocontainers/images/quay.io_biocontainers_bwa:0.7.17--h5bf99c6_8.sif env LANG=C.UTF-8 bwa "$@"
}

type bwa
bwa is a function
bwa ()
{
    /usr/bin/singularity run /apps/biocontainers/images/quay.io_biocontainers_bwa:0.7.17--h5bf99c6_8.sif env LANG=C.UTF-8 bwa "$@"
}
```

For most use cases this does not matter: the function handles everything transparently. However, if a pipeline or workflow checks the executable path (e.g., which bwa to verify the installation), it will get the function definition instead of a file path. In that case, you may need to run the container image directly (using the $BIOC_IMAGE_DIR variable described below) rather than rely on the wrapper function.
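A quick offline demonstration of why path checks are fooled; the function body below is a hypothetical stand-in, not RCAC's real wrapper:

```shell
# Define a stand-in function named like the tool, as the biocontainers
# module does for each wrapped executable (the body here is illustrative)
bwa() { echo "singularity run <image.sif> bwa $*"; }

type -t bwa        # reports "function", not a file path
command -v bwa     # prints just the name "bwa", not /usr/bin/bwa
```

A pipeline that insists on a real executable path can call the image under $BIOC_IMAGE_DIR directly instead.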
The $BIOC_IMAGE_DIR environment variable

Loading the biocontainers module sets the $BIOC_IMAGE_DIR environment variable, which points to the directory containing all container images:
```
echo $BIOC_IMAGE_DIR
# /apps/biocontainers/images
```

You can use this to run containers directly with singularity run or apptainer exec when you need more control (e.g., custom bind mounts, GPU flags, or piping between containerized tools):
```
singularity run ${BIOC_IMAGE_DIR}/quay.io_biocontainers_bwa:0.7.17--h5bf99c6_8.sif bwa mem ref.fa reads.fq
```

To check what is currently loaded, verify a tool, and reset:

```
module list
samtools --version
module --force purge
```

Use module --force purge at the top of every SLURM script and whenever you hit a module conflict. The --force flag is important because it also removes sticky modules (like xalt) that a plain module purge would leave behind.
If you try to load two modules built with different compiler toolchains, Lmod will refuse with an error. The fix:
```
module --force purge
module load biocontainers samtools/1.21 bwa-mem2/2.2.1
```

Loading both in a single command lets Lmod resolve the dependency tree.
All bioinformatics modules on RCAC are already containerized via BioContainers (see Finding Software with Modules above). However, if a tool is not in the RCAC collection, you can pull your own container.
First check module spider after loading biocontainers. If a tool is not in the biocontainers collection, pull it from a container registry:
```
cd ${RCAC_SCRATCH}
apptainer pull docker://quay.io/biocontainers/bwa:0.7.18--he4a0461_1
```

This creates a .sif file in the current directory. Run commands inside it with apptainer exec:
```
apptainer exec bwa_0.7.18--he4a0461_1.sif bwa
```

RCAC auto-binds /home, /scratch, /depot, and /tmp into containers. For data in non-standard locations, bind manually:
```
apptainer exec --bind /my/custom/path container.sif <command>
```

Conda is useful for niche Python or R packages and tools with complex dependency trees that are not available as modules or containers.
```
module --force purge
module load conda
conda create -n multiqc_env -c bioconda -c conda-forge multiqc=1.25 -y
conda activate multiqc_env
multiqc --version
```

Conda environments are large (often 2–10 GB). Your Home directory is only ~25 GB.
Redirect Conda storage to Scratch by creating a .condarc file:
```
pkgs_dirs:
  - /scratch/negishi/${USER}/.conda/pkgs
envs_dirs:
  - /scratch/negishi/${USER}/.conda/envs
channels:
  - conda-forge
  - bioconda
  - defaults
auto_activate_base: false
```

Then create the directories:
```
mkdir -p /scratch/negishi/${USER}/.conda/pkgs
mkdir -p /scratch/negishi/${USER}/.conda/envs
```

Use this decision table to pick the right method:
| Step | Action | Command |
|---|---|---|
| 1 | Load biocontainers | module --force purge && module load biocontainers |
| 2 | Search for the tool | module spider <toolname> |
| 3 | If found | module load biocontainers <tool>/<version> |
| 4 | If not, search Conda | conda search -c bioconda <toolname> |
| 5 | If found in Conda | conda create -n <env> -c bioconda -c conda-forge <tool>=<ver> |
| 6 | If not found anywhere | Pull a Docker/Apptainer container or build from source |
| | Biocontainers (Modules) | Conda | Custom Container |
|---|---|---|---|
| Maintained by | RCAC | You | You |
| Install effort | None | Medium | High |
| Reproducibility | Excellent (immutable image) | Fragile (solver can change) | Excellent |
| Storage cost | None | High (2–10 GB per env) | Medium (0.5–2 GB per image) |
| Speed | Near-native | Native | Near-native |
| Updates | RCAC manages | You manage | You manage |
| Best for | Most bioinformatics tools | Niche packages, R/Python envs | Full control, custom builds |
Login nodes are for editing files and submitting jobs. All computation should happen on compute nodes through SLURM.
For quick testing, request an interactive session:
```
sinteractive -A <account-name> -n 4 -N 1 --time=1:00:00
```

This gives you a shell on a compute node where you can load modules and test commands. Type exit when done.
A SLURM batch script has three parts:
1. The shebang: #!/bin/bash
2. #SBATCH directives: resource requests parsed by SLURM (not executed by bash)
3. The shell commands to run

A complete example, starting with the directives:

```
#!/bin/bash
#SBATCH --job-name=bwa_align
#SBATCH --account=<account-name>
#SBATCH --partition=<partition-name>
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --time=04:00:00
#SBATCH --mem=32G
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err
```
```
module --force purge
module load biocontainers bwa-mem2/2.2.1 samtools/1.21

WORKDIR=/scratch/negishi/${USER}/alignment_project
REF=${WORKDIR}/ref/genome.fa
R1=${WORKDIR}/fastq/sample_R1.fastq.gz
R2=${WORKDIR}/fastq/sample_R2.fastq.gz
OUTDIR=${WORKDIR}/bam

mkdir -p ${OUTDIR}
```
```
bwa-mem2 mem \
    -t ${SLURM_CPUS_ON_NODE} \
    -R "@RG\tID:sample\tSM:sample\tPL:ILLUMINA\tLB:lib1" \
    ${REF} ${R1} ${R2} \
    | samtools sort -@ 4 -m 2G -o ${OUTDIR}/sample.sorted.bam -
```
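One detail worth noting in piped commands like this: the aligner and the sorter run concurrently, so their thread counts add up against the allocation. A sketch of splitting the CPUs between stages (the variable names and fallback value are illustrative; SLURM sets SLURM_CPUS_PER_TASK inside a job when --cpus-per-task is given):

```shell
# Split allocated CPUs between the two pipeline stages so the job does not
# oversubscribe its allocation (fallback of 16 mirrors --cpus-per-task=16)
TOTAL=${SLURM_CPUS_PER_TASK:-16}
SORT_THREADS=4                            # matches samtools sort -@ 4
ALIGN_THREADS=$((TOTAL - SORT_THREADS))
echo "aligner threads: ${ALIGN_THREADS}"  # 12 when TOTAL is 16
```

The computed ALIGN_THREADS would replace the aligner's -t argument.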
```
samtools index ${OUTDIR}/sample.sorted.bam
samtools flagstat ${OUTDIR}/sample.sorted.bam
```

Submit, monitor, and (if needed) cancel the job:

```
sbatch slurm_bwa_align.sh
squeue -u ${USER}
scancel <jobid>
```

| Directive | Description | Typical value |
|---|---|---|
| --account | Allocation/account name | Check with mybalance |
| --partition | Queue/partition | Cluster-specific |
| --nodes | Number of nodes | 1 (almost always for bioinformatics) |
| --ntasks | Number of processes | 1 for single tools |
| --cpus-per-task | Threads per process | Match tool's -t flag (4–32) |
| --time | Wall clock limit | Start generous, tighten after sacct |
| --mem | Total memory | Check tool docs; start with 16–32G |
| --job-name | Name shown in squeue | Short, descriptive |
| --output | stdout file | %x_%j.out (name + job ID) |
| --error | stderr file | %x_%j.err |
| --array | Array job indices | 0-N for batch processing |
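Beyond the directives, a small defensive header in the script body makes failures surface immediately instead of producing half-written output. This is an optional habit, not an RCAC requirement:

```shell
# Optional defensive header for the script body (after the #SBATCH block):
set -u            # treat unset variables as errors (catches $WORKDIR typos)
set -o pipefail   # a pipeline fails if ANY stage fails, not just the last
# set -e (exit on the first failing command) is also common

false | true                  # the last stage succeeds, but the pipeline...
echo "pipeline status: $?"    # ...reports 1 because of pipefail
```

Without pipefail, an aligner crashing mid-pipe can still leave a "successful" job with a truncated BAM.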
When running the same tool on multiple input files, use array jobs instead of submitting separate scripts.
Each array task gets a unique SLURM_ARRAY_TASK_ID (0, 1, 2, …) that you use to select the input file.
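The index arithmetic can be checked offline. sed addresses lines starting at 1 while array indices start at 0, hence a "+1" offset when selecting this task's file; the file names below are made up:

```shell
# Build a demo list (three hypothetical FASTQ names)
printf '%s\n' a_R1.fastq.gz b_R1.fastq.gz c_R1.fastq.gz > fastq_list.txt

SLURM_ARRAY_TASK_ID=1                                    # simulated; SLURM sets this per task
sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" fastq_list.txt    # prints b_R1.fastq.gz

# Derive the --array range from the list length (0-based, so N-1)
N=$(wc -l < fastq_list.txt)
echo "--array=0-$((N - 1))"                              # prints --array=0-2
```

The derived range can also be passed at submit time with sbatch --array=..., which overrides the value baked into the script.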
```
#!/bin/bash
#SBATCH --job-name=fastqc
#SBATCH --account=<account-name>
#SBATCH --partition=<partition-name>
#SBATCH --cpus-per-task=2
#SBATCH --time=01:00:00
#SBATCH --mem=4G
#SBATCH --array=0-5
#SBATCH --output=fastqc_%A_%a.out
#SBATCH --error=fastqc_%A_%a.err

module --force purge
module load biocontainers fastqc/0.12.1

FASTQ_LIST=/scratch/negishi/${USER}/project/fastq_list.txt
OUTDIR=/scratch/negishi/${USER}/project/fastqc_results
mkdir -p ${OUTDIR}

FASTQ=$(sed -n "$((SLURM_ARRAY_TASK_ID + 1))p" ${FASTQ_LIST})

fastqc --outdir ${OUTDIR} --threads ${SLURM_CPUS_ON_NODE} --quiet ${FASTQ}
```

Create the file list first:
```
ls /scratch/negishi/${USER}/project/fastq/*.fastq.gz > fastq_list.txt
```

Conda requires shell initialization inside batch scripts. Without it, conda activate will fail:
```
#!/bin/bash
#SBATCH --job-name=multiqc
#SBATCH --account=<account-name>
#SBATCH --partition=<partition-name>
#SBATCH --cpus-per-task=2
#SBATCH --time=00:30:00
#SBATCH --mem=4G
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err

module --force purge
module load conda
eval "$(conda shell.bash hook)"
conda activate multiqc_env

multiqc /scratch/negishi/${USER}/project/fastqc_results \
    --outdir /scratch/negishi/${USER}/project/multiqc_output \
    --filename multiqc_report \
    --force

conda deactivate
```

The key line is eval "$(conda shell.bash hook)": it initializes Conda for the non-interactive bash shell that SLURM uses.
After a job completes, check what it actually used:
```
sacct -j <jobid> --format=JobID,JobName,MaxRSS,Elapsed,State,ExitCode
```

- If MaxRSS is well below what you requested, lower --mem next time.
- If Elapsed is well below the wall clock limit, tighten --time.

Start generous, then tighten. Over-requesting wastes your allocation priority but under-requesting kills your job.
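The headroom arithmetic can be scripted. sacct reports MaxRSS in kilobytes with a K suffix; the sample value below is invented for illustration, and the 25% margin is one point inside the 20–30% rule of thumb used for out-of-memory retries:

```shell
# Example MaxRSS as sacct prints it (kilobytes, K suffix); value is made up
MAXRSS_RAW="25000000K"
MAXRSS_KB=${MAXRSS_RAW%K}        # strip the suffix

# Add ~25% headroom and convert KB to GB, rounding up (1 GB = 1048576 KB)
NEXT_MEM_GB=$(( (MAXRSS_KB * 125 / 100 + 1048575) / 1048576 ))
echo "next run: --mem=${NEXT_MEM_GB}G"    # next run: --mem=30G
```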
Problem: You run a tool and get bash: samtools: command not found.
Diagnosis: The module is not loaded, or you loaded biocontainers but forgot the tool module.
Fix:
```
module --force purge
module load biocontainers samtools/1.21
```

If module spider cannot find the tool after loading biocontainers, it may not be installed on this cluster. Try Conda or pull a custom container.
Problem: Your job vanishes from squeue but produced no output files.
Diagnosis: Check the .err file and sacct:
```
cat <jobname>_<jobid>.err
sacct -j <jobid> --format=JobID,State,ExitCode,MaxRSS
```

Common states:
| State | Meaning |
|---|---|
| COMPLETED | Finished successfully (exit code 0:0) |
| FAILED | Your script had an error |
| OUT_OF_MEMORY | Exceeded --mem request |
| TIMEOUT | Exceeded --time request |
| CANCELLED | Manually cancelled or preempted |
Problem: sacct shows OUT_OF_MEMORY.
Fix: Increase --mem. Check MaxRSS of the failed job to see peak usage, then request 20–30% more.
Problem: Your files on /scratch are gone.
Diagnosis: Scratch is purged after 60 days of inactivity. There is no warning and no recovery.
Fix: Move important results to Home or Depot promptly. For active projects, periodic access resets the clock.
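A sketch of that archiving step, written against throwaway temp directories so it runs anywhere; on the cluster the source would be your scratch project and the destination a Home or Depot path (the /depot path in the comment is illustrative):

```shell
# Stand-ins for ${RCAC_SCRATCH}/project and a Depot destination
SRC_PARENT=$(mktemp -d)
DEST=$(mktemp -d)
mkdir -p "$SRC_PARENT/results"
echo "alignment stats" > "$SRC_PARENT/results/flagstat.txt"

# On the cluster this would be, e.g.:  cp -r ${RCAC_SCRATCH}/project /depot/<lab>/
cp -r "$SRC_PARENT/results" "$DEST/"
ls "$DEST/results"               # prints: flagstat.txt
```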
Problem: Lmod has detected the following error: ... when loading modules.
Fix: Start fresh:
```
module --force purge
module load biocontainers <tool1>/<version> <tool2>/<version>
```

Problem: CommandNotFoundError: Your shell has not been properly configured...
Fix: Add shell initialization before conda activate:
```
module --force purge
module load conda
eval "$(conda shell.bash hook)"
conda activate myenv
```

Problem: Script fails because /scratch/negishi/ does not exist on Gautschi.
Fix: Use ${RCAC_SCRATCH} instead of hardcoding the cluster name:
```
WORKDIR=${RCAC_SCRATCH}/my_project
```

This resolves to the correct path on any RCAC cluster.
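When developing a script off-cluster, where RCAC_SCRATCH is undefined, a shell default keeps the same line usable; the fallback path here is an assumption for local testing, not an RCAC convention:

```shell
# Use $RCAC_SCRATCH when the cluster defines it, else a local demo directory
WORKDIR="${RCAC_SCRATCH:-/tmp/demo_scratch}/my_project"
mkdir -p "${WORKDIR}"
echo "${WORKDIR}"
```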
When a job fails, follow these steps:
1. Check the exit status of the last command: echo $? (0 = success, non-zero = failure)
2. Read the error log: cat <jobname>_<jobid>.err
3. Check the accounting record: sacct -j <jobid> --format=JobID,State,ExitCode,MaxRSS,Elapsed
4. Reproduce interactively: sinteractive -A <account-name> -n 4 --time=1:00:00, load modules, run the failing command

Copy this .condarc to your home directory to redirect Conda storage off of Home:
```
pkgs_dirs:
  - /scratch/negishi/${USER}/.conda/pkgs
envs_dirs:
  - /scratch/negishi/${USER}/.conda/envs
channels:
  - conda-forge
  - bioconda
  - defaults
auto_activate_base: false
```

(On other clusters, replace negishi with your cluster name.)

Session 6: QC for Genomics — April 7, 2026, 11:00 AM — 12:00 PM ET
Topics: FastQC interpretation, fastp trimming, MultiQC aggregation, quality control strategies for different sequencing platforms.