Goal: Enable password-less login to run automated scripts and transfer files seamlessly.
Generate a Key Pair
Run this on your local computer (Terminal or PowerShell). Press Enter to accept defaults (file location and no passphrase).
```bash
ssh-keygen -t ed25519
```

Copy Public Key to Cluster
Send your public key to the cluster. Replace boilerid with your actual username.
On macOS/Linux:

```bash
ssh-copy-id boilerid@bell.rcac.purdue.edu
```

On Windows (PowerShell):

```powershell
type $env:USERPROFILE\.ssh\id_ed25519.pub | ssh boilerid@bell.rcac.purdue.edu "mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys"
```

Test Connection
You should now be able to log in without typing a password:
```bash
ssh boilerid@bell.rcac.purdue.edu
```

Goal: Simplify login commands (e.g., type ssh bell instead of ssh user@bell.rcac.purdue.edu).
Create/Edit the Config File
Open ~/.ssh/config on your local computer using a text editor (VS Code, Nano, Notepad, etc.).
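If the file does not exist yet, create it first. SSH may refuse to use a config file with loose permissions, so lock them down (macOS/Linux):

```bash
# Create the config file with the permissions SSH expects
mkdir -p ~/.ssh
touch ~/.ssh/config
chmod 600 ~/.ssh/config
```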
Paste the Configuration
Copy the block below. Be sure to replace boilerid with your specific username.
```
# --- GLOBAL RCAC DEFAULTS ---
# Applies to all RCAC clusters automatically.
# The short names are listed too, so these defaults still apply when you
# type "ssh bell" (ssh matches against the name you type, not the HostName).
Host *.rcac.purdue.edu bell negishi anvil
    User boilerid                  # <--- REPLACE THIS with your username
    IdentityFile ~/.ssh/id_ed25519
    Port 22
    ForwardAgent yes
    ForwardX11 yes

    # Keep connection alive to prevent timeouts
    ServerAliveInterval 300
    ServerAliveCountMax 2

    # Multiplexing & persistence (speed boost for multiple windows)
    ControlMaster auto
    ControlPath ~/.ssh/cm-%r@%h:%p
    ControlPersist 10m

# --- CLUSTER SHORTCUTS ---
Host bell
    HostName bell.rcac.purdue.edu

Host negishi
    HostName negishi.rcac.purdue.edu

Host anvil
    HostName anvil.rcac.purdue.edu
    # User x-boilerid              # Uncomment and update if your Anvil username differs
```

Usage
You can now connect instantly using the short names:
```bash
ssh bell
ssh negishi
ssh anvil
rsync -av ~/localfolder/ bell:/depot/project/remote_folder/
```

Goal: Save keystrokes on repetitive commands and prevent home directory quota issues.
Instead of cluttering your main .bashrc file, create a separate file for shortcuts.
Create the file:
```bash
nano ~/.bash_aliases
```

Paste the content below (Customize the APPTAINER_CACHEDIR path!).
Activate it:
Open your .bashrc (nano ~/.bashrc) and ensure this block exists (it usually does by default):
```bash
if [ -f ~/.bash_aliases ]; then
    . ~/.bash_aliases
fi
```

Reload:
Run source ~/.bashrc to apply changes immediately.
Copy this into your ~/.bash_aliases file.
```bash
# --- 1. ENVIRONMENT VARIABLES ---
# CRITICAL: Point Apptainer cache to Depot/Scratch to avoid filling up Home
export APPTAINER_CACHEDIR="/depot/itap/$USER/apptainer"

# --- 2. LISTING & NAVIGATION ---
alias pwd='pwd -P'             # Show physical path (resolves symlinks)
alias ls='ls --color=auto -v'  # Colorized output
alias ll='ls -l'               # Standard long list
alias la='ls -Al'              # Show hidden files
alias lt='ls -ltr'             # Sort by date (newest at bottom) - GREAT for checking logs
alias lk='ls -lSr'             # Sort by size (biggest at bottom)
alias ld='ls -d */'            # List directories only

# Advanced listing (requires 'exa' if you use 'lr')
# alias lr='exa --long --color-scale --tree --level=3'

# --- 3. DISK USAGE ---
alias du='du -kh'              # Human-readable sizes
alias dd='du -sch *'           # Summary of current directory sizes (note: shadows the system 'dd' utility)

# --- 4. SAFETY ---
alias rm='rm -i'               # Ask before deleting
alias cp='cp -i'               # Ask before overwriting
alias mv='mv -i'               # Ask before overwriting

# --- 5. SLURM (JOB MANAGEMENT) ---
# Check MY jobs (formatted for readability)
alias myq='squeue -o "%12i %20j %2t %8u %10q %10a %10P %10Q %5D %5C %11l %11L %R" -u $USER'

# Check ALL jobs
alias qs='squeue -a -o "%12i %20j %2t %8u %10q %10a %10P %10Q %5D %5C %11l %11L %R"'

# Check node status (Idle, Mixed, Allocated, etc.)
alias ql='sinfo -o "%20P %5D %14F %8z %10m %10d %11l %N"'

# Quick interactive job (customize allocation/time as needed)
alias interact='sinteractive -A <YOUR_ALLOCATION> -t 04:00:00 -N 1 -n 16'
```

Version control your configuration to keep clusters and local machines in sync.
Create a private GitHub repository named dotfiles.
Move your config files there and symlink them back.
```bash
# Example workflow
mkdir ~/dotfiles
mv ~/.bash_aliases ~/dotfiles/
ln -s ~/dotfiles/.bash_aliases ~/.bash_aliases
```

Goal: Edit cluster files with a full graphical interface, syntax highlighting, and integrated terminals.
Setup Steps
Install: Download VS Code locally.
Extension: Open VS Code, go to Extensions (square icon on left), and install Remote - SSH.
Connect: Click the green "><" icon (bottom-left corner) → Connect to Host… → Select bell (or your target cluster).
Essential Extensions (Install on Remote)
Once connected, install these in the “SSH: bell” section of your extensions pane:
Python & Pylance
For code completion and debugging.
ShellCheck
To catch errors in your bash scripts automatically.
GitLens
To visualize who edited code and when.
R
For R language support (requires extra config).
Editing Tips
Pick one login node (e.g., bell-fe02) and stick to it. Connecting to different nodes launches multiple VS Code server instances, which wastes resources.

Goal: Isolate software dependencies per project and avoid “works on my machine” issues.
Setup: Avoid Quota Issues
Conda environments are large. Configure them to store data in your Depot space, not your Home directory (which has strict limits).
Create the directory: mkdir -p /depot/your_lab/user/conda_envs
Edit your config: nano ~/.condarc
Add this text:
```yaml
envs_dirs:
  - /depot/your_lab/user/conda_envs
  - ~/.conda/envs
pkgs_dirs:
  - /depot/your_lab/user/conda_pkgs
  - ~/.conda/pkgs
```

Quick Commands (Use mamba for speed)
The RCAC conda module includes mamba, which is faster than standard conda.
| Action | Command |
|---|---|
| Start | module load conda |
| Create | mamba create -n my_env python=3.10 |
| Activate | conda activate my_env |
| Install | mamba install bioconda::samtools |
| List Envs | conda env list |
| Remove | conda env remove -n my_env |
Reproducibility: The “Time Capsule”
Before you finish a project, save a snapshot of your environment. This guarantees reproducibility.
Export (Save):
```bash
# Saves only the packages you explicitly asked for (cleaner)
conda env export --from-history > environment.yml
```
```bash
# Saves EXACT versions of everything (safest for immediate reproduction)
conda list --explicit > spec.txt
```

Import (Load):
```bash
mamba env create -f environment.yml
```

📄 View Quick Reference (PDF) Includes visual workflows for creating vs. using environments and version pinning examples.
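If you saved the explicit spec file instead, conda can rebuild the environment from it directly (a standard conda option; the environment name is your choice):

```bash
# Recreate an environment from an explicit package list
conda create -n my_env --file spec.txt
```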
Goal: Use software that runs the same way everywhere, without installation headaches or file quota (inode) issues.
Why Apptainer (Singularity)?
Apptainer packages an entire software stack into a single file (.sif). It loads faster than a Conda environment with 30,000 tiny files.

Quick Commands
```bash
# Pull a container from Docker Hub
apptainer pull fastqc.sif docker://biocontainers/fastqc:v0.11.9_cv8
```
```bash
# Run a tool inside the container
apptainer exec fastqc.sif fastqc input.fq
```
```bash
# Run on GPU (Bell/Gilbreth only)
apptainer exec --nv deepvariant.sif run_deepvariant ...
```

The “Inode Saver” (Overlays)
```bash
# 1. Create a 5GB overlay file
apptainer overlay create --size 5120 my_overlay.img
```
```bash
# 2. Run your tool with the overlay attached
apptainer exec --overlay my_overlay.img maker.sif maker ...
```

📄 View Quick Reference (PDF) Includes recipes for building your own containers and advanced bind-mount usage.
Goal: Standardize your tools and run complex commands with a single word.
Setup
Create the directory: mkdir -p ~/bin
Add to your $PATH (if not already there):
```bash
# Add this to your ~/.bashrc
export PATH="$HOME/bin:$PATH"
```

Reload: source ~/.bashrc
Recommended Wrapper Scripts
Save these files in ~/bin, then run chmod +x ~/bin/* to make them executable.
sam2bam: converts SAM to BAM, sorts it, and indexes it in one go.
```bash
#!/bin/bash
# Usage: sam2bam input.sam
# Output: input.sorted.bam and input.sorted.bam.bai

if [ -z "$1" ]; then
    echo "Usage: sam2bam <input.sam>"
    exit 1
fi

BASE="${1%.sam}"
THREADS=4

echo "Converting $1 -> $BASE.sorted.bam..."

# Pipe view directly to sort to save disk I/O
samtools view -uS -@ $THREADS "$1" | \
    samtools sort -@ $THREADS -o "$BASE.sorted.bam"

# Index immediately
samtools index "$BASE.sorted.bam"
echo "Done."
```

run_bwa: runs BWA from a container without needing to load modules or remember the container path.
```bash
#!/bin/bash
# Usage: run_bwa <ref.fa> <read1.fq> <read2.fq>
# Output: read1.sam

CONTAINER="/depot/itap/biocontainers/bwa.sif"  # Update with your path
REF=$1
R1=$2
R2=$3
THREADS=12

if [ -z "$3" ]; then
    echo "Usage: run_bwa <ref.fa> <r1.fq> <r2.fq>"
    exit 1
fi

# Name the output after read 1, with its extension stripped
OUTPUT="$(basename "${R1%.*}").sam"

echo "Running BWA MEM with $THREADS threads..."
apptainer exec "$CONTAINER" bwa mem -t $THREADS "$REF" "$R1" "$R2" > "$OUTPUT"
```

bsort: quickly sorts a BAM file using available threads.
```bash
#!/bin/bash
# Usage: bsort input.bam

if [ -z "$1" ]; then
    echo "Usage: bsort <input.bam>"
    exit 1
fi

BASE="${1%.bam}"
THREADS=8

samtools sort -@ $THREADS -o "$BASE.sorted.bam" "$1"
samtools index "$BASE.sorted.bam"
```

fqcount: quickly checks read counts in gzipped files (useful for verifying transfers).
```bash
#!/bin/bash
# Usage: fqcount file.fastq.gz

# A FASTQ record is 4 lines: count lines in the decompressed stream, divide by 4
echo "$1: $(( $(zcat "$1" | wc -l) / 4 )) reads"
```

Goal: Structure your data and code so that collaborators (and “Future You”) can understand and reproduce your work.
The “Standard” Directory Tree
Adopt this structure for every new experiment to keep things consistent.
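A minimal sketch of such a tree, expressed as the command that creates it. Only 01_data and 02_scripts appear elsewhere in this guide; 03_results and 04_docs are illustrative names:

```bash
# Create a project skeleton (03_results and 04_docs are illustrative)
mkdir -p my_project/{01_data/raw,02_scripts,03_results,04_docs}
```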
Best Practices
- Keep raw data in 01_data/raw. Treat it as read-only.
- Number pipeline steps in run order (01_qc, 02_align, 03_call_variants).
- Avoid spaces in names (Airway_Study, not Airway Study).
- Use relative paths like ../01_data rather than /scratch/bell/user/project/... so the folder is portable.

Version Control (Git)
Track your 02_scripts folder, but ignore large data files.
```bash
# 1. Start tracking
cd 02_scripts
git init
```
```bash
# 2. Save changes
git add *.sh
git commit -m "Added alignment script"
```
```bash
# 3. Push to cloud (optional but recommended)
git remote add origin https://github.com/user/project.git
git push -u origin main
```

📄 View Quick Reference (PDF) Includes a step-by-step workflow for starting a new project.
Goal: Move data efficiently without corrupting files or freezing your terminal.
Choose the Right Tool
| Data Size | Recommended Tool | Why? |
|---|---|---|
| < 1 GB | scp | Quick, simple, no setup required. |
| Directories | rsync | Resumes if interrupted, syncs only changes. |
| Big Data (>100GB) | Globus | “Fire and forget.” Fast, reliable, background transfer. |
| Cloud (Box/Drive) | rclone | Command-line sync for cloud storage. |
1. rsync (The Workhorse)
Use this for moving project folders between your laptop and the cluster, or between scratch and depot.
Key flags:

- -a: Archive (preserve permissions/times)
- -v: Verbose
- -P: Partial + Progress (allows resuming)

```bash
# Push: Local -> Cluster
rsync -avP ./local_folder/ boilerid@bell.rcac.purdue.edu:/scratch/bell/boilerid/dest/
```
```bash
# Pull: Cluster -> Local
rsync -avP boilerid@bell.rcac.purdue.edu:~/remote_file.txt ./
```

2. scp (Quick Copy)
Best for grabbing a single config file or script.
```bash
scp boilerid@bell.rcac.purdue.edu:~/slurm_job.out ./
```

3. Globus (Big Data)
For terabytes of sequencing data, do not use the command line. Use Globus for reliable, high-speed transfers that run in the background. First install Globus Connect Personal on your laptop and set it up.
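4. rclone (Cloud Sync)
For Box/Drive (see the table above), rclone works well once you have configured a remote with rclone config. A minimal sketch, assuming a remote you named box (the remote name and paths are placeholders):

```bash
# Copy a cloud folder down to scratch ("box" is whatever you named your remote)
rclone copy box:sequencing_runs /scratch/bell/boilerid/sequencing_runs
```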
📄 View Quick Reference (PDF) Includes detailed setup for Globus Connect Personal.
Goal: Balance performance and data safety. Don’t let a full disk crash your pipeline.
The Hierarchy
| Location | Path | Speed | Persistence | Use Case |
|---|---|---|---|---|
| Local Scratch | $TMPDIR | ⚡ Fastest | Temporary (Job only) | High I/O (thousands of reads/writes). |
| Scratch | $RCAC_SCRATCH | 🚀 Fast | Purged (volatile) | Active analysis & intermediate files. |
| Depot | /depot/lab | 🐢 Slower | Permanent | Long-term storage & shared data. |
| Home | $HOME | 🐢 Slower | Permanent (Small Quota) | Config files, scripts, small logs. |
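A common pattern that follows from this table: stage heavy I/O on node-local $TMPDIR inside a job, then copy results back to scratch. A sketch, where the tool name and paths are placeholders:

```bash
# Inside a Slurm job script: work on fast node-local disk, then copy back
cp "$RCAC_SCRATCH"/project/input.bam "$TMPDIR"/
cd "$TMPDIR"
my_tool input.bam -o output.bam   # hypothetical tool with heavy random I/O
cp output.bam "$RCAC_SCRATCH"/project/
```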
Quota Survival Commands
Check Status: Run myquota to see your usage across all filesystems.
Find Space Hogs: If you hit your limit, use these commands to find which directories are taking up space:
```bash
# Check Home
du -h --max-depth=1 $HOME
```
```bash
# Check Scratch
du -h --max-depth=1 $RCAC_SCRATCH
```

Clean Up Routine
- Compress completed runs: tar -czf results.tar.gz results/
- Then remove the uncompressed copy with rm -rf results/ from scratch.

Goal: Build a “Second Brain.” Your shell history (history | grep cmd) is temporary; your notes are permanent.
The Strategy: Digital Lab Notebook
Use tools like Obsidian (markdown-based) or OneNote.
- Save the awk, sed, or slurm commands you spent hours perfecting.
- Record why a parameter was chosen (e.g., “used -n 10 because memory failed at -n 5”).

Example Note Structure:
```markdown
## Error: Slurm OOM on Trinity Job
**Date:** 2025-12-09
**Error:** `slurmstepd: error: Detected 1 oom-kill event(s)`
**Fix:** Increased mem-per-cpu from 4G to 8G.
**Command:** `sbatch --mem-per-cpu=8G submit.sh`
```

Goal: Stop copy-pasting images into PowerPoint. Create reports where the code is the documentation.
The Workflow
Use R Markdown (.Rmd) to write your analysis code and your explanation in the same file.

Quick Start Header
Use this YAML header to add a table of contents and code folding (hides code by default so non-coders can read the report).
```yaml
---
title: "Project QC Report"
author: "Your Name"
date: "2025-12-09"
output:
  html_document:
    toc: true
    toc_float: true
    code_folding: hide
---
```

📄 View Quick Reference (PDF) Official reference for syntax, chunk options, and formatting.
Goal: Save your work history so you can undo mistakes and collaborate without emailing zip files.
1. First-Time Setup Tell Git who you are (run once per computer):
```bash
git config --global user.name "Your Name"
git config --global user.email "your.email@purdue.edu"
```

2. Starting a Project
- Start fresh: cd my_project, then git init
- Copy an existing repo: git clone https://github.com/username/repo.git

3. The Daily Workflow (Save & Sync)
Check: See what changed.
```bash
git status
```

Stage: Select files to save (use . for everything).
```bash
git add .
```

Commit: Save the snapshot with a message.
```bash
git commit -m "Added QC plotting script"
```

Push: Send changes to GitHub.
```bash
git push origin main
```

4. The “Golden Rule” of Bioinformatics Git
Never commit raw data or large files (FASTQ, BAM); track only code and small configs.

Solution: Create a file named .gitignore in your folder.
Add these lines to it:
```
*.fastq
*.bam
*.sam
results/
data/
```

📄 View Quick Reference (PDF) Includes branching workflows and a list of Git “Don’ts”.
Goal: Stop guessing how much RAM you need. Requesting 100GB and using 2GB kills your priority and wastes cluster space.
The Audit Tool: seff
After a job finishes, run this command to see what actually happened.
```bash
seff <job_id>
```

What to look for:

- Memory Efficiency: if you requested 100GB but used 2GB, lower your request next time.
- CPU Efficiency: a low percentage means your tool never used all the cores you asked for.
The “Sinteractive” Test
Don’t write a 50-line script and hope it works. Test interactively first.
```bash
# Get a node for 1 hour
sinteractive -A <account> -t 01:00:00 --mem=8G

# Run your commands manually. If they work, put them in a script.
```

Goal: Process 100 samples in parallel using ONE script. Never write a for loop to submit jobs.
The Logic
Slurm launches multiple copies of your script. In each copy, the variable $SLURM_ARRAY_TASK_ID changes (1, 2, 3…). You use this number to pick which file to process.
The Template
Create a file named samples.txt listing your input filenames (one per line).
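One way to generate that list, assuming your reads end in .fq (the extension is stripped here because the template below appends it itself):

```bash
# One sample name per line, extension removed
ls *.fq | sed 's/\.fq$//' > samples.txt
```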
```bash
#!/bin/bash
#SBATCH --job-name=array_demo
#SBATCH --output=logs/sample_%a.out   # %a becomes the task ID
#SBATCH --array=1-24                  # Process samples 1 through 24
#SBATCH --cpus-per-task=4
# NOTE: create the logs/ directory before submitting, or no output is written

# 1. Get the filename for THIS task ID
SAMPLE=$(sed -n "${SLURM_ARRAY_TASK_ID}p" samples.txt)

echo "Processing sample: $SAMPLE"

# 2. Run the tool
apptainer exec tool.sif my_tool -i "$SAMPLE.fq" -o "$SAMPLE.bam"
```

Common Array Commands
- sbatch --array=1-100 script.sh: Submit 100 tasks.
- sbatch --array=1-100%20 script.sh: Submit 100, but only run 20 at a time (be nice to the queue).
- scancel <jobid>_<taskid>: Cancel just one specific task in the array.

📄 View Quick Reference (PDF) Includes detailed syntax for GPU jobs and parameter sweeps.
Goal: Unblock yourself quickly by choosing the right channel.
Peer Support
Best for “How do I run X?” or “Why does this plot look weird?” Join RCAC Genomics Discord
Tutorials
Step-by-step guides for common bioinformatics tasks. Visit RCAC Bioinformatics Docs
Announcements
Subscribe to the Bioinformatics Mailing List to catch upcoming workshops and events.
System Issues
Best for “The node crashed” or “I can’t log in.” Email: rcac-help@purdue.edu
Goal: Get your problem fixed in one email, not ten.
The “Good Ticket” Template
When emailing rcac-help@purdue.edu, include the following to skip the “back-and-forth” phase:
- The Job ID (e.g., 12345678) – support staff can look up exactly why it failed.
- The exact error message from your .err log file. Don’t just say “it failed.”
- The full path to your script (e.g., /scratch/bell/user/project/run.sh).

Example Email:
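A sketch of such an email; every specific below is a placeholder drawn from the examples in this guide:

```
Subject: Job 12345678 on Bell failed with OOM error

Hi RCAC team,

My alignment job (ID 12345678) on Bell keeps failing. The .err log shows:
  slurmstepd: error: Detected 1 oom-kill event(s)

Script: /scratch/bell/user/project/run.sh
I already tried raising --mem-per-cpu from 4G to 8G without success.

Thanks,
Your Name
```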