Gene prediction using BRAKER3

BRAKER3 is a pipeline that combines GeneMark-ES/ET/EP/ETP and AUGUSTUS to predict genes in eukaryotic genomes. It is particularly useful for annotating newly sequenced genomes, and it accepts several kinds of input evidence to improve prediction accuracy. In this example, we will predict genes in a maize genome with BRAKER3 under the following scenarios:

Input TypeCase 1Case 2Case 3Case 4Case 5Case 6Case 7Case 8
Genome✔️✔️✔️✔️✔️✔️✔️✔️
RNA-Seq✔️^*✔️✔️✔️
Iso-Seq✔️✔️
Conserved proteins✔️✔️✔️✔️
Pretrained species model✔️

* Minimal RNA-Seq data (one library, one tissue)

We will use Apptainer to build a Singularity container for BRAKER3. The container holds all the dependencies and tools required to run BRAKER3. To build it, run the following command:

apptainer build --fakeroot braker3.sif docker://teambraker/braker3:latest

This will create a Singularity container named braker3.sif with BRAKER3 installed.

Before running BRAKER3, we need to set up:

  1. GeneMark-ES/ET/EP/ETP license key
  2. A writable AUGUSTUS config directory (AUGUSTUS_CONFIG_PATH)

The license key for GeneMark-ES/ET/EP/ETP can be obtained from the GeneMark website. Once downloaded, decompress it and copy it to your home directory as ~/.gm_key:

gunzip gm_key_64.gz
cp gm_key_64 ~/.gm_key
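
Before starting a long run, it can help to confirm that the key file is actually in place. The following is a minimal sketch; the function name `check_gm_key` is hypothetical and is not part of GeneMark or BRAKER3.

```shell
# Hypothetical helper: verify that a GeneMark key file exists and is not empty.
# The key path is an optional argument and defaults to ~/.gm_key.
check_gm_key() {
    key="${1:-${HOME}/.gm_key}"
    if [ -s "${key}" ]; then
        echo "GeneMark key found: ${key}"
        return 0
    else
        echo "GeneMark key missing or empty: ${key}" >&2
        return 1
    fi
}

# Report the status without aborting the shell when the key is absent.
check_gm_key || echo "Install the key before running BRAKER3." >&2
```

The check only tests that the file exists and has content; it does not validate the license itself.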

BRAKER3 needs to write to the AUGUSTUS config directory, but the Singularity container is read-only. We therefore copy the config directory out of the container to the scratch directory and point AUGUSTUS_CONFIG_PATH at the copy. To copy the config directory, run the following command:

apptainer exec --bind ${RCAC_SCRATCH} braker3.sif cp -r /opt/Augustus/config ${RCAC_SCRATCH}/braker/augustus_config
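
Because BRAKER3 writes to this directory, a quick check that the copy succeeded and is writable can save a failed job. This is a minimal sketch, assuming the copied directory contains the usual AUGUSTUS `species/` subdirectory; the function name `check_augustus_config` is hypothetical.

```shell
# Hypothetical helper: confirm that a copied AUGUSTUS config directory is usable.
# New species parameters are written under species/, so it must exist and be writable.
check_augustus_config() {
    cfg="$1"
    if [ ! -d "${cfg}/species" ]; then
        echo "No species/ directory under ${cfg}" >&2
        return 1
    fi
    if [ ! -w "${cfg}/species" ]; then
        echo "${cfg}/species is not writable" >&2
        return 1
    fi
    echo "AUGUSTUS config looks usable: ${cfg}"
}

# Example call (path taken from the copy command above):
# check_augustus_config "${RCAC_SCRATCH}/braker/augustus_config"
```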

Set the following variables (GENEMARK_PATH is a path inside the container):

BRAKER_SIF="${RCAC_SCRATCH}/braker/braker3.sif"
AUGUSTUS_CONFIG_PATH="${RCAC_SCRATCH}/braker/augustus_config"
GENEMARK_PATH="/opt/ETP/bin/gmes"
genome="${RCAC_SCRATCH}/braker/Zm-B73-REFERENCE-NAM-5.0_softmasked.fa"
workdir=${PWD}/$(basename ${genome%.*})_braker
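
The `workdir` assignment relies on two steps: `${genome%.*}` strips the final extension, and `basename` drops the leading directories. The sketch below walks through them with a hypothetical path (`/scratch/demo/...` and the `example_*` variable names are for illustration only).

```shell
# Hypothetical genome path, used only to illustrate the expansion.
example_genome="/scratch/demo/Zm-B73-REFERENCE-NAM-5.0_softmasked.fa"

# ${var%.*} removes the shortest trailing ".something" (here ".fa"),
# so the "." inside "5.0" is left untouched.
example_no_ext="${example_genome%.*}"

# basename drops the leading directories.
example_prefix="$(basename "${example_no_ext}")"

# The working directory is created under the current directory.
example_workdir="${PWD}/${example_prefix}_braker"

echo "${example_prefix}"    # prints Zm-B73-REFERENCE-NAM-5.0_softmasked
echo "${example_workdir}"
```

Note that only one extension is removed, so a compressed file such as `genome.fa.gz` would keep the `.fa` part in the directory name.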

With genome only (no external evidence)

Input                       Type
Genome                      B73.v5 (softmasked)
RNA-Seq data                None
Protein sequences           None
Long-read data              None
Pretrained species model    None

mkdir -p ${workdir}
apptainer exec --bind ${RCAC_SCRATCH} ${BRAKER_SIF} braker.pl \
--AUGUSTUS_CONFIG_PATH=${AUGUSTUS_CONFIG_PATH} \
--GENEMARK_PATH=${GENEMARK_PATH} \
--esmode \
--genome=${genome} \
--species=Zm_$(date +"%Y%m%d").c1 \
--workingdir=${workdir} \
--gff3 \
--threads ${SLURM_CPUS_ON_NODE}
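
The command above reads the thread count from `SLURM_CPUS_ON_NODE`, which Slurm sets inside a job. Outside a Slurm job the variable is empty, and `--threads` would be left without a value. A minimal sketch of a fallback, assuming `nproc` is available on the system; the function name `pick_threads` and the variable `threads` are hypothetical:

```shell
# Use the Slurm-provided CPU count when present, otherwise ask nproc,
# and fall back to a single thread if nproc is not available.
pick_threads() {
    echo "${SLURM_CPUS_ON_NODE:-$(nproc 2>/dev/null || echo 1)}"
}

threads="$(pick_threads)"
echo "Using ${threads} thread(s)"
```

The BRAKER3 call can then use `--threads ${threads}` in place of `--threads ${SLURM_CPUS_ON_NODE}`.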

[Figures: BUSCO results; assigned features; Pfam domains; GFF3 stats; BRAKER consistency; braker_cds-gc]
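
For a rough look at the GFF3 output outside of the plots, the features can be tallied by type. This is a minimal sketch, not the method used to produce the figures; the function name `gff3_feature_counts` is hypothetical, and the file name `braker.gff3` in the example call assumes BRAKER3 wrote its GFF3 output under that name in the working directory.

```shell
# Hypothetical helper: count GFF3 features by type (column 3).
# Comment lines and lines with fewer than 9 tab-separated fields are skipped.
gff3_feature_counts() {
    awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ } END { for (t in n) print t "\t" n[t] }' "$1" | sort
}

# Example call (assumed output location):
# gff3_feature_counts "${workdir}/braker.gff3"
```

Each output line holds a feature type and its count, separated by a tab.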