GBRS is a suite of tools for reconstructing genomes using RNA-Seq data from multiparent populations and for quantifying allele-specific expression. Although we have tested it only with mouse models, GBRS should work for any multiparent population. If you are interested in using it for another multiparent population model, please contact Kwangbom “KB” Choi at The Jackson Laboratory. For Diversity Outbred (DO) mice, Collaborative Cross (CC) mice, or F1 hybrids of CC strains (CCRIX), the required data files are available at ftp://churchill-lab.jax.org/software/GBRS/.

Installation

Note: Although GBRS is available on PyPI for installation with pip or easy_install, we highly recommend using the Anaconda distribution of Python so that all the dependencies install without issues. GBRS is also available on Anaconda Cloud, so add the following channels if you have not already:

  $ conda config --add channels r
  $ conda config --add channels bioconda

To avoid conflicts among dependencies, we also highly recommend using a conda virtual environment:

  $ conda create -n gbrs python=2
  $ source activate gbrs

Once the GBRS virtual environment is created and activated, your shell prompt will show ‘(gbrs)’ at the beginning to indicate which virtual environment you are currently in. Now type the following to install GBRS:

(gbrs) $ conda install -c kbchoi gbrs

That's all! Now gbrs is available. Once you are done using GBRS, you can leave the GBRS virtual environment at any time:

(gbrs) $ source deactivate

Usage

Note: We will assume you installed GBRS in its own conda virtual environment. First, “activate” the virtual environment:

$ source activate gbrs

The first step is to align the RNA-Seq reads against the pooled transcriptome of all founder strains:

$ bowtie -q -a --best --strata --sam -v 3 ${GBRS_DATA}/bowtie.transcriptome ${FASTQ} \
    | samtools view -bS - > ${BAM_FILE}
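The command above and the steps that follow refer to shell variables such as ${FASTQ}, ${BAM_FILE}, ${EMASE_FILE}, and ${COMPRESSED_EMASE_FILE}. One way to keep them consistent is to derive them all from the FASTQ path. The sketch below is only an illustration: the helper name `sample_prefix`, the example path, and the file suffixes are hypothetical choices, not something GBRS requires.

```shell
# Derive per-sample output names from a FASTQ path.
# The suffixes below are a hypothetical naming convention, not a GBRS requirement.
sample_prefix() {
    base=$(basename "$1")            # drop the directory part
    printf '%s\n' "${base%.fastq}"   # drop a trailing .fastq, if present
}

FASTQ="reads/sample1.fastq"          # hypothetical input path
PREFIX=$(sample_prefix "$FASTQ")
BAM_FILE="${PREFIX}.bam"
EMASE_FILE="${PREFIX}.emase.h5"
COMPRESSED_EMASE_FILE="${PREFIX}.compressed.emase.h5"

echo "$BAM_FILE $EMASE_FILE $COMPRESSED_EMASE_FILE"
```

${GBRS_DATA} is not set here; it should point to the directory holding the downloaded data files.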

Before quantifying multiway allele specificity, the BAM file should be converted into EMASE format:

$ gbrs bam2emase -i ${BAM_FILE} \
                 -m ${GBRS_DATA}/ref.transcripts.info \
                 -s ${COMMA_SEPARATED_LIST_OF_HAPLOTYPE_CODES} \
                 -o ${EMASE_FILE}

We can compress the EMASE-format alignment incidence matrix:

$ gbrs compress -i ${EMASE_FILE} \
                -o ${COMPRESSED_EMASE_FILE}

If storage space is tight, you may want to delete ${BAM_FILE} or ${EMASE_FILE} at this point, since ${COMPRESSED_EMASE_FILE} has all the information the following steps need. If you want to merge EMASE-format files, for example to pool technical replicates, run ‘compress’ once more, listing the files you want to merge separated by commas:

$ gbrs compress -i ${COMPRESSED_EMASE_FILE1},${COMPRESSED_EMASE_FILE2},... \
                -o ${MERGED_COMPRESSED_EMASE_FILE}
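When there are many replicates, the comma-separated argument can be built in the shell rather than typed by hand. This is a minimal sketch: `join_by_comma` is a hypothetical helper and the replicate file names are made up for illustration.

```shell
# Join all arguments with commas: join_by_comma a b c prints a,b,c
join_by_comma() {
    out=""
    for f in "$@"; do
        out="${out:+$out,}$f"   # prepend a comma only when out is non-empty
    done
    printf '%s\n' "$out"
}

# Hypothetical replicate files produced by earlier 'gbrs compress' runs
MERGE_INPUT=$(join_by_comma rep1.compressed.emase.h5 rep2.compressed.emase.h5)
echo "$MERGE_INPUT"

# The result is then passed to the merge command shown above:
# gbrs compress -i "$MERGE_INPUT" -o "${MERGED_COMPRESSED_EMASE_FILE}"
```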

Then use ${MERGED_COMPRESSED_EMASE_FILE} in place of ${COMPRESSED_EMASE_FILE} in the following steps. Now we are ready to quantify multiway allele specificity:

$ gbrs quantify -i ${COMPRESSED_EMASE_FILE} \
                -g ${GBRS_DATA}/ref.gene2transcripts.tsv \
                -L ${GBRS_DATA}/gbrs.hybridized.targets.info \
                -M 4 --report-alignment-counts

Then we reconstruct the genome based on gene-level TPM quantities (assuming the sample is a female from the 20th generation of the Diversity Outbred mouse population):

$ gbrs reconstruct -e gbrs.quantified.multiway.genes.tpm \
                   -t ${GBRS_DATA}/tranprob.DO.G20.F.npz \
                   -x ${GBRS_DATA}/avecs.npz \
                   -g ${GBRS_DATA}/ref.gene_pos.ordered.npz
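The transition-probability file passed with -t depends on the sample's generation and sex. The sketch below builds the file name from those two values. It assumes the other files follow the pattern tranprob.DO.G<generation>.<sex>.npz, which is inferred from the single example above; check the files actually present in ${GBRS_DATA} before relying on it. The function name `tranprob_file` is hypothetical.

```shell
# Build the transition-probability file name for a DO sample.
# ASSUMPTION: files are named tranprob.DO.G<generation>.<sex>.npz, as
# suggested by tranprob.DO.G20.F.npz in the example above.
tranprob_file() {
    gen=$1
    sex=$2
    case "$sex" in
        F|M) ;;                                        # accepted values
        *) echo "sex must be F or M" >&2; return 1 ;;  # reject anything else
    esac
    printf 'tranprob.DO.G%s.%s.npz\n' "$gen" "$sex"
}

TRANPROB=$(tranprob_file 20 F)
echo "$TRANPROB"

# Used in place of the hard-coded name:
# gbrs reconstruct ... -t "${GBRS_DATA}/${TRANPROB}" ...
```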

We can now quantify allele-specific expression on the diploid transcriptome:

$ gbrs quantify -i ${COMPRESSED_EMASE_FILE} \
                -G gbrs.reconstructed.genotypes.tsv \
                -g ${GBRS_DATA}/ref.gene2transcripts.tsv \
                -L ${GBRS_DATA}/gbrs.hybridized.targets.info \
                -M 4 --report-alignment-counts

Genotype probabilities are on a grid of genes. For eQTL mapping or for plotting the genome reconstruction, we may want to interpolate the probabilities onto a reasonably spaced grid of the reference genome:

$ gbrs interpolate -i gbrs.reconstructed.genoprobs.npz \
                   -g ${GBRS_DATA}/ref.genome_grid.69k.txt \
                   -p ${GBRS_DATA}/ref.gene_pos.ordered.npz \
                   -o gbrs.interpolated.genoprobs.npz

To plot a reconstructed genome:

$ gbrs plot -i gbrs.interpolated.genoprobs.npz \
            -o gbrs.plotted.genome.pdf \
            -n ${SAMPLE_ID}
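The steps above read several files from ${GBRS_DATA}. A quick check before starting a long run can catch an incomplete download early. The sketch below tests only for the file names that appear in the commands above; the function name `check_gbrs_data` is hypothetical, and the bowtie index files and the transition-probability file are not checked because their exact names depend on your setup.

```shell
# Report which data files named in the steps above are missing from a directory.
# Returns non-zero if any of them is missing.
check_gbrs_data() {
    dir=$1
    status=0
    for f in ref.transcripts.info ref.gene2transcripts.tsv \
             gbrs.hybridized.targets.info avecs.npz \
             ref.gene_pos.ordered.npz ref.genome_grid.69k.txt; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}

# Falls back to the current directory when GBRS_DATA is unset
check_gbrs_data "${GBRS_DATA:-.}" || echo "Some GBRS data files were not found" >&2
```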

Announcements