g2gtools is a new suite of tools for creating personal diploid genomes. It makes it easy to
extract regions from a personal genome so that we can build individualized alignment indexes for
next-generation sequencing reads. g2gtools can also lift alignments over from personal genome
coordinates back to reference coordinates, so that alignments can be compared across samples in a population.
Unlike other liftover tools, g2gtools does not throw away alignments that land on indel
regions.
We highly recommend using the Anaconda distribution of Python to install all the dependencies without issues, although g2gtools is also available on PyPI for ‘pip install’ or ‘easy_install’. The most recent version is also available on Anaconda Cloud, so add the following channels if you have not already done so:
$ conda config --add channels r
$ conda config --add channels bioconda
To avoid conflicts among dependencies, we highly recommend using a conda virtual environment:
$ conda create -n g2gtools jupyter ipykernel
$ source activate g2gtools
Once the g2gtools virtual environment is created and activated, your shell prompt will show ‘(g2gtools)’ at the beginning to indicate which virtual environment you are currently in. Now type the following to install g2gtools:
(g2gtools) $ conda install -c kbchoi g2gtools
That’s all! You can leave the g2gtools virtual environment at any time by deactivating it:
(g2gtools) $ source deactivate
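To confirm that the installation is visible from an activated environment, a small check like the following may help. This is only a sketch: `check_tool` is a hypothetical helper written for this example, not part of g2gtools, and it only tests whether a command of the given name is on the current PATH.

```shell
# check_tool is a hypothetical helper (not part of g2gtools).
# It reports whether a command is visible on the current PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

# Run this inside the activated environment.
check_tool g2gtools || echo "Activate the g2gtools environment and try again."
```

If the command is reported as missing, the environment is most likely not activated in the current shell.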
We show a typical workflow for creating a diploid genome, exome, and transcriptome using a human example: NA19670 from the 1000 Genomes Project (phase 3). You can download our bash script here.
$ g2gtools vcf2vci -o NA19670.vci -s NA19670 --diploid -p 8 \
-i ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr2.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr3.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr4.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr5.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr6.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr7.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr8.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr9.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr10.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr11.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr12.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr13.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr14.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr15.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr16.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr17.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr18.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr19.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr20.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr21.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chr22.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz \
-i ALL.chrX.phase3_shapeit2_mvncall_integrated_v1b.20130502.genotypes.vcf.gz \
-i ALL.chrY.phase3_integrated_v1b.20130502.genotypes.vcf.gz \
-i ALL.chrMT.phase3_callmom.20130502.genotypes.vcf.gz
$ g2gtools patch -i hs37d5.fa -c NA19670.vci.gz -o NA19670.patched.fa -p 8
$ g2gtools transform -i NA19670.patched.fa -c NA19670.vci.gz -o NA19670.fa -p 8
$ g2gtools convert -i Homo_sapiens.GRCh37.75.gtf -c NA19670.vci.gz -o NA19670.gtf
$ g2gtools gtf2db -i NA19670.gtf -o NA19670.db
$ g2gtools extract -i NA19670.fa -db NA19670.db --exons > NA19670.exons.fa
$ g2gtools extract -i NA19670.fa -db NA19670.db --transcripts > NA19670.transcripts.fa
$ g2gtools extract -i NA19670.fa -db NA19670.db --genes > NA19670.genes.fa
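The long list of `-i` options in the vcf2vci step above can be assembled with a loop rather than typed by hand. The sketch below assumes the VCF file names are exactly those shown in the workflow and that the files sit in the current directory. It reuses the shell's positional parameters to hold the argument list, so it is best run as its own script. It prints the command for inspection instead of running it.

```shell
# Build the repeated "-i <file>" arguments for g2gtools vcf2vci.
# File names are copied from the workflow above; adjust them if yours differ.
suffix="20130502.genotypes.vcf.gz"

# Start from an empty argument list (this overwrites the script's own arguments).
set --

# Autosomes 1-22 share one naming pattern.
chr=1
while [ "$chr" -le 22 ]; do
    set -- "$@" -i "ALL.chr${chr}.phase3_shapeit2_mvncall_integrated_v5a.${suffix}"
    chr=$((chr + 1))
done

# chrX, chrY and chrMT each use a different pattern, so they are listed explicitly.
set -- "$@" -i "ALL.chrX.phase3_shapeit2_mvncall_integrated_v1b.${suffix}"
set -- "$@" -i "ALL.chrY.phase3_integrated_v1b.${suffix}"
set -- "$@" -i "ALL.chrMT.phase3_callmom.${suffix}"

# Print the assembled command; remove 'echo' to actually run it.
echo g2gtools vcf2vci -o NA19670.vci -s NA19670 --diploid -p 8 "$@"
```

Keeping the sex chromosomes and the mitochondrial file outside the loop avoids guessing at a common pattern that those three file names do not follow.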