GATK on sciborg | Rieseberg Lab Resources

A few notes on GATK.

1. GATK requires a newer version of Java than the one currently installed on the cluster.

> java -jar ./bin/GenomeAnalysisTK-2.8-1-g932cd3a/GenomeAnalysisTK.jar --help
Exception in thread "main" java.lang.UnsupportedClassVersionError: org/broadinstitute/sting/gatk/CommandLineGATK : Unsupported major.minor version 51.0
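The "major.minor version 51.0" in the error is the class-file version GATK was compiled for. As a rough sketch, the snippet below decodes that number into a Java release and prints the Java version currently on the PATH. The helper name `class_major_to_java` is made up for illustration. The "major minus 44" rule holds for Java 5 and later.

```shell
# Hypothetical helper: map a class-file major version to the Java release
# that introduced it. Valid for major versions 49 and up (Java 5 onwards).
class_major_to_java() {
    echo $(( $1 - 44 ))
}

class_major_to_java 51   # prints 7, so this GATK build needs Java 7 or newer

# Show which Java the shell currently picks up, if any.
if command -v java >/dev/null 2>&1; then
    java -version 2>&1 | head -n 1
fi
```

If the reported version is older than the decoded one, you will see the UnsupportedClassVersionError above.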

Get the 64-bit Linux version of Java 7 here:

http://java.com/en/download/

Move it to sciborg:

> scp ../Downloads/jdk-7u45-linux-x64.tar.gz user@zoology.ubc.ca:cluster/bin/
# ssh to the cluster and extract it:
> tar -zxvf jdk-7u45-linux-x64.tar.gz
# Add it to your PATH, or call it directly like so:
> ./bin/jdk1.7.0_45/bin/java -jar ./bin/GenomeAnalysisTK-2.8-1-g932cd3a/GenomeAnalysisTK.jar --help
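If you would rather put the new Java on your PATH than type the full path each time, something along these lines should work. It assumes the JDK was unpacked under `$HOME/bin`, so adjust the hypothetical `JAVA7_HOME` variable to wherever the `jdk1.7.0_45` directory actually ended up.

```shell
# Assumption: the JDK was unpacked under $HOME/bin. Change this if yours is elsewhere.
JAVA7_HOME="$HOME/bin/jdk1.7.0_45"

# Put the new JDK first so a plain `java` resolves to it before the system one.
export PATH="$JAVA7_HOME/bin:$PATH"

# Report which java is found now (or note that none is).
command -v java || echo "java not found on PATH"
```

To make this permanent, the two assignment lines can be appended to your shell startup file (for example `~/.bashrc`, assuming bash is your login shell on the cluster).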

2. You must have a reference with a relatively small number of contigs/scaffolds.

Kay identified and addressed this problem and wrote this script: CombineScaffoldForGATK. I’ve modified it very slightly.

> perl CombineScaffoldForGATK.pl GenomeWithManyScaffolds.fa tmp.fa

WARNING: the script can print empty lines. It could be modified to avoid this, but I don’t want to break it. Instead, you can remove them with a sed one-liner:

> sed '/^$/d' tmp.fa > GenomeWith1000Scaffolds.fa
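It is worth confirming that the cleaned file has no empty lines left and that the number of sequences is what you expect. A minimal sketch follows. The function names `count_blank_lines` and `count_sequences` are made up for illustration.

```shell
# Hypothetical helpers for sanity-checking a FASTA file.
# grep -c exits non-zero when the count is 0, hence the `|| true`.
count_blank_lines() { grep -c '^$' "$1" || true; }
count_sequences()   { grep -c '^>' "$1" || true; }

# Run the checks on the cleaned genome if it is present in this directory.
if [ -f GenomeWith1000Scaffolds.fa ]; then
    echo "blank lines: $(count_blank_lines GenomeWith1000Scaffolds.fa)"
    echo "sequences:   $(count_sequences GenomeWith1000Scaffolds.fa)"
fi
```

After the sed step the blank-line count should be 0, and the sequence count should match between `tmp.fa` and the cleaned file.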

3. You must also prepare the reference FASTA file.

You have to index it for BWA with bwa index, but GATK needs more. See: http://gatkforums.broadinstitute.org/discussion/1601/how-can-i-prepare-a-fasta-file-to-use-as-reference

In short (in the same dir as your genome):

> java -jar ../bin/picard-tools-1.105/CreateSequenceDictionary.jar R=Nov22k22.split.pheudoScf.fa O=Nov22k22.split.pheudoScf.dict
> samtools faidx Nov22k22.split.pheudoScf.fa
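GATK looks for the dictionary and the index next to the FASTA file: the `.dict` file replaces the FASTA extension, and the `.fai` file is appended to the full FASTA name. As a rough sketch, the check below confirms both are in place before you start a run. `check_reference` is a made-up helper name, not part of GATK, Picard, or samtools.

```shell
# Hypothetical helper: verify that a reference FASTA has the companion files
# GATK expects (ref.fa -> ref.dict and ref.fa.fai). Returns non-zero if any
# of the three files is missing or empty.
check_reference() {
    ref_fasta="$1"
    ref_dict="${ref_fasta%.*}.dict"   # strip the last extension, add .dict
    ref_fai="$ref_fasta.fai"          # samtools faidx appends .fai
    ref_status=0
    for f in "$ref_fasta" "$ref_dict" "$ref_fai"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            ref_status=1
        fi
    done
    return $ref_status
}

check_reference Nov22k22.split.pheudoScf.fa || echo "reference is not ready for GATK yet"
```

This checks only the files GATK needs. The BWA index files are separate and are not covered here.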