I have tested various SNP calling methods using exome re-sequencing data from 12 interior spruce samples. For alignment I tried Bowtie2 and BWA (mem), followed by Picard (MarkDuplicates) and GATK for indel realignment and base quality score recalibration. For SNP calling I used samtools mpileup with and without BAQ (base alignment quality), as well as the UnifiedGenotyper from GATK. For an interesting and informative workshop outlining the Broad best practices SNP calling pipeline, check out these YouTube videos (http://www.youtube.com/watch?v=1m0ZiEvzDKI&list=PLlMMtlgw6qNgNKNv5V9qmjAxbkHAZS1Mf). My results are in a series of blog posts and I hope you find them useful. Please let me know if you have any suggestions for SNP calling. We want to run the alignments and SNP calling only once for the entire set of samples, because it is going to take a long time!
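To make the "with and without BAQ" comparison concrete, here is a minimal Python sketch that builds the two samtools mpileup command lines. It assumes a samtools version whose mpileup accepts -f (reference FASTA) and -B (disable BAQ); the function name and the file names are hypothetical placeholders, not the ones used for the spruce data, and downstream variant-calling steps are left out.

```python
def mpileup_command(reference, bam, use_baq=True):
    """Build a samtools mpileup command line as a list of arguments.

    BAQ is on by default in mpileup; passing -B switches it off.
    """
    cmd = ["samtools", "mpileup", "-f", reference]
    if not use_baq:
        cmd.append("-B")  # -B disables BAQ computation
    cmd.append(bam)
    return cmd


# Hypothetical file names, for illustration only.
with_baq = mpileup_command("reference.fa", "sample.bam", use_baq=True)
without_baq = mpileup_command("reference.fa", "sample.bam", use_baq=False)
print(" ".join(with_baq))
print(" ".join(without_baq))
```

The only difference between the two runs is the -B flag, so any change in the calls can be attributed to BAQ.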
Recently BWA (an alignment program) suddenly started giving a strange error message indicating that a reference file ending in *.nt.ann was missing. This file type was unfamiliar to me, with good reason: it's a colourspace reference file, which shouldn't be generated when we index the FASTA-based references we're using (at least, I don't know of anyone in our lab using SOLiD data as a reference). DO NOT rebuild the reference index with the -c (colourspace) flag, as you might see suggested on the web, because we don't know what effect that might have on our alignments. DO rebuild it with the usual settings.
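Before rebuilding, it can help to see which index files are actually present next to the reference. The Python sketch below checks for the five files that a plain `bwa index` run writes for a base-space reference (.amb, .ann, .bwt, .pac, .sa) and builds the rebuild command without the -c flag. The function names and the reference path are hypothetical, and this is only a diagnostic aid, not an explanation of what triggered the error.

```python
from pathlib import Path

# Suffixes that `bwa index` writes next to a base-space FASTA reference.
BWA_INDEX_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(reference):
    """Return the names of expected BWA index files that do not exist."""
    ref = str(reference)
    return [
        Path(ref + suffix).name
        for suffix in BWA_INDEX_SUFFIXES
        if not Path(ref + suffix).exists()
    ]


def rebuild_command(reference):
    """Build the usual index command; note there is no -c (colourspace) flag."""
    return ["bwa", "index", str(reference)]


# Hypothetical reference path, for illustration only.
reference = "reference.fa"
missing = missing_index_files(reference)
if missing:
    print("Missing index files:", ", ".join(missing))
    print("Rebuild with:", " ".join(rebuild_command(reference)))
else:
    print("All base-space index files are present.")
```

To run the rebuild from Python, the list from `rebuild_command` can be passed to `subprocess.run(..., check=True)`.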