Turning STACKS output into IMa2 input files

This script extract sequence haplotypes from the “alleles.tsv” files generated by STACKS and does some light filtering (you may want to add more). It’s very similar to the one I used for our 2013 Molecular Ecology paper, and still has some Great Sand Dunes-specific parameter names, but should work ok for other data sets. Oh, and I was using the “pstacks” reference-guided workflow in a slightly older version STACKS, in case that matters.

extract_haplotype_sequences_v4_annotated.r

example_alleles.tsv

Please let me know if you use this script and whether it needs tweaking.

STACKS installation (Rose)

Installing stacks on Ubuntu Natty Narwhal or Oneiric Ocelot

STACKS is a piece of software produced by Julian Catchen in the Cresko lab. It’s designed to identify loci and alleles from RAD (or GBS) reads either de novo or after alignment to a reference. It consists of several modules that can be run separately, but to completely install it as a pipeline, it relies on a web server, unfortunately. Many of the required instructions are given in the README file, but because nobody in our lab is an expert on this, we had to fiddle around to get the program running on our Ubuntu machines.

Continue reading

Filtering unmapped/unaligned reads from SAM files (Rose)

This is a post about some time-saving help Chris Grassa gave me.

STACKS (post coming soon) doesn’t deal well with all of the unaligned reads in SAM files, so I tried using PICARD to remove them. However, PICARD doesn’t like the SAM output of BWA, but Chris G showed me how to use the Unix command awk to do it much more easily. This is his command for my file 1076.sam:
Continue reading