Turning STACKS output into IMa2 input files

This script extracts sequence haplotypes from the “alleles.tsv” files generated by STACKS and does some light filtering (you may want to add more). It’s very similar to the one I used for our 2013 Molecular Ecology paper. It still has some Great Sand Dunes-specific parameter names, but it should work OK for other data sets. Oh, and I was using the “pstacks” reference-guided workflow in a slightly older version of STACKS, in case that matters.

extract_haplotype_sequences_v4_annotated.r

example_alleles.tsv

Please let me know if you use this script and whether it needs tweaking.
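The R script isn't reproduced in this post. As a rough, hypothetical illustration of the parsing and "light filtering" idea, here is a Python sketch. It makes three assumptions, so check them against your own files:

1. Each per-sample alleles.tsv row has the STACKS 1.x layout: SQL ID, sample ID, locus ID, haplotype, percent, read count.
2. Alleles supported by too few reads should be dropped.
3. Loci with more than two haplotypes in one individual are likely collapsed paralogs.

It only handles the SNP haplotype strings and does not rebuild full sequences. The function names and thresholds are made up for the example.

```python
import csv
import io
from collections import defaultdict


def read_alleles(handle, min_count=3):
    """Group haplotypes by locus ID, dropping alleles supported by fewer
    than min_count reads (min_count is a hypothetical threshold).

    ASSUMED column layout (STACKS 1.x per-sample *.alleles.tsv):
    sql_id, sample_id, locus_id, haplotype, percent, read_count
    """
    loci = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        locus_id, haplotype, count = row[2], row[3], int(row[5])
        if count >= min_count:
            loci[locus_id].append(haplotype)
    return dict(loci)


def drop_likely_paralogs(loci, max_alleles=2):
    """Keep loci with at most max_alleles haplotypes; in a diploid
    individual, more than two can indicate collapsed paralogs."""
    return {locus: haps for locus, haps in loci.items() if len(haps) <= max_alleles}


# Made-up rows for one individual, in the assumed column layout.
example = io.StringIO(
    "0\t1\t10\tAG\t60.0\t12\n"
    "0\t1\t10\tTC\t40.0\t8\n"
    "0\t1\t11\tA\t41.7\t5\n"
    "0\t1\t11\tC\t33.3\t4\n"
    "0\t1\t11\tG\t25.0\t3\n"
    "0\t1\t12\tT\t80.0\t4\n"
    "0\t1\t12\tC\t20.0\t1\n"
)
print(drop_likely_paralogs(read_alleles(example)))
# -> {'10': ['AG', 'TC'], '12': ['T']}
```

In the example:

- Locus 11 is dropped because three haplotypes pass the read-count filter.
- At locus 12, the single-read C allele is discarded before the paralog check.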

Jaatha – training data sets (Rose)

I’ve generated three training data sets, which will save you around 5 days if you decide to run Jaatha, a molecular demography program. Jaatha uses the joint site frequency spectrum of two populations to model various aspects of population history (split time, population size and growth, and migration). Here’s the paper: Naduvilezhath et al. 2011.

1. Using the default model, with the following maxima: tmax=20, mmax=5, qmax=10.

2. Alternative maxima: tmax=5, mmax=20, qmax=20.

3. Alternative maxima: tmax=5, mmax=20, qmax=5.

They can’t be uploaded because they’re compressed R data structures, but let me know if you’d like to give them a whirl.
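If the "joint site frequency spectrum" is new to you, here is a minimal Python sketch of what it is. It is a table counting SNPs by how many derived-allele copies each population carries. It assumes polarised SNPs and no missing data. Jaatha is an R package, and its own input handling and summary statistics may differ, so this only illustrates the concept.

```python
def joint_sfs(derived_pop1, derived_pop2, n1, n2):
    """Build a two-population joint SFS as an (n1 + 1) x (n2 + 1) table.

    derived_pop1[i], derived_pop2[i]: derived-allele copies at SNP i in
    samples of n1 and n2 chromosomes (assumed polarised, no missing data).
    Cell [a][b] counts SNPs with a copies in pop 1 and b copies in pop 2.
    """
    if len(derived_pop1) != len(derived_pop2):
        raise ValueError("each SNP needs a count in both populations")
    sfs = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for a, b in zip(derived_pop1, derived_pop2):
        if not (0 <= a <= n1 and 0 <= b <= n2):
            raise ValueError("derived count outside the sample size")
        sfs[a][b] += 1
    return sfs


# Four made-up SNPs, two chromosomes sampled per population.
for row in joint_sfs([1, 2, 0, 1], [0, 2, 1, 1], n1=2, n2=2):
    print(row)
# [0, 1, 0]
# [1, 1, 0]
# [0, 0, 1]
```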

If BWA wants *.nt.ann file… (Rose)

Recently BWA (an alignment program) suddenly started giving a strange error message saying that a reference file ending in *.nt.ann was missing. This file type was unfamiliar to me, with good reason: it’s a colourspace reference file. It shouldn’t be generated when we index the FASTA-based references we’re using (at least, I don’t know of anyone in our lab using SOLiD data as a reference). DO NOT rebuild the reference with the -c (colourspace) flag, as you might see suggested on the web, because we don’t know what effect that might have on our alignments. DO rebuild it with the usual settings.
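If you script your pipelines, here is a small Python sketch of what "the usual settings" means: a plain bwa index with no -c. It also checks which of the ordinary nucleotide index files are present. The list of index suffixes is an assumption based on recent BWA releases (older 0.5.x releases also write .rbwt, .rpac and .rsa), so check it against your BWA version. The helper names are hypothetical.

```python
import os
import subprocess

# ASSUMPTION: files that recent `bwa index` releases write for an ordinary
# (non-colourspace) reference, named <reference> + suffix.
STANDARD_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(reference):
    """List the standard BWA index files that do not exist yet."""
    return [reference + s for s in STANDARD_SUFFIXES
            if not os.path.exists(reference + s)]


def rebuild_command(reference, large_genome=False):
    """Build a plain `bwa index` command; deliberately no -c flag."""
    cmd = ["bwa", "index"]
    if large_genome:
        cmd += ["-a", "bwtsw"]  # BWA's indexing algorithm for large genomes
    cmd.append(reference)
    return cmd


def rebuild_index(reference, large_genome=False):
    """Run the rebuild; raises CalledProcessError if bwa exits non-zero."""
    subprocess.run(rebuild_command(reference, large_genome), check=True)


print(rebuild_command("reference.fasta"))
# -> ['bwa', 'index', 'reference.fasta']
```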

Lab camera manual: Panasonic Lumix DMC-ZS7 (Rose)

A few points:

1. There is a spare battery, so please swap out and charge the one that you have just used.

2. The photos stored on the card could be deleted at any time (if a big job needs more room on the storage card), so PLEASE download them before you take the camera back to the lab to avoid losing them.

3. The GPS should be turned on only when you need it (and set to OFF when you get on a plane).

DMCZS7 Basic Operating Instructions

DMCZS7 Operating Instructions

Bioportal (Rose)

Bioportal is a free computing resource that provides several applications in our area. I’ve been running STRUCTURE on both the “low priority” and normal queues and it’s been fantastic (unlike Westgrid, who haven’t even responded to my application). For those of you who are struggling to find room on the cluster, it might be useful to you too. Much as I’d like to keep it to myself and exploit the hell out of it, here’s the address:

https://www.bioportal.uio.no/


STACKS installation (Rose)

Installing stacks on Ubuntu Natty Narwhal or Oneiric Ocelot

STACKS is a piece of software produced by Julian Catchen in the Cresko lab. It’s designed to identify loci and alleles from RAD (or GBS) reads, either de novo or after alignment to a reference. It consists of several modules that can be run separately, but installing it as a complete pipeline unfortunately requires a web server. Many of the required steps are given in the README file, but because nobody in our lab is an expert on this, we had to fiddle around to get the program running on our Ubuntu machines.


R script for plotting STRUCTURE results (Q values) (Rose)

This is an R Script that plots individual Q values and labels populations. It can be modified to take average group membership from CLUMPP output and/or to import different population names and higher level groupings from elsewhere.

N.B. I haven’t run this on very many data sets, so it will probably need to be tweaked for your results. But please leave a comment if you run into any problems.
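The R script itself is behind the link. The core layout logic can be sketched in Python:

- group individuals by population so each population's bars sit together;
- put each population label at the middle of its block of bars;
- optionally compute the mean Q per population.

This is a hypothetical sketch, not the script. It does not do CLUMPP's alignment of clusters across runs, and the population names are made up.

```python
def layout_by_population(q_rows, pops):
    """Order individuals by population and find label positions.

    q_rows: per-individual Q vectors (one membership value per cluster K).
    pops:   population label for each individual, in the same order.
    Returns (ordered_q, label_x, pop_means): label_x maps each population to
    the x midpoint of its block, assuming bars sit at x = 0, 1, 2, ...
    """
    groups = {}  # dicts keep insertion order, so populations keep first-seen order
    for q, pop in zip(q_rows, pops):
        groups.setdefault(pop, []).append(q)

    ordered_q, label_x, pop_means = [], {}, {}
    for pop, members in groups.items():
        start = len(ordered_q)
        ordered_q.extend(members)
        label_x[pop] = start + (len(members) - 1) / 2  # centre of the block
        k = len(members[0])
        pop_means[pop] = [sum(q[j] for q in members) / len(members) for j in range(k)]
    return ordered_q, label_x, pop_means


q = [[0.9, 0.1], [0.2, 0.8], [0.8, 0.2], [0.1, 0.9]]
pops = ["popA", "popB", "popA", "popB"]
ordered, labels, means = layout_by_population(q, pops)
print(labels)  # -> {'popA': 0.5, 'popB': 2.5}
```

To draw the plot, each column of the ordered Q values becomes one stacked bar series. In matplotlib, for instance, you would use bar() with the bottom argument. The label positions go to the x tick locations.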


Our favourite text editors (Rose)

I hope we can start a conversation about this because a good text editor can make a big difference to a newbie, so PLEASE REPLY!!! I wanted to proselytise about Notepad++ (Npp), but it only runs on Windows. So if you use a different OS, please make that BLEEDINGLY OBVIOUS.

Notepad++ (WINDOWS)

I’ve tried numerous text editors over the years (like Context), but Notepad++ (Npp) is easily my favourite. It only runs on Windows, but I routinely use it to export Unix-formatted files. You can set shortcut keys to change formats very easily. Npp can highlight lots of languages, including R, Perl and Unix shell scripts. You can also define your own languages for highlighting; I did that to make my Migrate parameter files easier to read.
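"Unix format" mostly comes down to line endings. Windows uses CR+LF and Unix uses LF alone. A stray CR can break shell scripts and confuse parsers on the cluster. Here is a minimal Python sketch of the conversion, for when you're away from Npp. The function names are just for illustration.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert Windows (CRLF) and old Mac (CR) line endings to Unix (LF)."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def convert_file(path):
    """Rewrite a file in place with Unix line endings; return True if changed."""
    with open(path, "rb") as fh:
        data = fh.read()
    converted = to_unix_line_endings(data)
    if converted != data:
        with open(path, "wb") as fh:
            fh.write(converted)
    return converted != data


print(to_unix_line_endings(b"x <- 1\r\ny <- 2\r\n"))
# -> b'x <- 1\ny <- 2\n'
```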

Filtering unmapped/unaligned reads from SAM files (Rose)

This is a post about some time-saving help Chris Grassa gave me.

STACKS (post coming soon) doesn’t deal well with all of the unaligned reads in SAM files, so I tried using PICARD to remove them. However, PICARD doesn’t like BWA’s SAM output. Chris G showed me how to do it much more easily with the Unix command awk. This is his command for my file 1076.sam:
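The awk command isn't reproduced here. As a rough illustration of the same idea (not Chris's command), here is a Python sketch that:

- keeps SAM header lines (starting with @);
- drops any alignment whose FLAG field (column 2) has the 0x4 "segment unmapped" bit set.

It only checks each read's own 0x4 bit, so a mapped read whose mate is unmapped is kept. The function name is hypothetical.

```python
UNMAPPED = 0x4  # SAM FLAG bit meaning "segment unmapped"


def mapped_only(lines):
    """Yield SAM header lines plus alignment lines without the 0x4 flag bit."""
    for line in lines:
        if line.startswith("@"):
            yield line  # keep header lines unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # SAM alignments have 11 mandatory fields; skip malformed lines
        if (int(fields[1]) & UNMAPPED) == 0:
            yield line


example_sam = [
    "@HD\tVN:1.0\n",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",
    "r3\t16\tchr1\t200\t60\t4M\t*\t0\t0\tACGT\tIIII\n",
]
for line in mapped_only(example_sam):
    print(line, end="")  # prints the header, r1 and r3; r2 is unmapped
```

For a real file, open the input and output files and pass the input file object straight to mapped_only(). Then write the result with the output file's writelines().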