Approximate Bayesian Computations

In many cases it may be more straightforward (and informative) to test specific models using our data. An interesting approach for inferring population parameters and/or testing models is approximate Bayesian computation (ABC). Several tools are available, such as msBayes, DIYABC, PopABC, the abctools R package, and ABCTools.

Although ABC is a powerful and useful approach, it has some caveats: results depend on, e.g., the choice of summary statistics, the number and complexity of the models tested, and the amount of data. With realistic expectations and simple models, ABC can add interesting insights to popgen studies.
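To make the idea concrete, here is a minimal sketch of the simplest ABC variant, rejection sampling. It is a toy example and not how any of the tools above are implemented: the "model" is just a normal distribution with unknown mean, the summary statistic is the sample mean, and the prior range, tolerance and function names are all assumptions chosen for illustration.

```python
import random
import statistics


def simulate(theta, n, rng):
    """Toy simulator (assumption): n draws from a normal with mean theta, sd 1."""
    return [rng.gauss(theta, 1.0) for _ in range(n)]


def summary(data):
    """Summary statistic used to compare simulated and observed data."""
    return statistics.fmean(data)


def abc_rejection(observed, n_draws=20000, tolerance=0.05, seed=1):
    """Keep prior draws whose simulated summary lands close to the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)  # draw a parameter from a flat prior
        s_sim = summary(simulate(theta, len(observed), rng))
        if abs(s_sim - s_obs) <= tolerance:  # accept if summaries are close
            accepted.append(theta)
    return accepted


# "Observed" data generated with a known mean of 2.0 so the result can be checked.
observed = simulate(2.0, 50, random.Random(42))
posterior = abc_rejection(observed)
print("accepted draws:", len(posterior))
print("approximate posterior mean:", statistics.fmean(posterior))
```

The accepted values approximate the posterior distribution of the parameter. The caveats above show up directly: a poorly chosen summary statistic or too loose a tolerance gives a posterior that says little about the parameter, and a tight tolerance with a complex model means very few draws are accepted.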

Estimating Insert Sizes

We recently had some trouble estimating insert sizes for our mate-pair (aka jumping, larger insert size) libraries. All the libraries sequenced by Biodiversity and the Genome Sciences Centre (GSC) were shockingly bad, but the libraries sequenced by INRA were very good. For example, according to the pipeline, the GSC 10kbp insert size library had an average insert size of 236bp, whereas the INRA 20kb library had an average insert size of 20630bp.

See the histogram for the 10kbp library:


GBS, coverage and heterozygosity

I’m running some tests on my GBS data to look for population expansion. From looking at GBS data from an F1 genetic mapping population, I know that heterozygotes can be under-called in GBS data due to variation in amplification and digestion. In my data, observed heterozygosity is also almost always below expected heterozygosity. Conversely, heterozygotes can be over-called when duplicated loci are aligned together. The tests I’m going to use rely explicitly on observed heterozygosity, so this is worrying.
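As a rough illustration of the comparison described here, the sketch below computes observed heterozygosity, Hardy-Weinberg expected heterozygosity (2pq) and the inbreeding coefficient F = 1 - Ho/He for a single biallelic locus. The genotype encoding (0, 1 or 2 copies of the alternate allele, None for missing) and the function name are assumptions made for this example, and the genotypes are made up.

```python
def heterozygosity(genotypes):
    """Return (Ho, He, F) for one biallelic locus.

    genotypes: list of alternate-allele counts (0, 1 or 2); None marks a
    missing call and is skipped.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this locus")
    n = len(called)
    ho = sum(1 for g in called if g == 1) / n  # observed heterozygosity
    p = sum(called) / (2 * n)                  # alternate allele frequency
    he = 2 * p * (1 - p)                       # expected under Hardy-Weinberg
    f = 1 - ho / he if he > 0 else None        # undefined for fixed loci
    return ho, he, f


# Made-up locus with a heterozygote deficit: 4 ref homozygotes, 2 hets,
# 4 alt homozygotes and one missing call.
example = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, None]
ho, he, f = heterozygosity(example)
print(f"Ho={ho:.2f} He={he:.2f} F={f:.2f}")
```

A positive F means fewer heterozygotes than expected (consistent with under-calling, but also with real inbreeding or population structure), while a strongly negative F at a locus is one possible sign of over-calling from collapsed duplicated loci.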


STACKS installation (Rose)

Installing STACKS on Ubuntu Natty Narwhal or Oneiric Ocelot

STACKS is a piece of software produced by Julian Catchen in the Cresko lab. It’s designed to identify loci and alleles from RAD (or GBS) reads either de novo or after alignment to a reference. It consists of several modules that can be run separately, but unfortunately the complete pipeline installation relies on a web server. Many of the required instructions are given in the README file, but because nobody in our lab is an expert on this, we had to fiddle around to get the program running on our Ubuntu machines.


How to post – code (Dan E.)

We have a problem sharing code via RLR.

The Problem

Unfortunately, WordPress only allows certain file types to be uploaded to our media library, and none of the useful coding file types are on that list. The list is simply a list of acceptable file extensions. This means that if you write a useful R (or Perl or Python) script and save it with a standard file extension, like .R or .pl, WordPress will not allow you to upload it to the RLR media library so that you can share it via a post.

The Solution

The list of acceptable file extensions can be hacked and I might give it a try but, until I do, you will have to do one of these things:

  • Change the file extension. If you save your script as a .txt file it will upload fine. You should make it clear in your post what kind of script it is and then people who download it can change the .txt extension to whatever they want.
  • Put the code in your post. If your script is not too long you can simply copy and paste the code from your text editor into the post editor. The formatting of the code will remain true to the original so users can simply copy and paste it back out into a text editor or R-Studio or wherever. See Rose’s post about plotting STRUCTURE results for an example of this.
  • Compress your script file. If your script is big you can try zipping it and then uploading the compressed file. Users can then just download and unzip it. [As of November 2011 this hasn’t been tested.]
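If you would rather not rename or zip files by hand, the first and third options can be scripted. The sketch below is only a convenience helper: the function name and the example file name plot_structure.R are hypothetical, and it makes copies on your own machine without checking whether WordPress will accept the result.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def make_uploadable(script_path):
    """Create a .txt copy and a .zip archive next to the given script."""
    script = Path(script_path)
    # Option 1: same content, with .txt appended (e.g. plot_structure.R.txt).
    txt_copy = script.with_name(script.name + ".txt")
    shutil.copyfile(script, txt_copy)
    # Option 3: a zip archive containing the script under its original name.
    zip_copy = script.with_name(script.name + ".zip")
    with zipfile.ZipFile(zip_copy, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.write(script, arcname=script.name)
    return txt_copy, zip_copy


# Demo in a temporary directory with a hypothetical one-line R script.
with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "plot_structure.R"
    demo.write_text("print('hello')\n")
    txt_copy, zip_copy = make_uploadable(demo)
    print(txt_copy.name, zip_copy.name)
```

Appending .txt instead of replacing the extension keeps the original file type visible in the name, so people who download the file know what to rename it to.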

Dan E.

R script for plotting STRUCTURE results (Q values) (Rose)

This is an R script that plots individual Q values and labels populations. It can be modified to take average group membership from CLUMPP output and/or to import different population names and higher-level groupings from elsewhere.

N.B. I haven’t run this on very many data sets, so it will probably need to be tweaked for your results. But please leave a comment if you run into any problems.
