GBS missing data

Posted on November 6, 2012 by Greg B.

I’ve done a small analysis on my GBS data and posted it on my blog: http://www.proseedwithscience.com/?p=816

Edit: This is mostly just a quick look at the amount of missing data in my GBS data set and some potential explanations of where it might come from.

Blast2GO

Posted on October 25, 2012 by Kay

This post describes how to run Blast2GO on a server using b2gpipe and a local database. This makes Blast2GO a viable option for annotating large fasta files; otherwise it is much too slow. The database is currently set up on an AdapTree server. It took me a while to troubleshoot, and you could run into different problems, but you will hopefully avoid some of the issues I ran into. The b2g Google group is good for troubleshooting. You can find many of these instructions at http://www.blast2go.com/b2glaunch/resources/35-localb2gdb

Continue reading →

Allowed File Types At RLR – you can now upload scripts with their usual file extensions

Posted on October 16, 2012 by Dan E.

Hello All,

Many of us have been annoyed by the restricted file types that WordPress allows to be uploaded to RLR. It’s especially annoying because all WordPress is doing when it permits or denies an upload is checking the file extension against a list of allowable extensions. {Even the most malicious code could be uploaded to our blog as long as it had a .txt file extension. Whether that code could then be made to execute, however, is far beyond my web-programming grasp – WordPress would treat it as plain text so it may be impossible.}

We’ve been sharing code via RLR by sidestepping the file extension rules and uploading scripts as .txt text files or by compressing files into zip archives or just putting the code itself into posts. Admittedly these were simple solutions, but now it’s even simpler – I just added some of the relevant file extensions to the list that RLR will allow for upload.

I added: “.pl”, “.py”, “.sh”, “.R”, “.r” and “.kml”.

Any file with one of those extensions will upload as plain text, i.e. WordPress will treat it as a text file.
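
To make the point about extension checking concrete, here is a minimal sketch in Python of an allow-list check of the kind described above. It is not WordPress's actual code (WordPress is written in PHP); the function name and the set of extensions are assumptions for illustration only. The thing to notice is that the check looks only at the file name, never at the contents.

```python
import os

# Hypothetical allow-list: ".txt" was already accepted according to the post,
# and the rest are the extensions the post says were added.
ALLOWED_EXTENSIONS = {".txt", ".pl", ".py", ".sh", ".R", ".r", ".kml"}


def is_upload_allowed(filename, allowed=ALLOWED_EXTENSIONS):
    """Return True if the file name ends in an extension on the allow-list.

    Only the name is inspected; the file contents are never opened.
    The comparison is case-sensitive, which is why both ".R" and ".r" are listed.
    """
    _, ext = os.path.splitext(filename)
    return ext in allowed
```

With this sketch, `is_upload_allowed("snp_filter.pl")` is accepted and `is_upload_allowed("snp_filter.exe")` is rejected, while any file renamed to end in `.txt` passes regardless of what it contains.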

If I’ve omitted something useful let me know.

Please remember that code can simply be copied into the body of a post and that will often be the best way to share it. But, in addition to that presentation, and especially for long scripts, you can now upload the script with its file extension to the RLR media library and put a link to it in your helpful post explaining what it does.

Dan.

Scripts for Formatting SNP Tables

Posted on September 26, 2012 by Greg B.

Some scripts for converting SNP tables into more useful formats are here: FormattingScripts_v0.4

Readme.txt explains usage. The scripts make fasta, bayescan and structure files, and also convert the table to digits for R.

Let me know if you find any of this useful or broken.

Greg

Edit: updated with a small fix to the structure formatter

CheapEasy DIY Barcodes in R

Posted on July 31, 2012 by Rob

I couldn’t believe how expensive the software was for writing barcodes, so I wrote a short program in R to do it for FREE. And, frankly it should be faster and easier if you already have your labels in an Excel file. You don’t really need to understand the program or even R functions to use it, as long as you know how to run an R program.

Setup and Overview:

[UPDATED (see notes below)] – R-code. Start with this (Note I could not upload a .R file, so this is .txt but still an R program).

Input – barcodes128.csv – You need this file to run the program. Save it in your working directory (see the comments in the R code for how to set this). Also labels.csv – This is a sample file showing the format for your labels. Even though it’s a .csv, it is a single column with each label on a separate row, so there are no actual commas.

Output – BarcodesOut.pdf – A sample output: a pdf file for the 0.5″x1.75″ Worth Poly Label WP0517 (Polyester Label Stock), currently in the lab

That’s really all you need to know, everything that follows is extraneous info. If you have any problems, check out the Detailed Instructions, Troubleshooting Tips, or add a comment below. Continue reading →

Old lab PC – new Ubuntu computer

Posted on June 15, 2012 by Seb

I’ve installed the latest version of Ubuntu (12.04) on the old PC lab computer:

-Username, computer name and password are written on the computer itself, if needed.
-I’ve also installed on it a few of my favorite programs (LibreOffice, Inkscape, Gimp, R, Chrome).
-It boots in about 35 seconds, not bad for an “old piece of junk”!

Feel free to use it!

seb

SnoWhite Tips and Troubleshooting (Thuy)

Posted on May 18, 2012 by Thuy

SnoWhite is a tool for cleaning 454 and Illumina reads. There are quite a few gotchas that can take you half a day to debug. This wiki has a lot of good tips.

SnoWhite invokes other bioinformatics programs, one of them being TagDust. If you get a segfault error from TagDust, it may be because you are searching for contaminant sequences longer than TagDust can handle. By default, TagDust can only handle a maximum of 1000 characters per line in the contaminant fasta file and contaminant sequences of at most 1000 bases.

A segfault (or segmentation fault) happens when a program accesses memory it is not allowed to access. Once a line or a sequence exceeds the 1000-character limit, TagDust keeps trying to access memory past the 1000 slots it has allocated, which may be non-existent or off-limits memory locations. You need to edit the TagDust source code so it allocates enough memory for the sequences and does not wander into bad memory locations.

Go into your TagDust source code directory and edit file “input.c”.

Go to line 68:

char line[MAX_LINE];

Change MAX_LINE to a number larger than the number of characters in the longest line in your contaminant fasta file. You probably can skip this step if you are using the NCBI UniVec.fasta files, since the default of 1000 is enough.

Go to line 69:

char tmp_seq[MAX_LINE];

Change MAX_LINE to a number larger than the number of bases in the longest contaminant sequence in your contaminant fasta file. I tried 1000000 with a recent NCBI UniVec.fasta file and it worked for me.
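
If you are not sure how large those two numbers need to be, a short script can measure them for you. The following is a minimal Python sketch, not part of SnoWhite or TagDust; the function name `fasta_limits` is made up for this example, and it assumes a plain-text fasta file in which header lines start with ">".

```python
def fasta_limits(path):
    """Return (longest_line, longest_sequence) for a fasta file.

    longest_line     : characters in the longest single line (headers included)
    longest_sequence : bases in the longest record, summed over its lines
    """
    longest_line = 0
    longest_seq = 0
    current_seq = 0  # bases seen so far in the record being read

    with open(path) as handle:
        for raw in handle:
            line = raw.rstrip("\r\n")
            longest_line = max(longest_line, len(line))
            if line.startswith(">"):
                # A new header closes the previous record.
                longest_seq = max(longest_seq, current_seq)
                current_seq = 0
            else:
                current_seq += len(line.strip())

    # Account for the last record in the file.
    longest_seq = max(longest_seq, current_seq)
    return longest_line, longest_seq
```

Call `fasta_limits("UniVec.fasta")` (or whatever your contaminant file is called) and pick values comfortably above the two numbers it returns, leaving a little headroom for line endings.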

Recompile your TagDust source code:
- Delete all the existing executables by executing make clean in the same directory as the Makefile
- Compile all your files again by executing make in the same directory as the Makefile
- If you decided to allocate a lot of memory to your arrays, and your program ends up with more than 2GB of statically allocated data, you may run into “relocation truncated to fit: R_X86_64_PC32 against symbol” errors during linkage. This occurs when the program’s statically allocated objects are too large for the default memory model to address. Edit the Makefile so that

CC = gcc
becomes
CC = gcc -mcmodel=medium

Reference: http://www.obihai.org/2010/05/relocation-truncated-to-fit-rx866432s.html

Compiled Sunflower QTLs (GregO)

Posted on May 3, 2012 by Greg Owens

Last year I worked on a project to see if any of the domestication outlier genes were found with previously mapped QTLs. The project ultimately fell flat when new data showed that the outlier I was working on wasn’t an outlier, but I did compile a large table of sunflower QTLs which may be useful. The table has 369 mapped QTLs.

I’ve shared this with a couple of people, but I’m posting it here on a google doc for everyone to use. Here is the link: https://docs.google.com/spreadsheet/ccc?key=0AgfXIvTZMEqPdHdJWTk3UVlVa3dkdGFTak9ySlUtNkE

A couple notes:
-It was compiled about a year ago, so it may be out of date. Also, although I tried to include every applicable study, I may have missed some. If you do find a study that I missed, I encourage you to add it to the table.
-It is only from annuus crosses, and a majority are domestics
-The position values are in cM

Anyway, read and enjoy. Change it if you find errors or new papers!

Global climate and soil data (Kathryn)

Posted on April 27, 2012 by Kathryn

There are a few publicly available data sets that are useful for looking at the abiotic environments of specific locations.

Continue reading →

Jaatha – training data sets (Rose)

Posted on April 27, 2012 by Rose

I’ve generated three training data sets, which will save you around 5 days if you decide to run Jaatha, a molecular demography program. It uses the joint site frequency spectrum of two populations to model various aspects of population history (split time, population size and growth, migration). Here’s the paper: Naduvilezhath et al 2011.

1. Using the default model, with the following maxima: tmax=20, mmax=5, qmax=10.

2. Alternative maxima: tmax=5, mmax=20, qmax=20.

3. Alternative maxima: tmax=5, mmax=20, qmax=5.

They can’t be uploaded because they’re compressed R data structures, but let me know if you’d like to give them a whirl.

Guide to Next-Generation Sequencers (Brook)

Posted on April 25, 2012 by Brook

Occasionally I find myself reading a news item or a paper that mentions a particular sequencing platform and scratching my head to remember of what exactly that particular platform is capable. If you ever find yourself in that same boat, the Molecular Ecologist has a very handy and often-updated guide here.

Illumina Sequencing Adapters (Dan E.)

Posted on March 13, 2012 by Dan E.

I’ve been writing posts about the various steps involved in making WGS sequencing libraries for the Biodiversity Centre’s Illumina HiSeq machine and I was tempted to try to explain what the adapters are for. Then I had the bright idea of seeing if somebody else had already done it. Somebody has. Continue reading →

Illumina Sequencing Adapters and Barcodes (Dan E.)

Posted on March 13, 2012 by Dan E.

As of March 2012 we are using the Bioo Scientific NEXTflex barcoded adapters for the WGS sequencing libraries we make ourselves (well, that I make, so far). The set we are currently using comprises 48 barcodes, so we can multiplex up to 48 samples in one lane on the Illumina HiSeq sequencer.

Bioo Sci. 48 barcoded adapters

Below are the sequences of the Illumina adapters and the 48 barcodes we are currently using. Continue reading →

Using RSEM to estimate gene and isoform expression (Sam)

Posted on March 2, 2012 by Sam

RSEM is a relatively new bioinformatics tool that has been developed in conjunction with Trinity for the analysis of RNAseq data. RSEM can be used to estimate expression levels for both genes and different isoforms of genes, and is quite quick and easy to use, with an excellent Google group for help (“RSEM users”). All it requires is an RNAseq dataset (either fasta or fq format) and a reference transcriptome that the reads can be aligned to.

Continue reading →

GNU Parallel (Chris)

Posted on February 20, 2012 by Chris

GNU Parallel makes it easy to take advantage of multiple processors at once. I really enjoy using it to parallelize Bash scripts. For example, here’s a pipeline that will generate a reference-based transcriptome from Illumina-sequenced ESTs: . . . Continue reading →

Turning your SNP table into a STRUCTURE input file (Brook)

Posted on February 9, 2012 by Brook

I suspect there are probably several homemade versions of this kind of script kicking around, but here is a perl script I’ve written for turning your SNP table into a STRUCTURE input file. To use it, you should change the .txt to a .pl after downloading the script. More on STRUCTURE input files (and so much more!) is in the documentation here.

Continue reading →

If BWA wants *.nt.ann file… (Rose)

Posted on February 3, 2012 by Rose

Recently BWA (an alignment program) suddenly started giving a strange error message, indicating that a reference file ending in *.nt.ann was missing. This file type was unfamiliar to me, with good reason: it’s a colourspace reference file, which shouldn’t be generated when we index the fasta-based references we’re using (at least, I don’t know of anyone in our lab using SOLiD data as a reference). DO NOT rebuild the reference with the -c (colourspace) flag, as you might see suggested on the web, because we don’t know what effect that might have on our alignments. DO rebuild it with the usual settings.
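
Before or after rebuilding, it can help to see which index files actually sit next to the reference. The following is a small Python sketch, not part of BWA: the function name is made up, and the suffix list assumes the five files that a standard (non-colourspace) `bwa index` run writes in recent versions. Older versions may write additional files.

```python
import os

# Assumed suffixes of a standard (base-space) BWA index.
STANDARD_SUFFIXES = (".amb", ".ann", ".bwt", ".pac", ".sa")


def missing_index_files(reference, suffixes=STANDARD_SUFFIXES):
    """Return the expected index files that do not exist next to `reference`."""
    return [
        reference + suffix
        for suffix in suffixes
        if not os.path.exists(reference + suffix)
    ]
```

An empty list from `missing_index_files("reference.fasta")` means all five standard files are present; anything it lists is a reason to rebuild the index with the usual settings.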

Unix cheat sheet (Dan E.)

Posted on February 1, 2012 by Dan E.

Hi guys,

Following up on Nolan’s bioinfo workshop last week, here is a copy of Alistair’s cheat sheet from the workshop he gave last year. I have found it quite useful.

cheers,

seb

Image analysis with ImageJ (Kathryn)

Posted on January 31, 2012 by Kathryn

See also this post by Allan (Dan E.)

ImageJ is a simple, free software package available for analyzing images. The download is available here.

Continue reading →

Bioportal (Rose)

Posted on January 30, 2012 by Rose

Bioportal is a free computing resource that provides several applications in our area. I’ve been running STRUCTURE on both the “low priority” and normal queues and it’s been fantastic (unlike Westgrid, who haven’t even responded to my application). For those of you who are struggling to find room on the cluster, it might be useful to you too. Much as I’d like to keep it to myself and exploit the hell out of it, here’s the address:

https://www.bioportal.uio.no/

Rieseberg Lab Resources

RLR: Technical resources for Rieseberglers

Category Archives: Bioinformatics
