There is now a second set of barcoded adapters that allows higher multiplexing. They also appear to address the quality issues which have been observed in the second read of GBS runs.
This blog post has 1) Info on how to use the barcodes and where they are and 2) some data that might convince you to use them.
These add a second barcode to the start of the second read before the MSP RE site. The first bases of the second read contain the barcode, just like with the first read. Marco T. designed and ordered these and the info needed to order them is here: https://docs.google.com/spreadsheets/d/1ZXuHKfaR1BYPBX6g0p9GdZHp_21A3z_9pPt_aW0amwM/edit?usp=sharing
I’ve labeled them MTC1-12 and the barcode sequences are as follows.
MTC1 AACT MTC2 CCAG MTC3 TTGA MTC4 GGTCA MTC5 AACAT MTC6 CCACG MTC7 CTTGTA MTC8 TCGTAT MTC9 GGACGT MTC10 AACAGAT MTC11 CTTGTTA MTC12 TCGTAAT
They are used in place of the common adapter in the standard protocol (1ul/sample). One possible use, and simplest to use as an example, would be to use these to run 12 plates in a lane. In this case you would make a master mix for the ligation of each plate which contains a different MTC adapter.
Where are they? In the -20 at the back left corner of the bay on the bottom shelf in a box that has a pink lab tape label that says something to the extent of “barcodes + barcoded adapters 1-12”. This contains the working concentration for each of the MTC adapters. Beside that is a box containing the unannealed and as ordered oligos and the annealed stock. The information regarding what I did and what is in the box is written there. The stock needs an additional 1/20 dilution to get to the working concentration
How it looks
First, the quality of the second read is just about as nice as the first read. Using fastqc to look at 4million reads of some random run:
Now, for the slightly more idiosyncratic part: read counts. In short I dont see any obvious issue with any of these barcodes. I did 5 sets of 5 plates/lane. For all the plates I used the 97-192 bacodes for the Pst side. Then each plate got a differnt MTC barcode for the MSP side. Following the PCR I pooled all of the samples from the plate and quantified. Each plate had a different number of samples which I took into account during the pooling step. Here is the read counts from a randomly selected 4 million reads corrected to number of samples in that plate. Like I said it is a little idiosyncratic but the take home is that they are about as even as you might expect given usual in accuracies in the lab, my hands, and the fact that this is a relatively small sample.
Lane 1 MTC5 14464 MTC1 13518 MTC7 14463 MTC9 13448 MTC3 14232 Lane 2 MTC10 30395 MTC6 11267 MTC2 8263 MTC4 19295 MTC8 14766 Lane 3 MTC5 16631 MTC7 17315 MTC11 11623 MTC9 16256 MTC3 13831 Lane 4 MTC10 11302 MTC6 12120 MTC4 10326 MTC12 18959 MTC8 12832 Lane 5 MTC1 13151 MTC6 13490 MTC2 12851 MTC11 12460 MTC12 17296
Both stacks and uneak are made for single end reads. If you have paired end data here is a little cheat that puts “fake” barcodes onto the mate pairs and prints them all out to one file. It also adds the corresponding fake quality scores.
perl GBS_fastq_RE-multiplexer_v1.pl BarcodeFile R1.fastq R2.fastq Re-barcoded.fastq
BarcodeFile should look like (same as for my demultiplexing script) spaces must be tabs:
# note this could also look like this:
As it does not actually use the names (it just looks at the second column).
Here it is:
I couldn’t believe how expensive the software was for writing barcodes, so I wrote a short program in R to do it for FREE. And, frankly it should be faster and easier if you already have your labels in an Excel file. You don’t really need to understand the program or even R functions to use it, as long as you know how to run an R program.
Setup and Overview:
[UPDATED (see notes below)] – R-code. Start with this (Note I could not upload a .R file, so this is .txt but still an R program).
Input – barcodes128.csv – You need this file to run the program. Save it in your working directory (see comments in R code for how to set this). AND labels.csv – This is a sample file showing the format for your labels. Even though it’s a .csv, it is a single column with each label as a separate row, so there are no actual commas
Output – BarcodesOut.pdf – A sample output: a pdf file for the 0.5″x1.75″ Worth Poly Label WP0517 (Polyester Label Stock), currently in the lab
That’s really all you need to know, everything that follows is extraneous info. If you have any problems, check out the Detailed Instructions, Troubleshooting Tips, or add a comment below. Continue reading
At this google doc spreadsheet you can find a record of curent usage for the 48 barcoded adaptors we have in the lab for WGS. Each adaptor stock is good for a total of 8 uses. The idea is that we should (as much as possible) try to even out the amount we utilize across all 48 alternatives.
Please update the document when you use any of the adaptors.
I’ve been writing posts about the various steps involved in making WGS sequencing libraries for the Biodiversity Centre’s Illumina HiSeq machine and I was tempted to try to explain what the adapters are for. Then I had the bright idea of seeing if somebody else had already done it. Somebody has. Continue reading
As of March 2012 we are using the Bioo Scientific NEXTflex barcoded adapters for WGS sequencing libraries made by ourselves, (well me so far). The set we are currently using comprises 48 barcodes, so we can multiplex up to a 48-plex in one lane on the Illumina HiSeq sequencer.
Below are the sequences of the Illumina adapters and the 48 barcodes we are currently using. Continue reading