Home-brew WGS library multiplexing

There are two main ways to barcode WGS libraries so that they can be run together on the same lane:

– In-line barcodes: unique sequences are located at the very end of one or both adapters. This sequence will be at the very beginning of each read from a given library. This is the barcode system that is normally used for GBS libraries as well.

– Indices: barcodes are in the middle of one or both adapters. These barcodes are read in an independent round of sequencing. For a paired-end library you would therefore have two rounds of sequencing for your fragment and a third round of sequencing for the index (and I guess a fourth one as well, if you have double indices). This is the system used in most commercial kits.
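Since an in-line barcode sits at the very beginning of each read, assigning reads to libraries comes down to matching read prefixes. Here is a minimal sketch of that idea; the function name and the barcode sequences are made-up placeholders (not the real oligo sequences), and it only handles exact matches, with no tolerance for sequencing errors.

```python
def assign_by_inline_barcode(read, barcodes):
    """Return (library, trimmed_read), or (None, read) if no barcode matches.

    `barcodes` maps a library name to its in-line barcode sequence.
    """
    # Try longer barcodes first, so that a short barcode that happens to be
    # a prefix of a longer one does not win by accident.
    by_length = sorted(barcodes.items(), key=lambda kv: len(kv[1]), reverse=True)
    for name, barcode in by_length:
        if read.startswith(barcode):
            # Strip the barcode so only the fragment sequence is left.
            return name, read[len(barcode):]
    return None, read


# Placeholder barcodes, for illustration only.
example_barcodes = {"lib1": "ACGT", "lib2": "GTCAT", "lib3": "TGACCT"}

print(assign_by_inline_barcode("GTCATAAGGCCTTAG", example_barcodes))
# ('lib2', 'AAGGCCTTAG')
```

Index reads would be handled differently, since the index comes from its own sequencing round rather than from the start of the read.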

If you are using the WGS protocol I posted some time ago, you have access to both of these options. That protocol uses short adapters that are then extended into full-length adapters during the enrichment step. I designed eight barcoded PE1 adapters and 24 indexed reverse PCR primers (here are the sequences of all WGS library oligos). For now all oligos are with me, so please see me if you need them until we find a common space to store them.

The library protocol is basically unchanged. If you want to use in-line barcodes, you just have to use one of the barcoded adapters instead of the regular PE1 adapter. If you prefer indices, use one of the indexed reverse PCR primers during the enrichment step. You can also combine in-line barcodes and indices to pool up to 192 libraries (8 barcoded adapters x 24 indexed primers).
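To see where the 192 comes from, and to get a quick sample sheet of unique barcode/index pairs, you can enumerate the combinations. This is only a sketch; the labels BC1-BC8 and IDX1-IDX24 are hypothetical names, not the actual oligo names.

```python
from itertools import product

# Hypothetical labels for the 8 barcoded PE1 adapters and the
# 24 indexed reverse PCR primers.
inline_barcodes = [f"BC{i}" for i in range(1, 9)]
index_primers = [f"IDX{i}" for i in range(1, 25)]

# Every (in-line barcode, index) pair identifies one library.
combinations = list(product(inline_barcodes, index_primers))

print(len(combinations))   # 192
print(combinations[0])     # ('BC1', 'IDX1')
```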

Either way, using barcodes means that the beginning of your first sequencing round (with in-line barcodes) or your extra sequencing round (with indices) will have reduced diversity. Illumina sequencers use the first 4-6 positions of a sequencing round to calibrate the lasers and determine phasing, and they assume an equal representation of G/T (read by the green laser) vs. A/C (read by the red laser). It is therefore best to use at least four barcodes or indices, in the order in which they are numbered (I designed the in-line barcodes to be base balanced; for the indices, you have to thank Illumina). In-line barcodes have the additional problem that they all have to end with a T (which is the overhang that is used to ligate the adapters to the fragments). If they were all the same length, every read would have a T at that same position. To prevent that, they are of different lengths, between four and seven bp.
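If you want to check a given set of barcodes for balance, you can compute, for each of the first positions, the fraction of barcodes that have a G or T (green laser) there; ideally it stays close to 0.5. The sketch below assumes equal pooling of the libraries, and the sequences are made-up placeholders, not the real barcodes.

```python
def green_fraction(barcodes, n_positions=4):
    """Fraction of barcodes with G or T (green laser) at each position."""
    fractions = []
    for pos in range(n_positions):
        # Only barcodes long enough to cover this position are counted.
        bases = [bc[pos] for bc in barcodes if len(bc) > pos]
        if not bases:
            break
        green = sum(base in "GT" for base in bases)
        fractions.append(green / len(bases))
    return fractions


# Placeholder barcodes of different lengths, all ending with T.
staggered = ["ACGT", "GTCAT", "TGACCT", "CATGAGT"]
# Placeholder barcodes of the same length, all ending with T.
same_length = ["ACGT", "GTCT", "TGAT", "CATT"]

print(green_fraction(staggered))    # [0.5, 0.5, 0.5, 0.5]
print(green_fraction(same_length))  # [0.5, 0.5, 0.5, 1.0]
```

In the same-length set the fourth position is a T in every barcode, which is the situation the different barcode lengths are meant to avoid.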


Pros and cons of different multiplexing strategies

In-line barcodes: using this protocol, only in-line barcodes allow you to pool the libraries before the enrichment step (saving time and reagents in the following steps). This is especially convenient if you are going to use a depletion protocol on your libraries.

Indices: possibly a bit “cleaner” than in-line barcodes, since any imbalances in base representation wouldn’t affect the actual reads. You need to keep libraries separate until after the enrichment step. If you are planning a depletion treatment, you’ll have to wait until after that to barcode and pool your libraries (see notes on primers and adapters in the WGS depletion post).

One possible disadvantage of in-line barcodes is that, since they are part of your actual reads, they eat up a few of your precious bases (so your forward read will be 93-96 bp instead of 100 bp). If you sequence your libraries in house and do not use indices, you can ask Ana to use the extra reagents she would need to sequence the indices to add three more cycles to each of the two sequencing rounds. We tried that with GBS libraries and it worked quite well.
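The arithmetic behind those numbers is simple, but here it is as a quick sketch for the four barcode lengths, with and without the three extra cycles. The function name is just for illustration.

```python
def usable_forward_length(cycles, barcode_length):
    """Bases of actual fragment sequence left in the forward read."""
    return cycles - barcode_length


for barcode_length in range(4, 8):  # barcodes are 4 to 7 bp long
    standard = usable_forward_length(100, barcode_length)
    extended = usable_forward_length(103, barcode_length)
    print(barcode_length, standard, extended)
# 4 96 99
# 5 95 98
# 6 94 97
# 7 93 96
```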