GBS dual-barcode deplexer

There is a version of the GBS protocol that puts barcodes on both adapters. Although scripts already existed for demultiplexing dual-enzyme GBS data (including pyRAD and Stacks), none of them seemed to support demultiplexing by dual barcodes. For that, you need to determine sample identity from the combination of both barcodes (either barcode on its own may not be unique), and you need to strip the barcode sequences from the reads.
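As a small illustration of why the pair is needed, here is a Python sketch using the two rows of the example barcode file below. Both samples share the read-2 barcode TAGAT, so only the (read-1, read-2) combination identifies a sample. The names `samples`, `shared_r2` and `by_pair` are hypothetical and not taken from the script.

```python
from collections import Counter

# The two rows of the example barcode file: (sample, read-1 barcode, read-2 barcode).
samples = [
    ("Ha1291", "ATCAT", "TAGAT"),
    ("Ha1292", "ATCGG", "TAGAT"),
]

# The read-2 barcode on its own is shared, so it cannot identify a sample...
r2_counts = Counter(bc2 for _, _, bc2 in samples)
shared_r2 = [bc for bc, n in r2_counts.items() if n > 1]
print(shared_r2)  # ['TAGAT']

# ...but the (read-1, read-2) barcode pair is unique and can serve as the lookup key.
by_pair = {(bc1, bc2): name for name, bc1, bc2 in samples}
assert len(by_pair) == len(samples), "duplicate barcode pair"
print(by_pair[("ATCGG", "TAGAT")])  # Ha1292
```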

I modified Baute’s original demultiplexing script to do this. It takes your paired-end sequence data and a sample barcode file. The barcode file is tab-separated and looks like this:

Ha1291	ATCAT	TAGAT
Ha1292	ATCGG	TAGAT
etc…

The second column is the read 1 barcode, the third column is the read 2 barcode. There is no header.
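A minimal Python sketch of reading this file into a lookup table keyed on the barcode pair. The function name `read_barcode_file` is hypothetical; the checks (exactly three tab-separated columns, A/C/G/T barcodes, no repeated barcode pair) are assumptions about what a careful parser should enforce, not a description of the original script.

```python
import io


def read_barcode_file(handle):
    """Parse a header-less, tab-separated barcode file.

    Returns {(read1_barcode, read2_barcode): sample_name}. Raises ValueError
    for lines without exactly three tab-separated columns, for non-ACGT
    barcodes, and for a barcode pair that appears more than once.
    """
    table = {}
    for lineno, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 tab-separated columns, got {len(fields)}")
        name, bc1, bc2 = fields[0], fields[1].upper(), fields[2].upper()
        if not bc1 or not bc2 or set(bc1 + bc2) - set("ACGT"):
            raise ValueError(f"line {lineno}: barcodes must be non-empty A/C/G/T strings")
        if (bc1, bc2) in table:
            raise ValueError(f"line {lineno}: barcode pair {bc1}/{bc2} already used by {table[(bc1, bc2)]}")
        table[(bc1, bc2)] = name
    return table


example = "Ha1291\tATCAT\tTAGAT\nHa1292\tATCGG\tTAGAT\n"
barcodes = read_barcode_file(io.StringIO(example))
print(barcodes[("ATCGG", "TAGAT")])  # Ha1292
```

Because the split is on tabs only, a line whose columns are separated by spaces is rejected with an error rather than silently misread.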

It is currently set up for PstI (read 1) and MspI (read 2), but it can easily be modified for other enzymes.
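Here is a minimal Python sketch of assigning one read pair to a sample under the PstI/MspI setup. It assumes each read starts with its barcode followed immediately by the cut-site remnant (TGCAG for PstI, CGG for MspI), that only the barcode is stripped while the remnant is kept, and that the barcode table is a dict keyed by (read-1, read-2) barcode pairs. `assign_pair`, `R1_REMNANT` and `R2_REMNANT` are hypothetical names, not the script's. In this sketch, switching enzymes amounts to changing the two remnant strings.

```python
# Cut-site remnant expected right after the barcode in each read (assumed):
# PstI (CTGCA^G) leaves TGCAG at the start of read 1,
# MspI (C^CGG) leaves CGG at the start of read 2.
# For another enzyme pair, change these two strings.
R1_REMNANT = "TGCAG"
R2_REMNANT = "CGG"


def assign_pair(seq1, seq2, barcodes, r1_remnant=R1_REMNANT, r2_remnant=R2_REMNANT):
    """Assign a read pair to a sample using both barcodes.

    `barcodes` maps (read1_barcode, read2_barcode) -> sample name.
    Barcode plus remnant must match exactly (no mismatches). Returns
    (sample, read1_without_barcode, read2_without_barcode), or None if
    no barcode pair matches or more than one does.
    """
    hits = [
        (sample, len(bc1), len(bc2))
        for (bc1, bc2), sample in barcodes.items()
        if seq1.startswith(bc1 + r1_remnant) and seq2.startswith(bc2 + r2_remnant)
    ]
    if len(hits) != 1:
        return None
    sample, n1, n2 = hits[0]
    # Strip only the barcodes; the cut-site remnant stays on each read.
    return sample, seq1[n1:], seq2[n2:]


barcodes = {("ATCAT", "TAGAT"): "Ha1291", ("ATCGG", "TAGAT"): "Ha1292"}
print(assign_pair("ATCGGTGCAGAAAA", "TAGATCGGTTTT", barcodes))
# ('Ha1292', 'TGCAGAAAA', 'CGGTTTT')
```

The quality strings would be sliced with the same offsets. Scanning every barcode pair for each read pair is simple but linear in the number of samples; indexing the table by barcode length would be faster on large runs.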
As in the previous version, it looks for read pairs where the insert is short enough that the reads run into the adapter sequence, and trims off those adapter bases. If either read is shorter than 50 bp after adapter trimming, both reads are discarded. No mismatches are allowed in the barcodes or in the enzyme cut-site sequences.
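One simple way to sketch the read-through check in Python: for a short insert, read 1 runs into the reverse complement of the start of read 2 (read-2 barcode plus MspI remnant) followed by adapter, and read 2 likewise runs into the reverse complement of the read-1 barcode plus PstI remnant. The sketch below cuts each barcode-stripped read at the first exact occurrence of that junction and then applies the 50 bp minimum to both reads. The names `revcomp`, `trim_at` and `trim_and_filter` are hypothetical, and this is an assumed approach, not necessarily how Baute's script detects adapter sequence; in particular, a read that ends partway into the junction is left untrimmed here.

```python
MIN_LEN = 50  # both reads are dropped if either is shorter than this after trimming

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string (A/C/G/T/N only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def trim_at(read, junction):
    """Cut `read` at the first exact occurrence of `junction`, if any."""
    pos = read.find(junction)
    return read if pos == -1 else read[:pos]


def trim_and_filter(insert1, insert2, bc1, bc2,
                    r1_remnant="TGCAG", r2_remnant="CGG", min_len=MIN_LEN):
    """Trim read-through on both barcode-stripped reads.

    Assumed layout: for a short insert, read 1 runs into
    revcomp(bc2 + r2_remnant) and read 2 runs into revcomp(bc1 + r1_remnant),
    each followed by adapter sequence. Returns (read1, read2), or None if
    either trimmed read is shorter than min_len.
    """
    t1 = trim_at(insert1, revcomp(bc2 + r2_remnant))
    t2 = trim_at(insert2, revcomp(bc1 + r1_remnant))
    if len(t1) < min_len or len(t2) < min_len:
        return None  # discard the whole pair
    return t1, t2
```

For example, for sample Ha1291 (read-2 barcode TAGAT), read 1 would be cut where CCGATCTA, the reverse complement of TAGAT followed by CGG, first appears.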

Here is the script: SCRIPT