2 enzyme GBS UNEAK trickery

Want to run UNEAK on a bunch of samples that you sequenced using 2-enzyme GBS? You came to the right place!

This script takes paired R1.fastq.gz and R2.fastq.gz files and writes a single file containing both reads, with the appropriate barcode added to read 2 and the CGG restriction-enzyme (RE) site changed to the Pst sequence. This lets you use all of your data in UNEAK… well, at least the first 64 bases of all your data!
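To make the read 2 "trickery" concrete, here is a minimal Python sketch of the idea. It is not the Perl script from this post. It assumes read 1 starts with barcode + TGCAG (the PstI remnant) and read 2 starts with CGG. The function names, the placeholder quality character "I" for the added bases, and the choice to drop read 2s that don't start with CGG are all hypothetical. Trimming to 64 bases is left to UNEAK.

```python
import gzip

PST_REMNANT = "TGCAG"  # assumption: Pst remnant UNEAK expects right after the barcode
MSP_REMNANT = "CGG"    # assumption: RE remnant at the start of read 2


def open_fastq(path):
    """Open a plain or gzipped FASTQ file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from an open FASTQ handle."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        yield header, seq, plus, qual


def find_barcode(r1_seq, barcodes):
    """Return the barcode that read 1 starts with (followed by the Pst remnant), or None."""
    # Try longer barcodes first so a short barcode cannot shadow a longer one.
    for bc in sorted(barcodes, key=len, reverse=True):
        if r1_seq.startswith(bc + PST_REMNANT):
            return bc
    return None


def convert_read2(seq, qual, barcode, fill="I"):
    """Prepend the barcode and swap the leading CGG for the Pst remnant.

    Returns (seq, qual), or None if read 2 does not start with CGG.
    The added bases get a placeholder quality character (hypothetical choice).
    """
    if not seq.startswith(MSP_REMNANT):
        return None
    prefix = barcode + PST_REMNANT
    new_seq = prefix + seq[len(MSP_REMNANT):]
    new_qual = fill * len(prefix) + qual[len(MSP_REMNANT):]
    return new_seq, new_qual


def merge_pair(r1_path, r2_path, barcodes, out):
    """Write read 1 unchanged and the converted read 2 into one FASTQ stream."""
    with open_fastq(r1_path) as f1, open_fastq(r2_path) as f2:
        for (h1, s1, p1, q1), (h2, s2, _p2, q2) in zip(read_fastq(f1), read_fastq(f2)):
            out.write(f"{h1}\n{s1}\n{p1}\n{q1}\n")
            bc = find_barcode(s1, barcodes)
            converted = convert_read2(s2, q2, bc) if bc else None
            if converted:
                new_seq, new_qual = converted
                out.write(f"{h2}\n{new_seq}\n+\n{new_qual}\n")
```

Usage would look like `merge_pair("ABC_R1.fastq.gz", "ABC_R2.fastq.gz", barcodes, out)`, where `barcodes` is the set of barcodes from your key file and `out` is a text file opened for writing. Looking the barcode up on read 1 keeps each pair's barcode consistent without needing a separate index read.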

It also handles the fact that data from a single lane often arrives split across many files. See the example LaneInfo file*. You will also need an appropriately formatted UNEAK key file.

*List only the READ1 files in this file. The script will only work if your files are named /dir/dir/dir/ABC_R1.fastq.gz and /dir/dir/dir/ABC_R2.fastq.gz.

So a line in the file would read (directory, READ1 file name, flowcell, lane):
/dir/dir/dir/ ABC_R1.fastq C5CDUACXX 3
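As an illustration of how a LaneInfo file like this can be read, here is a minimal Python sketch (the real script is Perl). It assumes four whitespace-separated fields in the order shown above, derives the READ2 name by swapping `_R1.` for `_R2.`, and groups the split files by flowcell and lane. The function name `parse_lane_info` and the grouping structure are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def parse_lane_info(lines):
    """Group LaneInfo entries by (flowcell, lane).

    Each non-blank line is assumed to hold: directory, READ1 file name,
    flowcell, lane. Returns {(flowcell, lane): [(r1_path, r2_path), ...]}.
    """
    groups = defaultdict(list)
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        directory, r1_name, flowcell, lane = fields
        if "_R1." not in r1_name:
            raise ValueError(f"expected a READ1 (_R1) file name, got {r1_name!r}")
        # READ2 lives next to READ1 with _R1 replaced by _R2.
        r2_name = r1_name.replace("_R1.", "_R2.", 1)
        groups[(flowcell, lane)].append(
            (str(Path(directory) / r1_name), str(Path(directory) / r2_name))
        )
    return dict(groups)
```

Grouping on (flowcell, lane) means every file split off the same lane ends up in one list, which can then be concatenated into a single output for UNEAK.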

It also works with gzipped (.gz) FASTQ files.

greg@computer$ perl bin/MultiLane_UneakTricker.pl design/UNEAK_KEY_FILE design/LaneInfo.txt /home/greg/project/UNEAK/Illumina

You must have GBS_Fastq_BarcodeAdder_2Enzyme.pl* in the same bin folder. You also need to create a “tmp” folder.


Protip: UNEAK is fooled by soft links, so you can link to your data instead of copying it.
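If you'd rather script the linking than run `ln -s` by hand, here is a minimal Python sketch. The helper `link_into` and the idea of passing a {link name: real file} mapping are hypothetical; this sketch does not check which file names UNEAK expects, so choose link names that match your own setup.

```python
import os
from pathlib import Path


def link_into(target_dir, links):
    """Create soft links in target_dir.

    links maps the desired link name to the real file. Existing entries
    with the same name are left untouched. Returns the created link paths.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    created = []
    for name, source in links.items():
        link = target / name
        if link.is_symlink() or link.exists():
            continue  # don't clobber an existing file or link
        # Absolute targets keep the link valid wherever it is read from.
        os.symlink(Path(source).resolve(), link)
        created.append(link)
    return created
```

Because only a link is created, the original FASTQ files stay where they are and no disk space is spent on copies.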

Percent reads aligned collector

When you publish next-gen sequencing data, you have to report the percentage of reads aligned. The number is easy to get, but when you have 200+ samples it's a pain to collate. This script takes a directory of BAM files, runs samtools flagstat on each one to get the percent of reads aligned, and then reformats the output into a tidy list. To run it, cd into the directory containing the BAMs and type ‘bash ./percent_counter.bash’.
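For readers who prefer Python, here is a minimal sketch of the same collection step (it is not percent_counter.bash). It assumes `samtools` is on your PATH and reads the percentage from the overall "mapped" line of flagstat output, e.g. `9876 + 0 mapped (98.76% : N/A)`. The function names are hypothetical.

```python
import re
import subprocess
import sys
from pathlib import Path

# Matches the overall "mapped" line of samtools flagstat output. Anchoring
# at the line start keeps it from matching "primary mapped" lines that
# newer samtools versions also print.
MAPPED_RE = re.compile(r"^\d+ \+ \d+ mapped \(([\d.]+)%", re.MULTILINE)


def percent_mapped(flagstat_text):
    """Return the percent-mapped value from flagstat text, or None if absent."""
    match = MAPPED_RE.search(flagstat_text)
    return float(match.group(1)) if match else None


def collect(bam_dir):
    """Run samtools flagstat on every .bam in bam_dir; return [(sample, percent)]."""
    results = []
    for bam in sorted(Path(bam_dir).glob("*.bam")):
        out = subprocess.run(
            ["samtools", "flagstat", str(bam)],
            capture_output=True, text=True, check=True,
        ).stdout
        results.append((bam.stem, percent_mapped(out)))
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    # Print one tab-separated line per sample.
    for sample, pct in collect(sys.argv[1]):
        print(f"{sample}\t{pct}")
```

Run it as `python percent_collector.py /path/to/bams` (file name hypothetical). Samples with no reads print `None`, because flagstat reports N/A instead of a percentage.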

percent_counter.bash NOTE: The script is gzipped, so gunzip it before running.

Allowed File Types At RLR – you can now upload scripts with their usual file extensions

Hello All,

Many of us have been annoyed by the limited set of file types that WordPress allows us to upload to RLR. It's especially annoying because all WordPress does when it permits or denies an upload is check the file extension against a list of allowed extensions. {Even the most malicious code could be uploaded to our blog as long as it had a .txt file extension. Whether that code could then be made to execute, however, is far beyond my web-programming grasp; WordPress would treat it as plain text, so it may be impossible.}

We've been sharing code on RLR by sidestepping the file-extension rules: uploading scripts as .txt files, compressing them into zip archives, or just pasting the code itself into posts. Admittedly these were simple solutions, but now it's even simpler: I just added some of the relevant file extensions to the list of those RLR allows for upload.

I added: “.pl”, “.py”, “.sh”, “.R”, “.r” and “.kml”.

Any file with one of those extensions will upload as plain text, i.e. WordPress will treat it as a text file.
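To show why the check described above says nothing about what a file actually contains, here is a minimal Python sketch of an extension whitelist. It is an illustration only, not WordPress's own (PHP) code; the `ALLOWED` table and `upload_type` function are hypothetical, filled in from the extensions listed in this post.

```python
from pathlib import PurePosixPath

# Hypothetical whitelist: extension -> type the upload is stored as.
# Every extension here is treated as plain text, as described in the post.
ALLOWED = {
    ".txt": "text/plain",
    ".pl": "text/plain",
    ".py": "text/plain",
    ".sh": "text/plain",
    ".R": "text/plain",
    ".r": "text/plain",
    ".kml": "text/plain",
}


def upload_type(filename):
    """Return the type an upload would be stored as, or None if it is rejected.

    Only the final extension is consulted (case-sensitively); the file's
    contents are never looked at.
    """
    return ALLOWED.get(PurePosixPath(filename).suffix)
```

In this sketch, `upload_type("evil.exe")` is rejected while `upload_type("evil.exe.txt")` is accepted as plain text, which is exactly the loophole the braced aside above describes.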

If I've omitted something useful, let me know.

Please remember that code can simply be pasted into the body of a post, and that will often be the best way to share it. But in addition to that, and especially for long scripts, you can now upload the script with its usual file extension to the RLR media library and link to it from your helpful post explaining what it does.