Turning your SNP table into a STRUCTURE input file (Brook)

I suspect there are several homemade versions of this kind of script kicking around, but here is a Perl script I’ve written for turning your SNP table into a STRUCTURE input file. To use it, change the file extension from .txt to .pl after downloading the script. More on STRUCTURE input files (and so much more!) is in the documentation here.

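When running this script from a command line, your command should look like this (input is your SNP table, poplist is the population ID file described below, and output is the name of the file to be written):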
perl table2structure.pl input poplist output

Your input SNP table file should be formatted like this (everything tab separated):
#CONTIG SITE REFERENCE arg11B-11 arg14B-7 arg2B-4
plastid_noIR|Consensusfrom_HA383_no_IR 1262 A GG AA AA
plastid_noIR|Consensusfrom_HA383_no_IR 5533 T AA TT TT
plastid_noIR|Consensusfrom_HA383_no_IR 22031 C CT CT CT

The exact format of the header is not important as long as there are three leading columns before your first sample column (if this is different in your SNP table, it should be easy for you or me to modify the script). Neither your sample names nor your genotype calls should contain any white space (e.g. “arg11B-11” not “arg11B 11”, and “GG” not “G G”).
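If you want to sanity-check a table before converting it, a short Python sketch along these lines might help. It is not part of table2structure.pl; the function name read_snp_table is made up for illustration, and it assumes that every genotype call is exactly two characters long (diploid calls such as “GG”), which the post implies but does not state.

```python
import io

LEADING_COLUMNS = 3  # assumption: CONTIG, SITE, REFERENCE precede the samples


def read_snp_table(handle):
    """Return (samples, sites): sample names and one list of calls per SNP."""
    header = handle.readline().rstrip("\r\n").split("\t")
    samples = header[LEADING_COLUMNS:]
    if not samples:
        raise ValueError("no sample columns after the three leading columns")
    for name in samples:
        if name == "" or any(ch.isspace() for ch in name):
            raise ValueError(f"empty sample name or white space in: {name!r}")

    sites = []
    for line_no, line in enumerate(handle, start=2):
        line = line.rstrip("\r\n")
        if not line:
            continue  # tolerate blank lines
        calls = line.split("\t")[LEADING_COLUMNS:]
        if len(calls) != len(samples):
            raise ValueError(f"line {line_no}: expected {len(samples)} calls, "
                             f"found {len(calls)}")
        for call in calls:
            # Assumption: diploid calls written as exactly two characters.
            if len(call) != 2 or any(ch.isspace() for ch in call):
                raise ValueError(f"line {line_no}: bad genotype call {call!r}")
        sites.append(calls)
    return samples, sites


# The first two SNPs from the example table, with real tab characters.
example = (
    "#CONTIG\tSITE\tREFERENCE\targ11B-11\targ14B-7\targ2B-4\n"
    "plastid_noIR|Consensusfrom_HA383_no_IR\t1262\tA\tGG\tAA\tAA\n"
    "plastid_noIR|Consensusfrom_HA383_no_IR\t5533\tT\tAA\tTT\tTT\n"
)
samples, sites = read_snp_table(io.StringIO(example))
print(samples)
print(sites)
```

For a real file, pass an open file handle instead of the io.StringIO object. A call like “G G” or a sample name like “arg11B 11” raises a ValueError that names the offending value.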

You will also need a population ID text file formatted like this (again, tab separated, but no header):
arg11B-11 7
arg2B-4 9
arg14B-7 21

The sample names in the first column should be written exactly as in the header of your SNP table (although they can be in any order). You can use any integer (but no other characters) to identify the population/collection locality for each sample.
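A mismatch between the population file and the SNP table header is easy to miss by eye, so here is a rough Python sketch of a check you could run first. It is separate from the Perl script; read_poplist and compare_names are hypothetical names, and treating “any integer” as “one or more digits 0-9” (so no minus signs) is my reading of the rule above rather than something the script is known to enforce.

```python
import re


def read_poplist(lines):
    """Parse 'sample<TAB>popID' lines into a dict of sample name -> int."""
    pops = {}
    for line_no, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(f"line {line_no}: expected 2 tab-separated fields")
        name, pop = fields
        # Assumption: a population ID is digits only, nothing else.
        if not re.fullmatch(r"[0-9]+", pop):
            raise ValueError(f"line {line_no}: population ID {pop!r} "
                             "is not a plain integer")
        if name in pops:
            raise ValueError(f"line {line_no}: duplicate sample {name!r}")
        pops[name] = int(pop)
    return pops


def compare_names(pops, header_samples):
    """Return (missing, extra) relative to the SNP table header."""
    missing = [s for s in header_samples if s not in pops]
    extra = [s for s in pops if s not in header_samples]
    return missing, extra


poplist_lines = ["arg11B-11\t7\n", "arg2B-4\t9\n", "arg14B-7\t21\n"]
header_samples = ["arg11B-11", "arg14B-7", "arg2B-4"]

pops = read_poplist(poplist_lines)
missing, extra = compare_names(pops, header_samples)
print("missing from poplist:", missing)
print("not in SNP table:", extra)
```

Both lists should come back empty before you run the conversion. The order of the lines in the population file does not matter here, since samples are looked up by name.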

The output will be formatted like this:
arg11B-11 7 4 4 1 1 2 3
arg14B-7 21 1 1 2 2 2 3
arg2B-4 9 1 1 2 2 2 3
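For anyone curious how the numbers relate to the genotype calls, the example is consistent with a coding of A=1, T=2, C=3, G=4, with the two alleles at each locus written in ascending order (so “CT” becomes “2 3”). The Python sketch below reproduces the example under that inferred coding. It is not the Perl script itself: the names encode_call and table_to_structure are invented here, and writing -9 for any call containing a character other than A, T, C or G is my assumption about missing data, not documented behaviour of the script.

```python
ALLELE_CODES = {"A": 1, "T": 2, "C": 3, "G": 4}  # inferred from the example
MISSING = -9  # assumption: a common missing-data value, not confirmed


def encode_call(call):
    """Turn a two-letter call such as 'CT' into two sorted integer codes."""
    codes = [ALLELE_CODES.get(base.upper()) for base in call]
    if len(codes) != 2 or None in codes:
        return [MISSING, MISSING]  # assumption: anything unrecognised is missing
    return sorted(codes)


def table_to_structure(samples, sites, pops):
    """Build one output line per sample, in SNP table header order."""
    rows = []
    for column, name in enumerate(samples):
        fields = [name, str(pops[name])]
        for calls in sites:
            fields.extend(str(code) for code in encode_call(calls[column]))
        rows.append(" ".join(fields))
    return rows


# The example table and population file, typed in by hand.
samples = ["arg11B-11", "arg14B-7", "arg2B-4"]
sites = [
    ["GG", "AA", "AA"],  # site 1262
    ["AA", "TT", "TT"],  # site 5533
    ["CT", "CT", "CT"],  # site 22031
]
pops = {"arg11B-11": 7, "arg2B-4": 9, "arg14B-7": 21}

for row in table_to_structure(samples, sites, pops):
    print(row)
```

The fields are joined with single spaces here to match the example as displayed. Note that the rows follow the column order of the SNP table header, not the order of the population file.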

The output file puts all the genotype information for one individual on a single line (e.g. sample_name popID loc1_allele1 loc1_allele2 loc2_allele1 etc.), so be sure to turn on “ONEROWPERIND” when loading the file into STRUCTURE or preparing your mainparams file.
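If you are preparing a mainparams file by hand, you also need the number of individuals and loci, which can be counted from the output. The Python sketch below does that and prints matching settings. Only ONEROWPERIND comes from this post; LABEL, POPDATA, NUMINDS and NUMLOCI are, to the best of my knowledge, the corresponding mainparams names, so check them against the STRUCTURE documentation before relying on this. The function name mainparams_settings is hypothetical.

```python
def mainparams_settings(rows):
    """Derive mainparams values from one-row-per-individual output lines."""
    if not rows:
        raise ValueError("no individuals in the output")
    widths = {len(row.split()) for row in rows}
    if len(widths) != 1:
        raise ValueError("rows have differing numbers of columns")
    allele_columns = widths.pop() - 2  # drop sample name and population ID
    if allele_columns < 2 or allele_columns % 2 != 0:
        raise ValueError("expected two allele columns per locus")
    return {
        "NUMINDS": len(rows),
        "NUMLOCI": allele_columns // 2,
        "LABEL": 1,         # first column holds the sample name
        "POPDATA": 1,       # second column holds the population ID
        "ONEROWPERIND": 1,  # both alleles of every locus on one line
    }


output_rows = [
    "arg11B-11 7 4 4 1 1 2 3",
    "arg14B-7 21 1 1 2 2 2 3",
    "arg2B-4 9 1 1 2 2 2 3",
]
settings = mainparams_settings(output_rows)
for key, value in settings.items():
    print(f"#define {key} {value}")
```

For the three-sample example this gives 3 individuals and 3 loci. The printed lines are meant to be compared with, or copied into, an existing mainparams file rather than used as a complete one.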

Let me know if you do it differently!