Ask me for the most current version if you want to use any of these!
A few Perl scripts that take a SNP table and do the following:
1. Remove unwanted samples and rename the samples
2. Remove sites that do not have enough data
3. Order the sites based on a map
For ease of editing and uploading, I will just paste the whole code at the very bottom of this post. Feel free to modify or update any of these, or bug me about them. They took a bit of time to put together, so I hope they can be useful. Greg_B
#as above
perl 1_remove_unwanted_samples.pl table_all old2new_names table_my_samples
#modify-able, removes sites where
The initial table should look like:
contig  site  ref  14TB-2.Valign  2OTB-7.Valign
plastid_noIR|Consensusfrom_HA383_no_IR  1  G  NN  NN
plastid_noIR|Consensusfrom_HA383_no_IR  2  G  NN  NN
plastid_noIR|Consensusfrom_HA383_no_IR  3  C  NN  NN
plastid_noIR|Consensusfrom_HA383_no_IR  4  G  NN  NN
plastid_noIR|Consensusfrom_HA383_no_IR  5  A  NN  NN
plastid_noIR|Consensusfrom_HA383_no_IR  6  A  NN  NN
...
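To make step 2 (removing sites that do not have enough data) concrete, here is a rough sketch in Python. It is not the Perl script from this post. It assumes the table is whitespace-delimited, that the genotype columns start after contig, site and ref, and that NN marks a missing call, as the example table suggests. The min_called threshold and the second data row are made up for illustration.

```python
# Sketch only: not the Perl script from this post.
# Assumptions: whitespace-delimited table, genotypes start at column index 3,
# "NN" means missing data, and min_called is a hypothetical threshold.

def filter_sites(lines, min_called=1, missing="NN", first_sample_col=3):
    """Return the header plus every site with at least min_called non-missing genotypes."""
    kept = []
    for i, line in enumerate(lines):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if i == 0:
            kept.append(line)  # always keep the header row
            continue
        called = sum(1 for g in fields[first_sample_col:] if g != missing)
        if called >= min_called:
            kept.append(line)
    return kept


table = [
    "contig site ref 14TB-2.Valign 2OTB-7.Valign",
    "plastid_noIR|Consensusfrom_HA383_no_IR 1 G NN NN",
    "plastid_noIR|Consensusfrom_HA383_no_IR 2 G GG NN",  # made-up row for illustration
]
filtered = filter_sites(table, min_called=1)
for row in filtered:
    print(row)
```

With min_called=1, the first site is dropped because both samples are NN, and the made-up second site is kept.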
For 1_remove_unwanted_columns.pl you need a file that contains the names of the samples you want to keep. You can also rename them here; if you do not want to rename, either modify the script or make both columns the same. The first column is the sample name as it appears in the table, and the second is the new name.
ALB    Alberta
ARL    Arikara
CON2   Colorado
HA     Maiz Negro
HA369  HA369
...
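The keep-and-rename step can be sketched like this in Python. Again, this is an illustration and not the Perl script itself. It assumes the first whitespace-separated field of each line in the names file is the old name and the rest of the line is the new name (so a new name may contain a space), and that the first three table columns (contig, site, ref) are always kept. The small table in the example is hypothetical.

```python
# Sketch only: not the Perl script from this post.
# Assumptions: names file is "old_name new_name" per line; table is
# whitespace-delimited with three fixed leading columns (contig, site, ref).

def load_name_map(lines):
    """Map old sample name -> new sample name."""
    name_map = {}
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace only
        if len(parts) == 2:
            name_map[parts[0]] = parts[1].strip()
    return name_map


def keep_and_rename(table_lines, name_map, n_fixed=3):
    """Drop sample columns not in name_map and rename the ones that remain."""
    header = table_lines[0].split()
    keep = [i for i, name in enumerate(header) if i < n_fixed or name in name_map]
    out = []
    for row_num, line in enumerate(table_lines):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if row_num == 0:
            picked = [fields[i] if i < n_fixed else name_map[fields[i]] for i in keep]
        else:
            picked = [fields[i] for i in keep]
        out.append("\t".join(picked))  # tab-join so renamed samples may contain spaces
    return out


names = ["ALB Alberta", "HA369 HA369"]
table = [
    "contig site ref ALB UNWANTED HA369",  # hypothetical sample columns
    "c1 1 G AA TT CC",
]
result = keep_and_rename(table, load_name_map(names))
for row in result:
    print(row)
```

Making both columns of the names file the same, as with HA369 here, keeps a sample without renaming it.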
For ordering, you just need to use one of these files. The script may need modification depending on how things match up.
#This count depends on what your ordered contig file looks like. If there are no entries for unmapped contigs, then it will actually tell you something. In this case, the ordered contigs I posted have "no" and "no" for LG and CM for contigs that are not mapped, so those contigs will still be counted as mapped. It would still tell you if there were some name-matching problem.
#print "this many sites were ordered $c_found\nnot found: $not_found\n";
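The ordering step and the found / not found count can be sketched as follows in Python. This is not the Perl script from this post, and the map file layout is an assumption: three whitespace-separated columns (contig, LG, cM), with "no" in the LG and cM columns for unmapped contigs. Unlike the behaviour described in the comment above, this sketch skips the "no" entries, so unmapped contigs are counted as not found. All data in the example is made up.

```python
# Sketch only: not the Perl script from this post.
# Assumptions: map lines are "contig LG cM"; "no" marks an unmapped contig;
# the site table is whitespace-delimited with contig and site in columns 0 and 1.

def order_sites(site_lines, map_lines, unmapped="no"):
    """Sort sites by (LG, cM, site) and count how many could be placed."""
    positions = {}
    for line in map_lines:
        fields = line.split()
        if len(fields) < 3:
            continue
        contig, lg, cm = fields[:3]
        if lg == unmapped or cm == unmapped:
            continue  # treat placeholder entries as unmapped
        positions[contig] = (lg, float(cm))

    found = []
    not_found = 0
    for line in site_lines[1:]:  # site_lines[0] is the header
        fields = line.split()
        if not fields:
            continue
        pos = positions.get(fields[0])
        if pos is None:
            not_found += 1
            continue
        found.append((pos[0], pos[1], int(fields[1]), line))

    # LG is compared as a string here, so "10" sorts before "2";
    # convert it to a number first if your LG labels are numeric.
    found.sort(key=lambda t: t[:3])
    ordered = [site_lines[0]] + [t[3] for t in found]
    return ordered, len(found), not_found


contig_map = ["c2 1 5.0", "c1 1 10.0", "c3 no no"]  # made-up map
sites = [
    "contig site ref S1",
    "c1 7 A AA",
    "c2 3 G GG",
    "c3 1 C CC",
    "c4 1 T TT",
]
ordered, c_found, not_found = order_sites(sites, contig_map)
print(f"this many sites were ordered {c_found}\nnot found: {not_found}")
```

If not_found is unexpectedly large, a name-matching problem between the table and the map file is the first thing to check.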