SNP table parsing (Greg B.)

Ask me for the most current version if you want to use any of these!

A few Perl scripts that take a SNP table and do the following:
1. Remove unwanted samples and rename the samples
2. Remove sites that do not have enough data
3. Order the sites based on a map


For ease of editing and uploading, I will just paste the whole code at the very bottom of this post. Feel free to modify or update any of these, or bug me about them. They took a while to put together, so I hope they are useful. Greg_B

Usage here:

#step 1 above: remove unwanted samples and rename the rest
perl 1_remove_unwanted_samples.pl table_all old2new_names table_my_samples
#modifiable, removes sites where

The initial table should look like:

contig	site	ref	14TB-2.Valign	2OTB-7.Valign
plastid_noIR|Consensusfrom_HA383_no_IR	1	G	NN	NN
plastid_noIR|Consensusfrom_HA383_no_IR	2	G	NN	NN
plastid_noIR|Consensusfrom_HA383_no_IR	3	C	NN	NN
plastid_noIR|Consensusfrom_HA383_no_IR	4	G	NN	NN
plastid_noIR|Consensusfrom_HA383_no_IR	5	A	NN	NN
plastid_noIR|Consensusfrom_HA383_no_IR	6	A	NN	NN
...
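A minimal sketch of step 2 (removing sites that do not have enough data) is below. It is not Greg's script: it assumes the tab-separated layout shown above, treats `NN` as a missing call, and keeps a site only when at least `$MIN_CALLED` of the samples are called (0.5 here is a hypothetical cutoff). The subroutine names and the two-argument command line are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical cutoff: keep a site only if at least this fraction of
# samples have a called (non-NN) genotype.
my $MIN_CALLED = 0.5;

# Assumption: "NN" marks a missing call, as in the example table.
sub site_has_enough_data {
    my ($min_called, @genotypes) = @_;
    return 0 unless @genotypes;
    my $called = grep { $_ ne 'NN' } @genotypes;    # count of called genotypes
    return $called / @genotypes >= $min_called;
}

# Copy the header, then only the rows that pass the cutoff.
# Columns: contig, site, ref, then one genotype column per sample.
sub filter_sites {
    my ($in, $out, $min_called) = @_;
    my $header = <$in>;
    return unless defined $header;
    print {$out} $header;
    while (my $line = <$in>) {
        chomp $line;
        my (undef, undef, undef, @genotypes) = split /\t/, $line;
        print {$out} "$line\n" if site_has_enough_data($min_called, @genotypes);
    }
}

# Hypothetical usage: perl filter_sites.pl table_in table_out
if (@ARGV == 2) {
    my ($in_path, $out_path) = @ARGV;
    open my $in,  '<', $in_path  or die "Cannot read $in_path: $!\n";
    open my $out, '>', $out_path or die "Cannot write $out_path: $!\n";
    filter_sites($in, $out, $MIN_CALLED);
    close $out or die "Cannot close $out_path: $!\n";
}
```

If your table codes missing data differently (for example a single `N`), change the `ne 'NN'` test to match.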

For 1_remove_unwanted_columns.pl you need a tab-separated file that contains the names of the samples you want to keep. You can also rename them here; otherwise, modify the script or make both columns the same. The first column holds the names as they appear in the table, and the second holds the new names.

ALB	Alberta
ARL	Arikara
CON2	Colorado
HA	Maiz Negro
HA369	HA369
...
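Step 1 (keeping and renaming samples) could look roughly like this. Again, this is a sketch under assumptions, not the posted script: both files are tab-separated, the first three table columns are contig, site and ref, and any sample not listed in the names file is dropped. `read_name_map` and `select_samples` are hypothetical names; the command line uses the same argument order as the usage line above.

```perl
use strict;
use warnings;

# Read the two-column names file: old name <TAB> new name.
# Returns a hash ref mapping old -> new.
sub read_name_map {
    my ($fh) = @_;
    my %map;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;    # skip blank lines
        my ($old, $new) = split /\t/, $line, 2;
        $map{$old} = defined $new ? $new : $old;
    }
    return \%map;
}

# Copy the table, keeping contig/site/ref plus only the samples in
# the map, and write the new sample names into the header.
sub select_samples {
    my ($in, $out, $map) = @_;
    my $header = <$in>;
    return unless defined $header;
    chomp $header;
    my @cols = split /\t/, $header;
    # Column indices to keep: the three fixed columns, then mapped samples.
    my @keep = (0 .. 2, grep { exists $map->{ $cols[$_] } } 3 .. $#cols);
    my @new_header = map { $_ < 3 ? $cols[$_] : $map->{ $cols[$_] } } @keep;
    print {$out} join("\t", @new_header), "\n";
    while (my $line = <$in>) {
        chomp $line;
        my @fields = split /\t/, $line, -1;
        print {$out} join("\t", @fields[@keep]), "\n";
    }
}

# Hypothetical usage: perl select_samples.pl table_all old2new_names table_my_samples
if (@ARGV == 3) {
    my ($table_path, $names_path, $out_path) = @ARGV;
    open my $names, '<', $names_path or die "Cannot read $names_path: $!\n";
    my $map = read_name_map($names);
    close $names;
    open my $in,  '<', $table_path or die "Cannot read $table_path: $!\n";
    open my $out, '>', $out_path   or die "Cannot write $out_path: $!\n";
    select_samples($in, $out, $map);
    close $out or die "Cannot close $out_path: $!\n";
}
```

Kept samples stay in the order they appear in the table, not the order of the names file.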

For ordering you just need to use one of these files; the script may need modification depending on how the contig names match up.

#This count depends on what your ordered contig file looks like. If the file has no entries for unmapped contigs, then the count actually tells you something. The ordered contig files I posted have "no" in both the LG and CM columns for contigs that are not mapped, so those contigs will still be counted as mapped. The count would still tell you if there were a name-matching problem.
#print "this many sites were ordered $c_found\nnot found: $not_found\n";
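Step 3 (ordering sites by a map) might be sketched like this. The ordered contig file layout (one contig per line with LG and cM columns, already in map order) is an assumption, as are the subroutine names; only `$c_found` and `$not_found` come from the commented-out line above. Sites are sorted by their contig's position in the map file and then by site number, and sites on contigs missing from the map are put at the end.

```perl
use strict;
use warnings;

# Read the ordered contig file. Assumed (hypothetical) layout:
# contig <TAB> LG <TAB> cM, already sorted in map order.
# Returns a hash ref: contig name -> rank (line position) in the file.
sub read_contig_order {
    my ($fh) = @_;
    my %rank;
    my $i = 0;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^\s*$/;
        my ($contig) = split /\t/, $line;
        $rank{$contig} = $i++ unless exists $rank{$contig};
    }
    return \%rank;
}

# Sort table rows by contig rank, then by site position.
# Rows whose contig is not in the map go last, in their original order.
# Returns (array ref of sorted rows, $c_found, $not_found).
sub order_sites {
    my ($rows, $rank) = @_;
    my ($c_found, $not_found) = (0, 0);
    my (@found, @missing);
    for my $row (@$rows) {
        my ($contig, $site) = split /\t/, $row;
        if (exists $rank->{$contig}) {
            push @found, [ $rank->{$contig}, $site, $row ];
            $c_found++;
        } else {
            push @missing, $row;
            $not_found++;
        }
    }
    my @sorted = map { $_->[2] }
                 sort { $a->[0] <=> $b->[0] || $a->[1] <=> $b->[1] } @found;
    return ([ @sorted, @missing ], $c_found, $not_found);
}

# Hypothetical usage: perl order_sites.pl table_in ordered_contigs table_out
if (@ARGV == 3) {
    my ($table_path, $order_path, $out_path) = @ARGV;
    open my $ofh, '<', $order_path or die "Cannot read $order_path: $!\n";
    my $rank = read_contig_order($ofh);
    close $ofh;
    open my $in, '<', $table_path or die "Cannot read $table_path: $!\n";
    my $header = <$in>;
    die "Empty table: $table_path\n" unless defined $header;
    chomp(my @rows = <$in>);
    close $in;
    my ($sorted, $c_found, $not_found) = order_sites(\@rows, $rank);
    open my $out, '>', $out_path or die "Cannot write $out_path: $!\n";
    print {$out} $header;
    print {$out} "$_\n" for @$sorted;
    close $out or die "Cannot close $out_path: $!\n";
    print "this many sites were ordered $c_found\nnot found: $not_found\n";
}
```

Because contigs marked "no" for LG and CM still appear in the file, they get a rank and count toward `$c_found`, as described above; a nonzero `$not_found` then mostly points to contig names that do not match exactly.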
