Rose coded this up as a faster, more efficient way to combine all the SNP calls into one table. I've made a few modifications; hopefully it's not broken. Updates are likely in the future.
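Copy the script below and save it as bin/merge_ann.sh. Run
chmod u+x bin/merge_ann.sh
to make it executable. To run it, call
merge_ann.sh
from the directory that contains the snp_calls folder (if bin is not on your PATH, give the path to the script instead). It finds every file in snp_calls whose name contains “_calls_” and merges them into a single tab-separated table called radML_calls, with one row per contig and position and one column per sample ('-' where a sample has no call at that position). Warning: it overwrites, without asking, any files in the directory you run it in named list.calls, tmplist, tmplist2, merged_calls.list, merged_calls.prelim, tmp1, tmp2, tmp3, tmp4, samplelist, samplelist1 and header, and it deletes every file whose name starts with tmplist.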
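Because the script overwrites its scratch files silently, a quick check beforehand can save a surprise. The function below is a hypothetical helper (preflight_check is our name, not part of Rose's script): it lists the inputs using the same _calls_ filter as the script and reports which of the script's scratch and output files already exist in the current directory. The file names are read off the script itself.

```shell
#!/bin/bash
# Hypothetical helper, NOT part of merge_ann.sh.
# Run it from the directory you plan to run merge_ann.sh in.
preflight_check() {
  if [ ! -d snp_calls ]; then
    echo "no snp_calls directory here" >&2
    return 1
  fi
  # Inputs: same selection rule as the script (file names containing _calls_)
  echo "inputs:"
  ls snp_calls/ | grep _calls_ | sed 's/^/  snp_calls\//'
  # Scratch/output files merge_ann.sh writes; tmplist* mirrors its "rm tmplist*"
  local f
  for f in list.calls tmplist* merged_calls.list merged_calls.prelim \
           tmp1 tmp2 tmp3 tmp4 samplelist samplelist1 header radML_calls; do
    # An unmatched tmplist* glob stays literal, so -e is simply false for it
    [ -e "$f" ] && echo "will overwrite: $f"
  done
  return 0
}
```

Paste the function into your shell (or source a file that contains it) and run preflight_check before merge_ann.sh; anything reported under "will overwrite" should be moved or renamed first if you want to keep it.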
merge_ann.sh:
#!/bin/bash
#RA
# Force byte-order collation so that sort and join agree on key order
export LC_ALL=C
# Build the list of positions (contig, position) to which the genotypes will be added
ls snp_calls/ | grep _calls_ | sed 's/^/snp_calls\//' > list.calls
rm -f tmplist*
while read line
do
awk '{print $1 "\t" $2}' $line >> tmplist
sort -f -k1,1 tmplist | uniq > tmplist2
cp tmplist2 tmplist
done < list.calls
cp tmplist merged_calls.list
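# Key each position as contig_pos (followed by the contig and position columns).
# Sort with -f so the order matches what join -i expects (and the -f sort of tmp1 below)
awk '{print $1 "_" $2 "\t" $1 "\t" $2}' merged_calls.list | sort -f -k1,1 > merged_calls.prelim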
while read line
do
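# This sample's calls, keyed the same way (contig_pos, call)
awk '{print $1 "_" $2 "\t" $3}' "$line" | sort -f -k1,1 > tmp1
# Left outer join: keep every position, writing '-' where this sample has no call
join -i -a 1 -e '-' -o '1.1,2.2' merged_calls.prelim tmp1 > tmp2
# Append this sample's column to the table built so far
join -i -a 1 merged_calls.prelim tmp2 > tmp3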
cp tmp3 merged_calls.prelim
done < list.calls
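# Order rows numerically by contig, then by position (-nk2,3 only compared the contig numerically)
sort -k2,2n -k3,3n tmp3 > tmp4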
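# Sample names for the header: strip the directory and the .sam extension
sed 's/snp_calls\///' list.calls > samplelist
sed 's/\.sam//' samplelist > samplelist1
# Header row: "list contig pos" followed by the sample names, in the order the columns were added
awk '{printf ("%s%s", tab, $1); tab=" "} END {print ""}' samplelist1 | sed s/^/"list contig pos "/ > header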
cat header tmp4 | tr ' ' '\t' > radML_calls
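To see what the join loop actually does, here is a condensed sketch on made-up data. It assumes, as the script's awk commands do, that each calls file has contig, position and call in its first three columns; the sample file names, the calls, and the scratch names table, calls and col are invented for illustration. It also swaps the script's tmplist loop for a single sort -u, which yields the same set of positions.

```shell
#!/bin/bash
# Toy walk-through of the join loop in merge_ann.sh.
# Sample names, contigs and calls are made up for illustration.
export LC_ALL=C
work=$(mktemp -d)
cd "$work" || exit 1

# Assumed input layout: contig, position, call (tab-separated)
printf '1\t100\tA\n1\t250\tG\n' > sampleA_calls_ml
printf '1\t100\tT\n2\t40\tC\n'  > sampleB_calls_ml

# Union of all positions, keyed as contig_pos, sorted the way join -i expects
awk '{print $1 "_" $2 "\t" $1 "\t" $2}' sampleA_calls_ml sampleB_calls_ml \
  | sort -f -u -k1,1 > table

for f in sampleA_calls_ml sampleB_calls_ml; do
  awk '{print $1 "_" $2 "\t" $3}' "$f" | sort -f -k1,1 > calls
  # Left outer join: keep every position, '-' where this sample has no call
  join -i -a 1 -e '-' -o '1.1,2.2' table calls > col
  # Append this sample's column to the table
  join -i table col > table.new
  mv table.new table
done

# Numeric order by contig, then position; join's spaces become tabs
sort -k2,2n -k3,3n table | tr ' ' '\t' > merged.tsv
cat merged.tsv
```

It prints three rows: 1_100 with calls A and T, 1_250 with G and '-', and 2_40 with '-' and C. Each sample contributes one column in the order its file is processed, which is why the header has to list the samples in the same order as list.calls.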