Merge SNP calls (Greg B.)

Rose coded this up as a faster, more efficient way to combine all the SNP calls into one table. I’ve made a few modifications; hopefully it’s not broken. Updates are likely in the future.


Copy the script below and save it as bin/merge_ann.sh. Run

 chmod u+x bin/merge_ann.sh 

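to make it executable. To run it, simply call

merge_ann.sh

from the directory that contains the folder “snp_calls”. It finds all the files in “snp_calls” whose names contain “_calls_” and merges them into a file called radML_calls. Warning: this overwrites any files in the directory you run it in called list.calls, tmplist, tmplist2, tmp1, tmp2, tmp3, tmp4, samplelist, samplelist1, header, merged_calls.list and merged_calls.prelim, and it deletes anything whose name starts with tmplist.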
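
Because of that warning, it can be worth checking the working directory before a run. The following is only a sketch: check_clobber is a hypothetical helper, not part of merge_ann.sh, and its file list is an assumption read off the redirects in the script, so it needs updating if the script changes.

```shell
#!/bin/bash
# Hypothetical helper (not part of merge_ann.sh): report files in the current
# directory that the merge script would overwrite. The names below are an
# assumption taken from the script's output redirects.
check_clobber() {
	local found=0
	local f
	for f in list.calls tmplist tmplist2 tmp1 tmp2 tmp3 tmp4 \
		samplelist samplelist1 header \
		merged_calls.list merged_calls.prelim radML_calls
	do
		if [ -e "$f" ]; then
			echo "would be overwritten: $f"
			found=1
		fi
	done
	# exit status 0 means nothing would be overwritten
	return "$found"
}

# Possible usage: only merge when nothing would be lost.
# check_clobber && merge_ann.sh
```

The function only reports; it never deletes or moves anything, so it is safe to run anywhere.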

merge_ann.sh:

#!/bin/bash
#RA
# byte-wise collation, exported so that sort and join agree on ordering
export LC_ALL=C
# this part generates the list of positions to which the genotypes will be added
ls snp_calls/ | grep _calls_ | sed s/^/'snp_calls\/'/ > list.calls
rm -f tmplist*
while read -r line
do
	# append contig and position from each calls file, then drop duplicates
	awk '{print $1 "\t" $2}' "$line" >> tmplist
	sort -f -k1,1 tmplist | uniq > tmplist2
	cp tmplist2 tmplist
done < list.calls
cp tmplist merged_calls.list

# key every position as contig_pos, then add one column per calls file:
# column 3 of the file where the position is present, '-' where it is missing
awk '{print $1 "_" $2 "\t" $1 "\t" $2}' merged_calls.list | sort -f -k1,1 > merged_calls.prelim
while read -r line
do
	awk '{print $1 "_" $2 "\t" $3}' "$line" | sort -f -k1,1 > tmp1
	join -i -a 1 -e '-' -o '1.1,2.2' merged_calls.prelim tmp1 > tmp2
	join -i -a 1 merged_calls.prelim tmp2 > tmp3
	cp tmp3 merged_calls.prelim
done < list.calls

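# sort numerically on fields 2-3 (contig, pos)
sort -nk2,3 tmp3 > tmp4
# sample names: strip the folder prefix and the .sam suffix from the file names
sed s/'snp_calls\/'// list.calls > samplelist
sed 's/\.sam//' samplelist > samplelist1
# header line: "list contig pos" followed by one column name per sample
awk '{printf ("%s%s", tab, $1); tab=" "} END {print ""}' samplelist1 | sed s/^/"list contig pos "/ > header
# prepend the header and convert spaces to tabs
cat header tmp4 | tr ' ' '\t' > radML_calls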
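
The two join calls in the loop are the core of the script, so here is a toy run of that step on its own. This is a sketch, not the script itself: the file names positions, sample1, col and merged and the genotype values are made up for illustration, and it works in a throwaway directory from mktemp.

```shell
#!/bin/bash
export LC_ALL=C
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# toy position table: key, contig, pos (already sorted on the key)
printf 'c1_10\tc1\t10\nc1_20\tc1\t20\nc2_5\tc2\t5\n' > positions
# toy sample with calls at only two of the three positions
printf 'c1_10\tAA\nc2_5\tAG\n' > sample1

# step 1: one genotype per position; -a 1 keeps positions the sample lacks
# and -e '-' fills the empty genotype field for them
join -a 1 -e '-' -o '1.1,2.2' positions sample1 > col
# step 2: append that column to the growing table
join -a 1 positions col > merged

cat merged
# prints:
# c1_10 c1 10 AA
# c1_20 c1 20 -
# c2_5 c2 5 AG
```

join writes space-separated output, which is why the script converts spaces to tabs only at the very end.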
