Filtering unmapped/unaligned reads from SAM files (Rose)

This is a post about some time-saving help Chris Grassa gave me.

STACKS (post coming soon) doesn’t deal well with all of the unaligned reads in SAM files, so I tried using PICARD to remove them. PICARD, however, doesn’t like the SAM output of BWA. Chris G showed me how to do it much more easily with the Unix command awk. This is his command for my file 1076.sam:

awk '{if($0 ~ /^@/) print $0; else if($5>20) print $0}' 1076.sam > filt/1076.sam

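# literally: if the line begins with '@' (a SAM header line), print the line; else if column 5 (the mapping quality, MAPQ) is greater than 20, print the line. Unmapped reads normally have a mapping quality of 0, so they are dropped, along with any reads that mapped with a quality of 20 or less.
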
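To see what the filter keeps and drops, here is a small sketch run on a made-up SAM file. The file names toy.sam and toy.filt.sam, the read names and all the values in them are invented for illustration; only the awk line is the command from this post.

```shell
# Build a tiny, made-up SAM file: one header line and three reads.
# Fields are tab-separated; column 5 is the mapping quality (MAPQ).
printf '@SQ\tSN:chr1\tLN:1000\n' > toy.sam
printf 'read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> toy.sam   # mapped, MAPQ 60
printf 'read2\t0\tchr1\t200\t10\t4M\t*\t0\t0\tACGT\tIIII\n' >> toy.sam   # mapped, MAPQ 10
printf 'read3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n' >> toy.sam          # unmapped, MAPQ 0

# Same filter as in the post: keep header lines and reads with MAPQ > 20.
awk '{if($0 ~ /^@/) print $0; else if($5>20) print $0}' toy.sam > toy.filt.sam

# Show what survived the filter.
cat toy.filt.sam
```

Only the @SQ header line and read1 should come through: read2 is mapped but falls below the cutoff, and read3 is unmapped.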
Combining awk with bash variables can be awkward (both use $), so instead of a looping bash script, Chris directed me to make a file, filtsam.sh, containing one line per sample, like this:

awk '{if($0 ~ /^@/) print $0; else if($5>20) print $0}' 1076.sam > filt/1076.sam
awk '{if($0 ~ /^@/) print $0; else if($5>20) print $0}' 1095.sam > filt/1095.sam
awk '{if($0 ~ /^@/) print $0; else if($5>20) print $0}' 1100.sam > filt/1100.sam
awk '{if($0 ~ /^@/) print $0; else if($5>20) print $0}' 1103.sam > filt/1103.sam
awk '{if($0 ~ /^@/) print $0; else if($5>20) print $0}' 1107.sam > filt/1107.sam

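(I did this in my favourite script editor in Windows, Notepad++, and FTP’d it to the Linux machine. If you do the same, set the line endings to Unix (LF) in Notepad++, because bash trips over Windows line endings.) Then I made it executable (chmod +x filtsam.sh) and ran it (bash filtsam.sh) from the directory containing the input files. The filt/ output directory has to exist before you run it (mkdir filt).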
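For anyone who would rather not write one line per sample, the same filtering can probably be done with a short bash loop. This is only a sketch and not what Chris suggested: the function name filter_sams is made up, and it assumes the input files are all the *.sam files in the current directory and that the output should go to filt/ as above. Keeping the awk program in single quotes stops bash from touching $0 and $5.

```shell
#!/usr/bin/env bash
# Sketch: filter every .sam file in the current directory into filt/.
# filter_sams is a hypothetical helper name, not part of any tool.
filter_sams() {
    mkdir -p filt                      # create the output directory if needed
    for sam in *.sam; do
        [ -e "$sam" ] || continue      # nothing to do if no .sam files match
        # Keep header lines (starting with @) and reads with MAPQ > 20.
        awk '{if($0 ~ /^@/) print $0; else if($5>20) print $0}' "$sam" > "filt/$sam"
    done
}

filter_sams
```

Saved under a name of your choosing and run with bash from the directory containing the input files, this should give the same filt/ output as the one-line-per-sample script.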

It worked beautifully. Thanks Chris!

Rieseberg Lab Resources

