bam file to fastq conversion (Chris)

The GSC supplies our raw sequence reads as bam files. Some programs will take unaligned bam files as input (bwa is one), but many still do not. FASTQ is a much more flexible format in that respect. Here is a link to bam2fastq, a simple little conversion program:

http://www.hudsonalpha.org/gsl/software/bam2fastq.php

Continuing my transcriptome assembly narrative, I would log on to the cluster:

ssh -t redbeard3@zoology.ubc.ca -2 cluster

move to the directory that I put the raw data in:

cd anomalus_assembly/

and try to convert the file:

bam2fastq 70CG7AAXX_2_ACCCAG.bam

Only to find that the program isn’t installed! It’s OK, we can just make a little local installation.

First, let’s make a ‘bin’ directory, where we can put all of the project’s software:

mkdir bin

we’ll need to fetch the source code:

cd bin

wget http://www.hudsonalpha.org/gsl/software/bam2fastq-1.1.0.tgz

extract it:

tar -xzvf bam2fastq-1.1.0.tgz

compile it:

cd bam2fastq-1.1.0

make

and clean up the mess:

mv bam2fastq ../

cd ..

rm -r bam2fastq-1.1.0*

Now we can get back to that conversion:

cd ..

bin/bam2fastq -o Ano1495#.fq --no-filtered 70CG7AAXX_2_ACCCAG.bam
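
A note on --no-filtered: according to the bam2fastq documentation, it skips reads that are marked as failing QC checks, and in a BAM file that mark is FLAG bit 0x200 (decimal 512). To see how many reads the option will drop, here is a minimal sketch. It assumes samtools is installed (this post doesn’t otherwise use it), and count_qcfail is a hypothetical helper name, not part of either tool. With samtools alone, samtools view -c -f 0x200 70CG7AAXX_2_ACCCAG.bam should report the same number.

```shell
# count_qcfail: hypothetical helper, not part of bam2fastq or samtools.
# Reads SAM text on stdin and prints how many records have FLAG bit 0x200
# (512, "not passing quality controls") set. Header lines (@...) are skipped.
count_qcfail() {
  awk -F'\t' '
    !/^@/ && int($2 / 512) % 2 == 1 { n++ }  # test bit 0x200 without the gawk-only and()
    END { print n + 0 }                      # n + 0 prints 0 when nothing matched
  '
}

# Usage (assumes samtools is on the PATH):
#   samtools view 70CG7AAXX_2_ACCCAG.bam | count_qcfail
```

Each mate of a pair is its own record, so this counts reads, not pairs.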

Oh man, this conversion is taking a while. I should have run it inside ‘screen’…
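
For the next long job, one alternative to screen is the standard nohup command: it starts the job so that it keeps running if the SSH session hangs up, and a redirection sends its output to a log file. Below is a minimal sketch; run_nohup and the log name bam2fastq.log are hypothetical choices for illustration.

```shell
# run_nohup LOGFILE COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the background under nohup, so a dropped
# SSH connection (SIGHUP) won't kill it, with stdout and stderr sent to LOGFILE.
# The job's PID is left in $! for the caller.
run_nohup() {
  logfile=$1
  shift
  nohup "$@" > "$logfile" 2>&1 &
}

# Usage for this post's conversion:
#   run_nohup bam2fastq.log bin/bam2fastq -o 'Ano1495#.fq' --no-filtered 70CG7AAXX_2_ACCCAG.bam
#   echo "started PID $!"
#   tail bam2fastq.log    # check progress later, even after logging back in
```

Unlike screen, this doesn’t let you reattach to the job interactively; you check on it through the log file and ps instead.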

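Alright, it’s done! bam2fastq replaced the # in the output name with _1 and _2, writing the first and second reads of each pair to separate FASTQ files. Now that I have some handy FASTQ files, I’ll remove that bam file to save disk space: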

ls

which prints:

70CG7AAXX_2_ACCCAG.bam Ano1495_1.fq Ano1495_2.fq bin
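
Because that rm can’t be undone, it may be worth checking the FASTQ files first. Here is a minimal sketch, assuming uncompressed FASTQ with exactly four lines per record; fastq_reads and check_pairs are hypothetical helper names. For paired-end output, the _1 and _2 files should contain the same number of reads, and a fractional count suggests a truncated or malformed file.

```shell
# fastq_reads FILE
# Hypothetical helper: prints the number of records in an uncompressed FASTQ
# file, assuming exactly four lines per record. A non-whole result means the
# line count is not a multiple of four.
fastq_reads() {
  awk 'END { print NR / 4 }' "$1"
}

# check_pairs FILE_1 FILE_2
# Hypothetical helper: succeeds only if both mate files exist, are non-empty,
# and contain the same number of reads.
check_pairs() {
  [ -s "$1" ] && [ -s "$2" ] || return 1
  n1=$(fastq_reads "$1")
  n2=$(fastq_reads "$2")
  echo "$1: $n1 reads, $2: $n2 reads"
  [ "$n1" = "$n2" ]
}

# Usage: only delete the BAM if the check passes.
#   check_pairs Ano1495_1.fq Ano1495_2.fq && rm 70CG7AAXX_2_ACCCAG.bam
```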

rm 70CG7AAXX_2_ACCCAG.bam

2 thoughts on “bam file to fastq conversion (Chris)”

  1. Great post, Chris – very helpful. A small note (I know you know this, but many others don’t): for Unix processes that unexpectedly take a long time, if you have run the process in the foreground (without an & at the end of the command), simply type ctrl-z (ctrl and z at the same time), which pauses the process, then type
    bg
    which restarts the process in the background. I believe the process will generally continue even if you log out of sciborg (it has for me), so they must have things set up not to terminate processes on logout. However, on other machines, or to make sure that it is not terminated when you log out, you can do:
    disown -h [process id]
    or
    disown -r -h
    for all running jobs.

  2. Thanks Chris, this worked very smoothly for me! One note: the documentation for bam2fastq (http://www.hudsonalpha.org/gsl/software/bam2fastq.php) is a bit confusing about what happens to the QC-failed reads.
    “--filtered
    --no-filtered
    Reads that are marked as failing QC checks will (will not) be extracted. [Default: extract filtered reads]”
    I originally read this as saying that the option “--no-filtered” would extract all reads (only a problem if the quality information is then discarded), but in fact this option means that only the reads that passed QC checks are included in the new fastq files.
