Blast2GO

This describes how to run Blast2GO on a server using the b2g4pipe command-line pipeline (b2gpipe) and a local database. With a local database, Blast2GO becomes a viable option for annotating large FASTA files; otherwise it is much too slow. The database is currently set up on an AdapTree server. It took me a while to troubleshoot, and you may run into different problems, but these notes should help you avoid some of the issues I hit. The b2g Google group is a good place for troubleshooting. Many of these instructions come from http://www.blast2go.com/b2glaunch/resources/35-localb2gdb

1. Download the blast2go local installation package and the blast2go pipe, then unzip both:

wget http://www.blast2go.com/data/blast2go/local_b2g_db_tutorial_0809.zip
wget http://www.blast2go.com/data/blast2go/b2g4pipe_v2.5.zip
unzip local_b2g_db_tutorial_0809.zip
unzip b2g4pipe_v2.5.zip

2. If you don't already have MySQL, install a MySQL database server:

Download and install a MySQL database server, for example from
http://dev.mysql.com/downloads, or on Ubuntu (Linux) with:
sudo apt-get install mysql-server

You can check that it is installed on the server:
> mysql -u <name of user e.g. root> -p
><enter password for root>
mysql>
mysql> exit;
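
Before going further, it can save time to confirm that the command-line tools used in the later steps are on your PATH. This is a minimal sketch, assuming a bash-like shell. The helper name check_tools is made up for illustration, and the tool list in the comment is taken from the commands used in this guide.

```shell
# check_tools TOOL...
# Print "missing: TOOL" for every command that is not found on PATH.
# Returns 0 if everything was found, 1 otherwise. (Hypothetical helper.)
check_tools() {
    local missing=0
    local tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example call with the tools used in this guide:
# check_tools wget unzip gzip mysql mysqlimport java blastall
```

If nothing is printed, all the listed tools were found.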

3a. You can set up the database with the shell script local_b2g_db_tutorial/download_and_install.sh. It can take a while, so run it inside something like byobu. Before running it, use a text editor to change these settings:

godbname=go_201109-assocdb-data (change to your version)
dbname=b2g
dbuser=root
dbpass=<your root password>
dbhost=localhost
path=<your path to b2g databases>
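
If you would rather change these settings from the command line than in an editor, something like the following may work. It is only a sketch: set_var is a hypothetical helper, and it assumes each setting sits on its own line beginning with KEY= (as in the list above) and that the new value contains no "|", "&" or backslash characters, which sed would treat specially here.

```shell
# set_var FILE KEY VALUE
# Rewrite the line "KEY=..." in FILE as "KEY=VALUE". (Hypothetical helper.)
# The pattern is anchored at the start of the line, so changing "dbname"
# does not touch "godbname".
set_var() {
    local file="$1"
    local key="$2"
    local value="$3"
    local tmp
    local status
    tmp=$(mktemp) || return 1
    # Write the edited copy to a temp file, then copy it back over the
    # original so the script keeps its permissions.
    sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}

# Example calls:
# set_var local_b2g_db_tutorial/download_and_install.sh dbname b2g
# set_var local_b2g_db_tutorial/download_and_install.sh dbhost localhost
```

Check the file afterwards (for example with grep '^db' on the script) to confirm the values are what you expect.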

Alternative to shell program (3b-6b):
3b. Download all necessary database files (e.g. with wget <file_url>):
● http://archive.geneontology.org/latest-full/go_<YYYYMM>-assocdb-data.gz
● ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz
● ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene2accession.gz
● ftp://ftp.pir.georgetown.edu/databases/idmapping/idmapping.tb.gz

4b. Decompress all 4 files (e.g. linux-shell> gzip -d <file>.gz)
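
Instead of decompressing the four downloads one at a time, a small loop can do them all. This is a sketch that assumes the .gz files are together in one directory; gunzip_all is a made-up helper name.

```shell
# gunzip_all [DIR]
# Decompress every .gz file in DIR (default: current directory).
# gzip -d replaces each file.gz with the uncompressed file.
# (Hypothetical helper.)
gunzip_all() {
    local dir="${1:-.}"
    local f
    for f in "$dir"/*.gz; do
        [ -e "$f" ] || continue   # the glob matched nothing
        gzip -d "$f" || return 1
    done
}

# Example call:
# gunzip_all <your path to b2g databases>
```

Make sure there is enough free disk space first, since the uncompressed files are considerably larger than the downloads.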

5b. Create the local database (cd to the local_b2g_db_tutorial directory first). This can take a while, so run it in byobu, for example.

mysql -h localhost -p -u root  b2g < b2g_db.sql
mysql -h localhost -p -u root  b2g < go_201109-assocdb-data  # change to your version

6b. Import the next two files you downloaded from NCBI with:
mysqlimport -u root -p b2g --fields-terminated-by='\t' --local <yourpath>/gene2accession
mysqlimport -u root -p b2g --fields-terminated-by='\t' --local <yourpath>/gene_info

7. Log in to mysql as root to check that the database has loaded.

mysql -u root -p -A
use b2g;
show tables;

#You should see something like
+-------------------------------+
| Tables_in_b2g                 |
+-------------------------------+
| assoc_rel                     |
| association                   |
| association_isoform           |
| association_property          |
| association_qualifier         |
| association_species_qualifier |
| db                            |
| dbxref                        |
| evidence                      |
| evidence_dbxref               |
| gene2accession                |
| gene_info                     |
| gene_product                  |
| gene_product_ancestor         |
| gene_product_count            |
| gene_product_dbxref           |
| gene_product_homology         |
| gene_product_homolset         |
| gene_product_phylotree        |
| gene_product_property         |
| gene_product_seq              |
| gene_product_subset           |
| gene_product_synonym          |
| gi2uniprot                    |
| graph_path                    |
| graph_path2term               |
| homolset                      |
| instance_data                 |
| intersection_of               |
| phylotree                     |
| phylotree_property            |
| relation_composition          |
| relation_properties           |
| seq                           |
| seq_dbxref                    |
| seq_property                  |
| source_audit                  |
| species                       |
| term                          |
| term2term                     |
| term2term_metadata            |
| term_audit                    |
| term_dbxref                   |
| term_definition               |
| term_property                 |
| term_subset                   |
| term_synonym                  |
+-------------------------------+
47 rows in set (0.01 sec)

mysql> exit;

8. Run ImportPIR to import the PIR ID mapping file (idmapping.tb). You must do this even if you used the shell script to set up the database. This can take a while.

java -cp <your path to b2g pipe>/b2g4pipe/blast2go.jar:<your path to b2g pipe>/ext/mysql-connector-java-3.0.11-stable-bin.jar es.blast2go.prog.util.ImportPIR <your path>/idmapping.tb localhost b2g root <your root password>

9. Check that the data is loading into the database:
mysql -u root -p
mysql> use b2g
mysql> select count(gi) from gi2uniprot;

#you should see something like
+-----------+
| count(gi) |
+-----------+
|  32834398 |
+-----------+
1 row in set (0.00 sec)

mysql> exit;

10. Using a text editor, edit the b2gPipe.properties file so that the database host is localhost and the database name is b2g.

11. Run blastx against an NCBI protein database (e.g. Chris's green plant protein database) and request XML output (-m 7). For example:

blastall -p blastx -d <your formatted protein database> -i <your sequences>  -m7  -e 1e-10 -a 5 -b 20 > <your file.xml>

12. Run mapping and annotation using the local database.

export B2G4PIPEPATH=<your path to b2gpipe>

java -cp "$B2G4PIPEPATH/ext/*:$B2G4PIPEPATH/*" \
es.blast2go.prog.B2GAnnotPipe \
-in  <your file.xml> \
-out <your outfile prefix> \
-prop $B2G4PIPEPATH/b2gPipe.properties -annot -dat

**Note: there are Java memory limitations. You can either raise Java's maximum heap size with the -Xmx option (for example, java -Xmx4g ...) or use this script to break the XML file into smaller pieces:

wget http://www.blast2go.com/data/blast2go/splitxmlblast_v2.zip

#unpack and run

splitxmlblast <number of sequences/file> <your file.xml>
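
For large inputs, two small helpers can make the memory workaround in step 12 easier to apply. This is a rough sketch with several assumptions. It assumes the BLAST XML has one <Iteration> opening tag per query, each on its own line. It also assumes the split pieces end in .xml and sit together in one directory; the splitter's actual output naming is not covered here, so adjust the glob if needed. The function names, the JAVA override and the 4g default heap are illustrative choices, not part of b2g4pipe.

```shell
# count_queries FILE
# Print the number of query sequences in a BLAST XML (-m 7) file by
# counting lines that contain an <Iteration> opening tag.
count_queries() {
    grep -c '<Iteration>' "$1"
}

# annotate_chunks DIR [HEAP]
# Run the step-12 command once per *.xml file in DIR, with the Java
# maximum heap set to HEAP (default 4g, an arbitrary example value).
# B2G4PIPEPATH must be set as in step 12. Set JAVA=echo for a dry run
# that only prints the arguments.
annotate_chunks() {
    local dir="$1"
    local heap="${2:-4g}"
    local xml
    for xml in "$dir"/*.xml; do
        [ -e "$xml" ] || continue   # the glob matched nothing
        "${JAVA:-java}" "-Xmx$heap" \
            -cp "$B2G4PIPEPATH/ext/*:$B2G4PIPEPATH/*" \
            es.blast2go.prog.B2GAnnotPipe \
            -in "$xml" \
            -out "${xml%.xml}" \
            -prop "$B2G4PIPEPATH/b2gPipe.properties" -annot -dat || return 1
    done
}
```

For example, count_queries <your file.xml> gives a total to divide up when choosing <number of sequences/file>. Running annotate_chunks with JAVA=echo set shows what would be run for each piece without starting Java.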

13. Import the results into the desktop application and run analysis.