Batch Sequence Read Archive Submissions

Getting all of our data uploaded to the SRA is important. Sharing our data publicly is good practice whenever possible, and just as importantly, the SRA gives us a free off-site backup of our raw data.

We keep a spreadsheet of the lab’s sequencing data here: http://bit.ly/17Z4X1P
*IMPORTANT*: The spreadsheet is curated by a few members of the lab and is not complete. Your data may not be listed; if it is not, please do your best to add it. You can email me and Sebastien with any questions you might have.

I’ve added a column that indicates the submission status of each sample in the Sunflower Relatives and Wild Sunflowers tabs, so you can tell if your data has already been submitted.

Submitting Your Samples
First, check to make sure your samples are in the spreadsheet. Again, if they are not, please add them, along with whatever info you have about them. You can add new columns if you wish.

*** THIS IS OUTDATED INFO NOW, THE SRA HAS RE-VAMPED THEIR SUBMISSION PROCESS. PLEASE SEE MY LATER POST ON SRA SUBMISSIONS. ***

Then, once your data is in the spreadsheet, check the submission status column. If your samples are unsubmitted, fill out a batch sample submission form. Here is an example: http://bit.ly/11zeeGz
The first two columns are mandatory; the rest are optional. You can add as many other columns of metadata as you like. The more info you include about each sample, the better.
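If you want to double-check a form before sending it to me, the rule above (first two columns filled in, everything else optional) is easy to test. This is a minimal Python sketch, assuming the form has been exported as tab-delimited text with a single header row. The column names and sample values in the demo are hypothetical placeholders, not the real form's headers.

```python
import csv
import io
from typing import List, TextIO, Tuple


def check_batch_form(handle: TextIO, delimiter: str = "\t") -> List[Tuple[int, List[str]]]:
    """Return (row_number, row) for each data row missing either of the first two fields.

    Assumes a tab-delimited export with one header row. Row numbers treat the
    header as row 1, so they line up with spreadsheet row numbers.
    Completely blank rows (e.g. trailing empty lines) are ignored.
    """
    reader = csv.reader(handle, delimiter=delimiter)
    next(reader, None)  # skip the header row
    problems = []
    for row_number, row in enumerate(reader, start=2):
        if not any(field.strip() for field in row):
            continue  # ignore fully empty rows
        # Pad short rows so a missing trailing field counts as blank.
        first_two = (row + ["", ""])[:2]
        if any(not field.strip() for field in first_two):
            problems.append((row_number, row))
    return problems


if __name__ == "__main__":
    # Hypothetical three-column form; for a real file, use instead:
    #   with open("batch_samples.tsv", newline="") as handle: check_batch_form(handle)
    demo = io.StringIO(
        "name\torganism\tnotes\n"
        "sample_A\torganism_X\t\n"
        "\torganism_X\tmissing name\n"
    )
    for row_number, row in check_batch_form(demo):
        print(f"row {row_number} is missing a mandatory field: {row}")
```

An empty result means every non-blank row has both mandatory fields filled in.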

Once you have a batch sample submission ready, please send it to me so that I can submit it to the SRA. We can talk about what project your samples are part of, and whether or not we need to create a new SRA BioProject to house your samples.

Once the BioSample submission process is complete, we will have BioSample and BioProject IDs to link your raw data to.

Submitting Your Data
Next, you will need to fill out these four SRA submission spreadsheets:
basic submission info
BioProject and BioSample info *
Experiments **, ***
Runs ***

I’ve added explanations for some of the columns, but if you are having trouble getting any of this info, just send me an email and I will help you out. The more you can fill in on these sheets, the better, but don’t worry about getting everything; I will take care of any columns or fields that you leave blank. I will also md5sum the files myself, so don’t worry about doing that.
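You don’t need to do this step yourself, but if you would like to check your files before handing them over, here is roughly what it amounts to. This is a minimal Python 3 sketch using the standard `hashlib` module. It computes the same hex MD5 digest that the `md5sum` command prints, reading each file in chunks so large FASTQ files don’t need to fit in memory.

```python
import hashlib
import sys


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks by default.

    Chunked reading keeps memory use flat even for multi-gigabyte files.
    """
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel keeps calling read() until it returns b"" (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    # Print "<digest>  <filename>", the same layout md5sum uses.
    for file_path in sys.argv[1:]:
        print(f"{md5sum(file_path)}  {file_path}")
```

Saved as, say, `md5_files.py` (a hypothetical name), it can be run as `python md5_files.py sample_R1.fastq.gz sample_R2.fastq.gz`, where the file names are placeholders.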

Once you’ve filled in all you can, get in contact with me and I will make sure your data is submitted as soon as you would like. I can arrange to have the files made public immediately, or to have them embargoed for a specific number of weeks or months, or until a specific date of your choice.

* Note that the BioSamples (and if necessary, the BioProject) need to be created before you submit the files. This is why the batch sample submission process must be completed before data submission can begin.
** The SRA’s hierarchy of data: BioProject<-BioSample<-Experiment<-Run<-data files
A Run can have multiple data files, an Experiment can have multiple Runs, a BioSample can have multiple associated Experiments, and a BioProject can have multiple BioSamples.
*** “Experiment” refers to a sequencing experiment. For the SRA, Runs are defined on a per-experiment, per-sample basis: a Run should never be associated with multiple Experiments or BioSamples. Two samples sequenced at the same time still belong to separate Runs.
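To make the one-parent rule for Runs concrete, here is a small Python sketch that flags any Run linked to more than one Experiment or BioSample. It assumes you have a flat list of (run, experiment, BioSample) labels, one row per data file, for example pulled from your own copy of the Runs sheet. The function name, row layout, and labels are hypothetical and do not correspond to the SRA’s own column names or accession formats.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical row layout: (run_label, experiment_label, biosample_label),
# one row per data file. This is an assumption, not the SRA sheet format.
Row = Tuple[str, str, str]


def find_shared_runs(rows: Iterable[Row]) -> Dict[str, Set[Tuple[str, str]]]:
    """Return every Run linked to more than one (Experiment, BioSample) pair.

    Several data files may share a Run, so repeated identical rows are fine;
    a Run is only flagged when its rows disagree about its parent.
    """
    parents: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)
    for run, experiment, biosample in rows:
        parents[run].add((experiment, biosample))
    return {run: links for run, links in parents.items() if len(links) > 1}


if __name__ == "__main__":
    rows = [
        ("run_1", "exp_1", "sample_A"),  # first file of a paired-end Run
        ("run_1", "exp_1", "sample_A"),  # second file, same Run: fine
        ("run_2", "exp_2", "sample_B"),
        ("run_2", "exp_3", "sample_C"),  # run_2 reused for another sample: flagged
    ]
    for run, links in find_shared_runs(rows).items():
        print(f"{run} is linked to {len(links)} experiment/sample pairs: {sorted(links)}")
```

In the demo, `run_1` passes because both of its files point at the same Experiment and BioSample, while `run_2` is reported because it is attached to two different samples.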