Where does all the GBS data go? Pt. 2

An analysis addressing some questions raised in the discussion of a previous post on GBS.

Number of fragments produced by a ‘digital digestion’ of the Nov22k22 sunflower assembly:

ClaI: 337,793
EcoRI: 449,770
SacI: 242,163
EcoT22I: 528,709
PstI: 129,993
SalI: 1,210,000
HpaII/MspI: 2,755,916
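The post doesn't include the script behind these counts. Below is a minimal sketch of one way a ‘digital digestion’ could be done in Python. It assumes fragments are counted the way the worked example in the comments describes: only stretches with a recognition site on both sides count (contig ends are ignored), and lengths exclude the site itself. The function names are hypothetical. The `SITES` table holds only two enzymes (PstI `CTGCAG`, MspI `CCGG`) for illustration. Exact cut positions inside the site and N-filled gaps are ignored, so this will not necessarily reproduce the numbers above.

```python
import re
from collections.abc import Iterable, Iterator

# Recognition sequences (5'->3'). Both are palindromic, so scanning the
# forward strand finds every site, and neither motif can overlap itself.
SITES = {"PstI": "CTGCAG", "MspI": "CCGG"}


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted lines.

    Sequences are upper-cased so soft-masked (lower-case) bases still match.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line.upper())
    if name is not None:
        yield name, "".join(chunks)


def site_starts(seq: str, motif: str) -> list[int]:
    """0-based start position of every match of motif in seq."""
    return [m.start() for m in re.finditer(re.escape(motif), seq.upper())]


def single_digest(seq: str, motif: str) -> list[int]:
    """Lengths of the stretches lying between consecutive sites.

    A fragment needs a site on both sides, and the recognition sequences
    themselves are not counted in the length.
    """
    starts = site_starts(seq, motif)
    return [b - (a + len(motif)) for a, b in zip(starts, starts[1:])]


def count_fragments(lines: Iterable[str], motif: str) -> int:
    """Total single-digest fragments over every contig in a FASTA."""
    return sum(len(single_digest(seq, motif)) for _, seq in read_fasta(lines))
```

Usage on a whole assembly (the file name is a placeholder): `with open("assembly.fa") as fh: print(count_fragments(fh, SITES["PstI"]))`.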

Here is the size distribution of those fragments (omitting fragments >5000bp):
[Figure: fragment size distributions; panels: "All the enzymes" and "With MspI removed for clarity"]
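The post doesn't say how the size distributions were binned. Here is a minimal sketch that tallies fragment lengths into fixed-width bins and drops fragments >5000bp, as the plots do. The 100bp bin width and the function name are assumptions.

```python
from collections import Counter
from collections.abc import Iterable


def size_histogram(lengths: Iterable[int], bin_width: int = 100,
                   max_len: int = 5000) -> dict[int, int]:
    """Count fragment lengths in fixed-width bins, dropping those > max_len.

    Returns a dict mapping each bin's lower edge (in bp) to its count,
    ordered from the shortest bin to the longest.
    """
    counts = Counter((n // bin_width) * bin_width for n in lengths if n <= max_len)
    return dict(sorted(counts.items()))
```

The lengths can come from any in silico digest; each enzyme's histogram can then be plotted side by side, leaving MspI out to keep the other enzymes readable.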

Take home message: PstI produces fewer fragments of an appropriate size than the other enzymes. The number of correctly sized fragments appears to be strongly correlated with the total number of fragments.

Now for a double digestion. The PstI+MspI fragment size distribution, again omitting fragments >5000bp, looks good. In terms of total fragment numbers (also omitting fragments >5000bp):

PstI+MspI, total fragments: 187,271
PstI+MspI, 300-500bp: 28,930
PstI alone, total fragments: 79,049
PstI alone, 300-500bp: 6,815
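The one-of-each-site rule the author describes in the comments can be sketched as follows. This is an assumption about how such a count could be done, not the script behind the numbers above. The function names are hypothetical. The 300-500bp window is treated as inclusive at both ends, which the post doesn't specify. PstI and MspI sites cannot overlap each other, so sorting them by position is enough.

```python
import re

PSTI, MSPI = "CTGCAG", "CCGG"  # palindromic recognition sequences


def double_digest(seq: str, site_a: str = PSTI, site_b: str = MSPI) -> list[int]:
    """Lengths of fragments flanked by one site_a and one site_b.

    Cuts at every site of either enzyme, then keeps only stretches whose
    two flanking sites come from different enzymes (A-B or B-A); A-A and
    B-B fragments are discarded. Lengths exclude the recognition sequences.
    """
    seq = seq.upper()
    sites = sorted(
        [(m.start(), m.end(), "A") for m in re.finditer(re.escape(site_a), seq)]
        + [(m.start(), m.end(), "B") for m in re.finditer(re.escape(site_b), seq)]
    )
    return [
        next_start - end
        for (_, end, kind), (next_start, _, next_kind) in zip(sites, sites[1:])
        if kind != next_kind
    ]


def total_up_to(lengths: list[int], max_len: int = 5000) -> int:
    """Number of fragments no longer than max_len (the 'total' rows)."""
    return sum(n <= max_len for n in lengths)


def in_window(lengths: list[int], low: int = 300, high: int = 500) -> int:
    """Number of fragments whose length falls within [low, high]."""
    return sum(low <= n <= high for n in lengths)
```

Running `double_digest` on the contigB example from the comments (with x replaced by A) returns the 3bp and 6bp fragments and skips the MspI/MspI stretch between them.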

Take home: a two-enzyme digestion could work really well. It may yield more than four times as many usable (300-500bp) fragments as PstI alone. I think we could aim for even more sites; another RE combination might get us into the 100,000 range. With a couple of million reads per sample, this could still yield (in an ideal world) 10x at each site. Send me more enzyme sequences and I can run the same analysis.
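The arithmetic behind the take-home message, as a quick check. The two 300-500bp counts come from the table above. The 2 million reads and 100,000 target fragments are assumed planning figures standing in for "a couple of million" and "the 100,000 range". The even-spread model ignores uneven amplification and reads lost to filtering, so real depth will be lower.

```python
def fold_increase(new_count: int, old_count: int) -> float:
    """How many times more fragments the new protocol yields."""
    return new_count / old_count


def mean_depth(reads: int, fragments: int) -> float:
    """Average reads per fragment if reads were spread perfectly evenly."""
    return reads / fragments


# Counts from the post: 300-500bp fragments (fragments >5000bp omitted).
print(f"PstI+MspI vs PstI alone: {fold_increase(28_930, 6_815):.1f}x")

# Assumed planning figures: 2 million reads over 100,000 target fragments.
print(f"Ideal mean depth: {mean_depth(2_000_000, 100_000):.0f} reads per fragment")
```

This gives about 4.2x more usable fragments and an ideal average of 20 reads per fragment.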

Edited for clarity and changed the post name.

3 thoughts on “Where does all the GBS data go? Pt. 2”

  1. Hey Greg, thanks for doing this! When calculating, did you require the fragments from the double digest to have one of each restriction site on either side?

  2. Yes. I think I did the logic correctly…
    For example, where x is any base and each RE site is labeled with the name of its RE:
    Single digest:
    >contigA
    xxxxxPSTxxxxPSTxxxx
    = 1 PstI fragment, 4bp
    Double digest:
    >contigB
    xxxPSTxxxMSPxxxMSPxxxxxxPSTxxx
    = 2 PstI/MspI fragments, 3bp and 6bp (the 3bp MspI/MspI fragment is not counted)

  3. Hi Greg,
    This is great. Did you write a script that could do this for another genome… say, Eucalyptus? I’m curious whether double-enzyme GBS would work better in a smallish genome too. My collaborators’ PstI results haven’t been great, but they’re pretty unwilling to switch. Now that I’ve got a student of my own about to do some, I’d like to be able to make a strong case for it.
    Thanks!!
    Rose
