Pipeline Integrating Genome Assembly, Physical Maps, & Genetic Maps

Everything you need should be checked into BitBucket:

BitBucket Sunflower Genome Repository

BitBucket FPC Repository

Workflow

Integrate the physical and genetic maps, once for every genetic map. This generates 3 tabular placements files (well placed BACs, poorly placed BACs, unplaced BACs) with the columns below. Not every column will be filled in for every row:

  • PHYSICAL_CONTIG: physical map contig id
  • BAC:  BAC id
  • TAG: tag id
  • CHROMOSOME:  chromosome
  • CM: genetic map position in centimorgans (cM)
  • LO_CM: lower bound of the cM range
  • HI_CM: upper bound of the cM range
  • LO_LINKAGE_GROUP_BIN: lower genetic map linkage group bin for this GENOME_CONTIG & GENOME_SCAFFOLD
  • HI_LINKAGE_GROUP_BIN: upper genetic map linkage group bin for this GENOME_CONTIG & GENOME_SCAFFOLD
  • GENOME_CONTIG: genome assembly contig that is used in the genetic map and has a perfect BLAST hit to TAG
  • GENOME_SCAFFOLD: genome assembly scaffold that is used in the genetic map and has a perfect BLAST hit to TAG
  • TAG_START_POS_IN_SCAFF: 1-based start position of tag in GENOME_SCAFFOLD
  • TAG_END_POS_IN_SCAFF: 1-based end position of tag in GENOME_SCAFFOLD
  • IS_SINGLECOPY_TAG: whether the tag is single copy, judged by whether it occurs multiple times in the genome assembly or is covered in the read libraries more than expected
  • IS_BAC_PLACED: 1=BAC confidently placed on CHROMOSOME, 0=poorly placed due to conflicts, “”=not sure
  • IS_BAC_PLACED_BY_TAG_LOCUS: 1=placed using minimum tag hit threshold method, 0=poorly placed via minimum tag hit threshold method, “”=not enough tags to attempt placing via minimum tag hit threshold
  • IS_TAG_PLACEMENT_CONFLICT_IN_BAC: 1=BAC has enough tags to attempt the minimum tag hit threshold method of placing BAC onto locus, and this tag locus conflicts with other tag loci in the BAC
  • IS_BAC_PLACED_BY_UNIQUE_TAGSET: 1 = BAC confidently placed on CHROMOSOME via unique tagset method, 0 = poorly placed due to conflicts via unique tagset method, “” = no unique tagset for this BAC-scaffold pair
  • IS_UNIQUE_TAGSET: whether this tag is part of the unique tagset for the BAC-GENOME_SCAFFOLD pair
  • IS_LOCUS_CONFLICT_WITH_UNIQUE_TAGSET: 1 = BAC-GENOME_SCAFFOLD share a unique tagset and this tag locus conflicts with majority TAG CHROMOSOME in the unique tagset, 0 = BAC-GENOME_SCAFFOLD share a unique tagset and this TAG CHROMOSOME does not conflict with majority TAG CHROMOSOME in the unique tagset, “” = BAC-scaffold do not share a unique tagset
  • IS_BAC_PLACEMENT_CONFLICT_IN_PHYS_CONTIG: 1 = BAC is confidently placed on a CHROMOSOME by any method but it conflicts with the majority CHROMOSOME of the physical map contig
  • IS_CHIMERIC_PHYS_CONTIG: 1 = physical map contig is chimeric, 0 = not chimeric, “” = no placed BACs so don’t know
  • BAC_START_IN_PHYS_CONTIG: FPC coordinate range start
  • BAC_END_IN_PHYS_CONTIG: FPC coordinate range end
  • TAG_GROUP: FPC coordinate ranges for TAG.  Only gives coordinate ranges that intersect the BAC
  • AVE_CM_FOR_PHYS_CONTIG: average cm for physical map contig
  • PHYS_CONTIG_GROUP: comma delimited list of linked PHYSICAL_CONTIG
  • LO_CM_FOR_PHYS_CONTIG_GROUP: lower cm range for PHYS_CONTIG_GROUP
  • HI_CM_FOR_PHYS_CONTIG_GROUP: higher cm range for PHYS_CONTIG_GROUP
  • IS_REVERSE_COMP_SCAFF: whether scaffold should be reverse complemented
  • GENOME_ASSEMBLY: genome assembly name
  • SCORE: alignment score between GENOME_SCAFFOLD and BAC_GROUP
  • BAC_GROUP: group of overlapping BACs that are aligned to GENOME_SCAFFOLD
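
As a quick illustration of working with these columns, the sketch below pulls the confidently placed BACs out of a placements file by column name. It assumes the file is tab-delimited and starts with a header row holding the column names listed above; that layout, the temporary file, and the sample rows are all made up for the example, so check them against your own output first.

```shell
# Hypothetical sample: a tiny placements file with a header row (assumed layout).
placements=$(mktemp)
printf 'PHYSICAL_CONTIG\tBAC\tCHROMOSOME\tIS_BAC_PLACED\n' >  "$placements"
printf 'ctg1\tbacA\t5\t1\n'                                >> "$placements"
printf 'ctg1\tbacB\t7\t0\n'                                >> "$placements"
printf 'ctg2\tbacC\t\t\n'                                  >> "$placements"
printf 'ctg2\tbacD\t5\t1\n'                                >> "$placements"

# Map column names to positions using the header line, then print
# BAC and CHROMOSOME for rows where IS_BAC_PLACED is exactly "1".
placed=$(awk -F '\t' '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    $col["IS_BAC_PLACED"] == "1" { print $col["BAC"] "\t" $col["CHROMOSOME"] }
' "$placements")

echo "$placed"
rm -f "$placements"
```

Looking columns up by name rather than by number keeps the filter working if a file omits or reorders columns.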

See

perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/aggregate_indep_integrations.pl --help

Split the FPC Physical Map File for chimerism.

If you haven’t already, create FPC markers using:

perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/CreateFPCMarker.pl ...

and replace … with the following parameters in order:

  • Full path to output marker .fw file
  • Full path to output marker .ace file
  • Full path to output marker .remarks file
  • Full path to input tabular integrated physical map – genetic map placements file
  • Full path to input .bands file
  • Whether you only care about single copy tags for placing BACs onto chromosomes

In FPC, load your .fpc file, then insert the markers via File > Replace framework markers and select your .fw file. Save your .fpc file.

Create a newline-separated list of BACs that you think are suspect. You can do this using awk:

awk -F '\t' '{print $2}' <tabular placements file for BACs with too many conflicting tag loci> | sort | uniq > myBadBacList.txt
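
To make this step concrete, here is a sketch on made-up data. It assumes the placements file is tab-delimited, has BAC in column 2, and starts with a header row; `NR > 1` skips that header so the literal column name does not end up in the list. Drop that condition if your file has no header. The temporary files stand in for your real placements file and myBadBacList.txt.

```shell
# Hypothetical poorly placed placements file; one row per tag, so BACs repeat.
poorly=$(mktemp)
printf 'PHYSICAL_CONTIG\tBAC\tTAG\n' >  "$poorly"
printf 'ctg1\tbacB\ttag1\n'          >> "$poorly"
printf 'ctg1\tbacB\ttag2\n'          >> "$poorly"
printf 'ctg3\tbacA\ttag9\n'          >> "$poorly"

bad_list=$(mktemp)
# Column 2 is BAC; sort -u gives the same result as sort | uniq.
awk -F '\t' 'NR > 1 {print $2}' "$poorly" | sort -u > "$bad_list"

cat "$bad_list"
rm -f "$poorly"
```

Each suspect BAC appears once in the resulting list, no matter how many tag rows it has in the placements file.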

Create a copy of the directory housing your .fpc file, .cor file, and /Bands/*.bands files.

Then feed them into the following script, which rebuilds the .fpc from scratch using an iterative build, dq, and merge process that goes from strict to loose stringencies, then splits the contigs that contain chimeras.

perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/rebuildChimericFPC.pl ...

where ... is replaced with the following parameters in order:

  • Full path to the revised FPC executable (you must download and compile the new FPC from the BitBucket link at the top)
  • Full path to .bands file
  • Start stringency cutoff (e.g. 1e-75)
  • End stringency cutoff (e.g. 1e-15)
  • The amount to decrease the stringency in each iteration (e.g. 1e-10)
  • CpM entries: when a BAC has hits to many markers, reduce the stringency by this much (e.g. 3=1e-10,4=1e-09,5=1e-08)
  • Full path and prefix of the .fpc file (without the .fpc extension)
  • Full path to the well placed BACs tabular placements file
  • Full path to the poorly placed BACs tabular placements file
  • Full path to the newline-separated list of suspect clones causing chimerism in physical map contigs
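
To get a feel for how many build iterations the start, end, and step cutoffs imply, the sketch below lists the cutoffs under one plausible reading: each iteration loosens the cutoff by a factor of 1e10, so the exponent moves from -75 to -15 in steps of 10. How rebuildChimericFPC.pl actually steps between cutoffs is not documented here, so treat this as an assumption and not as a description of the script. The variable names are hypothetical.

```shell
# Assumed interpretation: loosening by 1e-10 per iteration means the
# cutoff exponent shrinks by 10 each time (1e-75, 1e-65, ...).
start_exp=75   # start stringency cutoff 1e-75
end_exp=15     # end stringency cutoff 1e-15
step_exp=10    # loosen by 1e-10 per iteration

schedule=""
exp=$start_exp
while [ "$exp" -ge "$end_exp" ]; do
    schedule="$schedule 1e-$exp"
    exp=$((exp - step_exp))
done
schedule=${schedule# }   # trim the leading space

echo "$schedule"
```

With the example values from the parameter list, this reading yields seven cutoffs, from 1e-75 down to 1e-15.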

Run the Physical Map – Genetic Map Integration again, now that the Physical Map Contig IDs have changed

Feed the tabular integrated file into the pseudomolecule generation program

See

  • perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/ReferenceSeqGenMapAnchor.pl --help

Which Script Do I Run?

If you want to integrate the physical and genetic maps, then generate some scaffold-to-BAC associations, use

SunflowerGenome/Combo_physical_genetic_map/combo_physical_windowedGeneticMap.pl

To get usage details, run:

perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/combo_physical_windowedGeneticMap.pl --help

To integrate the physical and genetic maps from multiple genetic maps and aggregate the placements, use

SunflowerGenome/Combo_physical_genetic_map/aggregate_indep_integrations.pl

To get usage details, run:

perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/aggregate_indep_integrations.pl --help

To split FPC files for chimeric contigs, use

SunflowerGenome/PhysicalMap/RemoveChimericBacs.pl

To generate pseudomolecules, use

SunflowerGenome/Combo_physical_genetic_map/ReferenceSeqGenMapAnchor.pl

To get usage details, run:

perl -I ./sunflowergenome/Combo_physical_genetic_map -I ./sunflowergenome/Common -I ./sunflowergenome/gMap2 -I ./sunflowergenome/PhysicalMap ./sunflowergenome/Combo_physical_genetic_map/ReferenceSeqGenMapAnchor.pl --help