Is the whole Genome Represented? An ... - ESCMID
Is the Whole Genome Represented?
An Investigation to Determine if Plasmids Are Adequately Represented in the NCTC3000 WGS Project
Sarah Alexander
@NCTC_3000
National Collection of Type Cultures
• A unique bacterial strain collection founded in 1920
• Clinical strains – veterinary and medical importance
• Dynamic collection – modern and historical strains
• Type and Reference – ~5200 strains
• Freeze dried, Lenticulated or DNA format
• Awarded funding – Wellcome Trust Sanger Institute – sequence 3000 NCTC strains
NCTC3000 – Aims & Methods
“Generate reference genomes for 3000 bacterial strains within the collection and embed these
genomes in an accessible resource which will enhance the scientific value of the collection”
• Community Resource Project
• High Molecular Weight DNA – extracted from NCTC strains
• DNA quality profile checked on the TapeStation
• DNA sent to the WTSI – PacBio Sequencing
NCTC3000 - Analysis Pipeline
HGAP Assembly → Circlator → Quiver → Annotation → 16S Check → Web/FTP → ENA/NCBI
NCTC3000 – Data Sharing
• Data regularly uploaded on the WTSI website
• http://www.sanger.ac.uk/resources/downloads/bacteria/nctc/#t_2
WGS: Is the Whole Genome Represented?
• Plasmid status of NCTC strains unknown
• Important – plasmids often encode important phenotypic traits
• Paucity of data on plasmid loss/coverage using PacBio sequencing
• Multiple stages where plasmids can be lost:
  – Culture and isolation
  – DNA extraction
  – Library preparation
  – In silico, during assembly
WGS: Is the Whole Genome Represented?
• Aim – to determine which NCTC strains may be missing plasmid data from the WGS
• Examined TapeStation electropherogram profiles of DNA extracts for evidence of plasmids
• Reviewed the WGS data for the number of plasmids identified in each strain
• Compared the two datasets for 783 NCTC bacterial strains (169 different species)
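The comparison described above amounts to sorting each strain into one of four groups, according to whether the TapeStation profile and the WGS each show evidence of plasmids. A minimal sketch of that logic follows. The function name, the category labels and the example records are hypothetical and are not taken from the NCTC3000 pipeline.

```python
from collections import Counter


def classify(tapestation_peaks: int, wgs_plasmids: int) -> str:
    """Return a (hypothetical) concordance label for one strain.

    tapestation_peaks: number of plasmid-like peaks on the electropherogram
    wgs_plasmids:      number of plasmids identified in the WGS assembly
    """
    if tapestation_peaks == 0 and wgs_plasmids == 0:
        return "no plasmid evidence"
    if tapestation_peaks >= 1 and wgs_plasmids >= 1:
        return "plasmids in both"
    if tapestation_peaks == 0:
        return "WGS only"
    return "TapeStation only"


# Illustrative records only: (strain label, TapeStation peaks, WGS plasmids)
records = [
    ("strain_A", 0, 0),
    ("strain_B", 2, 2),
    ("strain_C", 0, 1),
    ("strain_D", 1, 0),
]

# Tally how many strains fall into each group
tally = Counter(classify(peaks, plasmids) for _, peaks, plasmids in records)
for label, n in tally.items():
    print(f"{label}: {n}")
```

Strains in the "TapeStation only" group are the ones where plasmid data may be missing from the WGS.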
NCTC13532: WGS Plasmid Output
Species   Strain      Sample      Runs        Manual Assembly   Chromosome Contig No.   Plasmid No.
E. coli   NCTC13532   ERS605481   ERR832412   GFF               2                       2
NCTC13532: TapeStation Electropherogram vs WGS
Concordant
TapeStation versus WGS Data - Plasmids
No. Strains (783)   No. Peaks on TapeStation   No. Plasmids Identified by WGS
473                 0                          0
19                  ≥1                         ≥1
205                 0                          ≥1
86                  ≥1                         0
• 60% of strains (473/783) showed no evidence of plasmids by either the TapeStation or WGS
• 2.4% of strains (19/783) had plasmids detected by both the TapeStation and WGS
• 26.2% of strains (205/783) – no TapeStation peaks but detectable plasmids in the WGS
• The average plasmid size in this group of strains was determined to be 85 kb
• 11% of strains (86/783) – TapeStation peaks (<10 kb) but no detectable plasmids in the WGS
• It is likely that, in a small minority of bacterial strains, plasmids are not represented in the WGS
• These strains were analysed further
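The percentages quoted for the four groups follow directly from the strain counts in the table. A minimal sketch that recomputes them is shown here as an arithmetic check. The dictionary keys and the helper name `percent` are illustrative choices, not part of the original analysis.

```python
def percent(n: int, total: int) -> float:
    """Share of `total` represented by `n`, as a percentage to one decimal place."""
    return round(100 * n / total, 1)


# Strain counts from the table: (TapeStation peaks, WGS plasmids) -> number of strains
groups = {
    ("0", "0"): 473,    # no plasmid evidence from either method
    ("≥1", "≥1"): 19,   # plasmids seen by both methods
    ("0", "≥1"): 205,   # plasmids seen by WGS only
    ("≥1", "0"): 86,    # peaks on the TapeStation only
}

total = sum(groups.values())  # should equal the 783 strains compared

for (peaks, plasmids), n in groups.items():
    print(f"TapeStation {peaks}, WGS {plasmids}: {n}/{total} = {percent(n, total)}%")
```

The same helper applies to any other count-over-total figure in the slides.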
Plasmid Discordant Strains
• 52 of the 86 plasmid-discordant strains were investigated to resolve their plasmid status
• Plasmid miniprep extractions were performed on all 52 strains
• One or more plasmids were recovered from 43 strains (43/52 = 83%)
• Evidence that small plasmids are not represented in the WGS
Further Work
• To determine whether plasmid loss occurs during library prep or in silico
• To ensure plasmid data is represented in the WGS
Conclusions
• NCTC3000 – generating reference genomes for the scientific community
• Sequencing 3000 bacterial strains from 86 different families
• Comparison of WGS with TapeStation profiles – 11% (86/783) of strains are missing plasmid data from the final NCTC3000 dataset
• The smallest plasmids (average 4.1 kb) appear to be at the highest risk of being lost during WGS library construction and sequence assembly
• Further work will be performed for all strains within this group to ensure that complete genomic data is available