15 years of GDR: New data and functionality in the Genome ...

9
Published online 24 October 2018 Nucleic Acids Research, 2019, Vol. 47, Database issue D1137–D1145 doi: 10.1093/nar/gky1000 15 years of GDR: New data and functionality in the Genome Database for Rosaceae Sook Jung 1 , Taein Lee 1 , Chun-Huai Cheng 1 , Katheryn Buble 1 , Ping Zheng 1 , Jing Yu 1 , Jodi Humann 1 , Stephen P. Ficklin 1 , Ksenija Gasic 2 , Kristin Scott 1 , Morgan Frank 1 , Sushan Ru 3 , Heidi Hough 1 , Kate Evans 4 , Cameron Peace 1 , Mercy Olmstead 5 , Lisa W. DeVetter 6 , James McFerson 4 , Michael Coe 7 , Jill L. Wegrzyn 8 , Margaret E. Staton 9 , Albert G. Abbott 10 and Dorrie Main 1,* 1 Department of Horticulture, Washington State University, Pullman, WA 99164-6414, USA, 2 Department of Plant and Environmental Sciences, Clemson University, Clemson, SC 29634-0310, USA, 3 Department of Agronomy and Plant Genetics, University of Minnesota, St Paul, MN 55108, USA, 4 Department of Horticulture, Washington State University Tree Fruit Research and Extension Center, Wenatchee, WA 98801, USA, 5 Horticultural Sciences Department, University of Florida, Gainesville, FL 32611, USA, 6 Department of Horticulture, Washington State University, Northwestern Washington Research and Extension Center, Mount Vernon, WA98273, USA, 7 Cedar Lake Research Group, LLC, Portland, OR 97293, USA, 8 Department of Ecology and Evolutionary Biology, University of Connecticut, Storrs, CT 06269, USA, 9 Department ofEntomology and Plant Pathology, University of Tennessee, Knoxville, TN 37996, USA and 10 Forest Health Research and Extension Center, University of Kentucky, Lexington, KY 40546-0091, USA Received September 01, 2018; Editorial Decision October 02, 2018; Accepted October 09, 2018 ABSTRACT The Genome Database for Rosaceae (GDR, https: /www.rosaceae.org) is an integrated web-based com- munity database resource providing access to pub- licly available genomics, genetics and breeding data and data-mining tools to facilitate basic, transla- tional and applied research in Rosaceae. The vol- ume of data in GDR has increased greatly over the last 5 years. The GDR now houses multiple versions of whole genome assembly and annota- tion data from 14 species, made available by re- cent advances in sequencing technology. Annotated and searchable reference transcriptomes, RefTrans, combining peer-reviewed published RNA-Seq as well as EST datasets, are newly available for major crop species. Significantly more quantitative trait loci, ge- netic maps and markers are available in MapViewer, a new visualization tool that better integrates with other pages in GDR. Pathways can be accessed through the new GDR Cyc Pathways databases, and synteny among the newest genome assemblies from eight species can be viewed through the new synteny browser, SynView. Collated single-nucleotide poly- morphism diversity data and phenotypic data from publicly available breeding datasets are integrated with other relevant data. Also, the new Breeding In- formation Management System allows breeders to upload, manage and analyze their private breeding data within the secure GDR server with an option to release data publicly. INTRODUCTION The Genome Database for Rosaceae (GDR, https://www. rosaceae.org) is the central repository and data-mining resource for genomics, genetics and breeding data of Rosaceae, an economically and nutritionally important crop family that includes almond, apple, apricot, black- berry, cherry, peach, pear, plum, raspberry, rose and straw- berry. Composed of species with a wide variety of form, habit, function and ploidy, the Rosaceae family also pro- vides a useful system for studying plant biology (1). First established in 2003, GDR initially provided web in- terfaces and analysis tools for emerging genomic and ge- netic data such as genus-specific EST unigene sets, link- age maps and genetic markers (2,3). With the availability of whole genome sequence data and large-scale phenotypic and genotypic data, GDR, as reported in our year 10 up- date, markedly expanded with data and functionality (4). In the past 5 years we have witnessed another big change in the volume and type of data generated by the Rosaceae research * To whom correspondence should be addressed. Tel: +1 509 335 2774; Fax: +1 509 335 8690; Email: [email protected] Published by Oxford University Press on behalf of Nucleic Acids Research 2018. This work is written by (a) US Government employee(s) and is in the public domain in the US.

Transcript of 15 years of GDR: New data and functionality in the Genome ...

Published online 24 October 2018 Nucleic Acids Research 2019 Vol 47 Database issue D1137ndashD1145doi 101093nargky1000

15 years of GDR New data and functionality in theGenome Database for RosaceaeSook Jung1 Taein Lee1 Chun-Huai Cheng1 Katheryn Buble1 Ping Zheng1 Jing Yu1Jodi Humann1 Stephen P Ficklin1 Ksenija Gasic 2 Kristin Scott1 Morgan Frank1Sushan Ru 3 Heidi Hough1 Kate Evans4 Cameron Peace1 Mercy Olmstead5 LisaW DeVetter6 James McFerson4 Michael Coe7 Jill L Wegrzyn8 Margaret E Staton 9Albert G Abbott10 and Dorrie Main1

1Department of Horticulture Washington State University Pullman WA 99164-6414 USA 2Department of Plant andEnvironmental Sciences Clemson University Clemson SC 29634-0310 USA 3Department of Agronomy and PlantGenetics University of Minnesota St Paul MN 55108 USA 4Department of Horticulture Washington StateUniversity Tree Fruit Research and Extension Center Wenatchee WA 98801 USA 5Horticultural SciencesDepartment University of Florida Gainesville FL 32611 USA 6Department of Horticulture Washington StateUniversity Northwestern Washington Research and Extension Center Mount Vernon WA 98273 USA 7Cedar LakeResearch Group LLC Portland OR 97293 USA 8Department of Ecology and Evolutionary Biology University ofConnecticut Storrs CT 06269 USA 9Department of Entomology and Plant Pathology University of TennesseeKnoxville TN 37996 USA and 10Forest Health Research and Extension Center University of Kentucky LexingtonKY 40546-0091 USA

Received September 01 2018 Editorial Decision October 02 2018 Accepted October 09 2018

ABSTRACT

The Genome Database for Rosaceae (GDR httpswwwrosaceaeorg) is an integrated web-based com-munity database resource providing access to pub-licly available genomics genetics and breeding dataand data-mining tools to facilitate basic transla-tional and applied research in Rosaceae The vol-ume of data in GDR has increased greatly overthe last 5 years The GDR now houses multipleversions of whole genome assembly and annota-tion data from 14 species made available by re-cent advances in sequencing technology Annotatedand searchable reference transcriptomes RefTranscombining peer-reviewed published RNA-Seq as wellas EST datasets are newly available for major cropspecies Significantly more quantitative trait loci ge-netic maps and markers are available in MapViewera new visualization tool that better integrates withother pages in GDR Pathways can be accessedthrough the new GDR Cyc Pathways databases andsynteny among the newest genome assemblies fromeight species can be viewed through the new syntenybrowser SynView Collated single-nucleotide poly-morphism diversity data and phenotypic data from

publicly available breeding datasets are integratedwith other relevant data Also the new Breeding In-formation Management System allows breeders toupload manage and analyze their private breedingdata within the secure GDR server with an option torelease data publicly

INTRODUCTION

The Genome Database for Rosaceae (GDR httpswwwrosaceaeorg) is the central repository and data-miningresource for genomics genetics and breeding data ofRosaceae an economically and nutritionally importantcrop family that includes almond apple apricot black-berry cherry peach pear plum raspberry rose and straw-berry Composed of species with a wide variety of formhabit function and ploidy the Rosaceae family also pro-vides a useful system for studying plant biology (1)

First established in 2003 GDR initially provided web in-terfaces and analysis tools for emerging genomic and ge-netic data such as genus-specific EST unigene sets link-age maps and genetic markers (23) With the availabilityof whole genome sequence data and large-scale phenotypicand genotypic data GDR as reported in our year 10 up-date markedly expanded with data and functionality (4) Inthe past 5 years we have witnessed another big change in thevolume and type of data generated by the Rosaceae research

To whom correspondence should be addressed Tel +1 509 335 2774 Fax +1 509 335 8690 Email dorriewsuedu

Published by Oxford University Press on behalf of Nucleic Acids Research 2018This work is written by (a) US Government employee(s) and is in the public domain in the US

D1138 Nucleic Acids Research 2019 Vol 47 Database issue

community (Table 1) These data include multiple versionsof whole genome assemblies and annotations for each ma-jor crop species increased RNA-Seq data multiple single-nucleotide polymorphism (SNP) arrays for major crops in-creased publications on quantitative trait loci (QTLs) andgenetic maps especially using SNPs and more breeding pro-grams and projects that utilize SNP genotyping We haveperformed new types of analyses developed the RosaceaeTrait Ontology that is tightly linked to the Plant Trait On-tology (TO) (5) and focused on integration of data acrossdatabases organisms and data types The value-added ef-forts undertaken in data analyses curation and integrationwere combined with development and enhancement of newand existing search interfaces and tools to enable more effi-cient sharing and reuse of the pivotal data generated by thecommunity This report describes the GDR with a focus onnew data and functionalities accumulated and developed inthe past 5 years

DATA AND WEB INTERFACE

The GDR interface has been re-designed to provide easieraccess points to data and tools such as the Major GeneraQuick Start and Tools Quick Start featured on the home-page The Major Genera Quick Start allows users to viewwhich types of data are available for a genus of interestand provides links to access these data Similarly Speciespages under the species navigation menu provide the sameinformation for major species The Tools Quick Start isorganized into genomics genetics and breeding sectionseach section provides links to appropriate pages to accessavailable data and tools New features that can quickly fa-miliarize users to GDR data and functionality include thedynamic data overview page where users can browse thecurrent data types and numbers in GDR and short videotutorials Tutorials are available for site overview speciespages Breeding Information Management System (BIMS)and all of the search pages GDR is implemented in Tri-pal (httpstripalinfo) an open-source resource efficientdatabase platform (67) Below we describe the currentlyavailable data and interfaces with a focus on new features

Genomics data

Whole genome sequence and associated data Currentlyavailable in GDR are multiple versions of whole genomeassemblies and annotations from seven major crops and 14species Fragaria vesca Fragaria times ananassa Fragaria iinu-mae Fragaria nipponica Fragaria nubicola Fragaria orien-talis Malus times domestica Prunus avium Prunus persica Po-tentilla micrantha Pyrus communis Rosa multiflora Rosachinensis and Rubus occidentalis To standardize the namesof genome assemblies and annotations GDR uses andrecommends a naming protocol [Genus] [species] genomev[assembly version]a[annotation-version]

Twenty one genome assemblies are now available in theGDR Four strawberry genome assemblies including thenewest v40 (8) are available for the diploid woodlandstrawberry F vesca which serves as the reference genomefor the Fragaria genus in addition to two genome assem-blies for the octoploid cultivated strawberry Fragaria times

ananassa and four wild diploid Fragaria species (9) Forboth the F vesca genome v11 (10) and v20 (11) additionalannotations are available v11a2 (12) and v20a2 (13) re-spectively The genes from v11a2 have also been alignedto the v20 genome so all three gene annotations are avail-able The draft genome of P micrantha (14) a species thatdoes not develop fleshy fruit but is closely related to Fra-garia is also available For Malus times domestica a de novo as-sembly of a lsquoGolden Deliciousrsquo doubled-haploid apple tree(GDDH13) with 17 pseudomolecules is now available (15)along with three versions of assemblies for the heterozygousapple genome (16) For Prunus in addition to two assem-blies of peach P persica genome v10 (17) and v20 (18) thewhole genome assembly of sweet cherry P avium genomev10 (19) is available For Pyrus the draft genome of the Eu-ropean pear P communis genome v10 (20) is available ForRosa two new assemblies from a double haploid R chinen-sis plant (2122) are available as well as a R multiflora draftgenome v10 (23) For Rubus the chromosome-scale wholegenome assembly of black raspberry R occidentalis genomev11 (24) as well as the draft genome v10 (25) are available

Additional data provided by the GDR team on theseassemblies include computational annotation of predictedgenes with homology to genes of closely related or modelplant species and assignment of InterPro protein domains(26) and GO terms (2728) The GDR team also per-forms synteny analysis to find conserved syntenic regionsamong the newest versions of all publicly available Rosaceaegenomes using MCScanX (29) Some of the Rosaceae wholegenome assembly data are available in NCBI with NCBIrsquosown gene annotation and naming convention To help re-searchers compare two different gene annotation sets GDRperforms BLAT analysis between NCBI annotated genesand genes from the original genome assemblies

Assemblies annotated as above and GDR functional andsynteny annotations can be accessed in their respectivegenome pages gene search pages BLAST server and us-ing graphical viewers such as JBrowse (30) and the TripalSynteny Viewer (httpsgithubcomtripaltripal synview)Each species page provides a summary for the species alongwith a resources sidebar with hyperlinks to various data andtools for the species and has a genome subsection that listsall genome assemblies for the species Individual genomepages provide downloadable files including Generic FileFormat and FASTA formats for an assembly that includeannotated gene predictions homology and positions of re-peats and genetic markers including SNPs Additionallylists of annotated functional terms and Microsoft Excelfiles of protein homologs mapped via BLAST+ (31) areavailable for downloading These files contain hyperlinksto external databases as well as to GDR pages includingJBrowse and gene or marker detail pages when applica-ble The Search Genes and Transcripts page allows users tosearch for specific genes and sequences in the above assem-blies and transcriptome datasets The new search interfaceallows users to conduct various searches by genus speciesdataset genetranscript name genomic location and asso-ciation with computationally inferred functionality such asprotein names GO terms InterPro domains and KEGGpathway terms all in one page This search interface allowsusers to perform a query such as lsquoReturn all genes anno-

Nucleic Acids Research 2019 Vol 47 Database issue D1139

Table 1 Comparison of number of GDR entries between 2008 2013 and 2018 by data type

Number of entries by year

Data type 2008 2013 2018 Details

Genome 0 5 21 Multiple versions of whole genome assemblies andannotations from seven major crops and 14 species

Gene andmRNA

0 236191 genes 528890 genes and 585574mRNAs

Genes and mRNAs from whole genome assembliesand parsed from NCBI nucleotide sequences

Transcript 90337 200467 1065226 RefTrans and1412519 Unigenes

RefTrans for six genus and unigene V5 for six genusand the rosaceae family

Marker 1700 2229311 3285775 Including 127877 for Fragaria 2613842 for Malus470747 for Prunus 1593 for Pyrus 71716 for Rosa and8743 for Rubus

Genetic Map 37 84 313 Including 27 for Fragaria 108 for Malus 168 forPrunus 9 for Pyrus 15 for Rosa and 13 for Rubus

QTL 0 1195 3799 QTL and MTL associated with 392 agronomic traitsMTL 27 52 103Species lt100 516 1967 Data available for 1967 species specific species pages

with hyperlinks to various data and tools for 13 majorspecies

Germplasm 0 8613 14411 Including 1799 for Fragaria 4635 for Malus 7367 forPrunus 465 for Pyrus 145 for Rosa and 42 for Rubus

Phenotype data 0 578 568 878426 878426 measurements include 389191 publiclyavailable data and the rest available for the RosBreedproject team members

Genotype data 0 28 296 10806569 SNP genotype data from 10787946 measurementsusing 12229 markers for 6 species SSR genotype datafrom 18623 measurements using 5044 markers for 10species

Publication 2447 5182 7449 7499 publications on genomic genetic and breedingresearch in Rosaceae

tated with the word lsquoresistancersquo between 10 and 35 Mbon chromosome Pp02 of the peach genome v20rsquo Usingthe search site users can download the results or proceedto the gene details page within the GDR Recently addedfunctionality in the Search Genes and Transcripts page in-cludes lsquocustomized outputrsquo This option allows users to cus-tomize the result table and the downloadable Excel file toinclude various functional annotation results The gene de-tails page has several links in the resources sidebar to displaythe sequence and its motif annotations genome alignmentsand homologies to sequences of other species in the GDRand other databases The alignment details provide links toview a gene in JBrowse The new synteny section lists allthe orthologs and paralogs in other genomes discovered bysynteny analyses and provides hyperlinks to the gene pagesand Synteny Viewer Via the Synteny Viewer page (Figure1A) accessible from the tools navigation menu users canchoose a scaffoldchromosome of one reference genomeand choose multiple other genomes for comparison Theviewer then displays a clickable circular image as well asa table showing all the syntenic blocks between them (Fig-ure 1B) Choosing one block either from the image or thetable leads to a page where all the syntenic genes within theblock are shown with the Expect value (E-value) of theirhomology (Figure 1C) Gene names are linked to a spe-cific gene page (Figure 1D) where orthologsparalogs inother genomes are available among other information onthe genes such as associated function and genomic loca-tion with a link to JBrowse (Figure 1E) Conserved syn-teny data made available in the GDR thus allows usersstarting with one rosaceous genome to explore genes an-chored trait loci and genetic markers within orthologous re-

gions of another rosaceous genome Using JBrowse userscan view all the genomic features aligned to the genomesuch as gene models transcripts repeats SNPs other ge-netic markers and genes from other model plant species Inaddition the multivariant viewer JBrowse plugin has beenimplemented to allow users to view SNPs present amongsequence data from multiple germplasm individuals Forexample RosBREED resequencing data (23 accessions)aligned to the peach v10 genome is available in JBrowsethrough the plugin The GDR has upgraded to the Tri-pal BLAST module (httptripalinfoextensionsmodulestripal-blast-analysis) replacing the old BLAST and batchBLAST tools The new BLAST enables results to link tothe genome scaffolds in JBrowse and to the genetranscriptdetails in the GDR and on the NCBI Predicted genes fromwhole genome sequences were used in the construction ofPlantCyc (metabolic pathway) databases (32) using Path-wayTools (33) Currently four PlantCyc databases are avail-able in the GDR PeachCyc AppleCyc FragariaCyc andRubusCyc

Transcriptome data The GDR team now regularly per-forms an analysis that combines published RNA-Seq anddbEST datasets to create a reference transcriptome (Ref-Trans) for major species or genera and provides puta-tive gene function identified by homology to known pro-teins The RNA-Seq sequences from peer-reviewed pub-lications were downloaded from the NCBI Short ReadArchive (SRA) and subject to quality control using theTrimmomatic (v032 default parameters 34) and customPerl scripts The remaining RNA-Seq reads were assembledde novo with Trinity (v266 35) using default assembly pa-rameters and a minimum coding length of 200 bases Qual-

D1140 Nucleic Acids Research 2019 Vol 47 Database issue

Figure 1 Synteny Viewer in GDR (A) Home page of Synteny Viewer allows users to choose a reference genome and a chromosome and multiple genomesfor comparison Users can also choose a synteny block ID (B) A circular diagram and a table shows the synteny blocks between a chromosome of areference genome and all chromosomes of another genome being compared (C) A bar diagram and a table that shows all the genes in a syntenic blockThe table displays E-value between the matching genes and the gene names have hyperlinks to the gene detail page (D) A gene detail page with resourceside bar and the hyperlink to JBrowse (E) JBrowse around the mRNA of interest with tracks such as gene mRNA SNP and genes parsed from NCBInucleotide database

ity control of the ESTs included vector sequence screen-ing (UniVec CoreftpftpncbinihgovpubUniVec) usingcross match (36) removal of tRNArRNAsnRNA se-quences identified using tblastx (37) and Poly-A tail trim-ming The filtered ESTs were assembled using the CAP3program (P minus90 38) Bowtie (v 233) (39) was applied tomulti-map the RNA-Seq reads and ESTs back to the as-sembled contigs and singlets The contigs and singlets wereclustered into genes using CH-HIT (v464 40) and Corset(v107) (41) with default parameters The longest isoformgreater than 500 nt was selected to represent each Corsetcluster and create the RefTrans sequences The RefTran se-quences are functionally characterized by pairwise compar-ison using the BLASTX algorithm against the Swiss-Protand TrEMBL protein databases Information on the top tenmatches with an E-value of le 1E-06 are recorded and storedin GDR together with the RefTrans sequences InterPro do-mains and Gene Ontology assignments are made using In-terProScan at the EBI through Blast2GO Transcriptomesand their associated annotation are available to downloadin the transcriptome page that can be accessed from eachspecies page to search and download in the Search Genesand Transcripts page to view on the genome in JBrowseand to perform similarity searches in the BLAST serverCurrently available datasets include F vesca GDR RefTransv10 P avium GDR RefTrans v10 P persica GDR Ref-Trans v10 Malus times domestica GDR RefTrans v10 andRubus GDR RefTrans v20 The GDR continues to pro-vide unigene v50 for each genus (Prunus Malus Fragaria

Rosa and Pyrus) composed of singlets and contigs assem-bled from the publicly available Rosaceae ESTs downloadedfrom dbEST at NCBI Unigene v50 for the entire Rosaceaeis composed of the assembled contigs and singlets for thefive genera

GDR also hosts user-provided assembled transcriptssuch as a rose transcriptome which are orthologous openreading frame sequences of roses used to construct the Wa-gRhSNP 68K Axiom SNP array (42) black cherry Prunusserotina unigene v10 from Pennsylvania State Universityand assembled transcript sequences of red raspberry Rubusidaeus (43) In addition the GDR provides links to RNA-Seq and DNA datasets available in the NCBI SRA forRosaceae genus or major species

Genetic maps genetic diversity data markers and trait loci

Genetic maps With continuous effort on curating peer-reviewed published data GDR now contains 313 geneticmaps for Rosaceae species including 168 for Prunus 108for Malus 27 for Fragaria 15 for Rosa 13 for Rubus and9 for Pyrus The data associated with genetic maps in-clude mapped positions of molecular markers QTLs andheritable phenotypic markers as well as mapping popu-lation(s) and publication(s) GDR has a new graphic in-terface MapViewer (httptripalinfoextensionsmodulestripalmap) to view genetic maps MapViewer allows usersto view and compare maps from different populations andspecies facilitating information transfer from well-studiedspecies to less-studied ones These comparisons are espe-

Nucleic Acids Research 2019 Vol 47 Database issue D1141

cially useful in Rosaceae due to well conserved syntenyamong the genomes of Rosaceae genera (144) While thefunctionality of MapViewer is similar to CMap (45) a com-monly used tool in biological databases including the GDRMapViewer is much more integrated with other pages suchas the map marker and QTL pages In addition MapViewerallows users to zoom into specific regions of a linkage groupchoose types of markers to be displayed and change the col-ors of the markers that are displayed

Genetic diversity data The GDR contains DNA polymor-phism data from various published genetic diversity studiesand public breeding projects such as RosBREED (46) Cur-rently data from 24 diversity projects are available thirteenfrom Prunus ten from MalusPyrus and one from Mala-comeles (false serviceberry) The GDR provides separatesearch pages for the SNP and the SSR genotypic data toprovide appropriate query and downloadable result tablesdepending on the data type There are 19 and 5 datasetsavailable for SSR and SNP genotype search pages respec-tively In the SNP genotype search page users can filterresults by dataset name species germplasm name SNPname genomic location andor gene name (Figure 2A)Users can also upload a file with germplasm names Thisfiltering allows users to perform tailored querying such asfinding SNP polymorphisms around a gene of interest in achosen set of germplasm The results table provides SNPname genomic location allele and genotypic data of allof the germplasm chosen in the order of SNP location inthe genome so that users can view the genotype of eachgermplasm along the chromosome (Figure 2B) Users candownload the genotypes for all markers displayed in the re-sults page or the genotypes for only the markers that arepolymorphic within the germplasm set chosen (Figure 2B)

Genetic marker and SNP array data The GDR pro-vides details on more than 3 million genetic markersused in genetic map development genetic diversity studiesgenome wide association studies and SNP array develop-ment Marker annotations include marker aliases sourcegermplasm source description primer sequences poly-merase chain reaction conditions literature references andmap position where available For SNPs the marker de-tails also include SNP array name SNP array ID dbSNPID alleles flanking sequences and probes SNP marker dataavailable from GDR include those from array developmentprojects such as the 9K (47) 20K (48) and 480K (49) arraysfor apple IRSC cherry 6K array (50) IPSC peach 9K array(51) 90K array for cultivated strawberry (52) and 68K arrayfor rose (42) The SNP array data are available to downloadin Microsoft Excel format from the genome pages to view inJBrowse as well as to query in the Marker Search page TheGDR provides a new lsquoSNP Marker Searchrsquo page in additionto the existing lsquoMarker Searchrsquo and lsquoSearch Nearby Mark-ersrsquo pages The search filter feature in the lsquoMarker Searchrsquopage includes marker name marker type the species fromwhich the marker is developed the species to which themarker is mapped trait name and map position in the ge-netic map and genome Filtering by trait name is a new fea-ture that allows users to search for markers that are nearandor within QTLs using the associated trait name The

table in the results page shows marker name alias markertype species genetic map location and genome locationThe downloaded file contains the same information as wellas the citation The lsquoSNP Marker Searchrsquo is designed so thatusers can filter using array information as well as SNP nameand genomic location The results table is also specific forSNP with alleles SNP array information genome locationand flanking sequences In both search pages users can up-load a file of marker names for querying Another search in-terface lsquoSearch Nearby Markersrsquo allows users to find mark-ers near a targeted locus

Trait locus data and Rosaceae Trait Ontology The GDRcontains detailed data on published trait loci QTLs andMTLs (Mendelian Trait Loci) These data include 3799QTLs and 103 MTLs that are associated with 392 horticul-tural traits such as powdery mildew disease resistance fruitskin color ripening time and volatile organic compoundcontent These trait terms constitute the Rosaceae Trait On-tology The Rosaceae Trait Ontology has been built to stan-dardize trait names and abbreviations for all trait data en-tered into the GDR and to connect trait terms to TO (5)with the goal of data integration across databases organ-isms and data types Each of these trait terms is either anexisting TO term or a child term of one One trait term canbelong to multiple Root TO terms New Rosaceae Trait On-tology terms which do not exist in TO have been submit-ted to the Trait Ontology consortium for inclusion In ad-dition to the Rosaceae Trait Ontology QTLs are annotatedin the GDR with aliases curator-assigned QTL labels pub-lished symbols trait names taxa trait descriptions screen-ing methods map positions associated markers statisticalvalues datasets contact information and references Thesearch page for trait loci allows searching by trait locus type(QTL or MTL) species trait category (root TO term) traitname published symbol and GDR-assigned label

Breeding data

Breeding data stored in the GDR include phenotypic geno-typic germplasm and pedigree data from the RosBREEDproject (46) the Washington State University Apple Breed-ing Program (53) and private breeding programs Publiclyavailable breeding data can be queried using the lsquoSearchTrait Evaluationrsquo and lsquoSearch Genotypersquo pages The lsquoSearchTrait Evaluationrsquo page provides two tabs for querying qual-itative or quantitative traits In each tab users can filter thedata by crop dataset name and trait cut-off values of up tothree trait descriptors The result table and downloadablefile has germplasm name species the trait values chosenand the dataset name The germplasm and the dataset namein the results table are linked to the detail page where otherassociated data can be accessed The germplasm page has aresource sidebar for genotypic and phenotypic data as wellas an overview and associated images where the data canbe viewed and downloaded In addition the public breed-ing data can be queried and downloaded using the BIMSBIMS is a new Tripal module (httptripalinfoextensionsmodulestripal-bims) we developed to provide breeders andbreeding project teams with tools to store manage archiveand analyze their private or public breeding data BIMS is

D1142 Nucleic Acids Research 2019 Vol 47 Database issue

Figure 2 SNP Genotype Search Page in GDR (A) Users can search SNP genotype data by dataset name species germplasm name SNP name genomelocation andor gene name Users can also upload a file with a list of germplasm names (B) Search result table that shows SNP name genome locationallele and the genotype data of all the germplasm chosen in the order of SNP location in the genome The red square highlights the options to downloadthe genotype for all the marker displayed in the result page or the genotype data that are polymorphic in the germplasm set chosen

compatible with Field Book (54) an open-source App forandroid phones and tablets that allows researchers to re-place hard-copy field books or spreadsheet files BIMS al-lows breeders to upload data collected with the Field Bookapp directly to their private secure database in the GDRinto a consistent format Having BIMS in the GDR givesbreeders the opportunity to integrate their data with pub-licly available data and easily release some or all of their datapublicly as needed Currently the sweet cherry phenotypicdata from the RosBREED project is available for public ac-cess through BIMS In BIMS an accordion menu on theleft side provides quick access to various functionalities ThelsquoData Importrsquo section provides data templates for users toenter their data and upload the data file themselves (Figure3A) The lsquoSearchrsquo section allows users to search and save thelist of germplasm individuals (lsquoaccessionsrsquo) using any com-bination of properties and trait cut-off values name triallocation cross parent and trait values (Figure 3B) Whena filter is applied to choose accessions the rightmost sec-tion shows the number of accessions belonging to the fil-tered dataset When a trait descriptor is chosen as a filterthe middle section shows a histogram along with the statis-tical values such as maximum minimum mean and stan-dard deviation to show users the distribution of data pointswithin the dataset chosen (Figure 3B) The list of the ac-cession names chosen can be viewed and downloaded ina table with an option of adding more data on the acces-sions such as parents cross number and trait values (Figure3C) The list of accessions can be saved in user accounts andcan be used to retrieve any data associated with the acces-sions The lsquoData Analysisrsquo section allows users to choosetwo datasets using the categories or saved accession listsand compare the trait statistics between the two datasets(Figure 3D) This analysis function allows users to comparevarious traits of two sets of accessions such as progeniesfrom two different crosses

Community resources and data submission

The GDR continues to provide community-based resourcesunder the community navigation menu including pages forthe US RosEXEC RosIGI and conference employmentnotices and mailing lists The US RosEXEC (Rosaceae Ge-nomics Genetics and Breeding Executive Committee) andRosIGI (Rosaceae International Genomics Initiative) serveas communication and coordination focal points for the re-search community with the former also operating as theSteering Committee for the GDR The US RosEXEC andRosIGI pages provide official documents minutes of quar-terly meetings membership lists and subcommittee infor-mation Several mailing lists in addition to the GDR mail-ing list are available to serve the community with informa-tion for specific interests or purposes and the archives canbe viewed through the message board sites

While the GDR team actively curates data from pub-lications we encourage authors to officially archive theirdatasets by submitting their data at the time of manuscriptsubmission The Data Submission page under the Data nav-igation menu provides various data templates for genes ge-netic maps QTL maps as well as genotypic and phenotypicdata In addition the page has a link to a page where the rec-ommended list of files for whole genome data submissionare available

CONCLUSION AND FUTURE DIRECTION

Recent availability of multiple genome assemblies and an-notation data from seven major crops and 14 species in theRosaceae family opened up the opportunity to investigatethe evolution and biological basis of a wide range of plantforms and functions as well as to share knowledge amongmajor rosaceous crops to improve their performance Withthe goal of facilitating these efforts the GDR focused onintegrating the wealth of the new genomic data with tran-scriptomic genetic map genetic marker trait locus pheno-typic and genotypic data To accommodate scientistsrsquo data-

Nucleic Acids Research 2019 Vol 47 Database issue D1143

Figure 3 BIMS in GDR (A) lsquoTemplate Listrsquo subsection in lsquoData Importrsquo section provide downloadable templates for users to enter various breedingdata (B) Searchrsquo section allows users to search and save the list of accessions using any combination of properties and trait cut of values name triallocation cross parent and trait values The middle section shows the statistical information on the filtered dataset for the trait chosen and the right sectionshows the number of accessions filtered so far (C) A page with the search result table Users can add more columns in the table using lsquoColumn optionsrsquoand savedownload the result table (D) lsquoData Analysisrsquo section that allows users to choose two datasets using the categories or saved accession lists andcompare the trait statistics between the two datasets

mining needs that came with these new types and large vol-ume of data we made various new web interfaces availableThe new web interfaces are either new Tripal modules thatwe developed such as MapViewer BIMS Chado LoaderChado Data Display and Chado Search modules (55) orTripal modules that other database teams developed suchas the Synteny Viewer and Tripal BLAST The open-sourcedatabase platform Tripal allows us to meet emerging de-mands for storage querying and display of new data typesmore efficiently and quickly

During the last 5 years we also made advances in curat-ing data with more ontologies and incorporating ontologiesin data query pages to facilitate data-sharing across differ-ent data types species and databases Future effort will in-clude further developing MapViewer to integrate genomedata and genetic maps providing an enhanced querying in-terface expanding analysis capabilities in BIMS throughaccess to additional functionality integrating Galaxy anal-ysis platform through Tripal Galaxy module (httpswwwdrupalorgprojecttripal galaxy) providing access toHigh Performance Computing and enabling cross-Tripaldatabases querying capability using Tripal Elasticsearch(56) and Tripal Exchange (httptripalinfotutorialsv3xweb-services)

Use of the GDR the community resource for Rosaceaegenomics genetics and breeding research has continuedto grow over the last 5 years Between September 1 2013and August 31 2018 the GDR had 256075 visits by 95259unique visitors from 190 countries who accessed 1547877pages The GDR is part of AgBioData (57) (httpswwwagbiodataorg) a consortium working to improve stan-

dards and sustainability of genomics genetics and breedingdatabases and further enable agricultural science

ACKNOWLEDGEMENTS

The authors acknowledge with thanks their fundingsources the Rosaceae research community for providingdata support and feedback the Tripal and GMOD com-munity of developers for developing and sharing Tripalmodules and code the AgBioData Consortium and USLand Grant Universities for support

FUNDING

USDA National Institute of Food and Agriculture Spe-cialty Crop Research Initiative projects [2014-51181-2237 2014-51181-22378] USDA National Institute ofFood and Agriculture National Research Support Project10 NSF PGRP award 444573 NSF CIF21 DIBBs award1443040 Washington Tree Fruit Research CommissionWashington State University Clemson University Fundingfor open access charge Federal Grant USDA NIFA SCRIGrantConflict of interest statement None declared

REFERENCES1 ShulaevV KorbanSS SosinskiB AbbottAG AldwinckleHS

FoltaKM IezzoniA MainD ArusP DandekarAM et al(2008) Multiple models for rosaceae genomics Plant Physiol 147985ndash1003

D1144 Nucleic Acids Research 2019 Vol 47 Database issue

2 JungS JesuduraiC StatonM DuZ FicklinS ChoIAbbottA TomkinsJ and MainD (2004) GDR (Genome Databasefor Rosaceae) integrated web resources for Rosaceae genomics andgenetics research BMC Bioinformatics 5 130

3 JungS StatonM LeeT BlendaA SvancaraR AbbottA andMainD (2008) GDR (Genome Database for Rosaceae) integratedweb-database for Rosaceae genomics and genetics data Nucleic AcidsRes 36 D1034ndashD1040

4 JungS FicklinSP LeeT ChengC-H BlendaA ZhengP YuJBombarelyA ChoI RuS et al (2014) The Genome Database forRosaceae (GDR) year 10 update Nucleic Acids Res 42D1237ndashD1244

5 CooperL MeierA LaporteM-A ElserJL MungallCSinnBT CavaliereD CarbonS DunnNA SmithB et al (2018)The Planteome database an integrated resource for referenceontologies plant genomics and phenomics Nucleic Acids Res 46D1168ndashD1180

6 SandersonL-A FicklinSP ChengC-H JungS FeltusFABettKE and MainD (2013) Tripal v11 a standards-based toolkitfor construction of online genetic and genomic databases Database(Oxford) 2013 bat075

7 FicklinSP SandersonL-A ChengC-H StatonME LeeTChoI-H JungS BettKE and MainD (2011) Tripal aconstruction toolkit for online genome databases Database(Oxford) 2011 bar044

8 EdgerPP VanBurenR ColleM PoortenTJ WaiCMNiederhuthCE AlgerEI OuS AcharyaCB WangJ et al(2018) Single-molecule sequencing and optical mapping yields animproved genome of woodland strawberry (Fragaria vesca) withchromosome-scale contiguity Gigascience 7 1ndash7

9 HirakawaH ShirasawaK KosugiS TashiroK NakayamaSYamadaM KoharaM WatanabeA KishidaY FujishiroT et al(2014) Dissection of the octoploid strawberry genome by deepsequencing of the genomes of fragaria species DNA Res 21169ndash181

10 ShulaevV SargentDJ CrowhurstRN MocklerTC FolkertsODelcherAL JaiswalP MockaitisK ListonA ManeSP et al(2010) The genome of woodland strawberry (Fragaria vesca) NatGenet 43 109ndash116

11 TennessenJA GovindarajuluR AshmanT-L and ListonA(2014) Evolutionary origins and dynamics of octoploid strawberrysubgenomes revealed by dense targeted capture linkage mapsGenome Biol Evol 6 3295ndash3313

12 DarwishO ShahanR LiuZ SlovinJP and AlkharoufNW(2015) Re-annotation of the woodland strawberry (Fragaria vesca)genome BMC Genomics 16 29

13 LiY WeiW FengJ LuoH PiM LiuZ and KangC (2018)Genome re-annotation of the wild strawberry Fragaria vesca usingextensive Illumina- and SMRT-based RNA-seq datasets DNA Res25 61ndash70

14 ButiM MorettoM BarghiniE MascagniF NataliL BrilliMLomsadzeA SonegoP GiongoL AlongeM et al (2018) Thegenome sequence and transcriptome of Potentilla micrantha andtheir comparison to Fragaria vesca (the woodland strawberry)Gigascience 7 1ndash14

15 DaccordN CeltonJ-M LinsmithG BeckerC ChoisneNSchijlenE van de GeestH BiancoL MichelettiD VelascoRet al (2017) High-quality de novo assembly of the apple genome andmethylome dynamics of early fruit development Nat Genet 491099ndash1106

16 VelascoR ZharkikhA AffourtitJ DhingraA CestaroAKalyanaramanA FontanaP BhatnagarSK TroggioM PrussDet al (2010) The genome of the domesticated apple (Malus timesdomestica Borkh) Nat Genet 42 833ndash839

17 The International Peach Genome Initiative VerdeI AbbottAGScalabrinS JungS ShuS MarroniF ZhebentyayevaTDettoriMT GrimwoodJ et al (2013) The high-quality draftgenome of peach (Prunus persica) identifies unique patterns ofgenetic diversity domestication and genome evolution Nat Genet45 487ndash494

18 VerdeI JenkinsJ DondiniL MicaliS PagliaraniGVendraminE ParisR AraminiV GazzaL RossiniL et al(2017) The Peach v20 release high-resolution linkage mapping and

deep resequencing improve chromosome-scale assembly andcontiguity BMC Genomics 18 225

19 ShirasawaK IsuzugawaK IkenagaM SaitoY YamamotoTHirakawaH and IsobeS (2017) The genome sequence of sweetcherry (Prunus avium) for use in genomics-assisted breeding DNARes 24 499ndash508

20 ChagneD CrowhurstRN PindoM ThrimawithanaA DengCIrelandH FiersM DzierzonH CestaroA FontanaP et al(2014) The draft genome sequence of European pear (Pyruscommunis L rsquoBartlettrsquo) PLoS One 9 e92644

21 HibrandL RuttinkT HamamaL KirovI LakhwaniDZhouN-N BourkeP DaccordN LeusL SchulzD et al (2018)A high-quality sequence of Rosa chinensis to elucidate genomestructure and ornamental traits Nat Plants 4 473ndash484

22 RaymondO GouzyJ JustJ BadouinH VerdenaudMLemainqueA VergneP MojaS ChoisneN PontC et al (2018)The Rosa genome provides new insights into the domestication ofmodern roses Nat Genet 50 772ndash777

23 NakamuraN HirakawaH SatoS OtagakiS MatsumotoSTabataS and TanakaY (2018) Genome structure of Rosa multifloraa wild ancestor of cultivated roses DNA Res 25 113ndash121

24 JibranR DzierzonH BassilN BushakraJM EdgerPPSullivanS FinnCE DossettM ViningKJ VanBurenR et al(2018) Chromosome-scale scaffolding of the black raspberry (Rubusoccidentalis L) genome based on chromatin interaction data HorticRes 5 8

25 VanBurenR BryantD BushakraJM ViningKJ EdgerPPRowleyER PriestHD MichaelTP LyonsE FilichkinSAet al (2016) The genome of black raspberry (Rubus occidentalis)Plant J 87 535ndash547

26 FinnRD AttwoodTK BabbittPC BatemanA BorkPBridgeAJ ChangH-Y DosztanyiZ El-GebaliS FraserM et al(2017) InterPro in 2017ndashndashbeyond protein family and domainannotations Nucleic Acids Res 45 D190ndashD199

27 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene Ontology tool for the unification of biology NatGenet 25 25ndash29

28 The Gene Ontology Consortium (2017) Expansion of the GeneOntology knowledgebase and resources Nucleic Acids Res 45D331ndashD338

29 WangY TangH DeBarryJD TanX LiJ WangX LeeT -hJinH MarlerB GuoH et al (2012) MCScanX a toolkit fordetection and evolutionary analysis of gene synteny and collinearityNucleic Acids Res 40 e49

30 BuelsR YaoE DieshCM HayesRD Munoz-TorresMHeltG GoodsteinDM ElsikCG LewisSE SteinL et al(2016) JBrowse a dynamic web platform for genome visualizationand analysis Genome Biol 17 66

31 CamachoC CoulourisG AvagyanV MaN PapadopoulosJBealerK and MaddenTL (2009) BLAST+ architecture andapplications BMC Bioinformatics 10 421

32 SchlapferP ZhangP WangC KimT BanfM ChaeLDreherK ChavaliAK Nilo-PoyancoR BernardT et al (2017)Genome-wide prediction of metabolic enzymes pathways and geneclusters in plants Plant Physiol 173 2041ndash2059

33 CaspiR DreherK and KarpPD (2013) The challenge ofconstructing classifying and representing metabolic pathwaysFEMS Microbiol Lett 345 85ndash93

34 BolgerAM LohseM and UsadelB (2014) Trimmomatic a flexibletrimmer for Illumina sequence data Bioinformatics 30 2114ndash2120

35 GrabherrMG HaasBJ YassourM LevinJZ ThompsonDAAmitI AdiconisX FanL RaychowdhuryR ZengQ et al(2011) Full-length transcriptome assembly from RNA-Seq datawithout a reference genome Nat Biotechnol 29 644ndash652

36 GordonD AbajianC and GreenP (1998) Consed a graphical toolfor sequence finishing Genome Res 8 195ndash202

37 AltschulSF GishW MillerW MyersEW and LipmanDJ(1990) Basic local alignment search tool J Mol Biol 215 403ndash410

38 HuangX and MadanA (1999) CAP3 A DNA sequence assemblyprogram Genome Res 9 868ndash877

39 LangmeadB TrapnellC PopM and SalzbergSL (2009) Ultrafastand memory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10 R25

Nucleic Acids Research 2019 Vol 47 Database issue D1145

40 FuL NiuB ZhuZ WuS and LiW (2012) CD-HIT acceleratedfor clustering the next-generation sequencing data Bioinformatics28 3150ndash3152

41 DavidsonNM and OshlackA (2014) Corset enabling differentialgene expression analysis for de novo assembled transcriptomesGenome Biol 15 410

42 Koning-BoucoiranCFS EsselinkGD VukosavljevM van rsquotWestendeWPC GitongaVW KrensFA VoorripsRE van deWegWE SchulzD DebenerT et al (2015) Using RNA-Seq toassemble a rose transcriptome with more than 13000 full-lengthexpressed genes and to develop the WagRhSNP 68k Axiom SNParray for rose (Rosa L) Front Plant Sci 6 249

43 FuentesL MonsalveL Morales-QuintanaL ValdenegroMMartınezJ-P DefilippiBG and Gonzalez-AgueroM (2015)Differential expression of ethylene biosynthesis genes in drupelets andreceptacle of raspberry (Rubus idaeus) J Plant Physiol 179100ndash105

44 JungS CestaroA TroggioM MainD ZhengP ChoIFoltaKM SosinskiB AbbottA CeltonJ-M et al (2012) Wholegenome comparisons of Fragaria Prunus and Malus reveal differentmodes of evolution between Rosaceous subfamilies BMC Genomics13 129

45 Youens-ClarkK FagaB YapIV SteinL and WareD (2009)CMap 101 a comparative mapping application for the InternetBioinformatics 25 3040ndash3042

46 IezzoniA PeaceC MainD BassilN CoeM FinnC GasicKLubyJ HokansonS McFersonJ et al (2017) RosBREED2progress and future plans to enable DNA-informed breeding in theRosaceae Acta Hortic doi1017660ActaHortic201711722

47 ChagneD CrowhurstRN TroggioM DaveyMW GilmoreBLawleyC VanderzandeS HellensRP KumarS CestaroA et al(2012) Genome-wide SNP detection validation and development ofan 8K SNP array for apple PLoS One 7 e31745

48 BiancoL CestaroA SargentDJ BanchiE DerdakS DiGuardoM SalviS JansenJ ViolaR GutI et al (2014)Development and validation of a 20K single nucleotidepolymorphism (SNP) whole genome genotyping array for apple(Malus times domestica Borkh) PLoS One 9 e110377

49 BiancoL CestaroA LinsmithG MurantyH DenanceCTheronA PoncetC MichelettiD KerschbamerE Di PierroEAet al (2016) Development and validation of the Axiom(reg)Apple480K SNP genotyping array Plant J 86 62ndash74

50 PeaceC BassilN MainD FicklinS RosyaraUR StegmeirTSeboltA GilmoreB LawleyC MocklerTC et al (2012)Development and evaluation of a genome-wide 6K SNP array fordiploid sweet cherry and tetraploid sour cherry PLoS One 7 e48305

51 VerdeI BassilN ScalabrinS GilmoreB LawleyCT GasicKMichelettiD RosyaraUR CattonaroF VendraminE et al(2012) Development and evaluation of a 9K SNP array for peach byinternationally coordinated SNP detection and validation in breedinggermplasm PLoS One 7 e35668

52 BassilNV DavisTM ZhangH FicklinS MittmannMWebsterT MahoneyL WoodD AlperinES RosyaraUR et al(2015) Development and preliminary evaluation of a 90 K AxiomregSNP array for the allo-octoploid cultivated strawberry Fragaria timesananassa BMC Genomics 16 155

53 EvansK (2013) Apple Breeding in Pacific Northwest Acta Hortic976 75ndash78

54 RifeTW and PolandJA (2014) Field Book an open-sourceapplication for field data collection on android Crop Sci 541624ndash1627

55 JungS LeeT ChengC-H FicklinS YuJ HumannJ andMainD (2017) Extension modules for storage visualization andquerying of genomic genetic and breeding data in Tripal databasesDatabase (Oxford) 2017 bax092

56 ChenM HenryN AlmsaeedA ZhouX WegrzynJ FicklinSand StatonM (2017) New extension software modules to enhancesearching and display of transcriptome data in Tripal databasesDatabase (Oxford) 2017 bax052

57 HarperL CampbellJ CannonEKS JungS PoelchauMWallsR AndorfC ArnaudE BerardiniTZ BirkettC et al(2018) AgBioData consortium recommendations for sustainablegenomics and genetics databases for agriculture Database (Oxford)2018 bay088

D1138 Nucleic Acids Research 2019 Vol 47 Database issue

community (Table 1) These data include multiple versionsof whole genome assemblies and annotations for each ma-jor crop species increased RNA-Seq data multiple single-nucleotide polymorphism (SNP) arrays for major crops in-creased publications on quantitative trait loci (QTLs) andgenetic maps especially using SNPs and more breeding pro-grams and projects that utilize SNP genotyping We haveperformed new types of analyses developed the RosaceaeTrait Ontology that is tightly linked to the Plant Trait On-tology (TO) (5) and focused on integration of data acrossdatabases organisms and data types The value-added ef-forts undertaken in data analyses curation and integrationwere combined with development and enhancement of newand existing search interfaces and tools to enable more effi-cient sharing and reuse of the pivotal data generated by thecommunity This report describes the GDR with a focus onnew data and functionalities accumulated and developed inthe past 5 years

DATA AND WEB INTERFACE

The GDR interface has been re-designed to provide easieraccess points to data and tools such as the Major GeneraQuick Start and Tools Quick Start featured on the home-page The Major Genera Quick Start allows users to viewwhich types of data are available for a genus of interestand provides links to access these data Similarly Speciespages under the species navigation menu provide the sameinformation for major species The Tools Quick Start isorganized into genomics genetics and breeding sectionseach section provides links to appropriate pages to accessavailable data and tools New features that can quickly fa-miliarize users to GDR data and functionality include thedynamic data overview page where users can browse thecurrent data types and numbers in GDR and short videotutorials Tutorials are available for site overview speciespages Breeding Information Management System (BIMS)and all of the search pages GDR is implemented in Tri-pal (httpstripalinfo) an open-source resource efficientdatabase platform (67) Below we describe the currentlyavailable data and interfaces with a focus on new features

Genomics data

Whole genome sequence and associated data Currentlyavailable in GDR are multiple versions of whole genomeassemblies and annotations from seven major crops and 14species Fragaria vesca Fragaria times ananassa Fragaria iinu-mae Fragaria nipponica Fragaria nubicola Fragaria orien-talis Malus times domestica Prunus avium Prunus persica Po-tentilla micrantha Pyrus communis Rosa multiflora Rosachinensis and Rubus occidentalis To standardize the namesof genome assemblies and annotations GDR uses andrecommends a naming protocol [Genus] [species] genomev[assembly version]a[annotation-version]

Twenty one genome assemblies are now available in theGDR Four strawberry genome assemblies including thenewest v40 (8) are available for the diploid woodlandstrawberry F vesca which serves as the reference genomefor the Fragaria genus in addition to two genome assem-blies for the octoploid cultivated strawberry Fragaria times

ananassa and four wild diploid Fragaria species (9) Forboth the F vesca genome v11 (10) and v20 (11) additionalannotations are available v11a2 (12) and v20a2 (13) re-spectively The genes from v11a2 have also been alignedto the v20 genome so all three gene annotations are avail-able The draft genome of P micrantha (14) a species thatdoes not develop fleshy fruit but is closely related to Fra-garia is also available For Malus times domestica a de novo as-sembly of a lsquoGolden Deliciousrsquo doubled-haploid apple tree(GDDH13) with 17 pseudomolecules is now available (15)along with three versions of assemblies for the heterozygousapple genome (16) For Prunus in addition to two assem-blies of peach P persica genome v10 (17) and v20 (18) thewhole genome assembly of sweet cherry P avium genomev10 (19) is available For Pyrus the draft genome of the Eu-ropean pear P communis genome v10 (20) is available ForRosa two new assemblies from a double haploid R chinen-sis plant (2122) are available as well as a R multiflora draftgenome v10 (23) For Rubus the chromosome-scale wholegenome assembly of black raspberry R occidentalis genomev11 (24) as well as the draft genome v10 (25) are available

Additional data provided by the GDR team on theseassemblies include computational annotation of predictedgenes with homology to genes of closely related or modelplant species and assignment of InterPro protein domains(26) and GO terms (2728) The GDR team also per-forms synteny analysis to find conserved syntenic regionsamong the newest versions of all publicly available Rosaceaegenomes using MCScanX (29) Some of the Rosaceae wholegenome assembly data are available in NCBI with NCBIrsquosown gene annotation and naming convention To help re-searchers compare two different gene annotation sets GDRperforms BLAT analysis between NCBI annotated genesand genes from the original genome assemblies

Assemblies annotated as above and GDR functional andsynteny annotations can be accessed in their respectivegenome pages gene search pages BLAST server and us-ing graphical viewers such as JBrowse (30) and the TripalSynteny Viewer (httpsgithubcomtripaltripal synview)Each species page provides a summary for the species alongwith a resources sidebar with hyperlinks to various data andtools for the species and has a genome subsection that listsall genome assemblies for the species Individual genomepages provide downloadable files including Generic FileFormat and FASTA formats for an assembly that includeannotated gene predictions homology and positions of re-peats and genetic markers including SNPs Additionallylists of annotated functional terms and Microsoft Excelfiles of protein homologs mapped via BLAST+ (31) areavailable for downloading These files contain hyperlinksto external databases as well as to GDR pages includingJBrowse and gene or marker detail pages when applica-ble The Search Genes and Transcripts page allows users tosearch for specific genes and sequences in the above assem-blies and transcriptome datasets The new search interfaceallows users to conduct various searches by genus speciesdataset genetranscript name genomic location and asso-ciation with computationally inferred functionality such asprotein names GO terms InterPro domains and KEGGpathway terms all in one page This search interface allowsusers to perform a query such as lsquoReturn all genes anno-

Nucleic Acids Research 2019 Vol 47 Database issue D1139

Table 1 Comparison of number of GDR entries between 2008 2013 and 2018 by data type

Number of entries by year

Data type 2008 2013 2018 Details

Genome 0 5 21 Multiple versions of whole genome assemblies andannotations from seven major crops and 14 species

Gene andmRNA

0 236191 genes 528890 genes and 585574mRNAs

Genes and mRNAs from whole genome assembliesand parsed from NCBI nucleotide sequences

Transcript 90337 200467 1065226 RefTrans and1412519 Unigenes

RefTrans for six genus and unigene V5 for six genusand the rosaceae family

Marker 1700 2229311 3285775 Including 127877 for Fragaria 2613842 for Malus470747 for Prunus 1593 for Pyrus 71716 for Rosa and8743 for Rubus

Genetic Map 37 84 313 Including 27 for Fragaria 108 for Malus 168 forPrunus 9 for Pyrus 15 for Rosa and 13 for Rubus

QTL 0 1195 3799 QTL and MTL associated with 392 agronomic traitsMTL 27 52 103Species lt100 516 1967 Data available for 1967 species specific species pages

with hyperlinks to various data and tools for 13 majorspecies

Germplasm 0 8613 14411 Including 1799 for Fragaria 4635 for Malus 7367 forPrunus 465 for Pyrus 145 for Rosa and 42 for Rubus

Phenotype data 0 578 568 878426 878426 measurements include 389191 publiclyavailable data and the rest available for the RosBreedproject team members

Genotype data 0 28 296 10806569 SNP genotype data from 10787946 measurementsusing 12229 markers for 6 species SSR genotype datafrom 18623 measurements using 5044 markers for 10species

Publication 2447 5182 7449 7499 publications on genomic genetic and breedingresearch in Rosaceae

tated with the word lsquoresistancersquo between 10 and 35 Mbon chromosome Pp02 of the peach genome v20rsquo Usingthe search site users can download the results or proceedto the gene details page within the GDR Recently addedfunctionality in the Search Genes and Transcripts page in-cludes lsquocustomized outputrsquo This option allows users to cus-tomize the result table and the downloadable Excel file toinclude various functional annotation results The gene de-tails page has several links in the resources sidebar to displaythe sequence and its motif annotations genome alignmentsand homologies to sequences of other species in the GDRand other databases The alignment details provide links toview a gene in JBrowse The new synteny section lists allthe orthologs and paralogs in other genomes discovered bysynteny analyses and provides hyperlinks to the gene pagesand Synteny Viewer Via the Synteny Viewer page (Figure1A) accessible from the tools navigation menu users canchoose a scaffoldchromosome of one reference genomeand choose multiple other genomes for comparison Theviewer then displays a clickable circular image as well asa table showing all the syntenic blocks between them (Fig-ure 1B) Choosing one block either from the image or thetable leads to a page where all the syntenic genes within theblock are shown with the Expect value (E-value) of theirhomology (Figure 1C) Gene names are linked to a spe-cific gene page (Figure 1D) where orthologsparalogs inother genomes are available among other information onthe genes such as associated function and genomic loca-tion with a link to JBrowse (Figure 1E) Conserved syn-teny data made available in the GDR thus allows usersstarting with one rosaceous genome to explore genes an-chored trait loci and genetic markers within orthologous re-

gions of another rosaceous genome Using JBrowse userscan view all the genomic features aligned to the genomesuch as gene models transcripts repeats SNPs other ge-netic markers and genes from other model plant species Inaddition the multivariant viewer JBrowse plugin has beenimplemented to allow users to view SNPs present amongsequence data from multiple germplasm individuals Forexample RosBREED resequencing data (23 accessions)aligned to the peach v10 genome is available in JBrowsethrough the plugin The GDR has upgraded to the Tri-pal BLAST module (httptripalinfoextensionsmodulestripal-blast-analysis) replacing the old BLAST and batchBLAST tools The new BLAST enables results to link tothe genome scaffolds in JBrowse and to the genetranscriptdetails in the GDR and on the NCBI Predicted genes fromwhole genome sequences were used in the construction ofPlantCyc (metabolic pathway) databases (32) using Path-wayTools (33) Currently four PlantCyc databases are avail-able in the GDR PeachCyc AppleCyc FragariaCyc andRubusCyc

Transcriptome data The GDR team now regularly per-forms an analysis that combines published RNA-Seq anddbEST datasets to create a reference transcriptome (Ref-Trans) for major species or genera and provides puta-tive gene function identified by homology to known pro-teins The RNA-Seq sequences from peer-reviewed pub-lications were downloaded from the NCBI Short ReadArchive (SRA) and subject to quality control using theTrimmomatic (v032 default parameters 34) and customPerl scripts The remaining RNA-Seq reads were assembledde novo with Trinity (v266 35) using default assembly pa-rameters and a minimum coding length of 200 bases Qual-

D1140 Nucleic Acids Research 2019 Vol 47 Database issue

Figure 1 Synteny Viewer in GDR (A) Home page of Synteny Viewer allows users to choose a reference genome and a chromosome and multiple genomesfor comparison Users can also choose a synteny block ID (B) A circular diagram and a table shows the synteny blocks between a chromosome of areference genome and all chromosomes of another genome being compared (C) A bar diagram and a table that shows all the genes in a syntenic blockThe table displays E-value between the matching genes and the gene names have hyperlinks to the gene detail page (D) A gene detail page with resourceside bar and the hyperlink to JBrowse (E) JBrowse around the mRNA of interest with tracks such as gene mRNA SNP and genes parsed from NCBInucleotide database

ity control of the ESTs included vector sequence screen-ing (UniVec CoreftpftpncbinihgovpubUniVec) usingcross match (36) removal of tRNArRNAsnRNA se-quences identified using tblastx (37) and Poly-A tail trim-ming The filtered ESTs were assembled using the CAP3program (P minus90 38) Bowtie (v 233) (39) was applied tomulti-map the RNA-Seq reads and ESTs back to the as-sembled contigs and singlets The contigs and singlets wereclustered into genes using CH-HIT (v464 40) and Corset(v107) (41) with default parameters The longest isoformgreater than 500 nt was selected to represent each Corsetcluster and create the RefTrans sequences The RefTran se-quences are functionally characterized by pairwise compar-ison using the BLASTX algorithm against the Swiss-Protand TrEMBL protein databases Information on the top tenmatches with an E-value of le 1E-06 are recorded and storedin GDR together with the RefTrans sequences InterPro do-mains and Gene Ontology assignments are made using In-terProScan at the EBI through Blast2GO Transcriptomesand their associated annotation are available to downloadin the transcriptome page that can be accessed from eachspecies page to search and download in the Search Genesand Transcripts page to view on the genome in JBrowseand to perform similarity searches in the BLAST serverCurrently available datasets include F vesca GDR RefTransv10 P avium GDR RefTrans v10 P persica GDR Ref-Trans v10 Malus times domestica GDR RefTrans v10 andRubus GDR RefTrans v20 The GDR continues to pro-vide unigene v50 for each genus (Prunus Malus Fragaria

Rosa and Pyrus) composed of singlets and contigs assem-bled from the publicly available Rosaceae ESTs downloadedfrom dbEST at NCBI Unigene v50 for the entire Rosaceaeis composed of the assembled contigs and singlets for thefive genera

GDR also hosts user-provided assembled transcriptssuch as a rose transcriptome which are orthologous openreading frame sequences of roses used to construct the Wa-gRhSNP 68K Axiom SNP array (42) black cherry Prunusserotina unigene v10 from Pennsylvania State Universityand assembled transcript sequences of red raspberry Rubusidaeus (43) In addition the GDR provides links to RNA-Seq and DNA datasets available in the NCBI SRA forRosaceae genus or major species

Genetic maps genetic diversity data markers and trait loci

Genetic maps With continuous effort on curating peer-reviewed published data GDR now contains 313 geneticmaps for Rosaceae species including 168 for Prunus 108for Malus 27 for Fragaria 15 for Rosa 13 for Rubus and9 for Pyrus The data associated with genetic maps in-clude mapped positions of molecular markers QTLs andheritable phenotypic markers as well as mapping popu-lation(s) and publication(s) GDR has a new graphic in-terface MapViewer (httptripalinfoextensionsmodulestripalmap) to view genetic maps MapViewer allows usersto view and compare maps from different populations andspecies facilitating information transfer from well-studiedspecies to less-studied ones These comparisons are espe-

Nucleic Acids Research 2019 Vol 47 Database issue D1141

cially useful in Rosaceae due to well conserved syntenyamong the genomes of Rosaceae genera (144) While thefunctionality of MapViewer is similar to CMap (45) a com-monly used tool in biological databases including the GDRMapViewer is much more integrated with other pages suchas the map marker and QTL pages In addition MapViewerallows users to zoom into specific regions of a linkage groupchoose types of markers to be displayed and change the col-ors of the markers that are displayed

Genetic diversity data The GDR contains DNA polymor-phism data from various published genetic diversity studiesand public breeding projects such as RosBREED (46) Cur-rently data from 24 diversity projects are available thirteenfrom Prunus ten from MalusPyrus and one from Mala-comeles (false serviceberry) The GDR provides separatesearch pages for the SNP and the SSR genotypic data toprovide appropriate query and downloadable result tablesdepending on the data type There are 19 and 5 datasetsavailable for SSR and SNP genotype search pages respec-tively In the SNP genotype search page users can filterresults by dataset name species germplasm name SNPname genomic location andor gene name (Figure 2A)Users can also upload a file with germplasm names Thisfiltering allows users to perform tailored querying such asfinding SNP polymorphisms around a gene of interest in achosen set of germplasm The results table provides SNPname genomic location allele and genotypic data of allof the germplasm chosen in the order of SNP location inthe genome so that users can view the genotype of eachgermplasm along the chromosome (Figure 2B) Users candownload the genotypes for all markers displayed in the re-sults page or the genotypes for only the markers that arepolymorphic within the germplasm set chosen (Figure 2B)

Genetic marker and SNP array data The GDR pro-vides details on more than 3 million genetic markersused in genetic map development genetic diversity studiesgenome wide association studies and SNP array develop-ment Marker annotations include marker aliases sourcegermplasm source description primer sequences poly-merase chain reaction conditions literature references andmap position where available For SNPs the marker de-tails also include SNP array name SNP array ID dbSNPID alleles flanking sequences and probes SNP marker dataavailable from GDR include those from array developmentprojects such as the 9K (47) 20K (48) and 480K (49) arraysfor apple IRSC cherry 6K array (50) IPSC peach 9K array(51) 90K array for cultivated strawberry (52) and 68K arrayfor rose (42) The SNP array data are available to downloadin Microsoft Excel format from the genome pages to view inJBrowse as well as to query in the Marker Search page TheGDR provides a new lsquoSNP Marker Searchrsquo page in additionto the existing lsquoMarker Searchrsquo and lsquoSearch Nearby Mark-ersrsquo pages The search filter feature in the lsquoMarker Searchrsquopage includes marker name marker type the species fromwhich the marker is developed the species to which themarker is mapped trait name and map position in the ge-netic map and genome Filtering by trait name is a new fea-ture that allows users to search for markers that are nearandor within QTLs using the associated trait name The

table in the results page shows marker name alias markertype species genetic map location and genome locationThe downloaded file contains the same information as wellas the citation The lsquoSNP Marker Searchrsquo is designed so thatusers can filter using array information as well as SNP nameand genomic location The results table is also specific forSNP with alleles SNP array information genome locationand flanking sequences In both search pages users can up-load a file of marker names for querying Another search in-terface lsquoSearch Nearby Markersrsquo allows users to find mark-ers near a targeted locus

Trait locus data and Rosaceae Trait Ontology The GDRcontains detailed data on published trait loci QTLs andMTLs (Mendelian Trait Loci) These data include 3799QTLs and 103 MTLs that are associated with 392 horticul-tural traits such as powdery mildew disease resistance fruitskin color ripening time and volatile organic compoundcontent These trait terms constitute the Rosaceae Trait On-tology The Rosaceae Trait Ontology has been built to stan-dardize trait names and abbreviations for all trait data en-tered into the GDR and to connect trait terms to TO (5)with the goal of data integration across databases organ-isms and data types Each of these trait terms is either anexisting TO term or a child term of one One trait term canbelong to multiple Root TO terms New Rosaceae Trait On-tology terms which do not exist in TO have been submit-ted to the Trait Ontology consortium for inclusion In ad-dition to the Rosaceae Trait Ontology QTLs are annotatedin the GDR with aliases curator-assigned QTL labels pub-lished symbols trait names taxa trait descriptions screen-ing methods map positions associated markers statisticalvalues datasets contact information and references Thesearch page for trait loci allows searching by trait locus type(QTL or MTL) species trait category (root TO term) traitname published symbol and GDR-assigned label

Breeding data

Breeding data stored in the GDR include phenotypic geno-typic germplasm and pedigree data from the RosBREEDproject (46) the Washington State University Apple Breed-ing Program (53) and private breeding programs Publiclyavailable breeding data can be queried using the lsquoSearchTrait Evaluationrsquo and lsquoSearch Genotypersquo pages The lsquoSearchTrait Evaluationrsquo page provides two tabs for querying qual-itative or quantitative traits In each tab users can filter thedata by crop dataset name and trait cut-off values of up tothree trait descriptors The result table and downloadablefile has germplasm name species the trait values chosenand the dataset name The germplasm and the dataset namein the results table are linked to the detail page where otherassociated data can be accessed The germplasm page has aresource sidebar for genotypic and phenotypic data as wellas an overview and associated images where the data canbe viewed and downloaded In addition the public breed-ing data can be queried and downloaded using the BIMSBIMS is a new Tripal module (httptripalinfoextensionsmodulestripal-bims) we developed to provide breeders andbreeding project teams with tools to store manage archiveand analyze their private or public breeding data BIMS is

D1142 Nucleic Acids Research 2019 Vol 47 Database issue

Figure 2 SNP Genotype Search Page in GDR (A) Users can search SNP genotype data by dataset name species germplasm name SNP name genomelocation andor gene name Users can also upload a file with a list of germplasm names (B) Search result table that shows SNP name genome locationallele and the genotype data of all the germplasm chosen in the order of SNP location in the genome The red square highlights the options to downloadthe genotype for all the marker displayed in the result page or the genotype data that are polymorphic in the germplasm set chosen

compatible with Field Book (54) an open-source App forandroid phones and tablets that allows researchers to re-place hard-copy field books or spreadsheet files BIMS al-lows breeders to upload data collected with the Field Bookapp directly to their private secure database in the GDRinto a consistent format Having BIMS in the GDR givesbreeders the opportunity to integrate their data with pub-licly available data and easily release some or all of their datapublicly as needed Currently the sweet cherry phenotypicdata from the RosBREED project is available for public ac-cess through BIMS In BIMS an accordion menu on theleft side provides quick access to various functionalities ThelsquoData Importrsquo section provides data templates for users toenter their data and upload the data file themselves (Figure3A) The lsquoSearchrsquo section allows users to search and save thelist of germplasm individuals (lsquoaccessionsrsquo) using any com-bination of properties and trait cut-off values name triallocation cross parent and trait values (Figure 3B) Whena filter is applied to choose accessions the rightmost sec-tion shows the number of accessions belonging to the fil-tered dataset When a trait descriptor is chosen as a filterthe middle section shows a histogram along with the statis-tical values such as maximum minimum mean and stan-dard deviation to show users the distribution of data pointswithin the dataset chosen (Figure 3B) The list of the ac-cession names chosen can be viewed and downloaded ina table with an option of adding more data on the acces-sions such as parents cross number and trait values (Figure3C) The list of accessions can be saved in user accounts andcan be used to retrieve any data associated with the acces-sions The lsquoData Analysisrsquo section allows users to choosetwo datasets using the categories or saved accession listsand compare the trait statistics between the two datasets(Figure 3D) This analysis function allows users to comparevarious traits of two sets of accessions such as progeniesfrom two different crosses

Community resources and data submission

The GDR continues to provide community-based resourcesunder the community navigation menu including pages forthe US RosEXEC RosIGI and conference employmentnotices and mailing lists The US RosEXEC (Rosaceae Ge-nomics Genetics and Breeding Executive Committee) andRosIGI (Rosaceae International Genomics Initiative) serveas communication and coordination focal points for the re-search community with the former also operating as theSteering Committee for the GDR The US RosEXEC andRosIGI pages provide official documents minutes of quar-terly meetings membership lists and subcommittee infor-mation Several mailing lists in addition to the GDR mail-ing list are available to serve the community with informa-tion for specific interests or purposes and the archives canbe viewed through the message board sites

While the GDR team actively curates data from pub-lications we encourage authors to officially archive theirdatasets by submitting their data at the time of manuscriptsubmission The Data Submission page under the Data nav-igation menu provides various data templates for genes ge-netic maps QTL maps as well as genotypic and phenotypicdata In addition the page has a link to a page where the rec-ommended list of files for whole genome data submissionare available

CONCLUSION AND FUTURE DIRECTION

Recent availability of multiple genome assemblies and an-notation data from seven major crops and 14 species in theRosaceae family opened up the opportunity to investigatethe evolution and biological basis of a wide range of plantforms and functions as well as to share knowledge amongmajor rosaceous crops to improve their performance Withthe goal of facilitating these efforts the GDR focused onintegrating the wealth of the new genomic data with tran-scriptomic genetic map genetic marker trait locus pheno-typic and genotypic data To accommodate scientistsrsquo data-

Nucleic Acids Research 2019 Vol 47 Database issue D1143

Figure 3 BIMS in GDR (A) lsquoTemplate Listrsquo subsection in lsquoData Importrsquo section provide downloadable templates for users to enter various breedingdata (B) Searchrsquo section allows users to search and save the list of accessions using any combination of properties and trait cut of values name triallocation cross parent and trait values The middle section shows the statistical information on the filtered dataset for the trait chosen and the right sectionshows the number of accessions filtered so far (C) A page with the search result table Users can add more columns in the table using lsquoColumn optionsrsquoand savedownload the result table (D) lsquoData Analysisrsquo section that allows users to choose two datasets using the categories or saved accession lists andcompare the trait statistics between the two datasets

mining needs that came with these new types and large vol-ume of data we made various new web interfaces availableThe new web interfaces are either new Tripal modules thatwe developed such as MapViewer BIMS Chado LoaderChado Data Display and Chado Search modules (55) orTripal modules that other database teams developed suchas the Synteny Viewer and Tripal BLAST The open-sourcedatabase platform Tripal allows us to meet emerging de-mands for storage querying and display of new data typesmore efficiently and quickly

During the last 5 years we also made advances in curat-ing data with more ontologies and incorporating ontologiesin data query pages to facilitate data-sharing across differ-ent data types species and databases Future effort will in-clude further developing MapViewer to integrate genomedata and genetic maps providing an enhanced querying in-terface expanding analysis capabilities in BIMS throughaccess to additional functionality integrating Galaxy anal-ysis platform through Tripal Galaxy module (httpswwwdrupalorgprojecttripal galaxy) providing access toHigh Performance Computing and enabling cross-Tripaldatabases querying capability using Tripal Elasticsearch(56) and Tripal Exchange (httptripalinfotutorialsv3xweb-services)

Use of the GDR the community resource for Rosaceaegenomics genetics and breeding research has continuedto grow over the last 5 years Between September 1 2013and August 31 2018 the GDR had 256075 visits by 95259unique visitors from 190 countries who accessed 1547877pages The GDR is part of AgBioData (57) (httpswwwagbiodataorg) a consortium working to improve stan-

dards and sustainability of genomics genetics and breedingdatabases and further enable agricultural science

ACKNOWLEDGEMENTS

The authors acknowledge with thanks their fundingsources the Rosaceae research community for providingdata support and feedback the Tripal and GMOD com-munity of developers for developing and sharing Tripalmodules and code the AgBioData Consortium and USLand Grant Universities for support

FUNDING

USDA National Institute of Food and Agriculture Spe-cialty Crop Research Initiative projects [2014-51181-2237 2014-51181-22378] USDA National Institute ofFood and Agriculture National Research Support Project10 NSF PGRP award 444573 NSF CIF21 DIBBs award1443040 Washington Tree Fruit Research CommissionWashington State University Clemson University Fundingfor open access charge Federal Grant USDA NIFA SCRIGrantConflict of interest statement None declared

REFERENCES1 ShulaevV KorbanSS SosinskiB AbbottAG AldwinckleHS

FoltaKM IezzoniA MainD ArusP DandekarAM et al(2008) Multiple models for rosaceae genomics Plant Physiol 147985ndash1003

D1144 Nucleic Acids Research 2019 Vol 47 Database issue

2 JungS JesuduraiC StatonM DuZ FicklinS ChoIAbbottA TomkinsJ and MainD (2004) GDR (Genome Databasefor Rosaceae) integrated web resources for Rosaceae genomics andgenetics research BMC Bioinformatics 5 130

3 JungS StatonM LeeT BlendaA SvancaraR AbbottA andMainD (2008) GDR (Genome Database for Rosaceae) integratedweb-database for Rosaceae genomics and genetics data Nucleic AcidsRes 36 D1034ndashD1040

4 JungS FicklinSP LeeT ChengC-H BlendaA ZhengP YuJBombarelyA ChoI RuS et al (2014) The Genome Database forRosaceae (GDR) year 10 update Nucleic Acids Res 42D1237ndashD1244

5 CooperL MeierA LaporteM-A ElserJL MungallCSinnBT CavaliereD CarbonS DunnNA SmithB et al (2018)The Planteome database an integrated resource for referenceontologies plant genomics and phenomics Nucleic Acids Res 46D1168ndashD1180

6 SandersonL-A FicklinSP ChengC-H JungS FeltusFABettKE and MainD (2013) Tripal v11 a standards-based toolkitfor construction of online genetic and genomic databases Database(Oxford) 2013 bat075

7 FicklinSP SandersonL-A ChengC-H StatonME LeeTChoI-H JungS BettKE and MainD (2011) Tripal aconstruction toolkit for online genome databases Database(Oxford) 2011 bar044

8 EdgerPP VanBurenR ColleM PoortenTJ WaiCMNiederhuthCE AlgerEI OuS AcharyaCB WangJ et al(2018) Single-molecule sequencing and optical mapping yields animproved genome of woodland strawberry (Fragaria vesca) withchromosome-scale contiguity Gigascience 7 1ndash7

9 HirakawaH ShirasawaK KosugiS TashiroK NakayamaSYamadaM KoharaM WatanabeA KishidaY FujishiroT et al(2014) Dissection of the octoploid strawberry genome by deepsequencing of the genomes of fragaria species DNA Res 21169ndash181

10 ShulaevV SargentDJ CrowhurstRN MocklerTC FolkertsODelcherAL JaiswalP MockaitisK ListonA ManeSP et al(2010) The genome of woodland strawberry (Fragaria vesca) NatGenet 43 109ndash116

11 TennessenJA GovindarajuluR AshmanT-L and ListonA(2014) Evolutionary origins and dynamics of octoploid strawberrysubgenomes revealed by dense targeted capture linkage mapsGenome Biol Evol 6 3295ndash3313

12 DarwishO ShahanR LiuZ SlovinJP and AlkharoufNW(2015) Re-annotation of the woodland strawberry (Fragaria vesca)genome BMC Genomics 16 29

13 LiY WeiW FengJ LuoH PiM LiuZ and KangC (2018)Genome re-annotation of the wild strawberry Fragaria vesca usingextensive Illumina- and SMRT-based RNA-seq datasets DNA Res25 61ndash70

14 ButiM MorettoM BarghiniE MascagniF NataliL BrilliMLomsadzeA SonegoP GiongoL AlongeM et al (2018) Thegenome sequence and transcriptome of Potentilla micrantha andtheir comparison to Fragaria vesca (the woodland strawberry)Gigascience 7 1ndash14

15 DaccordN CeltonJ-M LinsmithG BeckerC ChoisneNSchijlenE van de GeestH BiancoL MichelettiD VelascoRet al (2017) High-quality de novo assembly of the apple genome andmethylome dynamics of early fruit development Nat Genet 491099ndash1106

16 VelascoR ZharkikhA AffourtitJ DhingraA CestaroAKalyanaramanA FontanaP BhatnagarSK TroggioM PrussDet al (2010) The genome of the domesticated apple (Malus timesdomestica Borkh) Nat Genet 42 833ndash839

17 The International Peach Genome Initiative VerdeI AbbottAGScalabrinS JungS ShuS MarroniF ZhebentyayevaTDettoriMT GrimwoodJ et al (2013) The high-quality draftgenome of peach (Prunus persica) identifies unique patterns ofgenetic diversity domestication and genome evolution Nat Genet45 487ndash494

18 VerdeI JenkinsJ DondiniL MicaliS PagliaraniGVendraminE ParisR AraminiV GazzaL RossiniL et al(2017) The Peach v20 release high-resolution linkage mapping and

deep resequencing improve chromosome-scale assembly andcontiguity BMC Genomics 18 225

19 ShirasawaK IsuzugawaK IkenagaM SaitoY YamamotoTHirakawaH and IsobeS (2017) The genome sequence of sweetcherry (Prunus avium) for use in genomics-assisted breeding DNARes 24 499ndash508

20 ChagneD CrowhurstRN PindoM ThrimawithanaA DengCIrelandH FiersM DzierzonH CestaroA FontanaP et al(2014) The draft genome sequence of European pear (Pyruscommunis L rsquoBartlettrsquo) PLoS One 9 e92644

21 HibrandL RuttinkT HamamaL KirovI LakhwaniDZhouN-N BourkeP DaccordN LeusL SchulzD et al (2018)A high-quality sequence of Rosa chinensis to elucidate genomestructure and ornamental traits Nat Plants 4 473ndash484

22 RaymondO GouzyJ JustJ BadouinH VerdenaudMLemainqueA VergneP MojaS ChoisneN PontC et al (2018)The Rosa genome provides new insights into the domestication ofmodern roses Nat Genet 50 772ndash777

23 NakamuraN HirakawaH SatoS OtagakiS MatsumotoSTabataS and TanakaY (2018) Genome structure of Rosa multifloraa wild ancestor of cultivated roses DNA Res 25 113ndash121

24 JibranR DzierzonH BassilN BushakraJM EdgerPPSullivanS FinnCE DossettM ViningKJ VanBurenR et al(2018) Chromosome-scale scaffolding of the black raspberry (Rubusoccidentalis L) genome based on chromatin interaction data HorticRes 5 8

25 VanBurenR BryantD BushakraJM ViningKJ EdgerPPRowleyER PriestHD MichaelTP LyonsE FilichkinSAet al (2016) The genome of black raspberry (Rubus occidentalis)Plant J 87 535ndash547

26 FinnRD AttwoodTK BabbittPC BatemanA BorkPBridgeAJ ChangH-Y DosztanyiZ El-GebaliS FraserM et al(2017) InterPro in 2017ndashndashbeyond protein family and domainannotations Nucleic Acids Res 45 D190ndashD199

27 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene Ontology tool for the unification of biology NatGenet 25 25ndash29

28 The Gene Ontology Consortium (2017) Expansion of the GeneOntology knowledgebase and resources Nucleic Acids Res 45D331ndashD338

29 WangY TangH DeBarryJD TanX LiJ WangX LeeT -hJinH MarlerB GuoH et al (2012) MCScanX a toolkit fordetection and evolutionary analysis of gene synteny and collinearityNucleic Acids Res 40 e49

30 BuelsR YaoE DieshCM HayesRD Munoz-TorresMHeltG GoodsteinDM ElsikCG LewisSE SteinL et al(2016) JBrowse a dynamic web platform for genome visualizationand analysis Genome Biol 17 66

31 CamachoC CoulourisG AvagyanV MaN PapadopoulosJBealerK and MaddenTL (2009) BLAST+ architecture andapplications BMC Bioinformatics 10 421

32 SchlapferP ZhangP WangC KimT BanfM ChaeLDreherK ChavaliAK Nilo-PoyancoR BernardT et al (2017)Genome-wide prediction of metabolic enzymes pathways and geneclusters in plants Plant Physiol 173 2041ndash2059

33 CaspiR DreherK and KarpPD (2013) The challenge ofconstructing classifying and representing metabolic pathwaysFEMS Microbiol Lett 345 85ndash93

34 BolgerAM LohseM and UsadelB (2014) Trimmomatic a flexibletrimmer for Illumina sequence data Bioinformatics 30 2114ndash2120

35 GrabherrMG HaasBJ YassourM LevinJZ ThompsonDAAmitI AdiconisX FanL RaychowdhuryR ZengQ et al(2011) Full-length transcriptome assembly from RNA-Seq datawithout a reference genome Nat Biotechnol 29 644ndash652

36 GordonD AbajianC and GreenP (1998) Consed a graphical toolfor sequence finishing Genome Res 8 195ndash202

37 AltschulSF GishW MillerW MyersEW and LipmanDJ(1990) Basic local alignment search tool J Mol Biol 215 403ndash410

38 HuangX and MadanA (1999) CAP3 A DNA sequence assemblyprogram Genome Res 9 868ndash877

39 LangmeadB TrapnellC PopM and SalzbergSL (2009) Ultrafastand memory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10 R25

Nucleic Acids Research 2019 Vol 47 Database issue D1145

40 FuL NiuB ZhuZ WuS and LiW (2012) CD-HIT acceleratedfor clustering the next-generation sequencing data Bioinformatics28 3150ndash3152

41 DavidsonNM and OshlackA (2014) Corset enabling differentialgene expression analysis for de novo assembled transcriptomesGenome Biol 15 410

42 Koning-BoucoiranCFS EsselinkGD VukosavljevM van rsquotWestendeWPC GitongaVW KrensFA VoorripsRE van deWegWE SchulzD DebenerT et al (2015) Using RNA-Seq toassemble a rose transcriptome with more than 13000 full-lengthexpressed genes and to develop the WagRhSNP 68k Axiom SNParray for rose (Rosa L) Front Plant Sci 6 249

43 FuentesL MonsalveL Morales-QuintanaL ValdenegroMMartınezJ-P DefilippiBG and Gonzalez-AgueroM (2015)Differential expression of ethylene biosynthesis genes in drupelets andreceptacle of raspberry (Rubus idaeus) J Plant Physiol 179100ndash105

44 JungS CestaroA TroggioM MainD ZhengP ChoIFoltaKM SosinskiB AbbottA CeltonJ-M et al (2012) Wholegenome comparisons of Fragaria Prunus and Malus reveal differentmodes of evolution between Rosaceous subfamilies BMC Genomics13 129

45 Youens-ClarkK FagaB YapIV SteinL and WareD (2009)CMap 101 a comparative mapping application for the InternetBioinformatics 25 3040ndash3042

46 IezzoniA PeaceC MainD BassilN CoeM FinnC GasicKLubyJ HokansonS McFersonJ et al (2017) RosBREED2progress and future plans to enable DNA-informed breeding in theRosaceae Acta Hortic doi1017660ActaHortic201711722

47 ChagneD CrowhurstRN TroggioM DaveyMW GilmoreBLawleyC VanderzandeS HellensRP KumarS CestaroA et al(2012) Genome-wide SNP detection validation and development ofan 8K SNP array for apple PLoS One 7 e31745

48 BiancoL CestaroA SargentDJ BanchiE DerdakS DiGuardoM SalviS JansenJ ViolaR GutI et al (2014)Development and validation of a 20K single nucleotidepolymorphism (SNP) whole genome genotyping array for apple(Malus times domestica Borkh) PLoS One 9 e110377

49 BiancoL CestaroA LinsmithG MurantyH DenanceCTheronA PoncetC MichelettiD KerschbamerE Di PierroEAet al (2016) Development and validation of the Axiom(reg)Apple480K SNP genotyping array Plant J 86 62ndash74

50 PeaceC BassilN MainD FicklinS RosyaraUR StegmeirTSeboltA GilmoreB LawleyC MocklerTC et al (2012)Development and evaluation of a genome-wide 6K SNP array fordiploid sweet cherry and tetraploid sour cherry PLoS One 7 e48305

51 VerdeI BassilN ScalabrinS GilmoreB LawleyCT GasicKMichelettiD RosyaraUR CattonaroF VendraminE et al(2012) Development and evaluation of a 9K SNP array for peach byinternationally coordinated SNP detection and validation in breedinggermplasm PLoS One 7 e35668

52 BassilNV DavisTM ZhangH FicklinS MittmannMWebsterT MahoneyL WoodD AlperinES RosyaraUR et al(2015) Development and preliminary evaluation of a 90 K AxiomregSNP array for the allo-octoploid cultivated strawberry Fragaria timesananassa BMC Genomics 16 155

53 EvansK (2013) Apple Breeding in Pacific Northwest Acta Hortic976 75ndash78

54 RifeTW and PolandJA (2014) Field Book an open-sourceapplication for field data collection on android Crop Sci 541624ndash1627

55 JungS LeeT ChengC-H FicklinS YuJ HumannJ andMainD (2017) Extension modules for storage visualization andquerying of genomic genetic and breeding data in Tripal databasesDatabase (Oxford) 2017 bax092

56 ChenM HenryN AlmsaeedA ZhouX WegrzynJ FicklinSand StatonM (2017) New extension software modules to enhancesearching and display of transcriptome data in Tripal databasesDatabase (Oxford) 2017 bax052

57 HarperL CampbellJ CannonEKS JungS PoelchauMWallsR AndorfC ArnaudE BerardiniTZ BirkettC et al(2018) AgBioData consortium recommendations for sustainablegenomics and genetics databases for agriculture Database (Oxford)2018 bay088

Nucleic Acids Research 2019 Vol 47 Database issue D1139

Table 1 Comparison of number of GDR entries between 2008 2013 and 2018 by data type

Number of entries by year

Data type 2008 2013 2018 Details

Genome 0 5 21 Multiple versions of whole genome assemblies andannotations from seven major crops and 14 species

Gene andmRNA

0 236191 genes 528890 genes and 585574mRNAs

Genes and mRNAs from whole genome assembliesand parsed from NCBI nucleotide sequences

Transcript 90337 200467 1065226 RefTrans and1412519 Unigenes

RefTrans for six genus and unigene V5 for six genusand the rosaceae family

Marker 1700 2229311 3285775 Including 127877 for Fragaria 2613842 for Malus470747 for Prunus 1593 for Pyrus 71716 for Rosa and8743 for Rubus

Genetic Map 37 84 313 Including 27 for Fragaria 108 for Malus 168 forPrunus 9 for Pyrus 15 for Rosa and 13 for Rubus

QTL 0 1195 3799 QTL and MTL associated with 392 agronomic traitsMTL 27 52 103Species lt100 516 1967 Data available for 1967 species specific species pages

with hyperlinks to various data and tools for 13 majorspecies

Germplasm 0 8613 14411 Including 1799 for Fragaria 4635 for Malus 7367 forPrunus 465 for Pyrus 145 for Rosa and 42 for Rubus

Phenotype data 0 578 568 878426 878426 measurements include 389191 publiclyavailable data and the rest available for the RosBreedproject team members

Genotype data 0 28 296 10806569 SNP genotype data from 10787946 measurementsusing 12229 markers for 6 species SSR genotype datafrom 18623 measurements using 5044 markers for 10species

Publication 2447 5182 7449 7499 publications on genomic genetic and breedingresearch in Rosaceae

tated with the word lsquoresistancersquo between 10 and 35 Mbon chromosome Pp02 of the peach genome v20rsquo Usingthe search site users can download the results or proceedto the gene details page within the GDR Recently addedfunctionality in the Search Genes and Transcripts page in-cludes lsquocustomized outputrsquo This option allows users to cus-tomize the result table and the downloadable Excel file toinclude various functional annotation results The gene de-tails page has several links in the resources sidebar to displaythe sequence and its motif annotations genome alignmentsand homologies to sequences of other species in the GDRand other databases The alignment details provide links toview a gene in JBrowse The new synteny section lists allthe orthologs and paralogs in other genomes discovered bysynteny analyses and provides hyperlinks to the gene pagesand Synteny Viewer Via the Synteny Viewer page (Figure1A) accessible from the tools navigation menu users canchoose a scaffoldchromosome of one reference genomeand choose multiple other genomes for comparison Theviewer then displays a clickable circular image as well asa table showing all the syntenic blocks between them (Fig-ure 1B) Choosing one block either from the image or thetable leads to a page where all the syntenic genes within theblock are shown with the Expect value (E-value) of theirhomology (Figure 1C) Gene names are linked to a spe-cific gene page (Figure 1D) where orthologsparalogs inother genomes are available among other information onthe genes such as associated function and genomic loca-tion with a link to JBrowse (Figure 1E) Conserved syn-teny data made available in the GDR thus allows usersstarting with one rosaceous genome to explore genes an-chored trait loci and genetic markers within orthologous re-

gions of another rosaceous genome Using JBrowse userscan view all the genomic features aligned to the genomesuch as gene models transcripts repeats SNPs other ge-netic markers and genes from other model plant species Inaddition the multivariant viewer JBrowse plugin has beenimplemented to allow users to view SNPs present amongsequence data from multiple germplasm individuals Forexample RosBREED resequencing data (23 accessions)aligned to the peach v10 genome is available in JBrowsethrough the plugin The GDR has upgraded to the Tri-pal BLAST module (httptripalinfoextensionsmodulestripal-blast-analysis) replacing the old BLAST and batchBLAST tools The new BLAST enables results to link tothe genome scaffolds in JBrowse and to the genetranscriptdetails in the GDR and on the NCBI Predicted genes fromwhole genome sequences were used in the construction ofPlantCyc (metabolic pathway) databases (32) using Path-wayTools (33) Currently four PlantCyc databases are avail-able in the GDR PeachCyc AppleCyc FragariaCyc andRubusCyc

Transcriptome data The GDR team now regularly per-forms an analysis that combines published RNA-Seq anddbEST datasets to create a reference transcriptome (Ref-Trans) for major species or genera and provides puta-tive gene function identified by homology to known pro-teins The RNA-Seq sequences from peer-reviewed pub-lications were downloaded from the NCBI Short ReadArchive (SRA) and subject to quality control using theTrimmomatic (v032 default parameters 34) and customPerl scripts The remaining RNA-Seq reads were assembledde novo with Trinity (v266 35) using default assembly pa-rameters and a minimum coding length of 200 bases Qual-

D1140 Nucleic Acids Research 2019 Vol 47 Database issue

Figure 1 Synteny Viewer in GDR (A) Home page of Synteny Viewer allows users to choose a reference genome and a chromosome and multiple genomesfor comparison Users can also choose a synteny block ID (B) A circular diagram and a table shows the synteny blocks between a chromosome of areference genome and all chromosomes of another genome being compared (C) A bar diagram and a table that shows all the genes in a syntenic blockThe table displays E-value between the matching genes and the gene names have hyperlinks to the gene detail page (D) A gene detail page with resourceside bar and the hyperlink to JBrowse (E) JBrowse around the mRNA of interest with tracks such as gene mRNA SNP and genes parsed from NCBInucleotide database

ity control of the ESTs included vector sequence screen-ing (UniVec CoreftpftpncbinihgovpubUniVec) usingcross match (36) removal of tRNArRNAsnRNA se-quences identified using tblastx (37) and Poly-A tail trim-ming The filtered ESTs were assembled using the CAP3program (P minus90 38) Bowtie (v 233) (39) was applied tomulti-map the RNA-Seq reads and ESTs back to the as-sembled contigs and singlets The contigs and singlets wereclustered into genes using CH-HIT (v464 40) and Corset(v107) (41) with default parameters The longest isoformgreater than 500 nt was selected to represent each Corsetcluster and create the RefTrans sequences The RefTran se-quences are functionally characterized by pairwise compar-ison using the BLASTX algorithm against the Swiss-Protand TrEMBL protein databases Information on the top tenmatches with an E-value of le 1E-06 are recorded and storedin GDR together with the RefTrans sequences InterPro do-mains and Gene Ontology assignments are made using In-terProScan at the EBI through Blast2GO Transcriptomesand their associated annotation are available to downloadin the transcriptome page that can be accessed from eachspecies page to search and download in the Search Genesand Transcripts page to view on the genome in JBrowseand to perform similarity searches in the BLAST serverCurrently available datasets include F vesca GDR RefTransv10 P avium GDR RefTrans v10 P persica GDR Ref-Trans v10 Malus times domestica GDR RefTrans v10 andRubus GDR RefTrans v20 The GDR continues to pro-vide unigene v50 for each genus (Prunus Malus Fragaria

Rosa and Pyrus) composed of singlets and contigs assem-bled from the publicly available Rosaceae ESTs downloadedfrom dbEST at NCBI Unigene v50 for the entire Rosaceaeis composed of the assembled contigs and singlets for thefive genera

GDR also hosts user-provided assembled transcriptssuch as a rose transcriptome which are orthologous openreading frame sequences of roses used to construct the Wa-gRhSNP 68K Axiom SNP array (42) black cherry Prunusserotina unigene v10 from Pennsylvania State Universityand assembled transcript sequences of red raspberry Rubusidaeus (43) In addition the GDR provides links to RNA-Seq and DNA datasets available in the NCBI SRA forRosaceae genus or major species

Genetic maps genetic diversity data markers and trait loci

Genetic maps With continuous effort on curating peer-reviewed published data GDR now contains 313 geneticmaps for Rosaceae species including 168 for Prunus 108for Malus 27 for Fragaria 15 for Rosa 13 for Rubus and9 for Pyrus The data associated with genetic maps in-clude mapped positions of molecular markers QTLs andheritable phenotypic markers as well as mapping popu-lation(s) and publication(s) GDR has a new graphic in-terface MapViewer (httptripalinfoextensionsmodulestripalmap) to view genetic maps MapViewer allows usersto view and compare maps from different populations andspecies facilitating information transfer from well-studiedspecies to less-studied ones These comparisons are espe-

Nucleic Acids Research 2019 Vol 47 Database issue D1141

cially useful in Rosaceae due to well conserved syntenyamong the genomes of Rosaceae genera (144) While thefunctionality of MapViewer is similar to CMap (45) a com-monly used tool in biological databases including the GDRMapViewer is much more integrated with other pages suchas the map marker and QTL pages In addition MapViewerallows users to zoom into specific regions of a linkage groupchoose types of markers to be displayed and change the col-ors of the markers that are displayed

Genetic diversity data The GDR contains DNA polymor-phism data from various published genetic diversity studiesand public breeding projects such as RosBREED (46) Cur-rently data from 24 diversity projects are available thirteenfrom Prunus ten from MalusPyrus and one from Mala-comeles (false serviceberry) The GDR provides separatesearch pages for the SNP and the SSR genotypic data toprovide appropriate query and downloadable result tablesdepending on the data type There are 19 and 5 datasetsavailable for SSR and SNP genotype search pages respec-tively In the SNP genotype search page users can filterresults by dataset name species germplasm name SNPname genomic location andor gene name (Figure 2A)Users can also upload a file with germplasm names Thisfiltering allows users to perform tailored querying such asfinding SNP polymorphisms around a gene of interest in achosen set of germplasm The results table provides SNPname genomic location allele and genotypic data of allof the germplasm chosen in the order of SNP location inthe genome so that users can view the genotype of eachgermplasm along the chromosome (Figure 2B) Users candownload the genotypes for all markers displayed in the re-sults page or the genotypes for only the markers that arepolymorphic within the germplasm set chosen (Figure 2B)

Genetic marker and SNP array data The GDR pro-vides details on more than 3 million genetic markersused in genetic map development genetic diversity studiesgenome wide association studies and SNP array develop-ment Marker annotations include marker aliases sourcegermplasm source description primer sequences poly-merase chain reaction conditions literature references andmap position where available For SNPs the marker de-tails also include SNP array name SNP array ID dbSNPID alleles flanking sequences and probes SNP marker dataavailable from GDR include those from array developmentprojects such as the 9K (47) 20K (48) and 480K (49) arraysfor apple IRSC cherry 6K array (50) IPSC peach 9K array(51) 90K array for cultivated strawberry (52) and 68K arrayfor rose (42) The SNP array data are available to downloadin Microsoft Excel format from the genome pages to view inJBrowse as well as to query in the Marker Search page TheGDR provides a new lsquoSNP Marker Searchrsquo page in additionto the existing lsquoMarker Searchrsquo and lsquoSearch Nearby Mark-ersrsquo pages The search filter feature in the lsquoMarker Searchrsquopage includes marker name marker type the species fromwhich the marker is developed the species to which themarker is mapped trait name and map position in the ge-netic map and genome Filtering by trait name is a new fea-ture that allows users to search for markers that are nearandor within QTLs using the associated trait name The

table in the results page shows marker name alias markertype species genetic map location and genome locationThe downloaded file contains the same information as wellas the citation The lsquoSNP Marker Searchrsquo is designed so thatusers can filter using array information as well as SNP nameand genomic location The results table is also specific forSNP with alleles SNP array information genome locationand flanking sequences In both search pages users can up-load a file of marker names for querying Another search in-terface lsquoSearch Nearby Markersrsquo allows users to find mark-ers near a targeted locus

Trait locus data and Rosaceae Trait Ontology The GDRcontains detailed data on published trait loci QTLs andMTLs (Mendelian Trait Loci) These data include 3799QTLs and 103 MTLs that are associated with 392 horticul-tural traits such as powdery mildew disease resistance fruitskin color ripening time and volatile organic compoundcontent These trait terms constitute the Rosaceae Trait On-tology The Rosaceae Trait Ontology has been built to stan-dardize trait names and abbreviations for all trait data en-tered into the GDR and to connect trait terms to TO (5)with the goal of data integration across databases organ-isms and data types Each of these trait terms is either anexisting TO term or a child term of one One trait term canbelong to multiple Root TO terms New Rosaceae Trait On-tology terms which do not exist in TO have been submit-ted to the Trait Ontology consortium for inclusion In ad-dition to the Rosaceae Trait Ontology QTLs are annotatedin the GDR with aliases curator-assigned QTL labels pub-lished symbols trait names taxa trait descriptions screen-ing methods map positions associated markers statisticalvalues datasets contact information and references Thesearch page for trait loci allows searching by trait locus type(QTL or MTL) species trait category (root TO term) traitname published symbol and GDR-assigned label

Breeding data

Breeding data stored in the GDR include phenotypic geno-typic germplasm and pedigree data from the RosBREEDproject (46) the Washington State University Apple Breed-ing Program (53) and private breeding programs Publiclyavailable breeding data can be queried using the lsquoSearchTrait Evaluationrsquo and lsquoSearch Genotypersquo pages The lsquoSearchTrait Evaluationrsquo page provides two tabs for querying qual-itative or quantitative traits In each tab users can filter thedata by crop dataset name and trait cut-off values of up tothree trait descriptors The result table and downloadablefile has germplasm name species the trait values chosenand the dataset name The germplasm and the dataset namein the results table are linked to the detail page where otherassociated data can be accessed The germplasm page has aresource sidebar for genotypic and phenotypic data as wellas an overview and associated images where the data canbe viewed and downloaded In addition the public breed-ing data can be queried and downloaded using the BIMSBIMS is a new Tripal module (httptripalinfoextensionsmodulestripal-bims) we developed to provide breeders andbreeding project teams with tools to store manage archiveand analyze their private or public breeding data BIMS is

D1142 Nucleic Acids Research 2019 Vol 47 Database issue

Figure 2 SNP Genotype Search Page in GDR (A) Users can search SNP genotype data by dataset name species germplasm name SNP name genomelocation andor gene name Users can also upload a file with a list of germplasm names (B) Search result table that shows SNP name genome locationallele and the genotype data of all the germplasm chosen in the order of SNP location in the genome The red square highlights the options to downloadthe genotype for all the marker displayed in the result page or the genotype data that are polymorphic in the germplasm set chosen

compatible with Field Book (54) an open-source App forandroid phones and tablets that allows researchers to re-place hard-copy field books or spreadsheet files BIMS al-lows breeders to upload data collected with the Field Bookapp directly to their private secure database in the GDRinto a consistent format Having BIMS in the GDR givesbreeders the opportunity to integrate their data with pub-licly available data and easily release some or all of their datapublicly as needed Currently the sweet cherry phenotypicdata from the RosBREED project is available for public ac-cess through BIMS In BIMS an accordion menu on theleft side provides quick access to various functionalities ThelsquoData Importrsquo section provides data templates for users toenter their data and upload the data file themselves (Figure3A) The lsquoSearchrsquo section allows users to search and save thelist of germplasm individuals (lsquoaccessionsrsquo) using any com-bination of properties and trait cut-off values name triallocation cross parent and trait values (Figure 3B) Whena filter is applied to choose accessions the rightmost sec-tion shows the number of accessions belonging to the fil-tered dataset When a trait descriptor is chosen as a filterthe middle section shows a histogram along with the statis-tical values such as maximum minimum mean and stan-dard deviation to show users the distribution of data pointswithin the dataset chosen (Figure 3B) The list of the ac-cession names chosen can be viewed and downloaded ina table with an option of adding more data on the acces-sions such as parents cross number and trait values (Figure3C) The list of accessions can be saved in user accounts andcan be used to retrieve any data associated with the acces-sions The lsquoData Analysisrsquo section allows users to choosetwo datasets using the categories or saved accession listsand compare the trait statistics between the two datasets(Figure 3D) This analysis function allows users to comparevarious traits of two sets of accessions such as progeniesfrom two different crosses

Community resources and data submission

The GDR continues to provide community-based resourcesunder the community navigation menu including pages forthe US RosEXEC RosIGI and conference employmentnotices and mailing lists The US RosEXEC (Rosaceae Ge-nomics Genetics and Breeding Executive Committee) andRosIGI (Rosaceae International Genomics Initiative) serveas communication and coordination focal points for the re-search community with the former also operating as theSteering Committee for the GDR The US RosEXEC andRosIGI pages provide official documents minutes of quar-terly meetings membership lists and subcommittee infor-mation Several mailing lists in addition to the GDR mail-ing list are available to serve the community with informa-tion for specific interests or purposes and the archives canbe viewed through the message board sites

While the GDR team actively curates data from pub-lications we encourage authors to officially archive theirdatasets by submitting their data at the time of manuscriptsubmission The Data Submission page under the Data nav-igation menu provides various data templates for genes ge-netic maps QTL maps as well as genotypic and phenotypicdata In addition the page has a link to a page where the rec-ommended list of files for whole genome data submissionare available

CONCLUSION AND FUTURE DIRECTION

Recent availability of multiple genome assemblies and an-notation data from seven major crops and 14 species in theRosaceae family opened up the opportunity to investigatethe evolution and biological basis of a wide range of plantforms and functions as well as to share knowledge amongmajor rosaceous crops to improve their performance Withthe goal of facilitating these efforts the GDR focused onintegrating the wealth of the new genomic data with tran-scriptomic genetic map genetic marker trait locus pheno-typic and genotypic data To accommodate scientistsrsquo data-

Nucleic Acids Research 2019 Vol 47 Database issue D1143

Figure 3 BIMS in GDR (A) lsquoTemplate Listrsquo subsection in lsquoData Importrsquo section provide downloadable templates for users to enter various breedingdata (B) Searchrsquo section allows users to search and save the list of accessions using any combination of properties and trait cut of values name triallocation cross parent and trait values The middle section shows the statistical information on the filtered dataset for the trait chosen and the right sectionshows the number of accessions filtered so far (C) A page with the search result table Users can add more columns in the table using lsquoColumn optionsrsquoand savedownload the result table (D) lsquoData Analysisrsquo section that allows users to choose two datasets using the categories or saved accession lists andcompare the trait statistics between the two datasets

mining needs that came with these new types and large vol-ume of data we made various new web interfaces availableThe new web interfaces are either new Tripal modules thatwe developed such as MapViewer BIMS Chado LoaderChado Data Display and Chado Search modules (55) orTripal modules that other database teams developed suchas the Synteny Viewer and Tripal BLAST The open-sourcedatabase platform Tripal allows us to meet emerging de-mands for storage querying and display of new data typesmore efficiently and quickly

During the last 5 years we also made advances in curat-ing data with more ontologies and incorporating ontologiesin data query pages to facilitate data-sharing across differ-ent data types species and databases Future effort will in-clude further developing MapViewer to integrate genomedata and genetic maps providing an enhanced querying in-terface expanding analysis capabilities in BIMS throughaccess to additional functionality integrating Galaxy anal-ysis platform through Tripal Galaxy module (httpswwwdrupalorgprojecttripal galaxy) providing access toHigh Performance Computing and enabling cross-Tripaldatabases querying capability using Tripal Elasticsearch(56) and Tripal Exchange (httptripalinfotutorialsv3xweb-services)

Use of the GDR the community resource for Rosaceaegenomics genetics and breeding research has continuedto grow over the last 5 years Between September 1 2013and August 31 2018 the GDR had 256075 visits by 95259unique visitors from 190 countries who accessed 1547877pages The GDR is part of AgBioData (57) (httpswwwagbiodataorg) a consortium working to improve stan-

dards and sustainability of genomics genetics and breedingdatabases and further enable agricultural science

ACKNOWLEDGEMENTS

The authors acknowledge with thanks their fundingsources the Rosaceae research community for providingdata support and feedback the Tripal and GMOD com-munity of developers for developing and sharing Tripalmodules and code the AgBioData Consortium and USLand Grant Universities for support

FUNDING

USDA National Institute of Food and Agriculture Spe-cialty Crop Research Initiative projects [2014-51181-2237 2014-51181-22378] USDA National Institute ofFood and Agriculture National Research Support Project10 NSF PGRP award 444573 NSF CIF21 DIBBs award1443040 Washington Tree Fruit Research CommissionWashington State University Clemson University Fundingfor open access charge Federal Grant USDA NIFA SCRIGrantConflict of interest statement None declared

REFERENCES1 ShulaevV KorbanSS SosinskiB AbbottAG AldwinckleHS

FoltaKM IezzoniA MainD ArusP DandekarAM et al(2008) Multiple models for rosaceae genomics Plant Physiol 147985ndash1003

D1144 Nucleic Acids Research 2019 Vol 47 Database issue

2 JungS JesuduraiC StatonM DuZ FicklinS ChoIAbbottA TomkinsJ and MainD (2004) GDR (Genome Databasefor Rosaceae) integrated web resources for Rosaceae genomics andgenetics research BMC Bioinformatics 5 130

3 JungS StatonM LeeT BlendaA SvancaraR AbbottA andMainD (2008) GDR (Genome Database for Rosaceae) integratedweb-database for Rosaceae genomics and genetics data Nucleic AcidsRes 36 D1034ndashD1040

4 JungS FicklinSP LeeT ChengC-H BlendaA ZhengP YuJBombarelyA ChoI RuS et al (2014) The Genome Database forRosaceae (GDR) year 10 update Nucleic Acids Res 42D1237ndashD1244

5 CooperL MeierA LaporteM-A ElserJL MungallCSinnBT CavaliereD CarbonS DunnNA SmithB et al (2018)The Planteome database an integrated resource for referenceontologies plant genomics and phenomics Nucleic Acids Res 46D1168ndashD1180

6 SandersonL-A FicklinSP ChengC-H JungS FeltusFABettKE and MainD (2013) Tripal v11 a standards-based toolkitfor construction of online genetic and genomic databases Database(Oxford) 2013 bat075

7 FicklinSP SandersonL-A ChengC-H StatonME LeeTChoI-H JungS BettKE and MainD (2011) Tripal aconstruction toolkit for online genome databases Database(Oxford) 2011 bar044

8 EdgerPP VanBurenR ColleM PoortenTJ WaiCMNiederhuthCE AlgerEI OuS AcharyaCB WangJ et al(2018) Single-molecule sequencing and optical mapping yields animproved genome of woodland strawberry (Fragaria vesca) withchromosome-scale contiguity Gigascience 7 1ndash7

9 HirakawaH ShirasawaK KosugiS TashiroK NakayamaSYamadaM KoharaM WatanabeA KishidaY FujishiroT et al(2014) Dissection of the octoploid strawberry genome by deepsequencing of the genomes of fragaria species DNA Res 21169ndash181

10 ShulaevV SargentDJ CrowhurstRN MocklerTC FolkertsODelcherAL JaiswalP MockaitisK ListonA ManeSP et al(2010) The genome of woodland strawberry (Fragaria vesca) NatGenet 43 109ndash116

11 TennessenJA GovindarajuluR AshmanT-L and ListonA(2014) Evolutionary origins and dynamics of octoploid strawberrysubgenomes revealed by dense targeted capture linkage mapsGenome Biol Evol 6 3295ndash3313

12 DarwishO ShahanR LiuZ SlovinJP and AlkharoufNW(2015) Re-annotation of the woodland strawberry (Fragaria vesca)genome BMC Genomics 16 29

13 LiY WeiW FengJ LuoH PiM LiuZ and KangC (2018)Genome re-annotation of the wild strawberry Fragaria vesca usingextensive Illumina- and SMRT-based RNA-seq datasets DNA Res25 61ndash70

14 ButiM MorettoM BarghiniE MascagniF NataliL BrilliMLomsadzeA SonegoP GiongoL AlongeM et al (2018) Thegenome sequence and transcriptome of Potentilla micrantha andtheir comparison to Fragaria vesca (the woodland strawberry)Gigascience 7 1ndash14

15 DaccordN CeltonJ-M LinsmithG BeckerC ChoisneNSchijlenE van de GeestH BiancoL MichelettiD VelascoRet al (2017) High-quality de novo assembly of the apple genome andmethylome dynamics of early fruit development Nat Genet 491099ndash1106

16 VelascoR ZharkikhA AffourtitJ DhingraA CestaroAKalyanaramanA FontanaP BhatnagarSK TroggioM PrussDet al (2010) The genome of the domesticated apple (Malus timesdomestica Borkh) Nat Genet 42 833ndash839

17 The International Peach Genome Initiative VerdeI AbbottAGScalabrinS JungS ShuS MarroniF ZhebentyayevaTDettoriMT GrimwoodJ et al (2013) The high-quality draftgenome of peach (Prunus persica) identifies unique patterns ofgenetic diversity domestication and genome evolution Nat Genet45 487ndash494

18 VerdeI JenkinsJ DondiniL MicaliS PagliaraniGVendraminE ParisR AraminiV GazzaL RossiniL et al(2017) The Peach v20 release high-resolution linkage mapping and

deep resequencing improve chromosome-scale assembly andcontiguity BMC Genomics 18 225

19 ShirasawaK IsuzugawaK IkenagaM SaitoY YamamotoTHirakawaH and IsobeS (2017) The genome sequence of sweetcherry (Prunus avium) for use in genomics-assisted breeding DNARes 24 499ndash508

20 ChagneD CrowhurstRN PindoM ThrimawithanaA DengCIrelandH FiersM DzierzonH CestaroA FontanaP et al(2014) The draft genome sequence of European pear (Pyruscommunis L rsquoBartlettrsquo) PLoS One 9 e92644

21 HibrandL RuttinkT HamamaL KirovI LakhwaniDZhouN-N BourkeP DaccordN LeusL SchulzD et al (2018)A high-quality sequence of Rosa chinensis to elucidate genomestructure and ornamental traits Nat Plants 4 473ndash484

22 RaymondO GouzyJ JustJ BadouinH VerdenaudMLemainqueA VergneP MojaS ChoisneN PontC et al (2018)The Rosa genome provides new insights into the domestication ofmodern roses Nat Genet 50 772ndash777

23 NakamuraN HirakawaH SatoS OtagakiS MatsumotoSTabataS and TanakaY (2018) Genome structure of Rosa multifloraa wild ancestor of cultivated roses DNA Res 25 113ndash121

24 JibranR DzierzonH BassilN BushakraJM EdgerPPSullivanS FinnCE DossettM ViningKJ VanBurenR et al(2018) Chromosome-scale scaffolding of the black raspberry (Rubusoccidentalis L) genome based on chromatin interaction data HorticRes 5 8

25 VanBurenR BryantD BushakraJM ViningKJ EdgerPPRowleyER PriestHD MichaelTP LyonsE FilichkinSAet al (2016) The genome of black raspberry (Rubus occidentalis)Plant J 87 535ndash547

26 FinnRD AttwoodTK BabbittPC BatemanA BorkPBridgeAJ ChangH-Y DosztanyiZ El-GebaliS FraserM et al(2017) InterPro in 2017ndashndashbeyond protein family and domainannotations Nucleic Acids Res 45 D190ndashD199

27 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene Ontology tool for the unification of biology NatGenet 25 25ndash29

28 The Gene Ontology Consortium (2017) Expansion of the GeneOntology knowledgebase and resources Nucleic Acids Res 45D331ndashD338

29 WangY TangH DeBarryJD TanX LiJ WangX LeeT -hJinH MarlerB GuoH et al (2012) MCScanX a toolkit fordetection and evolutionary analysis of gene synteny and collinearityNucleic Acids Res 40 e49

30 BuelsR YaoE DieshCM HayesRD Munoz-TorresMHeltG GoodsteinDM ElsikCG LewisSE SteinL et al(2016) JBrowse a dynamic web platform for genome visualizationand analysis Genome Biol 17 66

31 CamachoC CoulourisG AvagyanV MaN PapadopoulosJBealerK and MaddenTL (2009) BLAST+ architecture andapplications BMC Bioinformatics 10 421

32 SchlapferP ZhangP WangC KimT BanfM ChaeLDreherK ChavaliAK Nilo-PoyancoR BernardT et al (2017)Genome-wide prediction of metabolic enzymes pathways and geneclusters in plants Plant Physiol 173 2041ndash2059

33 CaspiR DreherK and KarpPD (2013) The challenge ofconstructing classifying and representing metabolic pathwaysFEMS Microbiol Lett 345 85ndash93

34 BolgerAM LohseM and UsadelB (2014) Trimmomatic a flexibletrimmer for Illumina sequence data Bioinformatics 30 2114ndash2120

35 GrabherrMG HaasBJ YassourM LevinJZ ThompsonDAAmitI AdiconisX FanL RaychowdhuryR ZengQ et al(2011) Full-length transcriptome assembly from RNA-Seq datawithout a reference genome Nat Biotechnol 29 644ndash652

36 GordonD AbajianC and GreenP (1998) Consed a graphical toolfor sequence finishing Genome Res 8 195ndash202

37 AltschulSF GishW MillerW MyersEW and LipmanDJ(1990) Basic local alignment search tool J Mol Biol 215 403ndash410

38 HuangX and MadanA (1999) CAP3 A DNA sequence assemblyprogram Genome Res 9 868ndash877

39 LangmeadB TrapnellC PopM and SalzbergSL (2009) Ultrafastand memory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10 R25

Nucleic Acids Research 2019 Vol 47 Database issue D1145

40 FuL NiuB ZhuZ WuS and LiW (2012) CD-HIT acceleratedfor clustering the next-generation sequencing data Bioinformatics28 3150ndash3152

41 DavidsonNM and OshlackA (2014) Corset enabling differentialgene expression analysis for de novo assembled transcriptomesGenome Biol 15 410

42 Koning-BoucoiranCFS EsselinkGD VukosavljevM van rsquotWestendeWPC GitongaVW KrensFA VoorripsRE van deWegWE SchulzD DebenerT et al (2015) Using RNA-Seq toassemble a rose transcriptome with more than 13000 full-lengthexpressed genes and to develop the WagRhSNP 68k Axiom SNParray for rose (Rosa L) Front Plant Sci 6 249

43 FuentesL MonsalveL Morales-QuintanaL ValdenegroMMartınezJ-P DefilippiBG and Gonzalez-AgueroM (2015)Differential expression of ethylene biosynthesis genes in drupelets andreceptacle of raspberry (Rubus idaeus) J Plant Physiol 179100ndash105

44 JungS CestaroA TroggioM MainD ZhengP ChoIFoltaKM SosinskiB AbbottA CeltonJ-M et al (2012) Wholegenome comparisons of Fragaria Prunus and Malus reveal differentmodes of evolution between Rosaceous subfamilies BMC Genomics13 129

45 Youens-ClarkK FagaB YapIV SteinL and WareD (2009)CMap 101 a comparative mapping application for the InternetBioinformatics 25 3040ndash3042

46 IezzoniA PeaceC MainD BassilN CoeM FinnC GasicKLubyJ HokansonS McFersonJ et al (2017) RosBREED2progress and future plans to enable DNA-informed breeding in theRosaceae Acta Hortic doi1017660ActaHortic201711722

47 ChagneD CrowhurstRN TroggioM DaveyMW GilmoreBLawleyC VanderzandeS HellensRP KumarS CestaroA et al(2012) Genome-wide SNP detection validation and development ofan 8K SNP array for apple PLoS One 7 e31745

48 BiancoL CestaroA SargentDJ BanchiE DerdakS DiGuardoM SalviS JansenJ ViolaR GutI et al (2014)Development and validation of a 20K single nucleotidepolymorphism (SNP) whole genome genotyping array for apple(Malus times domestica Borkh) PLoS One 9 e110377

49 BiancoL CestaroA LinsmithG MurantyH DenanceCTheronA PoncetC MichelettiD KerschbamerE Di PierroEAet al (2016) Development and validation of the Axiom(reg)Apple480K SNP genotyping array Plant J 86 62ndash74

50 PeaceC BassilN MainD FicklinS RosyaraUR StegmeirTSeboltA GilmoreB LawleyC MocklerTC et al (2012)Development and evaluation of a genome-wide 6K SNP array fordiploid sweet cherry and tetraploid sour cherry PLoS One 7 e48305

51 VerdeI BassilN ScalabrinS GilmoreB LawleyCT GasicKMichelettiD RosyaraUR CattonaroF VendraminE et al(2012) Development and evaluation of a 9K SNP array for peach byinternationally coordinated SNP detection and validation in breedinggermplasm PLoS One 7 e35668

52 BassilNV DavisTM ZhangH FicklinS MittmannMWebsterT MahoneyL WoodD AlperinES RosyaraUR et al(2015) Development and preliminary evaluation of a 90 K AxiomregSNP array for the allo-octoploid cultivated strawberry Fragaria timesananassa BMC Genomics 16 155

53 EvansK (2013) Apple Breeding in Pacific Northwest Acta Hortic976 75ndash78

54 RifeTW and PolandJA (2014) Field Book an open-sourceapplication for field data collection on android Crop Sci 541624ndash1627

55 JungS LeeT ChengC-H FicklinS YuJ HumannJ andMainD (2017) Extension modules for storage visualization andquerying of genomic genetic and breeding data in Tripal databasesDatabase (Oxford) 2017 bax092

56 ChenM HenryN AlmsaeedA ZhouX WegrzynJ FicklinSand StatonM (2017) New extension software modules to enhancesearching and display of transcriptome data in Tripal databasesDatabase (Oxford) 2017 bax052

57 HarperL CampbellJ CannonEKS JungS PoelchauMWallsR AndorfC ArnaudE BerardiniTZ BirkettC et al(2018) AgBioData consortium recommendations for sustainablegenomics and genetics databases for agriculture Database (Oxford)2018 bay088

D1140 Nucleic Acids Research 2019 Vol 47 Database issue

Figure 1 Synteny Viewer in GDR (A) Home page of Synteny Viewer allows users to choose a reference genome and a chromosome and multiple genomesfor comparison Users can also choose a synteny block ID (B) A circular diagram and a table shows the synteny blocks between a chromosome of areference genome and all chromosomes of another genome being compared (C) A bar diagram and a table that shows all the genes in a syntenic blockThe table displays E-value between the matching genes and the gene names have hyperlinks to the gene detail page (D) A gene detail page with resourceside bar and the hyperlink to JBrowse (E) JBrowse around the mRNA of interest with tracks such as gene mRNA SNP and genes parsed from NCBInucleotide database

ity control of the ESTs included vector sequence screen-ing (UniVec CoreftpftpncbinihgovpubUniVec) usingcross match (36) removal of tRNArRNAsnRNA se-quences identified using tblastx (37) and Poly-A tail trim-ming The filtered ESTs were assembled using the CAP3program (P minus90 38) Bowtie (v 233) (39) was applied tomulti-map the RNA-Seq reads and ESTs back to the as-sembled contigs and singlets The contigs and singlets wereclustered into genes using CH-HIT (v464 40) and Corset(v107) (41) with default parameters The longest isoformgreater than 500 nt was selected to represent each Corsetcluster and create the RefTrans sequences The RefTran se-quences are functionally characterized by pairwise compar-ison using the BLASTX algorithm against the Swiss-Protand TrEMBL protein databases Information on the top tenmatches with an E-value of le 1E-06 are recorded and storedin GDR together with the RefTrans sequences InterPro do-mains and Gene Ontology assignments are made using In-terProScan at the EBI through Blast2GO Transcriptomesand their associated annotation are available to downloadin the transcriptome page that can be accessed from eachspecies page to search and download in the Search Genesand Transcripts page to view on the genome in JBrowseand to perform similarity searches in the BLAST serverCurrently available datasets include F vesca GDR RefTransv10 P avium GDR RefTrans v10 P persica GDR Ref-Trans v10 Malus times domestica GDR RefTrans v10 andRubus GDR RefTrans v20 The GDR continues to pro-vide unigene v50 for each genus (Prunus Malus Fragaria

Rosa and Pyrus) composed of singlets and contigs assem-bled from the publicly available Rosaceae ESTs downloadedfrom dbEST at NCBI Unigene v50 for the entire Rosaceaeis composed of the assembled contigs and singlets for thefive genera

GDR also hosts user-provided assembled transcriptssuch as a rose transcriptome which are orthologous openreading frame sequences of roses used to construct the Wa-gRhSNP 68K Axiom SNP array (42) black cherry Prunusserotina unigene v10 from Pennsylvania State Universityand assembled transcript sequences of red raspberry Rubusidaeus (43) In addition the GDR provides links to RNA-Seq and DNA datasets available in the NCBI SRA forRosaceae genus or major species

Genetic maps genetic diversity data markers and trait loci

Genetic maps With continuous effort on curating peer-reviewed published data GDR now contains 313 geneticmaps for Rosaceae species including 168 for Prunus 108for Malus 27 for Fragaria 15 for Rosa 13 for Rubus and9 for Pyrus The data associated with genetic maps in-clude mapped positions of molecular markers QTLs andheritable phenotypic markers as well as mapping popu-lation(s) and publication(s) GDR has a new graphic in-terface MapViewer (httptripalinfoextensionsmodulestripalmap) to view genetic maps MapViewer allows usersto view and compare maps from different populations andspecies facilitating information transfer from well-studiedspecies to less-studied ones These comparisons are espe-

Nucleic Acids Research 2019 Vol 47 Database issue D1141

cially useful in Rosaceae due to well conserved syntenyamong the genomes of Rosaceae genera (144) While thefunctionality of MapViewer is similar to CMap (45) a com-monly used tool in biological databases including the GDRMapViewer is much more integrated with other pages suchas the map marker and QTL pages In addition MapViewerallows users to zoom into specific regions of a linkage groupchoose types of markers to be displayed and change the col-ors of the markers that are displayed

Genetic diversity data The GDR contains DNA polymor-phism data from various published genetic diversity studiesand public breeding projects such as RosBREED (46) Cur-rently data from 24 diversity projects are available thirteenfrom Prunus ten from MalusPyrus and one from Mala-comeles (false serviceberry) The GDR provides separatesearch pages for the SNP and the SSR genotypic data toprovide appropriate query and downloadable result tablesdepending on the data type There are 19 and 5 datasetsavailable for SSR and SNP genotype search pages respec-tively In the SNP genotype search page users can filterresults by dataset name species germplasm name SNPname genomic location andor gene name (Figure 2A)Users can also upload a file with germplasm names Thisfiltering allows users to perform tailored querying such asfinding SNP polymorphisms around a gene of interest in achosen set of germplasm The results table provides SNPname genomic location allele and genotypic data of allof the germplasm chosen in the order of SNP location inthe genome so that users can view the genotype of eachgermplasm along the chromosome (Figure 2B) Users candownload the genotypes for all markers displayed in the re-sults page or the genotypes for only the markers that arepolymorphic within the germplasm set chosen (Figure 2B)

Genetic marker and SNP array data The GDR pro-vides details on more than 3 million genetic markersused in genetic map development genetic diversity studiesgenome wide association studies and SNP array develop-ment Marker annotations include marker aliases sourcegermplasm source description primer sequences poly-merase chain reaction conditions literature references andmap position where available For SNPs the marker de-tails also include SNP array name SNP array ID dbSNPID alleles flanking sequences and probes SNP marker dataavailable from GDR include those from array developmentprojects such as the 9K (47) 20K (48) and 480K (49) arraysfor apple IRSC cherry 6K array (50) IPSC peach 9K array(51) 90K array for cultivated strawberry (52) and 68K arrayfor rose (42) The SNP array data are available to downloadin Microsoft Excel format from the genome pages to view inJBrowse as well as to query in the Marker Search page TheGDR provides a new lsquoSNP Marker Searchrsquo page in additionto the existing lsquoMarker Searchrsquo and lsquoSearch Nearby Mark-ersrsquo pages The search filter feature in the lsquoMarker Searchrsquopage includes marker name marker type the species fromwhich the marker is developed the species to which themarker is mapped trait name and map position in the ge-netic map and genome Filtering by trait name is a new fea-ture that allows users to search for markers that are nearandor within QTLs using the associated trait name The

table in the results page shows marker name alias markertype species genetic map location and genome locationThe downloaded file contains the same information as wellas the citation The lsquoSNP Marker Searchrsquo is designed so thatusers can filter using array information as well as SNP nameand genomic location The results table is also specific forSNP with alleles SNP array information genome locationand flanking sequences In both search pages users can up-load a file of marker names for querying Another search in-terface lsquoSearch Nearby Markersrsquo allows users to find mark-ers near a targeted locus

Trait locus data and Rosaceae Trait Ontology The GDRcontains detailed data on published trait loci QTLs andMTLs (Mendelian Trait Loci) These data include 3799QTLs and 103 MTLs that are associated with 392 horticul-tural traits such as powdery mildew disease resistance fruitskin color ripening time and volatile organic compoundcontent These trait terms constitute the Rosaceae Trait On-tology The Rosaceae Trait Ontology has been built to stan-dardize trait names and abbreviations for all trait data en-tered into the GDR and to connect trait terms to TO (5)with the goal of data integration across databases organ-isms and data types Each of these trait terms is either anexisting TO term or a child term of one One trait term canbelong to multiple Root TO terms New Rosaceae Trait On-tology terms which do not exist in TO have been submit-ted to the Trait Ontology consortium for inclusion In ad-dition to the Rosaceae Trait Ontology QTLs are annotatedin the GDR with aliases curator-assigned QTL labels pub-lished symbols trait names taxa trait descriptions screen-ing methods map positions associated markers statisticalvalues datasets contact information and references Thesearch page for trait loci allows searching by trait locus type(QTL or MTL) species trait category (root TO term) traitname published symbol and GDR-assigned label

Breeding data

Breeding data stored in the GDR include phenotypic geno-typic germplasm and pedigree data from the RosBREEDproject (46) the Washington State University Apple Breed-ing Program (53) and private breeding programs Publiclyavailable breeding data can be queried using the lsquoSearchTrait Evaluationrsquo and lsquoSearch Genotypersquo pages The lsquoSearchTrait Evaluationrsquo page provides two tabs for querying qual-itative or quantitative traits In each tab users can filter thedata by crop dataset name and trait cut-off values of up tothree trait descriptors The result table and downloadablefile has germplasm name species the trait values chosenand the dataset name The germplasm and the dataset namein the results table are linked to the detail page where otherassociated data can be accessed The germplasm page has aresource sidebar for genotypic and phenotypic data as wellas an overview and associated images where the data canbe viewed and downloaded In addition the public breed-ing data can be queried and downloaded using the BIMSBIMS is a new Tripal module (httptripalinfoextensionsmodulestripal-bims) we developed to provide breeders andbreeding project teams with tools to store manage archiveand analyze their private or public breeding data BIMS is

D1142 Nucleic Acids Research 2019 Vol 47 Database issue

Figure 2 SNP Genotype Search Page in GDR (A) Users can search SNP genotype data by dataset name species germplasm name SNP name genomelocation andor gene name Users can also upload a file with a list of germplasm names (B) Search result table that shows SNP name genome locationallele and the genotype data of all the germplasm chosen in the order of SNP location in the genome The red square highlights the options to downloadthe genotype for all the marker displayed in the result page or the genotype data that are polymorphic in the germplasm set chosen

compatible with Field Book (54) an open-source App forandroid phones and tablets that allows researchers to re-place hard-copy field books or spreadsheet files BIMS al-lows breeders to upload data collected with the Field Bookapp directly to their private secure database in the GDRinto a consistent format Having BIMS in the GDR givesbreeders the opportunity to integrate their data with pub-licly available data and easily release some or all of their datapublicly as needed Currently the sweet cherry phenotypicdata from the RosBREED project is available for public ac-cess through BIMS In BIMS an accordion menu on theleft side provides quick access to various functionalities ThelsquoData Importrsquo section provides data templates for users toenter their data and upload the data file themselves (Figure3A) The lsquoSearchrsquo section allows users to search and save thelist of germplasm individuals (lsquoaccessionsrsquo) using any com-bination of properties and trait cut-off values name triallocation cross parent and trait values (Figure 3B) Whena filter is applied to choose accessions the rightmost sec-tion shows the number of accessions belonging to the fil-tered dataset When a trait descriptor is chosen as a filterthe middle section shows a histogram along with the statis-tical values such as maximum minimum mean and stan-dard deviation to show users the distribution of data pointswithin the dataset chosen (Figure 3B) The list of the ac-cession names chosen can be viewed and downloaded ina table with an option of adding more data on the acces-sions such as parents cross number and trait values (Figure3C) The list of accessions can be saved in user accounts andcan be used to retrieve any data associated with the acces-sions The lsquoData Analysisrsquo section allows users to choosetwo datasets using the categories or saved accession listsand compare the trait statistics between the two datasets(Figure 3D) This analysis function allows users to comparevarious traits of two sets of accessions such as progeniesfrom two different crosses

Community resources and data submission

The GDR continues to provide community-based resourcesunder the community navigation menu including pages forthe US RosEXEC RosIGI and conference employmentnotices and mailing lists The US RosEXEC (Rosaceae Ge-nomics Genetics and Breeding Executive Committee) andRosIGI (Rosaceae International Genomics Initiative) serveas communication and coordination focal points for the re-search community with the former also operating as theSteering Committee for the GDR The US RosEXEC andRosIGI pages provide official documents minutes of quar-terly meetings membership lists and subcommittee infor-mation Several mailing lists in addition to the GDR mail-ing list are available to serve the community with informa-tion for specific interests or purposes and the archives canbe viewed through the message board sites

While the GDR team actively curates data from pub-lications we encourage authors to officially archive theirdatasets by submitting their data at the time of manuscriptsubmission The Data Submission page under the Data nav-igation menu provides various data templates for genes ge-netic maps QTL maps as well as genotypic and phenotypicdata In addition the page has a link to a page where the rec-ommended list of files for whole genome data submissionare available

CONCLUSION AND FUTURE DIRECTION

Recent availability of multiple genome assemblies and an-notation data from seven major crops and 14 species in theRosaceae family opened up the opportunity to investigatethe evolution and biological basis of a wide range of plantforms and functions as well as to share knowledge amongmajor rosaceous crops to improve their performance Withthe goal of facilitating these efforts the GDR focused onintegrating the wealth of the new genomic data with tran-scriptomic genetic map genetic marker trait locus pheno-typic and genotypic data To accommodate scientistsrsquo data-

Nucleic Acids Research 2019 Vol 47 Database issue D1143

Figure 3 BIMS in GDR (A) lsquoTemplate Listrsquo subsection in lsquoData Importrsquo section provide downloadable templates for users to enter various breedingdata (B) Searchrsquo section allows users to search and save the list of accessions using any combination of properties and trait cut of values name triallocation cross parent and trait values The middle section shows the statistical information on the filtered dataset for the trait chosen and the right sectionshows the number of accessions filtered so far (C) A page with the search result table Users can add more columns in the table using lsquoColumn optionsrsquoand savedownload the result table (D) lsquoData Analysisrsquo section that allows users to choose two datasets using the categories or saved accession lists andcompare the trait statistics between the two datasets

mining needs that came with these new types and large vol-ume of data we made various new web interfaces availableThe new web interfaces are either new Tripal modules thatwe developed such as MapViewer BIMS Chado LoaderChado Data Display and Chado Search modules (55) orTripal modules that other database teams developed suchas the Synteny Viewer and Tripal BLAST The open-sourcedatabase platform Tripal allows us to meet emerging de-mands for storage querying and display of new data typesmore efficiently and quickly

During the last 5 years we also made advances in curat-ing data with more ontologies and incorporating ontologiesin data query pages to facilitate data-sharing across differ-ent data types species and databases Future effort will in-clude further developing MapViewer to integrate genomedata and genetic maps providing an enhanced querying in-terface expanding analysis capabilities in BIMS throughaccess to additional functionality integrating Galaxy anal-ysis platform through Tripal Galaxy module (httpswwwdrupalorgprojecttripal galaxy) providing access toHigh Performance Computing and enabling cross-Tripaldatabases querying capability using Tripal Elasticsearch(56) and Tripal Exchange (httptripalinfotutorialsv3xweb-services)

Use of the GDR the community resource for Rosaceaegenomics genetics and breeding research has continuedto grow over the last 5 years Between September 1 2013and August 31 2018 the GDR had 256075 visits by 95259unique visitors from 190 countries who accessed 1547877pages The GDR is part of AgBioData (57) (httpswwwagbiodataorg) a consortium working to improve stan-

dards and sustainability of genomics genetics and breedingdatabases and further enable agricultural science

ACKNOWLEDGEMENTS

The authors acknowledge with thanks their fundingsources the Rosaceae research community for providingdata support and feedback the Tripal and GMOD com-munity of developers for developing and sharing Tripalmodules and code the AgBioData Consortium and USLand Grant Universities for support

FUNDING

USDA National Institute of Food and Agriculture Spe-cialty Crop Research Initiative projects [2014-51181-2237 2014-51181-22378] USDA National Institute ofFood and Agriculture National Research Support Project10 NSF PGRP award 444573 NSF CIF21 DIBBs award1443040 Washington Tree Fruit Research CommissionWashington State University Clemson University Fundingfor open access charge Federal Grant USDA NIFA SCRIGrantConflict of interest statement None declared

REFERENCES1 ShulaevV KorbanSS SosinskiB AbbottAG AldwinckleHS

FoltaKM IezzoniA MainD ArusP DandekarAM et al(2008) Multiple models for rosaceae genomics Plant Physiol 147985ndash1003

D1144 Nucleic Acids Research 2019 Vol 47 Database issue

2 JungS JesuduraiC StatonM DuZ FicklinS ChoIAbbottA TomkinsJ and MainD (2004) GDR (Genome Databasefor Rosaceae) integrated web resources for Rosaceae genomics andgenetics research BMC Bioinformatics 5 130

3 JungS StatonM LeeT BlendaA SvancaraR AbbottA andMainD (2008) GDR (Genome Database for Rosaceae) integratedweb-database for Rosaceae genomics and genetics data Nucleic AcidsRes 36 D1034ndashD1040

4 JungS FicklinSP LeeT ChengC-H BlendaA ZhengP YuJBombarelyA ChoI RuS et al (2014) The Genome Database forRosaceae (GDR) year 10 update Nucleic Acids Res 42D1237ndashD1244

5 CooperL MeierA LaporteM-A ElserJL MungallCSinnBT CavaliereD CarbonS DunnNA SmithB et al (2018)The Planteome database an integrated resource for referenceontologies plant genomics and phenomics Nucleic Acids Res 46D1168ndashD1180

6 SandersonL-A FicklinSP ChengC-H JungS FeltusFABettKE and MainD (2013) Tripal v11 a standards-based toolkitfor construction of online genetic and genomic databases Database(Oxford) 2013 bat075

7 FicklinSP SandersonL-A ChengC-H StatonME LeeTChoI-H JungS BettKE and MainD (2011) Tripal aconstruction toolkit for online genome databases Database(Oxford) 2011 bar044

8 EdgerPP VanBurenR ColleM PoortenTJ WaiCMNiederhuthCE AlgerEI OuS AcharyaCB WangJ et al(2018) Single-molecule sequencing and optical mapping yields animproved genome of woodland strawberry (Fragaria vesca) withchromosome-scale contiguity Gigascience 7 1ndash7

9 HirakawaH ShirasawaK KosugiS TashiroK NakayamaSYamadaM KoharaM WatanabeA KishidaY FujishiroT et al(2014) Dissection of the octoploid strawberry genome by deepsequencing of the genomes of fragaria species DNA Res 21169ndash181

10 ShulaevV SargentDJ CrowhurstRN MocklerTC FolkertsODelcherAL JaiswalP MockaitisK ListonA ManeSP et al(2010) The genome of woodland strawberry (Fragaria vesca) NatGenet 43 109ndash116

11 TennessenJA GovindarajuluR AshmanT-L and ListonA(2014) Evolutionary origins and dynamics of octoploid strawberrysubgenomes revealed by dense targeted capture linkage mapsGenome Biol Evol 6 3295ndash3313

12 DarwishO ShahanR LiuZ SlovinJP and AlkharoufNW(2015) Re-annotation of the woodland strawberry (Fragaria vesca)genome BMC Genomics 16 29

13 LiY WeiW FengJ LuoH PiM LiuZ and KangC (2018)Genome re-annotation of the wild strawberry Fragaria vesca usingextensive Illumina- and SMRT-based RNA-seq datasets DNA Res25 61ndash70

14 ButiM MorettoM BarghiniE MascagniF NataliL BrilliMLomsadzeA SonegoP GiongoL AlongeM et al (2018) Thegenome sequence and transcriptome of Potentilla micrantha andtheir comparison to Fragaria vesca (the woodland strawberry)Gigascience 7 1ndash14

15 DaccordN CeltonJ-M LinsmithG BeckerC ChoisneNSchijlenE van de GeestH BiancoL MichelettiD VelascoRet al (2017) High-quality de novo assembly of the apple genome andmethylome dynamics of early fruit development Nat Genet 491099ndash1106

16 VelascoR ZharkikhA AffourtitJ DhingraA CestaroAKalyanaramanA FontanaP BhatnagarSK TroggioM PrussDet al (2010) The genome of the domesticated apple (Malus timesdomestica Borkh) Nat Genet 42 833ndash839

17 The International Peach Genome Initiative VerdeI AbbottAGScalabrinS JungS ShuS MarroniF ZhebentyayevaTDettoriMT GrimwoodJ et al (2013) The high-quality draftgenome of peach (Prunus persica) identifies unique patterns ofgenetic diversity domestication and genome evolution Nat Genet45 487ndash494

18 VerdeI JenkinsJ DondiniL MicaliS PagliaraniGVendraminE ParisR AraminiV GazzaL RossiniL et al(2017) The Peach v20 release high-resolution linkage mapping and

deep resequencing improve chromosome-scale assembly andcontiguity BMC Genomics 18 225

19 ShirasawaK IsuzugawaK IkenagaM SaitoY YamamotoTHirakawaH and IsobeS (2017) The genome sequence of sweetcherry (Prunus avium) for use in genomics-assisted breeding DNARes 24 499ndash508

20 ChagneD CrowhurstRN PindoM ThrimawithanaA DengCIrelandH FiersM DzierzonH CestaroA FontanaP et al(2014) The draft genome sequence of European pear (Pyruscommunis L rsquoBartlettrsquo) PLoS One 9 e92644

21 HibrandL RuttinkT HamamaL KirovI LakhwaniDZhouN-N BourkeP DaccordN LeusL SchulzD et al (2018)A high-quality sequence of Rosa chinensis to elucidate genomestructure and ornamental traits Nat Plants 4 473ndash484

22 RaymondO GouzyJ JustJ BadouinH VerdenaudMLemainqueA VergneP MojaS ChoisneN PontC et al (2018)The Rosa genome provides new insights into the domestication ofmodern roses Nat Genet 50 772ndash777

23 NakamuraN HirakawaH SatoS OtagakiS MatsumotoSTabataS and TanakaY (2018) Genome structure of Rosa multifloraa wild ancestor of cultivated roses DNA Res 25 113ndash121

24 JibranR DzierzonH BassilN BushakraJM EdgerPPSullivanS FinnCE DossettM ViningKJ VanBurenR et al(2018) Chromosome-scale scaffolding of the black raspberry (Rubusoccidentalis L) genome based on chromatin interaction data HorticRes 5 8

25 VanBurenR BryantD BushakraJM ViningKJ EdgerPPRowleyER PriestHD MichaelTP LyonsE FilichkinSAet al (2016) The genome of black raspberry (Rubus occidentalis)Plant J 87 535ndash547

26 FinnRD AttwoodTK BabbittPC BatemanA BorkPBridgeAJ ChangH-Y DosztanyiZ El-GebaliS FraserM et al(2017) InterPro in 2017ndashndashbeyond protein family and domainannotations Nucleic Acids Res 45 D190ndashD199

27 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene Ontology tool for the unification of biology NatGenet 25 25ndash29

28 The Gene Ontology Consortium (2017) Expansion of the GeneOntology knowledgebase and resources Nucleic Acids Res 45D331ndashD338

29 WangY TangH DeBarryJD TanX LiJ WangX LeeT -hJinH MarlerB GuoH et al (2012) MCScanX a toolkit fordetection and evolutionary analysis of gene synteny and collinearityNucleic Acids Res 40 e49

30 BuelsR YaoE DieshCM HayesRD Munoz-TorresMHeltG GoodsteinDM ElsikCG LewisSE SteinL et al(2016) JBrowse a dynamic web platform for genome visualizationand analysis Genome Biol 17 66

31 CamachoC CoulourisG AvagyanV MaN PapadopoulosJBealerK and MaddenTL (2009) BLAST+ architecture andapplications BMC Bioinformatics 10 421

32 SchlapferP ZhangP WangC KimT BanfM ChaeLDreherK ChavaliAK Nilo-PoyancoR BernardT et al (2017)Genome-wide prediction of metabolic enzymes pathways and geneclusters in plants Plant Physiol 173 2041ndash2059

33 CaspiR DreherK and KarpPD (2013) The challenge ofconstructing classifying and representing metabolic pathwaysFEMS Microbiol Lett 345 85ndash93

34 BolgerAM LohseM and UsadelB (2014) Trimmomatic a flexibletrimmer for Illumina sequence data Bioinformatics 30 2114ndash2120

35 GrabherrMG HaasBJ YassourM LevinJZ ThompsonDAAmitI AdiconisX FanL RaychowdhuryR ZengQ et al(2011) Full-length transcriptome assembly from RNA-Seq datawithout a reference genome Nat Biotechnol 29 644ndash652

36 GordonD AbajianC and GreenP (1998) Consed a graphical toolfor sequence finishing Genome Res 8 195ndash202

37 AltschulSF GishW MillerW MyersEW and LipmanDJ(1990) Basic local alignment search tool J Mol Biol 215 403ndash410

38 HuangX and MadanA (1999) CAP3 A DNA sequence assemblyprogram Genome Res 9 868ndash877

39 LangmeadB TrapnellC PopM and SalzbergSL (2009) Ultrafastand memory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10 R25

Nucleic Acids Research 2019 Vol 47 Database issue D1145

40 FuL NiuB ZhuZ WuS and LiW (2012) CD-HIT acceleratedfor clustering the next-generation sequencing data Bioinformatics28 3150ndash3152

41 DavidsonNM and OshlackA (2014) Corset enabling differentialgene expression analysis for de novo assembled transcriptomesGenome Biol 15 410

42 Koning-BoucoiranCFS EsselinkGD VukosavljevM van rsquotWestendeWPC GitongaVW KrensFA VoorripsRE van deWegWE SchulzD DebenerT et al (2015) Using RNA-Seq toassemble a rose transcriptome with more than 13000 full-lengthexpressed genes and to develop the WagRhSNP 68k Axiom SNParray for rose (Rosa L) Front Plant Sci 6 249

43 FuentesL MonsalveL Morales-QuintanaL ValdenegroMMartınezJ-P DefilippiBG and Gonzalez-AgueroM (2015)Differential expression of ethylene biosynthesis genes in drupelets andreceptacle of raspberry (Rubus idaeus) J Plant Physiol 179100ndash105

44 JungS CestaroA TroggioM MainD ZhengP ChoIFoltaKM SosinskiB AbbottA CeltonJ-M et al (2012) Wholegenome comparisons of Fragaria Prunus and Malus reveal differentmodes of evolution between Rosaceous subfamilies BMC Genomics13 129

45 Youens-ClarkK FagaB YapIV SteinL and WareD (2009)CMap 101 a comparative mapping application for the InternetBioinformatics 25 3040ndash3042

46 IezzoniA PeaceC MainD BassilN CoeM FinnC GasicKLubyJ HokansonS McFersonJ et al (2017) RosBREED2progress and future plans to enable DNA-informed breeding in theRosaceae Acta Hortic doi1017660ActaHortic201711722

47 ChagneD CrowhurstRN TroggioM DaveyMW GilmoreBLawleyC VanderzandeS HellensRP KumarS CestaroA et al(2012) Genome-wide SNP detection validation and development ofan 8K SNP array for apple PLoS One 7 e31745

48 BiancoL CestaroA SargentDJ BanchiE DerdakS DiGuardoM SalviS JansenJ ViolaR GutI et al (2014)Development and validation of a 20K single nucleotidepolymorphism (SNP) whole genome genotyping array for apple(Malus times domestica Borkh) PLoS One 9 e110377

49 BiancoL CestaroA LinsmithG MurantyH DenanceCTheronA PoncetC MichelettiD KerschbamerE Di PierroEAet al (2016) Development and validation of the Axiom(reg)Apple480K SNP genotyping array Plant J 86 62ndash74

50 PeaceC BassilN MainD FicklinS RosyaraUR StegmeirTSeboltA GilmoreB LawleyC MocklerTC et al (2012)Development and evaluation of a genome-wide 6K SNP array fordiploid sweet cherry and tetraploid sour cherry PLoS One 7 e48305

51 VerdeI BassilN ScalabrinS GilmoreB LawleyCT GasicKMichelettiD RosyaraUR CattonaroF VendraminE et al(2012) Development and evaluation of a 9K SNP array for peach byinternationally coordinated SNP detection and validation in breedinggermplasm PLoS One 7 e35668

52 BassilNV DavisTM ZhangH FicklinS MittmannMWebsterT MahoneyL WoodD AlperinES RosyaraUR et al(2015) Development and preliminary evaluation of a 90 K AxiomregSNP array for the allo-octoploid cultivated strawberry Fragaria timesananassa BMC Genomics 16 155

53 EvansK (2013) Apple Breeding in Pacific Northwest Acta Hortic976 75ndash78

54 RifeTW and PolandJA (2014) Field Book an open-sourceapplication for field data collection on android Crop Sci 541624ndash1627

55 JungS LeeT ChengC-H FicklinS YuJ HumannJ andMainD (2017) Extension modules for storage visualization andquerying of genomic genetic and breeding data in Tripal databasesDatabase (Oxford) 2017 bax092

56 ChenM HenryN AlmsaeedA ZhouX WegrzynJ FicklinSand StatonM (2017) New extension software modules to enhancesearching and display of transcriptome data in Tripal databasesDatabase (Oxford) 2017 bax052

57 HarperL CampbellJ CannonEKS JungS PoelchauMWallsR AndorfC ArnaudE BerardiniTZ BirkettC et al(2018) AgBioData consortium recommendations for sustainablegenomics and genetics databases for agriculture Database (Oxford)2018 bay088

Nucleic Acids Research 2019 Vol 47 Database issue D1141

cially useful in Rosaceae due to well conserved syntenyamong the genomes of Rosaceae genera (144) While thefunctionality of MapViewer is similar to CMap (45) a com-monly used tool in biological databases including the GDRMapViewer is much more integrated with other pages suchas the map marker and QTL pages In addition MapViewerallows users to zoom into specific regions of a linkage groupchoose types of markers to be displayed and change the col-ors of the markers that are displayed

Genetic diversity data The GDR contains DNA polymor-phism data from various published genetic diversity studiesand public breeding projects such as RosBREED (46) Cur-rently data from 24 diversity projects are available thirteenfrom Prunus ten from MalusPyrus and one from Mala-comeles (false serviceberry) The GDR provides separatesearch pages for the SNP and the SSR genotypic data toprovide appropriate query and downloadable result tablesdepending on the data type There are 19 and 5 datasetsavailable for SSR and SNP genotype search pages respec-tively In the SNP genotype search page users can filterresults by dataset name species germplasm name SNPname genomic location andor gene name (Figure 2A)Users can also upload a file with germplasm names Thisfiltering allows users to perform tailored querying such asfinding SNP polymorphisms around a gene of interest in achosen set of germplasm The results table provides SNPname genomic location allele and genotypic data of allof the germplasm chosen in the order of SNP location inthe genome so that users can view the genotype of eachgermplasm along the chromosome (Figure 2B) Users candownload the genotypes for all markers displayed in the re-sults page or the genotypes for only the markers that arepolymorphic within the germplasm set chosen (Figure 2B)

Genetic marker and SNP array data The GDR pro-vides details on more than 3 million genetic markersused in genetic map development genetic diversity studiesgenome wide association studies and SNP array develop-ment Marker annotations include marker aliases sourcegermplasm source description primer sequences poly-merase chain reaction conditions literature references andmap position where available For SNPs the marker de-tails also include SNP array name SNP array ID dbSNPID alleles flanking sequences and probes SNP marker dataavailable from GDR include those from array developmentprojects such as the 9K (47) 20K (48) and 480K (49) arraysfor apple IRSC cherry 6K array (50) IPSC peach 9K array(51) 90K array for cultivated strawberry (52) and 68K arrayfor rose (42) The SNP array data are available to downloadin Microsoft Excel format from the genome pages to view inJBrowse as well as to query in the Marker Search page TheGDR provides a new lsquoSNP Marker Searchrsquo page in additionto the existing lsquoMarker Searchrsquo and lsquoSearch Nearby Mark-ersrsquo pages The search filter feature in the lsquoMarker Searchrsquopage includes marker name marker type the species fromwhich the marker is developed the species to which themarker is mapped trait name and map position in the ge-netic map and genome Filtering by trait name is a new fea-ture that allows users to search for markers that are nearandor within QTLs using the associated trait name The

table in the results page shows marker name alias markertype species genetic map location and genome locationThe downloaded file contains the same information as wellas the citation The lsquoSNP Marker Searchrsquo is designed so thatusers can filter using array information as well as SNP nameand genomic location The results table is also specific forSNP with alleles SNP array information genome locationand flanking sequences In both search pages users can up-load a file of marker names for querying Another search in-terface lsquoSearch Nearby Markersrsquo allows users to find mark-ers near a targeted locus

Trait locus data and Rosaceae Trait Ontology The GDRcontains detailed data on published trait loci QTLs andMTLs (Mendelian Trait Loci) These data include 3799QTLs and 103 MTLs that are associated with 392 horticul-tural traits such as powdery mildew disease resistance fruitskin color ripening time and volatile organic compoundcontent These trait terms constitute the Rosaceae Trait On-tology The Rosaceae Trait Ontology has been built to stan-dardize trait names and abbreviations for all trait data en-tered into the GDR and to connect trait terms to TO (5)with the goal of data integration across databases organ-isms and data types Each of these trait terms is either anexisting TO term or a child term of one One trait term canbelong to multiple Root TO terms New Rosaceae Trait On-tology terms which do not exist in TO have been submit-ted to the Trait Ontology consortium for inclusion In ad-dition to the Rosaceae Trait Ontology QTLs are annotatedin the GDR with aliases curator-assigned QTL labels pub-lished symbols trait names taxa trait descriptions screen-ing methods map positions associated markers statisticalvalues datasets contact information and references Thesearch page for trait loci allows searching by trait locus type(QTL or MTL) species trait category (root TO term) traitname published symbol and GDR-assigned label

Breeding data

Breeding data stored in the GDR include phenotypic geno-typic germplasm and pedigree data from the RosBREEDproject (46) the Washington State University Apple Breed-ing Program (53) and private breeding programs Publiclyavailable breeding data can be queried using the lsquoSearchTrait Evaluationrsquo and lsquoSearch Genotypersquo pages The lsquoSearchTrait Evaluationrsquo page provides two tabs for querying qual-itative or quantitative traits In each tab users can filter thedata by crop dataset name and trait cut-off values of up tothree trait descriptors The result table and downloadablefile has germplasm name species the trait values chosenand the dataset name The germplasm and the dataset namein the results table are linked to the detail page where otherassociated data can be accessed The germplasm page has aresource sidebar for genotypic and phenotypic data as wellas an overview and associated images where the data canbe viewed and downloaded In addition the public breed-ing data can be queried and downloaded using the BIMSBIMS is a new Tripal module (httptripalinfoextensionsmodulestripal-bims) we developed to provide breeders andbreeding project teams with tools to store manage archiveand analyze their private or public breeding data BIMS is

D1142 Nucleic Acids Research 2019 Vol 47 Database issue

Figure 2 SNP Genotype Search Page in GDR (A) Users can search SNP genotype data by dataset name species germplasm name SNP name genomelocation andor gene name Users can also upload a file with a list of germplasm names (B) Search result table that shows SNP name genome locationallele and the genotype data of all the germplasm chosen in the order of SNP location in the genome The red square highlights the options to downloadthe genotype for all the marker displayed in the result page or the genotype data that are polymorphic in the germplasm set chosen

compatible with Field Book (54) an open-source App forandroid phones and tablets that allows researchers to re-place hard-copy field books or spreadsheet files BIMS al-lows breeders to upload data collected with the Field Bookapp directly to their private secure database in the GDRinto a consistent format Having BIMS in the GDR givesbreeders the opportunity to integrate their data with pub-licly available data and easily release some or all of their datapublicly as needed Currently the sweet cherry phenotypicdata from the RosBREED project is available for public ac-cess through BIMS In BIMS an accordion menu on theleft side provides quick access to various functionalities ThelsquoData Importrsquo section provides data templates for users toenter their data and upload the data file themselves (Figure3A) The lsquoSearchrsquo section allows users to search and save thelist of germplasm individuals (lsquoaccessionsrsquo) using any com-bination of properties and trait cut-off values name triallocation cross parent and trait values (Figure 3B) Whena filter is applied to choose accessions the rightmost sec-tion shows the number of accessions belonging to the fil-tered dataset When a trait descriptor is chosen as a filterthe middle section shows a histogram along with the statis-tical values such as maximum minimum mean and stan-dard deviation to show users the distribution of data pointswithin the dataset chosen (Figure 3B) The list of the ac-cession names chosen can be viewed and downloaded ina table with an option of adding more data on the acces-sions such as parents cross number and trait values (Figure3C) The list of accessions can be saved in user accounts andcan be used to retrieve any data associated with the acces-sions The lsquoData Analysisrsquo section allows users to choosetwo datasets using the categories or saved accession listsand compare the trait statistics between the two datasets(Figure 3D) This analysis function allows users to comparevarious traits of two sets of accessions such as progeniesfrom two different crosses

Community resources and data submission

The GDR continues to provide community-based resourcesunder the community navigation menu including pages forthe US RosEXEC RosIGI and conference employmentnotices and mailing lists The US RosEXEC (Rosaceae Ge-nomics Genetics and Breeding Executive Committee) andRosIGI (Rosaceae International Genomics Initiative) serveas communication and coordination focal points for the re-search community with the former also operating as theSteering Committee for the GDR The US RosEXEC andRosIGI pages provide official documents minutes of quar-terly meetings membership lists and subcommittee infor-mation Several mailing lists in addition to the GDR mail-ing list are available to serve the community with informa-tion for specific interests or purposes and the archives canbe viewed through the message board sites

While the GDR team actively curates data from pub-lications we encourage authors to officially archive theirdatasets by submitting their data at the time of manuscriptsubmission The Data Submission page under the Data nav-igation menu provides various data templates for genes ge-netic maps QTL maps as well as genotypic and phenotypicdata In addition the page has a link to a page where the rec-ommended list of files for whole genome data submissionare available

CONCLUSION AND FUTURE DIRECTION

Recent availability of multiple genome assemblies and an-notation data from seven major crops and 14 species in theRosaceae family opened up the opportunity to investigatethe evolution and biological basis of a wide range of plantforms and functions as well as to share knowledge amongmajor rosaceous crops to improve their performance Withthe goal of facilitating these efforts the GDR focused onintegrating the wealth of the new genomic data with tran-scriptomic genetic map genetic marker trait locus pheno-typic and genotypic data To accommodate scientistsrsquo data-

Nucleic Acids Research 2019 Vol 47 Database issue D1143

Figure 3 BIMS in GDR (A) lsquoTemplate Listrsquo subsection in lsquoData Importrsquo section provide downloadable templates for users to enter various breedingdata (B) Searchrsquo section allows users to search and save the list of accessions using any combination of properties and trait cut of values name triallocation cross parent and trait values The middle section shows the statistical information on the filtered dataset for the trait chosen and the right sectionshows the number of accessions filtered so far (C) A page with the search result table Users can add more columns in the table using lsquoColumn optionsrsquoand savedownload the result table (D) lsquoData Analysisrsquo section that allows users to choose two datasets using the categories or saved accession lists andcompare the trait statistics between the two datasets

mining needs that came with these new types and large vol-ume of data we made various new web interfaces availableThe new web interfaces are either new Tripal modules thatwe developed such as MapViewer BIMS Chado LoaderChado Data Display and Chado Search modules (55) orTripal modules that other database teams developed suchas the Synteny Viewer and Tripal BLAST The open-sourcedatabase platform Tripal allows us to meet emerging de-mands for storage querying and display of new data typesmore efficiently and quickly

During the last 5 years we also made advances in curat-ing data with more ontologies and incorporating ontologiesin data query pages to facilitate data-sharing across differ-ent data types species and databases Future effort will in-clude further developing MapViewer to integrate genomedata and genetic maps providing an enhanced querying in-terface expanding analysis capabilities in BIMS throughaccess to additional functionality integrating Galaxy anal-ysis platform through Tripal Galaxy module (httpswwwdrupalorgprojecttripal galaxy) providing access toHigh Performance Computing and enabling cross-Tripaldatabases querying capability using Tripal Elasticsearch(56) and Tripal Exchange (httptripalinfotutorialsv3xweb-services)

Use of the GDR the community resource for Rosaceaegenomics genetics and breeding research has continuedto grow over the last 5 years Between September 1 2013and August 31 2018 the GDR had 256075 visits by 95259unique visitors from 190 countries who accessed 1547877pages The GDR is part of AgBioData (57) (httpswwwagbiodataorg) a consortium working to improve stan-

dards and sustainability of genomics genetics and breedingdatabases and further enable agricultural science

ACKNOWLEDGEMENTS

The authors acknowledge with thanks their fundingsources the Rosaceae research community for providingdata support and feedback the Tripal and GMOD com-munity of developers for developing and sharing Tripalmodules and code the AgBioData Consortium and USLand Grant Universities for support

FUNDING

USDA National Institute of Food and Agriculture Spe-cialty Crop Research Initiative projects [2014-51181-2237 2014-51181-22378] USDA National Institute ofFood and Agriculture National Research Support Project10 NSF PGRP award 444573 NSF CIF21 DIBBs award1443040 Washington Tree Fruit Research CommissionWashington State University Clemson University Fundingfor open access charge Federal Grant USDA NIFA SCRIGrantConflict of interest statement None declared

REFERENCES1 ShulaevV KorbanSS SosinskiB AbbottAG AldwinckleHS

FoltaKM IezzoniA MainD ArusP DandekarAM et al(2008) Multiple models for rosaceae genomics Plant Physiol 147985ndash1003

D1144 Nucleic Acids Research 2019 Vol 47 Database issue

2 JungS JesuduraiC StatonM DuZ FicklinS ChoIAbbottA TomkinsJ and MainD (2004) GDR (Genome Databasefor Rosaceae) integrated web resources for Rosaceae genomics andgenetics research BMC Bioinformatics 5 130

3 JungS StatonM LeeT BlendaA SvancaraR AbbottA andMainD (2008) GDR (Genome Database for Rosaceae) integratedweb-database for Rosaceae genomics and genetics data Nucleic AcidsRes 36 D1034ndashD1040

4 JungS FicklinSP LeeT ChengC-H BlendaA ZhengP YuJBombarelyA ChoI RuS et al (2014) The Genome Database forRosaceae (GDR) year 10 update Nucleic Acids Res 42D1237ndashD1244

5 CooperL MeierA LaporteM-A ElserJL MungallCSinnBT CavaliereD CarbonS DunnNA SmithB et al (2018)The Planteome database an integrated resource for referenceontologies plant genomics and phenomics Nucleic Acids Res 46D1168ndashD1180

6 SandersonL-A FicklinSP ChengC-H JungS FeltusFABettKE and MainD (2013) Tripal v11 a standards-based toolkitfor construction of online genetic and genomic databases Database(Oxford) 2013 bat075

7 FicklinSP SandersonL-A ChengC-H StatonME LeeTChoI-H JungS BettKE and MainD (2011) Tripal aconstruction toolkit for online genome databases Database(Oxford) 2011 bar044

8 EdgerPP VanBurenR ColleM PoortenTJ WaiCMNiederhuthCE AlgerEI OuS AcharyaCB WangJ et al(2018) Single-molecule sequencing and optical mapping yields animproved genome of woodland strawberry (Fragaria vesca) withchromosome-scale contiguity Gigascience 7 1ndash7

9 HirakawaH ShirasawaK KosugiS TashiroK NakayamaSYamadaM KoharaM WatanabeA KishidaY FujishiroT et al(2014) Dissection of the octoploid strawberry genome by deepsequencing of the genomes of fragaria species DNA Res 21169ndash181

10 ShulaevV SargentDJ CrowhurstRN MocklerTC FolkertsODelcherAL JaiswalP MockaitisK ListonA ManeSP et al(2010) The genome of woodland strawberry (Fragaria vesca) NatGenet 43 109ndash116

11 TennessenJA GovindarajuluR AshmanT-L and ListonA(2014) Evolutionary origins and dynamics of octoploid strawberrysubgenomes revealed by dense targeted capture linkage mapsGenome Biol Evol 6 3295ndash3313

12 DarwishO ShahanR LiuZ SlovinJP and AlkharoufNW(2015) Re-annotation of the woodland strawberry (Fragaria vesca)genome BMC Genomics 16 29

13 LiY WeiW FengJ LuoH PiM LiuZ and KangC (2018)Genome re-annotation of the wild strawberry Fragaria vesca usingextensive Illumina- and SMRT-based RNA-seq datasets DNA Res25 61ndash70

14 ButiM MorettoM BarghiniE MascagniF NataliL BrilliMLomsadzeA SonegoP GiongoL AlongeM et al (2018) Thegenome sequence and transcriptome of Potentilla micrantha andtheir comparison to Fragaria vesca (the woodland strawberry)Gigascience 7 1ndash14

15 DaccordN CeltonJ-M LinsmithG BeckerC ChoisneNSchijlenE van de GeestH BiancoL MichelettiD VelascoRet al (2017) High-quality de novo assembly of the apple genome andmethylome dynamics of early fruit development Nat Genet 491099ndash1106

16 VelascoR ZharkikhA AffourtitJ DhingraA CestaroAKalyanaramanA FontanaP BhatnagarSK TroggioM PrussDet al (2010) The genome of the domesticated apple (Malus timesdomestica Borkh) Nat Genet 42 833ndash839

17 The International Peach Genome Initiative VerdeI AbbottAGScalabrinS JungS ShuS MarroniF ZhebentyayevaTDettoriMT GrimwoodJ et al (2013) The high-quality draftgenome of peach (Prunus persica) identifies unique patterns ofgenetic diversity domestication and genome evolution Nat Genet45 487ndash494

18 VerdeI JenkinsJ DondiniL MicaliS PagliaraniGVendraminE ParisR AraminiV GazzaL RossiniL et al(2017) The Peach v20 release high-resolution linkage mapping and

deep resequencing improve chromosome-scale assembly andcontiguity BMC Genomics 18 225

19 ShirasawaK IsuzugawaK IkenagaM SaitoY YamamotoTHirakawaH and IsobeS (2017) The genome sequence of sweetcherry (Prunus avium) for use in genomics-assisted breeding DNARes 24 499ndash508

20 ChagneD CrowhurstRN PindoM ThrimawithanaA DengCIrelandH FiersM DzierzonH CestaroA FontanaP et al(2014) The draft genome sequence of European pear (Pyruscommunis L rsquoBartlettrsquo) PLoS One 9 e92644

21 HibrandL RuttinkT HamamaL KirovI LakhwaniDZhouN-N BourkeP DaccordN LeusL SchulzD et al (2018)A high-quality sequence of Rosa chinensis to elucidate genomestructure and ornamental traits Nat Plants 4 473ndash484

22 RaymondO GouzyJ JustJ BadouinH VerdenaudMLemainqueA VergneP MojaS ChoisneN PontC et al (2018)The Rosa genome provides new insights into the domestication ofmodern roses Nat Genet 50 772ndash777

23 NakamuraN HirakawaH SatoS OtagakiS MatsumotoSTabataS and TanakaY (2018) Genome structure of Rosa multifloraa wild ancestor of cultivated roses DNA Res 25 113ndash121

24 JibranR DzierzonH BassilN BushakraJM EdgerPPSullivanS FinnCE DossettM ViningKJ VanBurenR et al(2018) Chromosome-scale scaffolding of the black raspberry (Rubusoccidentalis L) genome based on chromatin interaction data HorticRes 5 8

25 VanBurenR BryantD BushakraJM ViningKJ EdgerPPRowleyER PriestHD MichaelTP LyonsE FilichkinSAet al (2016) The genome of black raspberry (Rubus occidentalis)Plant J 87 535ndash547

26 FinnRD AttwoodTK BabbittPC BatemanA BorkPBridgeAJ ChangH-Y DosztanyiZ El-GebaliS FraserM et al(2017) InterPro in 2017ndashndashbeyond protein family and domainannotations Nucleic Acids Res 45 D190ndashD199

27 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene Ontology tool for the unification of biology NatGenet 25 25ndash29

28 The Gene Ontology Consortium (2017) Expansion of the GeneOntology knowledgebase and resources Nucleic Acids Res 45D331ndashD338

29 WangY TangH DeBarryJD TanX LiJ WangX LeeT -hJinH MarlerB GuoH et al (2012) MCScanX a toolkit fordetection and evolutionary analysis of gene synteny and collinearityNucleic Acids Res 40 e49

30 BuelsR YaoE DieshCM HayesRD Munoz-TorresMHeltG GoodsteinDM ElsikCG LewisSE SteinL et al(2016) JBrowse a dynamic web platform for genome visualizationand analysis Genome Biol 17 66

31 CamachoC CoulourisG AvagyanV MaN PapadopoulosJBealerK and MaddenTL (2009) BLAST+ architecture andapplications BMC Bioinformatics 10 421

32 SchlapferP ZhangP WangC KimT BanfM ChaeLDreherK ChavaliAK Nilo-PoyancoR BernardT et al (2017)Genome-wide prediction of metabolic enzymes pathways and geneclusters in plants Plant Physiol 173 2041ndash2059

33 CaspiR DreherK and KarpPD (2013) The challenge ofconstructing classifying and representing metabolic pathwaysFEMS Microbiol Lett 345 85ndash93

34 BolgerAM LohseM and UsadelB (2014) Trimmomatic a flexibletrimmer for Illumina sequence data Bioinformatics 30 2114ndash2120

35 GrabherrMG HaasBJ YassourM LevinJZ ThompsonDAAmitI AdiconisX FanL RaychowdhuryR ZengQ et al(2011) Full-length transcriptome assembly from RNA-Seq datawithout a reference genome Nat Biotechnol 29 644ndash652

36 GordonD AbajianC and GreenP (1998) Consed a graphical toolfor sequence finishing Genome Res 8 195ndash202

37 AltschulSF GishW MillerW MyersEW and LipmanDJ(1990) Basic local alignment search tool J Mol Biol 215 403ndash410

38 HuangX and MadanA (1999) CAP3 A DNA sequence assemblyprogram Genome Res 9 868ndash877

39 LangmeadB TrapnellC PopM and SalzbergSL (2009) Ultrafastand memory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10 R25

Nucleic Acids Research 2019 Vol 47 Database issue D1145

40 FuL NiuB ZhuZ WuS and LiW (2012) CD-HIT acceleratedfor clustering the next-generation sequencing data Bioinformatics28 3150ndash3152

41 DavidsonNM and OshlackA (2014) Corset enabling differentialgene expression analysis for de novo assembled transcriptomesGenome Biol 15 410

42 Koning-BoucoiranCFS EsselinkGD VukosavljevM van rsquotWestendeWPC GitongaVW KrensFA VoorripsRE van deWegWE SchulzD DebenerT et al (2015) Using RNA-Seq toassemble a rose transcriptome with more than 13000 full-lengthexpressed genes and to develop the WagRhSNP 68k Axiom SNParray for rose (Rosa L) Front Plant Sci 6 249

43 FuentesL MonsalveL Morales-QuintanaL ValdenegroMMartınezJ-P DefilippiBG and Gonzalez-AgueroM (2015)Differential expression of ethylene biosynthesis genes in drupelets andreceptacle of raspberry (Rubus idaeus) J Plant Physiol 179100ndash105

44 JungS CestaroA TroggioM MainD ZhengP ChoIFoltaKM SosinskiB AbbottA CeltonJ-M et al (2012) Wholegenome comparisons of Fragaria Prunus and Malus reveal differentmodes of evolution between Rosaceous subfamilies BMC Genomics13 129

45 Youens-ClarkK FagaB YapIV SteinL and WareD (2009)CMap 101 a comparative mapping application for the InternetBioinformatics 25 3040ndash3042

46 IezzoniA PeaceC MainD BassilN CoeM FinnC GasicKLubyJ HokansonS McFersonJ et al (2017) RosBREED2progress and future plans to enable DNA-informed breeding in theRosaceae Acta Hortic doi1017660ActaHortic201711722

47 ChagneD CrowhurstRN TroggioM DaveyMW GilmoreBLawleyC VanderzandeS HellensRP KumarS CestaroA et al(2012) Genome-wide SNP detection validation and development ofan 8K SNP array for apple PLoS One 7 e31745

48 BiancoL CestaroA SargentDJ BanchiE DerdakS DiGuardoM SalviS JansenJ ViolaR GutI et al (2014)Development and validation of a 20K single nucleotidepolymorphism (SNP) whole genome genotyping array for apple(Malus times domestica Borkh) PLoS One 9 e110377

49 BiancoL CestaroA LinsmithG MurantyH DenanceCTheronA PoncetC MichelettiD KerschbamerE Di PierroEAet al (2016) Development and validation of the Axiom(reg)Apple480K SNP genotyping array Plant J 86 62ndash74

50 PeaceC BassilN MainD FicklinS RosyaraUR StegmeirTSeboltA GilmoreB LawleyC MocklerTC et al (2012)Development and evaluation of a genome-wide 6K SNP array fordiploid sweet cherry and tetraploid sour cherry PLoS One 7 e48305

51 VerdeI BassilN ScalabrinS GilmoreB LawleyCT GasicKMichelettiD RosyaraUR CattonaroF VendraminE et al(2012) Development and evaluation of a 9K SNP array for peach byinternationally coordinated SNP detection and validation in breedinggermplasm PLoS One 7 e35668

52 BassilNV DavisTM ZhangH FicklinS MittmannMWebsterT MahoneyL WoodD AlperinES RosyaraUR et al(2015) Development and preliminary evaluation of a 90 K AxiomregSNP array for the allo-octoploid cultivated strawberry Fragaria timesananassa BMC Genomics 16 155

53 EvansK (2013) Apple Breeding in Pacific Northwest Acta Hortic976 75ndash78

54 RifeTW and PolandJA (2014) Field Book an open-sourceapplication for field data collection on android Crop Sci 541624ndash1627

55 JungS LeeT ChengC-H FicklinS YuJ HumannJ andMainD (2017) Extension modules for storage visualization andquerying of genomic genetic and breeding data in Tripal databasesDatabase (Oxford) 2017 bax092

56 ChenM HenryN AlmsaeedA ZhouX WegrzynJ FicklinSand StatonM (2017) New extension software modules to enhancesearching and display of transcriptome data in Tripal databasesDatabase (Oxford) 2017 bax052

57 HarperL CampbellJ CannonEKS JungS PoelchauMWallsR AndorfC ArnaudE BerardiniTZ BirkettC et al(2018) AgBioData consortium recommendations for sustainablegenomics and genetics databases for agriculture Database (Oxford)2018 bay088

D1142 Nucleic Acids Research 2019 Vol 47 Database issue

Figure 2 SNP Genotype Search Page in GDR (A) Users can search SNP genotype data by dataset name species germplasm name SNP name genomelocation andor gene name Users can also upload a file with a list of germplasm names (B) Search result table that shows SNP name genome locationallele and the genotype data of all the germplasm chosen in the order of SNP location in the genome The red square highlights the options to downloadthe genotype for all the marker displayed in the result page or the genotype data that are polymorphic in the germplasm set chosen

compatible with Field Book (54) an open-source App forandroid phones and tablets that allows researchers to re-place hard-copy field books or spreadsheet files BIMS al-lows breeders to upload data collected with the Field Bookapp directly to their private secure database in the GDRinto a consistent format Having BIMS in the GDR givesbreeders the opportunity to integrate their data with pub-licly available data and easily release some or all of their datapublicly as needed Currently the sweet cherry phenotypicdata from the RosBREED project is available for public ac-cess through BIMS In BIMS an accordion menu on theleft side provides quick access to various functionalities ThelsquoData Importrsquo section provides data templates for users toenter their data and upload the data file themselves (Figure3A) The lsquoSearchrsquo section allows users to search and save thelist of germplasm individuals (lsquoaccessionsrsquo) using any com-bination of properties and trait cut-off values name triallocation cross parent and trait values (Figure 3B) Whena filter is applied to choose accessions the rightmost sec-tion shows the number of accessions belonging to the fil-tered dataset When a trait descriptor is chosen as a filterthe middle section shows a histogram along with the statis-tical values such as maximum minimum mean and stan-dard deviation to show users the distribution of data pointswithin the dataset chosen (Figure 3B) The list of the ac-cession names chosen can be viewed and downloaded ina table with an option of adding more data on the acces-sions such as parents cross number and trait values (Figure3C) The list of accessions can be saved in user accounts andcan be used to retrieve any data associated with the acces-sions The lsquoData Analysisrsquo section allows users to choosetwo datasets using the categories or saved accession listsand compare the trait statistics between the two datasets(Figure 3D) This analysis function allows users to comparevarious traits of two sets of accessions such as progeniesfrom two different crosses

Community resources and data submission

The GDR continues to provide community-based resourcesunder the community navigation menu including pages forthe US RosEXEC RosIGI and conference employmentnotices and mailing lists The US RosEXEC (Rosaceae Ge-nomics Genetics and Breeding Executive Committee) andRosIGI (Rosaceae International Genomics Initiative) serveas communication and coordination focal points for the re-search community with the former also operating as theSteering Committee for the GDR The US RosEXEC andRosIGI pages provide official documents minutes of quar-terly meetings membership lists and subcommittee infor-mation Several mailing lists in addition to the GDR mail-ing list are available to serve the community with informa-tion for specific interests or purposes and the archives canbe viewed through the message board sites

While the GDR team actively curates data from pub-lications we encourage authors to officially archive theirdatasets by submitting their data at the time of manuscriptsubmission The Data Submission page under the Data nav-igation menu provides various data templates for genes ge-netic maps QTL maps as well as genotypic and phenotypicdata In addition the page has a link to a page where the rec-ommended list of files for whole genome data submissionare available

CONCLUSION AND FUTURE DIRECTION

Recent availability of multiple genome assemblies and an-notation data from seven major crops and 14 species in theRosaceae family opened up the opportunity to investigatethe evolution and biological basis of a wide range of plantforms and functions as well as to share knowledge amongmajor rosaceous crops to improve their performance Withthe goal of facilitating these efforts the GDR focused onintegrating the wealth of the new genomic data with tran-scriptomic genetic map genetic marker trait locus pheno-typic and genotypic data To accommodate scientistsrsquo data-

Nucleic Acids Research 2019 Vol 47 Database issue D1143

Figure 3 BIMS in GDR (A) lsquoTemplate Listrsquo subsection in lsquoData Importrsquo section provide downloadable templates for users to enter various breedingdata (B) Searchrsquo section allows users to search and save the list of accessions using any combination of properties and trait cut of values name triallocation cross parent and trait values The middle section shows the statistical information on the filtered dataset for the trait chosen and the right sectionshows the number of accessions filtered so far (C) A page with the search result table Users can add more columns in the table using lsquoColumn optionsrsquoand savedownload the result table (D) lsquoData Analysisrsquo section that allows users to choose two datasets using the categories or saved accession lists andcompare the trait statistics between the two datasets

mining needs that came with these new types and large vol-ume of data we made various new web interfaces availableThe new web interfaces are either new Tripal modules thatwe developed such as MapViewer BIMS Chado LoaderChado Data Display and Chado Search modules (55) orTripal modules that other database teams developed suchas the Synteny Viewer and Tripal BLAST The open-sourcedatabase platform Tripal allows us to meet emerging de-mands for storage querying and display of new data typesmore efficiently and quickly

During the last 5 years we also made advances in curat-ing data with more ontologies and incorporating ontologiesin data query pages to facilitate data-sharing across differ-ent data types species and databases Future effort will in-clude further developing MapViewer to integrate genomedata and genetic maps providing an enhanced querying in-terface expanding analysis capabilities in BIMS throughaccess to additional functionality integrating Galaxy anal-ysis platform through Tripal Galaxy module (httpswwwdrupalorgprojecttripal galaxy) providing access toHigh Performance Computing and enabling cross-Tripaldatabases querying capability using Tripal Elasticsearch(56) and Tripal Exchange (httptripalinfotutorialsv3xweb-services)

Use of the GDR the community resource for Rosaceaegenomics genetics and breeding research has continuedto grow over the last 5 years Between September 1 2013and August 31 2018 the GDR had 256075 visits by 95259unique visitors from 190 countries who accessed 1547877pages The GDR is part of AgBioData (57) (httpswwwagbiodataorg) a consortium working to improve stan-

dards and sustainability of genomics genetics and breedingdatabases and further enable agricultural science

ACKNOWLEDGEMENTS

The authors acknowledge with thanks their fundingsources the Rosaceae research community for providingdata support and feedback the Tripal and GMOD com-munity of developers for developing and sharing Tripalmodules and code the AgBioData Consortium and USLand Grant Universities for support

FUNDING

USDA National Institute of Food and Agriculture Spe-cialty Crop Research Initiative projects [2014-51181-2237 2014-51181-22378] USDA National Institute ofFood and Agriculture National Research Support Project10 NSF PGRP award 444573 NSF CIF21 DIBBs award1443040 Washington Tree Fruit Research CommissionWashington State University Clemson University Fundingfor open access charge Federal Grant USDA NIFA SCRIGrantConflict of interest statement None declared

REFERENCES1 ShulaevV KorbanSS SosinskiB AbbottAG AldwinckleHS

FoltaKM IezzoniA MainD ArusP DandekarAM et al(2008) Multiple models for rosaceae genomics Plant Physiol 147985ndash1003

D1144 Nucleic Acids Research 2019 Vol 47 Database issue

2 JungS JesuduraiC StatonM DuZ FicklinS ChoIAbbottA TomkinsJ and MainD (2004) GDR (Genome Databasefor Rosaceae) integrated web resources for Rosaceae genomics andgenetics research BMC Bioinformatics 5 130

3 JungS StatonM LeeT BlendaA SvancaraR AbbottA andMainD (2008) GDR (Genome Database for Rosaceae) integratedweb-database for Rosaceae genomics and genetics data Nucleic AcidsRes 36 D1034ndashD1040

4 JungS FicklinSP LeeT ChengC-H BlendaA ZhengP YuJBombarelyA ChoI RuS et al (2014) The Genome Database forRosaceae (GDR) year 10 update Nucleic Acids Res 42D1237ndashD1244

5 CooperL MeierA LaporteM-A ElserJL MungallCSinnBT CavaliereD CarbonS DunnNA SmithB et al (2018)The Planteome database an integrated resource for referenceontologies plant genomics and phenomics Nucleic Acids Res 46D1168ndashD1180

6 SandersonL-A FicklinSP ChengC-H JungS FeltusFABettKE and MainD (2013) Tripal v11 a standards-based toolkitfor construction of online genetic and genomic databases Database(Oxford) 2013 bat075

7 FicklinSP SandersonL-A ChengC-H StatonME LeeTChoI-H JungS BettKE and MainD (2011) Tripal aconstruction toolkit for online genome databases Database(Oxford) 2011 bar044

8 EdgerPP VanBurenR ColleM PoortenTJ WaiCMNiederhuthCE AlgerEI OuS AcharyaCB WangJ et al(2018) Single-molecule sequencing and optical mapping yields animproved genome of woodland strawberry (Fragaria vesca) withchromosome-scale contiguity Gigascience 7 1ndash7

9 HirakawaH ShirasawaK KosugiS TashiroK NakayamaSYamadaM KoharaM WatanabeA KishidaY FujishiroT et al(2014) Dissection of the octoploid strawberry genome by deepsequencing of the genomes of fragaria species DNA Res 21169ndash181

10 ShulaevV SargentDJ CrowhurstRN MocklerTC FolkertsODelcherAL JaiswalP MockaitisK ListonA ManeSP et al(2010) The genome of woodland strawberry (Fragaria vesca) NatGenet 43 109ndash116

11 TennessenJA GovindarajuluR AshmanT-L and ListonA(2014) Evolutionary origins and dynamics of octoploid strawberrysubgenomes revealed by dense targeted capture linkage mapsGenome Biol Evol 6 3295ndash3313

12 DarwishO ShahanR LiuZ SlovinJP and AlkharoufNW(2015) Re-annotation of the woodland strawberry (Fragaria vesca)genome BMC Genomics 16 29

13 LiY WeiW FengJ LuoH PiM LiuZ and KangC (2018)Genome re-annotation of the wild strawberry Fragaria vesca usingextensive Illumina- and SMRT-based RNA-seq datasets DNA Res25 61ndash70

14 ButiM MorettoM BarghiniE MascagniF NataliL BrilliMLomsadzeA SonegoP GiongoL AlongeM et al (2018) Thegenome sequence and transcriptome of Potentilla micrantha andtheir comparison to Fragaria vesca (the woodland strawberry)Gigascience 7 1ndash14

15 DaccordN CeltonJ-M LinsmithG BeckerC ChoisneNSchijlenE van de GeestH BiancoL MichelettiD VelascoRet al (2017) High-quality de novo assembly of the apple genome andmethylome dynamics of early fruit development Nat Genet 491099ndash1106

16 VelascoR ZharkikhA AffourtitJ DhingraA CestaroAKalyanaramanA FontanaP BhatnagarSK TroggioM PrussDet al (2010) The genome of the domesticated apple (Malus timesdomestica Borkh) Nat Genet 42 833ndash839

17 The International Peach Genome Initiative VerdeI AbbottAGScalabrinS JungS ShuS MarroniF ZhebentyayevaTDettoriMT GrimwoodJ et al (2013) The high-quality draftgenome of peach (Prunus persica) identifies unique patterns ofgenetic diversity domestication and genome evolution Nat Genet45 487ndash494

18 VerdeI JenkinsJ DondiniL MicaliS PagliaraniGVendraminE ParisR AraminiV GazzaL RossiniL et al(2017) The Peach v20 release high-resolution linkage mapping and

deep resequencing improve chromosome-scale assembly andcontiguity BMC Genomics 18 225

19 ShirasawaK IsuzugawaK IkenagaM SaitoY YamamotoTHirakawaH and IsobeS (2017) The genome sequence of sweetcherry (Prunus avium) for use in genomics-assisted breeding DNARes 24 499ndash508

20 ChagneD CrowhurstRN PindoM ThrimawithanaA DengCIrelandH FiersM DzierzonH CestaroA FontanaP et al(2014) The draft genome sequence of European pear (Pyruscommunis L rsquoBartlettrsquo) PLoS One 9 e92644

21 HibrandL RuttinkT HamamaL KirovI LakhwaniDZhouN-N BourkeP DaccordN LeusL SchulzD et al (2018)A high-quality sequence of Rosa chinensis to elucidate genomestructure and ornamental traits Nat Plants 4 473ndash484

22 RaymondO GouzyJ JustJ BadouinH VerdenaudMLemainqueA VergneP MojaS ChoisneN PontC et al (2018)The Rosa genome provides new insights into the domestication ofmodern roses Nat Genet 50 772ndash777

23 NakamuraN HirakawaH SatoS OtagakiS MatsumotoSTabataS and TanakaY (2018) Genome structure of Rosa multifloraa wild ancestor of cultivated roses DNA Res 25 113ndash121

24 JibranR DzierzonH BassilN BushakraJM EdgerPPSullivanS FinnCE DossettM ViningKJ VanBurenR et al(2018) Chromosome-scale scaffolding of the black raspberry (Rubusoccidentalis L) genome based on chromatin interaction data HorticRes 5 8

25 VanBurenR BryantD BushakraJM ViningKJ EdgerPPRowleyER PriestHD MichaelTP LyonsE FilichkinSAet al (2016) The genome of black raspberry (Rubus occidentalis)Plant J 87 535ndash547

26 FinnRD AttwoodTK BabbittPC BatemanA BorkPBridgeAJ ChangH-Y DosztanyiZ El-GebaliS FraserM et al(2017) InterPro in 2017ndashndashbeyond protein family and domainannotations Nucleic Acids Res 45 D190ndashD199

27 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene Ontology tool for the unification of biology NatGenet 25 25ndash29

28 The Gene Ontology Consortium (2017) Expansion of the GeneOntology knowledgebase and resources Nucleic Acids Res 45D331ndashD338

29 WangY TangH DeBarryJD TanX LiJ WangX LeeT -hJinH MarlerB GuoH et al (2012) MCScanX a toolkit fordetection and evolutionary analysis of gene synteny and collinearityNucleic Acids Res 40 e49

30 BuelsR YaoE DieshCM HayesRD Munoz-TorresMHeltG GoodsteinDM ElsikCG LewisSE SteinL et al(2016) JBrowse a dynamic web platform for genome visualizationand analysis Genome Biol 17 66

31 CamachoC CoulourisG AvagyanV MaN PapadopoulosJBealerK and MaddenTL (2009) BLAST+ architecture andapplications BMC Bioinformatics 10 421

32 SchlapferP ZhangP WangC KimT BanfM ChaeLDreherK ChavaliAK Nilo-PoyancoR BernardT et al (2017)Genome-wide prediction of metabolic enzymes pathways and geneclusters in plants Plant Physiol 173 2041ndash2059

33 CaspiR DreherK and KarpPD (2013) The challenge ofconstructing classifying and representing metabolic pathwaysFEMS Microbiol Lett 345 85ndash93

34 BolgerAM LohseM and UsadelB (2014) Trimmomatic a flexibletrimmer for Illumina sequence data Bioinformatics 30 2114ndash2120

35 GrabherrMG HaasBJ YassourM LevinJZ ThompsonDAAmitI AdiconisX FanL RaychowdhuryR ZengQ et al(2011) Full-length transcriptome assembly from RNA-Seq datawithout a reference genome Nat Biotechnol 29 644ndash652

36 GordonD AbajianC and GreenP (1998) Consed a graphical toolfor sequence finishing Genome Res 8 195ndash202

37 AltschulSF GishW MillerW MyersEW and LipmanDJ(1990) Basic local alignment search tool J Mol Biol 215 403ndash410

38 HuangX and MadanA (1999) CAP3 A DNA sequence assemblyprogram Genome Res 9 868ndash877

39 LangmeadB TrapnellC PopM and SalzbergSL (2009) Ultrafastand memory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10 R25

Nucleic Acids Research 2019 Vol 47 Database issue D1145

40 FuL NiuB ZhuZ WuS and LiW (2012) CD-HIT acceleratedfor clustering the next-generation sequencing data Bioinformatics28 3150ndash3152

41 DavidsonNM and OshlackA (2014) Corset enabling differentialgene expression analysis for de novo assembled transcriptomesGenome Biol 15 410

42 Koning-BoucoiranCFS EsselinkGD VukosavljevM van rsquotWestendeWPC GitongaVW KrensFA VoorripsRE van deWegWE SchulzD DebenerT et al (2015) Using RNA-Seq toassemble a rose transcriptome with more than 13000 full-lengthexpressed genes and to develop the WagRhSNP 68k Axiom SNParray for rose (Rosa L) Front Plant Sci 6 249

43 FuentesL MonsalveL Morales-QuintanaL ValdenegroMMartınezJ-P DefilippiBG and Gonzalez-AgueroM (2015)Differential expression of ethylene biosynthesis genes in drupelets andreceptacle of raspberry (Rubus idaeus) J Plant Physiol 179100ndash105

44 JungS CestaroA TroggioM MainD ZhengP ChoIFoltaKM SosinskiB AbbottA CeltonJ-M et al (2012) Wholegenome comparisons of Fragaria Prunus and Malus reveal differentmodes of evolution between Rosaceous subfamilies BMC Genomics13 129

45 Youens-ClarkK FagaB YapIV SteinL and WareD (2009)CMap 101 a comparative mapping application for the InternetBioinformatics 25 3040ndash3042

46 IezzoniA PeaceC MainD BassilN CoeM FinnC GasicKLubyJ HokansonS McFersonJ et al (2017) RosBREED2progress and future plans to enable DNA-informed breeding in theRosaceae Acta Hortic doi1017660ActaHortic201711722

47 ChagneD CrowhurstRN TroggioM DaveyMW GilmoreBLawleyC VanderzandeS HellensRP KumarS CestaroA et al(2012) Genome-wide SNP detection validation and development ofan 8K SNP array for apple PLoS One 7 e31745

48 BiancoL CestaroA SargentDJ BanchiE DerdakS DiGuardoM SalviS JansenJ ViolaR GutI et al (2014)Development and validation of a 20K single nucleotidepolymorphism (SNP) whole genome genotyping array for apple(Malus times domestica Borkh) PLoS One 9 e110377

49 BiancoL CestaroA LinsmithG MurantyH DenanceCTheronA PoncetC MichelettiD KerschbamerE Di PierroEAet al (2016) Development and validation of the Axiom(reg)Apple480K SNP genotyping array Plant J 86 62ndash74

50 PeaceC BassilN MainD FicklinS RosyaraUR StegmeirTSeboltA GilmoreB LawleyC MocklerTC et al (2012)Development and evaluation of a genome-wide 6K SNP array fordiploid sweet cherry and tetraploid sour cherry PLoS One 7 e48305

51 VerdeI BassilN ScalabrinS GilmoreB LawleyCT GasicKMichelettiD RosyaraUR CattonaroF VendraminE et al(2012) Development and evaluation of a 9K SNP array for peach byinternationally coordinated SNP detection and validation in breedinggermplasm PLoS One 7 e35668

52 BassilNV DavisTM ZhangH FicklinS MittmannMWebsterT MahoneyL WoodD AlperinES RosyaraUR et al(2015) Development and preliminary evaluation of a 90 K AxiomregSNP array for the allo-octoploid cultivated strawberry Fragaria timesananassa BMC Genomics 16 155

53 EvansK (2013) Apple Breeding in Pacific Northwest Acta Hortic976 75ndash78

54 RifeTW and PolandJA (2014) Field Book an open-sourceapplication for field data collection on android Crop Sci 541624ndash1627

55 JungS LeeT ChengC-H FicklinS YuJ HumannJ andMainD (2017) Extension modules for storage visualization andquerying of genomic genetic and breeding data in Tripal databasesDatabase (Oxford) 2017 bax092

56 ChenM HenryN AlmsaeedA ZhouX WegrzynJ FicklinSand StatonM (2017) New extension software modules to enhancesearching and display of transcriptome data in Tripal databasesDatabase (Oxford) 2017 bax052

57 HarperL CampbellJ CannonEKS JungS PoelchauMWallsR AndorfC ArnaudE BerardiniTZ BirkettC et al(2018) AgBioData consortium recommendations for sustainablegenomics and genetics databases for agriculture Database (Oxford)2018 bay088

Nucleic Acids Research 2019 Vol 47 Database issue D1143

Figure 3 BIMS in GDR (A) lsquoTemplate Listrsquo subsection in lsquoData Importrsquo section provide downloadable templates for users to enter various breedingdata (B) Searchrsquo section allows users to search and save the list of accessions using any combination of properties and trait cut of values name triallocation cross parent and trait values The middle section shows the statistical information on the filtered dataset for the trait chosen and the right sectionshows the number of accessions filtered so far (C) A page with the search result table Users can add more columns in the table using lsquoColumn optionsrsquoand savedownload the result table (D) lsquoData Analysisrsquo section that allows users to choose two datasets using the categories or saved accession lists andcompare the trait statistics between the two datasets

mining needs that came with these new types and large vol-ume of data we made various new web interfaces availableThe new web interfaces are either new Tripal modules thatwe developed such as MapViewer BIMS Chado LoaderChado Data Display and Chado Search modules (55) orTripal modules that other database teams developed suchas the Synteny Viewer and Tripal BLAST The open-sourcedatabase platform Tripal allows us to meet emerging de-mands for storage querying and display of new data typesmore efficiently and quickly

During the last 5 years we also made advances in curat-ing data with more ontologies and incorporating ontologiesin data query pages to facilitate data-sharing across differ-ent data types species and databases Future effort will in-clude further developing MapViewer to integrate genomedata and genetic maps providing an enhanced querying in-terface expanding analysis capabilities in BIMS throughaccess to additional functionality integrating Galaxy anal-ysis platform through Tripal Galaxy module (httpswwwdrupalorgprojecttripal galaxy) providing access toHigh Performance Computing and enabling cross-Tripaldatabases querying capability using Tripal Elasticsearch(56) and Tripal Exchange (httptripalinfotutorialsv3xweb-services)

Use of the GDR the community resource for Rosaceaegenomics genetics and breeding research has continuedto grow over the last 5 years Between September 1 2013and August 31 2018 the GDR had 256075 visits by 95259unique visitors from 190 countries who accessed 1547877pages The GDR is part of AgBioData (57) (httpswwwagbiodataorg) a consortium working to improve stan-

dards and sustainability of genomics genetics and breedingdatabases and further enable agricultural science

ACKNOWLEDGEMENTS

The authors acknowledge with thanks their fundingsources the Rosaceae research community for providingdata support and feedback the Tripal and GMOD com-munity of developers for developing and sharing Tripalmodules and code the AgBioData Consortium and USLand Grant Universities for support

FUNDING

USDA National Institute of Food and Agriculture Spe-cialty Crop Research Initiative projects [2014-51181-2237 2014-51181-22378] USDA National Institute ofFood and Agriculture National Research Support Project10 NSF PGRP award 444573 NSF CIF21 DIBBs award1443040 Washington Tree Fruit Research CommissionWashington State University Clemson University Fundingfor open access charge Federal Grant USDA NIFA SCRIGrantConflict of interest statement None declared

REFERENCES1 ShulaevV KorbanSS SosinskiB AbbottAG AldwinckleHS

FoltaKM IezzoniA MainD ArusP DandekarAM et al(2008) Multiple models for rosaceae genomics Plant Physiol 147985ndash1003

D1144 Nucleic Acids Research 2019 Vol 47 Database issue

2 JungS JesuduraiC StatonM DuZ FicklinS ChoIAbbottA TomkinsJ and MainD (2004) GDR (Genome Databasefor Rosaceae) integrated web resources for Rosaceae genomics andgenetics research BMC Bioinformatics 5 130

3 JungS StatonM LeeT BlendaA SvancaraR AbbottA andMainD (2008) GDR (Genome Database for Rosaceae) integratedweb-database for Rosaceae genomics and genetics data Nucleic AcidsRes 36 D1034ndashD1040

4 JungS FicklinSP LeeT ChengC-H BlendaA ZhengP YuJBombarelyA ChoI RuS et al (2014) The Genome Database forRosaceae (GDR) year 10 update Nucleic Acids Res 42D1237ndashD1244

5 CooperL MeierA LaporteM-A ElserJL MungallCSinnBT CavaliereD CarbonS DunnNA SmithB et al (2018)The Planteome database an integrated resource for referenceontologies plant genomics and phenomics Nucleic Acids Res 46D1168ndashD1180

6 SandersonL-A FicklinSP ChengC-H JungS FeltusFABettKE and MainD (2013) Tripal v11 a standards-based toolkitfor construction of online genetic and genomic databases Database(Oxford) 2013 bat075

7 FicklinSP SandersonL-A ChengC-H StatonME LeeTChoI-H JungS BettKE and MainD (2011) Tripal aconstruction toolkit for online genome databases Database(Oxford) 2011 bar044

8 EdgerPP VanBurenR ColleM PoortenTJ WaiCMNiederhuthCE AlgerEI OuS AcharyaCB WangJ et al(2018) Single-molecule sequencing and optical mapping yields animproved genome of woodland strawberry (Fragaria vesca) withchromosome-scale contiguity Gigascience 7 1ndash7

9 HirakawaH ShirasawaK KosugiS TashiroK NakayamaSYamadaM KoharaM WatanabeA KishidaY FujishiroT et al(2014) Dissection of the octoploid strawberry genome by deepsequencing of the genomes of fragaria species DNA Res 21169ndash181

10 ShulaevV SargentDJ CrowhurstRN MocklerTC FolkertsODelcherAL JaiswalP MockaitisK ListonA ManeSP et al(2010) The genome of woodland strawberry (Fragaria vesca) NatGenet 43 109ndash116

11 TennessenJA GovindarajuluR AshmanT-L and ListonA(2014) Evolutionary origins and dynamics of octoploid strawberrysubgenomes revealed by dense targeted capture linkage mapsGenome Biol Evol 6 3295ndash3313

12 DarwishO ShahanR LiuZ SlovinJP and AlkharoufNW(2015) Re-annotation of the woodland strawberry (Fragaria vesca)genome BMC Genomics 16 29

13 LiY WeiW FengJ LuoH PiM LiuZ and KangC (2018)Genome re-annotation of the wild strawberry Fragaria vesca usingextensive Illumina- and SMRT-based RNA-seq datasets DNA Res25 61ndash70

14 ButiM MorettoM BarghiniE MascagniF NataliL BrilliMLomsadzeA SonegoP GiongoL AlongeM et al (2018) Thegenome sequence and transcriptome of Potentilla micrantha andtheir comparison to Fragaria vesca (the woodland strawberry)Gigascience 7 1ndash14

15 DaccordN CeltonJ-M LinsmithG BeckerC ChoisneNSchijlenE van de GeestH BiancoL MichelettiD VelascoRet al (2017) High-quality de novo assembly of the apple genome andmethylome dynamics of early fruit development Nat Genet 491099ndash1106

16 VelascoR ZharkikhA AffourtitJ DhingraA CestaroAKalyanaramanA FontanaP BhatnagarSK TroggioM PrussDet al (2010) The genome of the domesticated apple (Malus timesdomestica Borkh) Nat Genet 42 833ndash839

17 The International Peach Genome Initiative VerdeI AbbottAGScalabrinS JungS ShuS MarroniF ZhebentyayevaTDettoriMT GrimwoodJ et al (2013) The high-quality draftgenome of peach (Prunus persica) identifies unique patterns ofgenetic diversity domestication and genome evolution Nat Genet45 487ndash494

18 VerdeI JenkinsJ DondiniL MicaliS PagliaraniGVendraminE ParisR AraminiV GazzaL RossiniL et al(2017) The Peach v20 release high-resolution linkage mapping and

deep resequencing improve chromosome-scale assembly andcontiguity BMC Genomics 18 225

19 ShirasawaK IsuzugawaK IkenagaM SaitoY YamamotoTHirakawaH and IsobeS (2017) The genome sequence of sweetcherry (Prunus avium) for use in genomics-assisted breeding DNARes 24 499ndash508

20 ChagneD CrowhurstRN PindoM ThrimawithanaA DengCIrelandH FiersM DzierzonH CestaroA FontanaP et al(2014) The draft genome sequence of European pear (Pyruscommunis L rsquoBartlettrsquo) PLoS One 9 e92644

21 HibrandL RuttinkT HamamaL KirovI LakhwaniDZhouN-N BourkeP DaccordN LeusL SchulzD et al (2018)A high-quality sequence of Rosa chinensis to elucidate genomestructure and ornamental traits Nat Plants 4 473ndash484

22 RaymondO GouzyJ JustJ BadouinH VerdenaudMLemainqueA VergneP MojaS ChoisneN PontC et al (2018)The Rosa genome provides new insights into the domestication ofmodern roses Nat Genet 50 772ndash777

23 NakamuraN HirakawaH SatoS OtagakiS MatsumotoSTabataS and TanakaY (2018) Genome structure of Rosa multifloraa wild ancestor of cultivated roses DNA Res 25 113ndash121

24 JibranR DzierzonH BassilN BushakraJM EdgerPPSullivanS FinnCE DossettM ViningKJ VanBurenR et al(2018) Chromosome-scale scaffolding of the black raspberry (Rubusoccidentalis L) genome based on chromatin interaction data HorticRes 5 8

25 VanBurenR BryantD BushakraJM ViningKJ EdgerPPRowleyER PriestHD MichaelTP LyonsE FilichkinSAet al (2016) The genome of black raspberry (Rubus occidentalis)Plant J 87 535ndash547

26 FinnRD AttwoodTK BabbittPC BatemanA BorkPBridgeAJ ChangH-Y DosztanyiZ El-GebaliS FraserM et al(2017) InterPro in 2017ndashndashbeyond protein family and domainannotations Nucleic Acids Res 45 D190ndashD199

27 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene Ontology tool for the unification of biology NatGenet 25 25ndash29

28 The Gene Ontology Consortium (2017) Expansion of the GeneOntology knowledgebase and resources Nucleic Acids Res 45D331ndashD338

29 WangY TangH DeBarryJD TanX LiJ WangX LeeT -hJinH MarlerB GuoH et al (2012) MCScanX a toolkit fordetection and evolutionary analysis of gene synteny and collinearityNucleic Acids Res 40 e49

30 BuelsR YaoE DieshCM HayesRD Munoz-TorresMHeltG GoodsteinDM ElsikCG LewisSE SteinL et al(2016) JBrowse a dynamic web platform for genome visualizationand analysis Genome Biol 17 66

31 CamachoC CoulourisG AvagyanV MaN PapadopoulosJBealerK and MaddenTL (2009) BLAST+ architecture andapplications BMC Bioinformatics 10 421

32 SchlapferP ZhangP WangC KimT BanfM ChaeLDreherK ChavaliAK Nilo-PoyancoR BernardT et al (2017)Genome-wide prediction of metabolic enzymes pathways and geneclusters in plants Plant Physiol 173 2041ndash2059

33 CaspiR DreherK and KarpPD (2013) The challenge ofconstructing classifying and representing metabolic pathwaysFEMS Microbiol Lett 345 85ndash93

34 BolgerAM LohseM and UsadelB (2014) Trimmomatic a flexibletrimmer for Illumina sequence data Bioinformatics 30 2114ndash2120

35 GrabherrMG HaasBJ YassourM LevinJZ ThompsonDAAmitI AdiconisX FanL RaychowdhuryR ZengQ et al(2011) Full-length transcriptome assembly from RNA-Seq datawithout a reference genome Nat Biotechnol 29 644ndash652

36 GordonD AbajianC and GreenP (1998) Consed a graphical toolfor sequence finishing Genome Res 8 195ndash202

37 AltschulSF GishW MillerW MyersEW and LipmanDJ(1990) Basic local alignment search tool J Mol Biol 215 403ndash410

38 HuangX and MadanA (1999) CAP3 A DNA sequence assemblyprogram Genome Res 9 868ndash877

39 LangmeadB TrapnellC PopM and SalzbergSL (2009) Ultrafastand memory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10 R25

Nucleic Acids Research 2019 Vol 47 Database issue D1145

40 FuL NiuB ZhuZ WuS and LiW (2012) CD-HIT acceleratedfor clustering the next-generation sequencing data Bioinformatics28 3150ndash3152

41 DavidsonNM and OshlackA (2014) Corset enabling differentialgene expression analysis for de novo assembled transcriptomesGenome Biol 15 410

42 Koning-BoucoiranCFS EsselinkGD VukosavljevM van rsquotWestendeWPC GitongaVW KrensFA VoorripsRE van deWegWE SchulzD DebenerT et al (2015) Using RNA-Seq toassemble a rose transcriptome with more than 13000 full-lengthexpressed genes and to develop the WagRhSNP 68k Axiom SNParray for rose (Rosa L) Front Plant Sci 6 249

43 FuentesL MonsalveL Morales-QuintanaL ValdenegroMMartınezJ-P DefilippiBG and Gonzalez-AgueroM (2015)Differential expression of ethylene biosynthesis genes in drupelets andreceptacle of raspberry (Rubus idaeus) J Plant Physiol 179100ndash105

44 JungS CestaroA TroggioM MainD ZhengP ChoIFoltaKM SosinskiB AbbottA CeltonJ-M et al (2012) Wholegenome comparisons of Fragaria Prunus and Malus reveal differentmodes of evolution between Rosaceous subfamilies BMC Genomics13 129

45 Youens-ClarkK FagaB YapIV SteinL and WareD (2009)CMap 101 a comparative mapping application for the InternetBioinformatics 25 3040ndash3042

46 IezzoniA PeaceC MainD BassilN CoeM FinnC GasicKLubyJ HokansonS McFersonJ et al (2017) RosBREED2progress and future plans to enable DNA-informed breeding in theRosaceae Acta Hortic doi1017660ActaHortic201711722

47 ChagneD CrowhurstRN TroggioM DaveyMW GilmoreBLawleyC VanderzandeS HellensRP KumarS CestaroA et al(2012) Genome-wide SNP detection validation and development ofan 8K SNP array for apple PLoS One 7 e31745

48 BiancoL CestaroA SargentDJ BanchiE DerdakS DiGuardoM SalviS JansenJ ViolaR GutI et al (2014)Development and validation of a 20K single nucleotidepolymorphism (SNP) whole genome genotyping array for apple(Malus times domestica Borkh) PLoS One 9 e110377

49 BiancoL CestaroA LinsmithG MurantyH DenanceCTheronA PoncetC MichelettiD KerschbamerE Di PierroEAet al (2016) Development and validation of the Axiom(reg)Apple480K SNP genotyping array Plant J 86 62ndash74

50 PeaceC BassilN MainD FicklinS RosyaraUR StegmeirTSeboltA GilmoreB LawleyC MocklerTC et al (2012)Development and evaluation of a genome-wide 6K SNP array fordiploid sweet cherry and tetraploid sour cherry PLoS One 7 e48305

51 VerdeI BassilN ScalabrinS GilmoreB LawleyCT GasicKMichelettiD RosyaraUR CattonaroF VendraminE et al(2012) Development and evaluation of a 9K SNP array for peach byinternationally coordinated SNP detection and validation in breedinggermplasm PLoS One 7 e35668

52 BassilNV DavisTM ZhangH FicklinS MittmannMWebsterT MahoneyL WoodD AlperinES RosyaraUR et al(2015) Development and preliminary evaluation of a 90 K AxiomregSNP array for the allo-octoploid cultivated strawberry Fragaria timesananassa BMC Genomics 16 155

53 EvansK (2013) Apple Breeding in Pacific Northwest Acta Hortic976 75ndash78

54 RifeTW and PolandJA (2014) Field Book an open-sourceapplication for field data collection on android Crop Sci 541624ndash1627

55 JungS LeeT ChengC-H FicklinS YuJ HumannJ andMainD (2017) Extension modules for storage visualization andquerying of genomic genetic and breeding data in Tripal databasesDatabase (Oxford) 2017 bax092

56 ChenM HenryN AlmsaeedA ZhouX WegrzynJ FicklinSand StatonM (2017) New extension software modules to enhancesearching and display of transcriptome data in Tripal databasesDatabase (Oxford) 2017 bax052

57 HarperL CampbellJ CannonEKS JungS PoelchauMWallsR AndorfC ArnaudE BerardiniTZ BirkettC et al(2018) AgBioData consortium recommendations for sustainablegenomics and genetics databases for agriculture Database (Oxford)2018 bay088

D1144 Nucleic Acids Research 2019 Vol 47 Database issue

2 JungS JesuduraiC StatonM DuZ FicklinS ChoIAbbottA TomkinsJ and MainD (2004) GDR (Genome Databasefor Rosaceae) integrated web resources for Rosaceae genomics andgenetics research BMC Bioinformatics 5 130

3 JungS StatonM LeeT BlendaA SvancaraR AbbottA andMainD (2008) GDR (Genome Database for Rosaceae) integratedweb-database for Rosaceae genomics and genetics data Nucleic AcidsRes 36 D1034ndashD1040

4 JungS FicklinSP LeeT ChengC-H BlendaA ZhengP YuJBombarelyA ChoI RuS et al (2014) The Genome Database forRosaceae (GDR) year 10 update Nucleic Acids Res 42D1237ndashD1244

5 CooperL MeierA LaporteM-A ElserJL MungallCSinnBT CavaliereD CarbonS DunnNA SmithB et al (2018)The Planteome database an integrated resource for referenceontologies plant genomics and phenomics Nucleic Acids Res 46D1168ndashD1180

6 SandersonL-A FicklinSP ChengC-H JungS FeltusFABettKE and MainD (2013) Tripal v11 a standards-based toolkitfor construction of online genetic and genomic databases Database(Oxford) 2013 bat075

7 FicklinSP SandersonL-A ChengC-H StatonME LeeTChoI-H JungS BettKE and MainD (2011) Tripal aconstruction toolkit for online genome databases Database(Oxford) 2011 bar044

8 EdgerPP VanBurenR ColleM PoortenTJ WaiCMNiederhuthCE AlgerEI OuS AcharyaCB WangJ et al(2018) Single-molecule sequencing and optical mapping yields animproved genome of woodland strawberry (Fragaria vesca) withchromosome-scale contiguity Gigascience 7 1ndash7

9 HirakawaH ShirasawaK KosugiS TashiroK NakayamaSYamadaM KoharaM WatanabeA KishidaY FujishiroT et al(2014) Dissection of the octoploid strawberry genome by deepsequencing of the genomes of fragaria species DNA Res 21169ndash181

10 ShulaevV SargentDJ CrowhurstRN MocklerTC FolkertsODelcherAL JaiswalP MockaitisK ListonA ManeSP et al(2010) The genome of woodland strawberry (Fragaria vesca) NatGenet 43 109ndash116

11 TennessenJA GovindarajuluR AshmanT-L and ListonA(2014) Evolutionary origins and dynamics of octoploid strawberrysubgenomes revealed by dense targeted capture linkage mapsGenome Biol Evol 6 3295ndash3313

12 DarwishO ShahanR LiuZ SlovinJP and AlkharoufNW(2015) Re-annotation of the woodland strawberry (Fragaria vesca)genome BMC Genomics 16 29

13 LiY WeiW FengJ LuoH PiM LiuZ and KangC (2018)Genome re-annotation of the wild strawberry Fragaria vesca usingextensive Illumina- and SMRT-based RNA-seq datasets DNA Res25 61ndash70

14 ButiM MorettoM BarghiniE MascagniF NataliL BrilliMLomsadzeA SonegoP GiongoL AlongeM et al (2018) Thegenome sequence and transcriptome of Potentilla micrantha andtheir comparison to Fragaria vesca (the woodland strawberry)Gigascience 7 1ndash14

15 DaccordN CeltonJ-M LinsmithG BeckerC ChoisneNSchijlenE van de GeestH BiancoL MichelettiD VelascoRet al (2017) High-quality de novo assembly of the apple genome andmethylome dynamics of early fruit development Nat Genet 491099ndash1106

16 VelascoR ZharkikhA AffourtitJ DhingraA CestaroAKalyanaramanA FontanaP BhatnagarSK TroggioM PrussDet al (2010) The genome of the domesticated apple (Malus timesdomestica Borkh) Nat Genet 42 833ndash839

17 The International Peach Genome Initiative VerdeI AbbottAGScalabrinS JungS ShuS MarroniF ZhebentyayevaTDettoriMT GrimwoodJ et al (2013) The high-quality draftgenome of peach (Prunus persica) identifies unique patterns ofgenetic diversity domestication and genome evolution Nat Genet45 487ndash494

18 VerdeI JenkinsJ DondiniL MicaliS PagliaraniGVendraminE ParisR AraminiV GazzaL RossiniL et al(2017) The Peach v20 release high-resolution linkage mapping and

deep resequencing improve chromosome-scale assembly andcontiguity BMC Genomics 18 225

19 ShirasawaK IsuzugawaK IkenagaM SaitoY YamamotoTHirakawaH and IsobeS (2017) The genome sequence of sweetcherry (Prunus avium) for use in genomics-assisted breeding DNARes 24 499ndash508

20 ChagneD CrowhurstRN PindoM ThrimawithanaA DengCIrelandH FiersM DzierzonH CestaroA FontanaP et al(2014) The draft genome sequence of European pear (Pyruscommunis L rsquoBartlettrsquo) PLoS One 9 e92644

21 HibrandL RuttinkT HamamaL KirovI LakhwaniDZhouN-N BourkeP DaccordN LeusL SchulzD et al (2018)A high-quality sequence of Rosa chinensis to elucidate genomestructure and ornamental traits Nat Plants 4 473ndash484

22 RaymondO GouzyJ JustJ BadouinH VerdenaudMLemainqueA VergneP MojaS ChoisneN PontC et al (2018)The Rosa genome provides new insights into the domestication ofmodern roses Nat Genet 50 772ndash777

23 NakamuraN HirakawaH SatoS OtagakiS MatsumotoSTabataS and TanakaY (2018) Genome structure of Rosa multifloraa wild ancestor of cultivated roses DNA Res 25 113ndash121

24 JibranR DzierzonH BassilN BushakraJM EdgerPPSullivanS FinnCE DossettM ViningKJ VanBurenR et al(2018) Chromosome-scale scaffolding of the black raspberry (Rubusoccidentalis L) genome based on chromatin interaction data HorticRes 5 8

25 VanBurenR BryantD BushakraJM ViningKJ EdgerPPRowleyER PriestHD MichaelTP LyonsE FilichkinSAet al (2016) The genome of black raspberry (Rubus occidentalis)Plant J 87 535ndash547

26 FinnRD AttwoodTK BabbittPC BatemanA BorkPBridgeAJ ChangH-Y DosztanyiZ El-GebaliS FraserM et al(2017) InterPro in 2017ndashndashbeyond protein family and domainannotations Nucleic Acids Res 45 D190ndashD199

27 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene Ontology tool for the unification of biology NatGenet 25 25ndash29

28 The Gene Ontology Consortium (2017) Expansion of the GeneOntology knowledgebase and resources Nucleic Acids Res 45D331ndashD338

29 WangY TangH DeBarryJD TanX LiJ WangX LeeT -hJinH MarlerB GuoH et al (2012) MCScanX a toolkit fordetection and evolutionary analysis of gene synteny and collinearityNucleic Acids Res 40 e49

30 BuelsR YaoE DieshCM HayesRD Munoz-TorresMHeltG GoodsteinDM ElsikCG LewisSE SteinL et al(2016) JBrowse a dynamic web platform for genome visualizationand analysis Genome Biol 17 66

31 CamachoC CoulourisG AvagyanV MaN PapadopoulosJBealerK and MaddenTL (2009) BLAST+ architecture andapplications BMC Bioinformatics 10 421

32 SchlapferP ZhangP WangC KimT BanfM ChaeLDreherK ChavaliAK Nilo-PoyancoR BernardT et al (2017)Genome-wide prediction of metabolic enzymes pathways and geneclusters in plants Plant Physiol 173 2041ndash2059

33 CaspiR DreherK and KarpPD (2013) The challenge ofconstructing classifying and representing metabolic pathwaysFEMS Microbiol Lett 345 85ndash93

34 BolgerAM LohseM and UsadelB (2014) Trimmomatic a flexibletrimmer for Illumina sequence data Bioinformatics 30 2114ndash2120

35 GrabherrMG HaasBJ YassourM LevinJZ ThompsonDAAmitI AdiconisX FanL RaychowdhuryR ZengQ et al(2011) Full-length transcriptome assembly from RNA-Seq datawithout a reference genome Nat Biotechnol 29 644ndash652

36 GordonD AbajianC and GreenP (1998) Consed a graphical toolfor sequence finishing Genome Res 8 195ndash202

37 AltschulSF GishW MillerW MyersEW and LipmanDJ(1990) Basic local alignment search tool J Mol Biol 215 403ndash410

38 HuangX and MadanA (1999) CAP3 A DNA sequence assemblyprogram Genome Res 9 868ndash877

39 LangmeadB TrapnellC PopM and SalzbergSL (2009) Ultrafastand memory-efficient alignment of short DNA sequences to thehuman genome Genome Biol 10 R25

Nucleic Acids Research 2019 Vol 47 Database issue D1145

40 FuL NiuB ZhuZ WuS and LiW (2012) CD-HIT acceleratedfor clustering the next-generation sequencing data Bioinformatics28 3150ndash3152

41 DavidsonNM and OshlackA (2014) Corset enabling differentialgene expression analysis for de novo assembled transcriptomesGenome Biol 15 410

42 Koning-BoucoiranCFS EsselinkGD VukosavljevM van rsquotWestendeWPC GitongaVW KrensFA VoorripsRE van deWegWE SchulzD DebenerT et al (2015) Using RNA-Seq toassemble a rose transcriptome with more than 13000 full-lengthexpressed genes and to develop the WagRhSNP 68k Axiom SNParray for rose (Rosa L) Front Plant Sci 6 249

43 FuentesL MonsalveL Morales-QuintanaL ValdenegroMMartınezJ-P DefilippiBG and Gonzalez-AgueroM (2015)Differential expression of ethylene biosynthesis genes in drupelets andreceptacle of raspberry (Rubus idaeus) J Plant Physiol 179100ndash105

44 JungS CestaroA TroggioM MainD ZhengP ChoIFoltaKM SosinskiB AbbottA CeltonJ-M et al (2012) Wholegenome comparisons of Fragaria Prunus and Malus reveal differentmodes of evolution between Rosaceous subfamilies BMC Genomics13 129

45 Youens-ClarkK FagaB YapIV SteinL and WareD (2009)CMap 101 a comparative mapping application for the InternetBioinformatics 25 3040ndash3042

46 IezzoniA PeaceC MainD BassilN CoeM FinnC GasicKLubyJ HokansonS McFersonJ et al (2017) RosBREED2progress and future plans to enable DNA-informed breeding in theRosaceae Acta Hortic doi1017660ActaHortic201711722

47 ChagneD CrowhurstRN TroggioM DaveyMW GilmoreBLawleyC VanderzandeS HellensRP KumarS CestaroA et al(2012) Genome-wide SNP detection validation and development ofan 8K SNP array for apple PLoS One 7 e31745

48 BiancoL CestaroA SargentDJ BanchiE DerdakS DiGuardoM SalviS JansenJ ViolaR GutI et al (2014)Development and validation of a 20K single nucleotidepolymorphism (SNP) whole genome genotyping array for apple(Malus times domestica Borkh) PLoS One 9 e110377

49 BiancoL CestaroA LinsmithG MurantyH DenanceCTheronA PoncetC MichelettiD KerschbamerE Di PierroEAet al (2016) Development and validation of the Axiom(reg)Apple480K SNP genotyping array Plant J 86 62ndash74

50 PeaceC BassilN MainD FicklinS RosyaraUR StegmeirTSeboltA GilmoreB LawleyC MocklerTC et al (2012)Development and evaluation of a genome-wide 6K SNP array fordiploid sweet cherry and tetraploid sour cherry PLoS One 7 e48305

51 VerdeI BassilN ScalabrinS GilmoreB LawleyCT GasicKMichelettiD RosyaraUR CattonaroF VendraminE et al(2012) Development and evaluation of a 9K SNP array for peach byinternationally coordinated SNP detection and validation in breedinggermplasm PLoS One 7 e35668

52 BassilNV DavisTM ZhangH FicklinS MittmannMWebsterT MahoneyL WoodD AlperinES RosyaraUR et al(2015) Development and preliminary evaluation of a 90 K AxiomregSNP array for the allo-octoploid cultivated strawberry Fragaria timesananassa BMC Genomics 16 155

53 EvansK (2013) Apple Breeding in Pacific Northwest Acta Hortic976 75ndash78

54 RifeTW and PolandJA (2014) Field Book an open-sourceapplication for field data collection on android Crop Sci 541624ndash1627

55 JungS LeeT ChengC-H FicklinS YuJ HumannJ andMainD (2017) Extension modules for storage visualization andquerying of genomic genetic and breeding data in Tripal databasesDatabase (Oxford) 2017 bax092

56 ChenM HenryN AlmsaeedA ZhouX WegrzynJ FicklinSand StatonM (2017) New extension software modules to enhancesearching and display of transcriptome data in Tripal databasesDatabase (Oxford) 2017 bax052

57 HarperL CampbellJ CannonEKS JungS PoelchauMWallsR AndorfC ArnaudE BerardiniTZ BirkettC et al(2018) AgBioData consortium recommendations for sustainablegenomics and genetics databases for agriculture Database (Oxford)2018 bay088

Nucleic Acids Research 2019 Vol 47 Database issue D1145

40 FuL NiuB ZhuZ WuS and LiW (2012) CD-HIT acceleratedfor clustering the next-generation sequencing data Bioinformatics28 3150ndash3152

41 DavidsonNM and OshlackA (2014) Corset enabling differentialgene expression analysis for de novo assembled transcriptomesGenome Biol 15 410

42 Koning-BoucoiranCFS EsselinkGD VukosavljevM van rsquotWestendeWPC GitongaVW KrensFA VoorripsRE van deWegWE SchulzD DebenerT et al (2015) Using RNA-Seq toassemble a rose transcriptome with more than 13000 full-lengthexpressed genes and to develop the WagRhSNP 68k Axiom SNParray for rose (Rosa L) Front Plant Sci 6 249

43 FuentesL MonsalveL Morales-QuintanaL ValdenegroMMartınezJ-P DefilippiBG and Gonzalez-AgueroM (2015)Differential expression of ethylene biosynthesis genes in drupelets andreceptacle of raspberry (Rubus idaeus) J Plant Physiol 179100ndash105

44 JungS CestaroA TroggioM MainD ZhengP ChoIFoltaKM SosinskiB AbbottA CeltonJ-M et al (2012) Wholegenome comparisons of Fragaria Prunus and Malus reveal differentmodes of evolution between Rosaceous subfamilies BMC Genomics13 129

45 Youens-ClarkK FagaB YapIV SteinL and WareD (2009)CMap 101 a comparative mapping application for the InternetBioinformatics 25 3040ndash3042

46 IezzoniA PeaceC MainD BassilN CoeM FinnC GasicKLubyJ HokansonS McFersonJ et al (2017) RosBREED2progress and future plans to enable DNA-informed breeding in theRosaceae Acta Hortic doi1017660ActaHortic201711722

47 ChagneD CrowhurstRN TroggioM DaveyMW GilmoreBLawleyC VanderzandeS HellensRP KumarS CestaroA et al(2012) Genome-wide SNP detection validation and development ofan 8K SNP array for apple PLoS One 7 e31745

48 BiancoL CestaroA SargentDJ BanchiE DerdakS DiGuardoM SalviS JansenJ ViolaR GutI et al (2014)Development and validation of a 20K single nucleotidepolymorphism (SNP) whole genome genotyping array for apple(Malus times domestica Borkh) PLoS One 9 e110377

49 BiancoL CestaroA LinsmithG MurantyH DenanceCTheronA PoncetC MichelettiD KerschbamerE Di PierroEAet al (2016) Development and validation of the Axiom(reg)Apple480K SNP genotyping array Plant J 86 62ndash74

50 PeaceC BassilN MainD FicklinS RosyaraUR StegmeirTSeboltA GilmoreB LawleyC MocklerTC et al (2012)Development and evaluation of a genome-wide 6K SNP array fordiploid sweet cherry and tetraploid sour cherry PLoS One 7 e48305

51 VerdeI BassilN ScalabrinS GilmoreB LawleyCT GasicKMichelettiD RosyaraUR CattonaroF VendraminE et al(2012) Development and evaluation of a 9K SNP array for peach byinternationally coordinated SNP detection and validation in breedinggermplasm PLoS One 7 e35668

52 BassilNV DavisTM ZhangH FicklinS MittmannMWebsterT MahoneyL WoodD AlperinES RosyaraUR et al(2015) Development and preliminary evaluation of a 90 K AxiomregSNP array for the allo-octoploid cultivated strawberry Fragaria timesananassa BMC Genomics 16 155

53 EvansK (2013) Apple Breeding in Pacific Northwest Acta Hortic976 75ndash78

54 RifeTW and PolandJA (2014) Field Book an open-sourceapplication for field data collection on android Crop Sci 541624ndash1627

55 JungS LeeT ChengC-H FicklinS YuJ HumannJ andMainD (2017) Extension modules for storage visualization andquerying of genomic genetic and breeding data in Tripal databasesDatabase (Oxford) 2017 bax092

56 ChenM HenryN AlmsaeedA ZhouX WegrzynJ FicklinSand StatonM (2017) New extension software modules to enhancesearching and display of transcriptome data in Tripal databasesDatabase (Oxford) 2017 bax052

57 HarperL CampbellJ CannonEKS JungS PoelchauMWallsR AndorfC ArnaudE BerardiniTZ BirkettC et al(2018) AgBioData consortium recommendations for sustainablegenomics and genetics databases for agriculture Database (Oxford)2018 bay088