PubChem BioAssay: 2014 update
Yanli Wang, Tugba Suzek, Jian Zhang, Jiyao Wang, Siqian He, Tiejun Cheng, Benjamin A. Shoemaker, Asta Gindulyte and Stephen H. Bryant
National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Received September 12, 2013; Revised September 30, 2013; Accepted October 1, 2013
ABSTRACT
PubChem’s BioAssay database (http://pubchem.ncbi.nlm.nih.gov) is a public repository for archiving biological tests of small molecules generated through high-throughput screening experiments, medicinal chemistry studies, chemical biology research and drug discovery programs. In addition, the BioAssay database contains data from high-throughput RNA interference screening aimed at identifying critical genes responsible for a biological process or disease condition. The mission of PubChem is to serve the community by providing free and easy access to all deposited data. To this end, PubChem BioAssay is integrated into the National Center for Biotechnology Information retrieval system, making its records searchable by Entrez queries and cross-linked to other biomedical information archived at the National Center for Biotechnology Information. Moreover, PubChem BioAssay provides web-based and programmatic tools allowing users to search, access and analyze bioassay test results and metadata. In this work, we provide an update for the PubChem BioAssay resource, covering information content growth, new developments supporting data integration and search, and the recently deployed PubChem Upload to streamline chemical structure and bioassay submissions.
INTRODUCTION
The PubChem BioAssay database (http://pubchem.ncbi.nlm.nih.gov) (1–4) is a public repository for biological activity data of small molecules and RNAi reagents, hosted since 2004 by the National Center for Biotechnology Information (NCBI) (5), a division of the National Library of Medicine under the National Institutes of Health. BioAssay test results are linked to the chemical structures of tested small molecules and the sequencing data of screened RNA interference (RNAi) reagents, as available. In addition, the information content in the BioAssay database is linked to several biomedical and literature databases hosted at NCBI, including PubMed, Protein, Gene, Nucleotide, BioSystems, Taxonomy, OMIM and protein 3D structure associated with bioassay targets. PubChem is committed to offering biomedical researchers free access to this information. BioAssay data can be searched, accessed and analyzed by Entrez queries as well as via a suite of web-based and programmatic tools provided by PubChem, making PubChem a widely used public information system for accelerating chemical biology research and drug development. Table 1 provides a summary of BioAssay services and the corresponding URLs. Most of the web-based services can also be accessed at http://pubchem.ncbi.nlm.nih.gov/assay.

Developing and managing a public archive system for complex bioassay data has been both challenging and rewarding. In the past 9 years, PubChem has come a long way in managing the rapidly growing data and meeting the increasing demand from the community. PubChem has become a leading public bioassay data repository by (i) supporting broad types of bioactivity information with an optimized bioassay data standard, (ii) maintaining steady enhancement of database infrastructure and scalability, (iii) providing and enhancing a streamlined data upload system, (iv) integrating with other biomedical information resources and (v) expanding and empowering search, retrieval, analysis and download tools. In this work, we provide an update on several aspects of the information resource, including data content growth, database infrastructure consolidation, new search indices, project-based bioassay links and newly developed web services, including target-based bioactivity data tools and the recently deployed PubChem Upload system.
BioAssay DATA CONTENT GROWTH
The BioAssay database has been growing substantially during the past years (Figure 1). As of 1 September 2013, the BioAssay database has received >700 000 depositions of bioassays (Figure 1A). Counting solely the latest version of each bioassay record by accession (i.e. AID), the database contains 200 000 000 bioactivity outcome summaries (Figure 1B) and 1 200 000 000 data points representing biological properties for 2 800 000 small molecule samples, 1 900 000 chemical structures and 108 000 RNAi reagents (Figure 1C). This information represents tens of thousands of potential modulators for >8000 protein targets and 30 000 genes critical for biological process, hence providing rich information on chemical and RNAi tools for chemical and molecular biology research.

To whom correspondence should be addressed. Tel: +1 301 435 7811; Fax: +1 301 435 7793; Email: ywang@ncbi.nlm.nih.gov
Correspondence may also be addressed to Stephen H. Bryant. Tel: +1 301 435 7792; Fax: +1 301 435 7793; Email: bryant@ncbi.nlm.nih.gov
Nucleic Acids Research, 2013, 1–8, doi:10.1093/nar/gkt978
Published by Oxford University Press 2013. This work is written by US Government employees and is in the public domain in the US.
Nucleic Acids Research Advance Access published November 5, 2013
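The counting rule used for these statistics (only the latest version of each record, keyed by AID) can be sketched as follows. This is a minimal illustration, not PubChem's implementation: the `Deposition` type, its `version` and `n_outcomes` fields and the sample values are all hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class Deposition(NamedTuple):
    aid: int          # bioassay accession (AID)
    version: int      # hypothetical, increasing version number of the record
    n_outcomes: int   # hypothetical number of outcome summaries in this version


def latest_versions(depositions: Iterable[Deposition]) -> Dict[int, Deposition]:
    """Keep only the highest-version deposition for each AID."""
    latest: Dict[int, Deposition] = {}
    for dep in depositions:
        current = latest.get(dep.aid)
        if current is None or dep.version > current.version:
            latest[dep.aid] = dep
    return latest


# Hypothetical example: AID 1 was deposited twice, AID 2 once.
deps = [Deposition(1, 1, 100), Deposition(1, 2, 120), Deposition(2, 1, 50)]
latest = latest_versions(deps)
total_outcomes = sum(d.n_outcomes for d in latest.values())
```

Here three depositions collapse to two current records, so superseded versions add to the deposition count but not to the outcome count.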
Table 1. A list of PubChem BioAssay services

Service | Description | URL example
BioAssay service home | Access a list of BioAssay services | http://pubchem.ncbi.nlm.nih.gov/assay
BioAssay search | Search BioAssay database with Entrez | http://www.ncbi.nlm.nih.gov/pcassay
BioAssay search advanced page | An interface for searching multiple search fields | http://www.ncbi.nlm.nih.gov/pcassay/limits
BioAssay text search advanced page | An interface for reviewing search history and refining search results with Boolean operation | http://www.ncbi.nlm.nih.gov/pcassay/advanced
BioAssay summary | Access and download a bioassay record | http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=myAID
BioAssay data retrieval tool | Retrieve a full data table or an active subset from a single bioassay record | http://pubchem.ncbi.nlm.nih.gov/assay/assaydata.html?aid=myAID; http://pubchem.ncbi.nlm.nih.gov/assay/assaydata.html?act=act&aid=myAID
BioAssay data selection tool | Select a user-defined data subset from a single bioassay record | http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?q=t&aid=myAID
Bioactivity data tool | Retrieve multiple-assay bioactivity data for a single substance sample (SID), chemical structure (CID), protein target (GI) or gene target (GeneID) | http://pubchem.ncbi.nlm.nih.gov/assay.cgi?sid=mySID; http://pubchem.ncbi.nlm.nih.gov/assay.cgi?sid=myCID; http://pubchem.ncbi.nlm.nih.gov/assay.cgi?sid=myGI; http://pubchem.ncbi.nlm.nih.gov/assay.cgi?sid=myGeneID
BioActivity summary (compound-centric) | Summarize and analyze bioactivity data for a set of records presented from the compound point of view | http://pubchem.ncbi.nlm.nih.gov/assay/bioactivity.cgi?tab=1
BioActivity summary (assay-centric) | Summarize and analyze bioactivity data for a set of records presented from the assay point of view | http://pubchem.ncbi.nlm.nih.gov/assay/bioactivity.cgi?tab=2
BioActivity summary (target-centric) | Summarize and analyze bioactivity data for a set of records presented from the target point of view | http://pubchem.ncbi.nlm.nih.gov/assay/bioactivity.cgi?tab=3
Structure-activity relationship analysis (SAR) | Analyze and visualize structure-activity relationship with clustering tools and a heatmap-style display | http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?p=heat
Scatter plot/histogram | Analyze bioassay test results with histogram or scatter plot | http://pubchem.ncbi.nlm.nih.gov/assay/plot.cgi?plottype=2
Dose-response curve tool | Analyze bioassay test results and visualize dose-response curve | http://pubchem.ncbi.nlm.nih.gov/assay/plot.cgi?plottype=1
Related BioAssay | Summarize bioassay relationship by overlap of active compounds, target sequence similarity, deposited annotation, same publication, common pathways and same assay project | http://pubchem.ncbi.nlm.nih.gov/assay/assayHeatmap.cgi
PubChem PUG/SOAP | PubChem programmatic tool for data retrieval | http://pubchem.ncbi.nlm.nih.gov/pug/pughelp.html
PUG/REST | PubChem REST API for data retrieval | http://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html
Bioassay download tool | A flexible download interface | http://pubchem.ncbi.nlm.nih.gov/assay/assaydownload.cgi
BioAssay FTP | FTP for all PubChem BioAssay records and related information | ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay
BioAssay data standard | XML data specification for PubChem BioAssay data model | ftp://ftp.ncbi.nlm.nih.gov/pubchem/data_spec
PubChem upload | Substance and bioassay submission system | http://pubchem.ncbi.nlm.nih.gov/upload
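The `myAID` placeholder in the Table 1 URL examples stands for a BioAssay accession. As a minimal sketch, the BioAssay summary and data retrieval URLs could be assembled as below. It assumes the punctuation-restored URL patterns `assay.cgi?aid=` and `assaydata.html?aid=`; the helper names are hypothetical, and the code only builds strings without contacting any server.

```python
from urllib.parse import urlencode

# Assumed base path for the web services listed in Table 1.
PUBCHEM_ASSAY_BASE = "http://pubchem.ncbi.nlm.nih.gov/assay"


def summary_url(aid: int) -> str:
    """URL of the BioAssay summary page for one accession (AID)."""
    return f"{PUBCHEM_ASSAY_BASE}/assay.cgi?{urlencode({'aid': aid})}"


def data_table_url(aid: int, active_only: bool = False) -> str:
    """URL of the data retrieval tool; act=act requests the active subset."""
    params = {}
    if active_only:
        params["act"] = "act"
    params["aid"] = aid
    return f"{PUBCHEM_ASSAY_BASE}/assaydata.html?{urlencode(params)}"


print(summary_url(651839))
print(data_table_url(651839, active_only=True))
```

Using `urlencode` rather than string concatenation keeps the query string valid if more parameters are added later.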
Figure 1. Growth in PubChem BioAssay: (A) records, (B) bioactivity outcomes (counted by AID–SID pair) and (C) unique tested samples.

The content in the PubChem BioAssay database is contributed by >50 organizations worldwide, including US government-funded institutions, pharmaceutical companies, research laboratories and collaborators hosting chemical biology databases. A summary of bioassay vendors and submission counts is provided at http://pubchem.ncbi.nlm.nih.gov/sources/#assay. BioAssay datasets added during the past 2 years include (i) small molecule data from screening centers of the NIH Molecular Libraries and Imaging Program [Molecular Library Program (MLP)] (http://commonfund.nih.gov/molecularlibraries), the ICCB-Longwood/NSRB Screening Facility at the Harvard Medical School (http://iccb.med.harvard.edu), EPA Tox21 (http://epa.gov/ncct/Tox21) and the Milwaukee Institute for Drug Discovery (http://www4.uwm.edu/drugdiscovery); (ii) curated dataset records from the Meiler Lab at Vanderbilt University, which derive the ultimate bioactivity outcome of a small molecule by combining multiple bioassay results in PubChem to facilitate cheminformatics studies (6); (iii) curated datasets from literature extraction by IUPHAR-DB (7) and ChEMBL (8) and (iv) small interfering RNA (siRNA) data from the Drosophila RNAi Screening Center, the ICCB-Longwood/NSRB Screening Facility at the Harvard Medical School (http://iccb.med.harvard.edu), Cancer Research UK Cambridge Research Institute, the Department of Molecular Cell Biology at the Weizmann Institute of Science, Institut National de la Sante et de la Recherche Medicale (INSERM), the Peterson Lab at Genentech and the ten Dijke Lab at Leiden University Medical Center. Many of these newly added siRNA datasets are associated with recent publications in journals such as Nature Cell Biology (9–11), Genome Research (12), J. Virol. (13), Cancer Research (14), PNAS (15,16), Nature (17–19), Science (20,21) and Nature Genetics (22). Each of these bioassay records is linked to the corresponding abstract in PubMed, allowing PubChem users to track down the publication easily. Conversely, users of PubMed also gain access to the corresponding bioassay datasets through this cross-link.

PubChem continues to mirror the ChEMBL database (8) hosted at the European Bioinformatics Institute. Multiple ChEMBL releases and database changes over the past 2 years have been incorporated into PubChem. Recently added annotations at ChEMBL are recorded via the Categorized Comment field of the PubChem BioAssay data model (1). Binding, surface, ligand and lipophilic ligand efficiency indices are added to a bioassay record as additional test results. As a result, many of the bioassay records in PubChem have gone through multiple updates. Annotation for bioactivity outcome (e.g. active or inactive) is largely missing in the ChEMBL datasets, hindering their integration with the rest of PubChem data and analysis tools. In such a case, PubChem now assigns bioactivity outcome using a 50 µM cutoff based on readouts such as IC50, EC50 or Ki, allowing a larger portion of the ChEMBL data to be blended into the PubChem system.
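The cutoff-based outcome assignment for ChEMBL records can be illustrated with a short sketch. It assumes that potency readouts (IC50, EC50 or Ki) and the cutoff of 50 are both expressed in micromolar, that a value at or below the cutoff counts as active, and that records without a potency value stay unspecified. The boundary handling and the "unspecified" label are assumptions made here, not details stated in the text.

```python
from typing import Optional

CUTOFF_UM = 50.0  # assumed cutoff, in micromolar


def assign_outcome(potency_um: Optional[float], cutoff_um: float = CUTOFF_UM) -> str:
    """Derive an activity outcome from a potency readout (IC50, EC50 or Ki)."""
    if potency_um is None:
        # No usable readout: leave the outcome unassigned (assumption).
        return "unspecified"
    return "active" if potency_um <= cutoff_um else "inactive"


for value in (0.3, 50.0, 120.0, None):
    print(value, assign_outcome(value))
```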
DATABASE INFRASTRUCTURE ENHANCEMENT
A robust and scalable database system is crucial to support the rapid growth of PubChem BioAssay. A set of relational databases and tables is designed and set up on Microsoft SQL servers to (i) accept bioassay submissions from depositors, (ii) archive bioassay updates with version control, (iii) track embargo status, (iv) record and derive links and relationships among bioassays and other biomedical information, (v) provide search indexes, (vi) support fast data retrieval and analysis and (vii) facilitate daily updates at the FTP site. Challenged by the accelerated growth of bioassay data content, great efforts have been invested in the past years to enhance the database infrastructure capacity through both hardware upgrades and revised database design. As a result, new services have been added to the PubChem resource. Furthermore, the performance of bioassay data retrieval and download services has been significantly improved, largely eliminating the need for a queuing system and minimizing user wait time.
DATA INTEGRATION AND NEW WEB SERVICES
The PubChem BioAssay database is fully integrated with other biomedical databases hosted by NCBI and provides a suite of web-based and programmatic tools to support data access, retrieval, analysis and download from PubChem or cross-linked databases (Table 1). Several new services for integrating bioassay target and bioactivity data, or grouping bioassays based on an assay project, are described below. Other developments that have focused on behind-the-scenes enhancement of data retrieval without significant web interface change will not be summarized in this work.
Rapid access of bioactivity data for a protein or gene target

PubChem BioAssay closes the gap between molecular and chemical biology research by presenting and linking up information of both chemical and RNAi tools in one system, supporting the study of gene function and biological pathways. The majority of small molecule screening data in PubChem are associated with protein targets, while RNAi screening data link each tested reagent to a gene. PubChem provides multiple mechanisms for cross-referencing protein and gene targets from bioactivity data (1). As a result, a protein or gene may link to many bioactivity datasets. It is critical to provide rapid access to such multi-assay bioactivity data for these protein and gene targets. Such a service provides a unique annotation service to the corresponding Entrez Protein or Gene record, which leads users to experimental data from chemical biology and RNAi research, enhancing the discoverability of the NCBI Entrez system. Toward this end, two new services, the Protein Target Bioactivity Data Tool and the Gene Target Bioactivity Data Tool, were developed, respectively, to access associated bioactivity information in PubChem.

From a protein target record, such as G-protein-coupled receptor (GPCR) 35 (http://www.ncbi.nlm.nih.gov/protein/NP_005292.2), bioactivity data for this protein target can be accessed by the link ‘BioAssay by Target (Summary)’. As shown in Figure 2A, this Protein Target Bioactivity Data Tool draws and identifies each tested substance together with its bioactivity results, assay title and a link to detailed data such as dose-response curves. The data table is sorted by bioactivity outcome and potency of the substances by default, showing first active data and potent reagents. Graphical filters are provided at the top of the page, allowing one to drill down to a data subset of one’s interest. For example, this GPCR protein has a ‘Probe’ filter highlighting three chemical probes discovered by a high-throughput screening (HTS) project for selective GPR35 antagonists.

The bioactivity data for the relevant gene target record (http://www.ncbi.nlm.nih.gov/gene/2859) can be accessed by the link ‘BioAssay by Target (Summary)’. With this Gene Target Bioactivity Data Tool, a similar summary of relevant bioassay activity results is displayed, as shown in Figure 2B. Note that, using a gene identifier in this case, additional data are retrieved, including RNAi test results (as indicated with the filter ‘RNAi’ shown under ‘Substance Types’), which indicate that GPR35 functions as a cellular gene repressing HPV18 LCR, as identified by a genome-wide siRNA screen. This example illustrates the power of aggregating bioactivity data across datasets onto a unified display. The Gene Target Bioactivity Data Tool is particularly useful for accessing datasets from multiple depositors and literature-based data from many journal articles. Moreover, it links simultaneously to findings in chemical biology research and RNAi screenings, enabling users to evaluate the biological role of a gene and to identify its small molecular regulators using data shown on the same display.
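The default ordering of the target bioactivity table (active results first, then the most potent substances) can be sketched as a sort key. This is only an illustration of that ordering, not the tool's actual code: the `Result` record, its field names and the handling of results without a potency value are assumptions, and the SIDs are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    sid: int                      # tested substance accession (hypothetical values)
    outcome: str                  # e.g. "active" or "inactive"
    potency_um: Optional[float]   # lower value means more potent; None if not reported


def default_order(results: List[Result]) -> List[Result]:
    """Active results first; within each group, most potent first, unreported potency last."""
    def key(r: Result):
        inactive_rank = 0 if r.outcome == "active" else 1
        missing = r.potency_um is None
        return (inactive_rank, missing, 0.0 if missing else r.potency_um)
    return sorted(results, key=key)


rows = [
    Result(1, "inactive", None),
    Result(2, "active", 5.0),
    Result(3, "active", 0.2),
    Result(4, "active", None),
]
ordered_sids = [r.sid for r in default_order(rows)]
```

Sorting on a tuple keeps the rule declarative: outcome decides first, potency only breaks ties within the same outcome group.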
BioAssays associated with the same assay project
PubChem tracks the relationships among bioassay records as indicated by submitters. PubChem has also developed several computational methods for identifying additional bioassay linkages based on target sequence similarity, common active compounds and biological pathways, as well as datasets abstracted from the same publication (1). To better support decision making, PubChem now clusters and links up bioassays based on assay projects. This feature aims to use data deposited by a network such as the NIH MLP and the Tox21 program. MLP-funded screening laboratories are required to deposit data progressively into PubChem as an assay project continues. It usually takes months or years to finish an assay project aimed at developing a chemical probe; hence, multiple bioassay datasets are often submitted to PubChem for the same project but under distinct accessions (AIDs). These datasets are closely related, often covering a primary HTS result, follow-ups with dose-response and toxicity testing, or counter screenings against biologically related targets, different cell lines or using different assay methods. PubChem allows submitters to specify such relationships via the cross-reference (XRef) data field. On the other hand, it is up to the submitters to provide all links as new data are made available. As a result, cross-references to related bioassay datasets unfortunately may be lacking or incomplete among many datasets, making it difficult for users to discover these key associations.

Figure 2. Bioactivity data for a (A) protein target and (B) gene target.

To improve this situation, it is now a common practice to create a ‘Summary’ bioassay at the outset of a multi-assay project and then link each subsequent related assay back to that summary record. This means that the submitter only needs to specify a single link for each bioassay record to the same summary, and all other links between related assays are automatically generated. As a result, assay projects are indexed on top of the individual records. Users visiting any bioassay record can access all relevant datasets of the same project without the need for the submitter to specify all connections. As shown in Figure 3, the links to these related bioassays are labeled in the BioAssay Summary service as ‘Same Project’ under the ‘Related BioAssays’ section. The Modulation of the Metabotropic Glutamate Receptor mGluR3 (GRM3) assay (http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=651839) indicates only one ‘Depositor Specified’ assay, whereas eight bioassay records were identified as related to the same project by the new procedure. One may see details of the related bioassays by clicking the link ‘Same Project’.
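The idea that a single link from each assay to its summary record is enough to generate all pairwise ‘Same Project’ links can be sketched as follows. This is a minimal illustration under the assumption that each record carries at most one summary link; the function name and the AIDs in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def same_project_links(summary_of: Dict[int, int]) -> Dict[int, List[int]]:
    """Derive related-assay lists from one depositor link per assay.

    summary_of maps an assay AID to the AID of its project summary record.
    """
    members = defaultdict(set)
    for aid, summary_aid in summary_of.items():
        members[summary_aid].add(aid)
        members[summary_aid].add(summary_aid)  # the summary belongs to its own project

    related: Dict[int, List[int]] = {}
    for project in members.values():
        for aid in project:
            related[aid] = sorted(project - {aid})
    return related


# Hypothetical AIDs: assays 101 and 102 point at summary 100, assay 201 at summary 200.
links = same_project_links({101: 100, 102: 100, 201: 200})
```

With n assays in a project the depositor supplies n links, while the n(n-1) pairwise relations are derived automatically.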
PUBLIC ACCESS
BioAssay record and BioAssay summary service
A PubChem BioAssay record can be accessed via the BioAssay Summary service at http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=myAID, where myAID is a valid BioAssay accession (AID). As shown in Figure 3 for the GRM3 assay (AID 651839), the BioAssay Summary service provides (i) full access to submitted information, including bioassay protocol descriptions, assay data and cross-references, (ii) derived bioassay relationships and (iii) tools for evaluating tested compounds, studying SAR or researching the target. For the ‘Target’ section, a link ‘More Bioactivity data’ has recently been added to gather all bioactivity data in PubChem associated with the GRM3 target. With the improved database infrastructure, the BioAssay Summary service now provides instant access to the bioassay data table and enhanced functions for data download. With the recently launched PubChem Social Media outreach, links to social media accounts are now provided on this page.

Figure 3. BioAssay Summary page for bioassay record AID 651839. New and enhanced features are highlighted, including fast download, instant access to the data table, a link to additional bioactivity data targeting GRM3, a link to related bioassays on the same project and links to social media accounts.
BioAssay search
Keyword search in the PubChem BioAssay database is supported by NCBI Entrez at http://www.ncbi.nlm.nih.gov/pcassay. Textual information in PubChem BioAssay is indexed under numerous fields. An advanced interface is provided at http://www.ncbi.nlm.nih.gov/pcassay/limits (Limits page) to access multiple indices and filters (1). Based on information provided in categorized comment fields and keywords in the title of a bioassay record, new filters were added to support the identification of records containing (i) biochemical assay, (ii) cell-based assay, (iii) protein–protein interaction bioactivity and (iv) in vivo or in vitro assay. A newly added menu, ‘Assay Project’, can be used to select an assay project and access related datasets. ChEMBL depositor information is also indexed to support sub-setting ChEMBL records. Thus, whereas http://www.ncbi.nlm.nih.gov/pcassay?term=ChEMBL[sourcename] retrieves all ChEMBL bioassays in PubChem, http://www.ncbi.nlm.nih.gov/pcassay?term=%22ChEMBL%3A%3AScientific+Literature%22%5BSourceName%5D[SourceName] retrieves literature-based records from ChEMBL, and http://www.ncbi.nlm.nih.gov/pcassay?term=%22ChEMBL%3A%3ASt+Jude+Malaria+Screening%22%5BSourceName%5D[SourceName] retrieves ChEMBL records deposited by St Jude Malaria Screening.
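The percent-encoded search URLs above follow from quoting a source name and tagging it with the SourceName field. A minimal sketch of that encoding with the Python standard library is shown below; the helper name is hypothetical, and the code only builds the URL string without sending a query.

```python
from urllib.parse import quote_plus

PCASSAY_SEARCH = "http://www.ncbi.nlm.nih.gov/pcassay"


def source_query_url(source_name: str) -> str:
    """Build an Entrez BioAssay search URL restricted to one depositor source name."""
    # Quote the name so that '::' and spaces are treated as part of one phrase,
    # then attach the [SourceName] field tag.
    term = f'"{source_name}"[SourceName]'
    return f"{PCASSAY_SEARCH}?term={quote_plus(term)}"


print(source_query_url("ChEMBL::Scientific Literature"))
print(source_query_url("ChEMBL::St Jude Malaria Screening"))
```

`quote_plus` turns the double quote into `%22`, the colon into `%3A`, the brackets into `%5B` and `%5D`, and spaces into `+`, which is the encoding seen in the example URLs.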
PubChem BioAssay FTP AND DOWNLOAD
PubChem provides multiple services for users to download bioassay records, which have been described previously (1). These primarily include (i) an enhanced download function at the Summary service (shown in Figure 3), (ii) a web-based BioAssay download service at http://pubchem.ncbi.nlm.nih.gov/assay/assaydownload.cgi with a flexible interface supporting full or partial data download by specifying bioassay accessions (AIDs) and tested substance accessions (SIDs) and (iii) the daily updated PubChem BioAssay FTP at ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay, providing open access to all bioassay datasets. While the primary FTP structure remains the same, one new FTP directory, ‘Extras’, is added to offer additional information on the BioAssay resource. In this folder, the file ‘Cid2BioactivityLink’ provides a list of tested compounds and the corresponding URLs linking to associated bioactivity data. Similarly, the ‘Gi2BioactivityLink’ and ‘Geneid2BioactivityLink’ files provide the list of the corresponding bioactivity data links for protein and gene targets, respectively. The ‘Aid2GiGeneid’ file contains all the bioassay (AID), protein target (GI) and gene target (Gene ID) associations in the BioAssay database. Also, a file for assay project-based related bioassays is added to the directory at ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay/AssayNeighbors. Column headers for the comma-separated values (CSV) format have been modified to provide consistency among multiple download methods (ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay/CSV/README). Readout names are now provided in CSV files to ease data parsing and interpretation. In addition, PubChem PUG/SOAP (http://pubchem.ncbi.nlm.nih.gov/pug/pughelp.html) and PUG/REST (http://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html) facilities are being developed to support programmatic retrieval of bioassay information.
PubChem UPLOAD FOR BioAssay SUBMISSION
As a public repository handling diverse and vast amounts of chemical structure and bioassay data, it is critical for PubChem to provide an efficient and user-friendly way to upload data. The recently released PubChem Upload (http://pubchem.ncbi.nlm.nih.gov/upload) makes use of advances in web technologies to offer streamlined support for data submissions and updates to the Substance and BioAssay databases. PubChem Upload supports all functionalities and data exchange formats of its predecessor (1). Furthermore, it provides an extensive set of wizards, inline help tips and tutorials for guiding submitters to enter assay data and descriptive information. More specifically, the new assay submission capabilities offered by PubChem Upload include (i) bioassay submission wizards to assist novice users for both small molecule and RNAi screenings, (ii) improved user interface response to complex input with newer web technology, (iii) simplified new user registration and upgrades for production user accounts, (iv) improved help, including hints built into the user interface and a tutorial, (v) extensive PubChem bioassay templates for new submissions or for record updates, (vi) full editing and integration of assay data and description tables and (vii) expanded import/export handling of spreadsheets for assays. A detailed help document, tutorial and sample submission templates for PubChem Upload are available at http://pubchem.ncbi.nlm.nih.gov/upload/docs/upload_help.html, http://pubchem.ncbi.nlm.nih.gov/upload/tutorial and http://pubchem.ncbi.nlm.nih.gov/upload/docs/upload_help.html#AssaySubmission, respectively. A detailed description of PubChem Upload will be provided in a separate article.
SUMMARY
PubChem is committed to serving as a public repository for bioactivity data of small molecules and RNAi. PubChem also provides an integrated information platform with a suite of tools allowing users to query, analyze and download all database content. PubChem will continue to improve services and tools as technology advances and to further integrate the information it contains with third-party annotations and other public biomedical data. With the support of open access to the data and the delivery of the new Upload system, PubChem welcomes the community to use the resource and to contribute data content to the repository.
ACKNOWLEDGEMENTS
The authors thank all submitters who have contributed data to PubChem and the rest of the PubChem team for their support.
FUNDING
The NIH Intramural Research program. Funding for open access charge: National Institutes of Health, USA.
Conflict of interest statement. None declared.
REFERENCES
1. Wang,Y., Xiao,J., Suzek,T.O., Zhang,J., Wang,J., Zhou,Z., Han,L., Karapetyan,K., Dracheva,S., Shoemaker,B.A. et al. (2012) PubChem’s BioAssay database. Nucleic Acids Res., 40, D400–D412.
2. Wang,Y., Bolton,E., Dracheva,S., Karapetyan,K., Shoemaker,B.A., Suzek,T.O., Wang,J., Xiao,J., Zhang,J. and Bryant,S.H. (2010) An overview of the PubChem BioAssay resource. Nucleic Acids Res., 38, D255–D266.
3. Wang,Y., Xiao,J., Suzek,T.O., Zhang,J., Wang,J. and Bryant,S.H. (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res., 37, W623–W633.
4. Bolton,E.E., Wang,Y., Thiessen,P.A. and Bryant,S.H. (2008) PubChem: integrated platform of small molecules and biological activities. Annu. Rep. Comput. Chem., 4, 217–241.
5. Sayers,E.W., Barrett,T., Benson,D.A., Bolton,E., Bryant,S.H., Canese,K., Chetvernin,V., Church,D.M., DiCuccio,M., Federhen,S. et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 39, D38–D51.
6. Butkiewicz,M., Lowe,E.W. Jr, Mueller,R., Mendenhall,J.L., Teixeira,P.L., Weaver,C.D. and Meiler,J. (2013) Benchmarking ligand-based virtual high-throughput screening with the PubChem database. Molecules, 18, 735–756.
7. Sharman,J.L., Benson,H.E., Pawson,A.J., Lukito,V., Mpamhanga,C.P., Bombail,V., Davenport,A.P., Peters,J.A., Spedding,M. and Harmar,A.J. (2013) IUPHAR-DB: updated database content and new features. Nucleic Acids Res., 41, D1083–D1088.
8. Gaulton,A., Bellis,L.J., Bento,A.P., Chambers,J., Davies,M., Hersey,A., Light,Y., McGlinchey,S., Michalovich,D., Al-Lazikani,B. et al. (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res., 40, D1100–D1107.
9. Mulder,K.W., Wang,X., Escriu,C., Ito,Y., Schwarz,R.F., Gillis,J., Sirokmany,G., Donati,G., Uribe-Lewis,S., Pavlidis,P. et al. (2012) Diverse epigenetic strategies interact to control epidermal differentiation. Nat. Cell Biol., 14, 753–763.
10. Chih,B., Liu,P., Chinn,Y., Chalouni,C., Komuves,L.G., Hass,P.E., Sandoval,W. and Peterson,A.S. (2012) A ciliopathy complex at the transition zone protects the cilia as a privileged membrane domain. Nat. Cell Biol., 14, 61–72.
11. Prager-Khoutorsky,M., Lichtenstein,A., Krishnan,R., Rajendran,K., Mayo,A., Kam,Z., Geiger,B. and Bershadsky,A.D. (2011) Fibroblast polarization is a matrix-rigidity-dependent process controlled by focal adhesion mechanosensing. Nat. Cell Biol., 13, 1457–1465.
12. Imberg-Kazdan,K., Ha,S., Greenfield,A., Poultney,C.S., Bonneau,R., Logan,S.K. and Garabedian,M.J. (2013) A genome-wide RNA interference screen identifies new regulators of androgen receptor function in prostate cancer cells. Genome Res., 23, 581–591.
13. Powell,M.L., Smith,J.A., Sowa,M.E., Harper,J.W., Iftner,T., Stubenrauch,F. and Howley,P.M. (2010) NCoR1 mediates papillomavirus E8^E2C transcriptional repression. J. Virol., 84, 4451–4460.
14. Galluzzi,L., Morselli,E., Vitale,I., Kepp,O., Senovilla,L., Criollo,A., Servant,N., Paccard,C., Hupe,P., Robert,T. et al. (2010) miR-181a and miR-630 regulate cisplatin-induced cancer cell death. Cancer Res., 70, 1793–1803.
15. Smith,J.A., White,E.A., Sowa,M.E., Powell,M.L., Ottinger,M., Harper,J.W. and Howley,P.M. (2010) Genome-wide siRNA screen identifies SMCX, EP400 and Brd4 as E2-dependent regulators of human papillomavirus oncogene expression. Proc. Natl Acad. Sci. USA, 107, 3752–3757.
16. Zhang,S.L., Yeromin,A.V., Zhang,X.H., Yu,Y., Safrina,O., Penna,A., Roos,J., Stauderman,K.A. and Cahalan,M.D. (2006) Genome-wide RNAi screen of Ca(2+) influx identifies genes that regulate Ca(2+) release-activated Ca(2+) channel activity. Proc. Natl Acad. Sci. USA, 103, 9357–9362.
17. Friedman,A. and Perrimon,N. (2006) A functional RNAi screen for regulators of receptor tyrosine kinase and ERK signalling. Nature, 444, 230–234.
18. Gwack,Y., Sharma,S., Nardone,J., Tanasa,B., Iuga,A., Srikanth,S., Okamura,H., Bolton,D., Feske,S., Hogan,P.G. et al. (2006) A genome-wide Drosophila RNAi screen identifies DYRK-family kinases as regulators of NFAT. Nature, 441, 646–650.
19. Bard,F., Casano,L., Mallabiabarrena,A., Wallace,E., Saito,K., Kitayama,H., Guizzunti,G., Hu,Y., Wendler,F., Dasgupta,R. et al. (2006) Functional genomics reveals genes involved in protein secretion and Golgi organization. Nature, 439, 604–607.
20. Vig,M., Peinelt,C., Beck,A., Koomoa,D.L., Rabah,D., Koblan-Huberson,M., Kraft,S., Turner,H., Fleig,A., Penner,R. et al. (2006) CRACM1 is a plasma membrane protein essential for store-operated Ca2+ entry. Science, 312, 1220–1223.
21. DasGupta,R., Kaykas,A., Moon,R.T. and Perrimon,N. (2005) Functional genomic analysis of the Wnt-wingless signaling pathway. Science, 308, 826–833.
22. Nybakken,K., Vokes,S.A., Lin,T.Y., McMahon,A.P. and Perrimon,N. (2005) A genome-wide RNA interference screen in Drosophila melanogaster cells for new components of the Hh signaling pathway. Nat. Genet., 37, 1323–1332.
8 Nucleic Acids Research 2013
at National Institutes of H
ealth Library on D
ecember 12 2013
httpnaroxfordjournalsorgD
ownloaded from
depositions of bioassays (Figure 1A) Counting solely thelatest version of each bioassay record by accession (ieAID) the database contains 200 000 000 bioactivityoutcome summaries (Figure 1B) and 1 200 000 000 datapoints representing biological properties for 2 800 000small molecule samples 1 900 000 chemical structures
and 108 000 RNAi reagents (Figure 1C) This informationrepresents tens of thousands of potential modulators forgt8000 protein targets and 30 000 genes critical for biolo-gical process hence providing rich information onchemical and RNAi tools for chemical and molecularbiology research
Table 1. A list of PubChem BioAssay services

Service | Description | URL example
BioAssay service home | Access a list of BioAssay services | http://pubchem.ncbi.nlm.nih.gov/assay
BioAssay search | Search BioAssay database with Entrez | http://www.ncbi.nlm.nih.gov/pcassay
BioAssay search advanced page | An interface for searching multiple search fields | http://www.ncbi.nlm.nih.gov/pcassay/limits
BioAssay text search advanced page | An interface for reviewing search history and refining search results with Boolean operations | http://www.ncbi.nlm.nih.gov/pcassay/advanced
BioAssay summary | Access and download a bioassay record | http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=myAID
BioAssay data retrieval tool | Retrieve a full data table or an active subset from a single bioassay record | http://pubchem.ncbi.nlm.nih.gov/assay/assaydata.html?aid=myAID; http://pubchem.ncbi.nlm.nih.gov/assay/assaydata.html?act=act&aid=myAID
BioAssay data selection tool | Select a user-defined data subset from a single bioassay record | http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?q=t&aid=myAID
Bioactivity data tool | Retrieve multiple-assay bioactivity data for a single substance sample (SID), chemical structure (CID), protein target (GI) or gene target (GeneID) | http://pubchem.ncbi.nlm.nih.gov/assay.cgi?sid=mySID; http://pubchem.ncbi.nlm.nih.gov/assay.cgi?sid=myCID; http://pubchem.ncbi.nlm.nih.gov/assay.cgi?sid=myGI; http://pubchem.ncbi.nlm.nih.gov/assay.cgi?sid=myGeneID
BioActivity summary (compound-centric) | Summarize and analyze bioactivity data for a set of records presented from the compound point of view | http://pubchem.ncbi.nlm.nih.gov/assay/bioactivity.cgi?tab=1
BioActivity summary (assay-centric) | Summarize and analyze bioactivity data for a set of records presented from the assay point of view | http://pubchem.ncbi.nlm.nih.gov/assay/bioactivity.cgi?tab=2
BioActivity summary (target-centric) | Summarize and analyze bioactivity data for a set of records presented from the target point of view | http://pubchem.ncbi.nlm.nih.gov/assay/bioactivity.cgi?tab=3
Structure-activity relationship analysis (SAR) | Analyze and visualize structure-activity relationships with clustering tools and a heatmap-style display | http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?p=heat
Scatter plot/histogram | Analyze bioassay test results with a histogram or scatter plot | http://pubchem.ncbi.nlm.nih.gov/assay/plot.cgi?plottype=2
Dose-response curve tool | Analyze bioassay test results and visualize dose-response curves | http://pubchem.ncbi.nlm.nih.gov/assay/plot.cgi?plottype=1
Related BioAssay | Summarize bioassay relationships by overlap of active compounds, target sequence similarity, deposited annotation, same publication, common pathways and same assay project | http://pubchem.ncbi.nlm.nih.gov/assay/assayHeatmap.cgi
PubChem PUG/SOAP | PubChem programmatic tool for data retrieval | http://pubchem.ncbi.nlm.nih.gov/pug/pughelp.html
PUG/REST | PubChem REST API for data retrieval | http://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html
BioAssay download tool | A flexible download interface | http://pubchem.ncbi.nlm.nih.gov/assay/assaydownload.cgi
BioAssay FTP | FTP for all PubChem BioAssay records and related information | ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay
BioAssay data standard | XML data specification for the PubChem BioAssay data model | ftp://ftp.ncbi.nlm.nih.gov/pubchem/data_spec
PubChem upload | Substance and bioassay submission system | http://pubchem.ncbi.nlm.nih.gov/upload
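The URL examples in Table 1 use placeholders such as myAID. As a minimal sketch of how such a placeholder might be filled in programmatically, the snippet below builds the BioAssay summary and data retrieval URLs for a given accession. The function names are hypothetical, the URL patterns are taken from the table as read here, and the services may have changed since publication.

```python
from urllib.parse import urlencode

# Base path shared by the BioAssay web services listed in Table 1.
BASE = "http://pubchem.ncbi.nlm.nih.gov/assay"


def summary_url(aid: int) -> str:
    """URL of the BioAssay summary page for one accession (AID)."""
    return f"{BASE}/assay.cgi?{urlencode({'aid': aid})}"


def data_table_url(aid: int, active_only: bool = False) -> str:
    """URL of the data retrieval tool: full table, or the active subset."""
    params = {"act": "act", "aid": aid} if active_only else {"aid": aid}
    return f"{BASE}/assaydata.html?{urlencode(params)}"


if __name__ == "__main__":
    print(summary_url(651839))
    print(data_table_url(651839, active_only=True))
```

Using urlencode rather than string concatenation keeps the query string valid if further parameters are added later.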
The content in the PubChem BioAssay database is contributed by >50 organizations worldwide, including US government-funded institutions, pharmaceutical companies, research laboratories and collaborators hosting chemical biology databases. A summary of bioassay vendors and submission counts is provided at http://pubchem.ncbi.nlm.nih.gov/sources/#assay. BioAssay datasets added during the past 2 years include (i) small molecule data from screening centers of the NIH Molecular Libraries and Imaging Program [Molecular Library Program (MLP)] (http://commonfund.nih.gov/molecularlibraries), the ICCB-Longwood/NSRB Screening Facility at the Harvard Medical School (http://iccb.med.harvard.edu), EPA Tox21 (http://epa.gov/ncct/Tox21) and the Milwaukee Institute for Drug Discovery (http://www4.uwm.edu/drugdiscovery); (ii) curated dataset records from the Meiler Lab at Vanderbilt University, which derive the ultimate bioactivity outcome of a small molecule by combining multiple bioassay results in PubChem to facilitate cheminformatics studies (6); (iii) curated datasets from literature extraction by IUPHAR-DB (7) and ChEMBL (8); and (iv) small interfering RNA (siRNA) data from the Drosophila RNAi Screening Center, the ICCB-Longwood/NSRB Screening Facility at the Harvard Medical School (http://iccb.med.harvard.edu), the Cancer Research UK Cambridge Research Institute, the Department of Molecular Cell Biology at the Weizmann Institute of Science, the Institut National de la Santé et de la Recherche Médicale (INSERM), the Peterson Lab at Genentech and the ten Dijke Lab at Leiden University Medical Center. Many of these newly added siRNA datasets are associated with recent publications in journals such as Nature Cell Biology (9–11), Genome Research (12), J. Virol. (13), Cancer Research (14), PNAS (15,16), Nature (17–19), Science (20,21) and Nature Genetics (22). Each of these bioassay records is linked to the corresponding abstract in PubMed, allowing PubChem users to track down the publication easily. Conversely, users of PubMed gain access to the corresponding bioassay datasets through this cross-link.

PubChem continues to mirror the ChEMBL database (8) hosted at the European Bioinformatics Institute. Multiple ChEMBL releases and database changes over the past 2 years have been incorporated into PubChem. Recently added annotations at ChEMBL are recorded via the Categorized Comment field of the PubChem BioAssay data model (1). Binding, surface, ligand and lipophilic ligand efficiency indices are added to a bioassay record as additional test results. As a result, many of the bioassay records in PubChem have gone through multiple updates. Annotation for bioactivity outcome (e.g. active or inactive) is largely missing in the ChEMBL datasets, hindering their integration with the rest of PubChem data and analysis tools. In such cases, PubChem now assigns a bioactivity outcome using a 50 µM cutoff based on readouts such as IC50, EC50 or Ki, allowing a larger portion of the ChEMBL data to be blended into the PubChem system.

Figure 1. Growth in PubChem BioAssay: (A) records, (B) bioactivity outcomes (counted by AID–SID pair) and (C) unique tested samples.
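The cutoff rule for ChEMBL-derived records can be illustrated with a short sketch. It assumes that potency readouts (IC50, EC50 or Ki) have already been converted to micromolar, that a value exactly at the cutoff counts as active, and that records without a usable readout are left unspecified. None of these details is stated in the text, and the function name is hypothetical.

```python
from typing import Optional

# Cutoff named in the text, expressed in micromolar.
CUTOFF_UM = 50.0


def assign_outcome(potency_um: Optional[float]) -> str:
    """Assign a bioactivity outcome from an IC50/EC50/Ki value in micromolar.

    Assumptions (not from the source): the boundary value is treated as
    active, and a missing readout yields 'unspecified'.
    """
    if potency_um is None:
        return "unspecified"
    return "active" if potency_um <= CUTOFF_UM else "inactive"


if __name__ == "__main__":
    for value in (0.3, 50.0, 120.0, None):
        print(value, assign_outcome(value))
```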
DATABASE INFRASTRUCTURE ENHANCEMENT
A robust and scalable database system is crucial to support the rapid growth of PubChem BioAssay. A set of relational databases and tables is designed and set up on Microsoft SQL servers to (i) accept bioassay submissions from depositors, (ii) archive bioassay updates with version control, (iii) track embargo status, (iv) record and derive links and relationships among bioassays and other biomedical information, (v) provide search indexes, (vi) support fast data retrieval and analysis and (vii) facilitate daily updates at the FTP site. Challenged by the accelerated growth of bioassay data content, great efforts have been invested in the past years to enhance the database infrastructure capacity through both hardware upgrades and revised database design. As a result, new services have been added to the PubChem resource. Furthermore, the performance of bioassay data retrieval and download services has been significantly improved, largely eliminating the need for a queuing system and minimizing user wait time.
DATA INTEGRATION AND NEW WEB SERVICES
The PubChem BioAssay database is fully integrated with other biomedical databases hosted by NCBI and provides a suite of web-based and programmatic tools to support data access, retrieval, analysis and download from PubChem or cross-linked databases (Table 1). Several new services for integrating bioassay target and bioactivity data or grouping bioassays based on an assay project are described below. Other developments that have focused on behind-the-scenes enhancement of data retrieval without significant web interface change are not summarized in this work.
Rapid access to bioactivity data for a protein or gene target
PubChem BioAssay closes the gap between molecular and chemical biology research by presenting and linking information on both chemical and RNAi tools in one system, supporting the study of gene function and biological pathways. The majority of small molecule screening data in PubChem are associated with protein targets, while RNAi screening data link each tested reagent to a gene. PubChem provides multiple mechanisms for cross-referencing protein and gene targets from bioactivity data (1). As a result, a protein or gene may link to many bioactivity datasets. It is critical to provide rapid access to such multi-assay bioactivity data for these protein and gene targets. Such access also serves as a unique annotation for the corresponding Entrez Protein or Gene record, leading users to experimental data from chemical biology and RNAi research and enhancing the discoverability of the NCBI Entrez system. Toward this end, two new services, the Protein Target Bioactivity Data Tool and the Gene Target Bioactivity Data Tool, were developed to access the associated bioactivity information in PubChem.

From a protein target record such as G-protein-coupled receptor (GPCR) 35 (http://www.ncbi.nlm.nih.gov/protein/NP_005292.2), bioactivity data for this protein target can be accessed by the link ‘BioAssay by Target (Summary)’. As shown in Figure 2A, this Protein Target Bioactivity Data Tool draws and identifies each tested substance together with its bioactivity results, assay title and a link to detailed data such as dose-response curves. By default, the data table is sorted by bioactivity outcome and by potency of the substances, showing active data and potent reagents first. Graphical filters are provided at the top of the page, allowing one to drill down to a data subset of interest. For example, this GPCR protein has a ‘Probe’ filter highlighting three chemical probes discovered by a high-throughput screening (HTS) project for selective GPR35 antagonists.
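The default ordering of the data table, with active results first and more potent substances ahead of less potent ones, can be sketched as a two-part sort key. The record layout below is hypothetical. The sketch assumes potency is expressed in micromolar so that lower values are more potent, that all non-active outcomes rank equally, and that rows without a potency value sort last within their outcome group.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Row:
    """One tested substance in a bioactivity table (hypothetical layout)."""
    sid: int
    outcome: str                 # e.g. "active", "inactive", "inconclusive"
    potency_um: Optional[float]  # lower value means more potent


def default_order(rows: List[Row]) -> List[Row]:
    """Sort active rows first, then by ascending potency value."""
    def key(row: Row):
        outcome_rank = 0 if row.outcome == "active" else 1
        potency = row.potency_um if row.potency_um is not None else float("inf")
        return (outcome_rank, potency)
    return sorted(rows, key=key)


if __name__ == "__main__":
    table = [Row(1, "inactive", None), Row(2, "active", 5.0), Row(3, "active", 0.2)]
    print([row.sid for row in default_order(table)])
```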
The bioactivity data for the relevant gene target record (http://www.ncbi.nlm.nih.gov/gene/2859) can be accessed by the link ‘BioAssay by Target (Summary)’. With this Gene Target Bioactivity Data Tool, a similar summary of relevant bioassay activity results is displayed, as shown in Figure 2B. Note that, using a gene identifier in this case, additional data are retrieved, including RNAi test results (as indicated by the filter ‘RNAi’ shown under ‘Substance Types’), which indicate that GPR35 functions as a cellular gene repressing HPV18 LCR, as identified by a genome-wide siRNA screen. This example illustrates the power of aggregating bioactivity data across datasets onto a unified display. The Gene Target Bioactivity Data Tool is particularly useful for accessing datasets from multiple depositors and literature-based data from many journal articles. Moreover, it links simultaneously to findings in chemical biology research and RNAi screenings, enabling users to evaluate the biological role of a gene and to identify its small-molecule regulators using data shown on the same display.
BioAssays associated with the same assay project
PubChem tracks the relationships among bioassay records as indicated by submitters. PubChem has also developed several computational methods for identifying additional bioassay linkages based on target sequence similarity, common active compounds and biological pathways, as well as datasets abstracted from the same publication (1). To better support decision making, PubChem now clusters and links bioassays based on assay projects. This feature aims to use data deposited by a network such as the NIH MLP and the Tox21 program. MLP-funded screening laboratories are required to deposit data progressively into PubChem as an assay project continues. It usually takes months or years to finish an assay project aimed at developing a chemical probe; hence, multiple bioassay datasets are often submitted to PubChem for the same project but under distinct accessions (AIDs). These datasets are highly relevant to one another, often covering a primary HTS result, follow-ups with dose-response and toxicity testing, or counter screenings against biologically related targets, in different cell lines or using different assay methods. PubChem allows submitters to specify such relationships via the cross-reference (XRef) data field. However, it is up to the submitters to provide all links as new data are made available. As a result, cross-references to related bioassay datasets may unfortunately be lacking or incomplete among many datasets, making it difficult for users to discover these key associations.

Figure 2. Bioactivity data for a (A) protein target and (B) gene target.

To improve this situation, it is now common practice to create a ‘Summary’ bioassay at the outset of a multi-assay project and then link each subsequent related assay back to that summary record. This means that the submitter only needs to specify a single link from each bioassay record to the same summary, and all other links between related assays are automatically generated. As a result, assay projects are indexed on top of the individual records. Users visiting any bioassay record can access all relevant datasets of the same project without the need for the submitter to specify all connections. As shown in Figure 3, the links to these related bioassays are labeled in the BioAssay Summary service as ‘Same Project’ under the ‘Related BioAssays’ section. The Modulation of the Metabotropic Glutamate Receptor mGluR3 (GRM3) assay (http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=651839) indicates only one ‘Depositor Specified’ assay, whereas eight bioassay records were identified as related to the same project by the new procedure. One may see details of the related bioassays by clicking the link ‘Same Project’.
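The way a single link per record can yield all ‘Same Project’ relationships may be sketched as a grouping step: records that point to the same summary AID form one project, and every member is then related to every other member. This is an illustration of the idea only, not PubChem's actual procedure. The function name and the AIDs in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set


def same_project_links(summary_of: Dict[int, int]) -> Dict[int, Set[int]]:
    """Derive pairwise project links from one summary link per record.

    summary_of maps each bioassay AID to the AID of its summary record.
    The result maps every AID (summary records included) to the other
    AIDs of the same project.
    """
    members: Dict[int, Set[int]] = defaultdict(set)
    for aid, summary_aid in summary_of.items():
        members[summary_aid].add(aid)
        members[summary_aid].add(summary_aid)

    links: Dict[int, Set[int]] = {}
    for group in members.values():
        for aid in group:
            links[aid] = group - {aid}
    return links


if __name__ == "__main__":
    # Hypothetical AIDs: 101 and 102 both point to summary record 100.
    print(same_project_links({101: 100, 102: 100, 201: 200}))
```

With n records in a project, submitters supply n links to the summary rather than the n(n-1)/2 pairwise cross-references that would otherwise be needed.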
PUBLIC ACCESS
BioAssay record and BioAssay summary service
A PubChem BioAssay record can be accessed via the BioAssay Summary service at http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=myAID, where myAID is a valid BioAssay accession (AID). As shown in Figure 3 for the GRM3 assay (AID 651839), the BioAssay Summary service provides (i) full access to submitted information, including bioassay protocol descriptions, assay data and cross-references; (ii) derived bioassay relationships; and (iii) tools for evaluating tested compounds, studying SAR or researching targets. In the ‘Target’ section, a link ‘More Bioactivity data’ has recently been added to gather all bioactivity data in PubChem associated with the GRM3 target. With the improved database infrastructure, the BioAssay Summary service now provides instant access to the bioassay data table and an enhanced data download function. With the recently launched PubChem social media outreach, links to social media accounts are now provided on this page.
Figure 3. BioAssay Summary page for bioassay record AID 651839. New and enhanced features are highlighted, including fast download, instant access to the data table, a link to additional bioactivity data targeting GRM3, a link to related bioassays on the same project and links to social media accounts.
BioAssay search
Keyword search in the PubChem BioAssay database is supported by NCBI Entrez at http://www.ncbi.nlm.nih.gov/pcassay. Textual information in PubChem BioAssay is indexed under numerous fields. An advanced interface is provided at http://www.ncbi.nlm.nih.gov/pcassay/limits (Limits page) to access multiple indices and filters (1). Based on information provided in categorized comment fields and keywords in the title of a bioassay record, new filters were added to support the identification of records containing (i) biochemical assays, (ii) cell-based assays, (iii) protein–protein interaction bioactivity and (iv) in vivo or in vitro assays. A newly added menu, ‘Assay Project’, can be used to select an assay project and access related datasets. ChEMBL depositor information is also indexed to support sub-setting ChEMBL records. As a result, although http://www.ncbi.nlm.nih.gov/pcassay?term=ChEMBL[sourcename] retrieves all ChEMBL bioassays in PubChem, http://www.ncbi.nlm.nih.gov/pcassay?term=%22ChEMBL%3A%3AScientific+Literature%22%5BSourceName%5D[SourceName] retrieves only literature-based records from ChEMBL, and http://www.ncbi.nlm.nih.gov/pcassay?term=%22ChEMBL%3A%3ASt+Jude+Malaria+Screening%22%5BSourceName%5D[SourceName] retrieves ChEMBL records deposited by St Jude Malaria Screening.
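The percent-encoded search URLs above correspond to a quoted source name followed by the [SourceName] field tag. A minimal sketch of producing that encoding with the Python standard library is shown here. The function name is hypothetical, and the sketch only builds the URL string without contacting Entrez.

```python
from urllib.parse import quote_plus

# Entrez BioAssay search page named in the text.
ENTREZ_PCASSAY = "http://www.ncbi.nlm.nih.gov/pcassay"


def source_query_url(source_name: str) -> str:
    """Build a search URL restricted to one depositor source name.

    The source name is wrapped in double quotes so that names containing
    spaces or '::' are matched as a single phrase.
    """
    term = f'"{source_name}"[SourceName]'
    return f"{ENTREZ_PCASSAY}?term={quote_plus(term)}"


if __name__ == "__main__":
    print(source_query_url("ChEMBL::Scientific Literature"))
    print(source_query_url("ChEMBL::St Jude Malaria Screening"))
```

Here quote_plus encodes the double quotes as %22, each colon as %3A, the brackets as %5B and %5D, and spaces as plus signs.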
PubChem BioAssay FTP AND DOWNLOAD
PubChem provides multiple services for users to download bioassay records, which have been described previously (1). These primarily include (i) an enhanced download function at the Summary service (shown in Figure 3); (ii) a web-based BioAssay download service at http://pubchem.ncbi.nlm.nih.gov/assay/assaydownload.cgi, with a flexible interface supporting full or partial data download by specifying bioassay accessions (AIDs) and tested substance accessions (SIDs); and (iii) the daily updated PubChem BioAssay FTP at ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay, providing open access to all bioassay datasets. While the primary FTP structure remains the same, one new FTP directory, ‘Extras’, has been added to offer additional information on the BioAssay resource. In this folder, the file ‘Cid2BioactivityLink’ provides a list of tested compounds and the corresponding URLs linking to associated bioactivity data. Similarly, the ‘Gi2BioactivityLink’ and ‘Geneid2BioactivityLink’ files provide the corresponding bioactivity data links for protein and gene targets, respectively. The ‘Aid2GiGeneid’ file contains all the bioassay (AID), protein target (GI) and gene target (Gene ID) associations in the BioAssay database. Also, a file for assay project-based related bioassays has been added to the directory at ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay/AssayNeighbors. Column headers for the comma-separated values (CSV) format have been modified to provide consistency among multiple download methods (ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay/CSV/README). Readout names are now provided in CSV files to ease data parsing and interpretation. In addition, PubChem PUG/SOAP (http://pubchem.ncbi.nlm.nih.gov/pug/pughelp.html) and PUG/REST (http://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html) facilities are being developed to support programmatic retrieval of bioassay information.
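As an illustration of programmatic retrieval, the sketch below composes a PUG REST request URL for a single bioassay record. The host and the path layout (domain, identifier namespace, identifier, operation, output format) reflect the PUG REST documentation as understood at the time of editing, not anything stated in this article, so they should be treated as assumptions and checked against the current documentation. The function name is hypothetical, and no network request is made.

```python
# Assumed PUG REST prefix; verify against the current PubChem documentation.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def assay_request_url(aid: int, operation: str = "description", output: str = "JSON") -> str:
    """Compose a PUG REST request URL for one bioassay accession (AID).

    The path pattern assay/aid/<AID>/<operation>/<output> is an assumption
    based on the published PUG REST URL scheme.
    """
    return f"{PUG_REST}/assay/aid/{aid}/{operation}/{output}"


if __name__ == "__main__":
    print(assay_request_url(651839))
```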
PubChem UPLOAD FOR BioAssay SUBMISSION
As a public repository handling diverse and vast amounts of chemical structure and bioassay data, it is critical for PubChem to provide an efficient and user-friendly way to upload data. The recently released PubChem Upload (http://pubchem.ncbi.nlm.nih.gov/upload) makes use of advances in web technologies to offer streamlined support for data submissions and updates to the Substance and BioAssay databases. PubChem Upload supports all functionalities and data exchange formats of its predecessor (1). Furthermore, it provides an extensive set of wizards, inline help tips and tutorials for guiding submitters to enter assay data and descriptive information. More specifically, the new assay submission capabilities offered by PubChem Upload include (i) bioassay submission wizards to assist novice users with both small molecule and RNAi screenings; (ii) improved user interface response to complex input with newer web technology; (iii) simplified new user registration and upgrades for production user accounts; (iv) improved help, including hints built into the user interface and a tutorial; (v) extensive PubChem bioassay templates for new submissions or for record updates; (vi) full editing and integration of assay data and description tables; and (vii) expanded import/export handling of spreadsheets for assays. A detailed help document, tutorial and sample submission templates for PubChem Upload are available at http://pubchem.ncbi.nlm.nih.gov/upload/docs/upload_help.html, http://pubchem.ncbi.nlm.nih.gov/upload/tutorial and http://pubchem.ncbi.nlm.nih.gov/upload/docs/upload_help.html#AssaySubmission, respectively. A detailed description of PubChem Upload will be provided in a separate article.
SUMMARY
PubChem is committed to serving as a public repository for bioactivity data of small molecules and RNAi. PubChem also provides an integrated information platform with a suite of tools allowing users to query, analyze and download all database content. PubChem will continue to improve services and tools as technology advances and to further integrate the information it contains with third-party annotations and other public biomedical data. With the support of open access to the data and the delivery of the new Upload system, PubChem welcomes the community to use the resource and to contribute data content to the repository.
ACKNOWLEDGEMENTS
The authors thank all submitters who have contributed data to PubChem and the rest of the PubChem team for their support.
FUNDING
The NIH Intramural Research Program. Funding for open access charge: National Institutes of Health, USA.

Conflict of interest statement. None declared.
REFERENCES
1. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J, Zhou Z, Han L, Karapetyan K, Dracheva S, Shoemaker BA et al. (2012) PubChem’s BioAssay database. Nucleic Acids Res. 40, D400–D412.
2. Wang Y, Bolton E, Dracheva S, Karapetyan K, Shoemaker BA, Suzek TO, Wang J, Xiao J, Zhang J and Bryant SH (2010) An overview of the PubChem BioAssay resource. Nucleic Acids Res. 38, D255–D266.
3. Wang Y, Xiao J, Suzek TO, Zhang J, Wang J and Bryant SH (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res. 37, W623–W633.
4. Bolton EE, Wang Y, Thiessen PA and Bryant SH (2008) PubChem: integrated platform of small molecules and biological activities. Annu. Rep. Comput. Chem. 4, 217–241.
5. Sayers EW, Barrett T, Benson DA, Bolton E, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Federhen S et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 39, D38–D51.
6. Butkiewicz M, Lowe EW Jr, Mueller R, Mendenhall JL, Teixeira PL, Weaver CD and Meiler J (2013) Benchmarking ligand-based virtual high-throughput screening with the PubChem database. Molecules 18, 735–756.
7. Sharman JL, Benson HE, Pawson AJ, Lukito V, Mpamhanga CP, Bombail V, Davenport AP, Peters JA, Spedding M and Harmar AJ (2013) IUPHAR-DB: updated database content and new features. Nucleic Acids Res. 41, D1083–D1088.
8. Gaulton A, Bellis LJ, Bento AP, Chambers J, Davies M, Hersey A, Light Y, McGlinchey S, Michalovich D, Al-Lazikani B et al. (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res. 40, D1100–D1107.
9. Mulder KW, Wang X, Escriu C, Ito Y, Schwarz RF, Gillis J, Sirokmany G, Donati G, Uribe-Lewis S, Pavlidis P et al. (2012) Diverse epigenetic strategies interact to control epidermal differentiation. Nat. Cell Biol. 14, 753–763.
10. Chih B, Liu P, Chinn Y, Chalouni C, Komuves LG, Hass PE, Sandoval W and Peterson AS (2012) A ciliopathy complex at the transition zone protects the cilia as a privileged membrane domain. Nat. Cell Biol. 14, 61–72.
11. Prager-Khoutorsky M, Lichtenstein A, Krishnan R, Rajendran K, Mayo A, Kam Z, Geiger B and Bershadsky AD (2011) Fibroblast polarization is a matrix-rigidity-dependent process controlled by focal adhesion mechanosensing. Nat. Cell Biol. 13, 1457–1465.
12. Imberg-Kazdan K, Ha S, Greenfield A, Poultney CS, Bonneau R, Logan SK and Garabedian MJ (2013) A genome-wide RNA interference screen identifies new regulators of androgen receptor function in prostate cancer cells. Genome Res. 23, 581–591.
13. Powell ML, Smith JA, Sowa ME, Harper JW, Iftner T, Stubenrauch F and Howley PM (2010) NCoR1 mediates papillomavirus E8^E2C transcriptional repression. J. Virol. 84, 4451–4460.
14. Galluzzi L, Morselli E, Vitale I, Kepp O, Senovilla L, Criollo A, Servant N, Paccard C, Hupe P, Robert T et al. (2010) miR-181a and miR-630 regulate cisplatin-induced cancer cell death. Cancer Res. 70, 1793–1803.
15. Smith JA, White EA, Sowa ME, Powell ML, Ottinger M, Harper JW and Howley PM (2010) Genome-wide siRNA screen identifies SMCX, EP400 and Brd4 as E2-dependent regulators of human papillomavirus oncogene expression. Proc. Natl Acad. Sci. USA 107, 3752–3757.
16. Zhang SL, Yeromin AV, Zhang XH, Yu Y, Safrina O, Penna A, Roos J, Stauderman KA and Cahalan MD (2006) Genome-wide RNAi screen of Ca(2+) influx identifies genes that regulate Ca(2+) release-activated Ca(2+) channel activity. Proc. Natl Acad. Sci. USA 103, 9357–9362.
17. Friedman A and Perrimon N (2006) A functional RNAi screen for regulators of receptor tyrosine kinase and ERK signalling. Nature 444, 230–234.
18. Gwack Y, Sharma S, Nardone J, Tanasa B, Iuga A, Srikanth S, Okamura H, Bolton D, Feske S, Hogan PG et al. (2006) A genome-wide Drosophila RNAi screen identifies DYRK-family kinases as regulators of NFAT. Nature 441, 646–650.
19. Bard F, Casano L, Mallabiabarrena A, Wallace E, Saito K, Kitayama H, Guizzunti G, Hu Y, Wendler F, Dasgupta R et al. (2006) Functional genomics reveals genes involved in protein secretion and Golgi organization. Nature 439, 604–607.
20. Vig M, Peinelt C, Beck A, Koomoa DL, Rabah D, Koblan-Huberson M, Kraft S, Turner H, Fleig A, Penner R et al. (2006) CRACM1 is a plasma membrane protein essential for store-operated Ca2+ entry. Science 312, 1220–1223.
21. DasGupta R, Kaykas A, Moon RT and Perrimon N (2005) Functional genomic analysis of the Wnt-wingless signaling pathway. Science 308, 826–833.
22. Nybakken K, Vokes SA, Lin TY, McMahon AP and Perrimon N (2005) A genome-wide RNA interference screen in Drosophila melanogaster cells for new components of the Hh signaling pathway. Nat. Genet. 37, 1323–1332.
8 Nucleic Acids Research 2013
at National Institutes of H
ealth Library on D
ecember 12 2013
httpnaroxfordjournalsorgD
ownloaded from
The content in the PubChem BioAssay database iscontributed by gt50 organizations worldwide includingUS government-funded institutions pharmaceuticalcompanies research laboratories and collaboratorshosting chemical biology databases A summary ofbioassay vendors and submission counts is provided athttppubchemncbinlmnihgovsourcesassayBioAssay datasets added during the past 2 years include (i)small molecule data from screening centers of the NIHMolecular Libraries and Imaging Program [MolecularLibrary Program (MLP)] (httpcommonfundnihgovmolecularlibraries) ICCB-LongwoodNSRB ScreenFacility at the Harvard Medical School (httpiccbmedharvardedu) EPA Tox21 (httpepagovncctTox21)and Milwaukee Institute for Drug Discovery (httpwww4uwmedudrugdiscovery) (ii) a curated datasetrecords from the Meiler Lab at Vanderbilt Universitywhich derives the ultimate bioactivity outcome of asmall molecule by combining multiple bioassay results inPubChem to facilitate cheminformatics studies (6) (iii)curated datasets from literature extraction by IUPHAR-DB (7) and ChEMBL (8) and (iv) small interfering RNA(siRNA) data from Drosophila RNAi Screening CenterICCB-LongwoodNSRB Screening Facility at theHarvard Medical School (httpiccbmedharvardedu)Cancer Research UK Cambridge Research InstituteDepartment of Molecular Cell Biology at WeizmannInstitute of Science Institut National de la Sante et dela Recherche Medicale (INSERM) Peterson Lab atGenentech and ten Dijke Lab at Leiden UniversityMedical Center Many of these newly added siRNAdatasets are associated with recent publications injournals such as Nature Cell Biology (9ndash11) GenomeResearch (12) J Virol (13) Cancer Research (14)PNAS (1516) Nature (17ndash19) Science (2021) andNature Genetics (22) Each of these bioassay records islinked to the corresponding abstract in PubMedallowing PubChem users to track down the publicationeasily Vice versa users of PubMed also gain accessto the corresponding bioassay datasets through thiscross-linkPubChem 
continues to mirror the ChEMBL database
(8) hosted at the European Bioinformatics InstituteMultiple ChEMBL releases and database changes overthe past 2 years have been incorporated into PubChemRecently added annotations at ChEMBL are recorded viathe Categorized Comment field of the PubChem BioAssaydata model (1) Binding surface ligand and lipophilicligand efficiency indices are added to a bioassay recordas additional test results As a result many of thebioassay records in PubChem have gone throughmultiple updates Annotation for bioactivity outcome(eg active or inactive) is largely missing in theChEMBL datasets hindering their integration with therest of PubChem data and analysis tools In such a casePubChem now assigns bioactivity outcome using a50 mM cutoff based on readouts such as IC50 EC50or Ki allowing a larger portion of the ChEMBL datablended in the PubChem systemF
igure
1Growth
inPubChem
BioAssay(A
)Records
(B)bioactivityoutcomes
(countedbyAID
ndashSID
pair)and(C
)uniquetested
samples
Nucleic Acids Research 2013 3
at National Institutes of H
ealth Library on D
ecember 12 2013
httpnaroxfordjournalsorgD
ownloaded from
DATABASE INFRASTRUCTURE ENHANCEMENT
A robust and scalable database system is crucial tosupport the rapid growth of PubChem BioAssay A setof relational databases and tables is designed and set upon Microsoft SQL servers to (i) accept bioassay submis-sion from depositors (ii) archive bioassay update withversion control (iii) track embargo status (iv) recordand derive links and relationships among bioassays andother biomedical information (v) provide search indexes(vi) support fast data retrieval and analysis and (vii) facili-tate daily update at the FTP site Challenged by theaccelerated growth of bioassay data content greatefforts have been invested in the past years to enhancethe database infrastructure capacity by both hardwareupgrade and revised database design As a result newservices have been added to the PubChem resourceFurthermore performance in bioassay data retrieval anddownload services have been significantly improvedthereby significantly eliminating a queuing system tominimize the user wait time
DATA INTEGRATION AND NEW WEB SERVICES
The PubChem BioAssay database is fully integrated withother biomedical databases hosted by NCBI and providesa suite of web-based and programmatic tools to supportdata access retrieval analysis and download fromPubChem or cross-linked databases (Table 1) Severalnew services for integrating bioassay target and bioactivitydata or grouping bioassays based on an assay project aredescribed later Other developments that have focused onbehind-the-scene enhancement of data retrieval withoutsignificant web interface change will not be summarizedin this work
Rapid access of bioactivity data for a protein orgene target
PubChem BioAssay closes the gap between molecular andchemical biology research by presenting and linking up in-formation of both chemical and RNAi tools in one systemsupporting the study of gene function and biologicalpathways The majority of small molecule screening datain PubChem are associated with protein targets whileRNAi screening data links each tested reagent to a genePubChem provides multiple mechanisms for cross-referencing protein and gene targets from bioactivitydata (1) As a result a protein or gene may link to manybioactivity datasets It is critical to provide rapid access tosuch multi-assay bioactivity data for these protein and genetargets Such a service provides a unique annotation serviceto the corresponding Entrez Protein or Gene record whichleads users to experimental data from chemical biology andRNAi research enhancing the discoverability of the NCBIEntrez system Toward this end two new services theProtein Target Bioactivity Data Tool and the Gene TargetBioactivity Data Tool were developed respectively toaccess associated bioactivity information in PubChemFrom a protein target record such as G-protein-
coupled receptor (GPCR) 35 (httpwwwncbinlmnihgovproteinNP_0052922) bioactivity data for this
protein target can be accessed by the link lsquoBioAssay byTarget (Summary)rsquo As shown in Figure 2A this ProteinTarget Bioactivity Data Tool draws and identifies eachtested substance together with its bioactivity resultsassay title and a link to detailed data such as dose-response curves The data table is sorted by bioactivityoutcome and potency of the substances by defaultshowing first active data and potent reagents Graphicalfilters are provided at the top of the page allowing one todrill down to a data subset of onersquos interest For examplethis GPCR protein has a lsquoProbersquo filter highlighting threechemical probes discovered by a high-throughputscreening (HTS) project for selective GPR35 antagonists
The bioactivity data for the corresponding gene target record (http://www.ncbi.nlm.nih.gov/gene/2859) can be accessed via the link ‘BioAssay by Target (Summary)’. With the Gene Target Bioactivity Data Tool, a similar summary of relevant bioassay activity results is displayed, as shown in Figure 2B. Note that when a gene identifier is used, additional data are retrieved in this case, including RNAi test results (as indicated by the filter ‘RNAi’ shown under ‘Substance Types’), which indicate that GPR35 functions as a cellular gene repressing the HPV18 LCR, as identified by a genome-wide siRNA screen. This example illustrates the power of aggregating bioactivity data across datasets onto a unified display. The Gene Target Bioactivity Data Tool is particularly useful for accessing datasets from multiple depositors and literature-based data from many journal articles. Moreover, it links simultaneously to findings in chemical biology research and RNAi screenings, enabling users to evaluate the biological role of a gene and to identify its small-molecule regulators using data shown on the same display.
BioAssays associated with the same assay project
PubChem tracks the relationships among bioassay records as indicated by submitters. PubChem has also developed several computational methods for identifying additional bioassay linkages based on target sequence similarity, common active compounds and biological pathways, as well as datasets abstracted from the same publication (1). To better support decision making, PubChem now clusters and links bioassays based on assay projects. This feature makes use of data deposited by a network such as the NIH MLP and the Tox21 program. MLP-funded screening laboratories are required to deposit data progressively into PubChem as an assay project continues. It usually takes months or years to finish an assay project aimed at developing a chemical probe; hence, multiple bioassay datasets are often submitted to PubChem for the same project but under distinct accessions (AIDs). These datasets are closely related, often covering a primary HTS result, follow-ups with dose-response and toxicity testing, or counter screenings against biologically related targets, in different cell lines or using different assay methods. PubChem allows submitters to specify such relationships via the cross-reference (XRef) data field. However, it is up to the submitters to provide all links as new data are made available. As a result, cross-references to related bioassay datasets may unfortunately be lacking or incomplete among many datasets, making it difficult for users to discover these key associations.
Figure 2. Bioactivity data for (A) a protein target and (B) a gene target.
To improve this situation, it is now common practice to create a ‘Summary’ bioassay at the outset of a multi-assay project and then link each subsequent related assay back to that summary record. The submitter thus only needs to specify a single link from each bioassay record to the same summary, and all other links between related assays are generated automatically. As a result, assay projects are indexed on top of the individual records. Users visiting any bioassay record can access all relevant datasets of the same project without the need for the submitter to specify all connections. As shown in Figure 3, the links to these related bioassays are labeled in the BioAssay Summary service as ‘Same Project’ under the ‘Related BioAssays’ section. The Modulation of the Metabotropic Glutamate Receptor mGluR3 (GRM3) assay (http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=651839) indicates only one ‘Depositor Specified’ assay, whereas eight bioassay records were identified as related to the same project by the new procedure. One may see details of the related bioassays by clicking the link ‘Same Project’.
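The idea of deriving all project links from a single cross-reference per record can be sketched as follows. This is only an illustration of the grouping principle, not PubChem's actual procedure; the function name, the input mapping and the AID values are hypothetical.

```python
from collections import defaultdict

def same_project_links(summary_of):
    """Derive 'Same Project' neighbors from one summary link per record.

    summary_of maps each AID to the summary AID it cross-references;
    a summary record maps to itself.
    """
    members = defaultdict(set)
    for aid, summary_aid in summary_of.items():
        # The summary record itself belongs to the project it anchors.
        members[summary_aid].update((aid, summary_aid))
    links = {}
    for project in members.values():
        for aid in project:
            links[aid] = sorted(project - {aid})
    return links

# Hypothetical accessions: two projects, each anchored by one summary record.
summary_of = {1000: 1000, 1001: 1000, 1002: 1000, 2001: 2000}
links = same_project_links(summary_of)
print(links[1001])  # every other record in the same project
```

Each submitter supplies one link, yet every record ends up connected to all others in its project, which is the property the summary-record convention relies on.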
PUBLIC ACCESS
BioAssay record and BioAssay summary service
A PubChem BioAssay record can be accessed via the BioAssay Summary service at http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=myAID, where myAID is a valid BioAssay accession (AID). As shown in Figure 3 for the GRM3 assay (AID 651839), the BioAssay Summary service provides (i) full access to submitted information, including bioassay protocol descriptions, assay data and cross-references; (ii) derived bioassay relationships and (iii) tools for evaluating tested compounds, studying SAR or researching the target. In the ‘Target’ section, a link ‘More Bioactivity data’ has recently been added to gather all bioactivity data in PubChem associated with the GRM3 target. With the improved database infrastructure, the BioAssay Summary service now provides instant access to the bioassay data table and an enhanced data download function. With the recently launched PubChem Social Media outreach, links to social media accounts are now provided on this page.
Figure 3. BioAssay Summary page for bioassay record AID 651839. New and enhanced features are highlighted, including fast download, instant access to the data table, a link to additional bioactivity data targeting GRM3, a link to related bioassays on the same project and links to social media accounts.
BioAssay search
Keyword search in the PubChem BioAssay database is supported by NCBI Entrez at http://www.ncbi.nlm.nih.gov/pcassay. Textual information in PubChem BioAssay is indexed under numerous fields. An advanced interface is provided at http://www.ncbi.nlm.nih.gov/pcassay/limits (Limits page) to access multiple indices and filters (1). Based on information provided in categorized comment fields and keywords in the title of a bioassay record, new filters were added to support the identification of records containing (i) biochemical assay, (ii) cell-based assay, (iii) protein–protein interaction bioactivity and (iv) in vivo or in vitro assay. A newly added menu, ‘Assay Project’, can be used to select an assay project and access related datasets. ChEMBL depositor information is also indexed to support sub-setting of ChEMBL records. As a result, whereas http://www.ncbi.nlm.nih.gov/pcassay/?term=ChEMBL[sourcename] retrieves all ChEMBL bioassays in PubChem, http://www.ncbi.nlm.nih.gov/pcassay/?term=%22ChEMBL%3A%3AScientific+Literature%22%5BSourceName%5D[SourceName] retrieves literature-based records from ChEMBL, and http://www.ncbi.nlm.nih.gov/pcassay/?term=%22ChEMBL%3A%3ASt+Jude+Malaria+Screening%22%5BSourceName%5D[SourceName] retrieves ChEMBL records deposited by St Jude Malaria Screening.
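Search URLs of the kind quoted above can be assembled programmatically. The sketch below uses only the Python standard library; the helper names are hypothetical, and the exact form of the base URL and query delimiter is an assumption reconstructed from the addresses given in the text.

```python
from urllib.parse import urlencode

# Assumed base URL of the Entrez BioAssay search page, as reconstructed from the text.
PCASSAY_SEARCH = "http://www.ncbi.nlm.nih.gov/pcassay/"

def source_name_term(source: str) -> str:
    """Quote a depositor name and restrict it to the SourceName index."""
    return f'"{source}"[SourceName]'

def search_url(term: str) -> str:
    """Return a search URL with the term percent-encoded in the query string."""
    query = urlencode({"term": term})
    return f"{PCASSAY_SEARCH}?{query}"

all_chembl = search_url("ChEMBL[sourcename]")
literature = search_url(source_name_term("ChEMBL::Scientific Literature"))
print(all_chembl)
print(literature)
```

Letting `urlencode` handle the quoting avoids hand-writing escapes such as `%22` and `%3A` for depositor names that contain quotes, colons or spaces.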
PubChem BioAssay FTP AND DOWNLOAD
PubChem provides multiple services for users to download bioassay records, which have been described previously (1). These primarily include (i) an enhanced download function at the Summary service (shown in Figure 3); (ii) a web-based BioAssay download service at http://pubchem.ncbi.nlm.nih.gov/assay/assaydownload.cgi, with a flexible interface supporting full or partial data download by specifying bioassay accessions (AIDs) and tested substance accessions (SIDs) and (iii) the daily updated PubChem BioAssay FTP at ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay, providing open access to all bioassay datasets. While the primary FTP structure remains the same, one new FTP directory, ‘Extras’, has been added to offer additional information on the BioAssay resource. In this folder, the file ‘Cid2BioactivityLink’ provides a list of tested compounds and the corresponding URLs linking to associated bioactivity data. Similarly, the ‘Gi2BioactivityLink’ and ‘Geneid2BioactivityLink’ files provide the lists of corresponding bioactivity data links for protein and gene targets, respectively. The ‘Aid2GiGeneid’ file contains all the bioassay (AID), protein target (GI) and gene target (Gene ID) associations in the BioAssay database. Also, a file of assay project-based related bioassays has been added to the directory at ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay/AssayNeighbors. Column headers for the comma-separated values (CSV) format have been modified to provide consistency among multiple download methods (ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay/CSV/README). Readout names are now provided in CSV files to ease data parsing and interpretation. In addition, the PubChem PUG SOAP (http://pubchem.ncbi.nlm.nih.gov/pug/pughelp.html) and PUG REST (http://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html) facilities are being developed to support programmatic retrieval of bioassay information.
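As a rough illustration of programmatic retrieval, the sketch below builds request URLs in the style of PUG REST. The base URL and path layout are assumptions taken from the PUG REST documentation as publicly described and may differ from the service as it stood when this work was written; the helper names are hypothetical, and no request is actually sent.

```python
# Assumed PUG REST base URL; not stated in this work.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

def assay_description_url(aid: int, fmt: str = "JSON") -> str:
    """URL for the description of one bioassay record, identified by its AID."""
    return f"{PUG_REST}/assay/aid/{int(aid)}/description/{fmt}"

def assay_ids_for_gene_url(gene_id: int, fmt: str = "JSON") -> str:
    """URL listing the AIDs whose target is the given NCBI Gene ID."""
    return f"{PUG_REST}/assay/target/geneid/{int(gene_id)}/aids/{fmt}"

description_url = assay_description_url(651839)   # the GRM3 assay used as an example in the text
gene_assays_url = assay_ids_for_gene_url(2859)    # the GPR35 gene record used in the text
print(description_url)
print(gene_assays_url)
```

The resulting addresses could be fetched with any HTTP client, for example `urllib.request.urlopen` from the standard library, subject to the usage policies of the service.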
PubChem UPLOAD FOR BioAssay SUBMISSION
As a public repository handling diverse and vast amounts of chemical structure and bioassay data, it is critical for PubChem to provide an efficient and user-friendly way to upload data. The recently released PubChem Upload (http://pubchem.ncbi.nlm.nih.gov/upload) makes use of advances in web technologies to offer streamlined support for data submissions and updates to the Substance and BioAssay databases. PubChem Upload supports all functionalities and data exchange formats of its predecessor (1). Furthermore, it provides an extensive set of wizards, inline help tips and tutorials for guiding submitters to enter assay data and descriptive information. More specifically, the new assay submission capabilities offered by PubChem Upload include (i) bioassay submission wizards to assist novice users for both small-molecule and RNAi screenings; (ii) improved user interface response to complex input with newer web technology; (iii) simplified new user registration and upgrades for production user accounts; (iv) improved help, including hints built into the user interface and a tutorial; (v) extensive PubChem bioassay templates for new submissions or for record updates; (vi) full editing and integration of assay data and description tables and (vii) expanded import/export handling of spreadsheets for assays. A detailed help document, tutorial and sample submission templates for PubChem Upload are available at http://pubchem.ncbi.nlm.nih.gov/upload/docs/upload_help.html, http://pubchem.ncbi.nlm.nih.gov/upload/tutorial and http://pubchem.ncbi.nlm.nih.gov/upload/docs/upload_help.html#AssaySubmission, respectively. A detailed description of PubChem Upload will be provided in a separate article.
SUMMARY
PubChem is committed to serving as a public repository for bioactivity data of small molecules and RNAi. PubChem also provides an integrated information platform with a suite of tools allowing users to query, analyze and download all database content. PubChem will continue to improve services and tools as technology advances and to further integrate the information it contains with third-party annotations and other public biomedical data. With the support of open access to the data and the delivery of the new Upload system, PubChem welcomes the community to use the resource and to contribute data content to the repository.
ACKNOWLEDGEMENTS
The authors thank all submitters who have contributed data to PubChem and the rest of the PubChem team for their support.
FUNDING
The NIH Intramural Research program. Funding for open access charge: National Institutes of Health, USA.
Conflict of interest statement. None declared.
REFERENCES
1. Wang,Y., Xiao,J., Suzek,T.O., Zhang,J., Wang,J., Zhou,Z., Han,L., Karapetyan,K., Dracheva,S., Shoemaker,B.A. et al. (2012) PubChem’s BioAssay database. Nucleic Acids Res., 40, D400–D412.
2. Wang,Y., Bolton,E., Dracheva,S., Karapetyan,K., Shoemaker,B.A., Suzek,T.O., Wang,J., Xiao,J., Zhang,J. and Bryant,S.H. (2010) An overview of the PubChem BioAssay resource. Nucleic Acids Res., 38, D255–D266.
3. Wang,Y., Xiao,J., Suzek,T.O., Zhang,J., Wang,J. and Bryant,S.H. (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res., 37, W623–W633.
4. Bolton,E.E., Wang,Y., Thiessen,P.A. and Bryant,S.H. (2008) PubChem: integrated platform of small molecules and biological activities. Annu. Rep. Comput. Chem., 4, 217–241.
5. Sayers,E.W., Barrett,T., Benson,D.A., Bolton,E., Bryant,S.H., Canese,K., Chetvernin,V., Church,D.M., DiCuccio,M., Federhen,S. et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 39, D38–D51.
6. Butkiewicz,M., Lowe,E.W. Jr, Mueller,R., Mendenhall,J.L., Teixeira,P.L., Weaver,C.D. and Meiler,J. (2013) Benchmarking ligand-based virtual high-throughput screening with the PubChem database. Molecules, 18, 735–756.
7. Sharman,J.L., Benson,H.E., Pawson,A.J., Lukito,V., Mpamhanga,C.P., Bombail,V., Davenport,A.P., Peters,J.A., Spedding,M. and Harmar,A.J. (2013) IUPHAR-DB: updated database content and new features. Nucleic Acids Res., 41, D1083–D1088.
8. Gaulton,A., Bellis,L.J., Bento,A.P., Chambers,J., Davies,M., Hersey,A., Light,Y., McGlinchey,S., Michalovich,D., Al-Lazikani,B. et al. (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res., 40, D1100–D1107.
9. Mulder,K.W., Wang,X., Escriu,C., Ito,Y., Schwarz,R.F., Gillis,J., Sirokmany,G., Donati,G., Uribe-Lewis,S., Pavlidis,P. et al. (2012) Diverse epigenetic strategies interact to control epidermal differentiation. Nat. Cell Biol., 14, 753–763.
10. Chih,B., Liu,P., Chinn,Y., Chalouni,C., Komuves,L.G., Hass,P.E., Sandoval,W. and Peterson,A.S. (2012) A ciliopathy complex at the transition zone protects the cilia as a privileged membrane domain. Nat. Cell Biol., 14, 61–72.
11. Prager-Khoutorsky,M., Lichtenstein,A., Krishnan,R., Rajendran,K., Mayo,A., Kam,Z., Geiger,B. and Bershadsky,A.D. (2011) Fibroblast polarization is a matrix-rigidity-dependent process controlled by focal adhesion mechanosensing. Nat. Cell Biol., 13, 1457–1465.
12. Imberg-Kazdan,K., Ha,S., Greenfield,A., Poultney,C.S., Bonneau,R., Logan,S.K. and Garabedian,M.J. (2013) A genome-wide RNA interference screen identifies new regulators of androgen receptor function in prostate cancer cells. Genome Res., 23, 581–591.
13. Powell,M.L., Smith,J.A., Sowa,M.E., Harper,J.W., Iftner,T., Stubenrauch,F. and Howley,P.M. (2010) NCoR1 mediates papillomavirus E8^E2C transcriptional repression. J. Virol., 84, 4451–4460.
14. Galluzzi,L., Morselli,E., Vitale,I., Kepp,O., Senovilla,L., Criollo,A., Servant,N., Paccard,C., Hupe,P., Robert,T. et al. (2010) miR-181a and miR-630 regulate cisplatin-induced cancer cell death. Cancer Res., 70, 1793–1803.
15. Smith,J.A., White,E.A., Sowa,M.E., Powell,M.L., Ottinger,M., Harper,J.W. and Howley,P.M. (2010) Genome-wide siRNA screen identifies SMCX, EP400, and Brd4 as E2-dependent regulators of human papillomavirus oncogene expression. Proc. Natl Acad. Sci. USA, 107, 3752–3757.
16. Zhang,S.L., Yeromin,A.V., Zhang,X.H., Yu,Y., Safrina,O., Penna,A., Roos,J., Stauderman,K.A. and Cahalan,M.D. (2006) Genome-wide RNAi screen of Ca(2+) influx identifies genes that regulate Ca(2+) release-activated Ca(2+) channel activity. Proc. Natl Acad. Sci. USA, 103, 9357–9362.
17. Friedman,A. and Perrimon,N. (2006) A functional RNAi screen for regulators of receptor tyrosine kinase and ERK signalling. Nature, 444, 230–234.
18. Gwack,Y., Sharma,S., Nardone,J., Tanasa,B., Iuga,A., Srikanth,S., Okamura,H., Bolton,D., Feske,S., Hogan,P.G. et al. (2006) A genome-wide Drosophila RNAi screen identifies DYRK-family kinases as regulators of NFAT. Nature, 441, 646–650.
19. Bard,F., Casano,L., Mallabiabarrena,A., Wallace,E., Saito,K., Kitayama,H., Guizzunti,G., Hu,Y., Wendler,F., Dasgupta,R. et al. (2006) Functional genomics reveals genes involved in protein secretion and Golgi organization. Nature, 439, 604–607.
20. Vig,M., Peinelt,C., Beck,A., Koomoa,D.L., Rabah,D., Koblan-Huberson,M., Kraft,S., Turner,H., Fleig,A., Penner,R. et al. (2006) CRACM1 is a plasma membrane protein essential for store-operated Ca2+ entry. Science, 312, 1220–1223.
21. DasGupta,R., Kaykas,A., Moon,R.T. and Perrimon,N. (2005) Functional genomic analysis of the Wnt-wingless signaling pathway. Science, 308, 826–833.
22. Nybakken,K., Vokes,S.A., Lin,T.Y., McMahon,A.P. and Perrimon,N. (2005) A genome-wide RNA interference screen in Drosophila melanogaster cells for new components of the Hh signaling pathway. Nat. Genet., 37, 1323–1332.
DATABASE INFRASTRUCTURE ENHANCEMENT
A robust and scalable database system is crucial to support the rapid growth of PubChem BioAssay. A set of relational databases and tables is designed and set up on Microsoft SQL servers to (i) accept bioassay submissions from depositors; (ii) archive bioassay updates with version control; (iii) track embargo status; (iv) record and derive links and relationships among bioassays and other biomedical information; (v) provide search indexes; (vi) support fast data retrieval and analysis and (vii) facilitate daily updates at the FTP site. Challenged by the accelerated growth of bioassay data content, great efforts have been invested in the past years to enhance the database infrastructure capacity through both hardware upgrades and revised database design. As a result, new services have been added to the PubChem resource. Furthermore, the performance of bioassay data retrieval and download services has been significantly improved, largely eliminating the need for a queuing system and minimizing user wait time.
DATA INTEGRATION AND NEW WEB SERVICES
The PubChem BioAssay database is fully integrated with other biomedical databases hosted by NCBI and provides a suite of web-based and programmatic tools to support data access, retrieval, analysis and download from PubChem or cross-linked databases (Table 1). Several new services for integrating bioassay target and bioactivity data, or for grouping bioassays based on an assay project, are described later. Other developments that focused on behind-the-scenes enhancement of data retrieval, without significant change to the web interface, are not summarized in this work.
Rapid access of bioactivity data for a protein orgene target
PubChem BioAssay closes the gap between molecular andchemical biology research by presenting and linking up in-formation of both chemical and RNAi tools in one systemsupporting the study of gene function and biologicalpathways The majority of small molecule screening datain PubChem are associated with protein targets whileRNAi screening data links each tested reagent to a genePubChem provides multiple mechanisms for cross-referencing protein and gene targets from bioactivitydata (1) As a result a protein or gene may link to manybioactivity datasets It is critical to provide rapid access tosuch multi-assay bioactivity data for these protein and genetargets Such a service provides a unique annotation serviceto the corresponding Entrez Protein or Gene record whichleads users to experimental data from chemical biology andRNAi research enhancing the discoverability of the NCBIEntrez system Toward this end two new services theProtein Target Bioactivity Data Tool and the Gene TargetBioactivity Data Tool were developed respectively toaccess associated bioactivity information in PubChemFrom a protein target record such as G-protein-
coupled receptor (GPCR) 35 (httpwwwncbinlmnihgovproteinNP_0052922) bioactivity data for this
protein target can be accessed by the link lsquoBioAssay byTarget (Summary)rsquo As shown in Figure 2A this ProteinTarget Bioactivity Data Tool draws and identifies eachtested substance together with its bioactivity resultsassay title and a link to detailed data such as dose-response curves The data table is sorted by bioactivityoutcome and potency of the substances by defaultshowing first active data and potent reagents Graphicalfilters are provided at the top of the page allowing one todrill down to a data subset of onersquos interest For examplethis GPCR protein has a lsquoProbersquo filter highlighting threechemical probes discovered by a high-throughputscreening (HTS) project for selective GPR35 antagonists
The bioactivity data for the relevant gene target record(httpwwwncbinlmnihgovgene2859) can be accessedby the link lsquoBioAssay by Target (Summary)rsquo With thisGene Target Bioactivity Data Tool a similar summaryof relevant bioassay activity results is displayed asshown in Figure 2B Note that using a gene identifier inthis case additional data are retrieved including RNAitest results (as indicated with the filter lsquoRNAirsquo shownunder lsquoSubstance Typesrsquo) which indicates that GPR35functions as a cellular gene repressing HPV18 LCR asidentified by a genome-wide siRNA screen This exampleillustrates the power of aggregating bioactivity data acrossdatasets onto a unified display The Gene TargetBioactivity Data Tool is particularly useful for accessingdatasets from multiple depositors and literature-baseddata from many journal articles Moreover it links simul-taneously to findings in chemical biology research andRNAi screenings enabling users to evaluate the biologicalrole of a gene and to identify its small molecular regula-tors using data shown on the same display
BioAssays associated with the same assay project
PubChem tracks the relationships among bioassay recordsas indicated by submitters PubChem has also developedseveral computational methods for identifying additionalbioassay linkages based on target sequence similaritycommon active compounds and biological pathways aswell as datasets abstracted from the same publication(1) To better support decision making PubChem nowclusters and links up bioassays based on assay projectsThis feature aims to use data deposited by a network suchas the NIH MLP and the Tox21 program MLP-fundedscreening laboratories are required to deposit data pro-gressively into PubChem as an assay project continuesIt usually takes months or years to finish an assayproject aimed at developing chemical probe hence oftenmultiple bioassay datasets are submitted to PubChem forthe same project but under distinct accessions (AIDs)These datasets are highly relevant often covering aprimary HTS result follow-ups with dose-response andtoxicity testing or counter screenings against biologicallyrelated targets different cell lines or using different assaymethods PubChem allows submitters to specify such re-lationships via the cross-reference (XRef) data field Onthe other hand it is up to the submitters to provide alllinks as new data are made available As a result cross-references to related bioassay datasets unfortunately may
4 Nucleic Acids Research 2013
at National Institutes of H
ealth Library on D
ecember 12 2013
httpnaroxfordjournalsorgD
ownloaded from
Figure 2 Bioactivity data for a (A) protein target and (B) gene target
Nucleic Acids Research 2013 5
at National Institutes of H
ealth Library on D
ecember 12 2013
httpnaroxfordjournalsorgD
ownloaded from
be lacking or incomplete among many datasets making itdifficult for users to discover these key associationsTo improve this situation it is now a common practice to
create a lsquoSummaryrsquo bioassay at the outset of a multi-assayproject and then link each subsequent-related assay back tothat summary record This means that the submitter onlyneeds to specify a single link for each bioassay record to thesame summary and all other links between related assaysare automatically generated As a result assay projects areindexed on top of the individual records Users visiting anybioassay record can access all relevant datasets of the sameproject without the need for the submitter to specify allconnections As shown in Figure 3 the links to theserelated bioassays are labeled in the BioAssay Summaryservice as lsquoSame Projectrsquo under the lsquoRelated BioAssaysrsquosection The Modulation of the Metabotropic GlutamateReceptor mGluR3 (GRM3) assay (httppubchemncbinlmnihgovassayassaycgiaid=651839) indicates onlyone lsquoDepositor Specifiedrsquo assay whereas eight bioassayrecords were identified as related to the same project bythe new procedure One may see details of the related bio-assays by clicking the link lsquoSame Projectrsquo
PUBLIC ACCESS
BioAssay record and BioAssay summary service
A PubChem BioAssay record can be accessed via theBioAssay Summary service at httppubchemncbinlmnihgovassayassaycgi where myAID is a validBioAssay accession (AID) As shown in Figure 3 for theGRM3 assay (AID 651839) the BioAssay Summaryservice provides (i) full access to submitted informationincluding bioassay protocol descriptions assay dataand cross-references (ii) derived bioassay relationshipsand (iii) tools for evaluating tested compounds studyingSAR or researching target For the lsquoTargetrsquo section alink lsquoMore Bioactivity datarsquo has been recently addedto gather all bioactivity data in PubChem associatedwith the GRM3 target The BioAssay Summary servicenow provides instant access to bioassay data table andenhanced function for data download with improveddatabase infrastructure With the recently launchedPubChem Social Media outreach links to social mediaaccounts are now provided on this page
Figure 3 BioAssay Summary page for bioassay record AID 651839 New and enhanced features are highlighted including fast download instantaccess to data table link to additional bioactivity data targeting GRM3 link to related bioassays on the same project and links to social mediaaccount
6 Nucleic Acids Research 2013
at National Institutes of H
ealth Library on D
ecember 12 2013
httpnaroxfordjournalsorgD
ownloaded from
BioAssay search
Keyword search in the PubChem BioAssay database issupported by NCBI Entrez at httpwwwncbinlmnihgovpcassay Textual information in PubChemBioAssay is indexed under numerous fields Anadvanced interface is provided at httpwwwncbinlmnihgovpcassaylimits (Limits page) to access multipleindices and filters (1) Based on information provided incategorized comment fields and keywords in the title of abioassay record new filters were added to support theidentification of records containing (i) biochemical assay(ii) cell-based assay (iii) proteinndashprotein interaction bio-activity and (iv) in vivo or in vitro assay A newly addedmenu lsquoAssay Projectrsquo can be used to select an assay projectand accessing related datasets ChEMBL depositor infor-mation is also indexed to support sub-setting ChEMBLrecords As a result although httpwwwncbinlmnihgovpcassayterm=ChEMBL[sourcename] retrieves allChEMBL bioassays in PubChem httpwwwncbinlmnihgovpcassayterm=22ChEMBL3A3AScientific+Literature225BSourceName5D[SourceName] re-trieves literature-based records from ChEMBL andhttpwwwncbinlmnihgovpcassayterm=22ChEMBL3A3ASt+Jude+Malaria+Screening225BSourceName5D[SourceName] retrieves ChEMBL records de-posited by St Jude Malaria Screening
PubChem BioAssay FTP AND DOWNLOAD
PubChem provides multiple services for users todownload bioassay records which have been describedpreviously (1) This primarily includes (i) an enhanceddownload function at the Summary service (shown inFigure 3) (ii) a web-based BioAssay download serviceat httppubchemncbinlmnihgovassayassaydownloadcgi with a flexible interface supporting full or partial datadownload by specifying bioassay accessions (AIDs) andtested substance accessions (SIDs) and (iii) daily updatedPubChem BioAssay FTP at ftpftpncbinlmnihgovpubchemBioassay providing open access to all bioassaydatasets While the primary FTP structure remains thesame one new FTP directory lsquoExtrasrsquo is added to offeradditional information of the BioAssay resource In thisfolder the file lsquoCid2BioactivityLinkrsquo provides a list oftested compounds and the corresponding URLs linkingto associated bioactivity data Similarly thelsquoGi2BioactivityLinkrsquo and lsquoGeneid2BioactivityLinkrsquo filesprovide the list of the corresponding bioactivity datalinks for protein and gene targets respectively ThelsquoAid2GiGeneidrsquo contains all the bioassay (AID) pro-tein target (GI) and gene target (Gene ID) associationsin the BioAssay database Also a file for assayproject-based related bioassays is added to the directoryat ftpftpncbinlmnihgovpubchemBioassayAssayNeighbors Column headers for the comma-separatedvalues (CSV) format has been modified to provide con-sistency among multiple download methods (ftpftpncbinlmnihgovpubchemBioassayCSVREADME)Readout names are now provided in CSV files to ease dataparsing and interpretation In addition PubChem PUG
SOAP (httppubchemncbinlmnihgovpugpughelphtml) and PUGREST (httppubchemncbinlmnihgovpug_restPUG_RESThtml) facilities are being de-veloped to support programmatic retrieval of bioassayinformation
PubChem UPLOAD FOR BioAssay SUBMISSION
As a public repository handling diverse and vast amountsof chemical structure and bioassay data it is critical forPubChem to provide an efficient and user-friendly way toupload data The recently released PubChem Upload(httppubchemncbinlmnihgovupload) makes use ofadvances in web technologies to offer streamlinedsupport for data submissions and updates to theSubstance and BioAssay databases PubChem Uploadsupports all functionalities and data exchange formats ofits predecessor (1) Furthermore it provides an extensiveset of wizards inline help tips and tutorials for guidingsubmitters to enter assay data and descriptive informa-tion More specifically the new assay submissioncapabilities offered by PubChem Upload include (i)bioassay submission wizards to assist novice users forboth small molecule and RNAi screenings (ii) improveduser interface response to complex input with newer webtechnology (iii) simplified new user registration upgradesfor production user accounts (iv) improved helpincluding hints built into user interface and tutorial (v)extensive PubChem bioassay templates for new submis-sions or for record updates (vi) full editing and integra-tion of assay data and description tables and (vii)expanded importexport handling of spreadsheets forassays A detailed help document tutorial and samplesubmission templates for PubChem Upload are availableat httppubchemncbinlmnihgovuploaddocsupload_helphtml httppubchemncbinlmnihgovuploadtutorial and httppubchemncbinlmnihgovuploaddocsupload_helphtmlAssaySubmission respectively Adetailed description of PubChem Upload will be providedin a separate article
SUMMARY
PubChem is committed to serve as a public repository forbioactivity data of small molecules and RNAi PubChemalso provides an integrated information platform with asuite of tools allowing users to query analyze anddownload all database content PubChem will continueto improve services and tools as technology advancesand to further integrate the information it contains tothird party annotations and other public biomedicaldata With the support of open access to the data andthe delivery of the new Upload system PubChemwelcomes the community to use the resource and to con-tribute data content to the repository
ACKNOWLEDGEMENTS
The authors thank all submitters who have contributeddata to PubChem and the rest of the PubChem team fortheir support
Nucleic Acids Research 2013 7
at National Institutes of H
ealth Library on D
ecember 12 2013
httpnaroxfordjournalsorgD
ownloaded from
FUNDING
The NIH Intramural Research program Funding foropen access charge National Insitutes of Health USA
Conflict of interest statement None declared
REFERENCES
1. Wang,Y., Xiao,J., Suzek,T.O., Zhang,J., Wang,J., Zhou,Z., Han,L., Karapetyan,K., Dracheva,S., Shoemaker,B.A. et al. (2012) PubChem's BioAssay database. Nucleic Acids Res., 40, D400–D412.
2. Wang,Y., Bolton,E., Dracheva,S., Karapetyan,K., Shoemaker,B.A., Suzek,T.O., Wang,J., Xiao,J., Zhang,J. and Bryant,S.H. (2010) An overview of the PubChem BioAssay resource. Nucleic Acids Res., 38, D255–D266.
3. Wang,Y., Xiao,J., Suzek,T.O., Zhang,J., Wang,J. and Bryant,S.H. (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res., 37, W623–W633.
4. Bolton,E.E., Wang,Y., Thiessen,P.A. and Bryant,S.H. (2008) PubChem: integrated platform of small molecules and biological activities. Annu. Rep. Comput. Chem., 4, 217–241.
5. Sayers,E.W., Barrett,T., Benson,D.A., Bolton,E., Bryant,S.H., Canese,K., Chetvernin,V., Church,D.M., DiCuccio,M., Federhen,S. et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 39, D38–D51.
6. Butkiewicz,M., Lowe,E.W. Jr, Mueller,R., Mendenhall,J.L., Teixeira,P.L., Weaver,C.D. and Meiler,J. (2013) Benchmarking ligand-based virtual high-throughput screening with the PubChem database. Molecules, 18, 735–756.
7. Sharman,J.L., Benson,H.E., Pawson,A.J., Lukito,V., Mpamhanga,C.P., Bombail,V., Davenport,A.P., Peters,J.A., Spedding,M. and Harmar,A.J. (2013) IUPHAR-DB: updated database content and new features. Nucleic Acids Res., 41, D1083–D1088.
8. Gaulton,A., Bellis,L.J., Bento,A.P., Chambers,J., Davies,M., Hersey,A., Light,Y., McGlinchey,S., Michalovich,D., Al-Lazikani,B. et al. (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res., 40, D1100–D1107.
9. Mulder,K.W., Wang,X., Escriu,C., Ito,Y., Schwarz,R.F., Gillis,J., Sirokmany,G., Donati,G., Uribe-Lewis,S., Pavlidis,P. et al. (2012) Diverse epigenetic strategies interact to control epidermal differentiation. Nat. Cell Biol., 14, 753–763.
10. Chih,B., Liu,P., Chinn,Y., Chalouni,C., Komuves,L.G., Hass,P.E., Sandoval,W. and Peterson,A.S. (2012) A ciliopathy complex at the transition zone protects the cilia as a privileged membrane domain. Nat. Cell Biol., 14, 61–72.
11. Prager-Khoutorsky,M., Lichtenstein,A., Krishnan,R., Rajendran,K., Mayo,A., Kam,Z., Geiger,B. and Bershadsky,A.D. (2011) Fibroblast polarization is a matrix-rigidity-dependent process controlled by focal adhesion mechanosensing. Nat. Cell Biol., 13, 1457–1465.
12. Imberg-Kazdan,K., Ha,S., Greenfield,A., Poultney,C.S., Bonneau,R., Logan,S.K. and Garabedian,M.J. (2013) A genome-wide RNA interference screen identifies new regulators of androgen receptor function in prostate cancer cells. Genome Res., 23, 581–591.
13. Powell,M.L., Smith,J.A., Sowa,M.E., Harper,J.W., Iftner,T., Stubenrauch,F. and Howley,P.M. (2010) NCoR1 mediates papillomavirus E8E2C transcriptional repression. J. Virol., 84, 4451–4460.
14. Galluzzi,L., Morselli,E., Vitale,I., Kepp,O., Senovilla,L., Criollo,A., Servant,N., Paccard,C., Hupe,P., Robert,T. et al. (2010) miR-181a and miR-630 regulate cisplatin-induced cancer cell death. Cancer Res., 70, 1793–1803.
15. Smith,J.A., White,E.A., Sowa,M.E., Powell,M.L., Ottinger,M., Harper,J.W. and Howley,P.M. (2010) Genome-wide siRNA screen identifies SMCX, EP400 and Brd4 as E2-dependent regulators of human papillomavirus oncogene expression. Proc. Natl Acad. Sci. USA, 107, 3752–3757.
16. Zhang,S.L., Yeromin,A.V., Zhang,X.H., Yu,Y., Safrina,O., Penna,A., Roos,J., Stauderman,K.A. and Cahalan,M.D. (2006) Genome-wide RNAi screen of Ca(2+) influx identifies genes that regulate Ca(2+) release-activated Ca(2+) channel activity. Proc. Natl Acad. Sci. USA, 103, 9357–9362.
17. Friedman,A. and Perrimon,N. (2006) A functional RNAi screen for regulators of receptor tyrosine kinase and ERK signalling. Nature, 444, 230–234.
18. Gwack,Y., Sharma,S., Nardone,J., Tanasa,B., Iuga,A., Srikanth,S., Okamura,H., Bolton,D., Feske,S., Hogan,P.G. et al. (2006) A genome-wide Drosophila RNAi screen identifies DYRK-family kinases as regulators of NFAT. Nature, 441, 646–650.
19. Bard,F., Casano,L., Mallabiabarrena,A., Wallace,E., Saito,K., Kitayama,H., Guizzunti,G., Hu,Y., Wendler,F., Dasgupta,R. et al. (2006) Functional genomics reveals genes involved in protein secretion and Golgi organization. Nature, 439, 604–607.
20. Vig,M., Peinelt,C., Beck,A., Koomoa,D.L., Rabah,D., Koblan-Huberson,M., Kraft,S., Turner,H., Fleig,A., Penner,R. et al. (2006) CRACM1 is a plasma membrane protein essential for store-operated Ca2+ entry. Science, 312, 1220–1223.
21. DasGupta,R., Kaykas,A., Moon,R.T. and Perrimon,N. (2005) Functional genomic analysis of the Wnt-wingless signaling pathway. Science, 308, 826–833.
22. Nybakken,K., Vokes,S.A., Lin,T.Y., McMahon,A.P. and Perrimon,N. (2005) A genome-wide RNA interference screen in Drosophila melanogaster cells for new components of the Hh signaling pathway. Nat. Genet., 37, 1323–1332.
Figure 2. Bioactivity data for a (A) protein target and (B) gene target.
be lacking or incomplete among many datasets, making it difficult for users to discover these key associations. To improve this situation, it is now common practice to create a 'Summary' bioassay at the outset of a multi-assay project and then link each subsequent related assay back to that summary record. This means that the submitter only needs to specify a single link from each bioassay record to the same summary, and all other links between related assays are generated automatically. As a result, assay projects are indexed on top of the individual records. Users visiting any bioassay record can access all relevant datasets of the same project without the need for the submitter to specify all connections. As shown in Figure 3, the links to these related bioassays are labeled in the BioAssay Summary service as 'Same Project' under the 'Related BioAssays' section. The Modulation of the Metabotropic Glutamate Receptor mGluR3 (GRM3) assay (http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=651839) indicates only one 'Depositor Specified' assay, whereas eight bioassay records were identified as related to the same project by the new procedure. One may see details of the related bioassays by clicking the link 'Same Project'.
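The way a single depositor-specified link per record can yield all 'Same Project' links may be easier to see in code. The following is only a minimal sketch of the idea, not PubChem's actual procedure: the function name, the input mapping and the AIDs are hypothetical, and it assumes each assay points to exactly one summary record.

```python
from collections import defaultdict


def same_project_links(summary_of):
    """Derive 'Same Project' links from one summary link per assay.

    summary_of maps an assay AID to the AID of the summary record it
    was linked to by the depositor (hypothetical input format).
    """
    # Group every assay, plus the summary record itself, by project.
    projects = defaultdict(set)
    for aid, summary_aid in summary_of.items():
        projects[summary_aid].update((aid, summary_aid))

    # Each member of a project is related to all other members.
    related = {}
    for members in projects.values():
        for aid in members:
            related[aid] = sorted(members - {aid})
    return related


# Hypothetical AIDs: 100 and 200 are summary records.
links = same_project_links({101: 100, 102: 100, 103: 100, 201: 200})
print(links[101])
```

Here assay 101 was linked only to summary 100 by its depositor, yet it ends up related to 102 and 103 as well, which mirrors the difference between the 'Depositor Specified' and 'Same Project' counts described above.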
PUBLIC ACCESS
BioAssay record and BioAssay summary service
A PubChem BioAssay record can be accessed via the BioAssay Summary service at http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi?aid=myAID, where 'myAID' is a valid BioAssay accession (AID). As shown in Figure 3 for the GRM3 assay (AID 651839), the BioAssay Summary service provides (i) full access to submitted information, including bioassay protocol descriptions, assay data and cross-references; (ii) derived bioassay relationships; and (iii) tools for evaluating tested compounds, studying SAR or researching the target. In the 'Target' section, a link 'More Bioactivity data' has recently been added to gather all bioactivity data in PubChem associated with the GRM3 target. With improved database infrastructure, the BioAssay Summary service now provides instant access to the bioassay data table and an enhanced function for data download. With the recently launched PubChem Social Media outreach, links to social media accounts are now provided on this page.
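For readers who script against the Summary service, a small helper can build and parse these record URLs. This is an illustrative sketch only: it assumes the `aid` query parameter seen in the GRM3 example URL, and the function names are hypothetical. It makes no network request.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Base address of the BioAssay Summary service as given in the text.
SUMMARY_BASE = "http://pubchem.ncbi.nlm.nih.gov/assay/assay.cgi"


def summary_url(aid):
    """Return the Summary service URL for a BioAssay accession (AID)."""
    aid = int(aid)
    if aid <= 0:
        raise ValueError("an AID must be a positive integer")
    return f"{SUMMARY_BASE}?{urlencode({'aid': aid})}"


def aid_from_url(url):
    """Extract the AID from a Summary service URL."""
    return int(parse_qs(urlparse(url).query)["aid"][0])


grm3_url = summary_url(651839)
print(grm3_url)
print(aid_from_url(grm3_url))
```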
Figure 3. BioAssay Summary page for bioassay record AID 651839. New and enhanced features are highlighted, including fast download, instant access to the data table, a link to additional bioactivity data targeting GRM3, a link to related bioassays on the same project and links to social media accounts.
BioAssay search
Keyword search in the PubChem BioAssay database is supported by NCBI Entrez at http://www.ncbi.nlm.nih.gov/pcassay. Textual information in PubChem BioAssay is indexed under numerous fields. An advanced interface is provided at http://www.ncbi.nlm.nih.gov/pcassay/limits (Limits page) to access multiple indices and filters (1). Based on information provided in categorized comment fields and keywords in the title of a bioassay record, new filters were added to support the identification of records containing (i) biochemical assay, (ii) cell-based assay, (iii) protein–protein interaction bioactivity and (iv) in vivo or in vitro assay. A newly added menu 'Assay Project' can be used to select an assay project and access related datasets. ChEMBL depositor information is also indexed to support sub-setting ChEMBL records. As a result, whereas http://www.ncbi.nlm.nih.gov/pcassay?term=ChEMBL[sourcename] retrieves all ChEMBL bioassays in PubChem, http://www.ncbi.nlm.nih.gov/pcassay?term=%22ChEMBL%3A%3AScientific+Literature%22%5BSourceName%5D[SourceName] retrieves literature-based records from ChEMBL, and http://www.ncbi.nlm.nih.gov/pcassay?term=%22ChEMBL%3A%3ASt+Jude+Malaria+Screening%22%5BSourceName%5D[SourceName] retrieves ChEMBL records deposited by St Jude Malaria Screening.
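The percent-encoded search links above can be hard to read. As a hedged illustration, the sketch below shows how a quoted source name restricted to the SourceName field is turned into such a query string; the helper name is hypothetical and only standard URL encoding is assumed. It builds the URL and does not contact Entrez.

```python
from urllib.parse import quote_plus

# Entrez entry point for the BioAssay database as given in the text.
ENTREZ_BASE = "http://www.ncbi.nlm.nih.gov/pcassay"


def source_query_url(source_name):
    """Build a search URL restricted to one depositor source name."""
    # Quote the phrase and restrict it to the SourceName index.
    term = f'"{source_name}"[SourceName]'
    # quote_plus encodes quotes, colons and brackets, and turns
    # spaces into '+' signs.
    return f"{ENTREZ_BASE}?term={quote_plus(term)}"


literature_url = source_query_url("ChEMBL::Scientific Literature")
print(literature_url)
```

Decoding works the same way in reverse: `%22` is a double quote, `%3A` a colon, and `%5B` and `%5D` are the square brackets around the field name.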
PubChem BioAssay FTP AND DOWNLOAD
PubChem provides multiple services for users to download bioassay records, which have been described previously (1). These primarily include (i) an enhanced download function at the Summary service (shown in Figure 3); (ii) a web-based BioAssay download service at http://pubchem.ncbi.nlm.nih.gov/assay/assaydownload.cgi, with a flexible interface supporting full or partial data download by specifying bioassay accessions (AIDs) and tested substance accessions (SIDs); and (iii) the daily updated PubChem BioAssay FTP at ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay, providing open access to all bioassay datasets. While the primary FTP structure remains the same, one new FTP directory, 'Extras', has been added to offer additional information on the BioAssay resource. In this folder, the file 'Cid2BioactivityLink' provides a list of tested compounds and the corresponding URLs linking to associated bioactivity data. Similarly, the 'Gi2BioactivityLink' and 'Geneid2BioactivityLink' files provide the lists of corresponding bioactivity data links for protein and gene targets, respectively. The 'Aid2GiGeneid' file contains all the bioassay (AID), protein target (GI) and gene target (Gene ID) associations in the BioAssay database. Also, a file for assay project-based related bioassays has been added to the directory at ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay/AssayNeighbors. Column headers for the comma-separated values (CSV) format have been modified to provide consistency among multiple download methods (ftp://ftp.ncbi.nlm.nih.gov/pubchem/Bioassay/CSV/README). Readout names are now provided in CSV files to ease data parsing and interpretation. In addition, PubChem PUG SOAP (http://pubchem.ncbi.nlm.nih.gov/pug/pughelp.html) and PUG/REST (http://pubchem.ncbi.nlm.nih.gov/pug_rest/PUG_REST.html) facilities are being developed to support programmatic retrieval of bioassay information.
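Once an association file such as 'Aid2GiGeneid' has been downloaded, it can be indexed by target for quick lookup. The sketch below is an assumption-laden illustration: the text does not specify the file layout, so a tab-separated table with the hypothetical header AID, GI, GeneID is assumed, and the sample rows are invented. Check the actual file before reusing it.

```python
import csv
import io
from collections import defaultdict

# Invented sample in an ASSUMED layout (tab-separated, with header).
SAMPLE = (
    "AID\tGI\tGeneID\n"
    "1001\t111\t999\n"
    "1002\t111\t999\n"
    "1003\t222\t\n"
)


def index_targets(handle):
    """Index assay accessions by protein (GI) and gene (Gene ID) target."""
    by_gi = defaultdict(set)
    by_gene = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        aid = int(row["AID"])
        # A target column may be empty when an assay has only one kind
        # of target annotation.
        if row["GI"]:
            by_gi[int(row["GI"])].add(aid)
        if row["GeneID"]:
            by_gene[int(row["GeneID"])].add(aid)
    return by_gi, by_gene


by_gi, by_gene = index_targets(io.StringIO(SAMPLE))
print(sorted(by_gi[111]), sorted(by_gene[999]))
```

For a real download, the `io.StringIO` wrapper would be replaced by an open file handle, and the column names adjusted to whatever the file actually uses.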
PubChem UPLOAD FOR BioAssay SUBMISSION
As a public repository handling diverse and vast amounts of chemical structure and bioassay data, it is critical for PubChem to provide an efficient and user-friendly way to upload data. The recently released PubChem Upload (http://pubchem.ncbi.nlm.nih.gov/upload) makes use of advances in web technologies to offer streamlined support for data submissions and updates to the Substance and BioAssay databases. PubChem Upload supports all functionalities and data exchange formats of its predecessor (1). Furthermore, it provides an extensive set of wizards, inline help tips and tutorials for guiding submitters to enter assay data and descriptive information. More specifically, the new assay submission capabilities offered by PubChem Upload include (i) bioassay submission wizards to assist novice users for both small molecule and RNAi screenings; (ii) improved user interface response to complex input with newer web technology; (iii) simplified new user registration and upgrades for production user accounts; (iv) improved help, including hints built into the user interface and a tutorial; (v) extensive PubChem bioassay templates for new submissions or for record updates; (vi) full editing and integration of assay data and description tables; and (vii) expanded import/export handling of spreadsheets for assays. A detailed help document, tutorial and sample submission templates for PubChem Upload are available at http://pubchem.ncbi.nlm.nih.gov/upload/docs/upload_help.html, http://pubchem.ncbi.nlm.nih.gov/upload/tutorial and http://pubchem.ncbi.nlm.nih.gov/upload/docs/upload_help.html#AssaySubmission, respectively. A detailed description of PubChem Upload will be provided in a separate article.
BioAssay search
Keyword search in the PubChem BioAssay database issupported by NCBI Entrez at httpwwwncbinlmnihgovpcassay Textual information in PubChemBioAssay is indexed under numerous fields Anadvanced interface is provided at httpwwwncbinlmnihgovpcassaylimits (Limits page) to access multipleindices and filters (1) Based on information provided incategorized comment fields and keywords in the title of abioassay record new filters were added to support theidentification of records containing (i) biochemical assay(ii) cell-based assay (iii) proteinndashprotein interaction bio-activity and (iv) in vivo or in vitro assay A newly addedmenu lsquoAssay Projectrsquo can be used to select an assay projectand accessing related datasets ChEMBL depositor infor-mation is also indexed to support sub-setting ChEMBLrecords As a result although httpwwwncbinlmnihgovpcassayterm=ChEMBL[sourcename] retrieves allChEMBL bioassays in PubChem httpwwwncbinlmnihgovpcassayterm=22ChEMBL3A3AScientific+Literature225BSourceName5D[SourceName] re-trieves literature-based records from ChEMBL andhttpwwwncbinlmnihgovpcassayterm=22ChEMBL3A3ASt+Jude+Malaria+Screening225BSourceName5D[SourceName] retrieves ChEMBL records de-posited by St Jude Malaria Screening
PubChem BioAssay FTP AND DOWNLOAD
PubChem provides multiple services for users todownload bioassay records which have been describedpreviously (1) This primarily includes (i) an enhanceddownload function at the Summary service (shown inFigure 3) (ii) a web-based BioAssay download serviceat httppubchemncbinlmnihgovassayassaydownloadcgi with a flexible interface supporting full or partial datadownload by specifying bioassay accessions (AIDs) andtested substance accessions (SIDs) and (iii) daily updatedPubChem BioAssay FTP at ftpftpncbinlmnihgovpubchemBioassay providing open access to all bioassaydatasets While the primary FTP structure remains thesame one new FTP directory lsquoExtrasrsquo is added to offeradditional information of the BioAssay resource In thisfolder the file lsquoCid2BioactivityLinkrsquo provides a list oftested compounds and the corresponding URLs linkingto associated bioactivity data Similarly thelsquoGi2BioactivityLinkrsquo and lsquoGeneid2BioactivityLinkrsquo filesprovide the list of the corresponding bioactivity datalinks for protein and gene targets respectively ThelsquoAid2GiGeneidrsquo contains all the bioassay (AID) pro-tein target (GI) and gene target (Gene ID) associationsin the BioAssay database Also a file for assayproject-based related bioassays is added to the directoryat ftpftpncbinlmnihgovpubchemBioassayAssayNeighbors Column headers for the comma-separatedvalues (CSV) format has been modified to provide con-sistency among multiple download methods (ftpftpncbinlmnihgovpubchemBioassayCSVREADME)Readout names are now provided in CSV files to ease dataparsing and interpretation In addition PubChem PUG
SOAP (httppubchemncbinlmnihgovpugpughelphtml) and PUGREST (httppubchemncbinlmnihgovpug_restPUG_RESThtml) facilities are being de-veloped to support programmatic retrieval of bioassayinformation
PubChem UPLOAD FOR BioAssay SUBMISSION
As a public repository handling diverse and vast amountsof chemical structure and bioassay data it is critical forPubChem to provide an efficient and user-friendly way toupload data The recently released PubChem Upload(httppubchemncbinlmnihgovupload) makes use ofadvances in web technologies to offer streamlinedsupport for data submissions and updates to theSubstance and BioAssay databases PubChem Uploadsupports all functionalities and data exchange formats ofits predecessor (1) Furthermore it provides an extensiveset of wizards inline help tips and tutorials for guidingsubmitters to enter assay data and descriptive informa-tion More specifically the new assay submissioncapabilities offered by PubChem Upload include (i)bioassay submission wizards to assist novice users forboth small molecule and RNAi screenings (ii) improveduser interface response to complex input with newer webtechnology (iii) simplified new user registration upgradesfor production user accounts (iv) improved helpincluding hints built into user interface and tutorial (v)extensive PubChem bioassay templates for new submis-sions or for record updates (vi) full editing and integra-tion of assay data and description tables and (vii)expanded importexport handling of spreadsheets forassays A detailed help document tutorial and samplesubmission templates for PubChem Upload are availableat httppubchemncbinlmnihgovuploaddocsupload_helphtml httppubchemncbinlmnihgovuploadtutorial and httppubchemncbinlmnihgovuploaddocsupload_helphtmlAssaySubmission respectively Adetailed description of PubChem Upload will be providedin a separate article
SUMMARY
PubChem is committed to serving as a public repository for bioactivity data of small molecules and RNAi. PubChem also provides an integrated information platform with a suite of tools allowing users to query, analyze and download all database content. PubChem will continue to improve services and tools as technology advances, and to further integrate the information it contains with third-party annotations and other public biomedical data. With the support of open access to the data and the delivery of the new Upload system, PubChem welcomes the community to use the resource and to contribute data content to the repository.
ACKNOWLEDGEMENTS
The authors thank all submitters who have contributed data to PubChem and the rest of the PubChem team for their support.
Nucleic Acids Research 2013 7
FUNDING
The NIH Intramural Research Program. Funding for open access charge: National Institutes of Health, USA.
Conflict of interest statement. None declared.
REFERENCES
1. Wang,Y., Xiao,J., Suzek,T.O., Zhang,J., Wang,J., Zhou,Z., Han,L., Karapetyan,K., Dracheva,S., Shoemaker,B.A. et al. (2012) PubChem’s BioAssay database. Nucleic Acids Res., 40, D400–D412.
2. Wang,Y., Bolton,E., Dracheva,S., Karapetyan,K., Shoemaker,B.A., Suzek,T.O., Wang,J., Xiao,J., Zhang,J. and Bryant,S.H. (2010) An overview of the PubChem BioAssay resource. Nucleic Acids Res., 38, D255–D266.
3. Wang,Y., Xiao,J., Suzek,T.O., Zhang,J., Wang,J. and Bryant,S.H. (2009) PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic Acids Res., 37, W623–W633.
4. Bolton,E.E., Wang,Y., Thiessen,P.A. and Bryant,S.H. (2008) PubChem: integrated platform of small molecules and biological activities. Annu. Rep. Comput. Chem., 4, 217–241.
5. Sayers,E.W., Barrett,T., Benson,D.A., Bolton,E., Bryant,S.H., Canese,K., Chetvernin,V., Church,D.M., DiCuccio,M., Federhen,S. et al. (2011) Database resources of the National Center for Biotechnology Information. Nucleic Acids Res., 39, D38–D51.
6. Butkiewicz,M., Lowe,E.W. Jr, Mueller,R., Mendenhall,J.L., Teixeira,P.L., Weaver,C.D. and Meiler,J. (2013) Benchmarking ligand-based virtual high-throughput screening with the PubChem database. Molecules, 18, 735–756.
7. Sharman,J.L., Benson,H.E., Pawson,A.J., Lukito,V., Mpamhanga,C.P., Bombail,V., Davenport,A.P., Peters,J.A., Spedding,M. and Harmar,A.J. (2013) IUPHAR-DB: updated database content and new features. Nucleic Acids Res., 41, D1083–D1088.
8. Gaulton,A., Bellis,L.J., Bento,A.P., Chambers,J., Davies,M., Hersey,A., Light,Y., McGlinchey,S., Michalovich,D., Al-Lazikani,B. et al. (2012) ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic Acids Res., 40, D1100–D1107.
9. Mulder,K.W., Wang,X., Escriu,C., Ito,Y., Schwarz,R.F., Gillis,J., Sirokmany,G., Donati,G., Uribe-Lewis,S., Pavlidis,P. et al. (2012) Diverse epigenetic strategies interact to control epidermal differentiation. Nat. Cell Biol., 14, 753–763.
10. Chih,B., Liu,P., Chinn,Y., Chalouni,C., Komuves,L.G., Hass,P.E., Sandoval,W. and Peterson,A.S. (2012) A ciliopathy complex at the transition zone protects the cilia as a privileged membrane domain. Nat. Cell Biol., 14, 61–72.
11. Prager-Khoutorsky,M., Lichtenstein,A., Krishnan,R., Rajendran,K., Mayo,A., Kam,Z., Geiger,B. and Bershadsky,A.D. (2011) Fibroblast polarization is a matrix-rigidity-dependent process controlled by focal adhesion mechanosensing. Nat. Cell Biol., 13, 1457–1465.
12. Imberg-Kazdan,K., Ha,S., Greenfield,A., Poultney,C.S., Bonneau,R., Logan,S.K. and Garabedian,M.J. (2013) A genome-wide RNA interference screen identifies new regulators of androgen receptor function in prostate cancer cells. Genome Res., 23, 581–591.
13. Powell,M.L., Smith,J.A., Sowa,M.E., Harper,J.W., Iftner,T., Stubenrauch,F. and Howley,P.M. (2010) NCoR1 mediates papillomavirus E8E2C transcriptional repression. J. Virol., 84, 4451–4460.
14. Galluzzi,L., Morselli,E., Vitale,I., Kepp,O., Senovilla,L., Criollo,A., Servant,N., Paccard,C., Hupe,P., Robert,T. et al. (2010) miR-181a and miR-630 regulate cisplatin-induced cancer cell death. Cancer Res., 70, 1793–1803.
15. Smith,J.A., White,E.A., Sowa,M.E., Powell,M.L., Ottinger,M., Harper,J.W. and Howley,P.M. (2010) Genome-wide siRNA screen identifies SMCX, EP400 and Brd4 as E2-dependent regulators of human papillomavirus oncogene expression. Proc. Natl Acad. Sci. USA, 107, 3752–3757.
16. Zhang,S.L., Yeromin,A.V., Zhang,X.H., Yu,Y., Safrina,O., Penna,A., Roos,J., Stauderman,K.A. and Cahalan,M.D. (2006) Genome-wide RNAi screen of Ca(2+) influx identifies genes that regulate Ca(2+) release-activated Ca(2+) channel activity. Proc. Natl Acad. Sci. USA, 103, 9357–9362.
17. Friedman,A. and Perrimon,N. (2006) A functional RNAi screen for regulators of receptor tyrosine kinase and ERK signalling. Nature, 444, 230–234.
18. Gwack,Y., Sharma,S., Nardone,J., Tanasa,B., Iuga,A., Srikanth,S., Okamura,H., Bolton,D., Feske,S., Hogan,P.G. et al. (2006) A genome-wide Drosophila RNAi screen identifies DYRK-family kinases as regulators of NFAT. Nature, 441, 646–650.
19. Bard,F., Casano,L., Mallabiabarrena,A., Wallace,E., Saito,K., Kitayama,H., Guizzunti,G., Hu,Y., Wendler,F., Dasgupta,R. et al. (2006) Functional genomics reveals genes involved in protein secretion and Golgi organization. Nature, 439, 604–607.
20. Vig,M., Peinelt,C., Beck,A., Koomoa,D.L., Rabah,D., Koblan-Huberson,M., Kraft,S., Turner,H., Fleig,A., Penner,R. et al. (2006) CRACM1 is a plasma membrane protein essential for store-operated Ca2+ entry. Science, 312, 1220–1223.
21. DasGupta,R., Kaykas,A., Moon,R.T. and Perrimon,N. (2005) Functional genomic analysis of the Wnt-wingless signaling pathway. Science, 308, 826–833.
22. Nybakken,K., Vokes,S.A., Lin,T.Y., McMahon,A.P. and Perrimon,N. (2005) A genome-wide RNA interference screen in Drosophila melanogaster cells for new components of the Hh signaling pathway. Nat. Genet., 37, 1323–1332.