TCRD and Pharos 2021: mining the human proteome for disease biology

D1334–D1346 Nucleic Acids Research, 2021, Vol. 49, Database issue
Published online 6 November 2020
doi: 10.1093/nar/gkaa993

Timothy K. Sheils¹, Stephen L. Mathias², Keith J. Kelleher¹, Vishal B. Siramshetty¹, Dac-Trung Nguyen¹, Cristian G. Bologa², Lars Juhl Jensen³, Dušica Vidović⁴,⁵, Amar Koleti⁴, Stephan C. Schürer⁴,⁵,⁶, Anna Waller⁷, Jeremy J. Yang², Jayme Holmes², Giovanni Bocci², Noel Southall¹, Poorva Dharkar¹, Ewy Mathé¹, Anton Simeonov¹ and Tudor I. Oprea²,³,⁸,⁹,*

¹ National Center for Advancing Translational Science, 9800 Medical Center Drive, Rockville, MD 20850, USA
² Translational Informatics Division, Department of Internal Medicine, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
³ Novo Nordisk Foundation Center for Protein Research, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
⁴ Institute for Data Science and Computing, University of Miami, Coral Gables, FL 33146, USA
⁵ Department of Molecular and Cellular Pharmacology, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
⁶ Sylvester Comprehensive Cancer Center, Miller School of Medicine, University of Miami, Miami, FL 33136, USA
⁷ UNM Center for Molecular Discovery, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
⁸ UNM Comprehensive Cancer Center, University of New Mexico Health Sciences Center, Albuquerque, NM 87131, USA
⁹ Department of Rheumatology and Inflammation Research, Institute of Medicine, Sahlgrenska Academy at University of Gothenburg, 40530 Gothenburg, Sweden

Received September 15, 2020; Revised October 09, 2020; Editorial Decision October 12, 2020; Accepted October 14, 2020

* To whom correspondence should be addressed. Tel: +1 505 925 7529; Fax: +1 505 925 7625; Email: toprea@salud.unm.edu

© The Author(s) 2020. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded from https://academic.oup.com/nar/article/49/D1/D1334/5958490 by guest on 20 July 2022

ABSTRACT

In 2014, the National Institutes of Health (NIH) initiated the Illuminating the Druggable Genome (IDG) program to identify and improve our understanding of poorly characterized proteins that can potentially be modulated using small molecules or biologics. Two resources produced from these efforts are the Target Central Resource Database (TCRD) (http://juniper.health.unm.edu/tcrd/) and Pharos (https://pharos.nih.gov/), a web interface to browse the TCRD. The ultimate goal of these resources is to highlight and facilitate research into currently understudied proteins by aggregating a multitude of data sources, ranking targets based on the amount of data available, and presenting data in a machine-learning-ready format. Since the 2017 release, both TCRD and Pharos have produced two major releases, which have incorporated or expanded an additional 25 data sources. Recently incorporated data types include human and viral-human protein–protein interactions, protein–disease and protein–phenotype associations, and drug-induced gene signatures, among others. These aggregated data have enabled us to generate new visualizations and content sections in Pharos, in order to empower users to find new areas of study in the druggable genome.

INTRODUCTION

It is widely acknowledged that only a subset of the human genome that is considered ‘druggable’ is subject to scientific inquiry (1). Since 2014, the National Institutes of Health (NIH) initiative Illuminating the Druggable Genome (IDG) has been working toward the goal of shedding light on understudied protein targets that could potentially be modulated by small molecules or biologics. Pharos and the Target Central Resource Database (TCRD) are open-access resources developed as part of the IDG program, and jointly serve as the knowledge hub for over 20 000 human protein targets (2,3). First introduced and published in the 2017 NAR database issue (4), TCRD collates information from several gene/protein data sources, and Pharos serves as a web interface that presents this information to users. Since its initial launch, the Pharos paper has been cited more than 110 times (per Google Scholar), and the portal is accessed on average by ∼1600 new visitors monthly, with a total of ∼425 000 pageviews and >20 000 full TCRD database downloads (as of 3 September 2020). TCRD has been enhanced by the inclusion of new data types and data from emerging resources, which were prepared for machine-learning readiness. The scope of TCRD has also been expanded, moving past the initial focus on the druggable genome to aggregate data about the entire human proteome. The first published version of Pharos was based on TCRD version 3.0; the latest version uses TCRD version 6.7, which currently aggregates data from 78 data sources (see Supplementary Information). Furthermore, the search functionality on the Pharos web server has been upgraded to use a GraphQL (graph query language; https://graphql.org) API that facilitates faster data retrieval directly from TCRD.

In the current paper, we describe changes implemented for the 2021 version, such as new data sources and how data from these sources have been integrated into TCRD and presented in Pharos. The latest architecture of the database and both new and improved features implemented in the Pharos platform are described in the following sections of the paper.

Figure 1. Chart of the TDL changes between TCRD v3.0 and v6.7. The decrease of Tdark and subsequent increase of other development levels shows an overall increase in target illumination.

Figure 2. Browse targets page showing new numeric slider facets for Log PubMed Score range (A), searchable protein Family filter panel (B) and improved target card view (C). The Log PubMed Score filter also shows the definition section displayed.


Figure 3. Target details view for ACE2, showing the new table of contents section (A), Target overview section (B) and TDL section (C).

MATERIALS AND METHODS

The newly added data include mouse and rat proteins from UniProt (5), with their associated phenotype data extracted from the International Mouse Phenotyping Consortium (6) and the Rat Genome Database (7), respectively. The Disease Ontology (DO) (8) data were further extended, and newer ontologies such as the Rat Disease Ontology (9) and the Mammalian Phenotype Ontology (10) were added to facilitate comparison of target-disease and target-phenotype associations across multiple species. Similarly, additional data were included from GWAS (11) and OMIM (https://omim.org) resources. The ‘Disease details’ page reflects these changes, in addition to other improvements made in displaying the list of associated targets. Throughout this manuscript, the term ‘target’ refers to a ‘gene or protein of interest’, as attributes are sometimes related to genes (e.g. orthologs) and sometimes to proteins. However, TCRD is based on ‘reviewed’ (manually curated) human protein entries from UniProt.

Target expression data were primarily extracted from GTEx (12), the Human Protein Atlas (HPA) (13), UniProt and TISSUES (14). The GTEx dataset was further extended to include sex-specific expression values. In addition, cell line expression data were added from HPA and the Human Proteome Map (15), and expression data were collected for orthologous genes from 17 different species. Other expression datasets integrated into the latest version are the Cancer Cell Line Encyclopedia (16) and cell perturbation expression data from the Library of Integrated Network-Based Cellular Signatures (17). Furthermore, the target expression panel was visually upgraded with interactive anatomograms (18) (https://www.ebi.ac.uk/gxa) for both sexes, which further provide systematic mappings to the source tissues.

Protein–protein interaction (PPI) is another data type that was included in the latest version, by adding the most recent PPI data from STRING 11 (19). Given the COVID-19 pandemic, viral-human PPI data were added from P-HIPSTer (20) to help explore PPIs between viral pathogens and human proteins.

Figure 4. IDG generated resources for ACE2, consisting of small molecule reagents (A) and data (B). Targets with mouse cell lines, such as GPR68, also have information about that resource (C), as well as a mouse tissue expression viewer (D). Other data can include, as is the case for CACNA2D4, mouse phenotype (E) and cell line (F) data. Supplementary Table S2 contains a full breakdown of data types and fields collected.

In addition to the new data from the aforementioned sources, TCRD has been continuously updated when newer versions of source databases (e.g. ChEMBL (21), DrugCentral (22), JensenLab PubMed scores (23)) are released. As the amount of data available for each target changed with each release of TCRD, the target development levels (TDLs) for the respective targets were recalculated when the target criteria changed. This was done using automated scripts. Very briefly, the TDL takes one of four values: Tclin, Tchem, Tbio or Tdark. Tclin are protein drug targets via which approved drugs act (24–26), which currently includes 659 human proteins. Tchem are proteins that are not Tclin but are known to bind small molecules with high potency (currently N = 1607). Tbio includes proteins that have Gene Ontology (27) ‘leaf’ (lowest level) term annotations based on experimental evidence, or that meet two of the following three conditions: a fractional publication count (28) above 5; three or more Gene RIF (‘Reference Into Function’) annotations (https://www.ncbi.nlm.nih.gov/gene/about-generif); or 50 or more commercial antibodies, as counted in the Antibodypedia portal (29). The fourth category, Tdark, currently includes ∼31% of the human proteins that were manually curated at the primary sequence level in UniProt but do not meet any of the Tclin, Tchem or Tbio criteria. Figure 1 shows the TDL count changes between versions 3 and 6.7.
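The TDL criteria described above can be read as a simple decision cascade. The following is a minimal sketch of that reading, not the automated scripts used for TCRD: the class `TargetEvidence`, its field names and the function `classify_tdl` are hypothetical, and the Tclin and Tchem tests are reduced to boolean flags because the exact potency cutoffs are not given here.

```python
from dataclasses import dataclass


@dataclass
class TargetEvidence:
    """Hypothetical summary of the evidence available for one target."""
    has_approved_drug: bool = False          # an approved drug acts via this target
    has_potent_small_molecule: bool = False  # binds a small molecule with high potency
    has_experimental_go_leaf: bool = False   # GO 'leaf' term with experimental evidence
    fractional_pub_count: float = 0.0        # fractional publication count
    gene_rif_count: int = 0                  # number of Gene RIF annotations
    antibody_count: int = 0                  # commercial antibodies (Antibodypedia)


def classify_tdl(ev: TargetEvidence) -> str:
    """Return Tclin, Tchem, Tbio or Tdark following the criteria in the text."""
    if ev.has_approved_drug:
        return "Tclin"
    if ev.has_potent_small_molecule:
        return "Tchem"
    # Tbio: experimental GO leaf annotation, or two of three knowledge conditions.
    conditions_met = sum([
        ev.fractional_pub_count > 5,
        ev.gene_rif_count >= 3,
        ev.antibody_count >= 50,
    ])
    if ev.has_experimental_go_leaf or conditions_met >= 2:
        return "Tbio"
    return "Tdark"
```

For example, a target with a fractional publication count of 6 and three Gene RIFs would be labelled Tbio under this sketch, whereas a target that satisfies only the antibody condition would remain Tdark.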

For a more in-depth exploration of each additional TCRD dataset and its database changes, we direct the reader to the Supplementary Materials.

RESULTS

In the first 2 years following the initial release, the Pharos team frequently demoed the site with a focus on obtaining user feedback, which provided the user-centered design information needed as Pharos underwent a ground-up rewrite in late 2018. By focusing on user needs, such as improving the target details page hierarchy and navigation and adding more detailed explanations of the terms and ideas represented within Pharos, we were able to streamline target presentation and simplify many pages. Many tables were swapped with sortable list elements, akin to a shopping site, allowing more data to be shown than a table would allow and also allowing us to display dynamic data as desired. Pharos was also redesigned with a focus on mobile usability, adjusting styles and visualizations depending on the screen used. Pharos is also installable as a Progressive Web App, which allows users to install Pharos as a mobile application on their device, increasing the ease of access.

Figure 5. Target details view for ACE2, showing the improved Protein Data Bank data viewer (A) and predicted viral interactions (B).

To increase speed and responsiveness, we switched from a fully server-side rendered application to a hybrid client- and server-side application, allowing faster page rendering and data retrieval. We replaced the REST API with a GraphQL instance, which adds flexibility to data retrieval and potentially reduces the amount of data sent over the network. GraphQL benchmarking is summarized in Supplementary Table S1. Documentation for the GraphQL format, as well as an interactive sandbox with several sample queries, can be found at https://pharos.nih.gov/api.
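To illustrate why a GraphQL request can reduce the amount of data transferred, the sketch below builds (but does not send) a request that asks for only three fields of one target. The endpoint URL is a placeholder, and the query shape and field names (`target`, `sym`, `name`, `tdl`, `fam`) are assumptions made for illustration; the documented schema at https://pharos.nih.gov/api should be treated as authoritative.

```python
import json
import urllib.request

# Placeholder endpoint (assumption); replace with the endpoint from the Pharos API docs.
ASSUMED_ENDPOINT = "https://example.org/graphql"

# Hypothetical query: the field and argument names are illustrative assumptions.
QUERY = """
query TargetSummary($sym: String) {
  target(q: {sym: $sym}) {
    name
    tdl
    fam
  }
}
"""


def build_request(symbol: str, endpoint: str = ASSUMED_ENDPOINT) -> urllib.request.Request:
    """Build a standard GraphQL-over-HTTP POST request for one gene symbol."""
    payload = json.dumps({"query": QUERY, "variables": {"sym": symbol}}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_request("ACE2")
body = json.loads(request.data)  # the JSON document that would be sent
```

Passing `request` to `urllib.request.urlopen` would submit the query; the response would contain only the requested fields, which is the flexibility referred to above.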

Browse pages

Filter functionality has been expanded: users are able to examine the entire list of values for each filter, as well as search for text within the filter values. In the case of numeric values, such as novelty and PubMed score, a range slider allows users to refine results. All filters also have a ‘?’ help button that allows users to view a quick definition of the filter or to visit the original source if desired. These features are displayed in Figure 2. In addition, sub-lists are generated on entity (target/disease/ligand) pages that can be viewed in their respective list browsers. It is therefore possible to browse the list of diseases or ligands associated with a specific target, or the targets associated with a disease or ligand. For associated diseases, additional numeric filters such as association score and interaction scores are available, as well as ligand affinity measurements for associated ligand lists.
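The combination of categorical facets and numeric range sliders can be sketched as follows. This is an illustration of the general filtering pattern only, not the Pharos implementation; the records, symbols and scores are made up, and the helper names `facet_counts` and `filter_targets` are hypothetical.

```python
import math

# Made-up records for illustration; symbols and scores are not real data.
TARGETS = [
    {"sym": "AAA1", "family": "Ion Channel", "pubmed_score": 0.8},
    {"sym": "BBB2", "family": "Kinase", "pubmed_score": 250.0},
    {"sym": "CCC3", "family": "Ion Channel", "pubmed_score": 1200.0},
    {"sym": "DDD4", "family": "GPCR", "pubmed_score": 12.5},
]


def facet_counts(targets, field):
    """Count how many targets fall under each value of a categorical facet."""
    counts = {}
    for target in targets:
        counts[target[field]] = counts.get(target[field], 0) + 1
    return counts


def filter_targets(targets, families=None, log_score_range=None):
    """Keep targets in the selected families whose log10 score lies in the range.

    Scores are assumed to be strictly positive so that log10 is defined.
    """
    kept = []
    for target in targets:
        if families is not None and target["family"] not in families:
            continue
        if log_score_range is not None:
            low, high = log_score_range
            if not low <= math.log10(target["pubmed_score"]) <= high:
                continue
        kept.append(target)
    return kept


family_facet = facet_counts(TARGETS, "family")
selected = filter_targets(TARGETS, families={"Ion Channel"}, log_score_range=(-1.0, 1.0))
```

Here the slider bounds are expressed on the log scale, mirroring the Log PubMed Score facet shown in Figure 2.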

By registering, users are able to select targets of interest and save them as custom lists, which are added to the main filter panel and are always available. This allows users to further refine a filtered list of targets, as well as view how this new list may be further filtered, or examine the makeup of the list by the various filter categories.

Figure 6. Target details view for ACE2, showing the tissue expression section with highlighted tissues (A) and protein-to-protein interaction section (B). The frequency of updating for resources integrated in TCRD differs. Text-mined sources are updated more frequently, and changes in the scientific literature allow us to track certain associations sooner. Therefore, selecting one of these sources (as displayed) will show that ACE2 is expressed in the lungs.

Target details pages

In improving the target details pages, we relied on user interviews that we had conducted to determine the optimal layout and ordering of data on the page. In the newest version of Pharos, target details pages start off with synonyms, identifiers and a broad overview of the knowledge about the target, as shown in Figure 3. Next, the user is able to examine the TDL criteria in relation to the specific target. Drug and ligand data come next, followed by disease association data. Target expression and interaction data follow, with publication data, amino acid sequence information (including the ProtVista (30) graphical representation) and filters for related targets completing the page. Sections where no data are available, such as drug and ligand browsers for Tdark and Tbio targets, are automatically removed. The navigation panel is always visible on the left-hand side, allowing users to easily jump to the section of interest. Most data panels have been improved by adding tooltips to further explain properties and a help panel that contains additional information, such as definitions, explanatory articles or raw data. New panels, described in the following subsections, have been added to display further data.

Figure 7. Disease details page for Huntington’s disease, showing Disease Ontology description and hierarchy (A) and TIN-X plot showing novelty targets and their importance mapped with a log scale (B).

TDL descriptions. One addition to the previous Pharos UI is a description panel. While the various TDL criteria are well documented, it was not apparent to users how and which TDL criteria are met by a given target. This section displays which criteria have been met, as well as the level or score. A highlighted checkbox indicates which TDL criteria have been met. This provides valuable evidence that illustrates to the user how TCRD generated this ranking. For Tdark and Tbio, this is also a useful indicator of which criteria are still deficient and how close to the cutoff they are, potentially driving research in those areas. This panel is also featured in Figure 3.

IDG resources. A feature of the IDG program that has expanded since the initial publication is that of new data or reagents, which are generated by IDG Consortium members. Where data or reagents are available, a browseable section is displayed in Pharos, allowing users to navigate to the data set and even order them in the case of physical resources, as shown in Figure 4. For targets with mouse expression data, we have incorporated our anatomogram image (discussed below), using male/female and brain mouse images. We refer the reader to the Supplementary Materials for an in-depth discussion of IDG Consortium resources available in Pharos.

Improved ligand/drug section. Drug and ligand browsing is now done via a pageable list. Users can now page through the entire list of approved drugs or active ligands, or open the entire list in the browse view, allowing for filtering of these lists while specifying the number of targets each ligand is active on.

Disease associations. The disease associations panel has been greatly expanded, both in the number of featured resources, now incorporating DisGeNet (31) and eRAM (32), and in size and placement. The source of each disease association is displayed as a collapsible panel that bundles multiple sources for the same disease. Upon expanding this panel, users are able to examine the evidence used to make this association. This allows users to discover provenance, an essential attribute of data aggregated in TCRD. It is also possible to view the list of associated diseases as a browseable list, again allowing for filtering and more refined data analysis.

Figure 8. Ligand details page for acetazolamide, showing ligand description information as well as synonyms and identifiers (A). Target activity is shown with expanded activity for CA2 (B).

PDB visualizations. While PDB identifiers have always been available in Pharos, they were previously displayed as a list of linkouts with no additional information (33). In Pharos 3.0, the PDB section was expanded and redesigned to display three-dimensional (3D) structures for proteins, including bound ligands where available. This 3D visualization, employing NGL Viewer (34,35), is highly interactive, allowing users to drag, zoom, pan and spin the structure to thoroughly examine it. This interface also enables users to browse the PDB identifiers and structures associated with a given target. Clicking on a table row loads the respective PDB entry and displays the ligands shown in the structure. Other details include the reference source for the ID, as well as the method used to generate the structure. This feature is highlighted in Figure 5.

Predicted viral interactions. Predicted viral-human protein interactions from the P-HIPSTer atlas are displayed in a separate panel that shows a list of viruses and their associated taxonomic data, specifically for the targeted human protein. Each virus also lists the viral proteins that the human protein is predicted to interact with, in addition to a likelihood ratio, which measures the strength of the sequence- and structure-based prediction. These features are shown in Figure 5, using ACE2, the angiotensin-converting enzyme 2, as an example. ACE2 is the functional entry receptor for both severe acute respiratory syndrome (SARS) coronaviruses, SARS-CoV (36) and SARS-CoV-2 (37).

Figure 9. Target list described in the Target list search use case (A) and target list described in the Binding partner search use case (B). Both panels show selected filters, including the ability to filter target lists by IDG featured sub-lists.

Target expression data. The target expression data panel has also been expanded, as shown in Figure 6. We replaced the static human figure with an expanded interactive anatomogram (18). This visualization contains both male and female figures, allowing sex-specific tissue information (e.g. from GTEx) to be highlighted. Users can toggle the brain image separately for more fine-grained brain tissue expression visualization. All figures are zoomable and pannable to improve visibility. When possible, all source tissues have been mapped to standardized Uberon (38) identifiers, enabling us to merge repetitive tissues. Expression data can be viewed by source, and tissue values are searchable. Clicking on a tissue panel, as in the disease associations section, allows users to view detailed provenance of the data, showing both the source and relevant confidence values for the tissue expression. Gradient coloration allows us to display the tissue expression level, either as a qualitative value (low, medium, high) or on a numeric scale, depending on the data source, giving users a quick visualization of these data. We note that several sources overlap in data and content; however, they may also present unique data. Pharos displays the variety between sources, which can help users decide what sources to focus on. It also illustrates that the text-mining-based resources tend to be updated more often than frequently accessed scientific databases that aggregate experimental data. When available, cell lines and orthologs are displayed with expandable evidence panels.
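The merging of repetitive tissues via Uberon identifiers can be pictured with a small sketch. This is an assumed, simplified approach rather than the TCRD loading code: the label table is a tiny illustrative mapping (identifiers should be verified against Uberon itself), the expression records are made up, and `merge_by_uberon` is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative label-to-Uberon table (assumption); verify identifiers against Uberon.
LABEL_TO_UBERON = {
    "lung": "UBERON:0002048",
    "lungs": "UBERON:0002048",
    "liver": "UBERON:0002107",
}

# Made-up expression records; source names follow the text, values are not real data.
RECORDS = [
    {"source": "GTEx", "tissue": "Lung", "level": "high"},
    {"source": "TISSUES", "tissue": "lungs", "level": "medium"},
    {"source": "HPA", "tissue": "Liver", "level": "low"},
    {"source": "HPA", "tissue": "unmapped tissue", "level": "low"},
]


def merge_by_uberon(records, label_to_uberon):
    """Group records from different sources under one Uberon identifier.

    Records whose tissue label has no mapping are returned separately,
    so that nothing is silently dropped.
    """
    merged = defaultdict(list)
    unmapped = []
    for record in records:
        uberon_id = label_to_uberon.get(record["tissue"].strip().lower())
        if uberon_id is None:
            unmapped.append(record)
        else:
            merged[uberon_id].append((record["source"], record["level"]))
    return dict(merged), unmapped


merged, unmapped = merge_by_uberon(RECORDS, LABEL_TO_UBERON)
```

Keeping the source alongside each value preserves the per-source provenance that the expression panel exposes.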

Protein–protein interactions. The introduction of data from Bioplex (39), STRING and Reactome (40) has allowed us to display rich PPI data and allows users to examine how multiple targets may interact, as shown in Figure 6. This panel consists of a pageable list of targets known to interact with the current target. These targets display the illumination graph, which gives users a quick visual representation of the level and types of knowledge available for the target. The originating data source is also displayed, as well as the confidence of this interaction. This list of PPIs may be viewed either directly in STRING or in the browse page as a filterable list.

Disease details pages

Figure 10. List of targets associated with asthma, filtered by Data Source: Expression Atlas and filtered by Expression Atlas log2foldchange value (A). Also shown is the Disease Association Details column with additional association-specific details (B).

While Pharos remains target-centric, it is equally important to be able to examine diseases in relation to their associated targets. As such, Pharos has always included diseases as a browseable list, as well as a disease details page. Pharos 3.0 expands on the data collected and displayed about diseases. When available, a description from the Disease Ontology is shown. The hierarchy of each specific disease is also shown, allowing users to view parent or child diseases or syndromes, as well as the siblings for each disease. In addition, a breakdown of the targets associated with each disease is shown, which can also be opened in the target browse page, allowing users to filter the list of associated targets. The target-disease Novelty score is shown in the Target-Importance Novelty eXplorer (TIN-X) (41) panel. Similar to the target-based TIN-X view, this is a scatterplot of novel targets related to the disease of interest. Novelty estimates the scarcity of publications about a target, whereas Importance estimates the strength of the association between that target and a specific disease (42). The X-axis shows the Novelty of the target-disease relationship, while the Y-axis shows the Importance of that target to the disease, both on a log10 scale. This feature is displayed in Figure 7. When available, help definition buttons are also shown for each section of data.
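As a small illustration of the TIN-X axes described above, the sketch below converts Novelty and Importance scores into log10 plot coordinates. The scores are made up, the helper `to_log10_points` is hypothetical, and the sketch does not reproduce how TIN-X computes either score.

```python
import math

# Made-up (target, novelty, importance) triples; not real TIN-X scores.
SCORES = [("T1", 0.001, 250.0), ("T2", 0.5, 3.0), ("T3", 10.0, 0.02)]


def to_log10_points(scores):
    """Map (symbol, novelty, importance) to (symbol, x, y) on log10 axes.

    Non-positive scores are skipped because log10 is undefined for them.
    """
    points = []
    for symbol, novelty, importance in scores:
        if novelty <= 0 or importance <= 0:
            continue
        points.append((symbol, math.log10(novelty), math.log10(importance)))
    return points


points = to_log10_points(SCORES)
```

On log axes, scores spanning several orders of magnitude remain distinguishable in a single scatterplot, which is presumably why both axes are shown on a log10 scale.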

Ligand details pages

The ligand details page has been expanded to include a description of the ligand, where available. A 2D depiction of the ligand chemical structure is shown, together with synonyms and identifiers from PubChem (43), Guide to Pharmacology (44), ChEMBL and DrugCentral. Activity values are displayed in collapsible panels and sorted by target. By clicking on a target header, users are able to examine activity values for that ligand on the selected target. Activity type, value and mechanism of action are shown, in addition to the reference publication for each activity, allowing users to directly view the source. Figure 8 illustrates this page display.

Use cases

Target list search. A neuroscientist who studies ion channels is seeking understudied proteins for a student project. Filtering to targets listed by the IDG Consortium as targets of interest (https://druggablegenome.net/IDGProteinList), in the ion channel family, with a nervous system phenotype as described by the Mouse Phenotype Ontology from the Jackson Laboratory, yields 16 potential targets of interest. The refined list view is shown in Figure 9.

Binding partner search. A scientist is studying the function of CAMKK2 and hypothesizing on its role in behavior. From the CAMKK2 target details page, she clicks ‘Explore Interacting Targets’ to open the potential binding partners in the Target List. Selecting Tclin, she filters the list to 10 potential binding partners that have approved drugs, with new therapeutic effects and drug-related side effects that can assist in her research. She is also able to save this list of targets as a custom list, allowing her to easily revisit and analyze these targets. This workflow is also shown in Figure 9.

Disease-based search. A researcher studying asthma searches for this top-level disease term and enters the asthma disease details page. He wants to find proteins that are differentially expressed in asthma patients. Starting from the asthma disease details page, he clicks ‘Explore Associated Targets’ to see 470 potential targets. Filtering those with data from the Expression Atlas (18), using the ‘Expression Atlas’ value from the ‘Disease Data Source’ filter, yields 78 targets, which can then be sorted by log2foldChange using the drop-down sort feature, as shown above the target list in Figure 10. He finds several with high log2foldChange, meaning the expression level changes drastically between disease and non-disease conditions, and low p-value, meaning the results are statistically significant. Also of note, for lists of associated targets, an additional column has been added to the target card showing disease association details, such as the evidence associated with a given data source.
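The disease-based search above amounts to filtering associations by data source and ranking them by effect size. A minimal sketch of that logic follows; the gene symbols and values are made up, the 0.05 p-value cutoff is an assumed convention rather than a Pharos setting, and ranking by absolute log2foldChange (so that strong down-regulation also ranks highly) is a design choice of this sketch.

```python
# Made-up disease associations; symbols and values are not real data.
ASSOCIATIONS = [
    {"sym": "GENE_A", "source": "Expression Atlas", "log2foldchange": 3.2, "pvalue": 0.0004},
    {"sym": "GENE_B", "source": "DisGeNet", "log2foldchange": None, "pvalue": None},
    {"sym": "GENE_C", "source": "Expression Atlas", "log2foldchange": -2.7, "pvalue": 0.03},
    {"sym": "GENE_D", "source": "Expression Atlas", "log2foldchange": 0.4, "pvalue": 0.6},
]


def rank_differential(associations, source, max_pvalue=0.05):
    """Keep significant associations from one source, largest change first."""
    hits = [
        a for a in associations
        if a["source"] == source
        and a["pvalue"] is not None
        and a["pvalue"] <= max_pvalue
    ]
    # Rank by magnitude so that both up- and down-regulated targets surface.
    return sorted(hits, key=lambda a: abs(a["log2foldchange"]), reverse=True)


ranked = rank_differential(ASSOCIATIONS, "Expression Atlas")
```

With the sample data, GENE_D is excluded for its high p-value and GENE_B for coming from a different source.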

Finding experimental tools. A researcher has found that a voltage-gated potassium channel, KCNS2, is expressed in the nucleus they are working on. Its role is unknown. The target details page for KCNS2 lists three approved drugs (e.g. amifampridine) which are known to block this channel. The IDG Resources show a cell line and a mouse genetic construct, which can be ordered from the IDG Consortium (UCSF in this case). The researcher is now able to design a set of in vitro and in vivo experiments to help elucidate the role of the channel.

Machine learning based on PPI networks to prioritize targets for a disease. A novel target ranking approach could be developed that relies on deep network representation learning of a PPI network annotated with disease-specific knowledge attributes derived from TCRD/Pharos. This involves mapping the enriched PPI network into a feature space using the neural network framework Gat2Vec (45). Gat2Vec employs a shallow neural network model to facilitate joint learning on the structural and attribute contexts of a given network. In this case, the PPIs provide the structural context and the disease-related knowledge provides the attribute contexts. The feature space generated can be used to develop machine learning models that can predict the ranking of protein targets in the context of a disease. Additional protocols for extracting available data for specific targets of interest, as well as the differences in knowledge availability between understudied and more studied targets, are discussed elsewhere (46).
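The two contexts described above can be illustrated with a small sketch. It is not the Gat2Vec implementation: it only shows, under simplifying assumptions, how random walks might be drawn separately from a PPI graph (structural context) and from a bipartite protein-attribute graph (attribute context) before being passed to a shallow embedding model. All function names, protein names and attributes are hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (node, node) pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def random_walk(adj, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = sorted(adj[walk[-1]])
        if not neighbors:
            break  # isolated node: the walk cannot continue
        walk.append(rng.choice(neighbors))
    return walk


def structural_walks(ppi_edges, length, rng):
    """One walk per protein over the PPI graph (structural context)."""
    adj = build_adjacency(ppi_edges)
    return [random_walk(adj, node, length, rng) for node in sorted(adj)]


def attribute_walks(protein_attributes, length, rng):
    """One walk per protein over the bipartite protein-attribute graph.

    Walks alternate protein -> attribute -> protein; attribute nodes are
    dropped afterwards so each walk lists only proteins sharing attributes.
    """
    edges = [
        (protein, ("attr", attribute))
        for protein, attributes in protein_attributes.items()
        for attribute in attributes
    ]
    adj = build_adjacency(edges)
    walks = []
    for protein in sorted(protein_attributes):
        full_walk = random_walk(adj, protein, 2 * length - 1, rng)
        walks.append(full_walk[::2])  # keep every other node: proteins only
    return walks
```

The resulting walk lists would then serve as training "sentences" for a skip-gram style model; that step is omitted here because it depends on the embedding library chosen.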

DISCUSSION

With the newly added data, interactive visualizations and enhanced search capabilities, TCRD and Pharos, in their current forms, serve as IDG resources that facilitate better exploration of the dark and understudied regions of the human genome. The central idea of the resource continues to be enriching knowledge around human targets and monitoring their therapeutic development levels. One area that received significant attention in the current update is the search mechanism, which was upgraded with the implementation of a GraphQL API. To improve user interaction with the new API, example queries have been made available on the website. The TDL descriptions panel and the target expression anatomograms are significantly improved components of the Pharos web interface. Further, most data that have been recently integrated into TCRD facilitate application of artificial intelligence/machine learning (AI/ML) techniques for target evaluation and drug repositioning (47), and are accessible in ML-ready format. Most of the developments described above are based on feedback obtained directly from website visitors, several demo sessions and presentations at scientific meetings.

Ongoing work focuses on utilizing the AI/ML-ready data, such as target-disease, target-phenotype and PPI data, to develop AI/ML models for prioritization of dark targets and better understanding of disease biology. Currently, we are evaluating ways to aggregate experimental data uncertainties, specifically data quality and reliability (48), similar to the in silico toxicology protocols (49). We continue to extend our efforts in enhancing the disease page to list associated targets, rank them by the consensus strength of association to a disease, and further facilitate filtering to narrow down to targets of potential interest to researchers. The database can be expected to be further enriched with new data and data types that contribute to the ultimate goal of illuminating the dark targets.

DATA AVAILABILITY

TCRD is an open source database that can be accessed at http://juniper.health.unm.edu/tcrd/.

Pharos is an open source web platform that can be accessed at https://pharos.nih.gov/.

The Pharos resources have been split into frontend and backend repositories. The front end code can be found on GitHub: https://github.com/ncats/pharos_frontend.

The backend GraphQL implementation code can be found on GitHub: https://github.com/ncats/pharos-graphql-server.

GraphQL resource documentation can be found on Pharos: https://pharos.nih.gov/api.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

Rajarshi Guha, for his immeasurable time spent not only working on and promoting Pharos 1.0, but also for his mentorship as we began the process of revamping Pharos. Oleg Ursu, for developing the first machine learning ready version of TCRD. Gorka Lasso, for supplying the P-HIPSTer virus–human protein interactions dataset.

FUNDING

National Institutes of Health (NIH) Common Fund [CA224370 to S.L.M., C.G.B., L.L.J., A.W., G.B., J.J.Y., T.I.O.; U24TR002278 to D.V., A.K., J.H., A.W., S.C.S. and T.I.O.]; Novo Nordisk Foundation [NNF14CC0001 to L.J.J.]; Intramural Research Program, Division of Preclinical Innovation, NIH NCATS (to D.T.N., K.K., T.S., N.S., P.D., E.W., A.S.). Funding for open access charge: NIH/CA224370.

Conflict of interest statement. L.J.J. is co-founder and scientific advisory board member of Intomics A/S. T.I.O. has received honoraria or consulted for Abbott, AstraZeneca, Chiron, Genentech, Infinity Pharmaceuticals, Merz Pharmaceuticals, Merck Darmstadt, Mitsubishi Tanabe, Novartis, Ono Pharmaceuticals, Pfizer, Roche, Sanofi and Wyeth. He is on the scientific advisory board of ChemDiv Inc. and InSilico Medicine.

REFERENCES

1. Edwards,A.M., Isserlin,R., Bader,G.D., Frye,S.V., Willson,T.M. and Yu,F.H. (2011) Too many roads not taken. Nature, 470, 163–165.

2. Nguyen,D.-T., Mathias,S., Bologa,C., Brunak,S., Fernandez,N., Gaulton,A., Hersey,A., Holmes,J., Jensen,L.J., Karlsson,A. et al. (2017) Pharos: Collating protein information to shed light on the druggable genome. Nucleic Acids Res., 45, D995–D1002.


3. Oprea,T.I., Bologa,C.G., Brunak,S., Campbell,A., Gan,G.N., Gaulton,A., Gomez,S.M., Guha,R., Hersey,A., Holmes,J. et al. (2018) Unexplored therapeutic opportunities in the human genome. Nat. Rev. Drug Discov., 17, 317–332.

4. Galperin,M.Y., Fernandez-Suarez,X.M. and Rigden,D.J. (2017) The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes. Nucleic Acids Res., 45, D1–D11.

5. Consortium,U.P. (2019) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res., 47, D506–D515.

6. Dickinson,M.E., Flenniken,A.M., Ji,X., Teboul,L., Wong,M.D., White,J.K., Meehan,T.F., Weninger,W.J., Westerberg,H., Adissu,H. et al. (2016) High-throughput discovery of novel developmental phenotypes. Nature, 537, 508–514.

7. Smith,J.R., Hayman,G.T., Wang,S.-J., Laulederkind,S.J.F., Hoffman,M.J., Kaldunski,M.L., Tutaj,M., Thota,J., Nalabolu,H.S., Ellanki,S.L.R. et al. (2020) The Year of the Rat: The Rat Genome Database at 20: a multi-species knowledgebase and analysis platform. Nucleic Acids Res., 48, D731–D742.

8. Schriml,L.M., Arze,C., Nadendla,S., Chang,Y.-W.W., Mazaitis,M., Felix,V., Feng,G. and Kibbe,W.A. (2012) Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res., 40, D940–D946.

9. Hayman,G.T., Laulederkind,S.J.F., Smith,J.R., Wang,S.-J., Petri,V., Nigam,R., Tutaj,M., De Pons,J., Dwinell,M.R. and Shimoyama,M. (2016) The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database. Database, 2016, baw034.

10. Smith,C.L. and Eppig,J.T. (2009) The mammalian phenotype ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip. Rev. Syst. Biol. Med., 1, 390–399.

11. Buniello,A., MacArthur,J.A.L., Cerezo,M., Harris,L.W., Hayhurst,J., Malangone,C., McMahon,A., Morales,J., Mountjoy,E., Sollis,E. et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47, D1005–D1012.

12. GTEx Consortium (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.

13. Thul,P.J. and Lindskog,C. (2018) The human protein atlas: a spatial map of the human proteome. Protein Sci., 27, 233–244.

14. Palasca,O., Santos,A., Stolte,C., Gorodkin,J. and Jensen,L.J. (2018) TISSUES 2.0: an integrative web resource on mammalian tissue expression. Database, 2018, bay003.

15. Kim,M.-S., Pinto,S.M., Getnet,D., Nirujogi,R.S., Manda,S.S., Chaerkady,R., Madugundu,A.K., Kelkar,D.S., Isserlin,R., Jain,S. et al. (2014) A draft map of the human proteome. Nature, 509, 575–581.

16. Barretina,J., Caponigro,G., Stransky,N., Venkatesan,K., Margolin,A.A., Kim,S., Wilson,C.J., Lehar,J., Kryukov,G.V., Sonkin,D. et al. (2012) The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483, 603–607.

17. Stathias,V., Turner,J., Koleti,A., Vidovic,D., Cooper,D., Fazel-Najafabadi,M., Pilarczyk,M., Terryn,R., Chung,C., Umeano,A. et al. (2020) LINCS Data Portal 2.0: next generation access point for perturbation-response signatures. Nucleic Acids Res., 48, D431–D439.

18. Papatheodorou,I., Moreno,P., Manning,J., Fuentes,A.M.-P., George,N., Fexova,S., Fonseca,N.A., Fullgrabe,A., Green,M., Huang,N. et al. (2020) Expression Atlas update: from tissues to single cells. Nucleic Acids Res., 48, D77–D83.

19. Szklarczyk,D., Gable,A.L., Lyon,D., Junge,A., Wyder,S., Huerta-Cepas,J., Simonovic,M., Doncheva,N.T., Morris,J.H., Bork,P. et al. (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res., 47, D607–D613.

20. Lasso,G., Mayer,S.V., Winkelmann,E.R., Chu,T., Elliot,O., Patino-Galindo,J.A., Park,K., Rabadan,R., Honig,B. and Shapira,S.D. (2019) A structure-informed atlas of human-virus interactions. Cell, 178, 1526–1541.

21. Gaulton,A., Hersey,A., Nowotka,M., Bento,A.P., Chambers,J., Mendez,D., Mutowo,P., Atkinson,F., Bellis,L.J., Cibrian-Uhalte,E. et al. (2017) The ChEMBL database in 2017. Nucleic Acids Res., 45, D945–D954.

22. Ursu,O., Holmes,J., Bologa,C.G., Yang,J.J., Mathias,S.L., Stathias,V., Nguyen,D.-T., Schurer,S. and Oprea,T. (2019) DrugCentral 2018: an update. Nucleic Acids Res., 47, D963–D970.

23. Pletscher-Frankild,S., Palleja,A., Tsafou,K., Binder,J.X. and Jensen,L.J. (2015) DISEASES: text mining and data integration of disease-gene associations. Methods, 74, 83–89.

24. Santos,R., Ursu,O., Gaulton,A., Bento,A.P., Donadi,R.S., Bologa,C.G., Karlsson,A., Al-Lazikani,B., Hersey,A., Oprea,T.I. et al. (2017) A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov., 16, 19–34.

25. Ursu,O., Glick,M. and Oprea,T. (2019) Novel drug targets in 2018. Nat. Rev. Drug Discov., 18, 328.

26. Avram,S., Halip,L., Curpan,R. and Oprea,T.I. (2020) Novel drug targets in 2019. Nat. Rev. Drug Discov., 19, 300.

27. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.

28. Pafilis,E., Frankild,S.P., Fanini,L., Faulwetter,S., Pavloudi,C., Vasileiadou,A., Arvanitidis,C. and Jensen,L.J. (2013) The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text. PLoS One, 8, e65390.

29. Bjorling,E. and Uhlen,M. (2008) Antibodypedia, a portal for sharing antibody and antigen validation data. Mol. Cell. Proteomics, 7, 2028–2037.

30. Watkins,X., Garcia,L.J., Pundir,S., Martin,M.J. and Consortium,U.P. (2017) ProtVista: visualization of protein sequence annotations. Bioinformatics, 33, 2040–2041.

31. Pinero,J., Ramirez-Anguita,J.M., Sauch-Pitarch,J., Ronzano,F., Centeno,E., Sanz,F. and Furlong,L.I. (2020) The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res., 48, D845–D855.

32. Jia,J., An,Z., Ming,Y., Guo,Y., Li,W., Liang,Y., Guo,D., Li,X., Tai,J., Chen,G. et al. (2018) eRAM: encyclopedia of rare disease annotations for precision medicine. Nucleic Acids Res., 46, D937–D943.

33. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

34. Rose,A.S., Bradley,A.R., Valasatava,Y., Duarte,J.M., Prlic,A. and Rose,P.W. (2018) NGL viewer: web-based molecular graphics for large complexes. Bioinformatics, 34, 3755–3758.

35. Rose,A.S. and Hildebrand,P.W. (2015) NGL Viewer: a web application for molecular visualization. Nucleic Acids Res., 43, W576–W579.

36. Li,W., Moore,M.J., Vasilieva,N., Sui,J., Wong,S.K., Berne,M.A., Somasundaran,M., Sullivan,J.L., Luzuriaga,K., Greenough,T.C. et al. (2003) Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature, 426, 450–454.

37. Hoffmann,M., Kleine-Weber,H., Schroeder,S., Kruger,N., Herrler,T., Erichsen,S., Schiergens,T.S., Herrler,G., Wu,N.-H., Nitsche,A. et al. (2020) SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell, 181, 271–280.

38. Mungall,C.J., Torniai,C., Gkoutos,G.V., Lewis,S.E. and Haendel,M.A. (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biol., 13, R5.

39. Huttlin,E.L., Bruckner,R.J., Navarrete-Perea,J., Cannon,J.R., Baltier,K., Gebreab,F., Gygi,M.P., Thornock,A., Zarraga,G., Tam,S. et al. (2020) Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. bioRxiv doi: https://doi.org/10.1101/2020.01.19.905109, 19 January 2020, preprint: not peer reviewed.

40. Jassal,B., Matthews,L., Viteri,G., Gong,C., Lorente,P., Fabregat,A., Sidiropoulos,K., Cook,J., Gillespie,M., Haw,R. et al. (2020) The reactome pathway knowledgebase. Nucleic Acids Res., 48, D498–D503.

41. Cannon,D.C., Yang,J.J., Mathias,S.L., Ursu,O., Mani,S., Waller,A., Schurer,S.C., Jensen,L.J., Sklar,L.A., Bologa,C.G. et al. (2017) TIN-X: target importance and novelty explorer. Bioinformatics, 33, 2601–2603.

42. Oprea,T.I. (2019) Exploring the dark genome: implications for precision medicine. Mamm. Genome, 30, 192–200.

43. Kim,S., Chen,J., Cheng,T., Gindulyte,A., He,J., He,S., Li,Q., Shoemaker,B.A., Thiessen,P.A., Yu,B. et al. (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res., 47, D1102–D1109.

44. Armstrong,J.F., Faccenda,E., Harding,S.D., Pawson,A.J., Southan,C., Sharman,J.L., Campo,B., Cavanagh,D.R., Alexander,S.P.H., Davenport,A.P. et al. (2020) The IUPHAR/BPS guide to Pharmacology in 2020: extending immunopharmacology content and introducing the IUPHAR/MMV guide to Malaria Pharmacology. Nucleic Acids Res., 48, D1006–D1021.

45. Sheikh,N., Kefato,Z. and Montresor,A. (2019) gat2vec: representation learning for attributed graphs. Computing, 101, 187–209.

46. Sheils,T., Mathias,S.L., Siramshetty,V.B., Bocci,G., Bologa,C.G., Yang,J.J., Waller,A., Southall,N., Nguyen,D.-T. and Oprea,T.I. (2020) How to illuminate the druggable genome using pharos. Curr. Protoc. Bioinformatics, 69, e92.

47. Levin,J.M., Oprea,T.I., Davidovich,S., Clozel,T., Overington,J.P., Vanhaelen,Q., Cantor,C.R., Bischof,E. and Zhavoronkov,A. (2020) Artificial intelligence, drug repurposing and peer review. Nat. Biotechnol., 38, 1127–1131.

48. Klimisch,H.J., Andreae,M. and Tillmann,U. (1997) A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul. Toxicol. Pharmacol., 25, 1–5.

49. Myatt,G.J., Ahlberg,E., Akahori,Y., Allen,D., Amberg,A., Anger,L.T., Aptula,A., Auerbach,S., Beilke,L., Bellion,P. et al. (2018) In silico toxicology protocols. Regul. Toxicol. Pharmacol., 96, 1–17.


Figure 1. Chart of the TDL changes between TCRD v3.0 and v6.7. The decrease of Tdark and subsequent increase of other development levels shows an overall increase in target illumination.

Figure 2. Browse targets page showing new numeric slider facets for Log PubMed Score range (A), searchable protein Family filter panel (B) and improved target card view (C). The Log PubMed Score filter also shows the definition section displayed.

2020) TCRD has been enhanced by inclusion of new data types and data from emerging resources, which were prepared for machine-learning readiness. The scope of TCRD has also been expanded, moving past the initial area of focus of the druggable genome to aggregate data about the entire human proteome. The first published version of Pharos was based on TCRD version 3.0; the latest version uses TCRD version 6.7, which currently aggregates data from 78 data sources (see Supplementary Information). Furthermore, the search functionality on the Pharos web server has been upgraded to use the graph querying language (GraphQL, https://graphql.org) API that facilitates faster data retrieval directly from TCRD.

In the current paper, we describe changes implemented for the 2021 version, such as new data sources and how data from these sources have been integrated into TCRD and presented in Pharos. The latest architecture of the database and both new and improved features implemented in the Pharos platform are described in the following sections of the paper.


Figure 3. Target details view for ACE2 showing the new table of contents section (A), Target overview section (B) and TDL section (C).

MATERIALS AND METHODS

The newly added data includes mouse and rat proteins from UniProt (5), with their associated phenotype data extracted from the International Mouse Phenotyping Consortium (6) and the Rat Genome Database (7), respectively. The Disease Ontology (DO) (8) data were further extended, and newer ontologies such as Rat Disease Ontology (9) and Mammalian Phenotype Ontology (10) were added to facilitate comparison of target-disease and target-phenotype associations across multiple species. Similarly, additional data was included from GWAS (11) and OMIM (https://omim.org) resources. The ‘Disease details’ page reflects these changes in addition to the other improvements made in displaying the list of associated targets. Throughout this manuscript, the term ‘target’ refers to ‘gene or protein of interest’, as sometimes attributes are related to genes (e.g. orthologs) and sometimes to proteins. However, TCRD is based on ‘reviewed’ (manually curated) human protein entries from UniProt.

Target expression data was primarily extracted from GTEx (12), the Human Protein Atlas (HPA) (13), UniProt and TISSUES (14). The GTEx dataset was further extended to include sex-specific expression values. In addition, cell line expression data was added from HPA and the Human Proteome Map (15), and expression data was collected for orthologous genes from 17 different species. Other expression datasets integrated into the latest version are the Cancer Cell Line Encyclopedia (16) and cell perturbation expression data from the Library of Integrated Network-Based Cellular Signatures (17). Furthermore, the target expression panel was visually upgraded with interactive anatomograms (18) (https://www.ebi.ac.uk/gxa) for both sexes, which further provide systematic mappings to the source tissues.

Protein–protein interaction (PPI) is another data type that was included in the latest version by adding the most recent PPI data from STRING 11 (19). Given the COVID-19 pandemic, viral-human PPI data were added from P-HIPSTer (20) to help explore PPIs between viral pathogens and human proteins.

Figure 4. IDG generated resources for ACE2, consisting of small molecule reagents (A) and data (B). Targets with mouse cell lines, such as GPR68, also have information about that resource (C), as well as a mouse tissue expression viewer (D). Other data can include, as is the case for CACNA2D4, mouse phenotype (E) and cell line (F) data. Supplementary Table S2 contains a full breakdown of data types and fields collected.

In addition to the new data from the aforementioned sources, TCRD has been continuously updated when newer versions of source databases (e.g. ChEMBL (21), DrugCentral (22), JensenLab PubMed scores (23)) are released. As the amount of data available for each target changed with each release of TCRD, the target development levels (TDLs) for respective targets were recalculated when the target criteria changed. This was done using automated scripts. Very briefly, the TDL is one of four potential values: Tclin, Tchem, Tbio or Tdark. Tclin are protein drug targets via which approved drugs act (24–26), which currently includes 659 human proteins. Tchem are proteins that are not Tclin, but are known to bind small molecules with high potency (currently N = 1607). Tbio includes proteins that have Gene Ontology (27) ‘leaf’ (lowest level) term annotations based on experimental evidence, or meet two of the following three conditions: a fractional publication count (28) above 5; three or more Gene RIF (‘Reference Into Function’) annotations (https://www.ncbi.nlm.nih.gov/gene/about-generif); or 50 or more commercial antibodies, as counted in the Antibodypedia portal (29). The fourth category, Tdark, currently includes ∼31% of the human proteins that were manually curated at the primary sequence level in UniProt, but do not meet any of the Tclin, Tchem or Tbio criteria. Figure 1 shows the TDL count changes between version 3 and 6.7.
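The TDL decision order described above can be written as a short sketch. This is not the TCRD script itself: the `TargetEvidence` record and its field names are hypothetical, and the drug and ligand criteria are reduced to booleans because the exact potency cutoffs are not given here. Only the Tbio thresholds (publication count above 5, three or more Gene RIFs, 50 or more antibodies) come from the text.

```python
from dataclasses import dataclass


@dataclass
class TargetEvidence:
    """Hypothetical per-target summary; field names are not TCRD columns."""
    has_approved_drug: bool = False         # an approved drug acts via this protein
    has_potent_ligand: bool = False         # binds a small molecule with high potency
    has_experimental_go_leaf: bool = False  # GO leaf term with experimental evidence
    fractional_pub_count: float = 0.0       # fractional publication count
    gene_rif_count: int = 0                 # number of Gene RIF annotations
    antibody_count: int = 0                 # antibodies counted in Antibodypedia


def classify_tdl(target: TargetEvidence) -> str:
    """Assign Tclin, Tchem, Tbio or Tdark, checking the levels in that order."""
    if target.has_approved_drug:
        return "Tclin"
    if target.has_potent_ligand:
        return "Tchem"
    conditions_met = sum([
        target.fractional_pub_count > 5,
        target.gene_rif_count >= 3,
        target.antibody_count >= 50,
    ])
    if target.has_experimental_go_leaf or conditions_met >= 2:
        return "Tbio"
    return "Tdark"
```

For example, a protein with a fractional publication count of 5.1 and three Gene RIFs meets two of the three conditions and would be labeled Tbio, whereas 50 antibodies alone would leave it in Tdark.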

For a further in-depth exploration of each additional TCRD dataset and its database changes, we direct the reader to Supplementary Materials.

RESULTS

In the first 2 years following the initial release, the Pharos team frequently demoed the site with a focus on obtaining user feedback, which provided the user-centered design information needed as Pharos underwent a ground-up rewrite in late 2018. By focusing on user needs, such as improving the target details page hierarchy and navigation, and adding more detailed explanations of the terms and ideas represented within Pharos, we were able to streamline target presentation and simplify many pages. Many tables were swapped with sortable list elements, akin to a shopping site, allowing more data to be shown than a table would allow, and also allowing us to display dynamic data as desired. Pharos was also redesigned with a focus on mobile usability, adjusting styles and visualizations depending on the screen used. Pharos is also installable as a Progressive Web App, which allows users to install Pharos as a mobile application on their device, increasing the ease of access for Pharos.

Figure 5. Target details view for ACE2 showing the improved Protein Data Bank data viewer (A) and predicted viral interactions (B).

To increase speed and responsiveness, we switched from a fully server-side rendered application to a hybrid client and server-side application, allowing faster page rendering and data retrieval. We replaced the REST API with a GraphQL instance, which adds flexibility to data retrieval and potentially reduces the amount of data being sent over the network. GraphQL benchmarking is summarized in Supplementary Table S1. Documentation for the GraphQL format, as well as an interactive sandbox with several sample queries, can be found at https://pharos.nih.gov/api.
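To illustrate how a GraphQL request asks only for the fields it needs, the following sketch builds and posts a query for a single target. The endpoint URL and the schema names used in the query (`target`, `sym`, `name`, `tdl`, `fam`) are assumptions made for illustration and should be checked against the documentation and sandbox at https://pharos.nih.gov/api before use; the helper function names are hypothetical.

```python
import json
import urllib.request

# Assumed endpoint; confirm the current URL in the Pharos API documentation.
PHAROS_GRAPHQL_URL = "https://pharos-api.ncats.io/graphql"

# Assumed schema: a `target` query that accepts a gene symbol and returns
# a handful of summary fields. Only the requested fields are returned.
TARGET_QUERY = """
query targetSummary($sym: String!) {
  target(q: {sym: $sym}) {
    name
    sym
    tdl
    fam
  }
}
"""


def build_payload(symbol: str) -> bytes:
    """Serialize the query and its variables as a JSON request body."""
    body = {"query": TARGET_QUERY, "variables": {"sym": symbol}}
    return json.dumps(body).encode("utf-8")


def fetch_target(symbol: str, url: str = PHAROS_GRAPHQL_URL) -> dict:
    """POST the query and return the decoded JSON response (needs network)."""
    request = urllib.request.Request(
        url,
        data=build_payload(symbol),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `fetch_target("ACE2")` would, if the assumed endpoint and schema are correct, return a JSON document containing only the four requested fields, which is the reduction in transferred data that the switch from REST is intended to allow.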

Browse pages

Filter functionality has been expanded, and users are able to examine the entire list of values for each filter, as well as search for text within the filter values. In the case of numeric values, such as novelty and PubMed score, a range slider allows users to refine results. All filters also have a ‘?’ help button that allows users to view a quick definition of the filter, or to visit the original source if desired. These features are displayed in Figure 2. In addition, sub-lists are generated on entity (target/disease/ligand) pages that can be viewed in their respective list browsers. It is therefore possible to browse the list of diseases or ligands associated with a specific target, or targets associated with a disease or ligand. For associated diseases, additional numeric filters such as association score and interaction scores are available, as well as ligand affinity measurements for associated ligand lists.

By registering, users are able to select targets of interest and save them as custom lists, which are added to the main filter panel and are always available. This allows users to further refine a filtered list of targets, as well as view how this new list may be further filtered, or examine the makeup of the list by the various filter categories.

Figure 6. Target details view for ACE2 showing the tissue expression section with highlighted tissues (A) and protein to protein interaction section (B). The frequency of updating for resources integrated in TCRD differs. Text-mined sources are updated more frequently, and changes in the scientific literature allow us to track certain associations sooner. Therefore, selecting one of these sources (as displayed) will show that ACE2 is expressed in the lungs.

Target details pages

In improving the target details pages, we relied on user interviews that we had conducted to determine the optimal layout and ordering of data on the page. In the newest version of Pharos, target detail pages start off with synonyms, identifiers and a broad overview of the knowledge about the target, as shown in Figure 3. Next, the user is able to examine the TDL criteria in relation to the specific target. Drug and ligand data comes next, followed by disease association data. Target expression and interaction data follows, with publication data, amino acid sequence information (including the ProtVista (30) graphical representation) and filters for related targets completing the page. Sections where no data are available, such as drug and ligand browsers for Tdark and Tbio targets, are automatically removed. The navigation panel is always visible on the left-hand side, allowing users to easily jump to the section of interest. Most data panels have been improved by adding tooltips to further explain properties, and a help panel that contains additional information, such as definitions, explanatory articles or raw data. Additional panels have been added to display additional data.

Figure 7. Disease details page for Huntington’s disease showing Disease Ontology description and hierarchy (A) and TIN-X plot showing novelty targets and their importance mapped with a log scale (B).

TDL descriptions. One addition to the previous Pharos UI is a description panel. While the various TDL criteria are well documented, it was not apparent to users how and which TDL criteria are met by a given target. This section displays which criteria have been met, as well as the level or score. A highlighted checkbox indicates which TDL criteria have been met. This provides valuable evidence that illustrates to the user how TCRD generated this ranking. For Tdark and Tbio, this is also a useful indicator of what criteria are still deficient, and how close to the cutoff they are, potentially driving research in those areas. This panel is also featured in Figure 3.

IDG resources. A feature of the IDG program that has expanded since the initial publication is that of new data or reagents, which are generated by IDG Consortium members. Where data or reagents are available, a browseable section is displayed in Pharos, allowing users to navigate to the data set, and even order them in the case of physical resources, as shown in Figure 4. For targets with mouse expression data, we have incorporated our anatomogram image (discussed below), using male/female and brain mouse images. We refer the reader to Supplementary Materials for an in-depth discussion of IDG Consortium resources available in Pharos.

Improved ligand/drug section. Drug and ligand browsing is now done via a pageable list. Users can now page through the entire list of approved drugs or active ligands, or open the entire list in the browse view, allowing for filtering of these lists while specifying the number of targets each ligand is active on.

Disease associations. The disease associations panel has been greatly expanded, both in the amount of featured resources, now incorporating DisGeNet (31) and eRAM (32), as well as in size and placement. The source of each disease association is displayed as a collapsible panel that bundles multiple sources for the same disease. Upon expanding this panel, users are able to examine the evidence used to make this association. This allows users to discover provenance, an essential attribute of data aggregated in TCRD.


Figure 8. Ligand details page for acetazolamide shows ligand description information, as well as synonyms and identifiers (A). Target activity is shown with expanded activity for CA2 (B).

It is also possible to view the list of associated diseases as a browseable list, again allowing for filtering and more refined data analysis.

PDB visualizations. While PDB identifiers have always been available in Pharos, they were previously displayed as a list of linkouts with no additional information (33). In Pharos 3.0, the PDB section was expanded and redesigned to display three-dimensional (3D) structures for proteins, including bound ligands where available. This 3D visualization, employing NGL Viewer (34,35), is highly interactive, allowing users to drag, zoom, pan and spin the structure to thoroughly examine it. This interface also enables users to browse the PDB identifiers and structures associated with a given target. Clicking on a table row loads the respective PDB entry and displays the ligands shown in the structure. Other details include the reference source for the id, as well as the method used to generate the structure. This feature is highlighted in Figure 5.

Predicted viral interactions. Predicted viral-human protein interactions from the P-HIPSTer atlas are displayed in a separate panel that shows a list of viruses and their associated taxonomic data, specifically for the targeted human protein. Each virus also lists the viral proteins that the human protein is predicted to interact with, in addition to a likelihood ratio, which measures the strength of the sequence- and structure-based prediction. These features are shown in Figure 5, using ACE2, the angiotensin converting enzyme 2, as an example. ACE2 is the functional entry receptor for both severe acute respiratory syndrome (SARS) coronaviruses, SARS-CoV (36) and SARS-CoV-2 (37).

Figure 9. Target list described in Target List Search use case (A) and target list described in Binding partner search use case (B). Both panels show selected filters, including the ability to filter target lists by IDG featured sub-lists.

Target expression data. The target expression data panel has also been expanded, as shown in Figure 6. We replaced the static human figure with an expanded interactive anatomogram (18). This visualization contains both male and female figures, allowing sex-specific tissue information (e.g. from GTEx) to be highlighted. Users can toggle the brain image separately for more fine-grained brain tissue expression visualization. All figures are zoomable and pannable to improve visibility. When possible, all source tissues have been mapped to standardized Uberon (38) identifiers, enabling us to merge repetitive tissues. Expression data can be viewed by source, and tissue values are searchable. Clicking on a tissue panel, as in the disease associations section, allows users to view detailed provenance of the data, showing both the source and relevant confidence values for the tissue expression. Gradient coloration allows us to display the tissue expression level, either as a qualitative value (low, medium, high) or on a numeric scale, depending on the data source, giving users a quick visualization of this data. We note that several sources overlap in data and content; however, they may also present unique data. Pharos displays the variety between sources, which can help users decide what sources to focus on. It also illustrates that the text-mining-based resources tend to be updated more often than frequently accessed scientific databases that aggregate experimental data. When available, cell lines and orthologs are displayed with expandable evidence panels.
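The merging of repetitive tissues through a shared ontology identifier can be sketched as follows. The label-to-identifier table, the record layout and the expression values are hypothetical and greatly simplified; they are not the TCRD mapping. Labels without a known identifier are kept under their original name rather than discarded.

```python
from collections import defaultdict

# Illustrative mapping from source-specific tissue labels to Uberon identifiers.
TISSUE_TO_UBERON = {
    "lung": "UBERON:0002048",
    "Lung": "UBERON:0002048",
    "liver": "UBERON:0002107",
}


def merge_by_uberon(records):
    """Group expression records by Uberon ID, falling back to the raw label."""
    merged = defaultdict(list)
    for record in records:
        key = TISSUE_TO_UBERON.get(record["tissue"], record["tissue"])
        merged[key].append(record)
    return dict(merged)


# Hypothetical records from sources that label the same tissue differently.
example_records = [
    {"source": "source_1", "tissue": "Lung", "value": "high"},
    {"source": "source_2", "tissue": "lung", "value": 12.5},
    {"source": "source_2", "tissue": "liver", "value": 3.1},
    {"source": "source_3", "tissue": "unmapped tissue", "value": "low"},
]

merged_records = merge_by_uberon(example_records)
```

Grouping this way keeps each source's own value (qualitative or numeric) alongside the shared identifier, so the provenance of every entry remains visible after the merge.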

Protein–protein interactions. The introduction of data from Bioplex (39), STRING and Reactome (40) has allowed us to display rich PPI data, and allows users to examine how multiple targets may interact, as shown in Figure 6. This panel consists of a pageable list of targets known to interact with the current target. These targets display the illumination graph, which gives users a quick visual representation of the level and types of knowledge available for the target. The originating data source is also displayed, as well as the confidence of this interaction. This list of PPIs may be viewed either directly in STRING, or viewed in the browse page as a filterable list.

Disease details pages

While Pharos remains target-centric, it is equally important to be able to examine diseases in relation to their associated targets. As such, Pharos has always included diseases as a browseable list, as well as a disease details page. Pharos 3.0 expands on the data collected and displayed about diseases. When available, a description from the Disease Ontology is shown. The hierarchy of each specific disease is also shown, allowing users to view parent or child diseases or syndromes, as well as the siblings for each disease. In addition, a breakdown of the targets associated with each disease is shown, which can also be opened in the target browse page, allowing users to filter the list of associated targets. The target-disease Novelty score is shown in the Target-Importance Novelty eXplorer (TIN-X) (41) panel. Similar to the target-based TIN-X view, this is a scatterplot of novel targets related to the disease of interest. Novelty estimates the scarcity of publications about a target, whereas Importance estimates the strength of the association between that target and a specific disease (42). The X-axis shows the Novelty of the target-disease relationship, while the Y-axis shows the Importance of that target to the disease, both on a log10 scale. This feature is displayed in Figure 7. When available, help definition buttons are also shown for each section of data.

Figure 10. List of targets associated with asthma, filtered by Data Source: Expression Atlas, and filtered by Expression Atlas log2foldchange value (A). Also shown is the Disease Association Details column with additional association-specific details (B).

Ligand details pages

The ligand details page has been expanded to include a description of the ligand, where available. A 2D depiction of the ligand chemical structure is shown, together with synonyms and identifiers from PubChem (43), Guide to Pharmacology (44), ChEMBL and DrugCentral. Activity values are displayed in collapsible panels and sorted by target. By clicking on a target header, users are able to examine activity values for that ligand on the selected target. Activity type, value and mechanism of action are shown, in addition to the reference publication for each activity, allowing users to directly view the source. Figure 8 illustrates this page display.

Use cases

Target list search. A neuroscientist who studies ion channels is seeking understudied proteins for a student project. Filtering to targets listed by the IDG Consortium as targets of interest (https://druggablegenome.net/IDGProteinList) in the ion channel family with a nervous system phenotype, as described by the Mouse Phenotype Ontology from the Jackson Laboratory, yields 16 potential targets of interest. The refined list view is shown in Figure 9.

Binding partner search. A scientist is studying the function of CAMKK2 and hypothesizing on its role in behavior. From the CAMKK2 target details page, she clicks ‘Explore Interacting Targets’ to open the potential binding partners in the Target List. Selecting Tclin, she filters the list to 10 potential binding partners that have approved drugs, with new therapeutic effects and drug-related side effects that can assist in her research. She is also able to save this list of targets as a custom list, allowing her to easily revisit and analyze these targets. This workflow is also shown in Figure 9.

Disease-based search. A researcher studying asthma searches for this top-level disease term and enters the asthma disease details page. He wants to find proteins that are differentially expressed in asthma patients. Starting from the Asthma disease details page, he clicks ‘Explore Associated Targets’ to see 470 potential targets. Filtering those with data from the Expression Atlas (18), using the ‘Expression Atlas’ value from the ‘Disease Data Source’ filter, yields 78 targets, which can then be sorted by log2foldChange using the drop-down sort feature, as shown above the target list in Figure 10. He finds several with high log2foldChange, meaning the expression level changes drastically between disease and non-disease conditions, and low p-value, meaning the results are statistically significant. Also of note, for lists of associated targets, an additional column has been added to the target card, showing disease association details such as the evidence associated with a given data source.
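The filter-then-sort step in the disease-based search can be mimicked offline on an exported association list. The sketch below assumes a simple record with a data source, a log2 fold change and a p-value. The field names, the 0.05 significance cutoff and the gene labels are illustrative assumptions, not values or identifiers taken from Pharos.

```python
from dataclasses import dataclass


@dataclass
class Association:
    symbol: str              # placeholder gene label, not a real result
    source: str              # disease data source of the association
    log2_fold_change: float  # disease versus non-disease expression change
    p_value: float


def rank_differentially_expressed(assocs, source="Expression Atlas", max_p=0.05):
    """Keep associations from one source that pass the p-value cutoff,
    then sort by the magnitude of the log2 fold change (largest first)."""
    kept = [a for a in assocs if a.source == source and a.p_value <= max_p]
    return sorted(kept, key=lambda a: abs(a.log2_fold_change), reverse=True)


# Made-up example data, for illustration only.
example = [
    Association("GENE_A", "Expression Atlas", 3.2, 0.001),
    Association("GENE_B", "Expression Atlas", -4.1, 0.01),
    Association("GENE_C", "Expression Atlas", 5.0, 0.2),   # fails the p-value cutoff
    Association("GENE_D", "Other source", 6.0, 0.001),     # different data source
]

ranked = rank_differentially_expressed(example)
for a in ranked:
    print(a.symbol, a.log2_fold_change, a.p_value)
```

Sorting on the absolute value treats strong down-regulation the same as strong up-regulation; sorting on the signed value instead would reproduce a plain descending sort.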

Finding experimental tools. A researcher has found that a voltage-gated potassium channel, KCNS2, is expressed in the nucleus they are working on. Its role is unknown. The target details page for KCNS2 lists three approved drugs (e.g. amifampridine), which are known to block this channel. The IDG Resources show a cell line and a mouse genetic construct, which can be ordered from the IDG Consortium (UCSF in this case). The researcher is now able to design a set of in vitro and in vivo experiments to help elucidate the role of the channel.

Machine learning based on PPI networks to prioritize targets for a disease. A novel target ranking approach could be developed that relies on deep network representation learning of a PPI network annotated with disease-specific knowledge attributes derived from TCRD/Pharos. This involves mapping the enriched PPI network into a feature space using the neural network framework Gat2Vec (45). Gat2Vec employs a shallow neural network model to facilitate joint learning on the structural and attribute contexts of a given network. In this case, the PPIs provide the structural context and the disease-related knowledge provides the attribute contexts. The feature space generated can be used to develop machine learning models that can predict the ranking of protein targets in the context of a disease. Additional protocols for extracting available data for specific targets of interest, as well as the differences in knowledge availability between understudied and more studied targets, are discussed elsewhere (46).

DISCUSSION

With the newly added data, interactive visualizations and enhanced search capabilities, TCRD and Pharos, in their current forms, serve as IDG resources that facilitate better exploration of the dark and understudied regions of the human genome. The central idea of the resource continues to be enriching knowledge around human targets and monitoring their therapeutic development levels. One area that received significant attention in the current update is the search mechanism, which was upgraded with the implementation of a GraphQL API. To improve user interaction with the new API, example queries have been made available on the website. The TDL descriptions panel and the target expression anatomograms are the significantly improved components of the Pharos web interface. Further, most data that have been recently integrated into TCRD facilitate application of artificial intelligence/machine learning (AI/ML) techniques for target evaluation and drug repositioning (47) and are accessible in ML-ready format. Most of the above-described developments are based on the feedback obtained directly from the website visitors, several demo sessions and presentations at scientific meetings.

Ongoing work focuses on utilizing the AI/ML-ready data, such as target-disease, target-phenotype and PPI data, to develop AI/ML models for prioritization of dark targets and better understanding of disease biology. Currently, we are evaluating ways to aggregate experimental data uncertainties, specifically data quality and reliability (48), similar to the in silico toxicology protocols (49). We continue to extend our efforts in enhancing the disease page to list associated targets, rank them by the consensus strength of association to a disease and further facilitate filtering to narrow down to targets of potential interest to researchers. The database can be expected to be further enriched with new data and data types that contribute to the ultimate goal of illuminating the dark targets.

DATA AVAILABILITY

TCRD is an open source database that can be accessed at: http://juniper.health.unm.edu/tcrd/.

Pharos is an open source web platform that can be accessed at: https://pharos.nih.gov/.

The Pharos resources have been split into frontend and backend repositories.

The front end code can be found on Github: https://github.com/ncats/pharos_frontend.

The backend GraphQL implementation code can be found on Github: https://github.com/ncats/pharos-graphql-server.

GraphQL resource documentation can be found on Pharos: https://pharos.nih.gov/api.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

Rajarshi Guha for his immeasurable time spent not only working on and promoting Pharos 1.0, but also for his mentorship as we began the process of revamping Pharos. Oleg Ursu for developing the first machine learning ready version of TCRD. Gorka Lasso for supplying the P-HIPSTer virus–human protein interactions dataset.

FUNDING

National Institutes of Health (NIH) Common Fund [CA224370 to S.L.M., C.G.B., L.L.J., A.W., G.B., J.J.Y., T.I.O.; U24TR002278 to D.V., A.K., J.H., A.W., S.C.S. and T.I.O.]; Novo Nordisk Foundation [NNF14CC0001 to L.J.J.]; Intramural Research Program, Division of Preclinical Innovation, NIH NCATS (to D.T.N., K.K., T.S., N.S., P.D., E.W., A.S.). Funding for open access charge: NIH [CA224370].

Conflict of interest statement. L.J.J. is co-founder and scientific advisory board member of Intomics A/S. T.I.O. has received honoraria or consulted for Abbott, AstraZeneca, Chiron, Genentech, Infinity Pharmaceuticals, Merz Pharmaceuticals, Merck Darmstadt, Mitsubishi Tanabe, Novartis, Ono Pharmaceuticals, Pfizer, Roche, Sanofi and Wyeth. He is on the scientific advisory board of ChemDiv Inc. and InSilico Medicine.

REFERENCES

1. Edwards,A.M., Isserlin,R., Bader,G.D., Frye,S.V., Willson,T.M. and Yu,F.H. (2011) Too many roads not taken. Nature, 470, 163–165.

2. Nguyen,D.-T., Mathias,S., Bologa,C., Brunak,S., Fernandez,N., Gaulton,A., Hersey,A., Holmes,J., Jensen,L.J., Karlsson,A. et al. (2017) Pharos: Collating protein information to shed light on the druggable genome. Nucleic Acids Res., 45, D995–D1002.


3. Oprea,T.I., Bologa,C.G., Brunak,S., Campbell,A., Gan,G.N., Gaulton,A., Gomez,S.M., Guha,R., Hersey,A., Holmes,J. et al. (2018) Unexplored therapeutic opportunities in the human genome. Nat. Rev. Drug Discov., 17, 317–332.

4. Galperin,M.Y., Fernandez-Suarez,X.M. and Rigden,D.J. (2017) The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes. Nucleic Acids Res., 45, D1–D11.

5. Consortium,U.P. (2019) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res., 47, D506–D515.

6. Dickinson,M.E., Flenniken,A.M., Ji,X., Teboul,L., Wong,M.D., White,J.K., Meehan,T.F., Weninger,W.J., Westerberg,H., Adissu,H. et al. (2016) High-throughput discovery of novel developmental phenotypes. Nature, 537, 508–514.

7. Smith,J.R., Hayman,G.T., Wang,S.-J., Laulederkind,S.J.F., Hoffman,M.J., Kaldunski,M.L., Tutaj,M., Thota,J., Nalabolu,H.S., Ellanki,S.L.R. et al. (2020) The Year of the Rat: The Rat Genome Database at 20: a multi-species knowledgebase and analysis platform. Nucleic Acids Res., 48, D731–D742.

8. Schriml,L.M., Arze,C., Nadendla,S., Chang,Y.-W.W., Mazaitis,M., Felix,V., Feng,G. and Kibbe,W.A. (2012) Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res., 40, D940–D946.

9. Hayman,G.T., Laulederkind,S.J.F., Smith,J.R., Wang,S.-J., Petri,V., Nigam,R., Tutaj,M., De Pons,J., Dwinell,M.R. and Shimoyama,M. (2016) The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database. Database, 2016, baw034.

10. Smith,C.L. and Eppig,J.T. (2009) The mammalian phenotype ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip. Rev. Syst. Biol. Med., 1, 390–399.

11. Buniello,A., MacArthur,J.A.L., Cerezo,M., Harris,L.W., Hayhurst,J., Malangone,C., McMahon,A., Morales,J., Mountjoy,E., Sollis,E. et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47, D1005–D1012.

12. Consortium,GTEx (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.

13. Thul,P.J. and Lindskog,C. (2018) The human protein atlas: a spatial map of the human proteome. Protein Sci., 27, 233–244.

14. Palasca,O., Santos,A., Stolte,C., Gorodkin,J. and Jensen,L.J. (2018) TISSUES 2.0: an integrative web resource on mammalian tissue expression. Database, 2018, bay003.

15. Kim,M.-S., Pinto,S.M., Getnet,D., Nirujogi,R.S., Manda,S.S., Chaerkady,R., Madugundu,A.K., Kelkar,D.S., Isserlin,R., Jain,S. et al. (2014) A draft map of the human proteome. Nature, 509, 575–581.

16. Barretina,J., Caponigro,G., Stransky,N., Venkatesan,K., Margolin,A.A., Kim,S., Wilson,C.J., Lehar,J., Kryukov,G.V., Sonkin,D. et al. (2012) The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483, 603–607.

17. Stathias,V., Turner,J., Koleti,A., Vidovic,D., Cooper,D., Fazel-Najafabadi,M., Pilarczyk,M., Terryn,R., Chung,C., Umeano,A. et al. (2020) LINCS Data Portal 2.0: next generation access point for perturbation-response signatures. Nucleic Acids Res., 48, D431–D439.

18. Papatheodorou,I., Moreno,P., Manning,J., Fuentes,A.M.-P., George,N., Fexova,S., Fonseca,N.A., Fullgrabe,A., Green,M., Huang,N. et al. (2020) Expression Atlas update: from tissues to single cells. Nucleic Acids Res., 48, D77–D83.

19. Szklarczyk,D., Gable,A.L., Lyon,D., Junge,A., Wyder,S., Huerta-Cepas,J., Simonovic,M., Doncheva,N.T., Morris,J.H., Bork,P. et al. (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res., 47, D607–D613.

20. Lasso,G., Mayer,S.V., Winkelmann,E.R., Chu,T., Elliot,O., Patino-Galindo,J.A., Park,K., Rabadan,R., Honig,B. and Shapira,S.D. (2019) A structure-informed atlas of human-virus interactions. Cell, 178, 1526–1541.

21. Gaulton,A., Hersey,A., Nowotka,M., Bento,A.P., Chambers,J., Mendez,D., Mutowo,P., Atkinson,F., Bellis,L.J., Cibrian-Uhalte,E. et al. (2017) The ChEMBL database in 2017. Nucleic Acids Res., 45, D945–D954.

22. Ursu,O., Holmes,J., Bologa,C.G., Yang,J.J., Mathias,S.L., Stathias,V., Nguyen,D.-T., Schurer,S. and Oprea,T. (2019) DrugCentral 2018: an update. Nucleic Acids Res., 47, D963–D970.

23. Pletscher-Frankild,S., Palleja,A., Tsafou,K., Binder,J.X. and Jensen,L.J. (2015) DISEASES: text mining and data integration of disease-gene associations. Methods, 74, 83–89.

24. Santos,R., Ursu,O., Gaulton,A., Bento,A.P., Donadi,R.S., Bologa,C.G., Karlsson,A., Al-Lazikani,B., Hersey,A., Oprea,T.I. et al. (2017) A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov., 16, 19–34.

25. Ursu,O., Glick,M. and Oprea,T. (2019) Novel drug targets in 2018. Nat. Rev. Drug Discov., 18, 328.

26. Avram,S., Halip,L., Curpan,R. and Oprea,T.I. (2020) Novel drug targets in 2019. Nat. Rev. Drug Discov., 19, 300.

27. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.

28. Pafilis,E., Frankild,S.P., Fanini,L., Faulwetter,S., Pavloudi,C., Vasileiadou,A., Arvanitidis,C. and Jensen,L.J. (2013) The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text. PLoS One, 8, e65390.

29. Bjorling,E. and Uhlen,M. (2008) Antibodypedia, a portal for sharing antibody and antigen validation data. Mol. Cell. Proteomics, 7, 2028–2037.

30. Watkins,X., Garcia,L.J., Pundir,S., Martin,M.J. and Consortium,U.P. (2017) ProtVista: visualization of protein sequence annotations. Bioinformatics, 33, 2040–2041.

31. Pinero,J., Ramírez-Anguita,J.M., Sauch-Pitarch,J., Ronzano,F., Centeno,E., Sanz,F. and Furlong,L.I. (2020) The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res., 48, D845–D855.

32. Jia,J., An,Z., Ming,Y., Guo,Y., Li,W., Liang,Y., Guo,D., Li,X., Tai,J., Chen,G. et al. (2018) eRAM: encyclopedia of rare disease annotations for precision medicine. Nucleic Acids Res., 46, D937–D943.

33. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

34. Rose,A.S., Bradley,A.R., Valasatava,Y., Duarte,J.M., Prlic,A. and Rose,P.W. (2018) NGL viewer: web-based molecular graphics for large complexes. Bioinformatics, 34, 3755–3758.

35. Rose,A.S. and Hildebrand,P.W. (2015) NGL Viewer: a web application for molecular visualization. Nucleic Acids Res., 43, W576–W579.

36. Li,W., Moore,M.J., Vasilieva,N., Sui,J., Wong,S.K., Berne,M.A., Somasundaran,M., Sullivan,J.L., Luzuriaga,K., Greenough,T.C. et al. (2003) Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature, 426, 450–454.

37. Hoffmann,M., Kleine-Weber,H., Schroeder,S., Kruger,N., Herrler,T., Erichsen,S., Schiergens,T.S., Herrler,G., Wu,N.-H., Nitsche,A. et al. (2020) SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell, 181, 271–280.

38. Mungall,C.J., Torniai,C., Gkoutos,G.V., Lewis,S.E. and Haendel,M.A. (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biol., 13, R5.

39. Huttlin,E.L., Bruckner,R.J., Navarrete-Perea,J., Cannon,J.R., Baltier,K., Gebreab,F., Gygi,M.P., Thornock,A., Zarraga,G., Tam,S. et al. (2020) Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. bioRxiv doi: https://doi.org/10.1101/2020.01.19.905109, 19 January 2020, preprint: not peer reviewed.

40. Jassal,B., Matthews,L., Viteri,G., Gong,C., Lorente,P., Fabregat,A., Sidiropoulos,K., Cook,J., Gillespie,M., Haw,R. et al. (2020) The reactome pathway knowledgebase. Nucleic Acids Res., 48, D498–D503.

41. Cannon,D.C., Yang,J.J., Mathias,S.L., Ursu,O., Mani,S., Waller,A., Schurer,S.C., Jensen,L.J., Sklar,L.A., Bologa,C.G. et al. (2017) TIN-X: target importance and novelty explorer. Bioinformatics, 33, 2601–2603.

42. Oprea,T.I. (2019) Exploring the dark genome: implications for precision medicine. Mamm. Genome, 30, 192–200.

43. Kim,S., Chen,J., Cheng,T., Gindulyte,A., He,J., He,S., Li,Q., Shoemaker,B.A., Thiessen,P.A., Yu,B. et al. (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res., 47, D1102–D1109.

44. Armstrong,J.F., Faccenda,E., Harding,S.D., Pawson,A.J., Southan,C., Sharman,J.L., Campo,B., Cavanagh,D.R., Alexander,S.P.H., Davenport,A.P. et al. (2020) The IUPHAR/BPS guide to Pharmacology in 2020: extending immunopharmacology content and introducing the IUPHAR/MMV guide to Malaria Pharmacology. Nucleic Acids Res., 48, D1006–D1021.

45. Sheikh,N., Kefato,Z. and Montresor,A. (2019) gat2vec: representation learning for attributed graphs. Computing, 101, 187–209.

46. Sheils,T., Mathias,S.L., Siramshetty,V.B., Bocci,G., Bologa,C.G., Yang,J.J., Waller,A., Southall,N., Nguyen,D.-T. and Oprea,T.I. (2020) How to illuminate the druggable genome using pharos. Curr. Protoc. Bioinformatics, 69, e92.

47. Levin,J.M., Oprea,T.I., Davidovich,S., Clozel,T., Overington,J.P., Vanhaelen,Q., Cantor,C.R., Bischof,E. and Zhavoronkov,A. (2020) Artificial intelligence, drug repurposing and peer review. Nat. Biotechnol., 38, 1127–1131.

48. Klimisch,H.J., Andreae,M. and Tillmann,U. (1997) A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul. Toxicol. Pharmacol., 25, 1–5.

49. Myatt,G.J., Ahlberg,E., Akahori,Y., Allen,D., Amberg,A., Anger,L.T., Aptula,A., Auerbach,S., Beilke,L., Bellion,P. et al. (2018) In silico toxicology protocols. Regul. Toxicol. Pharmacol., 96, 1–17.


Figure 3. Target details view for ACE2 showing the new table of contents section (A), Target overview section (B) and TDL section (C).

MATERIALS AND METHODS

The newly added data includes mouse and rat proteins from UniProt (5), with their associated phenotype data extracted from the International Mouse Phenotyping Consortium (6) and the Rat Genome Database (7), respectively. The Disease Ontology (DO) (8) data were further extended, and newer ontologies such as Rat Disease Ontology (9) and Mammalian Phenotype Ontology (10) were added to facilitate comparison of target-disease and target-phenotype associations across multiple species. Similarly, additional data was included from GWAS (11) and OMIM (https://omim.org/) resources. The ‘Disease details’ page reflects these changes, in addition to the other improvements made in displaying the list of associated targets. Throughout this manuscript, the term ‘target’ refers to ‘gene or protein of interest’, as sometimes attributes are related to genes (e.g. orthologs) and sometimes to proteins. However, TCRD is based on ‘reviewed’ (manually curated) human protein entries from UniProt.

Target expression data was primarily extracted from GTEx (12), the Human Protein Atlas (HPA) (13), UniProt and TISSUES (14). The GTEx dataset was further extended to include sex-specific expression values. In addition, cell line expression data was added from HPA and the Human Proteome Map (15), and expression data was collected for orthologous genes from 17 different species. Other expression datasets integrated into the latest version are the Cancer Cell Line Encyclopedia (16) and cell perturbation expression data from the Library of Integrated Network-Based Cellular Signatures (17). Furthermore, the target expression panel was visually upgraded with interactive anatomograms (18) (https://www.ebi.ac.uk/gxa) for both sexes, which further provides systematic mappings to the source tissues.

Protein–protein interaction (PPI) is another data type that was included in the latest version by adding the most recent PPI data from STRING 11 (19). Given the COVID-19 pandemic, viral-human PPI data were added from P-HIPSTer (20) to help explore PPIs between viral pathogens and human proteins.

Figure 4. IDG generated resources for ACE2, consisting of small molecule reagents (A) and data (B). Targets with mouse cell lines, such as GPR68, also have information about that resource (C), as well as a mouse tissue expression viewer (D). Other data can include, such as the case for CACNA2D4, mouse phenotype (E) and cell line (F) data. Supplementary Table S2 contains a full breakdown of data types and fields collected.

In addition to the new data from the aforementioned sources, TCRD has been continuously updated when newer versions of source databases (e.g. ChEMBL (21), DrugCentral (22), JensenLab PubMed scores (23)) are released. As the amount of data available for each target changed with each release of TCRD, the target development levels (TDLs) for respective targets were recalculated when the target criteria changed. This was done using automated scripts. Very briefly, the TDL is one of four potential values: Tclin, Tchem, Tbio or Tdark. Tclin are protein drug targets via which approved drugs act (24–26), which currently includes 659 human proteins. Tchem are proteins that are not Tclin, but are known to bind small molecules with high potency (currently N = 1607). Tbio includes proteins that have Gene Ontology (27) ‘leaf’ (lowest level) term annotations based on experimental evidence, or meet two of the following three conditions: a fractional publication count (28) above 5, three or more Gene RIF (‘Reference Into Function’) annotations (https://www.ncbi.nlm.nih.gov/gene/about-generif) or 50 or more commercial antibodies, as counted in the Antibodypedia portal (29). The fourth category, Tdark, currently includes ∼31% of the human proteins that were manually curated at the primary sequence level in UniProt, but do not meet any of the Tclin, Tchem or Tbio criteria. Figure 1 shows the TDL count changes between version 3 and 6.7.
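The TDL decision order described above can be written as a short classification sketch. This is not the automated TCRD script. The `TargetEvidence` fields are hypothetical names, and the two boolean flags for approved-drug and potent small-molecule evidence are assumed to have been computed beforehand, since the potency cutoffs themselves are not given in the text.

```python
from dataclasses import dataclass


@dataclass
class TargetEvidence:
    # Hypothetical, precomputed evidence flags and counts for one target.
    has_approved_drug_moa: bool       # an approved drug acts via this target
    has_potent_small_molecule: bool   # binds a small molecule with high potency
    has_experimental_go_leaf: bool    # experimental GO 'leaf' term annotation
    fractional_pub_count: float       # fractional publication count
    gene_rif_count: int               # number of Gene RIF annotations
    antibody_count: int               # commercial antibodies in Antibodypedia


def classify_tdl(t: TargetEvidence) -> str:
    """Assign a target development level using the criteria in the text."""
    if t.has_approved_drug_moa:
        return "Tclin"
    if t.has_potent_small_molecule:
        return "Tchem"
    # Tbio: an experimental GO leaf term, or at least two of three conditions.
    conditions_met = sum([
        t.fractional_pub_count > 5,
        t.gene_rif_count >= 3,
        t.antibody_count >= 50,
    ])
    if t.has_experimental_go_leaf or conditions_met >= 2:
        return "Tbio"
    return "Tdark"


# A target with few publications but many Gene RIFs and antibodies is Tbio.
example = TargetEvidence(False, False, False, 2.5, 4, 60)
print(classify_tdl(example))
```

Because the checks run from Tclin down to Tdark, a target satisfying several criteria is always assigned the highest level it qualifies for.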

For a further in-depth exploration of each additional TCRD dataset and its database changes, we direct the reader to Supplementary Materials.

RESULTS

In the first 2 years following the initial release, the Pharos team frequently demoed the site with a focus on obtaining user feedback, which provided the user-centered design information needed as Pharos underwent a ground-up rewrite in late 2018. By focusing on user needs, such as improving the target details page hierarchy and navigation, and adding more detailed explanations of the terms and ideas represented within Pharos, we were able to streamline target presentation and simplify many pages. Many tables were swapped with sortable list elements, akin to a shopping site, allowing more data to be shown than a table would allow, and also allowing us to display dynamic data as desired. Pharos was also redesigned with a focus on mobile usability, adjusting styles and visualizations depending on the screen used. Pharos is also installable as a Progressive Web App, which allows users to install Pharos as a mobile application on their device, increasing the ease of access for Pharos.

Figure 5. Target details view for ACE2 showing the improved Protein Data Bank data viewer (A) and predicted viral interactions (B).

To increase speed and responsiveness, we switched from a fully server-side rendered application to a hybrid client and server-side application, allowing faster page rendering and data retrieval. We replaced the REST API with a GraphQL instance, which adds flexibility to data retrieval, as well as potentially reducing the amount of data being sent over the network. A discussion of GraphQL benchmarking is summarized in Supplementary Table S1. Documentation for the GraphQL format, as well as an interactive sandbox with several sample queries, can be found at https://pharos.nih.gov/api.
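To illustrate why a GraphQL interface can reduce the data sent over the network, the sketch below builds a request that names only the fields the client needs. The endpoint URL is a placeholder, and the query shape and field names (`target`, `sym`, `name`, `tdl`, `fam`) are assumptions for illustration; the actual schema and endpoint should be taken from the documentation at https://pharos.nih.gov/api.

```python
import json
import urllib.request

# Assumed query shape; verify field names against the published schema.
QUERY = """
query TargetSummary($sym: String) {
  target(q: {sym: $sym}) {
    name
    sym
    tdl
    fam
  }
}
"""


def build_request(url: str, symbol: str) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request for one target.

    Only the fields listed in QUERY would be returned, which is the
    property that lets a client avoid over-fetching.
    """
    payload = {"query": QUERY, "variables": {"sym": symbol}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch(url: str, symbol: str) -> dict:
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(build_request(url, symbol)) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder endpoint; replace with the documented GraphQL endpoint.
request = build_request("https://example.org/graphql", "ACE2")
print(request.get_method(), request.full_url)
```

Calling `fetch` with the documented endpoint would return a JSON object whose `data` entry contains only the requested fields, assuming the query matches the live schema.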

Browse pages

Filter functionality has been expanded, and users are able to examine the entire list of values for each filter, as well as search for text within the filter values. In the case of numeric values, such as novelty and PubMed score, a range slider allows users to refine results. All filters also have a ‘?’ help button that allows users to view a quick definition of the filter, or to visit the original source if desired. These features are displayed in Figure 2. In addition, sub-lists are generated on entity (target/disease/ligand) pages that can be viewed in their respective list browsers. It is therefore possible to browse the list of diseases or ligands associated with a specific target, or targets associated with a disease or ligand. For associated diseases, additional numeric filters such as association score and interaction scores are available, as well as ligand affinity measurements for associated ligand lists.

By registering, users are able to select targets of interest and save them as custom lists, which are added to the main filter panel and are always available. This allows users to further refine a filtered list of targets, as well as view how this new list may be further filtered, or examine the makeup of the list by the various filter categories.

Figure 6. Target details view for ACE2 showing the tissue expression section with highlighted tissues (A) and protein to protein interaction section (B). The frequency of updating for resources integrated in TCRD differs. Text-mined sources are updated more frequently, and changes in the scientific literature allow us to track certain associations sooner. Therefore, selecting one of these sources (as displayed) will show that ACE2 is expressed in the lungs.

Target details pages

In improving the target details pages, we relied on user interviews that we had conducted to determine the optimal layout and ordering of data on the page. In the newest version of Pharos, target detail pages start off with synonyms, identifiers and a broad overview of the knowledge about the target, as shown in Figure 3. Next, the user is able to examine the TDL criteria in relation to the specific target. Drug and ligand data comes next, followed by disease association data. Target expression and interaction data follows, with publication data, amino acid sequence information – including the ProtVista (30) graphical representation – and filters for related targets completing the page. Sections where no data are available, such as drug and ligand browsers for Tdark and Tbio targets, are automatically removed. The navigation panel is always visible on the left hand side, allowing users to easily jump to the section of interest. Most data panels have been improved by adding tooltips to further explain properties, and a help panel that contains additional information such as definitions, explanatory articles or raw data. Additional panels have been added to display additional data.

Figure 7. Disease details page for Huntington’s disease showing Disease Ontology description and hierarchy (A) and TIN-X plot showing novelty targets and their importance mapped with a log scale (B).

TDL descriptions. One addition to the previous Pharos UI is a description panel. While the various TDL criteria are well documented, it was not apparent to users how and which TDL criteria are met by a given target. This section displays which criteria have been met, as well as the level or score. A highlighted checkbox indicates which TDL criteria have been met. This provides valuable evidence that illustrates to the user how TCRD generated this ranking. For Tdark and Tbio, this is also a useful indicator of what criteria are still deficient and how close to the cutoff they are, potentially driving research in those areas. This panel is also featured in Figure 3.

IDG resources. A feature of the IDG program that has expanded since the initial publication is that of new data or reagents, which are generated by IDG Consortium members. Where data or reagents are available, a browseable section is displayed in Pharos, allowing users to navigate to the data set, and even order them in the case of physical resources, as shown in Figure 4. For targets with mouse expression data, we have incorporated our anatomogram image (discussed below), using male/female and brain mouse images. We refer the reader to Supplementary Materials for an in-depth discussion of IDG Consortium resources available in Pharos.

Improved ligand/drug section. Drug and ligand browsing is now done via a pageable list. Users can now page through the entire list of approved drugs or active ligands, or open the entire list in the browse view, allowing for filtering of these lists, while specifying the number of targets each ligand is active on.

Disease associations. The disease associations panel has been greatly expanded, both in the amount of featured resources, now incorporating DisGeNET (31) and eRAM (32), as well as in size and placement. The source of each disease association is displayed as a collapsible panel that bundles multiple sources for the same disease. Upon expanding this panel, users are able to examine the evidence used to make this association. This allows users to discover provenance, an essential attribute of data aggregated in TCRD. It is also possible to view the list of associated diseases as a browseable list, again allowing for filtering and more refined data analysis.

Figure 8. Ligand details page for acetazolamide shows ligand description information, as well as synonyms and identifiers (A). Target activity is shown with expanded activity for CA2 (B).

PDB visualizations. While PDB identifiers have always been available in Pharos, they were previously displayed as a list of linkouts with no additional information (33). In Pharos 3.0, the PDB section was expanded and redesigned to display three-dimensional (3D) structures for proteins, including bound ligands where available. This 3D visualization, employing NGL Viewer (34,35), is highly interactive, allowing users to drag, zoom, pan and spin the structure to thoroughly examine it. This interface also enables users to browse the PDB identifiers and structures associated with a given target. Clicking on a table row loads the respective PDB entry and displays the ligands shown in the structure. Other details include the reference source for the id, as well as the method used to generate the structure. This feature is highlighted in Figure 5.

Predicted viral interactions. Predicted viral-human protein interactions from the P-HIPSTer atlas are displayed in a separate panel that shows a list of viruses and their associated taxonomic data, specifically for the targeted human protein. Each virus also lists the viral proteins that the human protein is predicted to interact with, in addition to a likelihood ratio, which measures the strength of the sequence- and structure-based prediction. These features are shown in Figure 5, using ACE2, the angiotensin converting enzyme 2, as an example. ACE2 is the functional entry receptor for both severe acute respiratory syndrome (SARS) coronaviruses: SARS-CoV (36) and SARS-CoV-2 (37).

Figure 9. Target list described in the ‘Target list search’ use case (A) and target list described in the ‘Binding partner search’ use case (B). Both panels show selected filters, including the ability to filter target lists by IDG featured sub-lists.

Target expression data. The target expression data panel has also been expanded, as shown in Figure 6. We replaced the static human figure with an expanded, interactive anatomogram (18). This visualization contains both male and female figures, allowing sex-specific tissue information (e.g. from GTEx) to be highlighted. Users can toggle the brain image separately for more fine-grained brain tissue expression visualization. All figures are zoomable and pannable to improve visibility. When possible, all source tissues have been mapped to standardized Uberon (38) identifiers, enabling us to merge repetitive tissues. Expression data can be viewed by source, and tissue values are searchable. Clicking on a tissue panel, as in the disease associations section, allows users to view detailed provenance of the data, showing both the source and relevant confidence values for the tissue expression. Gradient coloration allows us to display the tissue expression level, either as a qualitative value (low, medium, high) or on a numeric value scale, depending on the data source, giving users a quick visualization of these data. We note that several sources overlap in data and content; however, they may also present unique data. Pharos displays the variety between sources, which can help users decide which sources to focus on. It also illustrates that the text-mining-based resources tend to be updated more often than frequently accessed scientific databases that aggregate experimental data. When available, cell lines and orthologs are displayed with expandable evidence panels.
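The gradient coloration described for the tissue expression panel can be illustrated with a minimal sketch. This is not the Pharos implementation: the function names, the even spacing assigned to the qualitative levels, the `numeric_max` normalization and the white-to-blue color ramp are all assumptions made only for illustration.

```python
# Illustrative sketch only: names, level spacing and colors are assumptions,
# not taken from the Pharos code base.

# Assumed mapping of qualitative expression calls onto a 0..1 intensity.
QUALITATIVE_LEVELS = {"low": 1 / 3, "medium": 2 / 3, "high": 1.0}


def expression_intensity(value, numeric_max=None):
    """Return an intensity in [0, 1] for a qualitative or numeric value."""
    if isinstance(value, str):
        return QUALITATIVE_LEVELS[value.strip().lower()]
    if numeric_max is None or numeric_max <= 0:
        raise ValueError("numeric values need a positive numeric_max")
    # Clamp so that out-of-range values still map onto the gradient.
    return max(0.0, min(1.0, value / numeric_max))


def intensity_to_hex(intensity):
    """Map intensity 0..1 onto a white-to-blue gradient as a hex color."""
    level = round(255 * (1.0 - intensity))
    return f"#{level:02x}{level:02x}ff"


if __name__ == "__main__":
    for raw in ("low", "medium", "high"):
        print(raw, intensity_to_hex(expression_intensity(raw)))
    print(7.5, intensity_to_hex(expression_intensity(7.5, numeric_max=10.0)))
```

Putting both kinds of value on one intensity scale is what lets a single color ramp serve sources that report categories and sources that report numbers.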

Protein–protein interactions. The introduction of data from BioPlex (39), STRING and Reactome (40) has allowed us to display rich PPI data and allows users to examine how multiple targets may interact, as shown in Figure 6. This panel consists of a pageable list of targets known to interact with the current target. These targets display the illumination graph, which gives users a quick visual representation of the level and types of knowledge available for the target. The originating data source is also displayed, as well as the confidence of this interaction. This list of PPIs may be viewed either directly in STRING or in the browse page as a filterable list.

Disease details pages

Figure 10. List of targets associated with asthma, filtered by Data Source: Expression Atlas and by Expression Atlas log2foldchange value (A). Also shown is the Disease Association Details column, with additional association-specific details (B).

While Pharos remains target-centric, it is equally important to be able to examine diseases in relation to their associated targets. As such, Pharos has always included diseases as a browseable list, as well as a disease details page. Pharos 3.0 expands on the data collected and displayed about diseases. When available, a description from the Disease Ontology is shown. The hierarchy of each specific disease is also shown, allowing users to view parent or child diseases or syndromes, as well as the siblings for each disease. In addition, a breakdown of the targets associated with each disease is shown, which can also be opened in the target browse page, allowing users to filter the list of associated targets. The target-disease Novelty score is shown in the Target-Importance Novelty eXplorer (TIN-X) (41) panel. Similar to the target-based TIN-X view, this is a scatterplot of novel targets related to the disease of interest. Novelty estimates the scarcity of publications about a target, whereas Importance estimates the strength of the association between that target and a specific disease (42). The X-axis shows the Novelty of the target-disease relationship, while the Y-axis shows the Importance of that target to the disease, both on a log10 scale. This feature is displayed in Figure 7. When available, help definition buttons are also shown for each section of data.
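As a small worked illustration of the log10 axes used in the TIN-X scatterplot, the sketch below converts Novelty and Importance scores into plot coordinates. The input tuples, the function name and the example values are hypothetical; how Pharos actually computes or stores these scores is not shown here.

```python
import math


def tinx_coordinates(points):
    """Convert (symbol, novelty, importance) tuples to log10 plot coordinates.

    Entries with non-positive scores are skipped, because log10 is undefined
    for them. The tuple layout is an assumption made for this sketch.
    """
    coords = []
    for symbol, novelty, importance in points:
        if novelty <= 0 or importance <= 0:
            continue
        coords.append((symbol, math.log10(novelty), math.log10(importance)))
    return coords


if __name__ == "__main__":
    # Hypothetical scores, not taken from TCRD.
    example = [("TARGET_A", 0.01, 100.0), ("TARGET_B", 0.5, 2.0)]
    for symbol, x, y in tinx_coordinates(example):
        print(f"{symbol}: x={x:.2f}, y={y:.2f}")
```

On a log10 scale, scores that differ by orders of magnitude remain distinguishable on the same plot.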

Ligand details pages

The ligand details page has been expanded to include a description of the ligand, where available. A 2D depiction of the ligand chemical structure is shown, together with synonyms and identifiers from PubChem (43), Guide to Pharmacology (44), ChEMBL and DrugCentral. Activity values are displayed in collapsible panels and sorted by target. By clicking on a target header, users are able to examine activity values for that ligand on the selected target. Activity type, value and mechanism of action are shown, in addition to the reference publication for each activity, allowing users to directly view the source. Figure 8 illustrates this page display.

Use cases

Target list search. A neuroscientist who studies ion channels is seeking understudied proteins for a student project. Filtering to targets listed by the IDG Consortium as targets of interest (https://druggablegenome.net/IDGProteinList) in the ion channel family, with a nervous system phenotype as described by the Mouse Phenotype Ontology from the Jackson Laboratory, yields 16 potential targets of interest. The refined list view is shown in Figure 9.

Binding partner search. A scientist is studying the function of CAMKK2 and hypothesizing on its role in behavior. From the CAMKK2 target details page, she clicks ‘Explore Interacting Targets’ to open the potential binding partners in the Target List. Selecting Tclin, she filters the list to 10 potential binding partners that have approved drugs with new therapeutic effects and drug-related side effects that can assist in her research. She is also able to save this list of targets as a custom list, allowing her to easily revisit and analyze these targets. This workflow is also shown in Figure 9.

Disease-based search. A researcher studying asthma searches for this top-level disease term and enters the asthma disease details page. He wants to find proteins that are differentially expressed in asthma patients. Starting from the asthma disease details page, he clicks ‘Explore Associated Targets’ to see 470 potential targets. Filtering those with data from the Expression Atlas (18), using the ‘Expression Atlas’ value from the ‘Disease Data Source’ filter, yields 78 targets, which can then be sorted by log2foldChange using the drop-down sort feature, as shown above the target list in Figure 10. He finds several with high log2foldChange, meaning the expression level changes drastically between disease and non-disease conditions, and low p-value, meaning the results are statistically significant. Also of note: for lists of associated targets, an additional column has been added to the target card, showing disease association details such as the evidence associated with a given data source.
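The filter-then-sort step of this use case can be sketched offline on a list of association records. The record keys (`target`, `source`, `log2foldchange`, `pvalue`), the function name and the example rows are hypothetical and do not reflect the TCRD schema. The sketch ranks by absolute fold change, which is a choice made here for illustration rather than the behavior of the Pharos sort control.

```python
def differential_targets(associations, source="Expression Atlas", max_pvalue=None):
    """Keep associations from one data source and rank by |log2 fold change|.

    `associations` is a list of dicts with assumed keys: 'target', 'source',
    'log2foldchange' and 'pvalue'. `max_pvalue` optionally drops records whose
    p-value is missing or above the threshold.
    """
    hits = []
    for assoc in associations:
        if assoc.get("source") != source or assoc.get("log2foldchange") is None:
            continue
        if max_pvalue is not None:
            pvalue = assoc.get("pvalue")
            if pvalue is None or pvalue > max_pvalue:
                continue
        hits.append(assoc)
    # Largest change first, regardless of direction (up- or down-regulated).
    return sorted(hits, key=lambda a: abs(a["log2foldchange"]), reverse=True)


if __name__ == "__main__":
    # Hypothetical records, not real asthma associations.
    rows = [
        {"target": "GENE1", "source": "Expression Atlas", "log2foldchange": -3.0, "pvalue": 0.001},
        {"target": "GENE2", "source": "Expression Atlas", "log2foldchange": 1.5, "pvalue": 0.2},
        {"target": "GENE3", "source": "DisGeNET", "log2foldchange": None, "pvalue": None},
    ]
    for row in differential_targets(rows, max_pvalue=0.05):
        print(row["target"], row["log2foldchange"], row["pvalue"])
```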

Finding experimental tools. A researcher has found that a voltage-gated potassium channel, KCNS2, is expressed in the nucleus they are working on. Its role is unknown. The target details page for KCNS2 lists three approved drugs (e.g. amifampridine), which are known to block this channel. The IDG Resources show a cell line and a mouse genetic construct, which can be ordered from the IDG Consortium (UCSF in this case). The researcher is now able to design a set of in vitro and in vivo experiments to help elucidate the role of the channel.

Machine learning based on PPI networks to prioritize targets for a disease. A novel target ranking approach could be developed that relies on deep network representation learning of a PPI network annotated with disease-specific knowledge attributes derived from TCRD/Pharos. This involves mapping the enriched PPI network into a feature space using the neural network framework Gat2Vec (45). Gat2Vec employs a shallow neural network model to facilitate joint learning on the structural and attribute contexts of a given network. In this case, the PPIs provide the structural context and the disease-related knowledge provides the attribute contexts. The feature space generated can be used to develop machine learning models that can predict the ranking of protein targets in the context of a disease. Additional protocols for extracting available data for specific targets of interest, as well as the differences in knowledge availability between understudied and more studied targets, are discussed elsewhere (46).

DISCUSSION

With the newly added data, interactive visualizations and enhanced search capabilities, TCRD and Pharos in their current forms serve as IDG resources that facilitate better exploration of the dark and understudied regions of the human genome. The central idea of the resource continues to be enriching knowledge around human targets and monitoring their therapeutic development levels. One area that received significant attention in the current update is the search mechanism, which was upgraded with the implementation of the GraphQL API. To improve user interaction with the new API, example queries have been made available on the website. The TDL descriptions panel and the target expression anatomograms are the significantly improved components of the Pharos web interface. Further, most data that have been recently integrated into TCRD facilitate application of artificial intelligence/machine learning (AI/ML) techniques for target evaluation and drug repositioning (47), and are accessible in ML-ready format. Most of the above-described developments are based on feedback obtained directly from website visitors, several demo sessions and presentations at scientific meetings.

Ongoing work focuses on utilizing the AI/ML-ready data, such as target-disease, target-phenotype and PPI data, to develop AI/ML models for prioritization of dark targets and better understanding of disease biology. Currently, we are evaluating ways to aggregate experimental data uncertainties, specifically data quality and reliability (48), similar to the in silico toxicology protocols (49). We continue to extend our efforts in enhancing the disease page to list associated targets, rank them by the consensus strength of association to a disease, and further facilitate filtering to narrow down to targets of potential interest to researchers. The database can be expected to be further enriched with new data and data types that contribute to the ultimate goal of illuminating the dark targets.

DATA AVAILABILITY

TCRD is an open source database that can be accessed at http://juniper.health.unm.edu/tcrd/

Pharos is an open source web platform that can be accessed at https://pharos.nih.gov/

The Pharos resources have been split into frontend and backend repositories.

The frontend code can be found on GitHub: https://github.com/ncats/pharos_frontend

The backend GraphQL implementation code can be found on GitHub: https://github.com/ncats/pharos-graphql-server

GraphQL resource documentation can be found on Pharos: https://pharos.nih.gov/api

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

Rajarshi Guha, for his immeasurable time spent not only working on and promoting Pharos 1.0, but also for his mentorship as we began the process of revamping Pharos; Oleg Ursu, for developing the first machine learning-ready version of TCRD; Gorka Lasso, for supplying the P-HIPSTer virus–human protein interactions dataset.

FUNDING

National Institutes of Health (NIH) Common Fund [CA224370 to S.L.M., C.G.B., L.L.J., A.W., G.B., J.J.Y., T.I.O.; U24TR002278 to D.V., A.K., J.H., A.W., S.C.S. and T.I.O.]; Novo Nordisk Foundation [NNF14CC0001 to L.J.J.]; Intramural Research Program, Division of Preclinical Innovation, NIH NCATS (to D.T.N., K.K., T.S., N.S., P.D., E.W., A.S.). Funding for open access charge: NIH/CA224370.

Conflict of interest statement. L.J.J. is co-founder and scientific advisory board member of Intomics A/S. T.I.O. has received honoraria or consulted for Abbott, AstraZeneca, Chiron, Genentech, Infinity Pharmaceuticals, Merz Pharmaceuticals, Merck Darmstadt, Mitsubishi Tanabe, Novartis, Ono Pharmaceuticals, Pfizer, Roche, Sanofi and Wyeth. He is on the scientific advisory board of ChemDiv Inc. and InSilico Medicine.

REFERENCES

1. Edwards,A.M., Isserlin,R., Bader,G.D., Frye,S.V., Willson,T.M. and Yu,F.H. (2011) Too many roads not taken. Nature, 470, 163–165.

2. Nguyen,D.-T., Mathias,S., Bologa,C., Brunak,S., Fernandez,N., Gaulton,A., Hersey,A., Holmes,J., Jensen,L.J., Karlsson,A. et al. (2017) Pharos: collating protein information to shed light on the druggable genome. Nucleic Acids Res., 45, D995–D1002.

3. Oprea,T.I., Bologa,C.G., Brunak,S., Campbell,A., Gan,G.N., Gaulton,A., Gomez,S.M., Guha,R., Hersey,A., Holmes,J. et al. (2018) Unexplored therapeutic opportunities in the human genome. Nat. Rev. Drug Discov., 17, 317–332.

4. Galperin,M.Y., Fernandez-Suarez,X.M. and Rigden,D.J. (2017) The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes. Nucleic Acids Res., 45, D1–D11.

5. Consortium,U.P. (2019) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res., 47, D506–D515.

6. Dickinson,M.E., Flenniken,A.M., Ji,X., Teboul,L., Wong,M.D., White,J.K., Meehan,T.F., Weninger,W.J., Westerberg,H., Adissu,H. et al. (2016) High-throughput discovery of novel developmental phenotypes. Nature, 537, 508–514.

7. Smith,J.R., Hayman,G.T., Wang,S.-J., Laulederkind,S.J.F., Hoffman,M.J., Kaldunski,M.L., Tutaj,M., Thota,J., Nalabolu,H.S., Ellanki,S.L.R. et al. (2020) The Year of the Rat: The Rat Genome Database at 20: a multi-species knowledgebase and analysis platform. Nucleic Acids Res., 48, D731–D742.

8. Schriml,L.M., Arze,C., Nadendla,S., Chang,Y.-W.W., Mazaitis,M., Felix,V., Feng,G. and Kibbe,W.A. (2012) Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res., 40, D940–D946.

9. Hayman,G.T., Laulederkind,S.J.F., Smith,J.R., Wang,S.-J., Petri,V., Nigam,R., Tutaj,M., De Pons,J., Dwinell,M.R. and Shimoyama,M. (2016) The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database. Database, 2016, baw034.

10. Smith,C.L. and Eppig,J.T. (2009) The mammalian phenotype ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip. Rev. Syst. Biol. Med., 1, 390–399.

11. Buniello,A., MacArthur,J.A.L., Cerezo,M., Harris,L.W., Hayhurst,J., Malangone,C., McMahon,A., Morales,J., Mountjoy,E., Sollis,E. et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47, D1005–D1012.

12. GTEx Consortium (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.

13. Thul,P.J. and Lindskog,C. (2018) The human protein atlas: a spatial map of the human proteome. Protein Sci., 27, 233–244.

14. Palasca,O., Santos,A., Stolte,C., Gorodkin,J. and Jensen,L.J. (2018) TISSUES 2.0: an integrative web resource on mammalian tissue expression. Database, 2018, bay003.

15. Kim,M.-S., Pinto,S.M., Getnet,D., Nirujogi,R.S., Manda,S.S., Chaerkady,R., Madugundu,A.K., Kelkar,D.S., Isserlin,R., Jain,S. et al. (2014) A draft map of the human proteome. Nature, 509, 575–581.

16. Barretina,J., Caponigro,G., Stransky,N., Venkatesan,K., Margolin,A.A., Kim,S., Wilson,C.J., Lehar,J., Kryukov,G.V., Sonkin,D. et al. (2012) The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483, 603–607.

17. Stathias,V., Turner,J., Koleti,A., Vidovic,D., Cooper,D., Fazel-Najafabadi,M., Pilarczyk,M., Terryn,R., Chung,C., Umeano,A. et al. (2020) LINCS Data Portal 2.0: next generation access point for perturbation-response signatures. Nucleic Acids Res., 48, D431–D439.

18. Papatheodorou,I., Moreno,P., Manning,J., Fuentes,A.M.-P., George,N., Fexova,S., Fonseca,N.A., Fullgrabe,A., Green,M., Huang,N. et al. (2020) Expression Atlas update: from tissues to single cells. Nucleic Acids Res., 48, D77–D83.

19. Szklarczyk,D., Gable,A.L., Lyon,D., Junge,A., Wyder,S., Huerta-Cepas,J., Simonovic,M., Doncheva,N.T., Morris,J.H., Bork,P. et al. (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res., 47, D607–D613.

20. Lasso,G., Mayer,S.V., Winkelmann,E.R., Chu,T., Elliot,O., Patino-Galindo,J.A., Park,K., Rabadan,R., Honig,B. and Shapira,S.D. (2019) A structure-informed atlas of human-virus interactions. Cell, 178, 1526–1541.

21. Gaulton,A., Hersey,A., Nowotka,M., Bento,A.P., Chambers,J., Mendez,D., Mutowo,P., Atkinson,F., Bellis,L.J., Cibrian-Uhalte,E. et al. (2017) The ChEMBL database in 2017. Nucleic Acids Res., 45, D945–D954.

22. Ursu,O., Holmes,J., Bologa,C.G., Yang,J.J., Mathias,S.L., Stathias,V., Nguyen,D.-T., Schurer,S. and Oprea,T. (2019) DrugCentral 2018: an update. Nucleic Acids Res., 47, D963–D970.

23. Pletscher-Frankild,S., Palleja,A., Tsafou,K., Binder,J.X. and Jensen,L.J. (2015) DISEASES: text mining and data integration of disease-gene associations. Methods, 74, 83–89.

24. Santos,R., Ursu,O., Gaulton,A., Bento,A.P., Donadi,R.S., Bologa,C.G., Karlsson,A., Al-Lazikani,B., Hersey,A., Oprea,T.I. et al. (2017) A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov., 16, 19–34.

25. Ursu,O., Glick,M. and Oprea,T. (2019) Novel drug targets in 2018. Nat. Rev. Drug Discov., 18, 328.

26. Avram,S., Halip,L., Curpan,R. and Oprea,T.I. (2020) Novel drug targets in 2019. Nat. Rev. Drug Discov., 19, 300.

27. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.

28. Pafilis,E., Frankild,S.P., Fanini,L., Faulwetter,S., Pavloudi,C., Vasileiadou,A., Arvanitidis,C. and Jensen,L.J. (2013) The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text. PLoS One, 8, e65390.

29. Bjorling,E. and Uhlen,M. (2008) Antibodypedia, a portal for sharing antibody and antigen validation data. Mol. Cell Proteomics, 7, 2028–2037.

30. Watkins,X., Garcia,L.J., Pundir,S., Martin,M.J. and Consortium,U.P. (2017) ProtVista: visualization of protein sequence annotations. Bioinformatics, 33, 2040–2041.

31. Piñero,J., Ramírez-Anguita,J.M., Sauch-Pitarch,J., Ronzano,F., Centeno,E., Sanz,F. and Furlong,L.I. (2020) The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res., 48, D845–D855.

32. Jia,J., An,Z., Ming,Y., Guo,Y., Li,W., Liang,Y., Guo,D., Li,X., Tai,J., Chen,G. et al. (2018) eRAM: encyclopedia of rare disease annotations for precision medicine. Nucleic Acids Res., 46, D937–D943.

33. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

34. Rose,A.S., Bradley,A.R., Valasatava,Y., Duarte,J.M., Prlic,A. and Rose,P.W. (2018) NGL viewer: web-based molecular graphics for large complexes. Bioinformatics, 34, 3755–3758.

35. Rose,A.S. and Hildebrand,P.W. (2015) NGL Viewer: a web application for molecular visualization. Nucleic Acids Res., 43, W576–W579.

36. Li,W., Moore,M.J., Vasilieva,N., Sui,J., Wong,S.K., Berne,M.A., Somasundaran,M., Sullivan,J.L., Luzuriaga,K., Greenough,T.C. et al. (2003) Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature, 426, 450–454.

37. Hoffmann,M., Kleine-Weber,H., Schroeder,S., Kruger,N., Herrler,T., Erichsen,S., Schiergens,T.S., Herrler,G., Wu,N.-H., Nitsche,A. et al. (2020) SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell, 181, 271–280.

38. Mungall,C.J., Torniai,C., Gkoutos,G.V., Lewis,S.E. and Haendel,M.A. (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biol., 13, R5.

39. Huttlin,E.L., Bruckner,R.J., Navarrete-Perea,J., Cannon,J.R., Baltier,K., Gebreab,F., Gygi,M.P., Thornock,A., Zarraga,G., Tam,S. et al. (2020) Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. bioRxiv doi: https://doi.org/10.1101/2020.01.19.905109, 19 January 2020, preprint: not peer reviewed.

40. Jassal,B., Matthews,L., Viteri,G., Gong,C., Lorente,P., Fabregat,A., Sidiropoulos,K., Cook,J., Gillespie,M., Haw,R. et al. (2020) The reactome pathway knowledgebase. Nucleic Acids Res., 48, D498–D503.

41. Cannon,D.C., Yang,J.J., Mathias,S.L., Ursu,O., Mani,S., Waller,A., Schurer,S.C., Jensen,L.J., Sklar,L.A., Bologa,C.G. et al. (2017) TIN-X: target importance and novelty explorer. Bioinformatics, 33, 2601–2603.

42. Oprea,T.I. (2019) Exploring the dark genome: implications for precision medicine. Mamm. Genome, 30, 192–200.

43. Kim,S., Chen,J., Cheng,T., Gindulyte,A., He,J., He,S., Li,Q., Shoemaker,B.A., Thiessen,P.A., Yu,B. et al. (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res., 47, D1102–D1109.

44. Armstrong,J.F., Faccenda,E., Harding,S.D., Pawson,A.J., Southan,C., Sharman,J.L., Campo,B., Cavanagh,D.R., Alexander,S.P.H., Davenport,A.P. et al. (2020) The IUPHAR/BPS guide to Pharmacology in 2020: extending immunopharmacology content and introducing the IUPHAR/MMV guide to Malaria Pharmacology. Nucleic Acids Res., 48, D1006–D1021.

45. Sheikh,N., Kefato,Z. and Montresor,A. (2019) gat2vec: representation learning for attributed graphs. Computing, 101, 187–209.

46. Sheils,T., Mathias,S.L., Siramshetty,V.B., Bocci,G., Bologa,C.G., Yang,J.J., Waller,A., Southall,N., Nguyen,D.-T. and Oprea,T.I. (2020) How to illuminate the druggable genome using pharos. Curr. Protoc. Bioinformatics, 69, e92.

47. Levin,J.M., Oprea,T.I., Davidovich,S., Clozel,T., Overington,J.P., Vanhaelen,Q., Cantor,C.R., Bischof,E. and Zhavoronkov,A. (2020) Artificial intelligence, drug repurposing and peer review. Nat. Biotechnol., 38, 1127–1131.

48. Klimisch,H.J., Andreae,M. and Tillmann,U. (1997) A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul. Toxicol. Pharmacol., 25, 1–5.

49. Myatt,G.J., Ahlberg,E., Akahori,Y., Allen,D., Amberg,A., Anger,L.T., Aptula,A., Auerbach,S., Beilke,L., Bellion,P. et al. (2018) In silico toxicology protocols. Regul. Toxicol. Pharmacol., 96, 1–17.

Figure 4. IDG-generated resources for ACE2, consisting of small molecule reagents (A) and data (B). Targets with mouse cell lines, such as GPR68, also have information about that resource (C), as well as a mouse tissue expression viewer (D). Other data can include, as is the case for CACNA2D4, mouse phenotype (E) and cell line (F) data. Supplementary Table S2 contains a full breakdown of data types and fields collected.

amount of data available for each target changed with each release of TCRD, the target development levels (TDLs) for respective targets were recalculated when the target criteria changed. This was done using automated scripts. Very briefly, the TDL is one of four potential values: Tclin, Tchem, Tbio or Tdark. Tclin are protein drug targets via which approved drugs act (24–26), which currently includes 659 human proteins. Tchem are proteins that are not Tclin, but are known to bind small molecules with high potency (currently N = 1607). Tbio includes proteins that have Gene Ontology (27) ‘leaf’ (lowest level) term annotations based on experimental evidence, or meet two of the following three conditions: a fractional publication count (28) above 5, three or more Gene RIF (‘Reference Into Function’) annotations (https://www.ncbi.nlm.nih.gov/gene/about-generif), or 50 or more commercial antibodies, as counted in the Antibodypedia portal (29). The fourth category, Tdark, currently includes ~31% of the human proteins that were manually curated at the primary sequence level in UniProt, but do not meet any of the Tclin, Tchem or Tbio criteria. Figure 1 shows the TDL count changes between versions 3 and 6.7.
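The TDL decision order described above can be written as a short sketch. This is not the automated TCRD script: the `TargetEvidence` record and its field names are hypothetical, and the two boolean inputs for Tclin and Tchem stand in for upstream drug and potency determinations that are not reproduced here. Only the Tbio thresholds (fractional publication count above 5, three or more Gene RIFs, 50 or more antibodies, any two of three) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class TargetEvidence:
    """Hypothetical per-target evidence record; field names are illustrative."""
    has_approved_drug_moa: bool      # an approved drug acts via this protein
    has_potent_small_molecule: bool  # binds a small molecule with high potency
    has_experimental_go_leaf: bool   # experimental GO 'leaf' term annotation
    fractional_pub_count: float      # fractional publication count
    gene_rif_count: int              # number of Gene RIF annotations
    antibody_count: int              # antibodies counted in Antibodypedia


def classify_tdl(t: TargetEvidence) -> str:
    """Assign Tclin, Tchem, Tbio or Tdark, checking the levels in that order."""
    if t.has_approved_drug_moa:
        return "Tclin"
    if t.has_potent_small_molecule:
        return "Tchem"
    # Tbio: experimental GO leaf annotation, or any two of three conditions.
    conditions_met = sum([
        t.fractional_pub_count > 5,
        t.gene_rif_count >= 3,
        t.antibody_count >= 50,
    ])
    if t.has_experimental_go_leaf or conditions_met >= 2:
        return "Tbio"
    return "Tdark"


if __name__ == "__main__":
    example = TargetEvidence(False, False, False, 6.2, 4, 12)
    print(classify_tdl(example))
```

Checking the levels from Tclin downward mirrors the definitions in the text, where each level excludes targets that already qualify for a higher one.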

For a further in-depth exploration of each additional TCRD dataset and its database changes, we direct the reader to the Supplementary Materials.

RESULTS

In the first 2 years following the initial release, the Pharos team frequently demoed the site with a focus on obtaining user feedback, which provided the user-centered design information needed as Pharos underwent a ground-up rewrite in late 2018. By focusing on user needs, such as improving the target details page hierarchy and navigation, and adding more detailed explanations of the terms and ideas represented within Pharos, we were able to streamline target presentation and simplify many pages. Many tables were swapped with sortable list elements, akin to a shopping site, allowing more data to be shown than a table would allow, and also allowing us to display dynamic data as desired. Pharos was also redesigned with a focus on mobile usability, adjusting styles and visualizations depending on the screen used. Pharos is also installable as a Progressive Web App, which allows users to install Pharos as a mobile application on their device, increasing the ease of access for Pharos.

Figure 5. Target details view for ACE2, showing the improved Protein Data Bank data viewer (A) and predicted viral interactions (B).

To increase speed and responsiveness, we switched from a fully server-side rendered application to a hybrid client and server-side application, allowing faster page rendering and data retrieval. We replaced the REST API with a GraphQL instance, which adds flexibility to data retrieval, as well as potentially reducing the amount of data being sent over the network. A discussion of GraphQL benchmarking is summarized in Supplementary Table S1. Documentation for the GraphQL format, as well as an interactive sandbox with several sample queries, can be found at https://pharos.nih.gov/api
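To illustrate how a GraphQL request can reduce the data sent over the network, the sketch below builds a POST request that asks for only a handful of fields for one target. Everything specific here is an assumption rather than a documented fact: the endpoint URL, the `target(q: {sym: ...})` query shape and the field names (`name`, `sym`, `tdl`, `fam`, `novelty`) should be checked against the schema in the interactive sandbox before use.

```python
import json
import urllib.request

# Assumed endpoint; confirm the real URL in the Pharos API documentation.
PHAROS_GRAPHQL_URL = "https://pharos-api.ncats.io/graphql"

# Assumed query shape and field names, shown only to illustrate that the
# client lists exactly the fields it wants returned.
QUERY = """
query TargetSummary($sym: String!) {
  target(q: {sym: $sym}) {
    name
    sym
    tdl
    fam
    novelty
  }
}
"""


def build_request(symbol, url=PHAROS_GRAPHQL_URL):
    """Build (but do not send) a GraphQL POST request for one gene symbol."""
    payload = {"query": QUERY, "variables": {"sym": symbol}}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_target(symbol, url=PHAROS_GRAPHQL_URL, timeout=30):
    """Send the request and return the decoded JSON response as a dict."""
    with urllib.request.urlopen(build_request(symbol, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    request = build_request("ACE2")
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Separating `build_request` from `fetch_target` lets the payload be inspected or tested without network access.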

Browse pages

Filter functionality has been expanded, and users are able to examine the entire list of values for each filter, as well as search for text within the filter values. In the case of numeric values, such as novelty and PubMed score, a range slider allows users to refine results. All filters also have a help button that allows users to view a quick definition of the filter or to visit the original source, if desired. These features are displayed in Figure 2. In addition, sub-lists are generated on entity (target/disease/ligand) pages that can be viewed in their respective list browsers. It is therefore possible to browse the list of diseases or ligands associated with a specific target, or targets associated with a disease or ligand. For associated diseases, additional numeric filters such as association score and interaction scores are available, as well as ligand affinity measurements for associated ligand lists.

Figure 6. Target details view for ACE2, showing the tissue expression section with highlighted tissues (A) and protein-to-protein interaction section (B). The frequency of updating for resources integrated in TCRD differs. Text-mined sources are updated more frequently, and changes in the scientific literature allow us to track certain associations sooner. Therefore, selecting one of these sources (as displayed) will show that ACE2 is expressed in the lungs.

By registering, users are able to select targets of interest and save them as custom lists, which are added to the main filter panel and are always available. This allows users to further refine a filtered list of targets, as well as view how this new list may be further filtered, or examine the makeup of the list by the various filter categories.

Target details pages

In improving the target details pages, we relied on user interviews that we had conducted to determine the optimal layout and ordering of data on the page. In the newest version of Pharos, target detail pages start off with synonyms, identifiers and a broad overview of the knowledge about the target, as shown in Figure 3. Next, the user is able to examine the TDL criteria in relation to the specific target. Drug and ligand data come next, followed by disease association data. Target expression and interaction data follow, with publication data, amino acid sequence information (including the ProtVista (30) graphical representation) and filters for related targets completing the page. Sections where no data are available, such as drug and ligand browsers for Tdark and Tbio targets, are automatically removed. The navigation panel is always visible on the left-hand side, allowing users to easily jump to the section of interest. Most data panels have been improved by adding tooltips to further explain properties, and a help panel that contains additional information such as definitions, explanatory articles or raw data. Additional panels have been added to display additional data.

Figure 7. Disease details page for Huntington’s disease, showing Disease Ontology description and hierarchy (A) and TIN-X plot showing novelty targets and their importance, mapped with a log scale (B).

TDL descriptions. One addition to the previous Pharos UI is a description panel. While the various TDL criteria are well documented, it was not apparent to users how and which TDL criteria are met by a given target. This section displays which criteria have been met, as well as the level or score. A highlighted checkbox indicates which TDL criteria have been met. This provides valuable evidence that illustrates to the user how TCRD generated this ranking. For Tdark and Tbio, this is also a useful indicator of what criteria are still deficient and how close to the cutoff they are, potentially driving research in those areas. This panel is also featured in Figure 3.

IDG resources. A feature of the IDG program that has expanded since the initial publication is that of new data or reagents, which are generated by IDG Consortium members. Where data or reagents are available, a browseable section is displayed in Pharos, allowing users to navigate to the data set, and even order them in the case of physical resources, as shown in Figure 4. For targets with mouse expression data, we have incorporated our anatomogram image (discussed below), using male/female and brain mouse images. We refer the reader to the Supplementary Materials for an in-depth discussion of IDG Consortium resources available in Pharos.

Improved ligand/drug section. Drug and ligand browsing is now done via a pageable list. Users can now page through the entire list of approved drugs or active ligands, or open the entire list in the browse view, allowing for filtering of these lists while specifying the number of targets each ligand is active on.

Disease associations The disease associations panel hasbeen greatly expanded both in the amount of featured re-sources now incorporating DisGeNet (31) and eRAM (32)as well as in size and placement The source of each dis-ease association is displayed as a collapsible panel that bun-dles multiple sources for the same disease Upon expandingthis panel users are able to examine the evidence used tomake this association This allows users to discover prove-nance an essential attribute of data aggregated in TCRD

Dow

nloaded from httpsacadem

icoupcomnararticle49D

1D13345958490 by guest on 20 July 2022

Nucleic Acids Research 2021 Vol 49 Database issue D1341

Figure 8 Ligand details page for acetazolamide shows ligand description information as well as synonyms and identifiers (A) Target activity is shownwith expanded activity for CA2 (B)

It is also possible to view the list of associated diseases asa browseable list again allowing for filtering and more re-fined data analysis

PDB visualizations While PDB identifiers have alwaysbeen available in Pharos they were previously displayed asa list of linkouts with no additional information (33) InPharos 30 the PDB section was expanded and redesignedto display three-dimensional (3D) structures for proteinsincluding bound ligands where available This 3D visualiza-tion employing NGL Viewer (3435) is highly interactiveallowing users to drag zoom pan and spin the structure tothoroughly examine it This interface also enables users tobrowse the PDB identifiers and structures associated witha given target Clicking on a table row loads the respectivePDB entry and displays the ligands shown in the structureOther details include the reference source for the id as wellas the method used to generate the structure This feature ishighlighted in Figure 5

Predicted viral interactions. Predicted viral-human protein interactions from the P-HIPSTer atlas are displayed in a separate panel that shows a list of viruses and their associated taxonomic data, specifically for the targeted human protein. Each virus also lists the viral proteins that the human protein is predicted to interact with, in addition to a likelihood ratio, which measures the strength of the sequence- and structure-based prediction. These features are shown in Figure 5, using ACE2, the angiotensin-converting enzyme 2, as an example. ACE2 is the functional entry receptor for both severe acute respiratory syndrome (SARS) coronaviruses, SARS-CoV (36) and SARS-CoV-2 (37).

Figure 9. Target list described in the 'Target list search' use case (A) and target list described in the 'Binding partner search' use case (B). Both panels show selected filters, including the ability to filter target lists by IDG featured sub-lists.

Target expression data. The target expression data panel has also been expanded, as shown in Figure 6. We replaced the static human figure with an expanded, interactive anatomogram (18). This visualization contains both male and female figures, allowing sex-specific tissue information (e.g. from GTEx) to be highlighted. Users can toggle the brain image separately for more fine-grained brain tissue expression visualization. All figures are zoomable and pannable to improve visibility. When possible, all source tissues have been mapped to standardized Uberon (38) identifiers, enabling us to merge repetitive tissues. Expression data can be viewed by source, and tissue values are searchable. Clicking on a tissue panel, as in the disease associations section, allows users to view detailed provenance of the data, showing both the source and the relevant confidence values for the tissue expression. Gradient coloration allows us to display the tissue expression level, either as a qualitative value (low, medium, high) or on a numeric scale, depending on the data source, giving users a quick visualization of these data. We note that several sources overlap in data and content; however, they may also present unique data. Pharos displays the variety between sources, which can help users decide what sources to focus on. It also illustrates that the text-mining-based resources tend to be updated more often than frequently accessed scientific databases that aggregate experimental data. When available, cell lines and orthologs are displayed with expandable evidence panels.
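The merging of repetitive tissues through Uberon identifiers, as described for the target expression panel, can be sketched as follows. This is a minimal illustration under stated assumptions, not the TCRD implementation: the record layout, the `TISSUE_TO_UBERON` lookup table and the `merge_by_uberon` helper are hypothetical, and the label variants and expression levels are made up.

```python
from collections import defaultdict

# Hypothetical lookup from free-text tissue labels to Uberon identifiers.
# UBERON:0002048 is the Uberon term for lung; the label variants are invented.
TISSUE_TO_UBERON = {
    "lung": "UBERON:0002048",
    "Lung": "UBERON:0002048",
    "lung tissue": "UBERON:0002048",
}


def merge_by_uberon(records):
    """Group (source, tissue_label, level) records under one Uberon ID.

    Labels without a mapping are kept under their original text, which
    mirrors the "when possible" caveat in the text.
    """
    merged = defaultdict(list)
    for source, label, level in records:
        key = TISSUE_TO_UBERON.get(label, label)
        merged[key].append((source, level))
    return dict(merged)


# Made-up records from three expression sources.
records = [
    ("GTEx", "Lung", "high"),
    ("HPA", "lung tissue", "medium"),
    ("TISSUES", "unmapped label", "low"),
]
merged = merge_by_uberon(records)
```

Keeping the source next to each level preserves provenance, so a merged tissue entry can still be expanded to show which resource reported which value.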

Protein–protein interactions. The introduction of data from BioPlex (39), STRING and Reactome (40) has allowed us to display rich PPI data, and allows users to examine how multiple targets may interact, as shown in Figure 6. This panel consists of a pageable list of targets known to interact with the current target. These targets display the illumination graph, which gives users a quick visual representation of the level and types of knowledge available for the target. The originating data source is also displayed, as well as the confidence of this interaction. This list of PPIs may be viewed either directly in STRING or in the browse page as a filterable list.

Disease details pages

Figure 10. List of targets associated with asthma, filtered by Data Source: Expression Atlas and by Expression Atlas log2foldchange value (A). Also shown is the Disease Association Details column with additional association-specific details (B).

While Pharos remains target-centric, it is equally important to be able to examine diseases in relation to their associated targets. As such, Pharos has always included diseases as a browseable list, as well as a disease details page. Pharos 3.0 expands on the data collected and displayed about diseases. When available, a description from the Disease Ontology is shown. The hierarchy of each specific disease is also shown, allowing users to view parent or child diseases or syndromes, as well as the siblings for each disease. In addition, a breakdown of the targets associated with each disease is shown, which can also be opened in the target browse page, allowing users to filter the list of associated targets. The target-disease Novelty score is shown in the Target-Importance Novelty eXplorer (TIN-X) (41) panel. Similar to the target-based TIN-X view, this is a scatterplot of novel targets related to the disease of interest. Novelty estimates the scarcity of publications about a target, whereas Importance estimates the strength of the association between that target and a specific disease (42). The X-axis shows the Novelty of the target-disease relationship, while the Y-axis shows the Importance of that target to the disease, both on a log10 scale. This feature is displayed in Figure 7. When available, help definition buttons are also shown for each section of data.
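The placement of a target on the TIN-X scatterplot, with Novelty on the X-axis and Importance on the Y-axis, both on a log10 scale, can be sketched as follows. This is only an illustration: the `tinx_coordinates` helper, the placeholder target names and the score values are hypothetical, and the way Pharos actually computes the two scores is not reproduced here.

```python
import math


def tinx_coordinates(novelty, importance):
    """Return (x, y) plot coordinates on a log10 scale.

    Both scores must be strictly positive for the logarithm to be defined.
    """
    if novelty <= 0 or importance <= 0:
        raise ValueError("scores must be positive for a log10 axis")
    return math.log10(novelty), math.log10(importance)


# Made-up (novelty, importance) scores for three placeholder targets.
scores = {
    "TARGET_A": (0.001, 10.0),
    "TARGET_B": (0.1, 1000.0),
    "TARGET_C": (1.0, 1.0),
}
points = {name: tinx_coordinates(n, i) for name, (n, i) in scores.items()}
```

A log scale is a natural choice when scores span several orders of magnitude, as a linear axis would crowd most targets into one corner of the plot.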

Ligand details pages

The ligand details page has been expanded to include a description of the ligand, where available. A 2D depiction of the ligand chemical structure is shown, together with synonyms and identifiers from PubChem (43), Guide to Pharmacology (44), ChEMBL and DrugCentral. Activity values are displayed in collapsible panels and sorted by target. By clicking on a target header, users are able to examine activity values for that ligand on the selected target. Activity type, value and mechanism of action are shown, in addition to the reference publication for each activity, allowing users to directly view the source. Figure 8 illustrates this page display.

Use cases

Target list search. A neuroscientist who studies ion channels is seeking understudied proteins for a student project. Filtering to targets listed by the IDG Consortium as targets of interest (https://druggablegenome.net/IDGProteinList) in the ion channel family, with a nervous system phenotype as described by the Mouse Phenotype Ontology from the Jackson Laboratory, yields 16 potential targets of interest. The refined list view is shown in Figure 9.

Binding partner search. A scientist is studying the function of CAMKK2 and hypothesizing on its role in behavior. From the CAMKK2 target details page, she clicks 'Explore Interacting Targets' to open the potential binding partners in the Target List. Selecting Tclin, she filters the list to 10 potential binding partners that have approved drugs with new therapeutic effects and drug-related side effects that can assist in her research. She is also able to save this list of targets as a custom list, allowing her to easily revisit and analyze these targets. This workflow is also shown in Figure 9.

Disease-based search. A researcher studying asthma searches for this top-level disease term and enters the asthma disease details page. He wants to find proteins that are differentially expressed in asthma patients. Starting from the asthma disease details page, he clicks 'Explore Associated Targets' to see 470 potential targets. Filtering those with data from the Expression Atlas (18), using the 'Expression Atlas' value from the 'Disease Data Source' filter, yields 78 targets, which can then be sorted by log2foldChange using the drop-down sort feature, as shown above the target list in Figure 10. He finds several with a high log2foldChange, meaning the expression level changes drastically between disease and non-disease conditions, and a low p-value, meaning the results are statistically significant. Also of note, for lists of associated targets, an additional column has been added to the target card, showing disease association details, such as the evidence associated with a given data source.
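The filter-then-sort workflow of this use case can be sketched outside the web interface as well. The following is a minimal illustration on made-up records: the `Association` class, the `differentially_expressed` helper, the placeholder target names and the 0.05 p-value threshold are all assumptions, not part of Pharos.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    target: str
    source: str
    log2_fold_change: float
    p_value: float


def differentially_expressed(associations, source="Expression Atlas", max_p=0.05):
    """Keep associations from one data source, then sort by effect size.

    Sorting uses the absolute log2 fold change so that strongly down-regulated
    targets rank alongside strongly up-regulated ones (an assumption here).
    """
    hits = [a for a in associations if a.source == source and a.p_value <= max_p]
    return sorted(hits, key=lambda a: abs(a.log2_fold_change), reverse=True)


# Made-up records; the values carry no biological meaning.
associations = [
    Association("TARGET_A", "Expression Atlas", 3.2, 0.001),
    Association("TARGET_B", "Expression Atlas", -4.1, 0.01),
    Association("TARGET_C", "Expression Atlas", 5.0, 0.4),
    Association("TARGET_D", "DisGeNET", 2.0, 0.001),
]
ranked = differentially_expressed(associations)
```

Here TARGET_C is dropped despite its large fold change because its p-value exceeds the threshold, and TARGET_D is dropped because it comes from a different data source.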

Finding experimental tools. A researcher has found that a voltage-gated potassium channel, KCNS2, is expressed in the nucleus they are working on. Its role is unknown. The target details page for KCNS2 lists three approved drugs (e.g. amifampridine), which are known to block this channel. The IDG Resources show a cell line and a mouse genetic construct, which can be ordered from the IDG Consortium (UCSF in this case). The researcher is now able to design a set of in vitro and in vivo experiments to help elucidate the role of the channel.

Machine learning based on PPI networks to prioritize targets for a disease. A novel target ranking approach could be developed that relies on deep network representation learning of a PPI network annotated with disease-specific knowledge attributes derived from TCRD/Pharos. This involves mapping the enriched PPI network into a feature space using the neural network framework Gat2Vec (45). Gat2Vec employs a shallow neural network model to facilitate joint learning on the structural and attribute contexts of a given network. In this case, the PPIs provide the structural context and the disease-related knowledge provides the attribute contexts. The feature space generated can be used to develop machine learning models that can predict the ranking of protein targets in the context of a disease. Additional protocols for extracting available data for specific targets of interest, as well as the differences in knowledge availability between understudied and more studied targets, are discussed elsewhere (46).

DISCUSSION

With the newly added data, interactive visualizations and enhanced search capabilities, TCRD and Pharos in their current forms serve as IDG resources that facilitate better exploration of the dark and understudied regions of the human genome. The central idea of the resource continues to be enriching knowledge around human targets and monitoring their therapeutic development levels. One area that received significant attention in the current update is the search mechanism, which was upgraded with the implementation of a GraphQL API. To improve user interaction with the new API, example queries have been made available on the website. The TDL descriptions panel and the target expression anatomograms are significantly improved components of the Pharos web interface. Further, most data that have been recently integrated into TCRD facilitate the application of artificial intelligence/machine learning (AI/ML) techniques for target evaluation and drug repositioning (47), and are accessible in an ML-ready format. Most of the above-described developments are based on feedback obtained directly from website visitors, several demo sessions and presentations at scientific meetings.

Ongoing work focuses on utilizing the AI/ML-ready data, such as target-disease, target-phenotype and PPI data, to develop AI/ML models for prioritization of dark targets and better understanding of disease biology. Currently, we are evaluating ways to aggregate experimental data uncertainties, specifically data quality and reliability (48), similar to the in silico toxicology protocols (49). We continue to extend our efforts in enhancing the disease page to list associated targets, rank them by the consensus strength of association to a disease, and further facilitate filtering to narrow down to targets of potential interest to researchers. The database can be expected to be further enriched with new data and data types that contribute to the ultimate goal of illuminating the dark targets.

DATA AVAILABILITY

TCRD is an open source database that can be accessed at http://juniper.health.unm.edu/tcrd/.

Pharos is an open source web platform that can be accessed at https://pharos.nih.gov/.

The Pharos resources have been split into frontend and backend repositories.

The front end code can be found on GitHub: https://github.com/ncats/pharos_frontend.

The backend GraphQL implementation code can be found on GitHub: https://github.com/ncats/pharos-graphql-server.

GraphQL resource documentation can be found on Pharos: https://pharos.nih.gov/api.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

Rajarshi Guha, for his immeasurable time spent not only working on and promoting Pharos 1.0, but also for his mentorship as we began the process of revamping Pharos; Oleg Ursu, for developing the first machine learning ready version of TCRD; Gorka Lasso, for supplying the P-HIPSTer virus–human protein interactions dataset.

FUNDING

National Institutes of Health (NIH) Common Fund [CA224370 to S.L.M., C.G.B., L.L.J., A.W., G.B., J.J.Y., T.I.O.; U24TR002278 to D.V., A.K., J.H., A.W., S.C.S. and T.I.O.]; Novo Nordisk Foundation [NNF14CC0001 to L.J.J.]; Intramural Research Program, Division of Preclinical Innovation, NIH NCATS (to D.T.N., K.K., T.S., N.S., P.D., E.W., A.S.). Funding for open access charge: NIH [CA224370].

Conflict of interest statement. L.J.J. is co-founder and scientific advisory board member of Intomics A/S. T.I.O. has received honoraria or consulted for Abbott, AstraZeneca, Chiron, Genentech, Infinity Pharmaceuticals, Merz Pharmaceuticals, Merck Darmstadt, Mitsubishi Tanabe, Novartis, Ono Pharmaceuticals, Pfizer, Roche, Sanofi and Wyeth. He is on the scientific advisory board of ChemDiv Inc. and InSilico Medicine.

REFERENCES

1. Edwards,A.M., Isserlin,R., Bader,G.D., Frye,S.V., Willson,T.M. and Yu,F.H. (2011) Too many roads not taken. Nature, 470, 163–165.

2. Nguyen,D.-T., Mathias,S., Bologa,C., Brunak,S., Fernandez,N., Gaulton,A., Hersey,A., Holmes,J., Jensen,L.J., Karlsson,A. et al. (2017) Pharos: Collating protein information to shed light on the druggable genome. Nucleic Acids Res., 45, D995–D1002.


3. Oprea,T.I., Bologa,C.G., Brunak,S., Campbell,A., Gan,G.N., Gaulton,A., Gomez,S.M., Guha,R., Hersey,A., Holmes,J. et al. (2018) Unexplored therapeutic opportunities in the human genome. Nat. Rev. Drug Discov., 17, 317–332.

4. Galperin,M.Y., Fernandez-Suarez,X.M. and Rigden,D.J. (2017) The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes. Nucleic Acids Res., 45, D1–D11.

5. UniProt Consortium (2019) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res., 47, D506–D515.

6. Dickinson,M.E., Flenniken,A.M., Ji,X., Teboul,L., Wong,M.D., White,J.K., Meehan,T.F., Weninger,W.J., Westerberg,H., Adissu,H. et al. (2016) High-throughput discovery of novel developmental phenotypes. Nature, 537, 508–514.

7. Smith,J.R., Hayman,G.T., Wang,S.-J., Laulederkind,S.J.F., Hoffman,M.J., Kaldunski,M.L., Tutaj,M., Thota,J., Nalabolu,H.S., Ellanki,S.L.R. et al. (2020) The Year of the Rat: The Rat Genome Database at 20: a multi-species knowledgebase and analysis platform. Nucleic Acids Res., 48, D731–D742.

8. Schriml,L.M., Arze,C., Nadendla,S., Chang,Y.-W.W., Mazaitis,M., Felix,V., Feng,G. and Kibbe,W.A. (2012) Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res., 40, D940–D946.

9. Hayman,G.T., Laulederkind,S.J.F., Smith,J.R., Wang,S.-J., Petri,V., Nigam,R., Tutaj,M., De Pons,J., Dwinell,M.R. and Shimoyama,M. (2016) The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database. Database, 2016, baw034.

10. Smith,C.L. and Eppig,J.T. (2009) The mammalian phenotype ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip. Rev. Syst. Biol. Med., 1, 390–399.

11. Buniello,A., MacArthur,J.A.L., Cerezo,M., Harris,L.W., Hayhurst,J., Malangone,C., McMahon,A., Morales,J., Mountjoy,E., Sollis,E. et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47, D1005–D1012.

12. GTEx Consortium (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.

13. Thul,P.J. and Lindskog,C. (2018) The human protein atlas: a spatial map of the human proteome. Protein Sci., 27, 233–244.

14. Palasca,O., Santos,A., Stolte,C., Gorodkin,J. and Jensen,L.J. (2018) TISSUES 2.0: an integrative web resource on mammalian tissue expression. Database, 2018, bay003.

15. Kim,M.-S., Pinto,S.M., Getnet,D., Nirujogi,R.S., Manda,S.S., Chaerkady,R., Madugundu,A.K., Kelkar,D.S., Isserlin,R., Jain,S. et al. (2014) A draft map of the human proteome. Nature, 509, 575–581.

16. Barretina,J., Caponigro,G., Stransky,N., Venkatesan,K., Margolin,A.A., Kim,S., Wilson,C.J., Lehar,J., Kryukov,G.V., Sonkin,D. et al. (2012) The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483, 603–607.

17. Stathias,V., Turner,J., Koleti,A., Vidovic,D., Cooper,D., Fazel-Najafabadi,M., Pilarczyk,M., Terryn,R., Chung,C., Umeano,A. et al. (2020) LINCS Data Portal 2.0: next generation access point for perturbation-response signatures. Nucleic Acids Res., 48, D431–D439.

18. Papatheodorou,I., Moreno,P., Manning,J., Fuentes,A.M.-P., George,N., Fexova,S., Fonseca,N.A., Fullgrabe,A., Green,M., Huang,N. et al. (2020) Expression Atlas update: from tissues to single cells. Nucleic Acids Res., 48, D77–D83.

19. Szklarczyk,D., Gable,A.L., Lyon,D., Junge,A., Wyder,S., Huerta-Cepas,J., Simonovic,M., Doncheva,N.T., Morris,J.H., Bork,P. et al. (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res., 47, D607–D613.

20. Lasso,G., Mayer,S.V., Winkelmann,E.R., Chu,T., Elliot,O., Patino-Galindo,J.A., Park,K., Rabadan,R., Honig,B. and Shapira,S.D. (2019) A structure-informed atlas of human-virus interactions. Cell, 178, 1526–1541.

21. Gaulton,A., Hersey,A., Nowotka,M., Bento,A.P., Chambers,J., Mendez,D., Mutowo,P., Atkinson,F., Bellis,L.J., Cibrian-Uhalte,E. et al. (2017) The ChEMBL database in 2017. Nucleic Acids Res., 45, D945–D954.

22. Ursu,O., Holmes,J., Bologa,C.G., Yang,J.J., Mathias,S.L., Stathias,V., Nguyen,D.-T., Schurer,S. and Oprea,T. (2019) DrugCentral 2018: an update. Nucleic Acids Res., 47, D963–D970.

23. Pletscher-Frankild,S., Palleja,A., Tsafou,K., Binder,J.X. and Jensen,L.J. (2015) DISEASES: text mining and data integration of disease-gene associations. Methods, 74, 83–89.

24. Santos,R., Ursu,O., Gaulton,A., Bento,A.P., Donadi,R.S., Bologa,C.G., Karlsson,A., Al-Lazikani,B., Hersey,A., Oprea,T.I. et al. (2017) A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov., 16, 19–34.

25. Ursu,O., Glick,M. and Oprea,T. (2019) Novel drug targets in 2018. Nat. Rev. Drug Discov., 18, 328.

26. Avram,S., Halip,L., Curpan,R. and Oprea,T.I. (2020) Novel drug targets in 2019. Nat. Rev. Drug Discov., 19, 300.

27. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.

28. Pafilis,E., Frankild,S.P., Fanini,L., Faulwetter,S., Pavloudi,C., Vasileiadou,A., Arvanitidis,C. and Jensen,L.J. (2013) The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text. PLoS One, 8, e65390.

29. Bjorling,E. and Uhlen,M. (2008) Antibodypedia, a portal for sharing antibody and antigen validation data. Mol. Cell. Proteomics, 7, 2028–2037.

30. Watkins,X., Garcia,L.J., Pundir,S., Martin,M.J. and UniProt Consortium (2017) ProtVista: visualization of protein sequence annotations. Bioinformatics, 33, 2040–2041.

31. Piñero,J., Ramírez-Anguita,J.M., Sauch-Pitarch,J., Ronzano,F., Centeno,E., Sanz,F. and Furlong,L.I. (2020) The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res., 48, D845–D855.

32. Jia,J., An,Z., Ming,Y., Guo,Y., Li,W., Liang,Y., Guo,D., Li,X., Tai,J., Chen,G. et al. (2018) eRAM: encyclopedia of rare disease annotations for precision medicine. Nucleic Acids Res., 46, D937–D943.

33. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

34. Rose,A.S., Bradley,A.R., Valasatava,Y., Duarte,J.M., Prlic,A. and Rose,P.W. (2018) NGL viewer: web-based molecular graphics for large complexes. Bioinformatics, 34, 3755–3758.

35. Rose,A.S. and Hildebrand,P.W. (2015) NGL Viewer: a web application for molecular visualization. Nucleic Acids Res., 43, W576–W579.

36. Li,W., Moore,M.J., Vasilieva,N., Sui,J., Wong,S.K., Berne,M.A., Somasundaran,M., Sullivan,J.L., Luzuriaga,K., Greenough,T.C. et al. (2003) Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature, 426, 450–454.

37. Hoffmann,M., Kleine-Weber,H., Schroeder,S., Kruger,N., Herrler,T., Erichsen,S., Schiergens,T.S., Herrler,G., Wu,N.-H., Nitsche,A. et al. (2020) SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell, 181, 271–280.

38. Mungall,C.J., Torniai,C., Gkoutos,G.V., Lewis,S.E. and Haendel,M.A. (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biol., 13, R5.

39. Huttlin,E.L., Bruckner,R.J., Navarrete-Perea,J., Cannon,J.R., Baltier,K., Gebreab,F., Gygi,M.P., Thornock,A., Zarraga,G., Tam,S. et al. (2020) Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. bioRxiv doi: https://doi.org/10.1101/2020.01.19.905109, 19 January 2020, preprint: not peer reviewed.

40. Jassal,B., Matthews,L., Viteri,G., Gong,C., Lorente,P., Fabregat,A., Sidiropoulos,K., Cook,J., Gillespie,M., Haw,R. et al. (2020) The reactome pathway knowledgebase. Nucleic Acids Res., 48, D498–D503.

41. Cannon,D.C., Yang,J.J., Mathias,S.L., Ursu,O., Mani,S., Waller,A., Schurer,S.C., Jensen,L.J., Sklar,L.A., Bologa,C.G. et al. (2017) TIN-X: target importance and novelty explorer. Bioinformatics, 33, 2601–2603.

42. Oprea,T.I. (2019) Exploring the dark genome: implications for precision medicine. Mamm. Genome, 30, 192–200.

43. Kim,S., Chen,J., Cheng,T., Gindulyte,A., He,J., He,S., Li,Q., Shoemaker,B.A., Thiessen,P.A., Yu,B. et al. (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res., 47, D1102–D1109.

44. Armstrong,J.F., Faccenda,E., Harding,S.D., Pawson,A.J., Southan,C., Sharman,J.L., Campo,B., Cavanagh,D.R., Alexander,S.P.H., Davenport,A.P. et al. (2020) The IUPHAR/BPS guide to Pharmacology in 2020: extending immunopharmacology content and introducing the IUPHAR/MMV guide to Malaria Pharmacology. Nucleic Acids Res., 48, D1006–D1021.

45. Sheikh,N., Kefato,Z. and Montresor,A. (2019) gat2vec: representation learning for attributed graphs. Computing, 101, 187–209.

46. Sheils,T., Mathias,S.L., Siramshetty,V.B., Bocci,G., Bologa,C.G., Yang,J.J., Waller,A., Southall,N., Nguyen,D.-T. and Oprea,T.I. (2020) How to illuminate the druggable genome using pharos. Curr. Protoc. Bioinformatics, 69, e92.

47. Levin,J.M., Oprea,T.I., Davidovich,S., Clozel,T., Overington,J.P., Vanhaelen,Q., Cantor,C.R., Bischof,E. and Zhavoronkov,A. (2020) Artificial intelligence, drug repurposing and peer review. Nat. Biotechnol., 38, 1127–1131.

48. Klimisch,H.J., Andreae,M. and Tillmann,U. (1997) A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul. Toxicol. Pharmacol., 25, 1–5.

49. Myatt,G.J., Ahlberg,E., Akahori,Y., Allen,D., Amberg,A., Anger,L.T., Aptula,A., Auerbach,S., Beilke,L., Bellion,P. et al. (2018) In silico toxicology protocols. Regul. Toxicol. Pharmacol., 96, 1–17.


Figure 5. Target details view for ACE2 showing the improved Protein Data Bank data viewer (A) and predicted viral interactions (B).

marking is summarized in Supplementary Table S1. Documentation for the GraphQL format, as well as an interactive sandbox with several sample queries, can be found at https://pharos.nih.gov/api.
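As a rough illustration of what a GraphQL request looks like, the sketch below builds the JSON body that a GraphQL client would send. It only constructs the payload and does not contact any server. The query shape and the field names (`target`, `sym`, `name`, `tdl`, `fam`) are assumptions made for illustration and should be checked against the documentation and sandbox mentioned above; `build_request_body` is a hypothetical helper.

```python
import json

# Assumed query shape and field names; verify against the Pharos API docs.
TARGET_QUERY = """
query TargetSummary($sym: String!) {
  target(q: {sym: $sym}) {
    name
    sym
    tdl
    fam
  }
}
"""


def build_request_body(symbol):
    """Serialize a GraphQL query and its variables as a JSON request body."""
    return json.dumps({"query": TARGET_QUERY, "variables": {"sym": symbol}})


body = build_request_body("ACE2")
```

Passing the gene symbol as a variable, rather than formatting it into the query string, keeps the query text constant and avoids quoting problems.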

Browse pages

Filter functionality has been expanded, and users are able to examine the entire list of values for each filter, as well as search for text within the filter values. In the case of numeric values, such as novelty and PubMed score, a range slider allows users to refine results. All filters also have a '?' help button that allows users to view a quick definition of the filter, or to visit the original source, if desired. These features are displayed in Figure 2. In addition, sub-lists are generated on entity (target/disease/ligand) pages that can be viewed in their respective list browsers. It is therefore possible to browse the list of diseases or ligands associated with a specific target, or targets associated with a disease or ligand. For associated diseases, additional numeric filters such as association score and interaction scores are available, as well as ligand affinity measurements for associated ligand lists.

By registering, users are able to select targets of interest and save them as custom lists, which are added to the main filter panel and are always available. This allows users to further refine a filtered list of targets, as well as view how this new list may be further filtered, or examine the makeup of the list by the various filter categories.

Figure 6. Target details view for ACE2 showing the tissue expression section with highlighted tissues (A) and protein to protein interaction section (B). The frequency of updating for resources integrated in TCRD differs. Text-mined sources are updated more frequently, and changes in the scientific literature allow us to track certain associations sooner. Therefore, selecting one of these sources (as displayed) will show that ACE2 is expressed in the lungs.

Target details pages

In improving the target details pages, we relied on user interviews that we had conducted to determine the optimal layout and ordering of data on the page. In the newest version of Pharos, target detail pages start off with synonyms, identifiers and a broad overview of the knowledge about the target, as shown in Figure 3. Next, the user is able to examine the TDL criteria in relation to the specific target. Drug and ligand data come next, followed by disease association data. Target expression and interaction data follow, with publication data, amino acid sequence information (including the ProtVista (30) graphical representation) and filters for related targets completing the page. Sections where no data are available, such as drug and ligand browsers for Tdark and Tbio targets, are automatically removed. The navigation panel is always visible on the left-hand side, allowing users to easily jump to the section of interest. Most data panels have been improved by adding tooltips to further explain properties, and a help panel that contains additional information, such as definitions, explanatory articles or raw data. Additional panels have been added to display additional data.

Figure 7. Disease details page for Huntington's disease showing Disease Ontology description and hierarchy (A) and TIN-X plot showing novelty targets and their importance, mapped with a log scale (B).

TDL descriptions. One addition to the previous Pharos UI is a description panel. While the various TDL criteria are well documented, it was not apparent to users how and which TDL criteria are met by a given target. This section displays which criteria have been met, as well as the level or score. A highlighted checkbox indicates which TDL criteria have been met. This provides valuable evidence that illustrates to the user how TCRD generated this ranking. For Tdark and Tbio, this is also a useful indicator of what criteria are still deficient and how close to the cutoff they are, potentially driving research in those areas. This panel is also featured in Figure 3.
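The idea behind the description panel, showing for each criterion whether it is met and how far a value is from its cutoff, can be sketched as follows. This is a hypothetical illustration: the `Criterion` class, the `describe` helper, the criterion names and every numeric value and cutoff are placeholders, not the official TDL thresholds or the Pharos implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    name: str
    value: float
    cutoff: float  # placeholder threshold, not an official TDL cutoff

    @property
    def met(self):
        return self.value >= self.cutoff

    @property
    def shortfall(self):
        """Distance below the cutoff; zero once the criterion is met."""
        return max(0.0, self.cutoff - self.value)


def describe(criteria):
    """Render one checkbox-style line per criterion."""
    lines = []
    for c in criteria:
        box = "[x]" if c.met else "[ ]"
        lines.append(
            f"{box} {c.name}: {c.value:g} (cutoff {c.cutoff:g}, short by {c.shortfall:g})"
        )
    return lines


# Made-up values for a placeholder target.
criteria = [
    Criterion("PubMed score", 4.0, 5.0),
    Criterion("Gene RIF count", 7, 3),
    Criterion("Antibody count", 20, 50),
]
report = describe(criteria)
```

Reporting the shortfall alongside the checkbox is what makes the panel useful for understudied targets, since it shows how much additional evidence would move a target past a given cutoff.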

IDG resources. A feature of the IDG program that has expanded since the initial publication is that of new data or reagents, which are generated by IDG Consortium members. Where data or reagents are available, a browseable section is displayed in Pharos, allowing users to navigate to the data set, and even order them in the case of physical resources, as shown in Figure 4. For targets with mouse expression data, we have incorporated our anatomogram image (discussed below), using male/female and brain mouse images. We refer the reader to Supplementary Materials for an in-depth discussion of IDG Consortium resources available in Pharos.

Improved liganddrug section Drug and ligand browsingis now done via a pageable list Users can now page throughthe entire list of approved drugs or active ligands or openthe entire list in the browse view allowing for filtering ofthese lists while specifying the number of targets each lig-and is active on

Disease associations The disease associations panel hasbeen greatly expanded both in the amount of featured re-sources now incorporating DisGeNet (31) and eRAM (32)as well as in size and placement The source of each dis-ease association is displayed as a collapsible panel that bun-dles multiple sources for the same disease Upon expandingthis panel users are able to examine the evidence used tomake this association This allows users to discover prove-nance an essential attribute of data aggregated in TCRD

Dow

nloaded from httpsacadem

icoupcomnararticle49D

1D13345958490 by guest on 20 July 2022

Nucleic Acids Research 2021 Vol 49 Database issue D1341

Figure 8 Ligand details page for acetazolamide shows ligand description information as well as synonyms and identifiers (A) Target activity is shownwith expanded activity for CA2 (B)

It is also possible to view the list of associated diseases asa browseable list again allowing for filtering and more re-fined data analysis

PDB visualizations While PDB identifiers have alwaysbeen available in Pharos they were previously displayed asa list of linkouts with no additional information (33) InPharos 30 the PDB section was expanded and redesignedto display three-dimensional (3D) structures for proteinsincluding bound ligands where available This 3D visualiza-tion employing NGL Viewer (3435) is highly interactiveallowing users to drag zoom pan and spin the structure tothoroughly examine it This interface also enables users tobrowse the PDB identifiers and structures associated witha given target Clicking on a table row loads the respectivePDB entry and displays the ligands shown in the structureOther details include the reference source for the id as wellas the method used to generate the structure This feature ishighlighted in Figure 5

Predicted viral interactions Predicted viral-human proteininteractions from the P-HIPSTer atlas are displayed in aseparate panel that shows a list of viruses and their as-sociated taxonomic data specifically for the targeted hu-man protein Each virus also lists the viral proteins thatthe human protein is predicted to interact with in addi-tion to a likelihood ratio which measures the strength ofthe sequence- and structure-based prediction These fea-tures are shown in Figure 5 using ACE2 the angiotensin

converting enzyme 2 as example ACE2 is the functionalentry receptor for both severe acute respiratory syndrome(SARS) coronaviruses SARS-CoV (36) and SARS-CoV-2(37)

Target expression data The target expression data panelhas also been expanded as shown in Figure 6 We replacedthe static human figure with an expanded interactive anato-mogram (18) This visualization contains both male and fe-male figures allowing sex-specific tissue information (egfrom GTEx) to be highlighted Users can toggle the brainimage separately for more fine-grained brain tissue expres-sion visualization All figures are zoomable and pannableto improve visibility When possible all source tissues havebeen mapped to standardized Uberon (38) identifiers en-abling us to merge repetitive tissues Expression data can beviewed by source and tissue values are searchable Click-ing on a tissue panel like the disease associations sectionallows users to view detailed provenance of the data bothshowing the source and relevant confidence values for thetissue expression Gradient coloration allows us to displaythe tissue expression level either a qualitative value (lowmedium high) or a numeric value scale depending onthe data source giving users a quick visualization of thisdata We note that several sources have an overlap in dataand content however they may also present unique dataPharos displays the variety between sources which can helpusers decide what sources to focus on It also illustrates thatthe text-mining-based resources tend to be updated more

Dow

nloaded from httpsacadem

icoupcomnararticle49D

1D13345958490 by guest on 20 July 2022

D1342 Nucleic Acids Research 2021 Vol 49 Database issue

Figure 9 Target list described in Target List Search use case (A) and target list described in Binding partner search use case (B) Both panels show selectedfilters including the ability to filter target lists by IDG featured sub-lists

often than frequently accessed scientific databases that ag-gregate experimental data When available cell lines and or-thologs are displayed with expandable evidence panels

Proteinndashprotein interactions The introduction of datafrom Bioplex (39) STRING and Reactome (40) has allowedus to display rich PPI data and allows users to examine howmultiple targets may interact as shown in Figure 6 Thispanel consists of a pageable list of targets known to interactwith the current target These targets display the illumina-tion graph which gives users a quick visual representationof the level and types of knowledge available for the tar-get The originating data source is also displayed as well asthe confidence of this interaction This list of PPIs may beviewed either directly in STRING or viewed in the browsepage as a filterable list

Disease details pages

While Pharos remains target-centric, it is equally important to be able to examine diseases in relation to their associated targets. As such, Pharos has always included diseases as a browseable list, as well as a disease details page. Pharos 3.0 expands on the data collected and displayed about diseases. When available, a description from the Disease Ontology is shown. The hierarchy of each specific disease is also shown, allowing users to view parent or child diseases or syndromes, as well as the siblings of each disease. In addition, a breakdown of the targets associated with each disease is shown, which can also be opened in the target browse page, allowing users to filter the list of associated targets. The target-disease Novelty score is shown in the Target-Importance Novelty eXplorer (TIN-X) (41) panel. Similar to the target-based TIN-X view, this is a scatterplot of novel targets related to the disease of interest. Novelty estimates the scarcity of publications about a target, whereas Importance estimates the strength of the association between that target and a specific disease (42). The X-axis shows the Novelty of the target-disease relationship, while the Y-axis shows the Importance of that target to the disease, both on a log10 scale. This feature is displayed in Figure 7. When available, help definition buttons are also shown for each section of data.

Figure 10. List of targets associated with asthma, filtered by Data Source: Expression Atlas and by Expression Atlas log2foldchange value (A). Also shown is the Disease Association Details column with additional association-specific details (B).

Ligand details pages

The ligand details page has been expanded to include a description of the ligand, where available. A 2D depiction of the ligand chemical structure is shown, together with synonyms and identifiers from PubChem (43), Guide to Pharmacology (44), ChEMBL and DrugCentral. Activity values are displayed in collapsible panels and sorted by target. By clicking on a target header, users are able to examine activity values for that ligand on the selected target. Activity type, value and mechanism of action are shown in addition to the reference publication for each activity, allowing users to view the source directly. Figure 8 illustrates this page display.

Use cases

Target list search. A neuroscientist who studies ion channels is seeking understudied proteins for a student project. Filtering to targets listed by the IDG Consortium as targets of interest (https://druggablegenome.net/IDGProteinList) in the ion channel family, with a nervous system phenotype as described by the Mouse Phenotype Ontology from the Jackson Laboratory, yields 16 potential targets of interest. The refined list view is shown in Figure 9.

Binding partner search. A scientist is studying the function of CAMKK2 and hypothesizing on its role in behavior. From the CAMKK2 target details page, she clicks ‘Explore Interacting Targets’ to open the potential binding partners in the Target List. Selecting Tclin, she filters the list to 10 potential binding partners that have approved drugs with new therapeutic effects and drug-related side effects that can assist in her research. She is also able to save this list of targets as a custom list, allowing her to easily revisit and analyze these targets. This workflow is also shown in Figure 9.

Disease-based search. A researcher studying asthma searches for this top-level disease term and enters the asthma disease details page. He wants to find proteins that are differentially expressed in asthma patients. Starting from the asthma disease details page, he clicks ‘Explore Associated Targets’ to see 470 potential targets. Filtering to those with data from the Expression Atlas (18), using the ‘Expression Atlas’ value from the ‘Disease Data Source’ filter, yields 78 targets, which can then be sorted by log2foldChange using the drop-down sort feature shown above the target list in Figure 10. He finds several with a high log2foldChange, meaning that the expression level changes drastically between disease and non-disease conditions, and a low P-value, meaning that the results are statistically significant. Also of note, for lists of associated targets, an additional column has been added to the target card showing disease association details, such as the evidence associated with a given data source.
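The filter-and-sort step in this use case can be sketched outside the web interface as follows. This is only an illustration under stated assumptions: the `Association` record, the 0.05 P-value cutoff, the choice to rank by absolute log2 fold change (so that strongly down-regulated targets are not missed) and the toy values are all hypothetical and are not part of Pharos.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """Hypothetical record of one target-disease differential expression result."""
    target: str
    log2_fold_change: float
    p_value: float


def rank_differential(associations, max_p_value=0.05):
    """Keep statistically significant results, largest expression change first."""
    significant = [a for a in associations if a.p_value <= max_p_value]
    return sorted(significant, key=lambda a: abs(a.log2_fold_change), reverse=True)


# Toy data with placeholder target names (not real results).
toy = [
    Association("TARGET_A", 0.4, 0.20),
    Association("TARGET_B", -3.2, 0.001),
    Association("TARGET_C", 2.1, 0.01),
    Association("TARGET_D", 5.0, 0.30),
]
ranked = rank_differential(toy)
print([a.target for a in ranked])
```

In this toy set, TARGET_D has the largest fold change but is excluded by the P-value cutoff, which is the reason for looking at both values together.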

Finding experimental tools. A researcher has found that a voltage-gated potassium channel, KCNS2, is expressed in the nucleus they are working on. Its role is unknown. The target details page for KCNS2 lists three approved drugs (e.g. amifampridine), which are known to block this channel. The IDG Resources show a cell line and a mouse genetic construct, which can be ordered from the IDG Consortium (UCSF in this case). The researcher is now able to design a set of in vitro and in vivo experiments to help elucidate the role of the channel.

Machine learning based on PPI networks to prioritize targets for a disease. A novel target ranking approach could be developed that relies on deep network representation learning of a PPI network annotated with disease-specific knowledge attributes derived from TCRD/Pharos. This involves mapping the enriched PPI network into a feature space using the neural network framework Gat2Vec (45). Gat2Vec employs a shallow neural network model to facilitate joint learning on the structural and attribute contexts of a given network. In this case, the PPIs provide the structural context and the disease-related knowledge provides the attribute contexts. The feature space generated can be used to develop machine learning models that predict the ranking of protein targets in the context of a disease. Additional protocols for extracting available data for specific targets of interest, as well as the differences in knowledge availability between understudied and more studied targets, are discussed elsewhere (46).
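To make the distinction between structural and attribute contexts concrete, the sketch below generates the two kinds of random-walk contexts from a toy network. It is a simplified illustration loosely following the idea described for Gat2Vec, not the Gat2Vec implementation: the function names, parameters and toy data are hypothetical, and the embedding step (a SkipGram-style model trained on both sets of walks) is deliberately left out.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Undirected adjacency lists (sorted so that sampling is reproducible)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def random_walk(adj, start, n_nodes, rng):
    """Uniform random walk that visits at most n_nodes nodes."""
    walk = [start]
    while len(walk) < n_nodes:
        neighbours = adj.get(walk[-1], [])
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk


def build_contexts(ppi_edges, protein_attrs, walks_per_node=2, walk_len=4, seed=0):
    """Return (structural_walks, attribute_walks) over protein nodes.

    Structural walks follow PPI edges. Attribute walks run on a bipartite
    protein-attribute graph, and only the protein nodes are kept, so proteins
    that share an attribute co-occur even if they do not interact directly.
    """
    rng = random.Random(seed)
    proteins = {p for edge in ppi_edges for p in edge} | set(protein_attrs)
    structural = build_adjacency(ppi_edges)
    bipartite = build_adjacency(
        (p, "attr::" + a) for p, attrs in protein_attrs.items() for a in attrs
    )
    structural_walks, attribute_walks = [], []
    for p in sorted(proteins):
        for _ in range(walks_per_node):
            if p in structural:
                structural_walks.append(random_walk(structural, p, walk_len, rng))
            if p in bipartite:
                # 2 * walk_len - 1 bipartite steps yield walk_len protein nodes.
                walk = random_walk(bipartite, p, 2 * walk_len - 1, rng)
                attribute_walks.append([n for n in walk if n in proteins])
    return structural_walks, attribute_walks


# Toy data with placeholder names (not real interactions or annotations).
ppi = [("P1", "P2"), ("P2", "P3"), ("P3", "P4")]
attrs = {"P1": ["disease_X"], "P3": ["disease_X", "phenotype_Y"], "P4": ["phenotype_Y"]}
structural_walks, attribute_walks = build_contexts(ppi, attrs)
print(structural_walks[0], attribute_walks[0])
```

The two lists of walks could then be passed to any word-embedding style model to obtain the feature space; which model and hyperparameters to use is outside the scope of this sketch.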

DISCUSSION

With the newly added data, interactive visualizations and enhanced search capabilities, TCRD and Pharos in their current forms serve as IDG resources that facilitate better exploration of the dark and understudied regions of the human genome. The central idea of the resource continues to be enriching knowledge around human targets and monitoring their therapeutic development levels. One area that received significant attention in the current update is the search mechanism, which was upgraded with the implementation of a GraphQL API. To improve user interaction with the new API, example queries have been made available on the website. The TDL descriptions panel and the target expression anatomograms are significantly improved components of the Pharos web interface. Further, most data that have been recently integrated into TCRD facilitate the application of artificial intelligence/machine learning (AI/ML) techniques for target evaluation and drug repositioning (47), and are accessible in an ML-ready format. Most of the developments described above are based on feedback obtained directly from website visitors, several demo sessions and presentations at scientific meetings.
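As a rough illustration of what a programmatic GraphQL request might look like, the sketch below builds (but does not send) an HTTP POST carrying a query and its variables. The endpoint URL and the query fields (`target`, `q`, `sym`, `name`, `tdl`, `fam`) are assumptions made for this example and should be checked against the example queries and schema documentation on the Pharos website before use.

```python
import json
import urllib.request

# Assumed endpoint; confirm against the Pharos API documentation.
ENDPOINT = "https://pharos-api.ncats.io/graphql"

# Assumed query shape and field names; confirm against the published schema.
QUERY = """
query TargetSummary($sym: String) {
  target(q: {sym: $sym}) {
    name
    sym
    tdl
    fam
  }
}
"""


def build_request(symbol, endpoint=ENDPOINT):
    """Build a standard GraphQL-over-HTTP POST request for one gene symbol."""
    payload = json.dumps({"query": QUERY, "variables": {"sym": symbol}})
    return urllib.request.Request(
        endpoint,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch(request, timeout=30):
    """Send the request and decode the JSON response (requires network access)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


request = build_request("ACE2")
print(request.get_method(), request.full_url)
```

Separating `build_request` from `fetch` keeps the query construction testable offline; calling `fetch(request)` performs the actual network call.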

Ongoing work focuses on utilizing the AI/ML-ready data, such as target-disease, target-phenotype and PPI data, to develop AI/ML models for prioritization of dark targets and better understanding of disease biology. Currently, we are evaluating ways to aggregate experimental data uncertainties, specifically data quality and reliability (48), similar to the in silico toxicology protocols (49). We continue to extend our efforts in enhancing the disease page to list associated targets, rank them by the consensus strength of association to a disease, and further facilitate filtering to narrow down to targets of potential interest to researchers. The database can be expected to be further enriched with new data and data types that contribute to the ultimate goal of illuminating the dark targets.

DATA AVAILABILITY

TCRD is an open source database that can be accessed at: http://juniper.health.unm.edu/tcrd/

Pharos is an open source web platform that can be accessed at: https://pharos.nih.gov/

The Pharos resources have been split into frontend and backend repositories.

The frontend code can be found on GitHub: https://github.com/ncats/pharos_frontend

The backend GraphQL implementation code can be found on GitHub: https://github.com/ncats/pharos-graphql-server

GraphQL resource documentation can be found on Pharos: https://pharos.nih.gov/api

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

We thank Rajarshi Guha for his immeasurable time spent not only working on and promoting Pharos 1.0, but also for his mentorship as we began the process of revamping Pharos; Oleg Ursu for developing the first machine learning ready version of TCRD; and Gorka Lasso for supplying the P-HIPSTer virus–human protein interactions dataset.

FUNDING

National Institutes of Health (NIH) Common Fund [CA224370 to S.L.M., C.G.B., L.L.J., A.W., G.B., J.J.Y., T.I.O.; U24TR002278 to D.V., A.K., J.H., A.W., S.C.S. and T.I.O.]; Novo Nordisk Foundation [NNF14CC0001 to L.J.J.]; Intramural Research Program, Division of Preclinical Innovation, NIH NCATS (to D.T.N., K.K., T.S., N.S., P.D., E.W., A.S.). Funding for open access charge: NIH [CA224370].

Conflict of interest statement. L.J.J. is co-founder and scientific advisory board member of Intomics A/S. T.I.O. has received honoraria or consulted for Abbott, AstraZeneca, Chiron, Genentech, Infinity Pharmaceuticals, Merz Pharmaceuticals, Merck Darmstadt, Mitsubishi Tanabe, Novartis, Ono Pharmaceuticals, Pfizer, Roche, Sanofi and Wyeth. He is on the scientific advisory board of ChemDiv Inc. and InSilico Medicine.

REFERENCES

1. Edwards AM, Isserlin R, Bader GD, Frye SV, Willson TM and Yu FH (2011) Too many roads not taken. Nature, 470, 163–165.

2. Nguyen D-T, Mathias S, Bologa C, Brunak S, Fernandez N, Gaulton A, Hersey A, Holmes J, Jensen LJ, Karlsson A et al. (2017) Pharos: Collating protein information to shed light on the druggable genome. Nucleic Acids Res., 45, D995–D1002.


3. Oprea TI, Bologa CG, Brunak S, Campbell A, Gan GN, Gaulton A, Gomez SM, Guha R, Hersey A, Holmes J et al. (2018) Unexplored therapeutic opportunities in the human genome. Nat. Rev. Drug Discov., 17, 317–332.

4. Galperin MY, Fernandez-Suarez XM and Rigden DJ (2017) The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes. Nucleic Acids Res., 45, D1–D11.

5. UniProt Consortium (2019) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res., 47, D506–D515.

6. Dickinson ME, Flenniken AM, Ji X, Teboul L, Wong MD, White JK, Meehan TF, Weninger WJ, Westerberg H, Adissu H et al. (2016) High-throughput discovery of novel developmental phenotypes. Nature, 537, 508–514.

7. Smith JR, Hayman GT, Wang S-J, Laulederkind SJF, Hoffman MJ, Kaldunski ML, Tutaj M, Thota J, Nalabolu HS, Ellanki SLR et al. (2020) The Year of the Rat: The Rat Genome Database at 20: a multi-species knowledgebase and analysis platform. Nucleic Acids Res., 48, D731–D742.

8. Schriml LM, Arze C, Nadendla S, Chang Y-WW, Mazaitis M, Felix V, Feng G and Kibbe WA (2012) Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res., 40, D940–D946.

9. Hayman GT, Laulederkind SJF, Smith JR, Wang S-J, Petri V, Nigam R, Tutaj M, De Pons J, Dwinell MR and Shimoyama M (2016) The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database. Database, 2016, baw034.

10. Smith CL and Eppig JT (2009) The mammalian phenotype ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip. Rev. Syst. Biol. Med., 1, 390–399.

11. Buniello A, MacArthur JAL, Cerezo M, Harris LW, Hayhurst J, Malangone C, McMahon A, Morales J, Mountjoy E, Sollis E et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47, D1005–D1012.

12. GTEx Consortium (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.

13. Thul PJ and Lindskog C (2018) The human protein atlas: a spatial map of the human proteome. Protein Sci., 27, 233–244.

14. Palasca O, Santos A, Stolte C, Gorodkin J and Jensen LJ (2018) TISSUES 2.0: an integrative web resource on mammalian tissue expression. Database, 2018, bay003.

15. Kim M-S, Pinto SM, Getnet D, Nirujogi RS, Manda SS, Chaerkady R, Madugundu AK, Kelkar DS, Isserlin R, Jain S et al. (2014) A draft map of the human proteome. Nature, 509, 575–581.

16. Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehar J, Kryukov GV, Sonkin D et al. (2012) The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483, 603–607.

17. Stathias V, Turner J, Koleti A, Vidovic D, Cooper D, Fazel-Najafabadi M, Pilarczyk M, Terryn R, Chung C, Umeano A et al. (2020) LINCS Data Portal 2.0: next generation access point for perturbation-response signatures. Nucleic Acids Res., 48, D431–D439.

18. Papatheodorou I, Moreno P, Manning J, Fuentes AM-P, George N, Fexova S, Fonseca NA, Fullgrabe A, Green M, Huang N et al. (2020) Expression Atlas update: from tissues to single cells. Nucleic Acids Res., 48, D77–D83.

19. Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, Simonovic M, Doncheva NT, Morris JH, Bork P et al. (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res., 47, D607–D613.

20. Lasso G, Mayer SV, Winkelmann ER, Chu T, Elliot O, Patino-Galindo JA, Park K, Rabadan R, Honig B and Shapira SD (2019) A structure-informed atlas of human-virus interactions. Cell, 178, 1526–1541.

21. Gaulton A, Hersey A, Nowotka M, Bento AP, Chambers J, Mendez D, Mutowo P, Atkinson F, Bellis LJ, Cibrian-Uhalte E et al. (2017) The ChEMBL database in 2017. Nucleic Acids Res., 45, D945–D954.

22. Ursu O, Holmes J, Bologa CG, Yang JJ, Mathias SL, Stathias V, Nguyen D-T, Schurer S and Oprea T (2019) DrugCentral 2018: an update. Nucleic Acids Res., 47, D963–D970.

23. Pletscher-Frankild S, Palleja A, Tsafou K, Binder JX and Jensen LJ (2015) DISEASES: text mining and data integration of disease-gene associations. Methods, 74, 83–89.

24. Santos R, Ursu O, Gaulton A, Bento AP, Donadi RS, Bologa CG, Karlsson A, Al-Lazikani B, Hersey A, Oprea TI et al. (2017) A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov., 16, 19–34.

25. Ursu O, Glick M and Oprea T (2019) Novel drug targets in 2018. Nat. Rev. Drug Discov., 18, 328.

26. Avram S, Halip L, Curpan R and Oprea TI (2020) Novel drug targets in 2019. Nat. Rev. Drug Discov., 19, 300.

27. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.

28. Pafilis E, Frankild SP, Fanini L, Faulwetter S, Pavloudi C, Vasileiadou A, Arvanitidis C and Jensen LJ (2013) The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text. PLoS One, 8, e65390.

29. Bjorling E and Uhlen M (2008) Antibodypedia, a portal for sharing antibody and antigen validation data. Mol. Cell. Proteomics, 7, 2028–2037.

30. Watkins X, Garcia LJ, Pundir S, Martin MJ and UniProt Consortium (2017) ProtVista: visualization of protein sequence annotations. Bioinformatics, 33, 2040–2041.

31. Pinero J, Ramirez-Anguita JM, Sauch-Pitarch J, Ronzano F, Centeno E, Sanz F and Furlong LI (2020) The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res., 48, D845–D855.

32. Jia J, An Z, Ming Y, Guo Y, Li W, Liang Y, Guo D, Li X, Tai J, Chen G et al. (2018) eRAM: encyclopedia of rare disease annotations for precision medicine. Nucleic Acids Res., 46, D937–D943.

33. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN and Bourne PE (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

34. Rose AS, Bradley AR, Valasatava Y, Duarte JM, Prlic A and Rose PW (2018) NGL viewer: web-based molecular graphics for large complexes. Bioinformatics, 34, 3755–3758.

35. Rose AS and Hildebrand PW (2015) NGL Viewer: a web application for molecular visualization. Nucleic Acids Res., 43, W576–W579.

36. Li W, Moore MJ, Vasilieva N, Sui J, Wong SK, Berne MA, Somasundaran M, Sullivan JL, Luzuriaga K, Greenough TC et al. (2003) Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature, 426, 450–454.

37. Hoffmann M, Kleine-Weber H, Schroeder S, Kruger N, Herrler T, Erichsen S, Schiergens TS, Herrler G, Wu N-H, Nitsche A et al. (2020) SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell, 181, 271–280.

38. Mungall CJ, Torniai C, Gkoutos GV, Lewis SE and Haendel MA (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biol., 13, R5.

39. Huttlin EL, Bruckner RJ, Navarrete-Perea J, Cannon JR, Baltier K, Gebreab F, Gygi MP, Thornock A, Zarraga G, Tam S et al. (2020) Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. bioRxiv doi: https://doi.org/10.1101/2020.01.19.905109, 19 January 2020, preprint: not peer reviewed.

40. Jassal B, Matthews L, Viteri G, Gong C, Lorente P, Fabregat A, Sidiropoulos K, Cook J, Gillespie M, Haw R et al. (2020) The reactome pathway knowledgebase. Nucleic Acids Res., 48, D498–D503.

41. Cannon DC, Yang JJ, Mathias SL, Ursu O, Mani S, Waller A, Schurer SC, Jensen LJ, Sklar LA, Bologa CG et al. (2017) TIN-X: target importance and novelty explorer. Bioinformatics, 33, 2601–2603.

42. Oprea TI (2019) Exploring the dark genome: implications for precision medicine. Mamm. Genome, 30, 192–200.

43. Kim S, Chen J, Cheng T, Gindulyte A, He J, He S, Li Q, Shoemaker BA, Thiessen PA, Yu B et al. (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res., 47, D1102–D1109.

44. Armstrong JF, Faccenda E, Harding SD, Pawson AJ, Southan C, Sharman JL, Campo B, Cavanagh DR, Alexander SPH, Davenport AP et al. (2020) The IUPHAR/BPS guide to Pharmacology in 2020: extending immunopharmacology content and introducing the IUPHAR/MMV guide to Malaria Pharmacology. Nucleic Acids Res., 48, D1006–D1021.

45. Sheikh N, Kefato Z and Montresor A (2019) gat2vec: representation learning for attributed graphs. Computing, 101, 187–209.

46. Sheils T, Mathias SL, Siramshetty VB, Bocci G, Bologa CG, Yang JJ, Waller A, Southall N, Nguyen D-T and Oprea TI (2020) How to illuminate the druggable genome using pharos. Curr. Protoc. Bioinformatics, 69, e92.

47. Levin JM, Oprea TI, Davidovich S, Clozel T, Overington JP, Vanhaelen Q, Cantor CR, Bischof E and Zhavoronkov A (2020) Artificial intelligence, drug repurposing and peer review. Nat. Biotechnol., 38, 1127–1131.

48. Klimisch HJ, Andreae M and Tillmann U (1997) A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul. Toxicol. Pharmacol., 25, 1–5.

49. Myatt GJ, Ahlberg E, Akahori Y, Allen D, Amberg A, Anger LT, Aptula A, Auerbach S, Beilke L, Bellion P et al. (2018) In silico toxicology protocols. Regul. Toxicol. Pharmacol., 96, 1–17.


Figure 6. Target details view for ACE2 showing the tissue expression section with highlighted tissues (A) and the protein–protein interaction section (B). The frequency of updating differs among the resources integrated in TCRD. Text-mined sources are updated more frequently, and changes in the scientific literature allow us to track certain associations sooner. Therefore, selecting one of these sources (as displayed) will show that ACE2 is expressed in the lungs.

further refine a filtered list of targets, as well as view how this new list may be further filtered, or examine the makeup of the list by the various filter categories.

Target details pages

In improving the target details pages, we relied on user interviews that we had conducted to determine the optimal layout and ordering of data on the page. In the newest version of Pharos, target details pages start off with synonyms, identifiers and a broad overview of the knowledge about the target, as shown in Figure 3. Next, the user is able to examine the TDL criteria in relation to the specific target. Drug and ligand data come next, followed by disease association data. Target expression and interaction data follow, with publication data, amino acid sequence information (including the ProtVista (30) graphical representation) and filters for related targets completing the page. Sections where no data are available, such as drug and ligand browsers for Tdark and Tbio targets, are automatically removed. The navigation panel is always visible on the left-hand side, allowing users to easily jump to the section of interest. Most data panels have been improved by adding tooltips to further explain properties and a help panel that contains additional information, such as definitions, explanatory articles or raw data. New panels have also been added to display additional data.

Figure 7. Disease details page for Huntington’s disease showing the Disease Ontology description and hierarchy (A) and the TIN-X plot showing novel targets and their importance mapped with a log scale (B).

TDL descriptions. One addition to the previous Pharos UI is a description panel. While the various TDL criteria are well documented, it was not apparent to users how and which TDL criteria are met by a given target. This section displays which criteria have been met, as well as the level or score. A highlighted checkbox indicates which TDL criteria have been met. This provides valuable evidence that illustrates to the user how TCRD generated this ranking. For Tdark and Tbio, this is also a useful indicator of which criteria are still deficient and how close to the cutoff they are, potentially driving research in those areas. This panel is also featured in Figure 3.

IDG resources. A feature of the IDG program that has expanded since the initial publication is that of new data or reagents generated by IDG Consortium members. Where data or reagents are available, a browseable section is displayed in Pharos, allowing users to navigate to the data set, and even to order them in the case of physical resources, as shown in Figure 4. For targets with mouse expression data, we have incorporated our anatomogram image (discussed below) using male/female and brain mouse images. We refer the reader to the Supplementary Materials for an in-depth discussion of the IDG Consortium resources available in Pharos.

Improved ligand/drug section. Drug and ligand browsing is now done via a pageable list. Users can now page through the entire list of approved drugs or active ligands, or open the entire list in the browse view, allowing for filtering of these lists while specifying the number of targets each ligand is active on.

Disease associations. The disease associations panel has been greatly expanded, both in the number of featured resources, now incorporating DisGeNET (31) and eRAM (32), and in size and placement. The source of each disease association is displayed as a collapsible panel that bundles multiple sources for the same disease. Upon expanding this panel, users are able to examine the evidence used to make this association. This allows users to discover provenance, an essential attribute of data aggregated in TCRD. It is also possible to view the list of associated diseases as a browseable list, again allowing for filtering and more refined data analysis.

Figure 8. Ligand details page for acetazolamide shows ligand description information as well as synonyms and identifiers (A). Target activity is shown with expanded activity for CA2 (B).

PDB visualizations. While PDB identifiers have always been available in Pharos, they were previously displayed as a list of linkouts with no additional information (33). In Pharos 3.0, the PDB section was expanded and redesigned to display three-dimensional (3D) structures for proteins, including bound ligands where available. This 3D visualization, employing NGL Viewer (34,35), is highly interactive, allowing users to drag, zoom, pan and spin the structure to examine it thoroughly. This interface also enables users to browse the PDB identifiers and structures associated with a given target. Clicking on a table row loads the respective PDB entry and displays the ligands shown in the structure. Other details include the reference source for the ID, as well as the method used to generate the structure. This feature is highlighted in Figure 5.

Predicted viral interactions. Predicted viral-human protein interactions from the P-HIPSTer atlas are displayed in a separate panel that shows a list of viruses and their associated taxonomic data, specifically for the targeted human protein. Each virus also lists the viral proteins that the human protein is predicted to interact with, in addition to a likelihood ratio, which measures the strength of the sequence- and structure-based prediction. These features are shown in Figure 5, using ACE2, the angiotensin-converting enzyme 2, as an example. ACE2 is the functional entry receptor for both severe acute respiratory syndrome (SARS) coronaviruses, SARS-CoV (36) and SARS-CoV-2 (37).

Target expression data The target expression data panelhas also been expanded as shown in Figure 6 We replacedthe static human figure with an expanded interactive anato-mogram (18) This visualization contains both male and fe-male figures allowing sex-specific tissue information (egfrom GTEx) to be highlighted Users can toggle the brainimage separately for more fine-grained brain tissue expres-sion visualization All figures are zoomable and pannableto improve visibility When possible all source tissues havebeen mapped to standardized Uberon (38) identifiers en-abling us to merge repetitive tissues Expression data can beviewed by source and tissue values are searchable Click-ing on a tissue panel like the disease associations sectionallows users to view detailed provenance of the data bothshowing the source and relevant confidence values for thetissue expression Gradient coloration allows us to displaythe tissue expression level either a qualitative value (lowmedium high) or a numeric value scale depending onthe data source giving users a quick visualization of thisdata We note that several sources have an overlap in dataand content however they may also present unique dataPharos displays the variety between sources which can helpusers decide what sources to focus on It also illustrates thatthe text-mining-based resources tend to be updated more

Dow

nloaded from httpsacadem

icoupcomnararticle49D

1D13345958490 by guest on 20 July 2022

D1342 Nucleic Acids Research 2021 Vol 49 Database issue

Figure 9 Target list described in Target List Search use case (A) and target list described in Binding partner search use case (B) Both panels show selectedfilters including the ability to filter target lists by IDG featured sub-lists

often than frequently accessed scientific databases that ag-gregate experimental data When available cell lines and or-thologs are displayed with expandable evidence panels

Proteinndashprotein interactions The introduction of datafrom Bioplex (39) STRING and Reactome (40) has allowedus to display rich PPI data and allows users to examine howmultiple targets may interact as shown in Figure 6 Thispanel consists of a pageable list of targets known to interactwith the current target These targets display the illumina-tion graph which gives users a quick visual representationof the level and types of knowledge available for the tar-get The originating data source is also displayed as well asthe confidence of this interaction This list of PPIs may beviewed either directly in STRING or viewed in the browsepage as a filterable list

Disease details pages

While Pharos remains target-centric it is equally importantto be able to examine diseases in relation to their associ-ated targets As such Pharos has always included diseasesas a browseable list as well as a disease details page Pharos30 expands on the data collected and displayed about dis-eases When available a description from the Disease On-tology is shown The hierarchy of each specific disease isalso shown allowing users to view parent or child diseasesor syndromes as well as the siblings for each disease Inaddition a breakdown of the targets associated with eachdisease is shown which can also be opened in the targetbrowse page allowing users to filter the list of associatedtargets The target-disease Novelty score is shown in theTarget-Importance Novelty eXplorer (TIN-X (41) panelSimilar to the target based TIN-X view this is a scatter-plot of novel targets related to the disease of interest Nov-

Dow

nloaded from httpsacadem

icoupcomnararticle49D

1D13345958490 by guest on 20 July 2022

Nucleic Acids Research 2021 Vol 49 Database issue D1343

Figure 10 List of targets associated with asthma filtered by Data Source Expression Atlas and filtered by Expression Atlas log2foldchange value (A)Also shown is the Disease Association Details column with additional association-specific details (B)

elty estimates the scarcity of publications about a targetwhereas Importance estimates the strength of the associa-tion between that target and a specific disease (42) The X-axis shows the Novelty of the target-disease relationshipwhile the Y-axis shows the Importance of that target to thedisease both on a log10 scale This feature is displayed inFigure 7 When available help definition buttons are alsoshown for each section of data

Ligand details pages

The ligand details page has been expanded to include a de-scription of the ligand where available A 2D depiction ofthe ligand chemical structure is shown together with syn-onyms and identifiers from PubChem (43) Guide to Phar-macology (44) ChEMBL and DrugCentral Activity valuesare displayed in collapsible panels and sorted by target Byclicking on a target header users are able to examine ac-tivity values for that ligand on the selected target Activitytype value and mechanism of action are shown in additionto reference publication for each activity allowing users todirectly view the source Figure 8 illustrates this page dis-play

Use cases

Target list search A neuroscientist who studies ion chan-nels is seeking understudied proteins for a student projectFiltering to targets listed by the IDG Consortium as targetsof interest (httpsdruggablegenomenetIDGProteinList)in the ion channel family with a nervous system phenotypeas described by the Mouse Phenotype Ontology from theJackson Laboratory yields 16 potential targets of interestThe refined list view is shown in Figure 9

Binding partner search A scientist is studying the functionof CAMKK2 and hypothesizing on its role in behaviorFrom the CAMKK2 target details page she clicks lsquoExploreInteracting Targetsrsquo to open the potential binding partnersin the Target List Selecting Tclin she filters the list to 10 po-tential binding partners that have approved drugs with newtherapeutic effects and drug-related side effects that can as-sist in her research She is also able to save this list of targetsas a custom list allowing her to easily revisit and analyzethese targets This workflow is also shown in Figure 9

Disease-based search A researcher studying asthmasearches for this top level disease term and enters theasthma disease details page He wants to find proteins thatare differentially expressed in asthma patients Startingfrom the Asthma disease details page he clicks lsquoExploreAssociated Targetsrsquo to see 470 potential targets Filteringthose with data from the Expression Atlas (18) usingthe lsquoExpression Atlasrsquo value from the lsquoDisease DataSourcersquo filter yields 78 targets which can then be sorted bylog2foldChange using the drop-down sort feature as shownabove the target list in Figure 10 He finds several withhigh log2foldChange meaning the expression level changesdrastically between disease and non-disease conditions andlow p-value meaning the results are statistically significantAlso of note for lists of associated targets an additionalcolumn has been added to the target card showing diseaseassociation details such as the evidence associated with agiven data source

Finding experimental tools. A researcher has found that a voltage-gated potassium channel, KCNS2, is expressed in the nucleus they are working on. Its role is unknown. The target details page for KCNS2 lists three approved drugs (e.g. amifampridine), which are known to block this channel. The IDG Resources show a cell line and a mouse genetic construct, which can be ordered from the IDG Consortium (UCSF in this case). The researcher is now able to design a set of in vitro and in vivo experiments to help elucidate the role of the channel.

Machine learning based on PPI networks to prioritize targets for a disease. A novel target ranking approach could be developed that relies on deep network representation learning of a PPI network annotated with disease-specific knowledge attributes derived from TCRD/Pharos. This involves mapping the enriched PPI network into a feature space using the neural network framework Gat2Vec (45). Gat2Vec employs a shallow neural network model to facilitate joint learning on the structural and attribute contexts of a given network. In this case, the PPIs provide the structural context and the disease-related knowledge provides the attribute contexts. The feature space generated can be used to develop machine learning models that can predict the ranking of protein targets in the context of a disease. Additional protocols for extracting available data for specific targets of interest, as well as the differences in knowledge availability between understudied and more studied targets, are discussed elsewhere (46).
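To make the two kinds of context concrete, the sketch below generates random walks over a toy PPI graph (structural contexts) and over a protein-attribute bipartite graph (attribute contexts). It is a simplified illustration and not the Gat2Vec implementation: the function names, the walk parameters and the toy data are assumptions made for this example, and the embedding step that would consume the walks (a skip-gram style model) is omitted.

```python
import random
from collections import defaultdict


def build_graphs(ppi_edges, attributes):
    """Build a protein-protein graph and a protein-attribute bipartite graph."""
    structural = defaultdict(set)
    for a, b in ppi_edges:
        structural[a].add(b)
        structural[b].add(a)
    bipartite = defaultdict(set)
    for protein, attrs in attributes.items():
        for attr in attrs:
            node = ("attr", attr)  # tuples mark attribute nodes
            bipartite[protein].add(node)
            bipartite[node].add(protein)
    return structural, bipartite


def random_walk(graph, start, length, rng):
    """Uniform random walk of at most `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = sorted(graph.get(walk[-1], ()), key=str)
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk


def contexts(ppi_edges, attributes, walks_per_node=2, length=5, seed=0):
    """Return (structural_contexts, attribute_contexts) as lists of protein walks."""
    rng = random.Random(seed)
    structural, bipartite = build_graphs(ppi_edges, attributes)
    s_ctx = [random_walk(structural, p, length, rng)
             for p in sorted(structural) for _ in range(walks_per_node)]
    a_ctx = []
    for p in sorted(attributes):
        for _ in range(walks_per_node):
            walk = random_walk(bipartite, p, 2 * length - 1, rng)
            # Keep proteins only: attribute nodes act as bridges between them.
            a_ctx.append([n for n in walk if not isinstance(n, tuple)])
    return s_ctx, a_ctx


# Toy data, for illustration only.
ppi = [("P1", "P2"), ("P2", "P3")]
attrs = {"P1": {"asthma"}, "P2": set(), "P3": {"asthma"}}
s_ctx, a_ctx = contexts(ppi, attrs)
```

In this toy example P1 and P3 never interact directly, yet they co-occur in attribute contexts because they share a disease annotation, which is the kind of signal the attribute context is meant to contribute.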

DISCUSSION

With the newly added data, interactive visualizations and enhanced search capabilities, TCRD and Pharos in their current forms serve as IDG resources that facilitate better exploration of the dark and understudied regions of the human genome. The central idea of the resource continues to be enriching knowledge around human targets and monitoring their therapeutic development levels. One area that received significant attention in the current update is the search mechanism, which was upgraded with the implementation of a GraphQL API. To improve user interaction with the new API, example queries have been made available on the website. The TDL descriptions panel and the target expression anatomograms are the most significantly improved components of the Pharos web interface. Further, most data that have been recently integrated into TCRD facilitate application of artificial intelligence/machine learning (AI/ML) techniques for target evaluation and drug repositioning (47), and are accessible in ML-ready format. Most of the above-described developments are based on the feedback obtained directly from the website visitors, several demo sessions and presentations at scientific meetings.
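As a rough illustration of what a programmatic query might look like, the sketch below builds the JSON body of a GraphQL request for one target. The query shape and field names (`target`, `sym`, `name`, `tdl`, `fam`) and the `String` variable type are assumptions modelled on typical usage and should be checked against the example queries and schema documentation on the Pharos website. No network call is made here.

```python
import json

# Assumed query shape; verify field names against the Pharos API documentation.
QUERY = """
query TargetSummary($sym: String!) {
  target(q: {sym: $sym}) {
    name
    sym
    tdl
    fam
  }
}
"""


def build_request(symbol: str) -> dict:
    """Return the request body for a standard GraphQL-over-HTTP POST."""
    return {"query": QUERY, "variables": {"sym": symbol}}


body = json.dumps(build_request("ACE2"))
print(body)
```

The body would be sent with `Content-Type: application/json` to the GraphQL endpoint listed in the Pharos API documentation. Passing the gene symbol as a variable rather than formatting it into the query string avoids quoting problems.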

Ongoing work focuses on utilizing the AI/ML-ready data, such as target-disease, target-phenotype and PPI data, to develop AI/ML models for prioritization of dark targets and better understanding of disease biology. Currently, we are evaluating ways to aggregate experimental data uncertainties, specifically data quality and reliability (48), similar to the in silico toxicology protocols (49). We continue to extend our efforts in enhancing the disease page to list associated targets, rank them by the consensus strength of association to a disease, and further facilitate filtering to narrow down to targets of potential interest to researchers. The database can be expected to be further enriched with new data and data types that contribute to the ultimate goal of illuminating the dark targets.

DATA AVAILABILITY

TCRD is an open source database that can be accessed at http://juniper.health.unm.edu/tcrd/.

Pharos is an open source web platform that can be accessed at https://pharos.nih.gov/.

The Pharos resources have been split into frontend and backend repositories.

The front end code can be found on GitHub: https://github.com/ncats/pharos_frontend.

The backend GraphQL implementation code can be found on GitHub: https://github.com/ncats/pharos-graphql-server.

GraphQL resource documentation can be found on Pharos: https://pharos.nih.gov/api.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

We thank Rajarshi Guha for his immeasurable time spent not only working on and promoting Pharos 1.0, but also for his mentorship as we began the process of revamping Pharos; Oleg Ursu for developing the first machine learning ready version of TCRD; and Gorka Lasso for supplying the P-HIPSTer virus–human protein interactions dataset.

FUNDING

National Institutes of Health (NIH) Common Fund [CA224370 to S.L.M., C.G.B., L.L.J., A.W., G.B., J.J.Y., T.I.O.; U24TR002278 to D.V., A.K., J.H., A.W., S.C.S. and T.I.O.]; Novo Nordisk Foundation [NNF14CC0001 to L.J.J.]; Intramural Research Program, Division of Pre-clinical Innovation, NIH NCATS (to D.T.N., K.K., T.S., N.S., P.D., E.W., A.S.). Funding for open access charge: NIH [CA224370].

Conflict of interest statement. L.J.J. is co-founder and scientific advisory board member of Intomics A/S. T.I.O. has received honoraria or consulted for Abbott, AstraZeneca, Chiron, Genentech, Infinity Pharmaceuticals, Merz Pharmaceuticals, Merck Darmstadt, Mitsubishi Tanabe, Novartis, Ono Pharmaceuticals, Pfizer, Roche, Sanofi and Wyeth. He is on the scientific advisory board of ChemDiv Inc. and InSilico Medicine.

REFERENCES

1. Edwards,A.M., Isserlin,R., Bader,G.D., Frye,S.V., Willson,T.M. and Yu,F.H. (2011) Too many roads not taken. Nature, 470, 163–165.

2. Nguyen,D.-T., Mathias,S., Bologa,C., Brunak,S., Fernandez,N., Gaulton,A., Hersey,A., Holmes,J., Jensen,L.J., Karlsson,A. et al. (2017) Pharos: Collating protein information to shed light on the druggable genome. Nucleic Acids Res., 45, D995–D1002.


3. Oprea,T.I., Bologa,C.G., Brunak,S., Campbell,A., Gan,G.N., Gaulton,A., Gomez,S.M., Guha,R., Hersey,A., Holmes,J. et al. (2018) Unexplored therapeutic opportunities in the human genome. Nat. Rev. Drug Discov., 17, 317–332.

4. Galperin,M.Y., Fernandez-Suarez,X.M. and Rigden,D.J. (2017) The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes. Nucleic Acids Res., 45, D1–D11.

5. UniProt Consortium (2019) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res., 47, D506–D515.

6. Dickinson,M.E., Flenniken,A.M., Ji,X., Teboul,L., Wong,M.D., White,J.K., Meehan,T.F., Weninger,W.J., Westerberg,H., Adissu,H. et al. (2016) High-throughput discovery of novel developmental phenotypes. Nature, 537, 508–514.

7. Smith,J.R., Hayman,G.T., Wang,S.-J., Laulederkind,S.J.F., Hoffman,M.J., Kaldunski,M.L., Tutaj,M., Thota,J., Nalabolu,H.S., Ellanki,S.L.R. et al. (2020) The Year of the Rat: The Rat Genome Database at 20: a multi-species knowledgebase and analysis platform. Nucleic Acids Res., 48, D731–D742.

8. Schriml,L.M., Arze,C., Nadendla,S., Chang,Y.-W.W., Mazaitis,M., Felix,V., Feng,G. and Kibbe,W.A. (2012) Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res., 40, D940–D946.

9. Hayman,G.T., Laulederkind,S.J.F., Smith,J.R., Wang,S.-J., Petri,V., Nigam,R., Tutaj,M., De Pons,J., Dwinell,M.R. and Shimoyama,M. (2016) The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database. Database, 2016, baw034.

10. Smith,C.L. and Eppig,J.T. (2009) The mammalian phenotype ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip. Rev. Syst. Biol. Med., 1, 390–399.

11. Buniello,A., MacArthur,J.A.L., Cerezo,M., Harris,L.W., Hayhurst,J., Malangone,C., McMahon,A., Morales,J., Mountjoy,E., Sollis,E. et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47, D1005–D1012.

12. GTEx Consortium (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.

13. Thul,P.J. and Lindskog,C. (2018) The human protein atlas: a spatial map of the human proteome. Protein Sci., 27, 233–244.

14. Palasca,O., Santos,A., Stolte,C., Gorodkin,J. and Jensen,L.J. (2018) TISSUES 2.0: an integrative web resource on mammalian tissue expression. Database, 2018, bay003.

15. Kim,M.-S., Pinto,S.M., Getnet,D., Nirujogi,R.S., Manda,S.S., Chaerkady,R., Madugundu,A.K., Kelkar,D.S., Isserlin,R., Jain,S. et al. (2014) A draft map of the human proteome. Nature, 509, 575–581.

16. Barretina,J., Caponigro,G., Stransky,N., Venkatesan,K., Margolin,A.A., Kim,S., Wilson,C.J., Lehár,J., Kryukov,G.V., Sonkin,D. et al. (2012) The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483, 603–607.

17. Stathias,V., Turner,J., Koleti,A., Vidović,D., Cooper,D., Fazel-Najafabadi,M., Pilarczyk,M., Terryn,R., Chung,C., Umeano,A. et al. (2020) LINCS Data Portal 2.0: next generation access point for perturbation-response signatures. Nucleic Acids Res., 48, D431–D439.

18. Papatheodorou,I., Moreno,P., Manning,J., Fuentes,A.M.-P., George,N., Fexova,S., Fonseca,N.A., Füllgrabe,A., Green,M., Huang,N. et al. (2020) Expression Atlas update: from tissues to single cells. Nucleic Acids Res., 48, D77–D83.

19. Szklarczyk,D., Gable,A.L., Lyon,D., Junge,A., Wyder,S., Huerta-Cepas,J., Simonovic,M., Doncheva,N.T., Morris,J.H., Bork,P. et al. (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res., 47, D607–D613.

20. Lasso,G., Mayer,S.V., Winkelmann,E.R., Chu,T., Elliot,O., Patino-Galindo,J.A., Park,K., Rabadan,R., Honig,B. and Shapira,S.D. (2019) A structure-informed atlas of human-virus interactions. Cell, 178, 1526–1541.

21. Gaulton,A., Hersey,A., Nowotka,M., Bento,A.P., Chambers,J., Mendez,D., Mutowo,P., Atkinson,F., Bellis,L.J., Cibrian-Uhalte,E. et al. (2017) The ChEMBL database in 2017. Nucleic Acids Res., 45, D945–D954.

22. Ursu,O., Holmes,J., Bologa,C.G., Yang,J.J., Mathias,S.L., Stathias,V., Nguyen,D.-T., Schürer,S. and Oprea,T. (2019) DrugCentral 2018: an update. Nucleic Acids Res., 47, D963–D970.

23. Pletscher-Frankild,S., Palleja,A., Tsafou,K., Binder,J.X. and Jensen,L.J. (2015) DISEASES: text mining and data integration of disease-gene associations. Methods, 74, 83–89.

24. Santos,R., Ursu,O., Gaulton,A., Bento,A.P., Donadi,R.S., Bologa,C.G., Karlsson,A., Al-Lazikani,B., Hersey,A., Oprea,T.I. et al. (2017) A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov., 16, 19–34.

25. Ursu,O., Glick,M. and Oprea,T. (2019) Novel drug targets in 2018. Nat. Rev. Drug Discov., 18, 328.

26. Avram,S., Halip,L., Curpan,R. and Oprea,T.I. (2020) Novel drug targets in 2019. Nat. Rev. Drug Discov., 19, 300.

27. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.

28. Pafilis,E., Frankild,S.P., Fanini,L., Faulwetter,S., Pavloudi,C., Vasileiadou,A., Arvanitidis,C. and Jensen,L.J. (2013) The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text. PLoS One, 8, e65390.

29. Björling,E. and Uhlén,M. (2008) Antibodypedia, a portal for sharing antibody and antigen validation data. Mol. Cell. Proteomics, 7, 2028–2037.

30. Watkins,X., Garcia,L.J., Pundir,S., Martin,M.J. and UniProt Consortium (2017) ProtVista: visualization of protein sequence annotations. Bioinformatics, 33, 2040–2041.

31. Piñero,J., Ramírez-Anguita,J.M., Sauch-Pitarch,J., Ronzano,F., Centeno,E., Sanz,F. and Furlong,L.I. (2020) The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res., 48, D845–D855.

32. Jia,J., An,Z., Ming,Y., Guo,Y., Li,W., Liang,Y., Guo,D., Li,X., Tai,J., Chen,G. et al. (2018) eRAM: encyclopedia of rare disease annotations for precision medicine. Nucleic Acids Res., 46, D937–D943.

33. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

34. Rose,A.S., Bradley,A.R., Valasatava,Y., Duarte,J.M., Prlić,A. and Rose,P.W. (2018) NGL viewer: web-based molecular graphics for large complexes. Bioinformatics, 34, 3755–3758.

35. Rose,A.S. and Hildebrand,P.W. (2015) NGL Viewer: a web application for molecular visualization. Nucleic Acids Res., 43, W576–W579.

36. Li,W., Moore,M.J., Vasilieva,N., Sui,J., Wong,S.K., Berne,M.A., Somasundaran,M., Sullivan,J.L., Luzuriaga,K., Greenough,T.C. et al. (2003) Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature, 426, 450–454.

37. Hoffmann,M., Kleine-Weber,H., Schroeder,S., Krüger,N., Herrler,T., Erichsen,S., Schiergens,T.S., Herrler,G., Wu,N.-H., Nitsche,A. et al. (2020) SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell, 181, 271–280.

38. Mungall,C.J., Torniai,C., Gkoutos,G.V., Lewis,S.E. and Haendel,M.A. (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biol., 13, R5.

39. Huttlin,E.L., Bruckner,R.J., Navarrete-Perea,J., Cannon,J.R., Baltier,K., Gebreab,F., Gygi,M.P., Thornock,A., Zarraga,G., Tam,S. et al. (2020) Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. bioRxiv doi: https://doi.org/10.1101/2020.01.19.905109, 19 January 2020, preprint: not peer reviewed.

40. Jassal,B., Matthews,L., Viteri,G., Gong,C., Lorente,P., Fabregat,A., Sidiropoulos,K., Cook,J., Gillespie,M., Haw,R. et al. (2020) The reactome pathway knowledgebase. Nucleic Acids Res., 48, D498–D503.

41. Cannon,D.C., Yang,J.J., Mathias,S.L., Ursu,O., Mani,S., Waller,A., Schürer,S.C., Jensen,L.J., Sklar,L.A., Bologa,C.G. et al. (2017) TIN-X: target importance and novelty explorer. Bioinformatics, 33, 2601–2603.

42. Oprea,T.I. (2019) Exploring the dark genome: implications for precision medicine. Mamm. Genome, 30, 192–200.

43. Kim,S., Chen,J., Cheng,T., Gindulyte,A., He,J., He,S., Li,Q., Shoemaker,B.A., Thiessen,P.A., Yu,B. et al. (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res., 47, D1102–D1109.

44. Armstrong,J.F., Faccenda,E., Harding,S.D., Pawson,A.J., Southan,C., Sharman,J.L., Campo,B., Cavanagh,D.R., Alexander,S.P.H., Davenport,A.P. et al. (2020) The IUPHAR/BPS guide to Pharmacology in 2020: extending immunopharmacology content and introducing the IUPHAR/MMV guide to Malaria Pharmacology. Nucleic Acids Res., 48, D1006–D1021.

45. Sheikh,N., Kefato,Z. and Montresor,A. (2019) gat2vec: representation learning for attributed graphs. Computing, 101, 187–209.

46. Sheils,T., Mathias,S.L., Siramshetty,V.B., Bocci,G., Bologa,C.G., Yang,J.J., Waller,A., Southall,N., Nguyen,D.-T. and Oprea,T.I. (2020) How to illuminate the druggable genome using pharos. Curr. Protoc. Bioinformatics, 69, e92.

47. Levin,J.M., Oprea,T.I., Davidovich,S., Clozel,T., Overington,J.P., Vanhaelen,Q., Cantor,C.R., Bischof,E. and Zhavoronkov,A. (2020) Artificial intelligence, drug repurposing and peer review. Nat. Biotechnol., 38, 1127–1131.

48. Klimisch,H.J., Andreae,M. and Tillmann,U. (1997) A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul. Toxicol. Pharmacol., 25, 1–5.

49. Myatt,G.J., Ahlberg,E., Akahori,Y., Allen,D., Amberg,A., Anger,L.T., Aptula,A., Auerbach,S., Beilke,L., Bellion,P. et al. (2018) In silico toxicology protocols. Regul. Toxicol. Pharmacol., 96, 1–17.


Figure 7. Disease details page for Huntington's disease showing Disease Ontology description and hierarchy (A) and TIN-X plot showing novelty targets and their importance mapped with a log scale (B).

tional information such as definitions, explanatory articles or raw data. Additional panels have been added to display additional data.

TDL descriptions. One addition to the previous Pharos UI is a description panel. While the various TDL criteria are well documented, it was not apparent to users how and which TDL criteria are met by a given target. This section displays which criteria have been met, as well as the level or score. A highlighted checkbox indicates which TDL criteria have been met. This provides valuable evidence that illustrates to the user how TCRD generated this ranking. For Tdark and Tbio, this is also a useful indicator of what criteria are still deficient and how close to the cutoff they are, potentially driving research in those areas. This panel is also featured in Figure 3.
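The idea behind the panel, showing which criteria are met and how far the unmet ones are from their cutoff, can be sketched as follows. The criterion names, values and cutoffs here are placeholders chosen for illustration; they are not the official TDL definitions, and the "value must reach the cutoff" rule is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    """One hypothetical knowledge criterion with an illustrative cutoff."""
    name: str
    value: float
    cutoff: float

    @property
    def met(self) -> bool:
        return self.value >= self.cutoff

    @property
    def shortfall(self) -> float:
        """How far the value is below the cutoff (0 when the criterion is met)."""
        return max(0.0, self.cutoff - self.value)


def describe(criteria):
    """Render one checkbox-style line per criterion."""
    return [
        f"[{'x' if c.met else ' '}] {c.name}: {c.value:g} (cutoff {c.cutoff:g})"
        for c in criteria
    ]


# Placeholder numbers, for illustration only.
panel = [
    Criterion("Publication score", 3.5, 5),
    Criterion("Gene RIFs", 4, 3),
    Criterion("Antibodies", 20, 50),
]
for line in describe(panel):
    print(line)
```

Reporting the shortfall alongside the checkbox is what lets a reader see how close an understudied target is to the next level.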

IDG resources. A feature of the IDG program that has expanded since the initial publication is that of new data or reagents, which are generated by IDG Consortium members. Where data or reagents are available, a browseable section is displayed in Pharos, allowing users to navigate to the data set, and even order them in the case of physical resources, as shown in Figure 4. For targets with mouse expression data, we have incorporated our anatomogram image (discussed below) using male/female and brain mouse images. We refer the reader to Supplementary Materials for an in-depth discussion of IDG Consortium resources available in Pharos.

Improved ligand/drug section. Drug and ligand browsing is now done via a pageable list. Users can now page through the entire list of approved drugs or active ligands, or open the entire list in the browse view, allowing for filtering of these lists while specifying the number of targets each ligand is active on.

Disease associations. The disease associations panel has been greatly expanded, both in the amount of featured resources, now incorporating DisGeNET (31) and eRAM (32), as well as in size and placement. The source of each disease association is displayed as a collapsible panel that bundles multiple sources for the same disease. Upon expanding this panel, users are able to examine the evidence used to make this association. This allows users to discover provenance, an essential attribute of data aggregated in TCRD.


Figure 8. Ligand details page for acetazolamide shows ligand description information as well as synonyms and identifiers (A). Target activity is shown with expanded activity for CA2 (B).

It is also possible to view the list of associated diseases as a browseable list, again allowing for filtering and more refined data analysis.

PDB visualizations. While PDB identifiers have always been available in Pharos, they were previously displayed as a list of linkouts with no additional information (33). In Pharos 3.0, the PDB section was expanded and redesigned to display three-dimensional (3D) structures for proteins, including bound ligands where available. This 3D visualization, employing NGL Viewer (34,35), is highly interactive, allowing users to drag, zoom, pan and spin the structure to thoroughly examine it. This interface also enables users to browse the PDB identifiers and structures associated with a given target. Clicking on a table row loads the respective PDB entry and displays the ligands shown in the structure. Other details include the reference source for the id, as well as the method used to generate the structure. This feature is highlighted in Figure 5.

Predicted viral interactions. Predicted viral-human protein interactions from the P-HIPSTer atlas are displayed in a separate panel that shows a list of viruses and their associated taxonomic data, specifically for the targeted human protein. Each virus also lists the viral proteins that the human protein is predicted to interact with, in addition to a likelihood ratio, which measures the strength of the sequence- and structure-based prediction. These features are shown in Figure 5 using ACE2, the angiotensin converting enzyme 2, as an example. ACE2 is the functional entry receptor for both severe acute respiratory syndrome (SARS) coronaviruses, SARS-CoV (36) and SARS-CoV-2 (37).

Target expression data. The target expression data panel has also been expanded, as shown in Figure 6. We replaced the static human figure with an expanded interactive anatomogram (18). This visualization contains both male and female figures, allowing sex-specific tissue information (e.g. from GTEx) to be highlighted. Users can toggle the brain image separately for more fine-grained brain tissue expression visualization. All figures are zoomable and pannable to improve visibility. When possible, all source tissues have been mapped to standardized Uberon (38) identifiers, enabling us to merge repetitive tissues. Expression data can be viewed by source, and tissue values are searchable. Clicking on a tissue panel, like the disease associations section, allows users to view detailed provenance of the data, showing both the source and relevant confidence values for the tissue expression. Gradient coloration allows us to display the tissue expression level, either a qualitative value (low, medium, high) or a numeric value scale, depending on the data source, giving users a quick visualization of this data. We note that several sources have an overlap in data and content; however, they may also present unique data. Pharos displays the variety between sources, which can help users decide what sources to focus on. It also illustrates that the text-mining-based resources tend to be updated more


Figure 9. Target list described in Target List Search use case (A) and target list described in Binding partner search use case (B). Both panels show selected filters, including the ability to filter target lists by IDG featured sub-lists.

often than frequently accessed scientific databases that aggregate experimental data. When available, cell lines and orthologs are displayed with expandable evidence panels.
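Merging tissues from several sources under one standardized identifier can be illustrated with a small grouping function. The record layout (`tissue`, `uberon_id`, `source`, `value`), the fallback to a normalized tissue name and the sample values are assumptions made for this sketch and do not describe the TCRD schema.

```python
from collections import defaultdict


def merge_by_uberon(records):
    """Group expression records by Uberon ID, falling back to the
    lower-cased tissue name when no mapping is available."""
    merged = defaultdict(list)
    for rec in records:
        key = rec.get("uberon_id") or rec["tissue"].strip().lower()
        merged[key].append((rec["source"], rec["value"]))
    return dict(merged)


# Sample records with invented values; qualitative and numeric scales coexist.
records = [
    {"tissue": "Liver", "uberon_id": "UBERON:0002107", "source": "GTEx", "value": 12.3},
    {"tissue": "liver", "uberon_id": "UBERON:0002107", "source": "HPA", "value": "high"},
    {"tissue": "Unmapped Tissue ", "uberon_id": None, "source": "Source X", "value": "low"},
]
merged = merge_by_uberon(records)
print(merged)
```

Keeping the source next to each value preserves provenance after the merge, which is what allows per-source viewing of the combined tissue entry.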

Protein–protein interactions. The introduction of data from BioPlex (39), STRING and Reactome (40) has allowed us to display rich PPI data, and allows users to examine how multiple targets may interact, as shown in Figure 6. This panel consists of a pageable list of targets known to interact with the current target. These targets display the illumination graph, which gives users a quick visual representation of the level and types of knowledge available for the target. The originating data source is also displayed, as well as the confidence of this interaction. This list of PPIs may be viewed either directly in STRING, or viewed in the browse page as a filterable list.

Disease details pages

While Pharos remains target-centric, it is equally important to be able to examine diseases in relation to their associated targets. As such, Pharos has always included diseases as a browseable list, as well as a disease details page. Pharos 3.0 expands on the data collected and displayed about diseases. When available, a description from the Disease Ontology is shown. The hierarchy of each specific disease is also shown, allowing users to view parent or child diseases or syndromes, as well as the siblings for each disease. In addition, a breakdown of the targets associated with each disease is shown, which can also be opened in the target browse page, allowing users to filter the list of associated targets. The target-disease Novelty score is shown in the Target-Importance Novelty eXplorer (TIN-X) (41) panel. Similar to the target-based TIN-X view, this is a scatterplot of novel targets related to the disease of interest. Novelty estimates the scarcity of publications about a target, whereas Importance estimates the strength of the association between that target and a specific disease (42). The X-axis shows the Novelty of the target-disease relationship, while the Y-axis shows the Importance of that target to the disease, both on a log10 scale. This feature is displayed in Figure 7. When available, help definition buttons are also shown for each section of data.

Figure 10. List of targets associated with asthma, filtered by Data Source: Expression Atlas and filtered by Expression Atlas log2foldchange value (A). Also shown is the Disease Association Details column with additional association-specific details (B).
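For readers who want to reproduce such a plot from exported scores, the log10 placement of a point on the TIN-X axes described above amounts to the following. The function name and the sample scores are invented for this sketch; how Novelty and Importance themselves are computed is described in the TIN-X reference (41) and is not reproduced here.

```python
import math


def tinx_point(novelty: float, importance: float) -> tuple:
    """Map positive Novelty (x) and Importance (y) scores to log10 coordinates."""
    if novelty <= 0 or importance <= 0:
        raise ValueError("log10 axes require strictly positive scores")
    return (math.log10(novelty), math.log10(importance))


# Invented scores, for illustration only.
x, y = tinx_point(0.01, 100.0)
print(x, y)
```

On log axes, scores that span several orders of magnitude remain distinguishable, which a linear scale would compress into a corner of the plot.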

Ligand details pages

The ligand details page has been expanded to include a description of the ligand, where available. A 2D depiction of the ligand chemical structure is shown together with synonyms and identifiers from PubChem (43), Guide to Pharmacology (44), ChEMBL and DrugCentral. Activity values are displayed in collapsible panels and sorted by target. By clicking on a target header, users are able to examine activity values for that ligand on the selected target. Activity type, value and mechanism of action are shown, in addition to the reference publication for each activity, allowing users to directly view the source. Figure 8 illustrates this page display.

Use cases

Target list search A neuroscientist who studies ion chan-nels is seeking understudied proteins for a student projectFiltering to targets listed by the IDG Consortium as targetsof interest (httpsdruggablegenomenetIDGProteinList)in the ion channel family with a nervous system phenotypeas described by the Mouse Phenotype Ontology from theJackson Laboratory yields 16 potential targets of interestThe refined list view is shown in Figure 9

Binding partner search A scientist is studying the functionof CAMKK2 and hypothesizing on its role in behaviorFrom the CAMKK2 target details page she clicks lsquoExploreInteracting Targetsrsquo to open the potential binding partnersin the Target List Selecting Tclin she filters the list to 10 po-tential binding partners that have approved drugs with newtherapeutic effects and drug-related side effects that can as-sist in her research She is also able to save this list of targetsas a custom list allowing her to easily revisit and analyzethese targets This workflow is also shown in Figure 9

Disease-based search A researcher studying asthmasearches for this top level disease term and enters theasthma disease details page He wants to find proteins thatare differentially expressed in asthma patients Startingfrom the Asthma disease details page he clicks lsquoExploreAssociated Targetsrsquo to see 470 potential targets Filteringthose with data from the Expression Atlas (18) usingthe lsquoExpression Atlasrsquo value from the lsquoDisease DataSourcersquo filter yields 78 targets which can then be sorted bylog2foldChange using the drop-down sort feature as shownabove the target list in Figure 10 He finds several withhigh log2foldChange meaning the expression level changesdrastically between disease and non-disease conditions andlow p-value meaning the results are statistically significantAlso of note for lists of associated targets an additionalcolumn has been added to the target card showing diseaseassociation details such as the evidence associated with agiven data source

Finding experimental tools A researcher has found that aVoltage-gated potassium channel KCNS2 is expressed inthe nucleus they are working on Itrsquos role is unknown Thetarget details page for KCNS2 lists three approved drugs

Dow

nloaded from httpsacadem

icoupcomnararticle49D

1D13345958490 by guest on 20 July 2022

D1344 Nucleic Acids Research 2021 Vol 49 Database issue

(eg amifampridine) which are known to block this channelThe IDG Resources show a cell line and a mouse geneticconstruct which can be ordered from the IDG Consortium(UCSF in this case) The researcher is now able to design aset of in vitro and in vivo experiments to help elucidate therole of the channel

Machine learning based on PPI networks to prioritize targetsfor a disease A novel target ranking approach could be de-veloped that relies on deep network representation learningof a PPI network annotated with disease-specific knowledgeattributes derived from TCRDPharos This involves map-ping the enriched PPI network into a feature space using theneural network framework Gat2Vec (45) Gat2Vec employsa shallow neural network model to facilitate joint learningon the structural and attribute contexts of a given networkIn this case the PPIs provide the structural context and thedisease-related knowledge provides the attribute contextsThe feature space generated can be used to develop ma-chine learning models that can predict the ranking of pro-tein targets in the context of a disease Additional proto-cols for extracting available data for specific targets of in-terest as well the differences in knowledge availability be-tween understudied and more studied targets are discussedelsewhere (46)

DISCUSSION

With the newly added data interactive visualizations andenhanced search capabilities TCRD and Pharos in theircurrent forms serve as IDG resources that facilitate bet-ter exploration of the dark and understudied regions of thehuman genome The central idea of the resource contin-ues to be enriching knowledge around human targets andmonitoring their therapeutic development levels One areathat received significant attention in the current update isthe search mechanism which was upgraded with the imple-mentation of GraphQL API To improve user interactionwith the new API example queries have been made avail-able on the website The TDL descriptions panel and thetarget expression anatomograms are the significantly im-proved components of the Pharos web interface Furthermost data that have been recently integrated into TCRD fa-cilitate application of artificial intelligencemachine learn-ing (AIML) techniques for target evaluation and drugrepositioning (47) and are accessible in ML ready formatMost of the above described developments are based on thefeedback obtained directly from the website visitors severaldemo sessions and presentations at scientific meetings

Ongoing work focuses on utilizing the AIML-ready datasuch as target-disease target-phenotype and PPI data to de-velop AIML models for prioritization of dark targets andbetter understanding of disease biology Currently we areevaluating ways to aggregate experimental data uncertain-ties specifically data quality and reliability (48) similar tothe in silico toxicology protocols (49) We continue to extendour efforts in enhancing the disease page to list associatedtargets rank them by the consensus strength of associationto a disease and further facilitate filtering to narrow downto targets of potential interest to researchers The databasecan be expected to be further enriched with new data and

data types that contribute to the ultimate goal of illuminat-ing the dark targets

DATA AVAILABILITY

TCRD is an open source database that can be accessed athttpjuniperhealthunmedutcrdPharos is an open source web platform that can be ac-

cessed athttpspharosnihgovThe Pharos resources have been split into frontend and

backend repositoriesThe front end code can be found on Githubhttpsgithubcomncatspharos frontendThe backend GraphQL implementation code can be

found on Githubhttpsgithubcomncatspharos-graphql-serverGraphQL resource documentation can be found on

Pharoshttpspharosnihgovapi

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

Rajarshi Guha for his immeasurable time spent not onlyworking on and promoting Pharos 10 but also for his men-torship as we began the process of revamping Pharos OlegUrsu for developing the first machine learning ready ver-sion of TCRD Gorka Lasso for supplying the P-HIPSTervirusndashhuman protein interactions dataset

FUNDING

National Institutes of Health (NIH) Common Fund[CA224370 to SLM CGB LLJ AW GB JJYTIO U24TR002278 to DV AK JH AW SCSand TIO] Novo Nordisk Foundation [NNF14CC0001to LJJ] Intramural Research Program Division of Pre-clinical Innovation NIH NCATS (to DTN KK TSNS PD EW AS) Funding for open access charge NI-HCA224370Conflict of interest statement LJJ is co-founder and scien-tific advisory board member of Intomics AS TIO hasreceived honoraria or consulted for Abbott AstraZenecaChiron Genentech Infinity Pharmaceuticals Merz Phar-maceuticals Merck Darmstadt Mitsubishi Tanabe Novar-tis Ono Pharmaceuticals Pfizer Roche Sanofi and WyethHe is on the scientific advisory board of ChemDiv Inc andInSilico Medicine

REFERENCES1 EdwardsAM IsserlinR BaderGD FryeSV WillsonTM and

YuFH (2011) Too many roads not taken Nature 470 163ndash1652 NguyenD-T MathiasS BologaC BrunakS FernandezN

GaultonA HerseyA HolmesJ JensenLJ KarlssonA et al(2017) Pharos Collating protein information to shed light on thedruggable genome Nucleic Acids Res 45 D995ndashD1002

Dow

nloaded from httpsacadem

icoupcomnararticle49D

1D13345958490 by guest on 20 July 2022

Nucleic Acids Research 2021 Vol 49 Database issue D1345

3. Oprea,T.I., Bologa,C.G., Brunak,S., Campbell,A., Gan,G.N., Gaulton,A., Gomez,S.M., Guha,R., Hersey,A., Holmes,J. et al. (2018) Unexplored therapeutic opportunities in the human genome. Nat. Rev. Drug Discov., 17, 317–332.

4. Galperin,M.Y., Fernandez-Suarez,X.M. and Rigden,D.J. (2017) The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes. Nucleic Acids Res., 45, D1–D11.

5. UniProt Consortium (2019) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res., 47, D506–D515.

6. Dickinson,M.E., Flenniken,A.M., Ji,X., Teboul,L., Wong,M.D., White,J.K., Meehan,T.F., Weninger,W.J., Westerberg,H., Adissu,H. et al. (2016) High-throughput discovery of novel developmental phenotypes. Nature, 537, 508–514.

7. Smith,J.R., Hayman,G.T., Wang,S.-J., Laulederkind,S.J.F., Hoffman,M.J., Kaldunski,M.L., Tutaj,M., Thota,J., Nalabolu,H.S., Ellanki,S.L.R. et al. (2020) The Year of the Rat: The Rat Genome Database at 20: a multi-species knowledgebase and analysis platform. Nucleic Acids Res., 48, D731–D742.

8. Schriml,L.M., Arze,C., Nadendla,S., Chang,Y.-W.W., Mazaitis,M., Felix,V., Feng,G. and Kibbe,W.A. (2012) Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res., 40, D940–D946.

9. Hayman,G.T., Laulederkind,S.J.F., Smith,J.R., Wang,S.-J., Petri,V., Nigam,R., Tutaj,M., De Pons,J., Dwinell,M.R. and Shimoyama,M. (2016) The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database. Database, 2016, baw034.

10. Smith,C.L. and Eppig,J.T. (2009) The mammalian phenotype ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip. Rev. Syst. Biol. Med., 1, 390–399.

11. Buniello,A., MacArthur,J.A.L., Cerezo,M., Harris,L.W., Hayhurst,J., Malangone,C., McMahon,A., Morales,J., Mountjoy,E., Sollis,E. et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47, D1005–D1012.

12. GTEx Consortium (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.

13. Thul,P.J. and Lindskog,C. (2018) The human protein atlas: a spatial map of the human proteome. Protein Sci., 27, 233–244.

14. Palasca,O., Santos,A., Stolte,C., Gorodkin,J. and Jensen,L.J. (2018) TISSUES 2.0: an integrative web resource on mammalian tissue expression. Database, 2018, bay003.

15. Kim,M.-S., Pinto,S.M., Getnet,D., Nirujogi,R.S., Manda,S.S., Chaerkady,R., Madugundu,A.K., Kelkar,D.S., Isserlin,R., Jain,S. et al. (2014) A draft map of the human proteome. Nature, 509, 575–581.

16. Barretina,J., Caponigro,G., Stransky,N., Venkatesan,K., Margolin,A.A., Kim,S., Wilson,C.J., Lehar,J., Kryukov,G.V., Sonkin,D. et al. (2012) The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483, 603–607.

17. Stathias,V., Turner,J., Koleti,A., Vidović,D., Cooper,D., Fazel-Najafabadi,M., Pilarczyk,M., Terryn,R., Chung,C., Umeano,A. et al. (2020) LINCS Data Portal 2.0: next generation access point for perturbation-response signatures. Nucleic Acids Res., 48, D431–D439.

18. Papatheodorou,I., Moreno,P., Manning,J., Fuentes,A.M.-P., George,N., Fexova,S., Fonseca,N.A., Fullgrabe,A., Green,M., Huang,N. et al. (2020) Expression Atlas update: from tissues to single cells. Nucleic Acids Res., 48, D77–D83.

19. Szklarczyk,D., Gable,A.L., Lyon,D., Junge,A., Wyder,S., Huerta-Cepas,J., Simonovic,M., Doncheva,N.T., Morris,J.H., Bork,P. et al. (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res., 47, D607–D613.

20. Lasso,G., Mayer,S.V., Winkelmann,E.R., Chu,T., Elliot,O., Patino-Galindo,J.A., Park,K., Rabadan,R., Honig,B. and Shapira,S.D. (2019) A structure-informed atlas of human-virus interactions. Cell, 178, 1526–1541.

21. Gaulton,A., Hersey,A., Nowotka,M., Bento,A.P., Chambers,J., Mendez,D., Mutowo,P., Atkinson,F., Bellis,L.J., Cibrian-Uhalte,E. et al. (2017) The ChEMBL database in 2017. Nucleic Acids Res., 45, D945–D954.

22. Ursu,O., Holmes,J., Bologa,C.G., Yang,J.J., Mathias,S.L., Stathias,V., Nguyen,D.-T., Schürer,S. and Oprea,T. (2019) DrugCentral 2018: an update. Nucleic Acids Res., 47, D963–D970.

23. Pletscher-Frankild,S., Palleja,A., Tsafou,K., Binder,J.X. and Jensen,L.J. (2015) DISEASES: text mining and data integration of disease-gene associations. Methods, 74, 83–89.

24. Santos,R., Ursu,O., Gaulton,A., Bento,A.P., Donadi,R.S., Bologa,C.G., Karlsson,A., Al-Lazikani,B., Hersey,A., Oprea,T.I. et al. (2017) A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov., 16, 19–34.

25. Ursu,O., Glick,M. and Oprea,T. (2019) Novel drug targets in 2018. Nat. Rev. Drug Discov., 18, 328.

26. Avram,S., Halip,L., Curpan,R. and Oprea,T.I. (2020) Novel drug targets in 2019. Nat. Rev. Drug Discov., 19, 300.

27. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.

28. Pafilis,E., Frankild,S.P., Fanini,L., Faulwetter,S., Pavloudi,C., Vasileiadou,A., Arvanitidis,C. and Jensen,L.J. (2013) The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text. PLoS One, 8, e65390.

29. Bjorling,E. and Uhlen,M. (2008) Antibodypedia, a portal for sharing antibody and antigen validation data. Mol. Cell. Proteomics, 7, 2028–2037.

30. Watkins,X., Garcia,L.J., Pundir,S., Martin,M.J. and UniProt Consortium (2017) ProtVista: visualization of protein sequence annotations. Bioinformatics, 33, 2040–2041.

31. Pinero,J., Ramírez-Anguita,J.M., Sauch-Pitarch,J., Ronzano,F., Centeno,E., Sanz,F. and Furlong,L.I. (2020) The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res., 48, D845–D855.

32. Jia,J., An,Z., Ming,Y., Guo,Y., Li,W., Liang,Y., Guo,D., Li,X., Tai,J., Chen,G. et al. (2018) eRAM: encyclopedia of rare disease annotations for precision medicine. Nucleic Acids Res., 46, D937–D943.

33. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

34. Rose,A.S., Bradley,A.R., Valasatava,Y., Duarte,J.M., Prlic,A. and Rose,P.W. (2018) NGL viewer: web-based molecular graphics for large complexes. Bioinformatics, 34, 3755–3758.

35. Rose,A.S. and Hildebrand,P.W. (2015) NGL Viewer: a web application for molecular visualization. Nucleic Acids Res., 43, W576–W579.

36. Li,W., Moore,M.J., Vasilieva,N., Sui,J., Wong,S.K., Berne,M.A., Somasundaran,M., Sullivan,J.L., Luzuriaga,K., Greenough,T.C. et al. (2003) Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature, 426, 450–454.

37. Hoffmann,M., Kleine-Weber,H., Schroeder,S., Kruger,N., Herrler,T., Erichsen,S., Schiergens,T.S., Herrler,G., Wu,N.-H., Nitsche,A. et al. (2020) SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell, 181, 271–280.

38. Mungall,C.J., Torniai,C., Gkoutos,G.V., Lewis,S.E. and Haendel,M.A. (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biol., 13, R5.

39. Huttlin,E.L., Bruckner,R.J., Navarrete-Perea,J., Cannon,J.R., Baltier,K., Gebreab,F., Gygi,M.P., Thornock,A., Zarraga,G., Tam,S. et al. (2020) Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. bioRxiv doi: https://doi.org/10.1101/2020.01.19.905109, 19 January 2020, preprint: not peer reviewed.

40. Jassal,B., Matthews,L., Viteri,G., Gong,C., Lorente,P., Fabregat,A., Sidiropoulos,K., Cook,J., Gillespie,M., Haw,R. et al. (2020) The reactome pathway knowledgebase. Nucleic Acids Res., 48, D498–D503.

41. Cannon,D.C., Yang,J.J., Mathias,S.L., Ursu,O., Mani,S., Waller,A., Schürer,S.C., Jensen,L.J., Sklar,L.A., Bologa,C.G. et al. (2017) TIN-X: target importance and novelty explorer. Bioinformatics, 33, 2601–2603.

42. Oprea,T.I. (2019) Exploring the dark genome: implications for precision medicine. Mamm. Genome, 30, 192–200.

43. Kim,S., Chen,J., Cheng,T., Gindulyte,A., He,J., He,S., Li,Q., Shoemaker,B.A., Thiessen,P.A., Yu,B. et al. (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res., 47, D1102–D1109.

44. Armstrong,J.F., Faccenda,E., Harding,S.D., Pawson,A.J., Southan,C., Sharman,J.L., Campo,B., Cavanagh,D.R., Alexander,S.P.H., Davenport,A.P. et al. (2020) The IUPHAR/BPS guide to Pharmacology in 2020: extending immunopharmacology content and introducing the IUPHAR/MMV guide to Malaria Pharmacology. Nucleic Acids Res., 48, D1006–D1021.

45. Sheikh,N., Kefato,Z. and Montresor,A. (2019) gat2vec: representation learning for attributed graphs. Computing, 101, 187–209.

46. Sheils,T., Mathias,S.L., Siramshetty,V.B., Bocci,G., Bologa,C.G., Yang,J.J., Waller,A., Southall,N., Nguyen,D.-T. and Oprea,T.I. (2020) How to illuminate the druggable genome using pharos. Curr. Protoc. Bioinformatics, 69, e92.

47. Levin,J.M., Oprea,T.I., Davidovich,S., Clozel,T., Overington,J.P., Vanhaelen,Q., Cantor,C.R., Bischof,E. and Zhavoronkov,A. (2020) Artificial intelligence, drug repurposing and peer review. Nat. Biotechnol., 38, 1127–1131.

48. Klimisch,H.J., Andreae,M. and Tillmann,U. (1997) A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul. Toxicol. Pharmacol., 25, 1–5.

49. Myatt,G.J., Ahlberg,E., Akahori,Y., Allen,D., Amberg,A., Anger,L.T., Aptula,A., Auerbach,S., Beilke,L., Bellion,P. et al. (2018) In silico toxicology protocols. Regul. Toxicol. Pharmacol., 96, 1–17.


Figure 8. Ligand details page for acetazolamide shows ligand description information as well as synonyms and identifiers (A). Target activity is shown with expanded activity for CA2 (B).

It is also possible to view the list of associated diseases as a browseable list, again allowing for filtering and more refined data analysis.

PDB visualizations. While PDB identifiers have always been available in Pharos, they were previously displayed as a list of linkouts with no additional information (33). In Pharos 3.0, the PDB section was expanded and redesigned to display three-dimensional (3D) structures for proteins, including bound ligands where available. This 3D visualization, which employs NGL Viewer (34,35), is highly interactive, allowing users to drag, zoom, pan and spin the structure to examine it thoroughly. The interface also enables users to browse the PDB identifiers and structures associated with a given target. Clicking on a table row loads the respective PDB entry and displays the ligands shown in the structure. Other details include the reference source for the identifier as well as the method used to generate the structure. This feature is highlighted in Figure 5.

Predicted viral interactions. Predicted viral-human protein interactions from the P-HIPSTer atlas are displayed in a separate panel that shows a list of viruses and their associated taxonomic data, specifically for the targeted human protein. Each virus also lists the viral proteins that the human protein is predicted to interact with, in addition to a likelihood ratio, which measures the strength of the sequence- and structure-based prediction. These features are shown in Figure 5, using ACE2, the angiotensin converting enzyme 2, as an example. ACE2 is the functional entry receptor for both severe acute respiratory syndrome (SARS) coronaviruses, SARS-CoV (36) and SARS-CoV-2 (37).

Figure 9. Target list described in the Target list search use case (A) and target list described in the Binding partner search use case (B). Both panels show selected filters, including the ability to filter target lists by IDG featured sub-lists.

Target expression data. The target expression data panel has also been expanded, as shown in Figure 6. We replaced the static human figure with an expanded, interactive anatomogram (18). This visualization contains both male and female figures, allowing sex-specific tissue information (e.g. from GTEx) to be highlighted. Users can toggle the brain image separately for more fine-grained visualization of brain tissue expression. All figures are zoomable and pannable to improve visibility. When possible, all source tissues have been mapped to standardized Uberon (38) identifiers, enabling us to merge repetitive tissues. Expression data can be viewed by source, and tissue values are searchable. Clicking on a tissue panel, as in the disease associations section, allows users to view detailed provenance of the data, showing both the source and the relevant confidence values for the tissue expression. Gradient coloration allows us to display the tissue expression level, either as a qualitative value (low, medium, high) or on a numeric scale, depending on the data source, giving users a quick visualization of the data. We note that several sources overlap in data and content; however, they may also present unique data. Pharos displays the variety between sources, which can help users decide which sources to focus on. It also illustrates that the text-mining-based resources tend to be updated more often than frequently accessed scientific databases that aggregate experimental data. When available, cell lines and orthologs are displayed with expandable evidence panels.
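The merging of tissue records under a shared Uberon identifier, and the scaling of qualitative or numeric values for gradient coloration, can be sketched as follows. This is an illustrative sketch only, not the Pharos implementation: the lookup table, the record layout, the function names and all expression values are assumptions made for the example, and the Uberon identifiers should be checked against the Uberon release in use.

```python
from collections import defaultdict

# Hypothetical lookup from source-specific tissue labels to Uberon identifiers
# (illustrative entries only).
TISSUE_TO_UBERON = {
    "liver": "UBERON:0002107",
    "Liver": "UBERON:0002107",
    "hepatic tissue": "UBERON:0002107",
    "lung": "UBERON:0002048",
}

# Ordinal ranks for sources that report qualitative levels.
QUALITATIVE_LEVELS = {"low": 1, "medium": 2, "high": 3}


def merge_by_uberon(records):
    """Group expression records by Uberon id; unmapped tissues keep their raw label."""
    merged = defaultdict(list)
    for rec in records:
        key = TISSUE_TO_UBERON.get(rec["tissue"], rec["tissue"])
        merged[key].append((rec["source"], rec["value"]))
    return dict(merged)


def gradient_fraction(value, scale_max):
    """Map a qualitative or numeric expression value to a 0..1 colour intensity."""
    if isinstance(value, str):
        return QUALITATIVE_LEVELS[value.lower()] / len(QUALITATIVE_LEVELS)
    return max(0.0, min(1.0, value / scale_max))


# Made-up records: two sources describing the same tissue under different labels.
records = [
    {"source": "GTEx", "tissue": "Liver", "value": 12.5},
    {"source": "HPA", "tissue": "liver", "value": "high"},
    {"source": "TISSUES", "tissue": "tonsil", "value": "low"},
]
merged = merge_by_uberon(records)
```

Keeping the source name next to each value preserves provenance after merging, which is what allows a per-source view of the same tissue.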

Protein–protein interactions. The introduction of data from Bioplex (39), STRING and Reactome (40) has allowed us to display rich PPI data and allows users to examine how multiple targets may interact, as shown in Figure 6. This panel consists of a pageable list of targets known to interact with the current target. These targets display the illumination graph, which gives users a quick visual representation of the level and types of knowledge available for the target. The originating data source is also displayed, as well as the confidence of this interaction. This list of PPIs may be viewed either directly in STRING or in the browse page as a filterable list.

Disease details pages

Figure 10. List of targets associated with asthma, filtered by Data Source: Expression Atlas, and filtered by Expression Atlas log2foldchange value (A). Also shown is the Disease Association Details column with additional association-specific details (B).

While Pharos remains target-centric, it is equally important to be able to examine diseases in relation to their associated targets. As such, Pharos has always included diseases as a browseable list, as well as a disease details page. Pharos 3.0 expands on the data collected and displayed about diseases. When available, a description from the Disease Ontology is shown. The hierarchy of each specific disease is also shown, allowing users to view parent or child diseases or syndromes, as well as the siblings for each disease. In addition, a breakdown of the targets associated with each disease is shown, which can also be opened in the target browse page, allowing users to filter the list of associated targets. The target-disease Novelty score is shown in the Target-Importance Novelty eXplorer (TIN-X) (41) panel. Similar to the target-based TIN-X view, this is a scatterplot of novel targets related to the disease of interest. Novelty estimates the scarcity of publications about a target, whereas Importance estimates the strength of the association between that target and a specific disease (42). The X-axis shows the Novelty of the target-disease relationship, while the Y-axis shows the Importance of that target to the disease, both on a log10 scale. This feature is displayed in Figure 7. When available, help definition buttons are also shown for each section of data.
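As a small illustration of the log10 axes, the sketch below converts Novelty and Importance scores into scatterplot coordinates. The target names and score values are hypothetical placeholders, and the way TIN-X derives the scores themselves (41) is not reproduced here.

```python
import math


def tinx_point(novelty, importance):
    """Return (x, y) scatterplot coordinates: log10(Novelty), log10(Importance)."""
    if novelty <= 0 or importance <= 0:
        raise ValueError("a log10 scale needs strictly positive scores")
    return math.log10(novelty), math.log10(importance)


# Hypothetical (Novelty, Importance) pairs for two targets of one disease.
scores = {"TARGET_A": (0.001, 10.0), "TARGET_B": (0.1, 100.0)}
points = {name: tinx_point(n, i) for name, (n, i) in scores.items()}
```

On such axes, a tenfold difference in either score corresponds to one unit of distance, so targets whose scores span several orders of magnitude remain distinguishable in a single plot.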

Ligand details pages

The ligand details page has been expanded to include a description of the ligand, where available. A 2D depiction of the ligand chemical structure is shown together with synonyms and identifiers from PubChem (43), Guide to Pharmacology (44), ChEMBL and DrugCentral. Activity values are displayed in collapsible panels and sorted by target. By clicking on a target header, users are able to examine activity values for that ligand on the selected target. Activity type, value and mechanism of action are shown, in addition to a reference publication for each activity, allowing users to directly view the source. Figure 8 illustrates this page display.

Use cases

Target list search. A neuroscientist who studies ion channels is seeking understudied proteins for a student project. Filtering to targets listed by the IDG Consortium as targets of interest (https://druggablegenome.net/IDGProteinList), in the ion channel family, with a nervous system phenotype as described by the Mouse Phenotype Ontology from the Jackson Laboratory, yields 16 potential targets of interest. The refined list view is shown in Figure 9.

Binding partner search. A scientist is studying the function of CAMKK2 and hypothesizing on its role in behavior. From the CAMKK2 target details page, she clicks 'Explore Interacting Targets' to open the potential binding partners in the Target List. Selecting Tclin, she filters the list to 10 potential binding partners that have approved drugs, with new therapeutic effects and drug-related side effects that can assist in her research. She is also able to save this list of targets as a custom list, allowing her to easily revisit and analyze these targets. This workflow is also shown in Figure 9.
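The two list-based use cases above both amount to combining several facet filters, where a target is kept only if it passes every selected filter. A minimal sketch of that logic follows; the `Target` record, its field names and the example symbols are hypothetical and do not reflect the TCRD schema.

```python
from dataclasses import dataclass, field


@dataclass
class Target:
    symbol: str
    tdl: str                      # target development level, e.g. "Tclin" or "Tdark"
    family: str
    on_idg_list: bool = False
    phenotypes: frozenset = field(default_factory=frozenset)


def apply_filters(targets, **facets):
    """Keep targets for which every facet predicate returns True."""
    return [
        t for t in targets
        if all(pred(getattr(t, name)) for name, pred in facets.items())
    ]


# Made-up targets for illustration.
targets = [
    Target("GENE_A", "Tdark", "Ion Channel", True, frozenset({"nervous system phenotype"})),
    Target("GENE_B", "Tclin", "Kinase", False),
    Target("GENE_C", "Tdark", "Ion Channel", True),
]

# Analogue of the target list search: IDG list + family + phenotype.
understudied_channels = apply_filters(
    targets,
    on_idg_list=lambda v: v,
    family=lambda v: v == "Ion Channel",
    phenotypes=lambda v: "nervous system phenotype" in v,
)

# Analogue of the binding partner search: restrict to Tclin.
with_approved_drugs = apply_filters(targets, tdl=lambda v: v == "Tclin")
```

Expressing each facet as an independent predicate means filters can be added or removed in any order without changing the result, which matches how a faceted browse page behaves.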

Disease-based search. A researcher studying asthma searches for this top-level disease term and enters the asthma disease details page. He wants to find proteins that are differentially expressed in asthma patients. Starting from the asthma disease details page, he clicks 'Explore Associated Targets' to see 470 potential targets. Filtering to those with data from the Expression Atlas (18), using the 'Expression Atlas' value from the 'Disease Data Source' filter, yields 78 targets, which can then be sorted by log2foldChange using the drop-down sort feature, as shown above the target list in Figure 10. He finds several with a high log2foldChange, meaning the expression level changes drastically between disease and non-disease conditions, and a low p-value, meaning the results are statistically significant. Also of note: for lists of associated targets, an additional column has been added to the target card showing disease association details, such as the evidence associated with a given data source.

Finding experimental tools. A researcher has found that a voltage-gated potassium channel, KCNS2, is expressed in the nucleus they are working on. Its role is unknown. The target details page for KCNS2 lists three approved drugs (e.g. amifampridine), which are known to block this channel. The IDG Resources show a cell line and a mouse genetic construct, which can be ordered from the IDG Consortium (UCSF in this case). The researcher is now able to design a set of in vitro and in vivo experiments to help elucidate the role of the channel.

Machine learning based on PPI networks to prioritize targets for a disease. A novel target ranking approach could be developed that relies on deep network representation learning of a PPI network annotated with disease-specific knowledge attributes derived from TCRD/Pharos. This involves mapping the enriched PPI network into a feature space using the neural network framework Gat2Vec (45). Gat2Vec employs a shallow neural network model to facilitate joint learning on the structural and attribute contexts of a given network. In this case, the PPIs provide the structural context and the disease-related knowledge provides the attribute contexts. The feature space generated can be used to develop machine learning models that can predict the ranking of protein targets in the context of a disease. Additional protocols for extracting available data for specific targets of interest, as well as the differences in knowledge availability between understudied and more studied targets, are discussed elsewhere (46).
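To make the two kinds of context concrete, the sketch below generates random walks over a PPI graph (structural context) and over a protein-attribute bipartite graph (attribute context), keeping only protein nodes in the latter. It is a simplified illustration in the spirit of Gat2Vec (45), not the published implementation: the function names, toy graph and attributes are assumptions, and the embedding step that would consume these walks is not shown.

```python
import random


def random_walk(adjacency, start, length, rng):
    """Uniform random walk of up to `length` nodes starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = adjacency.get(walk[-1])
        if not neighbours:
            break  # isolated node: the walk cannot continue
        walk.append(rng.choice(sorted(neighbours)))
    return walk


def structural_walks(ppi, length, rng):
    """One walk per protein over the PPI graph (structural context)."""
    return [random_walk(ppi, node, length, rng) for node in sorted(ppi)]


def attribute_walks(protein_attrs, length, rng):
    """One walk per protein over the protein-attribute bipartite graph.

    Attribute nodes are dropped from the output, so proteins that share
    attributes end up in each other's context (attribute context).
    """
    bipartite = {}
    for protein, attrs in protein_attrs.items():
        for attr in attrs:
            bipartite.setdefault(protein, set()).add(("attr", attr))
            bipartite.setdefault(("attr", attr), set()).add(protein)
    walks = []
    for protein in sorted(protein_attrs):
        # Nodes alternate protein/attribute, so 2*length - 1 nodes hold `length` proteins.
        walk = random_walk(bipartite, protein, 2 * length - 1, rng)
        walks.append([n for n in walk if not isinstance(n, tuple)])
    return walks


# Toy network: A-B-C chain; A and B share a hypothetical disease attribute.
ppi = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
attrs = {"A": {"asthma"}, "B": {"asthma"}, "C": set()}
rng = random.Random(0)
s_walks = structural_walks(ppi, 4, rng)
a_walks = attribute_walks(attrs, 3, rng)
```

Both sets of walks could then be passed to a shallow embedding model so that a protein's vector reflects its interaction partners as well as the proteins sharing its disease annotations.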

DISCUSSION

With the newly added data, interactive visualizations and enhanced search capabilities, TCRD and Pharos in their current forms serve as IDG resources that facilitate better exploration of the dark and understudied regions of the human genome. The central idea of the resource continues to be enriching knowledge around human targets and monitoring their therapeutic development levels. One area that received significant attention in the current update is the search mechanism, which was upgraded with the implementation of a GraphQL API. To improve user interaction with the new API, example queries have been made available on the website. The TDL descriptions panel and the target expression anatomograms are significantly improved components of the Pharos web interface. Further, most data that have been recently integrated into TCRD facilitate the application of artificial intelligence/machine learning (AI/ML) techniques for target evaluation and drug repositioning (47), and are accessible in an ML-ready format. Most of the developments described above are based on feedback obtained directly from website visitors, several demo sessions and presentations at scientific meetings.

Ongoing work focuses on utilizing the AI/ML-ready data, such as target-disease, target-phenotype and PPI data, to develop AI/ML models for prioritization of dark targets and better understanding of disease biology. Currently, we are evaluating ways to aggregate experimental data uncertainties, specifically data quality and reliability (48), similar to the in silico toxicology protocols (49). We continue to extend our efforts in enhancing the disease page to list associated targets, rank them by the consensus strength of association to a disease, and further facilitate filtering to narrow down to targets of potential interest to researchers. The database can be expected to be further enriched with new data and data types that contribute to the ultimate goal of illuminating the dark targets.
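For readers unfamiliar with GraphQL, the sketch below shows how a request body for such an API is typically assembled: a JSON object carrying a `query` string and a `variables` map, sent by HTTP POST. The query shape and the field names (`target`, `q`, `sym`, `name`, `tdl`) are assumptions made for illustration and should be checked against the GraphQL resource documentation and the example queries on the Pharos website; no endpoint URL is assumed and no request is sent.

```python
import json

# Hypothetical query: field and argument names are illustrative assumptions,
# not a verified excerpt of the Pharos schema.
QUERY = """
query TargetSummary($sym: String!) {
  target(q: {sym: $sym}) {
    name
    sym
    tdl
  }
}
"""


def build_request(symbol):
    """Serialize a GraphQL request body for one gene symbol."""
    return json.dumps({"query": QUERY, "variables": {"sym": symbol}})


body = build_request("ACE2")
```

Passing the gene symbol as a variable, rather than formatting it into the query text, keeps the query string constant and avoids quoting problems.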

DATA AVAILABILITY

TCRD is an open source database that can be accessed at http://juniper.health.unm.edu/tcrd/.

Pharos is an open source web platform that can be accessed at https://pharos.nih.gov/.

The Pharos resources have been split into frontend and backend repositories.

The front end code can be found on GitHub: https://github.com/ncats/pharos_frontend.

The backend GraphQL implementation code can be found on GitHub: https://github.com/ncats/pharos-graphql-server.

GraphQL resource documentation can be found on Pharos: https://pharos.nih.gov/api.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online

ACKNOWLEDGEMENTS

Rajarshi Guha for his immeasurable time spent not onlyworking on and promoting Pharos 10 but also for his men-torship as we began the process of revamping Pharos OlegUrsu for developing the first machine learning ready ver-sion of TCRD Gorka Lasso for supplying the P-HIPSTervirusndashhuman protein interactions dataset

FUNDING

National Institutes of Health (NIH) Common Fund[CA224370 to SLM CGB LLJ AW GB JJYTIO U24TR002278 to DV AK JH AW SCSand TIO] Novo Nordisk Foundation [NNF14CC0001to LJJ] Intramural Research Program Division of Pre-clinical Innovation NIH NCATS (to DTN KK TSNS PD EW AS) Funding for open access charge NI-HCA224370Conflict of interest statement LJJ is co-founder and scien-tific advisory board member of Intomics AS TIO hasreceived honoraria or consulted for Abbott AstraZenecaChiron Genentech Infinity Pharmaceuticals Merz Phar-maceuticals Merck Darmstadt Mitsubishi Tanabe Novar-tis Ono Pharmaceuticals Pfizer Roche Sanofi and WyethHe is on the scientific advisory board of ChemDiv Inc andInSilico Medicine

REFERENCES1 EdwardsAM IsserlinR BaderGD FryeSV WillsonTM and

YuFH (2011) Too many roads not taken Nature 470 163ndash1652 NguyenD-T MathiasS BologaC BrunakS FernandezN

GaultonA HerseyA HolmesJ JensenLJ KarlssonA et al(2017) Pharos Collating protein information to shed light on thedruggable genome Nucleic Acids Res 45 D995ndashD1002

Dow

nloaded from httpsacadem

icoupcomnararticle49D

1D13345958490 by guest on 20 July 2022

Nucleic Acids Research 2021 Vol 49 Database issue D1345

3 OpreaTI BologaCG BrunakS CampbellA GanGNGaultonA GomezSM GuhaR HerseyA HolmesJ et al(2018) Unexplored therapeutic opportunities in the human genomeNat Rev Drug Discov 17 317ndash332

4 GalperinMY Fernandez-SuarezXM and RigdenDJ (2017) The24th annual Nucleic Acids Research database issue a look back andupcoming changes Nucleic Acids Res 45 D1ndashD11

5 ConsortiumUP (2019) UniProt a worldwide hub of proteinknowledge Nucleic Acids Res 47 D506ndashD515

6 DickinsonME FlennikenAM JiX TeboulL WongMDWhiteJK MeehanTF WeningerWJ WesterbergH AdissuHet al (2016) High-throughput discovery of novel developmentalphenotypes Nature 537 508ndash514

7 SmithJR HaymanGT WangS-J LaulederkindSJFHoffmanMJ KaldunskiML TutajM ThotaJ NalaboluHSEllankiSLR et al (2020) The Year of the Rat The Rat GenomeDatabase at 20 a multi-species knowledgebase and analysis platformNucleic Acids Res 48 D731ndashD742

8 SchrimlLM ArzeC NadendlaS ChangY-WW MazaitisMFelixV FengG and KibbeWA (2012) Disease Ontology abackbone for disease semantic integration Nucleic Acids Res 40D940ndashD946

9 HaymanGT LaulederkindSJF SmithJR WangS-J PetriVNigamR TutajM De PonsJ DwinellMR and ShimoyamaM(2016) The Disease Portals disease-gene annotation and the RGDdisease ontology at the Rat Genome Database Database 2016baw034

10 SmithCL and EppigJT (2009) The mammalian phenotypeontology enabling robust annotation and comparative analysisWiley Interdiscip Rev Syst Biol Med 1 390ndash399

11 BunielloA MacArthurJAL CerezoM HarrisLW HayhurstJMalangoneC McMahonA MoralesJ MountjoyE SollisEet al (2019) The NHGRI-EBI GWAS Catalog of publishedgenome-wide association studies targeted arrays and summarystatistics 2019 Nucleic Acids Res 47 D1005ndashD1012

12 Consortium GTEx (2020) The GTEx Consortium atlas of geneticregulatory effects across human tissues Science 369 1318ndash1330

13 ThulPJ and LindskogC (2018) The human protein atlas a spatialmap of the human proteome Protein Sci 27 233ndash244

14 PalascaO SantosA StolteC GorodkinJ and JensenLJ (2018)TISSUES 20 an integrative web resource on mammalian tissueexpression Database 2018 bay003

15 KimM-S PintoSM GetnetD NirujogiRS MandaSSChaerkadyR MadugunduAK KelkarDS IsserlinR JainSet al (2014) A draft map of the human proteome Nature 509575ndash581

16 BarretinaJ CaponigroG StranskyN VenkatesanKMargolinAA KimS WilsonCJ LeharJ KryukovGVSonkinD et al (2012) The Cancer Cell Line Encyclopedia enablespredictive modelling of anticancer drug sensitivity Nature 483603ndash607

17 StathiasV TurnerJ KoletiA VidovicD CooperDFazel-NajafabadiM PilarczykM TerrynR ChungCUmeanoA et al (2020) LINCS Data Portal 20 next generationaccess point for perturbation-response signatures Nucleic Acids Res48 D431ndashD439

18 PapatheodorouI MorenoP ManningJ FuentesAM-PGeorgeN FexovaS FonsecaNA FullgrabeA GreenMHuangN et al (2020) Expression Atlas update from tissues to singlecells Nucleic Acids Res 48 D77ndashD83

19 SzklarczykD GableAL LyonD JungeA WyderSHuerta-CepasJ SimonovicM DonchevaNT MorrisJH BorkPet al (2019) STRING v11 protein-protein association networks withincreased coverage supporting functional discovery in genome-wideexperimental datasets Nucleic Acids Res 47 D607ndashD613

20 LassoG MayerSV WinkelmannER ChuT ElliotOPatino-GalindoJA ParkK RabadanR HonigB andShapiraSD (2019) A structure-informed atlas of human-virusinteractions Cell 178 1526ndash1541

21 GaultonA HerseyA NowotkaM BentoAP ChambersJMendezD MutowoP AtkinsonF BellisLJ Cibrian-UhalteEet al (2017) The ChEMBL database in 2017 Nucleic Acids Res 45D945ndashD954

22 UrsuO HolmesJ BologaCG YangJJ MathiasSL StathiasVNguyenD-T SchurerS and OpreaT (2019) DrugCentral 2018 anupdate Nucleic Acids Res 47 D963ndashD970

23 Pletscher-FrankildS PallejaA TsafouK BinderJX andJensenLJ (2015) DISEASES text mining and data integration ofdisease-gene associations Methods 74 83ndash89

24 SantosR UrsuO GaultonA BentoAP DonadiRSBologaCG KarlssonA Al-LazikaniB HerseyA OpreaTIet al (2017) A comprehensive map of molecular drug targets NatRev Drug Discov 16 19ndash34

25 UrsuO GlickM and OpreaT (2019) Novel drug targets in 2018Nat Rev Drug Discov 18 328

26 AvramS HalipL CurpanR and OpreaTI (2020) Novel drugtargets in 2019 Nat Rev Drug Discov 19 300

27 AshburnerM BallCA BlakeJA BotsteinD ButlerHCherryJM DavisAP DolinskiK DwightSS EppigJT et al(2000) Gene ontology tool for the unification of biology The GeneOntology Consortium Nat Genet 25 25ndash29

28 PafilisE FrankildSP FaniniL FaulwetterS PavloudiCVasileiadouA ArvanitidisC and JensenLJ (2013) The SPECIESand ORGANISMS resources for fast and accurate identification oftaxonomic names in text PLoS One 8 e65390

29 BjorlingE and UhlenM (2008) Antibodypedia a portal for sharingantibody and antigen validation data Mol Cell Proteomics 72028ndash2037

30 WatkinsX GarciaLJ PundirS MartinMJ and ConsortiumUP(2017) ProtVista visualization of protein sequence annotationsBioinformatics 33 2040ndash2041

31 PineroJ Ramırez-AnguitaJM Sauch-PitarchJ RonzanoFCentenoE SanzF and FurlongLI (2020) The DisGeNETknowledge platform for disease genomics 2019 update Nucleic AcidsRes 48 D845ndashD855

32 JiaJ AnZ MingY GuoY LiW LiangY GuoD LiX TaiJChenG et al (2018) eRAM encyclopedia of rare disease annotationsfor precision medicine Nucleic Acids Res 46 D937ndashD943

33 BermanHM WestbrookJ FengZ GillilandG BhatTNWeissigH ShindyalovIN and BournePE (2000) The Protein DataBank Nucleic Acids Res 28 235ndash242

34 RoseAS BradleyAR ValasatavaY DuarteJM PrlicA andRosePW (2018) NGL viewer web-based molecular graphics forlarge complexes Bioinformatics 34 3755ndash3758

35 RoseAS and HildebrandPW (2015) NGL Viewer a webapplication for molecular visualization Nucleic Acids Res 43W576ndashW579

36 LiW MooreMJ VasilievaN SuiJ WongSK BerneMASomasundaranM SullivanJL LuzuriagaK GreenoughTCet al (2003) Angiotensin-converting enzyme 2 is a functional receptorfor the SARS coronavirus Nature 426 450ndash454

37 HoffmannM Kleine-WeberH SchroederS KrugerN HerrlerTErichsenS SchiergensTS HerrlerG WuN-H NitscheA et al(2020) SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2and Is Blocked by a Clinically Proven Protease Inhibitor Cell 181271ndash280

38 MungallCJ TorniaiC GkoutosGV LewisSE andHaendelMA (2012) Uberon an integrative multi-species anatomyontology Genome Biol 13 R5

39 HuttlinEL BrucknerRJ Navarrete-PereaJ CannonJRBaltierK GebreabF GygiMP ThornockA ZarragaG TamSet al (2020) Dual proteome-scale networks reveal cell-specificremodeling of the human interactome bioRxiv doihttpsdoiorg10110120200119905109 19 January 2020 preprintnot peer reviewed

40 JassalB MatthewsL ViteriG GongC LorenteP FabregatASidiropoulosK CookJ GillespieM HawR et al (2020) Thereactome pathway knowledgebase Nucleic Acids Res 48D498ndashD503

41 CannonDC YangJJ MathiasSL UrsuO ManiS WallerASchurerSC JensenLJ SklarLA BologaCG et al (2017)TIN-X target importance and novelty explorer Bioinformatics 332601ndash2603

42 OpreaTI (2019) Exploring the dark genome implications forprecision medicine Mamm Genome 30 192ndash200

43 KimS ChenJ ChengT GindulyteA HeJ HeS LiQShoemakerBA ThiessenPA YuB et al (2019) PubChem 2019

Dow

nloaded from httpsacadem

icoupcomnararticle49D

1D13345958490 by guest on 20 July 2022

D1346 Nucleic Acids Research 2021 Vol 49 Database issue

update improved access to chemical data Nucleic Acids Res 47D1102ndashD1109

44 ArmstrongJF FaccendaE HardingSD PawsonAJSouthanC SharmanJL CampoB CavanaghDRAlexanderSPH DavenportAP et al (2020) The IUPHARBPSguide to Pharmacology in 2020 extending immunopharmacologycontent and introducing the IUPHARMMV guide to MalariaPharmacology Nucleic Acids Res 48 D1006ndashD1021

45 SheikhN KefatoZ and MontresorA (2019) gat2vecrepresentation learning for attributed graphs Computing 101187ndash209

46 SheilsT MathiasSL SiramshettyVB BocciG BologaCGYangJJ WallerA SouthallN NguyenD-T and OpreaTI (2020)

How to illuminate the druggable genome using pharos Curr ProtocBioinformatics 69 e92

47 LevinJM OpreaTI DavidovichS ClozelT OveringtonJPVanhaelenQ CantorCR BischofE and ZhavoronkovA (2020)Artificial intelligence drug repurposing and peer review NatBiotechnol 38 1127ndash1131

48 KlimischHJ AndreaeM and TillmannU (1997) A systematicapproach for evaluating the quality of experimental toxicological andecotoxicological data Regul Toxicol Pharmacol 25 1ndash5

49 MyattGJ AhlbergE AkahoriY AllenD AmbergAAngerLT AptulaA AuerbachS BeilkeL BellionP et al (2018)In silico toxicology protocols Regul Toxicol Pharmacol 96 1ndash17

Dow

nloaded from httpsacadem

icoupcomnararticle49D

1D13345958490 by guest on 20 July 2022

D1342 Nucleic Acids Research 2021 Vol 49 Database issue

Figure 9 Target list described in Target List Search use case (A) and target list described in Binding partner search use case (B) Both panels show selectedfilters including the ability to filter target lists by IDG featured sub-lists

often than frequently accessed scientific databases that ag-gregate experimental data When available cell lines and or-thologs are displayed with expandable evidence panels

Protein–protein interactions. The introduction of data from Bioplex (39), STRING and Reactome (40) has allowed us to display rich PPI data and allows users to examine how multiple targets may interact, as shown in Figure 6. This panel consists of a pageable list of targets known to interact with the current target. These targets display the illumination graph, which gives users a quick visual representation of the level and types of knowledge available for the target. The originating data source is also displayed, as well as the confidence of this interaction. This list of PPIs may be viewed either directly in STRING or in the browse page as a filterable list.

Disease details pages

While Pharos remains target-centric, it is equally important to be able to examine diseases in relation to their associated targets. As such, Pharos has always included diseases as a browseable list, as well as a disease details page. Pharos 3.0 expands on the data collected and displayed about diseases. When available, a description from the Disease Ontology is shown. The hierarchy of each specific disease is also shown, allowing users to view parent or child diseases or syndromes, as well as the siblings for each disease. In addition, a breakdown of the targets associated with each disease is shown, which can also be opened in the target browse page, allowing users to filter the list of associated targets. The target-disease Novelty score is shown in the Target-Importance Novelty eXplorer (TIN-X) (41) panel. Similar to the target-based TIN-X view, this is a scatterplot of novel targets related to the disease of interest. Novelty estimates the scarcity of publications about a target, whereas Importance estimates the strength of the association between that target and a specific disease (42). The X-axis shows the Novelty of the target-disease relationship, while the Y-axis shows the Importance of that target to the disease, both on a log10 scale. This feature is displayed in Figure 7. When available, help definition buttons are also shown for each section of data.

Figure 10. List of targets associated with asthma, filtered by Data Source: Expression Atlas and filtered by Expression Atlas log2foldchange value (A). Also shown is the Disease Association Details column with additional association-specific details (B).

Ligand details pages

The ligand details page has been expanded to include a description of the ligand, where available. A 2D depiction of the ligand chemical structure is shown, together with synonyms and identifiers from PubChem (43), Guide to Pharmacology (44), ChEMBL and DrugCentral. Activity values are displayed in collapsible panels and sorted by target. By clicking on a target header, users are able to examine activity values for that ligand on the selected target. Activity type, value and mechanism of action are shown in addition to the reference publication for each activity, allowing users to directly view the source. Figure 8 illustrates this page display.

Use cases

Target list search. A neuroscientist who studies ion channels is seeking understudied proteins for a student project. Filtering to targets listed by the IDG Consortium as targets of interest (https://druggablegenome.net/IDGProteinList) in the ion channel family with a nervous system phenotype, as described by the Mouse Phenotype Ontology from the Jackson Laboratory, yields 16 potential targets of interest. The refined list view is shown in Figure 9.

Binding partner search. A scientist is studying the function of CAMKK2 and hypothesizing on its role in behavior. From the CAMKK2 target details page, she clicks ‘Explore Interacting Targets’ to open the potential binding partners in the Target List. Selecting Tclin, she filters the list to 10 potential binding partners that have approved drugs with new therapeutic effects and drug-related side effects that can assist in her research. She is also able to save this list of targets as a custom list, allowing her to easily revisit and analyze these targets. This workflow is also shown in Figure 9.

Disease-based search. A researcher studying asthma searches for this top-level disease term and enters the asthma disease details page. He wants to find proteins that are differentially expressed in asthma patients. Starting from the Asthma disease details page, he clicks ‘Explore Associated Targets’ to see 470 potential targets. Filtering those with data from the Expression Atlas (18), using the ‘Expression Atlas’ value from the ‘Disease Data Source’ filter, yields 78 targets, which can then be sorted by log2foldChange using the drop-down sort feature, as shown above the target list in Figure 10. He finds several with high log2foldChange, meaning the expression level changes drastically between disease and non-disease conditions, and low P-value, meaning the results are statistically significant. Also of note, for lists of associated targets, an additional column has been added to the target card showing disease association details, such as the evidence associated with a given data source.
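The filter-then-sort step in this use case can be reproduced offline on an exported association list. The following is a minimal sketch only: the `Association` record, its field names, the gene symbols and the 0.05 significance cutoff are all hypothetical and are not taken from the TCRD schema. Ranking by absolute log2 fold change is likewise a choice made here so that strong down-regulation is treated the same as strong up-regulation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Association:
    """One target-disease association row (hypothetical layout)."""
    symbol: str                      # placeholder gene symbol
    source: str                      # disease data source name
    log2fc: Optional[float] = None   # log2 fold change, if reported
    pvalue: Optional[float] = None   # P-value, if reported


def rank_by_expression_change(
    assocs: List[Association],
    source: str = "Expression Atlas",
    max_p: float = 0.05,             # assumed significance cutoff
) -> List[Association]:
    """Keep rows from one data source with a low P-value, then sort
    by the magnitude of the log2 fold change, largest first."""
    kept = [
        a for a in assocs
        if a.source == source
        and a.log2fc is not None
        and a.pvalue is not None
        and a.pvalue <= max_p
    ]
    return sorted(kept, key=lambda a: abs(a.log2fc), reverse=True)


# Illustrative records only; none of these values come from Pharos.
records = [
    Association("GENE_A", "Expression Atlas", 1.2, 0.03),
    Association("GENE_B", "Expression Atlas", -3.4, 0.001),
    Association("GENE_C", "DisGeNET"),
    Association("GENE_D", "Expression Atlas", 2.8, 0.20),
    Association("GENE_E", "Expression Atlas", 2.1, 0.01),
]

ranked = rank_by_expression_change(records)
print([a.symbol for a in ranked])
```

In this sketch, a row with a large fold change but a high P-value (GENE_D) is dropped before sorting, which mirrors the requirement that both criteria hold.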

Finding experimental tools. A researcher has found that a voltage-gated potassium channel, KCNS2, is expressed in the nucleus they are working on. Its role is unknown. The target details page for KCNS2 lists three approved drugs (e.g. amifampridine), which are known to block this channel. The IDG Resources show a cell line and a mouse genetic construct, which can be ordered from the IDG Consortium (UCSF in this case). The researcher is now able to design a set of in vitro and in vivo experiments to help elucidate the role of the channel.

Machine learning based on PPI networks to prioritize targets for a disease. A novel target ranking approach could be developed that relies on deep network representation learning of a PPI network annotated with disease-specific knowledge attributes derived from TCRD/Pharos. This involves mapping the enriched PPI network into a feature space using the neural network framework Gat2Vec (45). Gat2Vec employs a shallow neural network model to facilitate joint learning on the structural and attribute contexts of a given network. In this case, the PPIs provide the structural context and the disease-related knowledge provides the attribute contexts. The feature space generated can be used to develop machine learning models that can predict the ranking of protein targets in the context of a disease. Additional protocols for extracting available data for specific targets of interest, as well as the differences in knowledge availability between understudied and more studied targets, are discussed elsewhere (46).
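To make the notion of structural and attribute contexts concrete, the sketch below generates random-walk contexts from a toy PPI graph and from a bipartite protein-attribute graph. It is a simplified illustration of the general idea, not the Gat2Vec implementation: the protein identifiers, attribute labels, walk parameters and function names are all hypothetical, and the embedding step that would consume these contexts is omitted.

```python
import random
from typing import Dict, List, Set, Tuple


def random_walk(adj: Dict[str, List[str]], start: str,
                length: int, rng: random.Random) -> List[str]:
    """Uniform random walk of at most `length` nodes from `start`."""
    walk = [start]
    while len(walk) < length:
        neighbours = adj.get(walk[-1], [])
        if not neighbours:
            break
        walk.append(rng.choice(neighbours))
    return walk


def attribute_adjacency(attrs: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Bipartite protein <-> attribute graph; attribute nodes get an
    'attr:' prefix so they can be told apart from proteins."""
    adj: Dict[str, List[str]] = {}
    for protein, labels in attrs.items():
        for label in sorted(labels):
            node = "attr:" + label
            adj.setdefault(protein, []).append(node)
            adj.setdefault(node, []).append(protein)
    return adj


def build_contexts(ppi: Dict[str, List[str]], attrs: Dict[str, Set[str]],
                   walks_per_node: int = 2, length: int = 5,
                   seed: int = 0) -> Tuple[List[List[str]], List[List[str]]]:
    """Return (structural contexts, attribute contexts)."""
    rng = random.Random(seed)
    attr_adj = attribute_adjacency(attrs)
    structural, attribute = [], []
    for protein in sorted(ppi):
        for _ in range(walks_per_node):
            structural.append(random_walk(ppi, protein, length, rng))
    for protein in sorted(attrs):
        for _ in range(walks_per_node):
            walk = random_walk(attr_adj, protein, length, rng)
            # Keep only proteins: two proteins share a context when
            # they are linked through a common attribute.
            attribute.append([n for n in walk if not n.startswith("attr:")])
    return structural, attribute


# Toy inputs; identifiers and labels are placeholders.
ppi = {"P1": ["P2", "P3"], "P2": ["P1", "P3"], "P3": ["P1", "P2"]}
attrs = {"P1": {"asthma"}, "P2": {"asthma", "airway"}, "P3": {"airway"}}

structural_ctx, attribute_ctx = build_contexts(ppi, attrs)
```

Both context lists could then be passed to a single skip-gram style model so that one embedding per protein reflects network neighbourhood and shared disease annotations together.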

DISCUSSION

With the newly added data, interactive visualizations and enhanced search capabilities, TCRD and Pharos, in their current forms, serve as IDG resources that facilitate better exploration of the dark and understudied regions of the human genome. The central idea of the resource continues to be enriching knowledge around human targets and monitoring their therapeutic development levels. One area that received significant attention in the current update is the search mechanism, which was upgraded with the implementation of a GraphQL API. To improve user interaction with the new API, example queries have been made available on the website. The TDL descriptions panel and the target expression anatomograms are the significantly improved components of the Pharos web interface. Further, most data that have been recently integrated into TCRD facilitate application of artificial intelligence/machine learning (AI/ML) techniques for target evaluation and drug repositioning (47) and are accessible in ML-ready format. Most of the above-described developments are based on the feedback obtained directly from website visitors, several demo sessions and presentations at scientific meetings.
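As an illustration of how a GraphQL request to such an API is typically assembled, the sketch below builds the JSON body for a single-target query. The field names (`target`, `q`, `sym`, `name`, `tdl`, `fam`) are assumptions about the schema and should be checked against the example queries published on the Pharos website. The function name is hypothetical, and no endpoint URL is assumed, so the body is only constructed here, not sent.

```python
import json

# Assumed query shape; verify field names against the published schema.
TARGET_QUERY = """
query TargetSummary($sym: String) {
  target(q: {sym: $sym}) {
    name
    sym
    tdl
    fam
  }
}
"""


def build_target_request(symbol: str) -> str:
    """Return the JSON body of a GraphQL POST request for one target.

    GraphQL over HTTP conventionally sends a JSON object with a
    'query' string and an optional 'variables' object.
    """
    return json.dumps({"query": TARGET_QUERY, "variables": {"sym": symbol}})


body = build_target_request("CAMKK2")
print(body)
```

Passing the gene symbol as a variable, rather than formatting it into the query text, keeps the query string constant and avoids quoting problems.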

Ongoing work focuses on utilizing the AI/ML-ready data, such as target-disease, target-phenotype and PPI data, to develop AI/ML models for prioritization of dark targets and better understanding of disease biology. Currently, we are evaluating ways to aggregate experimental data uncertainties, specifically data quality and reliability (48), similar to the in silico toxicology protocols (49). We continue to extend our efforts in enhancing the disease page to list associated targets, rank them by the consensus strength of association to a disease and further facilitate filtering to narrow down to targets of potential interest to researchers. The database can be expected to be further enriched with new data and data types that contribute to the ultimate goal of illuminating the dark targets.

DATA AVAILABILITY

TCRD is an open source database that can be accessed at: http://juniper.health.unm.edu/tcrd/. Pharos is an open source web platform that can be accessed at: https://pharos.nih.gov/. The Pharos resources have been split into frontend and backend repositories. The frontend code can be found on GitHub: https://github.com/ncats/pharos_frontend. The backend GraphQL implementation code can be found on GitHub: https://github.com/ncats/pharos-graphql-server. GraphQL resource documentation can be found on Pharos: https://pharos.nih.gov/api.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

Rajarshi Guha, for his immeasurable time spent not only working on and promoting Pharos 1.0, but also for his mentorship as we began the process of revamping Pharos; Oleg Ursu, for developing the first machine learning-ready version of TCRD; Gorka Lasso, for supplying the P-HIPSTer virus–human protein interactions dataset.

FUNDING

National Institutes of Health (NIH) Common Fund [CA224370 to S.L.M., C.G.B., L.L.J., A.W., G.B., J.J.Y., T.I.O.; U24TR002278 to D.V., A.K., J.H., A.W., S.C.S. and T.I.O.]; Novo Nordisk Foundation [NNF14CC0001 to L.J.J.]; Intramural Research Program, Division of Preclinical Innovation, NIH NCATS (to D.T.N., K.K., T.S., N.S., P.D., E.W., A.S.). Funding for open access charge: NIH/CA224370.

Conflict of interest statement. L.J.J. is co-founder and scientific advisory board member of Intomics A/S. T.I.O. has received honoraria or consulted for Abbott, AstraZeneca, Chiron, Genentech, Infinity Pharmaceuticals, Merz Pharmaceuticals, Merck Darmstadt, Mitsubishi Tanabe, Novartis, Ono Pharmaceuticals, Pfizer, Roche, Sanofi and Wyeth. He is on the scientific advisory board of ChemDiv Inc. and InSilico Medicine.

REFERENCES

1. Edwards,A.M., Isserlin,R., Bader,G.D., Frye,S.V., Willson,T.M. and Yu,F.H. (2011) Too many roads not taken. Nature, 470, 163–165.

2. Nguyen,D.-T., Mathias,S., Bologa,C., Brunak,S., Fernandez,N., Gaulton,A., Hersey,A., Holmes,J., Jensen,L.J., Karlsson,A. et al. (2017) Pharos: Collating protein information to shed light on the druggable genome. Nucleic Acids Res., 45, D995–D1002.

3. Oprea,T.I., Bologa,C.G., Brunak,S., Campbell,A., Gan,G.N., Gaulton,A., Gomez,S.M., Guha,R., Hersey,A., Holmes,J. et al. (2018) Unexplored therapeutic opportunities in the human genome. Nat. Rev. Drug Discov., 17, 317–332.

4. Galperin,M.Y., Fernandez-Suarez,X.M. and Rigden,D.J. (2017) The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes. Nucleic Acids Res., 45, D1–D11.

5. UniProt Consortium (2019) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res., 47, D506–D515.

6. Dickinson,M.E., Flenniken,A.M., Ji,X., Teboul,L., Wong,M.D., White,J.K., Meehan,T.F., Weninger,W.J., Westerberg,H., Adissu,H. et al. (2016) High-throughput discovery of novel developmental phenotypes. Nature, 537, 508–514.

7. Smith,J.R., Hayman,G.T., Wang,S.-J., Laulederkind,S.J.F., Hoffman,M.J., Kaldunski,M.L., Tutaj,M., Thota,J., Nalabolu,H.S., Ellanki,S.L.R. et al. (2020) The Year of the Rat: The Rat Genome Database at 20: a multi-species knowledgebase and analysis platform. Nucleic Acids Res., 48, D731–D742.

8. Schriml,L.M., Arze,C., Nadendla,S., Chang,Y.-W.W., Mazaitis,M., Felix,V., Feng,G. and Kibbe,W.A. (2012) Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res., 40, D940–D946.

9. Hayman,G.T., Laulederkind,S.J.F., Smith,J.R., Wang,S.-J., Petri,V., Nigam,R., Tutaj,M., De Pons,J., Dwinell,M.R. and Shimoyama,M. (2016) The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database. Database, 2016, baw034.

10. Smith,C.L. and Eppig,J.T. (2009) The mammalian phenotype ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip. Rev. Syst. Biol. Med., 1, 390–399.

11. Buniello,A., MacArthur,J.A.L., Cerezo,M., Harris,L.W., Hayhurst,J., Malangone,C., McMahon,A., Morales,J., Mountjoy,E., Sollis,E. et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47, D1005–D1012.

12. GTEx Consortium (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.

13. Thul,P.J. and Lindskog,C. (2018) The human protein atlas: a spatial map of the human proteome. Protein Sci., 27, 233–244.

14. Palasca,O., Santos,A., Stolte,C., Gorodkin,J. and Jensen,L.J. (2018) TISSUES 2.0: an integrative web resource on mammalian tissue expression. Database, 2018, bay003.

15. Kim,M.-S., Pinto,S.M., Getnet,D., Nirujogi,R.S., Manda,S.S., Chaerkady,R., Madugundu,A.K., Kelkar,D.S., Isserlin,R., Jain,S. et al. (2014) A draft map of the human proteome. Nature, 509, 575–581.

16. Barretina,J., Caponigro,G., Stransky,N., Venkatesan,K., Margolin,A.A., Kim,S., Wilson,C.J., Lehar,J., Kryukov,G.V., Sonkin,D. et al. (2012) The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483, 603–607.

17. Stathias,V., Turner,J., Koleti,A., Vidović,D., Cooper,D., Fazel-Najafabadi,M., Pilarczyk,M., Terryn,R., Chung,C., Umeano,A. et al. (2020) LINCS Data Portal 2.0: next generation access point for perturbation-response signatures. Nucleic Acids Res., 48, D431–D439.

18. Papatheodorou,I., Moreno,P., Manning,J., Fuentes,A.M.-P., George,N., Fexova,S., Fonseca,N.A., Füllgrabe,A., Green,M., Huang,N. et al. (2020) Expression Atlas update: from tissues to single cells. Nucleic Acids Res., 48, D77–D83.

19. Szklarczyk,D., Gable,A.L., Lyon,D., Junge,A., Wyder,S., Huerta-Cepas,J., Simonovic,M., Doncheva,N.T., Morris,J.H., Bork,P. et al. (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res., 47, D607–D613.

20. Lasso,G., Mayer,S.V., Winkelmann,E.R., Chu,T., Elliot,O., Patino-Galindo,J.A., Park,K., Rabadan,R., Honig,B. and Shapira,S.D. (2019) A structure-informed atlas of human-virus interactions. Cell, 178, 1526–1541.

21. Gaulton,A., Hersey,A., Nowotka,M., Bento,A.P., Chambers,J., Mendez,D., Mutowo,P., Atkinson,F., Bellis,L.J., Cibrian-Uhalte,E. et al. (2017) The ChEMBL database in 2017. Nucleic Acids Res., 45, D945–D954.

22. Ursu,O., Holmes,J., Bologa,C.G., Yang,J.J., Mathias,S.L., Stathias,V., Nguyen,D.-T., Schürer,S. and Oprea,T. (2019) DrugCentral 2018: an update. Nucleic Acids Res., 47, D963–D970.

23. Pletscher-Frankild,S., Palleja,A., Tsafou,K., Binder,J.X. and Jensen,L.J. (2015) DISEASES: text mining and data integration of disease-gene associations. Methods, 74, 83–89.

24. Santos,R., Ursu,O., Gaulton,A., Bento,A.P., Donadi,R.S., Bologa,C.G., Karlsson,A., Al-Lazikani,B., Hersey,A., Oprea,T.I. et al. (2017) A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov., 16, 19–34.

25. Ursu,O., Glick,M. and Oprea,T. (2019) Novel drug targets in 2018. Nat. Rev. Drug Discov., 18, 328.

26. Avram,S., Halip,L., Curpan,R. and Oprea,T.I. (2020) Novel drug targets in 2019. Nat. Rev. Drug Discov., 19, 300.

27. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.

28. Pafilis,E., Frankild,S.P., Fanini,L., Faulwetter,S., Pavloudi,C., Vasileiadou,A., Arvanitidis,C. and Jensen,L.J. (2013) The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text. PLoS One, 8, e65390.

29. Björling,E. and Uhlén,M. (2008) Antibodypedia, a portal for sharing antibody and antigen validation data. Mol. Cell. Proteomics, 7, 2028–2037.

30. Watkins,X., Garcia,L.J., Pundir,S., Martin,M.J. and UniProt Consortium (2017) ProtVista: visualization of protein sequence annotations. Bioinformatics, 33, 2040–2041.

31. Piñero,J., Ramírez-Anguita,J.M., Saüch-Pitarch,J., Ronzano,F., Centeno,E., Sanz,F. and Furlong,L.I. (2020) The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res., 48, D845–D855.

32. Jia,J., An,Z., Ming,Y., Guo,Y., Li,W., Liang,Y., Guo,D., Li,X., Tai,J., Chen,G. et al. (2018) eRAM: encyclopedia of rare disease annotations for precision medicine. Nucleic Acids Res., 46, D937–D943.

33. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

34. Rose,A.S., Bradley,A.R., Valasatava,Y., Duarte,J.M., Prlic,A. and Rose,P.W. (2018) NGL viewer: web-based molecular graphics for large complexes. Bioinformatics, 34, 3755–3758.

35. Rose,A.S. and Hildebrand,P.W. (2015) NGL Viewer: a web application for molecular visualization. Nucleic Acids Res., 43, W576–W579.

36. Li,W., Moore,M.J., Vasilieva,N., Sui,J., Wong,S.K., Berne,M.A., Somasundaran,M., Sullivan,J.L., Luzuriaga,K., Greenough,T.C. et al. (2003) Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature, 426, 450–454.

37. Hoffmann,M., Kleine-Weber,H., Schroeder,S., Krüger,N., Herrler,T., Erichsen,S., Schiergens,T.S., Herrler,G., Wu,N.-H., Nitsche,A. et al. (2020) SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell, 181, 271–280.

38. Mungall,C.J., Torniai,C., Gkoutos,G.V., Lewis,S.E. and Haendel,M.A. (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biol., 13, R5.

39. Huttlin,E.L., Bruckner,R.J., Navarrete-Perea,J., Cannon,J.R., Baltier,K., Gebreab,F., Gygi,M.P., Thornock,A., Zarraga,G., Tam,S. et al. (2020) Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. bioRxiv doi: https://doi.org/10.1101/2020.01.19.905109, 19 January 2020, preprint: not peer reviewed.

40. Jassal,B., Matthews,L., Viteri,G., Gong,C., Lorente,P., Fabregat,A., Sidiropoulos,K., Cook,J., Gillespie,M., Haw,R. et al. (2020) The reactome pathway knowledgebase. Nucleic Acids Res., 48, D498–D503.

41. Cannon,D.C., Yang,J.J., Mathias,S.L., Ursu,O., Mani,S., Waller,A., Schürer,S.C., Jensen,L.J., Sklar,L.A., Bologa,C.G. et al. (2017) TIN-X: target importance and novelty explorer. Bioinformatics, 33, 2601–2603.

42. Oprea,T.I. (2019) Exploring the dark genome: implications for precision medicine. Mamm. Genome, 30, 192–200.

43. Kim,S., Chen,J., Cheng,T., Gindulyte,A., He,J., He,S., Li,Q., Shoemaker,B.A., Thiessen,P.A., Yu,B. et al. (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res., 47, D1102–D1109.

44. Armstrong,J.F., Faccenda,E., Harding,S.D., Pawson,A.J., Southan,C., Sharman,J.L., Campo,B., Cavanagh,D.R., Alexander,S.P.H., Davenport,A.P. et al. (2020) The IUPHAR/BPS guide to Pharmacology in 2020: extending immunopharmacology content and introducing the IUPHAR/MMV guide to Malaria Pharmacology. Nucleic Acids Res., 48, D1006–D1021.

45. Sheikh,N., Kefato,Z. and Montresor,A. (2019) gat2vec: representation learning for attributed graphs. Computing, 101, 187–209.

46. Sheils,T., Mathias,S.L., Siramshetty,V.B., Bocci,G., Bologa,C.G., Yang,J.J., Waller,A., Southall,N., Nguyen,D.-T. and Oprea,T.I. (2020) How to illuminate the druggable genome using pharos. Curr. Protoc. Bioinformatics, 69, e92.

47. Levin,J.M., Oprea,T.I., Davidovich,S., Clozel,T., Overington,J.P., Vanhaelen,Q., Cantor,C.R., Bischof,E. and Zhavoronkov,A. (2020) Artificial intelligence, drug repurposing and peer review. Nat. Biotechnol., 38, 1127–1131.

48. Klimisch,H.J., Andreae,M. and Tillmann,U. (1997) A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul. Toxicol. Pharmacol., 25, 1–5.

49. Myatt,G.J., Ahlberg,E., Akahori,Y., Allen,D., Amberg,A., Anger,L.T., Aptula,A., Auerbach,S., Beilke,L., Bellion,P. et al. (2018) In silico toxicology protocols. Regul. Toxicol. Pharmacol., 96, 1–17.


34 RoseAS BradleyAR ValasatavaY DuarteJM PrlicA andRosePW (2018) NGL viewer web-based molecular graphics forlarge complexes Bioinformatics 34 3755ndash3758

35 RoseAS and HildebrandPW (2015) NGL Viewer a webapplication for molecular visualization Nucleic Acids Res 43W576ndashW579

36 LiW MooreMJ VasilievaN SuiJ WongSK BerneMASomasundaranM SullivanJL LuzuriagaK GreenoughTCet al (2003) Angiotensin-converting enzyme 2 is a functional receptorfor the SARS coronavirus Nature 426 450ndash454

37 HoffmannM Kleine-WeberH SchroederS KrugerN HerrlerTErichsenS SchiergensTS HerrlerG WuN-H NitscheA et al(2020) SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2and Is Blocked by a Clinically Proven Protease Inhibitor Cell 181271ndash280

38 MungallCJ TorniaiC GkoutosGV LewisSE andHaendelMA (2012) Uberon an integrative multi-species anatomyontology Genome Biol 13 R5

39 HuttlinEL BrucknerRJ Navarrete-PereaJ CannonJRBaltierK GebreabF GygiMP ThornockA ZarragaG TamSet al (2020) Dual proteome-scale networks reveal cell-specificremodeling of the human interactome bioRxiv doihttpsdoiorg10110120200119905109 19 January 2020 preprintnot peer reviewed

40 JassalB MatthewsL ViteriG GongC LorenteP FabregatASidiropoulosK CookJ GillespieM HawR et al (2020) Thereactome pathway knowledgebase Nucleic Acids Res 48D498ndashD503

41 CannonDC YangJJ MathiasSL UrsuO ManiS WallerASchurerSC JensenLJ SklarLA BologaCG et al (2017)TIN-X target importance and novelty explorer Bioinformatics 332601ndash2603

42 OpreaTI (2019) Exploring the dark genome implications forprecision medicine Mamm Genome 30 192ndash200

43 KimS ChenJ ChengT GindulyteA HeJ HeS LiQShoemakerBA ThiessenPA YuB et al (2019) PubChem 2019

Dow

nloaded from httpsacadem

icoupcomnararticle49D

1D13345958490 by guest on 20 July 2022

D1346 Nucleic Acids Research 2021 Vol 49 Database issue

update improved access to chemical data Nucleic Acids Res 47D1102ndashD1109

44 ArmstrongJF FaccendaE HardingSD PawsonAJSouthanC SharmanJL CampoB CavanaghDRAlexanderSPH DavenportAP et al (2020) The IUPHARBPSguide to Pharmacology in 2020 extending immunopharmacologycontent and introducing the IUPHARMMV guide to MalariaPharmacology Nucleic Acids Res 48 D1006ndashD1021

45 SheikhN KefatoZ and MontresorA (2019) gat2vecrepresentation learning for attributed graphs Computing 101187ndash209

46 SheilsT MathiasSL SiramshettyVB BocciG BologaCGYangJJ WallerA SouthallN NguyenD-T and OpreaTI (2020)

How to illuminate the druggable genome using pharos Curr ProtocBioinformatics 69 e92

47 LevinJM OpreaTI DavidovichS ClozelT OveringtonJPVanhaelenQ CantorCR BischofE and ZhavoronkovA (2020)Artificial intelligence drug repurposing and peer review NatBiotechnol 38 1127ndash1131

48 KlimischHJ AndreaeM and TillmannU (1997) A systematicapproach for evaluating the quality of experimental toxicological andecotoxicological data Regul Toxicol Pharmacol 25 1ndash5

49 MyattGJ AhlbergE AkahoriY AllenD AmbergAAngerLT AptulaA AuerbachS BeilkeL BellionP et al (2018)In silico toxicology protocols Regul Toxicol Pharmacol 96 1ndash17

Dow

nloaded from httpsacadem

icoupcomnararticle49D

1D13345958490 by guest on 20 July 2022

D1344 Nucleic Acids Research 2021 Vol 49 Database issue

(e.g. amifampridine), which are known to block this channel. The IDG Resources show a cell line and a mouse genetic construct, which can be ordered from the IDG Consortium (UCSF in this case). The researcher is now able to design a set of in vitro and in vivo experiments to help elucidate the role of the channel.

Machine learning based on PPI networks to prioritize targets for a disease. A novel target ranking approach could be developed that relies on deep network representation learning of a PPI network annotated with disease-specific knowledge attributes derived from TCRD/Pharos. This involves mapping the enriched PPI network into a feature space using the neural network framework Gat2Vec (45). Gat2Vec employs a shallow neural network model to facilitate joint learning on the structural and attribute contexts of a given network. In this case, the PPIs provide the structural context and the disease-related knowledge provides the attribute contexts. The feature space generated can be used to develop machine learning models that can predict the ranking of protein targets in the context of a disease. Additional protocols for extracting available data for specific targets of interest, as well as the differences in knowledge availability between understudied and more studied targets, are discussed elsewhere (46).
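The two kinds of context described above can be illustrated with a minimal sketch. It is not the Gat2Vec reference implementation and not a TCRD/Pharos pipeline: it only generates random-walk contexts from a toy PPI edge list (structural) and from a protein-to-attribute bipartite graph (attribute), under the assumption that attribute nodes are dropped from the attribute walks so that only proteins remain. The function names, the parameters and the protein and attribute labels are all hypothetical. In a full workflow both sets of walks would be passed to a skip-gram style model to learn embeddings, a step that is omitted here.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency lists from (u, v) pairs."""
    adj = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    return adj


def random_walks(adj, start_nodes, walk_length, walks_per_node, rng):
    """Uniform random walks; a walk stops early at a node without neighbours."""
    walks = []
    for _ in range(walks_per_node):
        for start in start_nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj.get(walk[-1])
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


def gat2vec_style_contexts(ppi_edges, protein_attributes,
                           walk_length=6, walks_per_node=2, seed=0):
    """Return (structural_walks, attribute_walks) for a toy attributed PPI network.

    ppi_edges:          iterable of (protein, protein) pairs (structural context)
    protein_attributes: dict protein -> list of attribute labels, e.g.
                        disease-related annotations (attribute context)
    """
    rng = random.Random(seed)

    # Structural context: walks over the PPI graph itself.
    structural_adj = build_adjacency(ppi_edges)
    structural_walks = random_walks(
        structural_adj, sorted(structural_adj), walk_length, walks_per_node, rng)

    # Attribute context: walks over a bipartite protein <-> attribute graph.
    # Attribute nodes are wrapped in a tuple so they cannot clash with proteins.
    bipartite_edges = [(protein, ("attr", label))
                       for protein, labels in protein_attributes.items()
                       for label in labels]
    attribute_adj = build_adjacency(bipartite_edges)
    raw_walks = random_walks(
        attribute_adj, sorted(protein_attributes), walk_length, walks_per_node, rng)

    # Assumption: attribute nodes are removed, leaving protein-only contexts.
    attribute_walks = [[node for node in walk if not isinstance(node, tuple)]
                       for walk in raw_walks]
    return structural_walks, attribute_walks


if __name__ == "__main__":
    # Toy, made-up data: three proteins, one shared annotation.
    toy_ppi = [("P1", "P2"), ("P2", "P3")]
    toy_attributes = {"P1": ["disease_A"], "P3": ["disease_A"], "P2": []}
    structural, attribute = gat2vec_style_contexts(toy_ppi, toy_attributes)
    print(structural[0])
    print(attribute[0])
```

In this toy network P1 and P3 never interact directly, yet they share attribute walks through the common annotation. That is the kind of signal the attribute context is meant to add on top of the PPI structure.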

DISCUSSION

With the newly added data, interactive visualizations and enhanced search capabilities, TCRD and Pharos in their current forms serve as IDG resources that facilitate better exploration of the dark and understudied regions of the human genome. The central idea of the resource continues to be enriching knowledge around human targets and monitoring their therapeutic development levels. One area that received significant attention in the current update is the search mechanism, which was upgraded with the implementation of a GraphQL API. To improve user interaction with the new API, example queries have been made available on the website. The TDL descriptions panel and the target expression anatomograms are the most significantly improved components of the Pharos web interface. Further, most data that have been recently integrated into TCRD facilitate the application of artificial intelligence/machine learning (AI/ML) techniques for target evaluation and drug repositioning (47), and are accessible in ML-ready format. Most of the developments described above are based on feedback obtained directly from website visitors, several demo sessions and presentations at scientific meetings.
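As a rough illustration of how a GraphQL search request might be issued programmatically, the sketch below builds a JSON payload and posts it over HTTP using only the Python standard library. The query shape (a `target` field filtered by gene symbol, returning `name`, `sym`, `tdl` and `fam`) is an assumption made for illustration and should be checked against the GraphQL documentation and example queries on the Pharos website. The endpoint URL is not stated in this section, so it is left as a parameter supplied by the caller.

```python
import json
import urllib.request

# Assumed query shape; field and argument names are illustrative and
# must be verified against the published Pharos GraphQL schema.
TARGET_QUERY = """
query TargetSummary($sym: String!) {
  target(q: {sym: $sym}) {
    name
    sym
    tdl
    fam
  }
}
"""


def build_payload(symbol):
    """Return the JSON-serialisable body of a GraphQL POST request."""
    return {"query": TARGET_QUERY, "variables": {"sym": symbol}}


def post_graphql(endpoint, payload, timeout=30):
    """POST a GraphQL payload to `endpoint` and return the decoded JSON reply.

    `endpoint` is supplied by the caller; no URL is hard-coded here.
    """
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Only the payload is printed; sending it requires a real endpoint URL.
    print(json.dumps(build_payload("ACE2"), indent=2))
```

Passing the gene symbol as a GraphQL variable rather than interpolating it into the query string keeps the query text constant and avoids quoting problems.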

Ongoing work focuses on utilizing the AI/ML-ready data, such as target-disease, target-phenotype and PPI data, to develop AI/ML models for prioritization of dark targets and better understanding of disease biology. Currently, we are evaluating ways to aggregate experimental data uncertainties, specifically data quality and reliability (48), similar to the in silico toxicology protocols (49). We continue to extend our efforts in enhancing the disease page to list associated targets, rank them by the consensus strength of association to a disease and further facilitate filtering to narrow down to targets of potential interest to researchers. The database can be expected to be further enriched with new data and data types that contribute to the ultimate goal of illuminating the dark targets.

DATA AVAILABILITY

TCRD is an open source database that can be accessed at http://juniper.health.unm.edu/tcrd/.

Pharos is an open source web platform that can be accessed at https://pharos.nih.gov/.

The Pharos resources have been split into frontend and backend repositories.

The front end code can be found on GitHub: https://github.com/ncats/pharos_frontend.

The backend GraphQL implementation code can be found on GitHub: https://github.com/ncats/pharos-graphql-server.

GraphQL resource documentation can be found on Pharos: https://pharos.nih.gov/api.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

ACKNOWLEDGEMENTS

Rajarshi Guha for his immeasurable time spent not only working on and promoting Pharos 1.0, but also for his mentorship as we began the process of revamping Pharos; Oleg Ursu for developing the first machine learning ready version of TCRD; Gorka Lasso for supplying the P-HIPSTer virus–human protein interactions dataset.

FUNDING

National Institutes of Health (NIH) Common Fund [CA224370 to S.L.M., C.G.B., L.L.J., A.W., G.B., J.J.Y., T.I.O.; U24TR002278 to D.V., A.K., J.H., A.W., S.C.S. and T.I.O.]; Novo Nordisk Foundation [NNF14CC0001 to L.J.J.]; Intramural Research Program, Division of Preclinical Innovation, NIH NCATS (to D.T.N., K.K., T.S., N.S., P.D., E.W., A.S.). Funding for open access charge: NIH/CA224370.

Conflict of interest statement. L.J.J. is co-founder and scientific advisory board member of Intomics A/S. T.I.O. has received honoraria or consulted for Abbott, AstraZeneca, Chiron, Genentech, Infinity Pharmaceuticals, Merz Pharmaceuticals, Merck Darmstadt, Mitsubishi Tanabe, Novartis, Ono Pharmaceuticals, Pfizer, Roche, Sanofi and Wyeth. He is on the scientific advisory board of ChemDiv Inc. and InSilico Medicine.

REFERENCES

1. Edwards,A.M., Isserlin,R., Bader,G.D., Frye,S.V., Willson,T.M. and Yu,F.H. (2011) Too many roads not taken. Nature, 470, 163–165.

2. Nguyen,D.-T., Mathias,S., Bologa,C., Brunak,S., Fernandez,N., Gaulton,A., Hersey,A., Holmes,J., Jensen,L.J., Karlsson,A. et al. (2017) Pharos: Collating protein information to shed light on the druggable genome. Nucleic Acids Res., 45, D995–D1002.

3. Oprea,T.I., Bologa,C.G., Brunak,S., Campbell,A., Gan,G.N., Gaulton,A., Gomez,S.M., Guha,R., Hersey,A., Holmes,J. et al. (2018) Unexplored therapeutic opportunities in the human genome. Nat. Rev. Drug Discov., 17, 317–332.

4. Galperin,M.Y., Fernandez-Suarez,X.M. and Rigden,D.J. (2017) The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes. Nucleic Acids Res., 45, D1–D11.

5. UniProt Consortium (2019) UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res., 47, D506–D515.

6. Dickinson,M.E., Flenniken,A.M., Ji,X., Teboul,L., Wong,M.D., White,J.K., Meehan,T.F., Weninger,W.J., Westerberg,H., Adissu,H. et al. (2016) High-throughput discovery of novel developmental phenotypes. Nature, 537, 508–514.

7. Smith,J.R., Hayman,G.T., Wang,S.-J., Laulederkind,S.J.F., Hoffman,M.J., Kaldunski,M.L., Tutaj,M., Thota,J., Nalabolu,H.S., Ellanki,S.L.R. et al. (2020) The Year of the Rat: The Rat Genome Database at 20: a multi-species knowledgebase and analysis platform. Nucleic Acids Res., 48, D731–D742.

8. Schriml,L.M., Arze,C., Nadendla,S., Chang,Y.-W.W., Mazaitis,M., Felix,V., Feng,G. and Kibbe,W.A. (2012) Disease Ontology: a backbone for disease semantic integration. Nucleic Acids Res., 40, D940–D946.

9. Hayman,G.T., Laulederkind,S.J.F., Smith,J.R., Wang,S.-J., Petri,V., Nigam,R., Tutaj,M., De Pons,J., Dwinell,M.R. and Shimoyama,M. (2016) The Disease Portals, disease-gene annotation and the RGD disease ontology at the Rat Genome Database. Database, 2016, baw034.

10. Smith,C.L. and Eppig,J.T. (2009) The mammalian phenotype ontology: enabling robust annotation and comparative analysis. Wiley Interdiscip. Rev. Syst. Biol. Med., 1, 390–399.

11. Buniello,A., MacArthur,J.A.L., Cerezo,M., Harris,L.W., Hayhurst,J., Malangone,C., McMahon,A., Morales,J., Mountjoy,E., Sollis,E. et al. (2019) The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res., 47, D1005–D1012.

12. GTEx Consortium (2020) The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science, 369, 1318–1330.

13. Thul,P.J. and Lindskog,C. (2018) The human protein atlas: a spatial map of the human proteome. Protein Sci., 27, 233–244.

14. Palasca,O., Santos,A., Stolte,C., Gorodkin,J. and Jensen,L.J. (2018) TISSUES 2.0: an integrative web resource on mammalian tissue expression. Database, 2018, bay003.

15. Kim,M.-S., Pinto,S.M., Getnet,D., Nirujogi,R.S., Manda,S.S., Chaerkady,R., Madugundu,A.K., Kelkar,D.S., Isserlin,R., Jain,S. et al. (2014) A draft map of the human proteome. Nature, 509, 575–581.

16. Barretina,J., Caponigro,G., Stransky,N., Venkatesan,K., Margolin,A.A., Kim,S., Wilson,C.J., Lehar,J., Kryukov,G.V., Sonkin,D. et al. (2012) The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature, 483, 603–607.

17. Stathias,V., Turner,J., Koleti,A., Vidovic,D., Cooper,D., Fazel-Najafabadi,M., Pilarczyk,M., Terryn,R., Chung,C., Umeano,A. et al. (2020) LINCS Data Portal 2.0: next generation access point for perturbation-response signatures. Nucleic Acids Res., 48, D431–D439.

18. Papatheodorou,I., Moreno,P., Manning,J., Fuentes,A.M.-P., George,N., Fexova,S., Fonseca,N.A., Fullgrabe,A., Green,M., Huang,N. et al. (2020) Expression Atlas update: from tissues to single cells. Nucleic Acids Res., 48, D77–D83.

19. Szklarczyk,D., Gable,A.L., Lyon,D., Junge,A., Wyder,S., Huerta-Cepas,J., Simonovic,M., Doncheva,N.T., Morris,J.H., Bork,P. et al. (2019) STRING v11: protein-protein association networks with increased coverage, supporting functional discovery in genome-wide experimental datasets. Nucleic Acids Res., 47, D607–D613.

20. Lasso,G., Mayer,S.V., Winkelmann,E.R., Chu,T., Elliot,O., Patino-Galindo,J.A., Park,K., Rabadan,R., Honig,B. and Shapira,S.D. (2019) A structure-informed atlas of human-virus interactions. Cell, 178, 1526–1541.

21. Gaulton,A., Hersey,A., Nowotka,M., Bento,A.P., Chambers,J., Mendez,D., Mutowo,P., Atkinson,F., Bellis,L.J., Cibrian-Uhalte,E. et al. (2017) The ChEMBL database in 2017. Nucleic Acids Res., 45, D945–D954.

22. Ursu,O., Holmes,J., Bologa,C.G., Yang,J.J., Mathias,S.L., Stathias,V., Nguyen,D.-T., Schurer,S. and Oprea,T. (2019) DrugCentral 2018: an update. Nucleic Acids Res., 47, D963–D970.

23. Pletscher-Frankild,S., Palleja,A., Tsafou,K., Binder,J.X. and Jensen,L.J. (2015) DISEASES: text mining and data integration of disease-gene associations. Methods, 74, 83–89.

24. Santos,R., Ursu,O., Gaulton,A., Bento,A.P., Donadi,R.S., Bologa,C.G., Karlsson,A., Al-Lazikani,B., Hersey,A., Oprea,T.I. et al. (2017) A comprehensive map of molecular drug targets. Nat. Rev. Drug Discov., 16, 19–34.

25. Ursu,O., Glick,M. and Oprea,T. (2019) Novel drug targets in 2018. Nat. Rev. Drug Discov., 18, 328.

26. Avram,S., Halip,L., Curpan,R. and Oprea,T.I. (2020) Novel drug targets in 2019. Nat. Rev. Drug Discov., 19, 300.

27. Ashburner,M., Ball,C.A., Blake,J.A., Botstein,D., Butler,H., Cherry,J.M., Davis,A.P., Dolinski,K., Dwight,S.S., Eppig,J.T. et al. (2000) Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet., 25, 25–29.

28. Pafilis,E., Frankild,S.P., Fanini,L., Faulwetter,S., Pavloudi,C., Vasileiadou,A., Arvanitidis,C. and Jensen,L.J. (2013) The SPECIES and ORGANISMS resources for fast and accurate identification of taxonomic names in text. PLoS One, 8, e65390.

29. Bjorling,E. and Uhlen,M. (2008) Antibodypedia, a portal for sharing antibody and antigen validation data. Mol. Cell Proteomics, 7, 2028–2037.

30. Watkins,X., Garcia,L.J., Pundir,S., Martin,M.J. and UniProt Consortium (2017) ProtVista: visualization of protein sequence annotations. Bioinformatics, 33, 2040–2041.

31. Pinero,J., Ramírez-Anguita,J.M., Sauch-Pitarch,J., Ronzano,F., Centeno,E., Sanz,F. and Furlong,L.I. (2020) The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Res., 48, D845–D855.

32. Jia,J., An,Z., Ming,Y., Guo,Y., Li,W., Liang,Y., Guo,D., Li,X., Tai,J., Chen,G. et al. (2018) eRAM: encyclopedia of rare disease annotations for precision medicine. Nucleic Acids Res., 46, D937–D943.

33. Berman,H.M., Westbrook,J., Feng,Z., Gilliland,G., Bhat,T.N., Weissig,H., Shindyalov,I.N. and Bourne,P.E. (2000) The Protein Data Bank. Nucleic Acids Res., 28, 235–242.

34. Rose,A.S., Bradley,A.R., Valasatava,Y., Duarte,J.M., Prlic,A. and Rose,P.W. (2018) NGL viewer: web-based molecular graphics for large complexes. Bioinformatics, 34, 3755–3758.

35. Rose,A.S. and Hildebrand,P.W. (2015) NGL Viewer: a web application for molecular visualization. Nucleic Acids Res., 43, W576–W579.

36. Li,W., Moore,M.J., Vasilieva,N., Sui,J., Wong,S.K., Berne,M.A., Somasundaran,M., Sullivan,J.L., Luzuriaga,K., Greenough,T.C. et al. (2003) Angiotensin-converting enzyme 2 is a functional receptor for the SARS coronavirus. Nature, 426, 450–454.

37. Hoffmann,M., Kleine-Weber,H., Schroeder,S., Kruger,N., Herrler,T., Erichsen,S., Schiergens,T.S., Herrler,G., Wu,N.-H., Nitsche,A. et al. (2020) SARS-CoV-2 Cell Entry Depends on ACE2 and TMPRSS2 and Is Blocked by a Clinically Proven Protease Inhibitor. Cell, 181, 271–280.

38. Mungall,C.J., Torniai,C., Gkoutos,G.V., Lewis,S.E. and Haendel,M.A. (2012) Uberon, an integrative multi-species anatomy ontology. Genome Biol., 13, R5.

39. Huttlin,E.L., Bruckner,R.J., Navarrete-Perea,J., Cannon,J.R., Baltier,K., Gebreab,F., Gygi,M.P., Thornock,A., Zarraga,G., Tam,S. et al. (2020) Dual proteome-scale networks reveal cell-specific remodeling of the human interactome. bioRxiv doi: https://doi.org/10.1101/2020.01.19.905109, 19 January 2020, preprint: not peer reviewed.

40. Jassal,B., Matthews,L., Viteri,G., Gong,C., Lorente,P., Fabregat,A., Sidiropoulos,K., Cook,J., Gillespie,M., Haw,R. et al. (2020) The reactome pathway knowledgebase. Nucleic Acids Res., 48, D498–D503.

41. Cannon,D.C., Yang,J.J., Mathias,S.L., Ursu,O., Mani,S., Waller,A., Schurer,S.C., Jensen,L.J., Sklar,L.A., Bologa,C.G. et al. (2017) TIN-X: target importance and novelty explorer. Bioinformatics, 33, 2601–2603.

42. Oprea,T.I. (2019) Exploring the dark genome: implications for precision medicine. Mamm. Genome, 30, 192–200.

43. Kim,S., Chen,J., Cheng,T., Gindulyte,A., He,J., He,S., Li,Q., Shoemaker,B.A., Thiessen,P.A., Yu,B. et al. (2019) PubChem 2019 update: improved access to chemical data. Nucleic Acids Res., 47, D1102–D1109.

44. Armstrong,J.F., Faccenda,E., Harding,S.D., Pawson,A.J., Southan,C., Sharman,J.L., Campo,B., Cavanagh,D.R., Alexander,S.P.H., Davenport,A.P. et al. (2020) The IUPHAR/BPS guide to Pharmacology in 2020: extending immunopharmacology content and introducing the IUPHAR/MMV guide to Malaria Pharmacology. Nucleic Acids Res., 48, D1006–D1021.

45. Sheikh,N., Kefato,Z. and Montresor,A. (2019) gat2vec: representation learning for attributed graphs. Computing, 101, 187–209.

46. Sheils,T., Mathias,S.L., Siramshetty,V.B., Bocci,G., Bologa,C.G., Yang,J.J., Waller,A., Southall,N., Nguyen,D.-T. and Oprea,T.I. (2020) How to illuminate the druggable genome using pharos. Curr. Protoc. Bioinformatics, 69, e92.

47. Levin,J.M., Oprea,T.I., Davidovich,S., Clozel,T., Overington,J.P., Vanhaelen,Q., Cantor,C.R., Bischof,E. and Zhavoronkov,A. (2020) Artificial intelligence, drug repurposing and peer review. Nat. Biotechnol., 38, 1127–1131.

48. Klimisch,H.J., Andreae,M. and Tillmann,U. (1997) A systematic approach for evaluating the quality of experimental toxicological and ecotoxicological data. Regul. Toxicol. Pharmacol., 25, 1–5.

49. Myatt,G.J., Ahlberg,E., Akahori,Y., Allen,D., Amberg,A., Anger,L.T., Aptula,A., Auerbach,S., Beilke,L., Bellion,P. et al. (2018) In silico toxicology protocols. Regul. Toxicol. Pharmacol., 96, 1–17.
