A new era in chemical information: PubChem, DiscoveryGate, and Chemistry Central

6
16 www,oHfcemag.net

Transcript of A new era in chemical information: PubChem, DiscoveryGate, and Chemistry Central

16 www,oHfcemag.net

A New Era i n "Chemical Information:PubChem, DiscoveryGate, and Chemistry Central

By Svetla Baykoucheva

Selecting relevant information and suppressing details is the sortof pragmatic fudging everyone does every day. It's a way of copingvv/t/i too much information. For almost everything you see, hear, taste,smell, or touch, you have the choice between examining detailsby scrutinizing very closely and looking at the "big picture" with itsother priorities.

-Lisa Randall (2006). Warped Passages: Unravelingthe Mysteries of the Universe's Hidden Dimensions, p. 29

r i U W we go ahout finding the infor-mation we need, whether it's chemicalor not, depends on how detailed wewant this information to be and how wewant it organized. Researchers andinformation specialists are accustomedto searching bibliographic databasesto find references to articles publishedin scientific journals. In some cases,though, having the information abouta chemical compound summarized inone place rather than trying to extractspecific data from different paperssaves a lot of time. To use an analogyfrom Lisa Randall's book, Warped Pas-sages: Unraveling the Mysteries of theUniverse's Hidden Dimensions (HarperPerennial, 2006), we could live with one-dimensional information (titles and ab-stracts of papers) or we could add extradimensions by using resources that pro-vide summarized reports of the physi-cal, chemical, and biological propertiesof chemical compounds.

This article will discuss how threeimportant events that have happenedin the past 2-3 years—the emergenceof PubChem, the introduction ofDiscoveryGate, and the creation ofChemistry Central—are changing thefield of chemical information.

PUBCHEMPubChem (http://pubchem.ncbi

.nlm.nih.gov) is a free service that waslaunched by the U.S. National Insti-tutes of Health (NIH) in 2004. It consistsof three databases (PubChem Com-pound, PubChem Substance, and Pub-Chem Bio-Assay) linked together and in-corporated in the Entrez informationretrieval system of the National Centerfor Biotechnology Information (NCBl).PubChem is also an important part ofthe NIH's Molecular Libraries RoadmapInitiative (http://nihroadmap.nih.gov).

PubChem Compound contains morethan 10 million unique structures and

provides biological property informa-tion for each compound through linksto other Entrez databases.

PubChem Substance contains morethan 17 million records of substancesthat were deposited by other organiza-tions—publishers, suppliers of chemi-cals, and other vendors. It providesdescriptions of chemicals and links toPubMed, protein 3-D structures, andbiological screening results. If thecomposition of a chemical sample isknown, the description includes linksto PubChem Compound.

While the PubChem Compounddatabase contains records of chemicalsthat have already been identified, thePubChem Substance database con-tains records of all chemical substancesthat have been submitted to PubChem.If a component of a sample can beidentified as an individual compound,it becomes a candidate for inclusion inthe PubChem Compound database.

SEP I OCT 2007 CDNi-INt 17

PubChem BioAssay can be searched to find informationabout bioassays using specific terms pertinent to the bioas-say. It is also possible to browse or download PubChem Bio-Assay results. The bioassays' descriptions are also searchable.

You can search PubChem by names of chemicals, ele-ments, or groups of elements. Other capabilities includesearching by property range, such as temperatures betweenone number and another, or drawing a structure and usingthe drawing as a query. It is also possible to identify com-pounds that are similar to those studied.

PubChem has a limited scope of coverage—its goal is toprovide information on small organic molecules. With its 17million substances and 10 million unique structures, it issmaller than the CAS Registry File, which has 31 millionorganic and inorganic substances and more than 58 millionsequences. You can access the latter through STN, SciFinder,and SciFinder Scholar, which are subscription-based prod-ucts from the Chemical Abstracts Service (www.cas.org}, abranch of the American Chemical Society. The rate, though,with which PubChem has been growing is really impressive.Many vendors are now feeding information about chemicalcompounds into it. ln 2005, Nature Publishing Group was thefirst big commercial publisher to start doing so, followed byother vendors.

PubChem is a free service. Regardless of whether you area librarian, a researcher, a physician, a teacher, a student, orsomeone who is simply concerned about a chemical in thenew shampoo that you just bought, PubChem can provideyou with information about the physical, chemical, and bio-logical properties of many chemicals, including some drugs.

DISCOVERYGATEDiscoveryGate (DG; www.discoverygate.com) is an online

service from Elsevier MDL that provides access to more than20 databases through a unified platform called CompoundIndex. The databases in DG contain extensive experimental

data derived from journals, patents, chemical catalogs,major reference works, and Food and Drug Administration(FDA) documents. They cover more than 27 million struc-tures, 17 million reactions, and more than 500 million exper-imentally proven values, which makes DG the largest collec-tion of chemical properties.

In DG it is possible to search for information on chemical,physical, and biological properties of chemicals (includingnew drugs at different stages of development); to fmd sup-pliers and compare prices of chemicals; to discover methodsto synthesize new and existing compounds; to associatestructures with experimentally determined property dataretrieved from articles and patents; and to view, in one doc-ument, data retrieved from different sources. You search DGvia a series of forms, with each form having its own set ofcommand buttons. DG also allows exporting the results toreports that can combine data from multiple searches in onereport that can be printed, saved, or exported to an HTMLfile. Some of the most important databases in DG are dis-cussed in what follows. A comprehensive list of the DG data-bases is avaiiabie at the MDL Web site (www.mdli.com/solutions/solutions_for/academics/dg_academics.jsp).

CrossFire Beilstein provides access to the buge Beilsteindatabase, which is the crown jewel of DG. It is the world's mostcomprehensive database for compounds, reactions, proper-ties, and citations. It covers organic chemistry dating from1771 and contains more than 9.9 million chemical structures(with 6.5 million of them being unique). It is the world's largestreaction database, with more than 10 million reactions. Mostof the 320 million chemical properties in it have been exper-imentally verified. The database also provides access to900,000 abstracts dating from 1980.

CrossFire Beilstein allows conducting structure search-ing, reaction searching, and data searching. Until recently,the Beilstein database was not included in the CompoundIndex and could be searched only separately. Because MDL

NatlnulLlbnir>

or Medklne

Text Search

PubOiem provides mfbrmation an the biological actimtias of small moiecutes, t t s a campunsrii• r NIH'i Molecular Libranes RcadmAp Inittative If you would like to iBim more about how Couse the PubChem reaourcss. please go to our nalp page.

PiibChem tontans maie than 17 million substancus and SOD bioassays.

Sliuctures frcm ChamSpldar 31s now available in PubChem.

Mo 'p PubChem announcements .

^PubChem Compound: Search umquB chemical struclursj using names, synonvms orkey-ords Unks to available biological propaitv inftffmation ara providad fbr each

Ttie main search screen otPubCtiem

New '

A

-

H

Ll

Us

K

Rb

CB

=

n

Be

Mg

Ca

Sr

Ba

=

A

?

Sc

Y

Lu

cin'

0

Del

- I I

•? V

Sc V

Y V

Lu "v

Qry

B

Al

Ga

In

TI

MDLMolfile v

X

0+C

SI

Ge

Sn

Pb

NCC

N

P

As

St)

Bl

S(A

®

0

S

Se

Te

Po

4-D'A

|0_

CCCM

F

CI

Br

1

AI

HSID

^

He

Ne

Ar

Kr

Xe

Rn

Done

The structure drawing screen in PubChem

18 www.onlinemag.net

acquired the Beilstein database in March 2007 from the non-profit Beilstein-Institut (although Elsevier had been involvedwith tbe database's production and marketing since 1998), itis now integrated into the Compound Index of DG.

CrossFire Gmelin is another important database in DG,with information about inorganic chemicals. It has not yetbeen integrated with the Compound Index and must besearched separately.

MDL Comprehensive Medicinal Chemistry Datahase isderived from the Drug Compendium in Pergamon's Com-prehensive Medicinal Cbemistry (CMC). The MDL Compre-hensive Medicinal Chemistry database provides 3-D modelsand important hiochemical properties for more than 8,400pharmaceutical compounds (1900 to tbe present). It allowsscientists to view more than 7,500 hioactive compounds usedas medicinal agents.

MDL Toxicity Database is a structure-searcbable bio-activity database of toxic cbemical substances. It containsdata from studies on tbe toxicity, mutagenicity, skin and eyeirritation, tumorigenicity and carcinogenicity, and effects onreproduction. References to tbe original publicationsreporting the toxicity data and to relevant review articleswben applicable are also included.

MDL Metabolite Database consists of a database, a regis-trar, and a browser. All data in tbis system is organized intoschemes that depict the transformations and fate of parentcompounds. These could be drugs, agricultural chemicals,industrial chemicals, or environmental contaminants. Wbilecompiling xenobiotic transformations and metabolicscbemes from reliable sources, the vendor also allows com-panies to register and store results of in-bouse metabolismstudies and to share these results with researchers in tbesame enterprise. It is possible for a researcher to create struc-ture-searchable metabolic scbemes on a desktop and toupload tbem to the corporate database. Tbis allows organiza-tions to get better insigbts Into metabolic activity and the fate

A NEW ERA IN CHEMICAL INFORMATION

of candidates for drugs and to coordinate their discovery pro-grams. Using the browser, scientists can search by structureto retrieve molecules of interest, as well as structural frag-ments or specific transformations.

xPharm is a unique resource providing pharmacologicaldata to researcbers involved in tbe preciinical or discoverypbase of drug development. A detailed articie (Marie K.Saimbert, "xPharm: A Preciinical Information Portal forDiscovery Researchers in Institutions of Higher Learning,"Journal of Electronic Resources in Medical Libraries. 3(1),2006, p. 63; www.mdl.com/products/pdfs/xpbarm.pdf) pro-vides information about tbis resource.

The main users of DG are tbe pharmaceutical, chemical,and biotechnology companies, as well as some academicinstitutions thai can afford it. With its ricb content, DG couldbe very useful to researchers not only in the chemistry field,but also to those working in the life sciences.

CHEMISTRY CENTRALChemistry Central (www.chemistrycentral.com) is a new

open access service for cbemistry and related disciplines thatwas launched by BioMed Central in 2006. It allows freeaccess to peer-reviewed articles published in tbe ChemistryCentral Journal and tbe BioMed journals, as well as to arti-cles published in independent journals that are using theBioMed publisbing services.

Chemistry Central currently provides access to tbeBeilstein Journal of Organic Chemistry. BMC Biochemistry.BMC Chemical Biology, BMC Structural Biology, CarbonBalance and Management, Chemistry Central Journal, andCeochemical Transactions. As with many other open accessventures, Cbemistry Central tests new husiness models. BrianVickery, editorial director of Chemistry Central, discussedthese more fully in a recent interview (Svetla Baykoucbeva,"Paving the Road to More Open Access for Cbemistry: AnInterviewwitbBrianVickery, Editorial Director of Chemistry,"

DiscoveryGate's main interface

Search screen of CrossFire Beilstein in DG. The structure of beta-aminopyridine was drawn and used to search for information on ttiis compound

rather than searching by its name.

SEP I OCT 2007 ONLINe 19

OlscoveryGate* | MDL* Database BrowsBf

FMInFW

——

; -

-

,1111 riB m a n • • • 1

cm luit,iim

1 WP cnmmnaMil Munnei

«i-i«-|T J

INBitl.

f\II4 HH«f:<B ^uCmiKILn fa

Searc/) screen of the MDL Medicinal Chemistry database in DG.The search is for Lipitor, also known as Atorvastatin,

a widely prescribed cholesterol-lowering drug.

MDL® Metabolite DatabaseTom Racarai: 24

8»iici iM HUflmn I cmtlwt w cmwm

Bw FffsLiKs ar

TmnstoriTniifltrtrmcinSatemtui

CNCtimaJtvaUiHgOBU

s iraniftKmadDiia: Vrev*

onoi Psr

Ili

l»M(1,

!I!M.(!8|

rfte properties page of the MDL Metabolite Database, showing informationabout the metabolism of Lipitor

Chemistry Central's home page

Chemical Information Bulletin.V. 59, No. 1, pp. 13-15; bttp://acscinf.org/docs/publications/Interviews/Vickery/2007).

NOT JUST FOR CHEMISTSAlthougb I bave often used tbe pbrase "cbemical infor-

mation" tbrougbout tbis article, I would like to empbasizetbat cbetnistry and the life sciences bave becotne so inter-twined that tbe resources I discuss here can greatly benefitresearcbers in tbe life sciences too. Very often, though, tbeyare not familiar witb tbese resources and/or are not awarethat their organizations subscribe to tbem. To belp, a

comprebensive list of cbemistry databases bas been posted(www.cbembiogrid.org/related/resources/about.html).

Researcbers and practitioners in the biomedical field canfind useful information about drugs througb PubChem,wbicb is now cross-indexed with the Compound Index of DG.Users of one of tbese two information systems can immedi-ately see if related information exists in tbe other system.Only subscribers to DG, tbougb, could get full access to bothsystems. PubCbem users get free searching of both systemsand full access to PubCbem results.

ACCEPTANCE FACTORSSeveral factors will have an impact on how DG is accepted

by librarians and end users. It is an expensive service tbatmany organizations cannot afford, especially witb ever-shrinking library budgets. At tbe University of Maryland,wbicb is not one of tbe wealthiest academic institutions, webave a campuswide license for DG tbat allows everyoneaffiliated witb the university to use tbis service, even fromhome (tbrougb a virtual private network, or VPN). We wereable to switch, for the same price, from tbe Commander plat-form for CrossFire Beilstein and CrossFire Gmelin toDiscoveryGate, whicb is a mucb richer resource. I include DGin all my classes for tbe cbemistry courses I teach, and tbestudents are fascinated by wbat tbey can do with it. I bavealso done several seminars for librarians, showing tbem howto find relevant and unbiased information about drugs.

Tbe other issue with DG—learning bow to use it—can bea long and frustrating experience, even for cbemists. ElsevierMDL provides WebEx and on-site training, as well as excel-lent instructional materials, but DG is a sophisticated prod-uct tbat requires a lot of time and dedication on the part ofanyone wanting to use it. Tbe rewards for researcbers, tbougb,are enormous. Tbere are some librarians and faculty mem-bers who are still reluctant to embrace DG; tbey continue toaccess CrossFire Beilstein and CrossFire Gmelin tbrougb tbeold Commander interface ratber than through DG.

The collaboration between a free (PubCbem) and a pro-prietary (DG) service is a new model for providing scientificinformation. It is of great benefit to researcbers, as the con-tent of different information systems can be used togetber,thus eliminating unnecessary, repetitious searches per-formed in separate databases.

Not that long ago, you would often hear people saying thattbe cbemistry field had been lagging behind otber disciplinesin enjoying free access to information. The events discussedin this article are proving that this cliche is outdated.

Svetla Baykoucheva ([email protected]) is head of the WliiteMemorial Chemistry Library of rhe Unirersily of Maryland-Coilege Parkand the editor of the Chemical Information Bulletin, published by theChemical Information Division of the American Chemical Society (ACS).For 8 years, from 1997 until 2005. she was manager af the ACS Library andInformation Center in Washington, D.C.

Comments? Email letters to the editor to [email protected].

20 www.onlinemag.net