EMBL-EBI
17/02/2014 KNIME UGM 2014 3
Genomes & variation • Ensembl • Ensembl Genomes • Genome-phenome archive • Metagenomics
Nucleotide sequences • European Nucleotide Archive (ENA)
Expression • ArrayExpress • Expression Atlas • PRIDE • R-Workbench
Proteins • The Universal Protein Resource (UniProt) • InterPro
Chemical biology • ChEMBL • ChEBI
Literature & ontologies • Europe PubMed Central • Gene Ontology
Biomolecule structures • Protein Data Bank in Europe • PDBsum • ProFunc
Pathways • IntAct • Reactome • MetaboLights
Systems • BioModels • Enzyme Portal • BioSamples
KNIME at the EBI
• Provide KNIME training to scientists and researchers
• CDK community nodes development
• Access the ChEBI and ChEMBL databases via KNIME nodes
• Trusted community nodes
Overview of EMBL-EBI chemistry resources
UniChem: InChI-based resolver (full + relaxed ‘lenses’)
3rd Party Data • ZINC, PubChem, ThomsonPharma DOTF, IUPHAR, DrugBank, KEGG, NIH NCC, eMolecules, mcule, FDA SRS, PharmGKB, Selleck, ….
ChEMBL • Bioactivity data from literature and depositions
ChEBI • Nomenclature of primary and secondary metabolites. Chemical & functional ontology
Atlas • Ligand-induced transcript response
PDBe • Ligand structures from structurally defined protein complexes
SureChEMBL • Molecule structures from patent literature
RDF and REST API interfaces
REST API Interface
15K 750 15M 1.5M 24K
65M
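The idea of resolving compounds across sources through "full" and "relaxed" lenses can be illustrated with a small in-memory sketch. It assumes that the full lens matches on the complete standard InChIKey and that the relaxed lens matches only on the first (connectivity) block of the key; the slide does not spell out how UniChem defines the two lenses, so treat this as one plausible reading. The records, source identifiers, and keys below are made-up placeholders, and `resolve` is a hypothetical helper, not a UniChem API call.

```python
from collections import defaultdict

# Hypothetical records: (source, source_id, standard InChIKey).
# The keys are placeholders in InChIKey shape, not real compounds.
RECORDS = [
    ("ChEMBL", "CHEMBL-EXAMPLE-1", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("ChEBI", "CHEBI-EXAMPLE-1", "AAAAAAAAAAAAAA-BBBBBBBBSA-N"),
    ("SureChEMBL", "SCHEMBL-EXAMPLE-1", "AAAAAAAAAAAAAA-CCCCCCCCSA-N"),
    ("PDBe", "PDBE-EXAMPLE-1", "DDDDDDDDDDDDDD-BBBBBBBBSA-N"),
]


def connectivity_block(inchikey: str) -> str:
    """Return the first hyphen-separated block of an InChIKey."""
    return inchikey.split("-")[0]


def build_index(records, key_func):
    """Group (source, source_id) pairs under key_func(inchikey)."""
    index = defaultdict(list)
    for source, source_id, inchikey in records:
        index[key_func(inchikey)].append((source, source_id))
    return index


FULL_INDEX = build_index(RECORDS, lambda key: key)
RELAXED_INDEX = build_index(RECORDS, connectivity_block)


def resolve(inchikey: str, lens: str = "full"):
    """Look up cross-references for a key through the chosen lens."""
    if lens == "full":
        return list(FULL_INDEX.get(inchikey, []))
    if lens == "relaxed":
        return list(RELAXED_INDEX.get(connectivity_block(inchikey), []))
    raise ValueError(f"unknown lens: {lens!r}")


query = "AAAAAAAAAAAAAA-BBBBBBBBSA-N"
print(resolve(query, lens="full"))     # exact-key matches only
print(resolve(query, lens="relaxed"))  # also records sharing the first block
```

In this toy data the relaxed lens additionally returns the SureChEMBL placeholder record, which shares the first key block with the query but differs in the second.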
The story
• EMBL-EBI has acquired SureChem, a leading ‘chemistry patent mining’ product, from Digital Science, Macmillan Group
• SureChem not aligned with core future academic business
• User base
  • Free (SureChemOpen)
  • Paying (SureChemPro)
• EMBL-EBI will support existing licensees
• Plans to provide an ongoing, free, open resource to the entire community
• Rebrand to SureChEMBL
Chemistry patents?
• patere (Latin) = to lay open
• Legal and technical documents
• Disclosure of invention in exchange for exclusive rights
  • Usually 20 years
• Driver for innovation
• Most of the knowledge in (chemical) patents will never appear anywhere else
SureChEMBL System Overview
Patent Offices • WO • EP applications & granted • US applications & granted • JP abstracts
Processed patents (service)
Name to Structure (five methods)
Image to Structure (one method)
Database
Chemistry Database
Patent PDFs (service)
Application Server
Entity Recognition
The Cloud - Amazon Web Services
Users
API
SureChem IP
SureChem System
1-[4-ethoxy-3-(6,7-dihydro-1-methyl-7-oxo-3-propyl-1H-pyrazolo[4,3-d]pyrimidin-5-yl)phenylsulfonyl]-4-methylpiperazine
Keyword search • Patent number search • Structure sketch • Paste SMILES, MOL, name • Types of chemistry search • Filter by authority • Filter by document section • Filter by date • Help
http://www.surechembl.org/
SureChEMBL KNIME Nodes
• Developed by Max Recall Information Systems GmbH
• Main functionality
  • Keyword search
    • Lucene syntax and Boolean operators
    • pa:(Bayer OR Genentech OR Merck) AND desc:(chemotherap* AND (Phosphoinositide kinase OR Pi3K))
  • Structure search
    • Additional phys/chem filters
  • Retrieve patent biblio and full text
  • Extract chemistry from patent
• Additional filters
  • Document section counts
  • Chemical corpus counts
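The keyword query on this slide combines fielded clauses with Boolean operators, and such strings are easy to get wrong by hand. The sketch below composes one programmatically. The field names `pa` and `desc` come from the slide's example; the helper functions are hypothetical and are not part of the KNIME nodes or of any SureChEMBL client. The sketch also quotes multi-word phrases, on the assumption that the search backend follows the classic Lucene query parser, where unquoted words are parsed as separate terms.

```python
def quote_term(term: str) -> str:
    """Wrap multi-word phrases in double quotes; leave single terms as-is."""
    return f'"{term}"' if " " in term else term


def any_of(terms) -> str:
    """Join terms with OR inside parentheses."""
    return "(" + " OR ".join(quote_term(t) for t in terms) + ")"


def all_of(clauses) -> str:
    """Join already-formatted clauses with AND inside parentheses."""
    return "(" + " AND ".join(clauses) + ")"


def field(name: str, expression: str) -> str:
    """Prefix a parenthesised expression with a field name."""
    return f"{name}:{expression}"


query = " AND ".join([
    field("pa", any_of(["Bayer", "Genentech", "Merck"])),
    field("desc", all_of([
        "chemotherap*",  # trailing wildcard, passed through unchanged
        any_of(["Phosphoinositide kinase", "Pi3K"]),
    ])),
])

print(query)
# pa:(Bayer OR Genentech OR Merck) AND desc:(chemotherap* AND ("Phosphoinositide kinase" OR Pi3K))
```

The only difference from the slide's query is the quoted phrase; drop `quote_term` if the target parser handles unquoted phrases as intended.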
More use cases within KNIME
• Chemoinformatics
  • Chemistry landscape for a particular biological target/disease
  • R-group analysis of the chemistry claimed in a particular patent family
  • Novelty checking
• Competitive intelligence
  • Reporting
  • Patent alerts
    • Per target/disease
• Prior art checking
• Further text-mining and annotation
• Network analysis of citations
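Of the use cases above, novelty checking is the simplest to sketch: once chemistry has been extracted from patents, a candidate compound is "novel" with respect to that corpus if its structure identifier is absent from it. The example below assumes compounds are compared by standard InChIKey, which is a choice made for this illustration and not something the slides specify. The function name, compound names, and keys are hypothetical placeholders.

```python
def novelty_check(candidates, patent_keys):
    """Split candidates into (novel, disclosed) lists of names.

    candidates:  mapping of compound name -> InChIKey
    patent_keys: iterable of InChIKeys extracted from patents
    """
    known = set(patent_keys)
    novel, disclosed = [], []
    for name, key in candidates.items():
        (disclosed if key in known else novel).append(name)
    return novel, disclosed


# Placeholder keys in InChIKey shape, not real compounds.
patent_corpus = [
    "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "CCCCCCCCCCCCCC-DDDDDDDDSA-N",
]
my_compounds = {
    "candidate-1": "AAAAAAAAAAAAAA-BBBBBBBBSA-N",
    "candidate-2": "EEEEEEEEEEEEEE-FFFFFFFFSA-N",
}

novel, disclosed = novelty_check(my_compounds, patent_corpus)
print("novel:", novel)
print("disclosed:", disclosed)
```

An exact-key comparison misses salts, tautomers, and stereoisomers of a disclosed compound, so a real workflow would likely add a looser structure match before calling anything novel.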
Timeframes and plans
• About 2-3 months for full transfer of operations
• Refactor authentication system
  • Consider fair use
• Future ideas for development (dependent on funding!)
  • Add sequence searching
  • Add disease terms and target indexing
  • Add chemical structure tagging & search to full text content of Europe PMC
Acknowledgements
• ChEMBL group
  • John Overington
  • Mark Davies
• ChEBI group
  • Stephan Beisken
• SureChem
• MaxRecall
  • Michael Digenbach
• KNIME
• KNIME community