Delivering on the promise of a chemistry data repository for the world

Delivering on the promise of a chemistry data repository for the world. Antony Williams. Going Native Panel Discussion at the Microsoft eScience Workshop. 0000-0002-2668-4821


Delivering on the promise of a chemistry data repository for the world

Antony Williams

Going Native Panel Discussion at the Microsoft eScience Workshop

0000-0002-2668-4821

A Question to Start…

• Who in the room has an ORCID?

New Horizons….

• Let’s map together all historical chemistry data and build systems to integrate it

• Heck, let’s integrate chemistry and biology data and add in disease data too

• Let’s model the data and see if we can extract new relationships – quantitative and qualitative

• Let’s take what we learn from historical data and build better solutions for modern data

• Let’s make it all available on the web…

What about this….

• We’re going to map the world

• We’re going to take photos of as many places as we can and link them together

• We’ll let people annotate and curate the map

• Then let’s make it available free on the web

• We’ll make it available for decision making

• Put it on Mobile Devices, give it away…

Chemistry data is of value?

• Reference databases generate hundreds of millions of dollars/euros per year

• So much data generated that could go public

• Maybe 5% of all data generated is published

• There is no “Journal of Failed Experiments”

• Funding agencies start to demand Open Data

• Scientists want funding but also recognition

A shift to Openness

Open Data is here…

Chemistry data is of value?

• Reference databases generate hundreds of millions of dollars/euros per year

• So much data generated that could go public

• Maybe 5% of all data generated is published

• There is no “Journal of Failed Experiments”

• Funding agencies start to demand Open Data

• Scientists want funding but also recognition

• …so who will fund and build the platforms?

Going Native… speaka da lingo

Chemists clearly benefit from accessing data

What we found…

• Data quality on the internet can be very poor

• Everyone wants access to high quality data but very few are willing to contribute

• The primary concerns for contributors:

  • It needs to be easy

  • Data licensing

  • Recognition for contributions

Recognition: need to have Impact

Quantitating scientists?

National Information Standards Organization and “Altmetrics”

http://www.niso.org/apps/group_public/download.php/13295/niso_altmetrics_white_paper_draft_v4.pdf

Research Outputs

• Blogs

• Research datasets

• Scientific software

• Posters and presentations at conferences

• Electronic theses and dissertations

• Performances in film and audio

• Lectures, online classes and teaching activities

Recognizing Contribution

• In order to encourage participation maybe we need to provide recognition of impact

• How do we measure impact for:

  • Performing peer review?

  • Contributions to more “public platforms”?...

Christmas Curating Wikipedia

Wikipedia Chemboxes

• http://en.wikipedia.org/wiki/Glucose


Three days of discussion

• If you want to understand Wikipedia definitely Go Native and get involved!

Does ONE bond matter???

A short intro to chirality

Educating chemists in data

• Chemists are more likely to know basic HTML than data formats in chemistry

• Even international standards for data interchange and standardization are unknown

• Standards are ideal for computers to handle

Can we MAKE Quality Data?

• We are building systems for everyone to validate and standardize their data
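As a rough illustration of what automated validation can look like, the sketch below checks the check character of an ORCID iD, the identifier asked about at the start of the talk. It is not the validation performed by the systems described here. It assumes the ISO 7064 MOD 11-2 checksum that ORCID documents for its iDs, and the function names `orcid_check_digit` and `is_valid_orcid` are made up for this example.

```python
import re


def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    # A result of 10 is written as the letter X
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has the 0000-0000-0000-000X shape and a matching checksum."""
    if not re.fullmatch(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]", orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_digit(compact[:15]) == compact[15]


if __name__ == "__main__":
    # The iD shown on the title slide
    print(is_valid_orcid("0000-0002-2668-4821"))  # True
```

The same pattern, a format check followed by a checksum or consistency check, is one simple way a deposited record could be screened before it is accepted.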

Where to host research data?

• Containers for chemical compounds, chemical reactions, analytical data, tabular data, etc.

• Algorithms for data validation and standardization

• Domain specific search technologies

• A platform for modeling data

• Progressing the RSC Data Repository…

Compounds

Reactions

Analytical data

Generating models from data

New Horizons….are here

• Let’s map together all historical chemistry data and build systems to integrate it

• Heck, let’s integrate chemistry and biology data and add in disease data too

• Let’s model the data and see if we can extract new relationships – quantitative and qualitative

• Let’s take what we learn from historical data and build better solutions for modern data

• Let’s make it all available on the web…

So we DON’T have to do this…

[Figure: ORIGINAL FIGURE alongside EXTRACTED FIGURE]

The path forward

• Mesh and aggregate published data

• Encourage deposition of RESEARCH data – that will never be published

• Provide open APIs for data access

• Educate chemists in digital literacy

• Funding agencies should mandate data access

• Collaboration is key – don’t do it alone

Thank you

Email: [email protected]

ORCID: 0000-0002-2668-4821

Twitter: @ChemConnector

Personal Blog: www.chemconnector.com

SLIDES: www.slideshare.net/AntonyWilliams