Research Data Management - eScholarship.org

Academic Librarians & Open Access of Data: CHALLENGES & OPPORTUNITIES IN RESEARCH DATA MANAGEMENT

DANIEL C. TSANG, DISTINGUISHED LIBRARIAN

DATA LIBRARIAN & BIBLIOGRAPHER FOR ASIAN AMERICAN STUDIES, POLITICAL SCIENCE, AND ECONOMICS

UNIVERSITY OF CALIFORNIA, IRVINE, LIBRARIES

PREPARED FOR PRESENTATION @ THE UNIVERSITY OF CALIFORNIA, RIVERSIDE, 12 MARCH 2015

E-mail: [email protected]


University of California Open Access Policy

The Academic Senate of the University of California passed an Open Access Policy on July 24, 2013, ensuring that future research articles authored by faculty at all 10 campuses of UC will be made available to the public at no charge.

The policy covers more than 8,000 UC faculty and as many as 40,000 publications a year. By granting a license to the University of California prior to any contractual arrangement with publishers, faculty members can now make their research widely and publicly available, re-use it for various purposes, or modify it for future research publications.

Faculty on three campuses (UCLA, UCI and UCSF) began depositing articles in eScholarship on November 1, 2013.

Adapted from: http://osc.universityofcalifornia.edu/open-access-policy/

Stakeholders

An OCLC Research report, “Starting the Conversation: University-wide Research Data Management Policy” (December 2013), lists the following stakeholders for “starting the conversation” about research data management policy on academic campuses:

The University

The Office of Research

The Research Compliance Office

The Information Technology Department

The Researchers

The Academic Units

The Library

Source: http://oclc.org/research/publications/library/2013/2013-08r.html

Elements of the Conversation

Who owns the data?

What Requirements are Imposed By Others?

Which Data Should Be Retained?

For How Long Should Data Be Maintained?

How Should Digital Data Be Preserved?

Are there Ethical Considerations?

How are Data Accessed?

How Open Should the Data Be?

How Will Costs Be Managed?

What are the Alternatives to Local Data Management?

Source: http://oclc.org/research/publications/library/2013/2013-08r.html

Faculty Assessment of the State of Research Computing (FASRC) at University of California, Irvine (2013)

“Most critical research computing need”

Our assessment is that long-term research data storage, and associated data management, is the single most critical research computing need not being met on campus.

The FASRC committee believes that a well-run data storage service would allow many faculty groups to coordinate data storage using a centralized system, foster research collaboration, and provide access to archived research data.

Faculty expressed a need for having a secure place to archive their data, if not centrally, elsewhere on or off campus.

As a major component of the University’s scholarly product, research data must not only be stored securely but preserved and curated in trusted repositories so that the data remain accessible to the research community after a project is completed. Such accessibility enables secondary analysis of research data originally collected by University faculty and researchers.

Source: http://sites.uci.edu/fasrc/files/2012/11/Faculty-Assessment-of-the-State-of-Research-Computing-Report.pdf

Online Courses

The University of Edinburgh offers an excellent array of self-paced online instruction on a variety of topics:

Research data explained

Data management plans

Organising data

File formats & transformation

Documentation & metadata

Storage & security

Data protection, rights & access

Sharing, preservation & licensing

Software practicals

Source: http://datalib.edina.ac.uk/mantra/

DIY Training Kit for Librarians (Edinburgh University)

Promotional slides for the RDM Training Kit

Training schedule

Research Data MANTRA online course by EDINA and Data Library, University of Edinburgh

Reflective writing questions

Selected group exercises (with answers) from UK Data Archive, University of Essex

Podcasts of short talks by the original Edinburgh speakers, for running the course without ‘live’ speakers

Evaluation forms

Independent study assignment: Interview with a researcher, based on Data Curation Profile

Source: http://datalib.edina.ac.uk/mantra/libtraining.html

Data Management for Clinical Research (Online course, Vanderbilt University)

“This course is designed to teach important concepts related to research data planning, collection, storage and dissemination. Instructors will offer information and best-practice guidelines for 1) investigator-initiated & sponsored research studies, 2) single- & multi-center studies, and 3) prospective data collection & secondary-reuse of clinical data for purposes of research. The curriculum will balance theoretical guidelines with the use of practical tools designed to assist in planning and conducting research. Real-world research examples, problem solving exercises and hands-on training will ensure students are comfortable with all concepts.”

Source: https://www.coursera.org/course/datamanagement

UC Irvine Libraries Initiatives

The UCI Libraries are partners with the Office of Information Technology (OIT) and the Office of Research in an effort to define the long term direction and priorities for research computing and electronic research services on this campus.

A committee made up of UCI faculty, and staff from the Libraries and OIT, has made a set of recommendations to campus administration based on an online survey and focus groups with faculty. Among the proposals are a much faster network for moving research data across campus and externally; more support staff to enhance services such as management, preservation, and organization of research project data; and a research data storage system for long-term, secure storage of both raw and processed data sets.

I currently sit on an OIT/Faculty/Staff advisory committee on improving the campus's computer network capacity.

Finding faculty for potential partnerships

NSF awards to your institution

http://www.nsf.gov/awardsearch/

NIH funded grants to your institution

http://report.nih.gov/award/index.cfm

Data Management Plans

Past faculty contributing content to institutional repositories

Google Scholar

Data Citation Index

ORCID

Talking Points with Faculty

UCI Libraries draft in progress

Questions to ask Faculty

Can you tell me a bit about your research and what sort of data is involved?

Are you collecting your own or re-using existing data?

Where is your data currently stored?

What software and tools do you use to manage or analyze your data?

Do you currently share your data? Would you like to share it in the future?

Do you link your datasets to associated research publications?

How is your research funded and does the funding agency require data sharing or preservation?

Have you completed a Data Management Plan?

Trusted Repositories: The Data Seal of Approval

Data Seal of Approval

The Data Seal of Approval is one such assessment initiative. Created by the Data Archiving and Networked Services (DANS) archive in The Netherlands and overseen by an international board, the Data Seal of Approval is meant to demonstrate to researchers that data repositories are taking appropriate measures to ensure the long-term availability and quality of data they hold.

The seal sets forth 16 guidelines related to trustworthy data management and stewardship. ICPSR was one of the first six data repositories to earn the Data Seal of Approval in 2011. You can read the ICPSR self-assessment here: http://assessment.datasealofapproval.org/assessment_28/seal/html/. The other five archives awarded the Data Seal of Approval are the Archaeology Data Service (United Kingdom); the DANS Electronic Archiving System (Netherlands); the Platform for Archiving CINES (France); the Language Archive of the Max Planck Institute for Psycholinguistics (Netherlands); and the UK Data Archive.

The seal is awarded after an online self-assessment regarding a data repository's adherence to the guidelines. The assessment is then reviewed by the DSA Board before the seal is given.

In Europe, the Data Seal of Approval serves as a Basic Certification step in an integrated framework for auditing and certifying digital repositories.

Source: http://www.icpsr.umich.edu/icpsrweb/content/datamanagement/preservation/trust.html

Data Seal of Approval 2.0 (2013)

Fundamental to the following guidelines are five criteria that together determine whether or not the digital research data may be qualified as sustainably archived:

The research data can be found on the Internet.

The research data are accessible, while taking into account relevant legislation with regard to personal information and intellectual property of the data.

The research data are available in a usable format.

The research data are reliable.

The research data can be referred to.

Source: https://assessment.datasealofapproval.org/media/files/DSA_booklets/DSA-booklet_1_June2010_1.pdf

Assessment manual: https://assessment.datasealofapproval.org/guidelines_52/pdf/

Purdue University Research Repository

Start Your Research Project

Create a Data Management Plan: Learn about the detailed requirements for your data management plan (DMP). Funding agency requirements are very specific and our DMP resources can help you to clear up any confusion.

Upload Research Data to Your Project

Create a project to upload and share your data with collaborators using our step-by-step form to guide you through the process. Invite collaborators from other institutions to join your project.

Publish your Dataset

Package, describe, and publish your dataset with a Datacite DOI. Publishing will ensure your dataset is citable, reusable, and archived for the long-term.

Source: https://purr.purdue.edu/

Developments from UCI Libraries

Established in March 2015 a new unit for E-Research & Digital Scholarship Services

New unit is within Collection Development Department

Arrival of the head of the unit; unit beefed up with 2 additional staff (a metadata librarian plus a programmer, transferred from Cataloging and Library IT)

Local implementation of Dash for deposit of and access to research data generated by UCI faculty and researchers

Collaboration with faculty on archiving research conducted from and around Orange County in an OC (California) Data Portal

Discussion on using ORCID to identify faculty researchers

UCI Digital Scholarship Services

http://www.lib.uci.edu/dss/

Dash at UCs

https://dash.cdlib.org/

DataOne Dash

https://oneshare.cdlib.org/xtf/search

UCI Dash

https://dash.lib.uci.edu/

Orange County Data Portal

https://dash.lib.uci.edu/xtf/search?smode=orangecounty-home

Creative Commons License Zero

http://creativecommons.org/publicdomain/zero/1.0/

CC0 use for data

https://wiki.creativecommons.org/CC0_use_for_data

CDL on Data Licensing

http://cdluc3.github.io/dash/licensing/

Challenges to Data Sharing

On the researcher side…

Data sharing culture varies by discipline & by country

Sharing data different from sharing published articles

Not wanting someone to steal your ideas or get access to your data

Some research, e.g. with personal identifiers, cannot be shared

Repurposing of data may be precluded

Further consent from respondents necessary

Privacy of respondents must be protected

National security may come into play

Complexity and size of dataset

Meaningful metadata not present

File naming inconsistent

Challenges to Data Sharing…

On the librarian side …

Institutional support missing?

Skill-set missing?

Not knowledgeable about faculty research

Not interested or knowledgeable about a subdiscipline

Full work load already

Reference/Bibliographer Model

Cannot attend training off-campus

New model of service - helping in “publishing” research

Lack of experience in data stewardship or data curation

Economic constraints of institution – who’s going to pay for archiving & access?

Domain Repositories Seek Funding for Research Data Management

Current mandates for researchers to manage data have one drawback – the false assumption that it will be cost-free, or that the state will automatically support it.

Hence, 25 domain repositories, meeting in Ann Arbor at ICPSR in 2013, drafted an open letter urging one national authority to commit funds to support managing of research data and its preservation and accessibility.

See “Sustaining Domain Repositories for Digital Data: A Call for Change from an Interdisciplinary Working Group of Domain Repositories” - http://tinyurl.com/domainrepositories25

Challenge for Domain Repositories

Despite the growing demand for data sharing and access, domain repositories face an uncertain financial future in the United States. The need for data archives is rising due to open access mandates, research innovations, and the growing volume of scientific data that needs to be curated, preserved, and disseminated. Yet funding for domain repositories remains unpredictable and inadequate for the task at hand. Of particular concern is the mismatch between the long-term commitments to preservation inherent in the work of archiving, and the short-term and episodic funding upon which this work is based. Many archives rely primarily on project-based grants, even though the expectation of stakeholders is that data will be available and usable indefinitely.

Another concern is that the push towards open access, while creating more equity of access for the community of users, creates more of a burden for domain repositories because it narrows their funding possibilities. Without care, this may create a different kind of inequity: less well-funded scholars or institutions will be less likely to have their products of research preserved for the future.

Source: “Sustaining Domain Repositories for Digital Data: A Call for Change from an Interdisciplinary Working Group of Domain Repositories” - http://tinyurl.com/domainrepositories25

Domain Repositories’ Call For Change

A Call for Change

Domain repositories must be funded as the essential piece of the U.S. research infrastructure that they are. This means:

Ensuring funding streams that are long-term, uninterrupted, and flexible

Creating systems that promote good scientific practice

Assuring equity in participation and access

There may not be one solution to the problem. Repositories may very well need different funding models across domain and repository type. But in every case, creating sustainable funding streams will require the coordinated response of multiple stakeholders in the scientific, archival, academic, funding, and policy communities.

Source: “Sustaining Domain Repositories for Digital Data: A Call for Change from an Interdisciplinary Working Group of Domain Repositories” - http://tinyurl.com/domainrepositories25

Benefits to Sharing Data

Contributing to scientific knowledge

Potential of new discoveries & understanding based on secondary analysis

Researchers get cited more often if their data is “published”

Publicly funded research made transparent

The public benefits from open access to data

Scientific fraud risk diminished

Improves profile of institution

Enhances scholarly collaboration & communication

Repurposes the library mission for the foreseeable future

Panton Principles for Open Data

http://pantonprinciples.org/

Final Thoughts

These data repositories are places to archive data and not usually places to allow enhanced online analysis of data, such as enabling cross-tabulation of survey data results

Dash & DataOne use Creative Commons Licenses – allowing commercial entities to reuse the data, which may be problematic for some researchers

Web archiving is another way to capture at-risk web data and archive it

At UCI we are beginning to consider web archiving a collection development responsibility

Not all data is worth archiving, of course, but grant funding agencies mandate archiving of funded research data

Acknowledgements

Thanks to all the online sources from which I have included material for this presentation.

And my gratitude to UCR Libraries, especially Rhonda Neugebauer and LAUC-R, for inviting me to speak, and to Judy Lee for introducing me.

Thank you!

Thank you for your participation in this session!

[Read the Learning Library blog for more of my presentations]

http://sites.uci.edu/learninglibrary/author/dtsang/