Transitioning to the Next Generation of Metadata - OCLC

OCLC RESEARCH REPORT

Karen Smith-Yoshimura, Senior Program Officer

© 2020 OCLC. This work is licensed under a Creative Commons Attribution 4.0 International License. http://creativecommons.org/licenses/by/4.0/

September 2020

OCLC Research, Dublin, Ohio 43017 USA, www.oclc.org

ISBN: 978-1-55653-167-5
DOI: 10.25333/rqgd-b343
OCLC Control Number: 1197990500

ORCID iDs: Karen Smith-Yoshimura, https://orcid.org/0000-0002-8757-2962

Please direct correspondence to: OCLC Research, oclcresearch@oclc.org

Suggested citation: Smith-Yoshimura, Karen. 2020. Transitioning to the Next Generation of Metadata. Dublin, OH: OCLC Research. https://doi.org/10.25333/rqgd-b343.

CONTENTS

Executive Summary
Introduction
The Transition to Linked Data and Identifiers
    Expanding the use of persistent identifiers
    Moving from “authority control” to “identity management”
    Addressing the need for multiple vocabularies and equity, diversity, and inclusion
    Linked data challenges
Describing “Inside-Out” and “Facilitated” Collections
    Archival collections
    Archived websites
    Audio and video collections
    Image collections
    Research data
Evolution of “Metadata as a Service”
    Metrics
    Consultancy
    New applications
    Bibliometrics
    Semantic indexing
Preparing for Future Staffing Requirements
    The culture shift
    Learning opportunities
    New tools and skills
    Self-education
    Addressing staff turnover
Impact
Acknowledgments
Appendix
Notes

FIGURES

FIGURE 1. “Changing Resource Description Workflows” by OCLC Research
FIGURE 2. Some 300 abbreviated author names for a five-page article in Physical Review Letters
FIGURE 3. Examples of some DOI and ARK identifiers
FIGURE 4. One Wikidata identifier links to other identifiers and labels in different languages
FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership
FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections
FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections
FIGURE 8. Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon
FIGURE 9. UK Hatchette’s “River of Authors” generated from the British Library’s catalog metadata

EXECUTIVE SUMMARY

The OCLC Research Library Partners Metadata Managers Focus Group, first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP), a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

This report, Transitioning to the Next Generation of Metadata, synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions.

Yet metadata is changing. Format-specific metadata management, based on curated text strings in bibliographic records understood only by library systems, is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. This report traces how metadata is evolving and considers the impact this transition may have on library services, posing such questions as:

• Why is metadata changing?
• How is the creation process changing?
• How is the metadata itself changing?
• What impact will these changes have on future staffing requirements, and how can libraries prepare?

The future of linked data is tied to the future of metadata: the metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for future linked data innovations as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

INTRODUCTION

The OCLC Research Library Partners Metadata Managers Focus Group (hereafter referenced as the Focus Group),1 first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP),2 a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions. Metadata provides the research infrastructure necessary for all libraries’ “value delivery systems,” fulfilling their community’s requests for information and resources. Metadata is crucial for transitioning to next generations of library and discovery systems. Good metadata created today can easily be reused in a linked data environment in the future.3 As noted in the British Library’s Foundations for the Future, “Our vision is that by 2023 the Library’s collection metadata assets will be unified on a single, sustainable, standards-based infrastructure offering improved options for access, collaboration and open reuse.”4

Format-specific metadata management, based on curated text strings in bibliographic records understood only by library systems, is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. “Traditional methods of metadata generation, management and dissemination,” suggests the British Library’s Collection Management Strategy, “are not scalable or appropriate to an era of rapid digital change, rising audience expectations and diminishing resources.”5 Focus Group members are eager to unleash the power of metadata in legacy records for different interactions and uses by both machines and end-users in the future. Consistent metadata created according to past rules or standards needs to be transformed into new structures.

Why is metadata changing?

Traditional library metadata was, and is, made by librarians conforming to rules that are mainly used and understood by librarians. It is record-centered, expensive to produce, and has historic size limitations. Metadata is limited in its coverage, notably not including articles within scholarly journals or other scholarly outputs. The infrastructure has been inadequate for managing corrections and enhancements, inducing an emphasis on perfection that has exacerbated the slowness of metadata creation. In short, the metadata could be better, there is not enough of it, and the metadata that does exist is not used widely outside the library domain.

How is the creation process changing?

Metadata is no longer created by library staff alone. Today, publishers, authors, and other interested parties are equally involved in metadata creation. Metadata creation has also been pushed forward in the scholarly life cycle, with publishers creating metadata records earlier than in the traditional cataloging process. Metadata can now be enhanced or corrected by machines or by crowdsourcing.

How is the metadata itself changing?

Machine-readable cataloging (MARC) was created to replicate the metadata traditionally found on library catalog cards. We are transitioning from MARC records to assemblages of well-coded and shareable, linkable components, with an emphasis on references, and we are eliminating anachronistic abbreviations not understood by machines. Instead of relying only on library vocabularies such as subject headings and coded lists, the developing assemblages can accommodate vocabularies created for specific domains, expanding the metadata’s potential audiences.

The Focus Group’s composition has fluctuated over time and currently comprises representatives from 63 RLP Partners in 11 countries spanning four continents.6 The group includes both past and incoming chairs of the Program for Cooperative Cataloging (PCC),7 providing cross-fertilization between the two. Topics for group discussions can be proposed by any Focus Group member and are selected by an eight-member Planning Group (see appendix), who then write “context statements” explaining why the topic is considered timely and important and then develop question sets that delve into the topic. Context statements and question sets are then distributed to all Focus Group members, who are given three to five weeks to submit their responses. Compilations of the Focus Group’s responses inform face-to-face discussions held in conjunction with the American Library Association conferences8 and in subsequent virtual meetings.

As the Focus Group facilitator, I have summarized and synthesized these discussions in a series of OCLC Research Hanging Together Blog publications.9 Nearly 40 blog posts on a wide range of metadata-related topics have been published on this forum over the past six years.

The Metadata Managers Focus Group is just one activity within the broader OCLC Research Library Partnership, which is devoted to extensive professional development opportunities for library staff. Focus Group members value their affiliation with the Research Library Partnership as a channel to becoming the “change agents” of future metadata management.10 Focus Group members’ responses to question sets have facilitated intra-institutional discussions and helped metadata managers understand how their institutions’ situation compares with peers within the Partnership.

These Focus Group discussions identified a broad range of metadata-related issues, documented in this report. Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

Collectively, Focus Group members command a wide range of experiences with linked data. The Focus Group’s keen interest in linked data implementations sparked the series of OCLC Research’s International Linked Data Surveys for Implementers.11 A subset of Focus Group members have participated in various linked data projects, including the OCLC Research Project Passage and CONTENTdm Linked Data pilot, OCLC’s Shared Entity Management Infrastructure, Library of Congress’ Bibliographic Framework Initiative (BIBFRAME), the Mellon-funded Linked Data for Production (LD4P) project, the Share-VDE initiative, and the IMLS planning grant Shareable Local Name Authorities, which exposed issues raised by identifier hubs in the linked data environment.12 In addition, Focus Group members contribute to the PCC task groups addressing aspects of linked data work, including the PCC Task Group on Linked Data Best Practices, Task Group on Identity Management, Task Group on URIs in MARC, and the PCC Linked Data Advisory Committee.13 This cross-fertilization has prompted the Focus Group to examine issues around the entities represented in institutional resources.

This report synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The document is organized in the following sections, each representing an emerging trend identified in the Focus Group’s discussions:

• The transition to linked data and identifiers: expanding the use of persistent identifiers as part of the shift from “authority control” to “identity management”
• Describing the “inside-out” and “facilitated” collections: challenges in creating and managing metadata for unique resources created or curated by institutions in various formats and shared with consortia
• Evolution of “metadata as a service”: increased involvement with metadata creation beyond the traditional library catalog
• Preparing for future staffing requirements: the changing landscape calls for new skill sets needed by both new professionals entering the field and seasoned catalogers

The document concludes with some observations on the forecasted impact of the next generation of metadata on the wider library community.

The Transition to Linked Data and Identifiers

Linked data offers the ability to take advantage of structured data with an emphasis on context. It relies on language-neutral identifiers pointing to objects, with a focus on “things,” replacing the “strings” inherent in current authority and catalog records. These identifiers can then be connected to related data, vocabularies, and terms in other languages, disciplines, and domains, including nonlibrary domains. Linked data applications can consume others’ contributions and thus free metadata specialists from having to re-describe things already described elsewhere, allowing them instead to focus on providing access to their institutions’ unique and distinctive collections. This promises a richer user experience and increased discoverability with more contextual relationships than is possible with our current systems. Furthermore, linked data offers an opportunity to go beyond the library domain by drawing on information about entities from diverse sources.14

FIGURE 1. “Changing Resource Description Workflows” by OCLC Research15

The hope is that linked data will allow libraries to offer new, value-added services that current models cannot support, that outside parties will be able to make better use of library resource descriptions, and that the data will be richer because more parties share in its creation. Moving to a linked data environment portends changes to resource description workflows, as shown in figure 1.

The drive to move metadata operations to linked data depends on the availability of tools, access to linked data sources for reuse, documented best practices on identifiers and the metadata descriptions associated with them (“statements”), and a critical mass of implementations on a network level.

EXPANDING THE USE OF PERSISTENT IDENTIFIERS

The Focus Group discussed the “future-proofing” of cataloging, which refers to the opportunities to unleash the power of metadata in legacy records for different interactions and uses in the future. Persistent identifiers were viewed as crucial to transitioning from current metadata to future applications.16 Identifiers, in the form of language-neutral alphanumeric strings, serve as a shorthand for assembling the elements required to uniquely describe an object or resource. They can be resolved over networks with specific protocols for finding, identifying, and using that object or resource. In the nonlibrary domain, Social Security and employee numbers are examples of such identifiers. In the library and academic domains, Focus Group members pointed to ORCID (Open Researcher and Contributor ID)17 as a “glue” that holds together the four arms of scholarly work (publishing, repository, library catalog, and researchers), but ORCID is limited to only living researchers. ORCID is increasingly used in STEM (science, technology, engineering, mathematics) journals for all authors and contributors18 and included in institutions’ Research Information Management systems. ISNI (International Standard Name Identifier)19 uniquely identifies persons and organizations involved in creative activities, used by libraries, publishers, databases, and rights management organizations, and it covers nonliving creators.

Persistent identifiers are used by parties such as Google and HathiTrust for service integration.20 More institutions are using geospatial coordinates in metadata, or URIs (Uniform Resource Identifiers) pointing to geospatial coordinates, that support API (Application Programming Interface) calls to GeoNames,21 enabling map visualizations. Research institutions are also adopting person identifiers such as ORCID to streamline the collection of the institutional research record, usually through a Research Information Management system, as documented in the 2017 OCLC Research Report Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information Management.22

While publishers serve as a key player in the metadata workstream, publisher data does not always meet library requirements. For example, publisher data for monographs usually does not include identifiers. The British Library is working with five UK publishers to add ISNIs23 to their metadata as a promising proof-of-concept for publishers and libraries working together earlier in the supply chain. The ability to batch load or algorithmically add identifiers in the future is on Focus Group members’ wish list.

No single person identifier covers all use cases. Researchers’ names have been only partially represented in national name authority files that identify persons, both living and dead. A sizable quantity of legacy names are represented only by text strings in bibliographic records. Authority records are created only by institutions involved in the PCC’s Name Authority Cooperative Program (NACO)24 or in national library programs. Even then, authority records are created selectively for certain headings, or sometimes only when references are involved. The LC/NACO name authority file contained only 30% of the total names reflected in WorldCat’s bibliographic record access points (9 million LC/NACO records compared to the 30 million total names reported on the WorldCat Identities project page as of 2012).25 By 2020, this percentage decreased to 18%: 11 million LC/NACO authority records compared to 62 million in WorldCat Identities. These statistics illustrate that the number of names represented in bibliographic records is increasing more quickly than the number under authority control.
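
The two coverage figures follow directly from the record counts quoted above. As a minimal sketch of that arithmetic (the function name is illustrative and not taken from the report):

```python
def coverage_percent(authority_records: float, total_names: float) -> int:
    """Share of names under authority control, rounded to a whole percent."""
    return round(100 * authority_records / total_names)


# Counts in millions, as quoted in the paragraph above.
coverage_2012 = coverage_percent(9, 30)    # LC/NACO records vs. WorldCat Identities names, 2012
coverage_2020 = coverage_percent(11, 62)   # the same comparison for 2020

print(coverage_2012, coverage_2020)
```

Over the same period the total number of names roughly doubled (30 million to 62 million) while the authority file grew by about a fifth (9 million to 11 million), which is why the share fell.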

Authority files focus on the “preferred form” of a name, which can vary depending on language, discipline, context, and time period. Scholars have objected to the very concept of a “preferred form,” as the name may be referred to differently depending on the context.26 When a name has multiple forms, historians need to know the provenance of each name, following the citation practices commonly used in their field. An identifier linked to different forms of names, each associated with the provenance and context, could resolve this conundrum.
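
One way to picture such an identifier is as a record that holds several name forms side by side, none of them marked as preferred. The sketch below is only an illustration under stated assumptions: the class names, field names, identifier URI, and provenance strings are all hypothetical placeholders, not a model proposed by the Focus Group.

```python
from dataclasses import dataclass, field


@dataclass
class NameForm:
    label: str        # the name as written in a given tradition
    language: str     # language tag for the label
    context: str      # discipline, community, or period that uses this form
    provenance: str   # citation for where the form is attested


@dataclass
class Identity:
    identifier: str                                   # language-neutral key
    forms: list[NameForm] = field(default_factory=list)

    def labels_in(self, context: str) -> list[str]:
        """Return every name form recorded for the given context."""
        return [f.label for f in self.forms if f.context == context]


# Illustrative record; the identifier and provenance strings are placeholders.
person = Identity(
    identifier="https://example.org/id/person/0001",
    forms=[
        NameForm("Avicenna", "la", "medieval Latin scholarship", "placeholder citation A"),
        NameForm("Ibn Sina", "ar-Latn", "Arabic-language scholarship", "placeholder citation B"),
    ],
)

print(person.labels_in("medieval Latin scholarship"))
```

Because each form carries its own context and provenance, a display layer could choose the form that suits its audience without any single form having to be declared authoritative.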

Researcher names are just one example of a need unmet by current identifier systems. Institutions have been minting their own “local identifiers” to meet this need. Use cases for local identifiers include registering all researchers on campus; representing entities that are underrepresented in national authority files, such as authors of electronic dissertations and theses, performers, events, local place names, and campus buildings; identifying entities in digital library projects and institutional repositories; reflecting multilingual needs of the community; and supporting “housekeeping” tasks such as recording archival collection titles.27

Focus Group members’ consistent need to disambiguate names across disciplines and formats spurred the creation of the OCLC Research working group on Registering Researchers in Authority Files.28 The need to accurately record researchers’ institutional affiliations (to reflect the institution’s scholarly output, to promote cross-institutional collaborations, and to lead to more successful recruitment and funding) led to another working group on Addressing the Challenges with Organizational Identifiers and ISNI,29 which presented new data modeling of organizations that others could adapt for their own uses. Since then, the Research Organization Registry (ROR) was launched to develop an open, sustainable, usable, and unique identifier for every research organization in the world.30

Disambiguating names is the most labor-intensive part of authority work and will still be a prerequisite for assigning unique identifiers. Given the different name identifier systems already in use, libraries need a name reconciliation service. Authority work and algorithms based on text string matching have limits; the results will still need manual expert review. Tapping the expertise in user communities to verify if two identifiers represent the same person may help.
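
The limits of string matching are easy to see once names are reduced to initials. The following is a minimal sketch, assuming a simple "initials plus family name" abbreviation rule and entirely invented names; it is not a description of any existing reconciliation service.

```python
from collections import defaultdict


def abbreviate(full_name: str) -> str:
    """Reduce 'Given Middle Family' to initials plus family name, e.g. 'GM Family'."""
    *given, family = full_name.split()
    initials = "".join(part[0].upper() for part in given)
    return f"{initials} {family}".strip()


def collisions(names: list[str]) -> dict[str, list[str]]:
    """Group full names by abbreviated form and keep only the ambiguous groups."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in names:
        groups[abbreviate(name)].append(name)
    return {key: vals for key, vals in groups.items() if len(vals) > 1}


# Hypothetical researchers, invented for illustration.
researchers = ["Alex Rivera", "Anna Rivera", "Maria Chen"]
ambiguous = collisions(researchers)
print(ambiguous)
```

Any group returned by `collisions` cannot be resolved from the string alone; it needs an identifier attached at the source or the manual expert review described above.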

Disambiguation is particularly difficult for authors or contributors listed in journal articles, where names are often abbreviated and there may be dozens or even hundreds of contributors. For example, an article in Physical Review Letters (Precision Measurement of the Top Quark Mass in Lepton + Jets Final State) has approximately 300 abbreviated author names for a five-page article (figure 2).31 This exemplifies the different practices among disciplines. By contrast, other objects with many contributors, such as feature films and orchestral recordings, are usually represented by only a relative handful of the associated names in library legacy metadata.32 Such differences make creating metadata that is uniform, understandable, and widely reusable a challenge.

FIGURE 2. Some 300 abbreviated author names for a five-page article in Physical Review Letters

[Figure image: first page of “Precision measurement of the top-quark mass in lepton+jets final states” (arXiv:1405.1756v2 [hep-ex], 16 Jun 2014; FERMILAB-PUB-14-123-E), showing a dense block of abbreviated author names with affiliation numbers, beginning V.M. Abazov, B. Abbott, B.S. Acharya, M. Adams, T. Adams.]

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form or an identifier, if it exists. Associating ORCIDs with article authors makes it easier to differentiate authors with the same abbreviated forms. Research Information Management (RIM) systems apply identity management for local researchers so that they are correctly associated with the articles they have written. Their articles are displayed as part of their profiles. (See, for example, Experts@Minnesota or University of Illinois at Urbana-Champaign’s Experts research profiles.)33

For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.34 Research Information Management Systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author’s ORCID. Linked data could access information across many environments, including those in Research Information Systems, but would require accurately linking multiple identifiers for the same person to each other.

Some Focus Group members are performing metadata reconciliation work, such as searching matching terms from linked data sources and adding their URIs in metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.35 Improving the quality of the data improves users’ experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC’s Virtual International Authority File (VIAF); the Library of Congress’s linked data service (id.loc.gov); ISNI; the Getty’s Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC’s Faceted Application of Subject Terminology (FAST); and various national authority files. Selections of the source depend on the trustworthiness of the organization responsible, subject matter, and richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed a LCNAF Named Entity Reconciliation program36 using Google’s Open Refine that searches the VIAF file with the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading and a link to the URI of the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.
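
The three steps described above (search for matching clusters, find the Library of Congress source record, extract the authorized heading) can be sketched in outline. This is not the University of Michigan program and makes no real VIAF API calls: `search_clusters` is a hypothetical stand-in backed by hand-made data, and every heading and URI below is an invented placeholder.

```python
from typing import Optional

# Hypothetical stand-in for cluster search results: each cluster lists
# source records as source code, authorized heading, and URI.
MOCK_CLUSTERS = {
    "example, author": [
        {"source": "DNB", "heading": "Example, Author",
         "uri": "https://example.org/dnb/1"},
        {"source": "LC", "heading": "Example, Author, 1900-1980",
         "uri": "https://example.org/lc/n0000001"},
    ],
}


def normalize(heading: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(heading.lower().split())


def search_clusters(heading: str) -> list[dict]:
    """Stand-in for a cluster search; a real version would query the service."""
    return MOCK_CLUSTERS.get(normalize(heading), [])


def reconcile(heading: str) -> Optional[dict]:
    """Pair the original heading with the LC authorized heading and its URI."""
    for record in search_clusters(heading):
        if record["source"] == "LC":
            return {"original": heading,
                    "authorized": record["heading"],
                    "uri": record["uri"]}
    return None  # no LC source record found: leave for manual review


print(reconcile("Example,  Author"))
print(reconcile("Unmatched Name"))
```

Because the lookup is still string matching after normalization, headings that differ by more than case and spacing fall through to `None`, which mirrors the "fair results" caveat in the paragraph above.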

A long list of nonlibrary sources that could enhance current authority data or could be valuable to link to in certain contexts has been identified Wikidata and Wikipedia led the list Other sources include AllMusic author and fan sites Discogs EAC-CPF (Encoded Archival Context for Corporate Bodies Persons and Families) EAD (Encoded Archival Description) family trees GeoNames GoodReads IMDb (Internet Movie Database) Internet Archive Library Thing LinkedIn MusicBrainz ONIX (ONline Information eXchange) Open Library ORCID and Scopus ID The PCCrsquos Task Group on URIs in MARCrsquos document Formulating and Obtaining URIs A Guide to Commonly Used Vocabularies and Reference Sources37 provides valuable guidance for collecting data from these other sources Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than with current library systems38

Identifiers for "works" represent a particular challenge, as there is no consensus on what represents a "distinctive work."39 Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a "work" is could hamper the ability to reuse data created elsewhere, and they look to a central trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.


FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers40

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets such as digital resources and collections in institutional repositories include system-generated IDs, locally minted identifiers, PURLs, handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.41 Ideally, libraries would contribute to a hub for the metadata describing their researchers' data sets, regardless of where the data sets are stored.


MOVING FROM "AUTHORITY CONTROL" TO "IDENTITY MANAGEMENT"

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.42 The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from "managing text strings" as the basis of controlling headings in bibliographic records.43 But identity management poses a change in focus, from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from "authority control" and "authorized access points" in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.44 Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
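A minimal sketch of what separating an identifier from its labels could look like follows. The `Entity` class, the `ex:E1` identifier, and the locale codes are all hypothetical and are not drawn from any library system or from Wikidata's data model; the point is only that the identifier stays fixed while each community's preference list selects the label to display.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    identifier: str  # stable match point; never displayed in place of a label if one exists
    labels: dict[str, str] = field(default_factory=dict)  # locale or community -> label


def display_label(entity: Entity, preferences: list[str]) -> str:
    """Return the first label matching the local preference order.

    No label is globally 'preferred'; if none of the requested locales
    is available, fall back to the identifier itself.
    """
    for locale in preferences:
        if locale in entity.labels:
            return entity.labels[locale]
    return entity.identifier


# Hypothetical entity with a placeholder identifier.
munich = Entity("ex:E1", {"en": "Munich", "de": "München", "ja": "ミュンヘン"})

print(display_label(munich, ["de", "en"]))
print(display_label(munich, ["fr", "en"]))
```

The same lookup could serve different spellings, scripts, or discipline-specific terms, since the preference list is supplied by the local catalog rather than stored with the entity.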

[Figure 4 shows Wikidata identifier Q19526]

A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?45
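One way to picture a tool that reconciles multiple identifiers pointing to the same entity is to treat each known equivalence as a link and group identifiers that are connected, directly or indirectly. The sketch below uses a small union-find structure for this; the identifier strings and the prefixes in `SAMPLE_PAIRS` are made-up placeholders, and a real tool would also need to handle incorrect equivalence assertions, which this sketch does not.

```python
def cluster_identifiers(same_as_pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Group identifiers into clusters, one cluster per underlying entity."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        # Follow parent links to the cluster representative, shortening the path as we go.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters: dict[str, set[str]] = {}
    for ident in list(parent):
        clusters.setdefault(find(ident), set()).add(ident)
    return list(clusters.values())


# Hypothetical equivalence assertions between placeholder identifiers.
SAMPLE_PAIRS = [
    ("viaf:0001", "lc:n0001"),
    ("lc:n0001", "wikidata:Q0001"),
    ("isni:0002", "orcid:0002"),
]

for group in cluster_identifiers(SAMPLE_PAIRS):
    print(sorted(group))
```

Each resulting cluster could then act as the collocation point, with any member identifier leading to the same set of labels.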

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.46 Many libraries are looking toward Wikidata and Wikibase (the software platform underlying Wikidata) to solve some of the long-standing issues faced by technical services departments, archival units, and others.47 Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution's resources. Focus Group members' experimentations with Wikidata and OCLC projects using the Wikibase platform indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationship with each other, and how best they can increase discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on "works" and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are "notable" are more likely to be represented in Wikidata. Recently, an effort to support citations in Wikipedia articles, WikiCite,48 demonstrates a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical (and powerful) aspects of identity management is to reduce the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.49 Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts or subject headings are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting "alternate" subject headings came to the fore when the Library of Congress' initial solution to change the LC subject heading for "Illegal aliens" to "Undocumented immigrants" failed to be implemented. This prompted one Focus Group member to comment, "Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive." End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.50


Some see Faceted Application of Subject Terminology (FAST)51 as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.52 FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee53 represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH, with tools and services that serve the needs of diverse communities and contexts.54

Multiple, overlapping, and sometimes conflicting vocabularies already exist in legacy library data.55 For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.56 There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)57 built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d'autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.58


A growing percentage of data in institutions' discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities' research data and materials in institutional repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the "collision of name spaces" and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions' Ontology Management - Graphite tool59 to create and manage various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.60

Linked data could provide the means for local communities to prefer a different label for an established vocabulary's preferred term for a concept or entity. One might reference a local description of a concept or entity not represented (or not represented satisfactorily) in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, "Nothing is sadder than a vocabulary that someone invented that was left to go stale." Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution's EDI goals and principles.


FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms, and to replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, "Dissident art" rather than "Non-conformist art.")

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading "Mentally handicapped" to "People with mental disabilities." Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a "cultural sensitivity" message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator's attitude or the period in which the item was created, and may be considered inappropriate today in some contexts.

[Figure 5 chart: "What Areas Have You Changed or Plan to Change Due to Your Institution's EDI Goals and Principles?" Scale: 0 to 100. Areas: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies. Series: Changed; Plan to change.]

• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may be exclusive to audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than include them as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others' terminologies or mean different things. The PCC Linked Data Advisory Committee's Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences, and facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on "Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries, and Archives"66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of "community contribution" for new terms and community voting could be more inclusive. Libraries' current "consensus environment" excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the set of associated statements made. How will libraries resolve or reconcile conflicts between statements?67 Different types of inconsistencies may appear than do now, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of "preferred sources."68 OCLC Research explored how libraries might apply Google's "Knowledge Vault" to identify statements that may be more "truthful" than others in the 2015 "Works in Progress Webinar: Looking Inside the Library Knowledge Vault."69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
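To make the idea of conflicting statements and preferred sources concrete, the sketch below groups statements by entity and property, flags those with more than one distinct value, and orders the competing statements by a local list of preferred sources. Everything here is hypothetical: the `Statement` fields, the source names, and the sample birthdates are invented for illustration, and the approach is not the Knowledge Vault method. The ordering keeps every statement rather than discarding the lower-ranked ones.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    entity: str   # identifier of the entity described
    prop: str     # property asserted, e.g. a birth date
    value: str
    source: str   # provenance: who made the statement


def find_conflicts(statements: list[Statement]) -> dict[tuple[str, str], list[Statement]]:
    """Return statements grouped by (entity, property) where the values disagree."""
    grouped: dict[tuple[str, str], list[Statement]] = defaultdict(list)
    for st in statements:
        grouped[(st.entity, st.prop)].append(st)
    return {
        key: group
        for key, group in grouped.items()
        if len({st.value for st in group}) > 1
    }


def rank_by_source(statements: list[Statement], preferred: list[str]) -> list[Statement]:
    """Order statements by the local preferred-source list; unknown sources go last."""
    def rank(st: Statement) -> int:
        return preferred.index(st.source) if st.source in preferred else len(preferred)
    return sorted(statements, key=rank)


# Invented sample data with placeholder identifiers and source names.
SAMPLE_STATEMENTS = [
    Statement("ex:E1", "birth_date", "1900", "community-wiki"),
    Statement("ex:E1", "birth_date", "1901", "authority-file-A"),
    Statement("ex:E1", "birth_place", "Dublin", "authority-file-A"),
]

for key, group in find_conflicts(SAMPLE_STATEMENTS).items():
    print(key, [(s.value, s.source) for s in rank_by_source(group, ["authority-file-A"])])
```

Because the preferred-source list is an input, different institutions or communities could apply different rankings to the same shared statements.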

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good-quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected in using inadequate vendor records, in creating minimal or less-than-full level descriptions for certain types of resources, and in limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of the large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide "trustworthy provenance" or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. "Conflicting statements" might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative "Plan M" (where "M" stands for "metadata") seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK's National Bibliographic Knowledgebase (NBK) in Plan M's 10-year vision: "Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research."73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing "Inside-Out" and "Facilitated" Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the "inside-out collection" (in contrast to the "outside-in collection," in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the "facilitated collection."74 Focus Group members' activities have increasingly focused on metadata that will provide access to the resources unique to their institutions, as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to "inside-out" collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as "super challenging." In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places, providing the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars and for libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the "preferred" form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions digitizing their archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University's Human Rights; Historic Preservation and Urban Planning; and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia's Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation's Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its "look-and-feel") is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the "lack of descriptive metadata guidelines" as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85

AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, often in deteriorating formats, in large backlogs, and sometimes requiring rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.

The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.

OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in OCLC Research Hanging Together Blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership; incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections

[Stacked bar chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Challenges rated: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; selection, appraisal, and collection development. Response scale: very interested, interested, somewhat interested, not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas. Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects. Metadata for a scanned set of drawings may be identical, even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance. A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material and therefore limited (if not discredited) any further use.

• Maintaining links between metadata and images. How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object. Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators. It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections. Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters” (those that include terms with a similar meaning). In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
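
The idea of replacing strings of characters with identifiers can be illustrated with a minimal Python sketch. It is not the pilot's actual method: the lookup table, the field names, and the example.org URIs are all hypothetical stand-ins for a real authority file, and real reconciliation would need far more careful matching than lowercasing and whitespace cleanup.

```python
# Hypothetical lookup table standing in for an authority file.
# The URIs are made-up placeholders, not real identifiers.
AUTHORITY = {
    "paris (france)": "https://example.org/place/0001",
    "maps": "https://example.org/concept/0002",
}


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(value.lower().split())


def reconcile(record: dict[str, list[str]]) -> dict[str, list[dict[str, str]]]:
    """Replace each string with an identifier when one is known.

    Matched values become {"id": uri, "label": original string};
    unmatched values are kept as {"literal": original string}.
    """
    out: dict[str, list[dict[str, str]]] = {}
    for field, values in record.items():
        out[field] = []
        for value in values:
            uri = AUTHORITY.get(normalize(value))
            if uri:
                out[field].append({"id": uri, "label": value})
            else:
                out[field].append({"literal": value})
    return out
```

Keeping unmatched strings as literals, rather than dropping them, leaves them available for later review or for reconciliation against a local vocabulary.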

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about ldquoParis Mapsrdquo across CONTENTdm collections

RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry “Data Management and Curation in 21st Century Archives” (Sept. 2015)100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
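
As one possible illustration of the readme file described above, the following Python sketch assembles a plain-text readme from project-level metadata and a list of data files. The key names, the chosen subset of required elements, and the layout are assumptions made for this example; the report does not prescribe a format.

```python
# Illustrative subset of the elements named in the quoted passage.
# These key names are assumptions for the sketch, not a standard.
REQUIRED = [
    "title", "author", "date", "funder",
    "copyright_holder", "description", "keywords", "language",
]


def build_readme(project: dict[str, str], files: list[dict[str, str]]) -> str:
    """Return readme text, refusing to proceed if required elements are empty."""
    missing = [key for key in REQUIRED if not project.get(key)]
    if missing:
        raise ValueError("missing project metadata: " + ", ".join(missing))

    lines = ["PROJECT-LEVEL METADATA"]
    lines += [f"{key}: {project[key]}" for key in REQUIRED]
    lines += ["", "DATA FILES"]
    for item in files:
        lines.append(
            f"{item['name']} | format: {item['format']} | software: {item['software']}"
        )
    return "\n".join(lines) + "\n"
```

Raising an error on missing elements is a design choice for the sketch; a real template might instead warn, since forms that are too strict can discourage researchers from completing them.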

All four of the 2017-2018 The Realities of Research Data Management series webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment: beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105

National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
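
One practical consequence of using structured identifiers is that some transcription errors can be caught before a record is saved. ORCID documents the final character of an iD as a check digit computed with the ISO 7064 MOD 11-2 algorithm; the Python sketch below assumes that scheme. The function names are hypothetical, and passing the check only shows that an iD is well formed, not that it is registered or belongs to the person named.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, character set, and check digit of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == compact[15]
```

For example, `is_valid_orcid("0000-0002-8757-2962")` accepts the author iD printed in this report's front matter, while changing its last digit causes the check to fail.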

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians on work that may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; and improving relevancy ranking, personalizing search results, offering recommendation services in the discovery layer, and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis of collections and a view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125

Visualizations represent another type of metadata service. A striking example is from the Austlang national code-a-thon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon and an example of involving the community to enhance bibliographic metadata.

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics, statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127

FIGURE 9 UK Hachettersquos ldquoRiver of Authorsrdquo generated from the British Libraryrsquos catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade but could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s OCLC Research 2019 position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
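
To make the notion of matching names based on related metadata concrete, here is a deliberately simple Python sketch. It is not machine learning and not a method described by the Focus Group: it combines string similarity from the standard library's difflib with overlap in affiliations, and the record fields, the 0.6 weighting, and the 0.7 threshold are arbitrary assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def name_similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_score(candidate: dict, target: dict, name_weight: float = 0.6) -> float:
    """Blend name similarity with overlap in affiliations (the related metadata)."""
    name = name_similarity(candidate["name"], target["name"])
    cand_aff = set(candidate["affiliations"])
    targ_aff = set(target["affiliations"])
    union = cand_aff | targ_aff
    context = len(cand_aff & targ_aff) / len(union) if union else 0.0
    return name_weight * name + (1 - name_weight) * context


def best_match(target: dict, candidates: list[dict], threshold: float = 0.7):
    """Return the highest-scoring candidate, or None if nothing reaches the threshold."""
    scored = [(match_score(c, target), c) for c in candidates]
    score, best = max(scored, key=lambda pair: pair[0], default=(0.0, None))
    return best if score >= threshold else None
```

Under these assumptions, two records with the identical name “Smith, J.” are told apart by affiliation: the one sharing the target's institution is returned, while a candidate with no shared affiliation falls below the threshold and is left for human review.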

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to ldquotraditional cataloging activitiesrdquo (such as original and copy cataloging authority work) with more exploratory RampD projects such as linked data projects exploring new data models and technologies such as Wikidata and learning about emerging standards and identifiers A culture shift is needed from pride in production alone to valuing opportunities to learn explore and try new approaches to metadata work Metadata specialists must understand that improving all metadata is more important than any individualrsquos productivity numbers This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just ldquoproduction machinesrdquo

Metadata managers, faced with staff reductions while still being expected to maintain production levels, must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers do only what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, change their mindsets about how they work, and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those who are learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers, solve problems, and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists use tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills, or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background who is interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.141
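To make the idea of lightweight clustering concrete, here is a minimal sketch of key-collision clustering using a “fingerprint” key, the general approach popularized by OpenRefine. It is not MarcEdit’s implementation, whose internals are not described here; the function names and sample headings are assumptions for illustration only.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that trivially different strings collide."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and remove ASCII punctuation.
    cleaned = no_accents.lower().translate(str.maketrans("", "", string.punctuation))
    # Sort the unique tokens so word order no longer matters.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a fingerprint; return only groups of 2 or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Invented sample headings with variant forms of the same name.
headings = ["Müller, Hans", "Hans Muller", "MULLER, HANS.", "Reese, Terry"]
for group in cluster(headings):
    print(group)
```

Clusters produced this way are only candidates for merging: two distinct people can share a normalized key, so a specialist would normally confirm each group before records are changed in batch.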

Managers want to focus less on specific schemas and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, they often lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Round Table. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata is increasingly being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
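The kind of simple scripting mentioned above can be illustrated with a short sketch of reconciling equivalents across registries: records from different sources are grouped together when they share at least one identifier, directly or through a chain of other records. The record layout, source names, and identifier values below are invented for illustration; real registries expose their data in their own formats.

```python
def reconcile(records):
    """Group records that share an identifier, directly or transitively.

    Each record is a dict with hypothetical keys "source" and "ids".
    Returns a sorted list of sorted source-name groups.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # identifier -> index of the first record carrying it
    for idx, record in enumerate(records):
        for ident in record["ids"]:
            if ident in first_seen:
                parent[find(idx)] = find(first_seen[ident])
            else:
                first_seen[ident] = idx

    groups = {}
    for idx, record in enumerate(records):
        groups.setdefault(find(idx), []).append(record["source"])
    return sorted(sorted(group) for group in groups.values())


# Invented identifiers: the first three records describe the same person.
records = [
    {"source": "local", "ids": ["orcid:0000-0000-0000-0001"]},
    {"source": "registryA", "ids": ["orcid:0000-0000-0000-0001", "isni:0000000000000001"]},
    {"source": "registryB", "ids": ["isni:0000000000000001"]},
    {"source": "registryC", "ids": ["isni:0000000000000002"]},
]
print(reconcile(records))
```

Here the local record and registryB never share an identifier directly, but both are linked through registryA, which is why a transitive grouping (a small union-find structure) is used rather than pairwise comparison.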

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage through developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or from staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background, who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.
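The contrast between record-based description and entity-based description can be sketched as a small data structure: an entity carries a persistent, language-neutral identifier, display labels keyed by language, and “statements” contributed by different institutions. This is only a conceptual illustration under stated assumptions. It does not represent the Shared Entity Management Infrastructure’s data model or API, and the class names, property names, and the example.org identifier are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    subject: str  # persistent identifier of the entity being described
    prop: str     # hypothetical property name, e.g. "country"
    value: str
    source: str   # institution asserting the statement


@dataclass
class Entity:
    identifier: str                             # language-neutral link
    labels: dict = field(default_factory=dict)  # language code -> label
    statements: list = field(default_factory=list)

    def label(self, lang, fallback="en"):
        """Return the label in the requested language, else a fallback."""
        return self.labels.get(lang, self.labels.get(fallback, self.identifier))

    def add(self, prop, value, source):
        """Attach a statement from a contributing institution."""
        self.statements.append(Statement(self.identifier, prop, value, source))


# Invented example: one identifier, labels in two languages.
place = Entity("https://example.org/entity/E1", {"en": "Tokyo", "ja": "東京"})
place.add("country", "Japan", source="Example Library")
print(place.label("ja"), place.label("fr"), len(place.statements))
```

Because every statement points at the identifier rather than at a text string, a discovery system could show users the label in their preferred language while the underlying links stay the same.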

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts, summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group, and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements explaining why each topic was important and timely, and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc/

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category “Metadata.” https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe/

Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” Hanging Together: The OCLC Research Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. “What is ORCID.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni/

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org/

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org/

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). “About.” https://ror.org/about/

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons/ or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu/

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications/

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress: PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas: The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793/

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois/

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku / Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko/

AIATSIS Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au/

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan. 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite/

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. httpshangingtogetherorgp=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. httpswwwoclcorgresearchareascommunity-catalystsrlp-edihtml

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. httpshangingtogetherorgp=6833

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. httpshangingtogetherorgp=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. httpswwwlocgovabapccdocumentsLinkedDataInfrastructureModelspdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. httpswwwoclcorgresearchevents201710-19html

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. httpshangingtogetherorgp=5195

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. httpshangingtogetherorgp=5091

69. Washburn, Bruce, and Jeff Mixter. 2015. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2015. MP4 video presentation, 57:45:00. httpswwwoclcorgresearchevents201508-12html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. httpshangingtogetherorgp=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. httpshangingtogetherorgp=5091

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. httpshangingtogetherorgp=7845

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. httpslibraryservicesjiscinvolveorgwp202005planm_nextphase

73. Grindley, Neil. 2019. “Plan M: Definition, Principles and Direction.” Jisc (Word docx). httplibraryservicesjiscinvolveorgwpfiles201912Plan-M-Definition-and-Direction-1docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. httpdoiorg1018352lq10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. httpshangingtogetherorgp=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. httpshangingtogetherorgp=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. httpsdoiorg1025333faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at httpswwwoclcorgresearchpartnershipworking-groupsarchives-special-collections-linked-data-reviewhtml

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. httpshangingtogetherorgp=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. httpshangingtogetherorgp=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). httpsarchive-itorgcollections1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). httpsarchive-itorgcollections1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). httpsarchive-itorgcollections1945

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. httpstrovenlagovauwebsite

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). httpsarchive-itorgcollections4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). httpsarchive-itorgcollections4019

NYARC: New York Art Resources Consortium. “Web Archiving.” httpwwwnyarcorgcontentweb-archiving

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. httpsdoiorg1025333C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. httpshangingtogetherorgp=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. httpsdoiorg1025333C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. httpswwwlocgovead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. httpswwwlocgovstandardspremis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. httpshangingtogetherorgp=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. httpshangingtogetherorgp=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. httpshangingtogetherorgp=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. httpwwwlocgovstandardsmods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. httpwwwlocgovstandardsmads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. httpshangingtogetherorgp=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. httpsiiifio

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. httphangingtogetherorgp=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. httpshangingtogetherorgp=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. httpsdoiorg1025333C39P86

103. See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. httpshangingtogetherorgp=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. httpwwwoclcorgblogmainlets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. httpnciorgau

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. httpswwwadaeduau

107. Portage Network. “Home.” Accessed 21 September 2020. httpsportagenetworkca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (httpwwwmetadata2020org). The Metadata 2020 Researcher Communications project is outlined here: httpwwwmetadata2020orgprojectsresearcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. httpwwwdccacukresourcesmetadata-standardslist

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. httprd-alliancegithubiometadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. httpscasraiorgcredit

112. University of Michigan Library. 2020. “Data Services.” httpwwwlibumicheduresearch-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. httpswwwoclcorgresearchpublications2017oclcresearch-research-data-managementhtml

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. httpsdoiorg1025333C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. httpsnewsiuedustories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. httpswwwoclcorgresearchevents2019080819-democratizing-access-large-datasets-shared-infrastructurehtml

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. httpswwwnisoorgstandards-committeesreproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. httpshangingtogetherorgp=5430

118. Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. httpsdoiorg101016jacalib201312002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. httpwwwactivitydataorgLIDPhtml

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. httpshangingtogetherorgp=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. httpswwwdigliborgmetadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-librarymetadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” httpswwwyewnocom

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. httpshangingtogetherorgp=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. httpstrovenlagovau

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. httpshangingtogetherorgp=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. httpsdoiorg101007s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? httpsportalsnaccooperativeorgabout

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. httpshangingtogetherorgp=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. httpssitesgooglecomviewai4lamhome

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. httpssitesgooglecomviewai4lamabout

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. httpsdoiorg1025333xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. httpshangingtogetherorgp=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. httpshangingtogetherorgp=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. httpshangingtogetherorgp=6646

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. httpblogreesetnetarchives2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. httpsmarceditreesetnetworking-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. httpshangingtogetherorgp=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. httpslibraryjuiceacademycomcertificatexml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. httpmarceditreesetnettutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. httpswwwlyndacom

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. httpswwwcodecademycom

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. httpssoftware-carpentryorg

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist. (2008) 2020. httpworkingontologistorg

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. httpwwwlibraryworkflowexchangeorgabout

146. OCLC Developer Network. 2020. “DevConnect Webinars.” httpswwwoclcorgdevelopereventsdevconnect-workshopsenhtml

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. httpshangingtogetherorgp=7580

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395  T 1-800-848-5878  T +1-614-764-6000  F +1-614-764-6096  wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009

E X E C U T I V E S U M M A R Y

The OCLC Research Library Partners Metadata Managers Focus Group, first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP), a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

This report, Transitioning to the Next Generation of Metadata, synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions.

Yet metadata is changing. Format-specific metadata management, based on curated text strings in bibliographic records understood only by library systems, is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. This report traces how metadata is evolving and considers the impact this transition may have on library services, posing such questions as:

• Why is metadata changing?

• How is the creation process changing?

• How is the metadata itself changing?

• What impact will these changes have on future staffing requirements, and how can libraries prepare?

The future of linked data is tied to the future of metadata: the metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for future linked data innovations as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

I N T R O D U C T I O N

The OCLC Research Library Partners Metadata Managers Focus Group (hereafter referenced as the Focus Group),¹ first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP),² a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions. Metadata provides the research infrastructure necessary for all libraries’ “value delivery systems,” fulfilling their community’s requests for information and resources. Metadata is crucial for transitioning to next generations of library and discovery systems. Good metadata created today can easily be reused in a linked data environment in the future.³ As noted in the British Library’s Foundations for the Future, “Our vision is that by 2023 the Library’s collection metadata assets will be unified on a single, sustainable, standards-based infrastructure offering improved options for access, collaboration and open reuse.”⁴

Format-specific metadata management, based on curated text strings in bibliographic records understood only by library systems, is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. “Traditional methods of metadata generation, management and dissemination,” suggests the British Library’s Collection Management Strategy, “are not scalable or appropriate to an era of rapid digital change, rising audience expectations and diminishing resources.”⁵ Focus Group members are eager to unleash the power of metadata in legacy records for different interactions and uses by both machines and end-users in the future. Consistent metadata created according to past rules or standards needs to be transformed into new structures.

Why is metadata changing?

Traditional library metadata was and is made by librarians, conforming to rules that are mainly used and understood by librarians. It is record-centered, expensive to produce, and has historic size limitations. Metadata is limited in its coverage, notably not including articles within scholarly journals or other scholarly outputs. The infrastructure has been inadequate for managing corrections and enhancements, inducing an emphasis on perfection that has exacerbated the slowness of metadata creation. In short, the metadata could be better, there is not enough of it, and the metadata that does exist is not used widely outside the library domain.

How is the creation process changing?

Metadata is no longer created by library staff alone. Today, publishers, authors, and other interested parties are equally involved in metadata creation. Metadata creation has also been pushed forward in the scholarly life cycle, with publishers creating metadata records earlier than in the traditional cataloging process. Metadata can now be enhanced or corrected by machines or by crowdsourcing.

How is the metadata itself changing?

Machine-readable cataloging (MARC) was created to replicate the metadata traditionally found on library catalog cards. We are transitioning from MARC records to assemblages of well-coded and shareable, linkable components, with an emphasis on references, and we are eliminating anachronistic abbreviations not understood by machines. Instead of relying only on library vocabularies, such as subject headings and coded lists, the developing assemblages can accommodate vocabularies created for specific domains, expanding the metadata’s potential audiences.

The Focus Group’s composition has fluctuated over time and currently comprises representatives from 63 RLP Partners in 11 countries spanning four continents.⁶ The group includes both past and incoming chairs of the Program for Cooperative Cataloging (PCC),⁷ providing cross-fertilization between the two. Topics for group discussions can be proposed by any Focus Group member and are selected by an eight-member Planning Group (see appendix), who then write “context statements” explaining why the topic is considered timely and important and then develop question sets that delve into the topic. Context statements and question sets are then distributed to all Focus Group members, who are given three to five weeks to submit their responses. Compilations of the Focus Group’s responses inform face-to-face discussions held in conjunction with the American Library Association conferences⁸ and in subsequent virtual meetings.

As the Focus Group facilitator, I have summarized and synthesized these discussions in a series of OCLC Research Hanging Together Blog publications.⁹ Nearly 40 blog posts on a wide range of metadata-related topics have been published on this forum over the past six years.

The Metadata Managers Focus Group is just one activity within the broader OCLC Research Library Partnership which is devoted to extensive professional development opportunities for library staff Focus Group members value their affiliation with the Research Library Partnership as a channel to becoming the ldquochange agentsrdquo of future metadata management10 Focus Group membersrsquo responses to question sets have facilitated intra-institutional discussions and helped metadata managers understand how their institutionsrsquo situation compares with peers within the Partnership

These Focus Group discussions identified a broad range of metadata-related issues documented in this report Transitioning to the next generation of metadata is an evolving process intertwined with changing standards infrastructures and tools Together Focus Group members came to a common understanding of the challenges shared possible approaches to address them and inoculated these ideas into other communities that they interact with

Collectively Focus Group members command a wide range of experiences with linked data The Focus Grouprsquos keen interest in linked data implementations sparked the series of OCLC Researchrsquos International Linked Data Surveys for Implementers11 A subset of Focus Group members have participated in various linked data projects including the OCLC Research Project Passage and CONTENTdm Linked Data pilot OCLCrsquos Shared Entity Management Infrastructure Library of Congressrsquo Bibliographic Framework Initiative (BIBFRAME) the Mellon-funded Linked Data for Production (LD4P) project the Share-VDE initiative and the IMLS planning grant Shareable Local Name Authorities which exposed issues raised by identifier hubs in the linked data environment12 In addition Focus Group members contribute to the PCC task groups addressing aspects of linked data work including the PCC Task Group on Linked Data Best Practices Task Group on Identity Management Task Group on URIs in MARC and the PCC Linked Data Advisory Committee13 This cross-fertilization has prompted the Focus Group to examine issues around the entities represented in institutional resources

This report synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The document is organized in the following sections, each representing an emerging trend identified in the Focus Group’s discussions:

• The transition to linked data and identifiers: expanding the use of persistent identifiers as part of the shift from “authority control” to “identity management”

• Describing the “inside-out” and “facilitated” collections: challenges in creating and managing metadata for unique resources created or curated by institutions in various formats and shared with consortia

• Evolution of “metadata as a service”: increased involvement with metadata creation beyond the traditional library catalog

• Preparing for future staffing requirements: the changing landscape calls for new skill sets needed by both new professionals entering the field and seasoned catalogers

The document concludes with some observations on the forecasted impact of the next generation of metadata on the wider library community.


The Transition to Linked Data and Identifiers

Linked data offers the ability to take advantage of structured data with an emphasis on context. It relies on language-neutral identifiers pointing to objects, with a focus on “things” replacing the “strings” inherent in current authority and catalog records. These identifiers can then be connected to related data, vocabularies, and terms in other languages, disciplines, and domains, including nonlibrary domains. Linked data applications can consume others’ contributions and thus free metadata specialists from having to re-describe things already described elsewhere, allowing them instead to focus on providing access to their institutions’ unique and distinctive collections. This promises a richer user experience and increased discoverability, with more contextual relationships than is possible with our current systems. Furthermore, linked data offers an opportunity to go beyond the library domain by drawing on information about entities from diverse sources.[14]

FIGURE 1. “Changing Resource Description Workflows” by OCLC Research[15]

The hope is that linked data will allow libraries to offer new, value-added services that current models cannot support, that outside parties will be able to make better use of library resource descriptions, and that the data will be richer because more parties share in its creation. Moving to a linked data environment portends changes to resource description workflows, as shown in figure 1.

The drive to move metadata operations to linked data depends on the availability of tools, access to linked data sources for reuse, documented best practices on identifiers and the metadata descriptions associated with them (“statements”), and a critical mass of implementations on a network level.

EXPANDING THE USE OF PERSISTENT IDENTIFIERS

The Focus Group discussed the “future-proofing” of cataloging, which refers to the opportunities to unleash the power of metadata in legacy records for different interactions and uses in the future. Persistent identifiers were viewed as crucial to transitioning from current metadata to future applications.[16] Identifiers, in the form of language-neutral alphanumeric strings, serve as a shorthand for assembling the elements required to uniquely describe an object or resource. They can be resolved over networks with specific protocols for finding, identifying, and using that object or resource. In the nonlibrary domain, Social Security and employee numbers are examples of such identifiers. In the library and academic domains, Focus Group members pointed to ORCID (Open Researcher and Contributor ID)[17] as a “glue” that holds together the four arms of scholarly work (publishing, repository, library catalog, and researchers), but ORCID is limited to only living researchers. ORCID is increasingly used in STEM (science, technology, engineering, mathematics) journals for all authors and contributors[18] and included in institutions’ Research Information Management systems. ISNI (International Standard Name Identifier)[19] uniquely identifies persons and organizations involved in creative activities; it is used by libraries, publishers, databases, and rights management organizations, and it covers nonliving creators.
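
As a concrete illustration of an identifier being a structured, language-neutral string, the sketch below checks whether a string is a well-formed ORCID iD. It assumes the publicly documented ORCID format: 16 characters shown in four hyphenated groups, with the final character an ISO 7064 MOD 11-2 check character. The function names are hypothetical, and the check only tests well-formedness; it says nothing about whether the iD is registered or whom it identifies.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if, ignoring hyphens, the string has 15 digits plus a valid check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15].upper()
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == check


if __name__ == "__main__":
    # The author's ORCID iD as printed on this report's title page.
    print(is_well_formed_orcid("0000-0002-8757-2962"))  # True
    # Changing the final character breaks the checksum.
    print(is_well_formed_orcid("0000-0002-8757-2963"))  # False
```

A check character of this kind catches most transcription errors before an identifier is stored in a record, but it is not a substitute for resolving the identifier against its registry.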

Persistent identifiers are used by parties such as Google and HathiTrust for service integration.[20] More institutions are using geospatial coordinates in metadata, or URIs (Uniform Resource Identifiers) pointing to geospatial coordinates, that support API (Application Programming Interface) calls to GeoNames,[21] enabling map visualizations. Research institutions are also adopting person identifiers such as ORCID to streamline the collection of the institutional research record, usually through a Research Information Management system, as documented in the 2017 OCLC Research Report Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information Management.[22]

While publishers serve as a key player in the metadata workstream, publisher data does not always meet library requirements. For example, publisher data for monographs usually does not include identifiers. The British Library is working with five UK publishers to add ISNIs[23] to their metadata as a promising proof-of-concept for publishers and libraries working together earlier in the supply chain. The ability to batch load or algorithmically add identifiers in the future is on Focus Group members’ wish list.

No single person identifier covers all use cases. Researchers’ names have been only partially represented in national name authority files that identify persons, both living and dead. A sizable quantity of legacy names are represented only by text strings in bibliographic records. Authority records are created only by institutions involved in the PCC’s Name Authority Cooperative Program (NACO)[24] or in national library programs. Even then, authority records are created selectively, for certain headings or sometimes only when references are involved. The LC/NACO name authority file contained only 30% of the total names reflected in WorldCat’s bibliographic record access points (9 million LC/NACO records compared to the 30 million total names reported on the WorldCat Identities project page as of 2012).[25] By 2020, this percentage had decreased to 18%: 11 million LC/NACO authority records compared to 62 million names in WorldCat Identities. These statistics illustrate that the number of names represented in bibliographic records is increasing more quickly than the number under authority control.
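
The two coverage figures follow directly from the record counts quoted above. A minimal sketch of the arithmetic, using only those counts (the function name is hypothetical):

```python
def coverage_percent(authority_records: int, total_names: int) -> int:
    """Share of names under authority control, rounded to a whole percent."""
    return round(100 * authority_records / total_names)


if __name__ == "__main__":
    print(coverage_percent(9_000_000, 30_000_000))   # 2012 -> 30
    print(coverage_percent(11_000_000, 62_000_000))  # 2020 -> 18
```

Between the two snapshots the authority file grew by 2 million records while the pool of names grew by 32 million, which is why the share fell even though the absolute number of authority records rose.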

Authority files focus on the “preferred form” of a name, which can vary depending on language, discipline, context, and time period. Scholars have objected to the very concept of a “preferred form,” as the name may be referred to differently depending on the context.[26] When a name has multiple forms, historians need to know the provenance of each name, following the citation practices commonly used in their field. An identifier linked to different forms of names, each associated with the provenance and context, could resolve this conundrum.


Researcher names are just one example of a need unmet by current identifier systems. Institutions have been minting their own “local identifiers” to meet this need. Use cases for local identifiers include registering all researchers on campus; representing entities that are underrepresented in national authority files, such as authors of electronic dissertations and theses, performers, events, local place names, and campus buildings; identifying entities in digital library projects and institutional repositories; reflecting multilingual needs of the community; and supporting “housekeeping” tasks such as recording archival collection titles.[27]

Focus Group members’ consistent need to disambiguate names across disciplines and formats spurred the creation of the OCLC Research working group on Registering Researchers in Authority Files.[28] The need to accurately record researchers’ institutional affiliations (to reflect the institution’s scholarly output, to promote cross-institutional collaborations, and to lead to more successful recruitment and funding) led to another working group on Addressing the Challenges with Organizational Identifiers and ISNI,[29] which presented new data modeling of organizations that others could adapt for their own uses. Since then, the Research Organization Registry (ROR) was launched to develop an open, sustainable, usable, and unique identifier for every research organization in the world.[30]

Disambiguating names is the most labor-intensive part of authority work and will still be a prerequisite for assigning unique identifiers. Given the different name identifier systems already in use, libraries need a name reconciliation service. Authority work and algorithms based on text string matching have limits; the results will still need manual expert review. Tapping the expertise in user communities to verify if two identifiers represent the same person may help.

Disambiguation is particularly difficult for authors or contributors listed in journal articles, where names are often abbreviated and there may be dozens or even hundreds of contributors. For example, an article in Physical Review Letters, “Precision Measurement of the Top Quark Mass in Lepton + Jets Final State,” has approximately 300 abbreviated author names for a five-page article (figure 2).[31] This exemplifies the different practices among disciplines. By contrast, other objects with many contributors, such as feature films and orchestral recordings, are usually represented by only a relative handful of the associated names in library legacy metadata.[32] Such differences make creating metadata that is uniform, understandable, and widely reusable a challenge.

FIGURE 2. Some 300 abbreviated author names for a five-page article in Physical Review Letters

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form or an identifier, if it exists. Associating ORCIDs with article authors makes it easier to differentiate authors with the same abbreviated forms. Research Information Management (RIM) systems apply identity management for local researchers so that they are correctly associated with the articles they have written. Their articles are displayed as part of their profiles. (See, for example, Experts@Minnesota or University of Illinois at Urbana-Champaign’s Experts research profiles.)[33]

[Figure 2 image: first page of the preprint “Precision measurement of the top-quark mass in lepton+jets final states” (arXiv:1405.1756v2 [hep-ex], 16 Jun 2014; FERMILAB-PUB-14-123-E), showing the abbreviated author list with affiliation numbers.]


For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.[34] Research Information Management systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author’s ORCID. Linked data could access information across many environments, including those in Research Information Management systems, but would require accurately linking multiple identifiers for the same person to each other.

Some Focus Group members are performing metadata reconciliation work, such as searching for matching terms from linked data sources and adding their URIs in metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.[35] Improving the quality of the data improves users’ experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC’s Virtual International Authority File (VIAF); the Library of Congress’s linked data service (id.loc.gov); ISNI; the Getty’s Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC’s Faceted Application of Subject Terminology (FAST); and various national authority files. Selection of the source depends on the trustworthiness of the organization responsible, subject matter, and richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed an LCNAF Named Entity Reconciliation program[36] using OpenRefine (formerly Google Refine) that searches the VIAF file with the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading and a link to the URI of the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.
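
The reconciliation steps described above (find a matching cluster, look for a Library of Congress source record within it, extract the authorized heading, and pair it with the original heading and a URI) can be sketched as follows. This is not the University of Michigan program and makes no calls to the VIAF API: the cluster data is a hypothetical in-memory stand-in with made-up identifiers, and the function names are illustrative. The only external convention assumed is the id.loc.gov URI pattern for name authorities.

```python
import re
import unicodedata
from typing import Dict, List, Optional

# Hypothetical stand-in for the clusters a VIAF search might return.
# All identifiers below are made up for illustration.
SAMPLE_CLUSTERS: List[dict] = [
    {
        "viaf_id": "VIAF-EXAMPLE-1",
        "names": ["Doe, Jane, 1950-", "Jane Doe"],
        "sources": {"LC": {"id": "n00000001", "heading": "Doe, Jane, 1950-"}},
    },
    {
        "viaf_id": "VIAF-EXAMPLE-2",
        "names": ["Doe, John"],
        "sources": {"DNB": {"id": "000000002", "heading": "Doe, John"}},
    },
]


def normalize(name: str) -> str:
    """Strip accents and punctuation, collapse whitespace, and fold case."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    no_punct = re.sub(r"[^\w\s]", "", no_accents)
    return re.sub(r"\s+", " ", no_punct).strip().casefold()


def reconcile(heading: str, clusters: List[dict]) -> Optional[Dict[str, str]]:
    """Pair a local heading with the LC authorized heading and URI, if any."""
    key = normalize(heading)
    for cluster in clusters:
        if key not in {normalize(n) for n in cluster["names"]}:
            continue  # plain string matching: no fuzzy or contextual logic
        lc_record = cluster["sources"].get("LC")
        if lc_record is None:
            continue  # cluster matched, but it has no LC source record
        return {
            "original_heading": heading,
            "authorized_heading": lc_record["heading"],
            "uri": "http://id.loc.gov/authorities/names/" + lc_record["id"],
            "viaf_id": cluster["viaf_id"],
        }
    return None


if __name__ == "__main__":
    print(reconcile("jane doe", SAMPLE_CLUSTERS))
    print(reconcile("Doe, John", SAMPLE_CLUSTERS))  # None: no LC record
```

Because matching here is exact after normalization, two different people with the same normalized name would be conflated, which is one reason the results of string-based reconciliation still call for expert review.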

A long list of nonlibrary sources that could enhance current authority data or could be valuable to link to in certain contexts has been identified. Wikidata and Wikipedia led the list. Other sources include AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, GeoNames, Goodreads, IMDb (Internet Movie Database), Internet Archive, LibraryThing, LinkedIn, MusicBrainz, ONIX (ONline Information eXchange), Open Library, ORCID, and Scopus ID. The PCC’s Task Group on URIs in MARC’s document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources[37] provides valuable guidance for collecting data from these other sources. Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than with current library systems.[38]

Identifiers for “works” represent a particular challenge, as there is no consensus on what represents a “distinctive work.”[39] Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a “work” is could hamper the ability to reuse data created elsewhere, and look to a central trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.


FIGURE 3. Examples of some DOI (left) and ARK (right) identifiers[40]

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets such as digital resources and collections in institutional repositories include system-generated IDs, locally minted identifiers, PURLs, handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.[41] Ideally, libraries would contribute to a hub for the metadata describing their researchers’ data sets, regardless of where the data sets are stored.


MOVING FROM “AUTHORITY CONTROL” TO “IDENTITY MANAGEMENT”

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.[42] The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from “managing text strings” as the basis of controlling headings in bibliographic records.[43] But identity management poses a change in focus, from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from “authority control” and “authorized access points” in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.[44] Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4. One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
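
Separating an identifier from its labels can be pictured as a small data structure: one stable key, many labels, and links to equivalent identifiers in other schemes, with the displayed label chosen by local preference. The sketch below is illustrative only; the `Entity` class, its field names, and the `ex:place/1` identifier are hypothetical and do not reflect the data model of Wikidata or of any library system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """One identified thing, with no single globally preferred label."""

    identifier: str
    labels: Dict[str, str] = field(default_factory=dict)   # language tag -> label
    same_as: Dict[str, str] = field(default_factory=dict)  # scheme -> identifier

    def label_for(self, preferences: List[str]) -> str:
        """Return the first label matching the community's language preferences.

        Falls back to the bare identifier when no preferred language is present.
        """
        for language in preferences:
            if language in self.labels:
                return self.labels[language]
        return self.identifier


if __name__ == "__main__":
    # Hypothetical entity; the identifiers are placeholders, not real records.
    city = Entity(
        identifier="ex:place/1",
        labels={"en": "Vienna", "de": "Wien", "ja": "ウィーン"},
        same_as={"local": "place-0001"},
    )
    print(city.label_for(["de", "en"]))  # Wien
    print(city.label_for(["en"]))        # Vienna
    print(city.label_for(["fr"]))        # ex:place/1
```

The matching and collocation point is always `identifier`; which label a catalog displays is a local decision expressed in the preference list, so no form has to be declared the preferred one for everyone.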

[Figure 4 image: entry for Wikidata identifier Q19526, showing its links to other identifiers and its labels in different languages.]

A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?[45]
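
One small piece of such a reconciliation tool is bookkeeping: once someone has asserted that two identifiers refer to the same entity, the system should treat everything linked to either one as a single cluster. The sketch below shows one possible way to do that with a union-find (disjoint-set) structure. The class name and the identifier strings are hypothetical, and the hard part, deciding whether two identifiers really denote the same entity, is assumed to happen elsewhere through expert or community review.

```python
from collections import defaultdict
from typing import Dict, List, Set


class IdentifierClusters:
    """Track which identifiers have been asserted to denote the same entity."""

    def __init__(self) -> None:
        self._parent: Dict[str, str] = {}

    def _find(self, identifier: str) -> str:
        """Return the cluster representative, registering unseen identifiers."""
        self._parent.setdefault(identifier, identifier)
        while self._parent[identifier] != identifier:
            # Path halving keeps later lookups short.
            self._parent[identifier] = self._parent[self._parent[identifier]]
            identifier = self._parent[identifier]
        return identifier

    def assert_same(self, a: str, b: str) -> None:
        """Record a reviewed assertion that a and b identify the same entity."""
        root_a, root_b = self._find(a), self._find(b)
        if root_a != root_b:
            self._parent[root_b] = root_a

    def same_entity(self, a: str, b: str) -> bool:
        """True if a and b are connected by any chain of assertions."""
        return self._find(a) == self._find(b)

    def clusters(self) -> List[Set[str]]:
        """Group all known identifiers by the entity they denote."""
        groups: Dict[str, Set[str]] = defaultdict(set)
        for identifier in list(self._parent):
            groups[self._find(identifier)].add(identifier)
        return list(groups.values())


if __name__ == "__main__":
    # Placeholder identifiers, not real records.
    links = IdentifierClusters()
    links.assert_same("orcid:EXAMPLE-A", "isni:EXAMPLE-B")
    links.assert_same("isni:EXAMPLE-B", "viaf:EXAMPLE-C")
    print(links.same_entity("orcid:EXAMPLE-A", "viaf:EXAMPLE-C"))  # True
    print(links.same_entity("orcid:EXAMPLE-A", "local:EXAMPLE-D"))  # False
```

A structure like this makes the identifier cluster, rather than any one label, the match and collocation point. It also shows the risk: a single wrong assertion merges two entities, so a production tool would need provenance for each link and a way to retract it, which this sketch omits.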

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.[46] Many libraries are looking toward Wikidata and Wikibase (the software platform underlying Wikidata) to solve some of the long-standing issues faced by technical services departments, archival units, and others.[47] Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution’s resources. Focus Group members’ experimentations with Wikidata and OCLC projects using the Wikibase platform indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationship with each other, and how best they can increase discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on “works” and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are “notable” are more likely to be represented in Wikidata. More recently, WikiCite,[48] an effort to support citations in Wikipedia articles, has demonstrated a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical, and powerful, aspects of identity management is that it reduces the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.[49] Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts or subject headings are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting “alternate” subject headings came to the fore when the Library of Congress’ initial solution to change the LC subject heading for “Illegal aliens” to “Undocumented immigrants” failed to be implemented. This prompted one Focus Group member to comment, “Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive.” End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.[50]


Some see Faceted Application of Subject Terminology (FAST)[51] as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.[52] FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee[53] represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH, with tools and services that serve the needs of diverse communities and contexts.[54]

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.[55] For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.[56] There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)[57] project built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.[58]


A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in institutional repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool[59] to create and manage various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.[60]

Linked data could provide the means for local communities to prefer a different label for an established vocabulary’s preferred term for a concept or entity. One might reference a local description of a concept or entity not represented (or not represented satisfactorily) in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, “Nothing is sadder than a vocabulary that someone invented that was left to go stale.” Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution’s EDI goals and principles.


FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, “Dissident art” rather than “Non-conformist art.”)

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading “Mentally handicapped” to “People with mental disabilities.” Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a “cultural sensitivity” message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator’s attitude or the period in which the item was created and may be considered inappropriate today in some contexts.

[Figure 5 chart: “What Areas Have You Changed or Plan to Change Due to Your Institution’s EDI Goals and Principles?” Vertical axis: 0 to 100. Categories: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies. Series: Changed; Plan to change.]


• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may exclude audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than being included as part of an overall concept like history, education, or literature, these groups tend to be lumped together. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others’ terminologies or mean different things. The PCC Linked Data Advisory Committee’s Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences and to facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on “Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives”66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.
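The idea of storing an identifier and choosing the label at display time can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Concept` class, the `example.org` identifier, and the sample labels are all hypothetical and do not come from any real vocabulary or system discussed by the Focus Group.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept known by a stable identifier, with labels keyed by context."""
    identifier: str  # hypothetical persistent identifier
    labels: dict[str, str] = field(default_factory=dict)
    default_context: str = "en-US"

    def label_for(self, *contexts: str) -> str:
        """Return the label for the first matching context, else the default."""
        for context in contexts:
            if context in self.labels:
                return self.labels[context]
        return self.labels[self.default_context]


# Hypothetical concept: the identifier and labels are invented for illustration.
colour = Concept(
    identifier="https://example.org/concept/0001",
    labels={"en-US": "Color", "en-GB": "Colour", "fr": "Couleur"},
)

# The record stores only the identifier; the label is chosen when it is displayed.
record = {"title": "A sample record", "subject": colour.identifier}

print(colour.label_for("en-GB"))      # label for a British English context
print(colour.label_for("mi", "fr"))   # falls through to the first available context
```

Because the record holds only the identifier, a community could add or change its preferred label without any change to the records that point to the concept.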

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of “community contribution” for new terms and community voting could be more inclusive. Libraries’ current “consensus environment” excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made. How will libraries resolve or reconcile conflicts between statements?67 Inconsistencies of different types than those seen now may appear, for example different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources.”68 OCLC Research explored how libraries might apply Google’s “Knowledge Vault” to identify statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
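One way to picture the role of provenance is a small Python sketch that groups statements about the same entity and property, flags those whose values disagree, and orders them by a list of preferred sources. Everything here is an assumption made for illustration: the `Statement` structure, the source names, and the sample birth dates are hypothetical, and the sketch deliberately keeps every conflicting statement rather than discarding the lower-ranked ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str   # identifier of the entity described
    prop: str      # property, e.g. "birthDate"
    value: str
    source: str    # provenance: where the statement came from


# Hypothetical "preferred sources" list, most trusted first.
PREFERRED_SOURCES = ["national-authority-file", "aggregator", "local-catalog"]


def rank(statements, preferred=PREFERRED_SOURCES):
    """Order statements by source preference; unknown sources sort last."""
    def key(statement):
        if statement.source in preferred:
            return preferred.index(statement.source)
        return len(preferred)
    return sorted(statements, key=key)


def find_conflicts(statements):
    """Return {(subject, prop): ranked statements} where values disagree."""
    groups = {}
    for statement in statements:
        groups.setdefault((statement.subject, statement.prop), []).append(statement)
    return {
        key: rank(group)
        for key, group in groups.items()
        if len({s.value for s in group}) > 1
    }


# Invented sample data: two sources disagree on a birth date.
statements = [
    Statement("ex:person/1", "birthDate", "1851", "local-catalog"),
    Statement("ex:person/1", "birthDate", "1850", "national-authority-file"),
    Statement("ex:person/1", "name", "Sample, Person", "aggregator"),
]

conflicts = find_conflicts(statements)
for (subject, prop), ranked in conflicts.items():
    print(subject, prop, [(s.value, s.source) for s in ranked])
```

Ranking rather than deleting leaves room for statements that are each correct in their own context, at the cost of pushing the final choice to whoever displays the data.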

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise: speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected by using inadequate vendor records, by creating minimal or less-than-full level descriptions for certain types of resources, and by limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of the large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance” or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing “Inside-Out” and “Facilitated” Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the “inside-out collection” (in contrast to the “outside-in collection,” in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the “facilitated collection.”74 Focus Group members’ activities have increasingly focused on metadata that will provide access to the resources unique to their institutions as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places, and the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars as well as for libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions’ digitization of their archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights, Historic Preservation and Urban Planning, and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?
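The last question, how captures of the same site held by several institutions might be used together, can be illustrated with a minimal Python sketch that merges capture records by a normalized seed URL and orders them by date. The `Capture` fields, the institution names, and the crude URL normalization are assumptions made for this example, not a description of any existing web archiving system.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Capture:
    seed_url: str
    institution: str   # hypothetical institution name
    captured_on: date
    collection: str    # collection-level grouping used by that institution


def normalize(url: str) -> str:
    """Crude normalization so the same seed matches across institutions."""
    url = url.lower().rstrip("/")
    for prefix in ("https://", "http://"):
        if url.startswith(prefix):
            url = url[len(prefix):]
    if url.startswith("www."):
        url = url[len("www."):]
    return url


def timeline(captures):
    """Group captures by normalized seed and sort each group by capture date."""
    merged = defaultdict(list)
    for capture in captures:
        merged[normalize(capture.seed_url)].append(capture)
    return {
        seed: sorted(group, key=lambda c: c.captured_on)
        for seed, group in merged.items()
    }


# Invented sample data: two institutions captured the same site on different dates.
captures = [
    Capture("https://example.org/", "Library A", date(2019, 5, 1), "Event sites"),
    Capture("HTTP://www.Example.org", "Archive B", date(2018, 11, 12), "Local history"),
    Capture("https://example.org/news", "Library A", date(2019, 5, 2), "Event sites"),
]

merged = timeline(captures)
for seed, group in merged.items():
    print(seed, [(c.institution, c.captured_on.isoformat()) for c in group])
```

A real service would also need to record the differing crawl specifications the bullet mentions, which this sketch leaves out.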

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, often in deteriorating formats, in large backlogs, and sometimes requiring rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process, like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.
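As a rough illustration of the technical details listed above, the following Python sketch models one record and reports which details are still missing, which could help in sizing a backlog. It is not PREMIS and does not follow any standard's element names; the class, its field names, and the sample values are assumptions chosen for this example.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AVTechnicalMetadata:
    """Technical details named in the text; field names are illustrative only."""
    item_id: str
    compression: Optional[str] = None         # e.g. codec or compression scheme
    year_digitized: Optional[int] = None
    capture_technology: Optional[str] = None  # equipment or process used
    file_format: Optional[str] = None         # bears on file compatibility


def missing_fields(record: AVTechnicalMetadata) -> list[str]:
    """List the technical fields that have not been recorded yet."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Invented sample record with only part of the technical information known.
tape = AVTechnicalMetadata(
    item_id="local:av/0042",
    year_digitized=2016,
    file_format="WAV",
)

print(missing_fields(tape))
```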


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together Blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival workflows and into digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections

[Figure 6 chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Horizontal axis: 0 to 140. Categories: Resource allocation, assessment, and prioritization; Digital asset management and preservation; Physical collection management; Digitization and preservation reformatting; Incorporating into digital collection workflows; Incorporating into archival workflows; Selection, appraisal, and collection development. Response levels: Very interested; Interested; Somewhat interested; Not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored topic-based digital discovery services to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture, or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters,” those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
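The general idea of replacing strings with identifiers can be sketched as a lookup against an authority table first and a local vocabulary second, with unmatched strings left for review. This is a simplified illustration and not the method used in the pilot: the lookup tables, the `example.org` identifiers, and the function names are all hypothetical.

```python
# Hypothetical lookup tables; real work would query authority files and services.
AUTHORITY_FILE = {
    "paris (france)": "https://example.org/authority/place/paris",
    "maps": "https://example.org/authority/genre/maps",
}
LOCAL_VOCABULARY = {
    "campus buildings": "https://example.org/local/term/campus-buildings",
}


def normalize(value: str) -> str:
    """Fold case and collapse whitespace so near-identical strings match."""
    return " ".join(value.casefold().split())


def reconcile(value: str) -> dict:
    """Replace a string with an identifier where one is known, keeping the label."""
    key = normalize(value)
    for source, table in (("authority", AUTHORITY_FILE), ("local", LOCAL_VOCABULARY)):
        if key in table:
            return {"id": table[key], "source": source, "label": value}
    return {"id": None, "source": None, "label": value}  # left for manual review


def to_linked(record: dict) -> dict:
    """Reconcile every string value in a record of field -> list of strings."""
    return {name: [reconcile(v) for v in values] for name, values in record.items()}


# Invented sample record.
record = {
    "subject": ["Paris (France)", "Campus  Buildings", "Unmatched heading"],
    "genre": ["Maps"],
}

linked = to_linked(record)
for name, values in linked.items():
    print(name, [(v["label"], v["id"]) for v in values])
```

Keeping the original string as the label alongside the identifier means nothing is lost when a match is later found to be wrong.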

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections


RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record OCLC Research Scientist Ixchel Fanielrsquos two-part blog entry ldquoData Management and Curation in 21st Century Archivesrdquo (Sept 2015)100 prompted the discussion among Focus Group members on the metadata needed for research data management101 To maximize the chances that metadata for research data are shareable (that is sufficiently comparable) and helpful to those considering reusing the data our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
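As a rough illustration of the readme.txt recommendation, the sketch below writes out a plain-text readme from a dictionary of project-level metadata. The field list follows the elements named in the quotation; the function name, the sample values, and the "(not provided)" marker are assumptions made for the example, not part of the report.

```python
from typing import Dict, List

# Field order follows the elements listed in the quoted recommendation.
README_FIELDS: List[str] = [
    "Title", "Author", "Date", "Funder", "Copyright holder", "Description",
    "Keywords", "Observation unit", "Kind of data", "Type of data", "Language",
    "File names", "File formats", "Software used",
]


def build_readme(metadata: Dict[str, str]) -> str:
    """Render one 'Field: value' line per element, flagging anything missing."""
    lines = []
    for field in README_FIELDS:
        value = metadata.get(field, "").strip() or "(not provided)"
        lines.append(f"{field}: {value}")
    return "\n".join(lines) + "\n"


# Invented sample values, for illustration only.
readme_text = build_readme({
    "Title": "Example survey dataset",
    "Author": "Jane Doe",
    "Date": "2020-01-15",
    "File formats": "CSV",
    "Language": "English",
})
```

Flagging missing elements explicitly, rather than omitting them, makes gaps visible to whoever reviews the deposit.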

All four of the 2017-2018 webinars in The Realities of Research Data Management series,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery, using terms and definitions that are consistent across repositories, is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105

National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed that data creators need to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas in use include Dublin Core, MODS (Metadata Object Description Schema), and DDI (the Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.
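One way to put the agreed element list to work is a simple completeness check on a dataset description before deposit. The sketch below is illustrative only: the key names are invented for the example and do not come from Dublin Core, MODS, or DDI, and the sample record is hypothetical.

```python
from typing import Dict, List

# Elements the Focus Group agreed are needed to support reuse. The snake_case
# key names are hypothetical and not drawn from any metadata schema.
REQUIRED_FOR_REUSE: List[str] = [
    "license", "processing_steps", "tools", "data_documentation",
    "data_definitions", "data_steward", "grant_numbers",
]


def missing_elements(record: Dict[str, object]) -> List[str]:
    """Return the required elements that are absent or empty in the record."""
    return [name for name in REQUIRED_FOR_REUSE if not record.get(name)]


# Invented sample record, for illustration only.
sample = {
    "license": "CC BY 4.0",
    "tools": ["OpenRefine"],
    "data_steward": "University Library",
    "grant_numbers": [],
}
gaps = missing_elements(sample)
```

Geospatial and temporal elements are left out of the required list because the text treats them as relevant only for some datasets.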

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
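Creator identifiers can be sanity-checked before they are stored. The sketch below validates the check character of an ORCID iD using the ISO 7064 MOD 11-2 algorithm that ORCID documents for its identifiers. It checks structure only and says nothing about whether the iD is actually registered; the function names are chosen for this example.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if a 16-character ORCID iD has a correct check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]
```

For instance, `is_valid_orcid("0000-0002-8757-2962")` (the iD printed on this report's title page) passes the check, while changing the final character makes it fail.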

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only “citation” or “metadata-only” records with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize the data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research report series The Realities of Research Data Management classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose expertise may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring the impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. This metadata consultant role is becoming more visible in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty, cited at 20%, was “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Data mining of catalog data can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125

Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics, that is, statistical analyses of books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story. Hachette asked the British Library for every author and book title published by its nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images, see Hachette’s River of Authors.)127

FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.
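To make the idea of user-generated SPARQL queries concrete, here is a minimal sketch that builds a generic label search over any vocabulary published with SKOS preferred labels. It is not taken from the Getty Vocabularies documentation; a particular service may model labels differently, and the function name and parameters are assumptions for the example. The sketch only assembles the query text and does not contact any endpoint.

```python
def build_label_query(term: str, lang: str = "en", limit: int = 10) -> str:
    """Build a SPARQL query that finds concepts whose preferred label contains term.

    The term is escaped for use inside a SPARQL string literal; lang is assumed
    to be a trusted language tag such as "en".
    """
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?label WHERE {\n"
        "  ?concept skos:prefLabel ?label .\n"
        f'  FILTER(LANG(?label) = "{lang}" && '
        f'CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


query = build_label_query("maps")
```

The resulting string could be sent to whichever SPARQL endpoint a vocabulary publisher provides; the endpoint address is deliberately left out here because it varies by service.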

Rather than aiming for one “global domain,” metadata specialists could provide added value by building bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform that aggregates entities from different sources and links to more details in the various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that artificial intelligence, or at least machine learning, could reduce the current manual effort needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
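Matching names with the help of related metadata does not have to start with machine learning. As a deliberately simple baseline, the sketch below combines string similarity from Python's standard difflib module with one piece of related metadata (a birth year). The record layout, the field names, and the rule that conflicting birth years rule out a match are all assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, Optional, Union

Record = Dict[str, Union[str, Optional[int]]]


def name_key(name: str) -> str:
    """Normalize a name: lowercase, drop punctuation, sort the tokens.

    Sorting makes "Doe, Jane" and "Jane Doe" produce the same key.
    """
    tokens = re.findall(r"\w+", name.lower())
    return " ".join(sorted(tokens))


def match_score(a: Record, b: Record) -> float:
    """Score two name records from 0.0 to 1.0.

    Conflicting birth years are treated as evidence of different people, so the
    score drops to 0.0; a missing year is simply ignored.
    """
    year_a, year_b = a.get("birth_year"), b.get("birth_year")
    if year_a is not None and year_b is not None and year_a != year_b:
        return 0.0
    return SequenceMatcher(None, name_key(str(a["name"])), name_key(str(b["name"]))).ratio()


# Invented sample records, for illustration only.
same_person = match_score(
    {"name": "Doe, Jane", "birth_year": 1950},
    {"name": "Jane Doe", "birth_year": 1950},
)
different_people = match_score(
    {"name": "Doe, Jane", "birth_year": 1950},
    {"name": "Jane Doe", "birth_year": 1980},
)
```

A baseline like this mainly helps to queue likely matches for human review; the threshold for accepting a match automatically is a local policy decision.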

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed both for professionals entering the field and for seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance between allocating staff to “traditional cataloging activities” (such as original and copy cataloging and authority work) and to more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff, to learn new workflows for processing multiple formats, and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work, or training support staff to create metadata for the “easier stuff,” while mandating that catalogers do only what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, change their mindsets about how they work, and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult because they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists and have them learn the “technical services mindset.”

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or who at least know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
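For readers new to scripting, the sketch below shows one lightweight clustering technique: a "fingerprint" key-collision method of the kind popularized by OpenRefine. It is offered as an illustration of how variant strings can be grouped for de-duplication or reconciliation; it does not describe how MarcEdit implements its clustering, and the sample values are invented.

```python
import re
from collections import defaultdict
from typing import Dict, List


def fingerprint(value: str) -> str:
    """Build a key that ignores case, punctuation, word order, and repeated words."""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster(values: List[str]) -> List[List[str]]:
    """Group values that share a fingerprint; keep only groups with variants."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Invented sample values, for illustration only.
publishers = [
    "Oxford University Press",
    "oxford university press.",
    "Press, Oxford University",
    "Cambridge University Press",
]
clusters = cluster(publishers)
```

Because the key discards word order, this method can over-merge strings that differ only in ordering, so clusters are best reviewed before any batch change is applied.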

Managers want to focus less on specific schemas and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, they often lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Round Table. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in the service of library service helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

Impact

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc/.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe/.

Futornick, Michelle. 2019. “LD4P2 Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc: 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research, Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research, Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is ORCID?” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI?” Accessed 19 September 2020. https://isni.org/page/what-is-isni/.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org/.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org/.

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about/.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org. 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons/; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu/.

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications/.

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710.

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly used Vocabularies and Reference Sources. Library of Congress: PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710.

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445.

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793/.

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois/.

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805.

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku / Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko/.

AIATSIS Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au/.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan. 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite/.

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress: PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Bruce Washburn and Jeff Mixter. 2015. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2015. MP4 video presentation, 57:45:00. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m/.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase/.

73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead/.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis/.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods/.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads/.

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io/.

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency/.

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au/.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au/.

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca/.

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org/). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications/.

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory/.

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit/.

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library/.

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com/.

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au/.

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.


139 Reese Terry 2018 “MarcEdit 2017 Usage Information” Terry’s Worklog (blog) 9 September 2020 httpblogreesetnetarchives2572

140 Reese Terry 2020 “Working with Linked Data In MarcEdit” MarcEdit Development (blog) Accessed 21 September 2020 httpsmarceditreesetnetworking-with-linked-data-in-marcedit

141 Reese Terry 2018 ldquoMarcEdit Playlistrdquo 139 YouTube videos Last updated 26 December 2018 httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura Karen 2017 ldquoNew Skill Sets for Metadata Managementrdquo Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

143 “XML and RDF-Based Systems Archives” nd Library Juice Academy (blog) Accessed 22 September 2020 httpslibraryjuiceacademycomcertificatexml-and-rdf-based-systems

Reese Terry 2013 ldquoTutorialsrdquo YouTube (selected) MarcEdit Development (blog) 14 March 2013 httpmarceditreesetnettutorials

ldquoLynda Online Courses Classes Training Tutorialsrdquo nd Lyndacom - from LinkedIn Learning Accessed 22 September 2020 httpswwwlyndacom

ldquoLearn to Code - for Freerdquo nd Codecademy Accessed 22 September 2020 httpswwwcodecademycom

Software Carpentry ldquoTeaching Basic Lab Skills for Research Computingrdquo Upcoming Workshops Accessed 22 September 2020 httpssoftware-carpentryorg

144 ldquoData on the Web Best Practicesrdquo nd Accessed 22 September 2020 httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist (2008) 2020 httpworkingontologistorg

145 Library Workflow Exchange nd ldquoAboutrdquo Accessed 21 September 2020 httpwwwlibraryworkflowexchangeorgabout

146 OCLC Developer Network 2020 “DevConnect Webinars” httpswwwoclcorgdevelopereventsdevconnect-workshopsenhtml

147 Smith-Yoshimura Karen 2019 ldquoStewardship of Professional FTEs In Metadata Work and Turnoverrdquo Hanging Together The OCLC Research Blog 18 October 2019 httpshangingtogetherorgp=7580

148 OCLC 2020 “WorldCat® OCLC and Linked Data” Shared Entity Management Infrastructure httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009



CONTENTS

Executive Summary vi
Introduction 1
The Transition to Linked Data and Identifiers 4
    Expanding the use of persistent identifiers 4
    Moving from “authority control” to “identity management” 9
    Addressing the need for multiple vocabularies and equity, diversity, and inclusion 11
    Linked data challenges 14
Describing “Inside-Out” and “Facilitated” Collections 15
    Archival collections 16
    Archived websites 17
    Audio and video collections 18
    Image collections 19
    Research data 22
Evolution of “Metadata as a Service” 24
    Metrics 24
    Consultancy 25
    New applications 25
    Bibliometrics 26
    Semantic indexing 27
Preparing for Future Staffing Requirements 28
    The culture shift 28
    Learning opportunities 29
    New tools and skills 29
    Self-education 30
    Addressing staff turnover 31
Impact 32
Acknowledgments 33
Appendix 34
Notes 35

FIGURES

FIGURE 1. “Changing Resource Description Workflows” by OCLC Research 4
FIGURE 2. Some 300 abbreviated author names for a five-page article in Physical Review Letters 6
FIGURE 3. Examples of some DOI and ARK identifiers 8
FIGURE 4. One Wikidata identifier links to other identifiers and labels in different languages 9
FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership 13
FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections 19
FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections 21
FIGURE 8. Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon 26
FIGURE 9. Hachette UK’s “River of Authors,” generated from the British Library’s catalog metadata 27

EXECUTIVE SUMMARY

The OCLC Research Library Partners Metadata Managers Focus Group, first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP), a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

This report, Transitioning to the Next Generation of Metadata, synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions.

Yet metadata is changing. Format-specific metadata management, based on curated text strings in bibliographic records understood only by library systems, is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. This report traces how metadata is evolving and considers the impact this transition may have on library services, posing such questions as:

• Why is metadata changing?

• How is the creation process changing?

• How is the metadata itself changing?

• What impact will these changes have on future staffing requirements, and how can libraries prepare?

The future of linked data is tied to the future of metadata: the metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for future linked data innovations as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.


INTRODUCTION

The OCLC Research Library Partners Metadata Managers Focus Group (hereafter referenced as the Focus Group),¹ first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP),² a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions. Metadata provides the research infrastructure necessary for all libraries’ “value delivery systems,” fulfilling their community’s requests for information and resources. Metadata is crucial for transitioning to next generations of library and discovery systems. Good metadata created today can easily be reused in a linked data environment in the future.³ As noted in the British Library’s Foundations for the Future, “Our vision is that by 2023 the Library’s collection metadata assets will be unified on a single, sustainable, standards-based infrastructure offering improved options for access, collaboration and open reuse.”⁴


Format-specific metadata management, based on curated text strings in bibliographic records understood only by library systems, is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. “Traditional methods of metadata generation, management and dissemination,” suggests the British Library’s Collection Management Strategy, “are not scalable or appropriate to an era of rapid digital change, rising audience expectations and diminishing resources.”⁵ Focus Group members are eager to unleash the power of metadata in legacy records for different interactions and uses by both machines and end-users in the future. Consistent metadata created according to past rules or standards needs to be transformed into new structures.


Why is metadata changing?

Traditional library metadata was, and is, made by librarians conforming to rules that are mainly used and understood by librarians. It is record-centered, expensive to produce, and has historic size limitations. Metadata is limited in its coverage, notably not including articles within scholarly journals or other scholarly outputs. The infrastructure has been inadequate for managing corrections and enhancements, inducing an emphasis on perfection that has exacerbated the slowness of metadata creation. In short, the metadata could be better, there is not enough of it, and the metadata that does exist is not used widely outside the library domain.

How is the creation process changing?

Metadata is no longer created by library staff alone. Today, publishers, authors, and other interested parties are equally involved in metadata creation. Metadata creation has also been pushed forward in the scholarly life cycle, with publishers creating metadata records earlier than in the traditional cataloging process. Metadata can now be enhanced or corrected by machines or by crowdsourcing.

How is the metadata itself changing?

Machine-readable cataloging (MARC) was created to replicate the metadata traditionally found on library catalog cards. We are transitioning from MARC records to assemblages of well-coded and shareable, linkable components, with an emphasis on references, and we are eliminating anachronistic abbreviations not understood by machines. Instead of relying only on library vocabularies, such as subject headings and coded lists, the developing assemblages can accommodate vocabularies created for specific domains, expanding the metadata’s potential audiences.


The Focus Group’s composition has fluctuated over time and currently comprises representatives from 63 RLP Partners in 11 countries spanning four continents.⁶ The group includes both past and incoming chairs of the Program for Cooperative Cataloging (PCC),⁷ providing cross-fertilization between the two. Topics for group discussions can be proposed by any Focus Group member and are selected by an eight-member Planning Group (see appendix), who then write “context statements” explaining why the topic is considered timely and important, and then develop question sets that delve into the topic. Context statements and question sets are then distributed to all Focus Group members, who are given three to five weeks to submit their responses. Compilations of the Focus Group’s responses inform face-to-face discussions held in conjunction with the American Library Association conferences⁸ and in subsequent virtual meetings.

As the Focus Group facilitator, I have summarized and synthesized these discussions in a series of OCLC Research Hanging Together Blog publications.⁹ Nearly 40 blog posts on a wide range of metadata-related topics have been published on this forum over the past six years.


The Metadata Managers Focus Group is just one activity within the broader OCLC Research Library Partnership, which is devoted to extensive professional development opportunities for library staff. Focus Group members value their affiliation with the Research Library Partnership as a channel to becoming the “change agents” of future metadata management.¹⁰ Focus Group members’ responses to question sets have facilitated intra-institutional discussions and helped metadata managers understand how their institutions’ situation compares with peers within the Partnership.

These Focus Group discussions identified a broad range of metadata-related issues, documented in this report. Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

Collectively, Focus Group members command a wide range of experiences with linked data. The Focus Group’s keen interest in linked data implementations sparked the series of OCLC Research’s International Linked Data Surveys for Implementers.¹¹ A subset of Focus Group members have participated in various linked data projects, including the OCLC Research Project Passage and CONTENTdm Linked Data pilot, OCLC’s Shared Entity Management Infrastructure, Library of Congress’ Bibliographic Framework Initiative (BIBFRAME), the Mellon-funded Linked Data for Production (LD4P) project, the Share-VDE initiative, and the IMLS planning grant Shareable Local Name Authorities, which exposed issues raised by identifier hubs in the linked data environment.¹² In addition, Focus Group members contribute to the PCC task groups addressing aspects of linked data work, including the PCC Task Group on Linked Data Best Practices, Task Group on Identity Management, Task Group on URIs in MARC, and the PCC Linked Data Advisory Committee.¹³ This cross-fertilization has prompted the Focus Group to examine issues around the entities represented in institutional resources.

This report synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The document is organized in the following sections, each representing an emerging trend identified in the Focus Group’s discussions:

• The transition to linked data and identifiers: expanding the use of persistent identifiers as part of the shift from “authority control” to “identity management”

• Describing the “inside-out” and “facilitated” collections: challenges in creating and managing metadata for unique resources created or curated by institutions in various formats and shared with consortia

• Evolution of “metadata as a service”: increased involvement with metadata creation beyond the traditional library catalog

• Preparing for future staffing requirements: the changing landscape calls for new skill sets needed by both new professionals entering the field and seasoned catalogers

The document concludes with some observations on the forecasted impact of the next generation of metadata on the wider library community.


The Transition to Linked Data and Identifiers

Linked data offers the ability to take advantage of structured data with an emphasis on context. It relies on language-neutral identifiers pointing to objects, with a focus on “things” replacing the “strings” inherent in current authority and catalog records. These identifiers can then be connected to related data, vocabularies, and terms in other languages, disciplines, and domains, including nonlibrary domains. Linked data applications can consume others’ contributions and thus free metadata specialists from having to re-describe things already described elsewhere, allowing them instead to focus on providing access to their institutions’ unique and distinctive collections. This promises a richer user experience and increased discoverability, with more contextual relationships than is possible with our current systems. Furthermore, linked data offers an opportunity to go beyond the library domain by drawing on information about entities from diverse sources.¹⁴

FIGURE 1. “Changing Resource Description Workflows” by OCLC Research¹⁵

The hope is that linked data will allow libraries to offer new, value-added services that current models cannot support, that outside parties will be able to make better use of library resource descriptions, and that the data will be richer because more parties share in its creation. Moving to a linked data environment portends changes to resource description workflows, as shown in figure 1.

The drive to move metadata operations to linked data depends on the availability of tools, access to linked data sources for reuse, documented best practices on identifiers and the metadata descriptions associated with them (“statements”), and a critical mass of implementations on a network level.

EXPANDING THE USE OF PERSISTENT IDENTIFIERS

The Focus Group discussed the “future-proofing” of cataloging, which refers to the opportunities to unleash the power of metadata in legacy records for different interactions and uses in the future. Persistent identifiers were viewed as crucial to transitioning from current metadata to future applications.¹⁶ Identifiers, in the form of language-neutral alphanumeric strings, serve as a shorthand for assembling the elements required to uniquely describe an object or resource. They can be resolved over networks with specific protocols for finding, identifying, and using that object or resource. In the nonlibrary domain, Social Security and employee numbers are examples of such identifiers. In the library and academic domains, Focus Group members pointed to ORCID (Open Researcher and Contributor ID)¹⁷ as a “glue” that holds together the four arms of scholarly work: publishing, repository, library catalog, and researchers. ORCID, however, is limited to only living researchers. ORCID is increasingly used in STEM (science, technology, engineering, mathematics) journals for all authors and contributors¹⁸ and included in institutions’ Research Information Management systems. ISNI (International Standard Name Identifier)¹⁹ uniquely identifies persons and organizations involved in creative activities; it is used by libraries, publishers, databases, and rights management organizations, and it covers nonliving creators.
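Because identifiers of this kind are strings that people retype and machines exchange, many carry a built-in check character so that transcription errors can be caught before a lookup is attempted. The sketch below validates the check character of a 16-character ORCID iD using the ISO 7064 MOD 11-2 scheme; ISNI uses the same 16-character structure. It is a minimal illustration, not part of the Focus Group discussions: the function names are assumptions made for this example, and passing the check only shows that an iD is well formed, not that it is registered to anyone.

```python
def mod11_2_check_char(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the 15 leading digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits, and check character of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return mod11_2_check_char(base) == compact[15]


# The iD below is the author's ORCID iD as printed in this report's front matter.
print(is_valid_orcid("0000-0002-8757-2962"))  # True
# Changing the final character breaks the checksum.
print(is_valid_orcid("0000-0002-8757-2963"))  # False
```

A check like this is cheap enough to run at data entry or during a batch load, before any identifier is written into a record.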

Persistent identifiers are used by parties such as Google and HathiTrust for service integration.²⁰ More institutions are using geospatial coordinates in metadata, or URIs (Uniform Resource Identifiers) pointing to geospatial coordinates, that support API (Application Programming Interface) calls to GeoNames,²¹ enabling map visualizations. Research institutions are also adopting person identifiers such as ORCID to streamline the collection of the institutional research record, usually through a Research Information Management system, as documented in the 2017 OCLC Research Report Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information Management.²²
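As an illustration of how a place URI stored in a record can drive an API call, the sketch below turns a GeoNames-style URI into a request URL for the GeoNames web service, whose JSON response includes latitude and longitude fields that a map display could plot. This is a hedged sketch: the `getJSON` endpoint and its `geonameId` and `username` parameters reflect the GeoNames web services as commonly documented and should be checked against the current documentation; the feature ID and account name are placeholders; and no network request is made here.

```python
import re
from urllib.parse import urlencode

# Assumed endpoint of the GeoNames "get" web service in its JSON form.
GEONAMES_API = "http://api.geonames.org/getJSON"


def geonames_lookup_url(place_uri: str, username: str) -> str:
    """Build a lookup URL from a GeoNames URI such as https://sws.geonames.org/<id>/."""
    match = re.search(r"geonames\.org/(\d+)", place_uri)
    if match is None:
        raise ValueError(f"not a GeoNames URI: {place_uri}")
    query = urlencode({"geonameId": match.group(1), "username": username})
    return f"{GEONAMES_API}?{query}"


# "1234567" and "my_account" are placeholders, not a real feature or account.
print(geonames_lookup_url("https://sws.geonames.org/1234567/", "my_account"))
```

Storing the URI rather than a place-name string is what makes this step mechanical: the identifier can be extracted and resolved without any text matching.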

While publishers serve as a key player in the metadata workstream, publisher data does not always meet library requirements. For example, publisher data for monographs usually does not include identifiers. The British Library is working with five UK publishers to add ISNIs²³ to their metadata as a promising proof-of-concept for publishers and libraries working together earlier in the supply chain. The ability to batch load or algorithmically add identifiers in the future is on Focus Group members’ wish list.

No single person identifier covers all use cases. Researchers’ names have been only partially represented in national name authority files that identify persons both living and dead. A sizable quantity of legacy names are represented only by text strings in bibliographic records. Authority records are created only by institutions involved in the PCC’s Name Authority Cooperative Program (NACO)²⁴ or in national library programs. Even then, authority records are created selectively for certain headings, or sometimes only when references are involved. The LC/NACO name authority file contained only 30% of the total names reflected in WorldCat’s bibliographic record access points (9 million LC/NACO records compared to the 30 million total names reported on the WorldCat Identities project page as of 2012).²⁵ By 2020, this percentage had decreased to 18%: 11 million LC/NACO authority records compared to 62 million names in WorldCat Identities. These statistics illustrate that the number of names represented in bibliographic records is increasing more quickly than the number under authority control.
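The arithmetic behind these percentages can be reproduced directly from the figures quoted above. The short sketch below uses only those four numbers (in millions); the variable and function names are illustrative.

```python
# Figures quoted in the paragraph above, in millions.
snapshots = {
    2012: {"authority_records": 9, "names": 30},
    2020: {"authority_records": 11, "names": 62},
}


def coverage(year: int) -> float:
    """Share of names in WorldCat Identities that have an LC/NACO authority record."""
    snap = snapshots[year]
    return snap["authority_records"] / snap["names"]


def growth(key: str) -> float:
    """Relative growth of one quantity between 2012 and 2020."""
    return snapshots[2020][key] / snapshots[2012][key] - 1


for year in snapshots:
    print(f"{year}: {coverage(year):.0%} of names under authority control")

print(f"authority records grew by {growth('authority_records'):.0%}")
print(f"names grew by {growth('names'):.0%}")
```

On these figures, authority records grew by roughly a fifth while the pool of names more than doubled, which is why the coverage share fell even though the authority file itself kept growing.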

Authority files focus on the “preferred form” of a name, which can vary depending on language, discipline, context, and time period. Scholars have objected to the very concept of a “preferred form,” as the name may be referred to differently depending on the context.²⁶ When a name has multiple forms, historians need to know the provenance of each name, following the citation practices commonly used in their field. An identifier linked to different forms of names, each associated with the provenance and context, could resolve this conundrum.
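One way to picture an identifier linked to several name forms, each carrying its own provenance and context, is the small data structure below. It is a sketch for illustration only: the class names, fields, identifier, and name forms are all hypothetical and do not represent any existing authority file or linked data model.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NameForm:
    text: str      # the name as it appears in the source
    language: str  # language code of this form
    context: str   # where or how this form is used
    source: str    # provenance: who asserted this form


@dataclass
class Identity:
    identifier: str  # language-neutral identifier for the person
    forms: list[NameForm] = field(default_factory=list)

    def forms_for(self, language: str) -> list[str]:
        """Return every recorded form in one language; none is 'preferred'."""
        return [form.text for form in self.forms if form.language == language]


# Hypothetical identity with placeholder name forms.
person = Identity(
    identifier="local:person/0001",
    forms=[
        NameForm("Doe, Jane", "en", "library catalog heading", "national authority file"),
        NameForm("J. Doe", "en", "journal article byline", "publisher metadata"),
        NameForm("Джейн Доу", "ru", "translated edition", "title page"),
    ],
)

print(person.forms_for("en"))
```

Because no form is marked as preferred, a display layer can choose the form that suits its language and audience while still citing the source of each one.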


Researcher names are just one example of a need unmet by current identifier systems. Institutions have been minting their own “local identifiers” to meet this need. Use cases for local identifiers include registering all researchers on campus; representing entities that are underrepresented in national authority files, such as authors of electronic dissertations and theses, performers, events, local place names, and campus buildings; identifying entities in digital library projects and institutional repositories; reflecting multilingual needs of the community; and supporting “housekeeping” tasks, such as recording archival collection titles.²⁷

Focus Group members’ consistent need to disambiguate names across disciplines and formats spurred the creation of the OCLC Research working group on Registering Researchers in Authority Files.²⁸ The need to accurately record researchers’ institutional affiliations (to reflect the institution’s scholarly output, to promote cross-institutional collaborations, and to lead to more successful recruitment and funding) led to another working group on Addressing the Challenges with Organizational Identifiers and ISNI,²⁹ which presented new data modeling of organizations that others could adapt for their own uses. Since then, the Research Organization Registry (ROR) was launched to develop an open, sustainable, usable, and unique identifier for every research organization in the world.³⁰

Disambiguating names is the most labor-intensive part of authority work and will still be a prerequisite for assigning unique identifiers. Given the different name identifier systems already in use, libraries need a name reconciliation service. Authority work and algorithms based on text string matching have limits; the results will still need manual expert review. Tapping the expertise in user communities to verify if two identifiers represent the same person may help.
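The limits of text-string matching, and the resulting need for expert review, can be illustrated with a small triage sketch. It compares two name strings with Python’s standard-library `difflib` and routes anything that is neither a near-exact match nor clearly different to a human reviewer. The thresholds, the normalization rule, and the function names are arbitrary assumptions chosen for this example; a real reconciliation service would also weigh dates, affiliations, and associated works rather than the name string alone.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a name and drop commas, periods, and extra spaces."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def triage(name_a: str, name_b: str, accept: float = 0.95, reject: float = 0.60) -> str:
    """Sort a candidate pair into 'same', 'different', or 'needs expert review'."""
    score = SequenceMatcher(None, normalize(name_a), normalize(name_b)).ratio()
    if score >= accept:
        return "same"
    if score < reject:
        return "different"
    return "needs expert review"


# All names below are invented placeholders.
print(triage("Smith, John", "smith john"))
print(triage("J. Smith", "John Smith"))
print(triage("Smith, John", "Garcia, Maria"))
```

The middle case is the important one: “J. Smith” is similar enough to “John Smith” that it cannot be rejected, but it could equally be a different person, so the string comparison alone cannot settle it.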

Disambiguation is particularly difficult for authors or contributors listed in journal articles, where names are often abbreviated and there may be dozens or even hundreds of contributors. For example, an article in Physical Review Letters, “Precision Measurement of the Top Quark Mass in Lepton + Jets Final State,” has approximately 300 abbreviated author names for a five-page article (figure 2).³¹ This exemplifies the different practices among disciplines. By contrast, other objects with many contributors, such as feature films and orchestral recordings, are usually represented by only a relative handful of the associated names in library legacy metadata.³² Such differences make creating metadata that is uniform, understandable, and widely reusable a challenge.

FIGURE 2. Some 300 abbreviated author names for a five-page article in Physical Review Letters

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form or an identifier, if it exists. Associating ORCIDs with article authors makes it easier to differentiate authors with the same abbreviated forms. Research Information Management (RIM) systems apply identity management for local researchers so that they are correctly associated with the articles they have written. Their articles are displayed as part of their profiles. (See, for example, Experts@Minnesota or University of Illinois at Urbana-Champaign’s Experts research profiles.)³³
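The effect of abbreviation can be shown with a few lines of code. The sketch below reduces full names to the “initials plus surname” form common in journal bylines and then groups authors by that abbreviated form; two different people collapse into one string, while their identifiers keep them apart. The names and identifier values are invented placeholders (the identifiers are deliberately not in ORCID format), and the abbreviation rule is a simplification assumed for this example.

```python
from collections import defaultdict


def abbreviate(full_name: str) -> str:
    """Reduce 'Given Middle Surname' to 'GM Surname', as in many journal bylines."""
    *given, surname = full_name.split()
    initials = "".join(part[0].upper() for part in given)
    return f"{initials} {surname}".strip()


# Hypothetical authors; "id-A" etc. stand in for person identifiers such as ORCID iDs.
authors = [
    {"name": "Maria Elena Garcia", "person_id": "id-A"},
    {"name": "Miguel Eduardo Garcia", "person_id": "id-B"},
    {"name": "Wei Chen", "person_id": "id-C"},
]

by_abbreviation = defaultdict(list)
for author in authors:
    by_abbreviation[abbreviate(author["name"])].append(author["person_id"])

# Abbreviated forms shared by more than one person cannot be matched by string alone.
collisions = {form: ids for form, ids in by_abbreviation.items() if len(ids) > 1}
print(collisions)
```

Once the byline carries the identifier alongside the abbreviated name, the collision disappears without anyone having to expand or compare the name strings.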

[Figure 2 image: first page of arXiv:1405.1756v2 [hep-ex], 16 Jun 2014 (FERMILAB-PUB-14-123-E), "Precision measurement of the top-quark mass in lepton+jets final states," whose author list consists of abbreviated names (V.M. Abazov, B. Abbott, B.S. Acharya, and so on), each followed by affiliation numbers.]

For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.[34] Research Information Management systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author's ORCID. Linked data could access information across many environments, including those in Research Information Management systems, but would require accurately linking multiple identifiers for the same person to each other.
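As a minimal sketch of what "linking multiple identifiers for the same person to each other" could involve, the Python below groups identifiers that have been asserted to be equivalent, using a union-find structure. Everything in it is an illustrative assumption: the function name `cluster_identifiers`, the `orcid:`, `scopus:`, and `rim:` prefixes, and the identifier values are invented for the example and do not come from any RIM system.

```python
from collections import defaultdict


def cluster_identifiers(links):
    """Group identifiers asserted to refer to the same person.

    `links` is an iterable of (id_a, id_b) pairs; each pair is one
    "same person" assertion. Returns a sorted list of sorted clusters.
    """
    parent = {}

    def find(x):
        # Follow parent pointers to the cluster root, halving the path.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for identifier in parent:
        clusters[find(identifier)].add(identifier)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical "same person" assertions (all values are made up).
links = [
    ("orcid:0000-0000-0000-0001", "scopus:10000000001"),
    ("scopus:10000000001", "rim:jdoe"),
    ("orcid:0000-0000-0000-0002", "rim:asmith"),
]
clusters = cluster_identifiers(links)
for cluster in clusters:
    print(cluster)
```

The sketch takes every assertion as true. In practice each link would itself need a source and a level of trust, because a single wrong link merges two people into one cluster.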

Some Focus Group members are performing metadata reconciliation work, such as searching for matching terms in linked data sources and adding their URIs to metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.[35] Improving the quality of the data improves users' experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC's Virtual International Authority File (VIAF); the Library of Congress's linked data service (id.loc.gov); ISNI; the Getty's Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC's Faceted Application of Subject Terminology (FAST); and various national authority files. The choice of source depends on the trustworthiness of the organization responsible, the subject matter, and the richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed an LCNAF Named Entity Reconciliation program[36] using OpenRefine (formerly Google Refine) that searches VIAF through the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. The result is a dataset pairing the authorized LC Name Authority File heading with the original heading, plus a link to the URI in the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.
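To illustrate the general idea of reconciling headings by string matching, here is a small Python sketch. It is not the University of Michigan program and it calls neither OpenRefine nor the VIAF API. It assumes a tiny in-memory authority table instead, and the headings, the `example.org` URIs, and the function names are all hypothetical.

```python
import unicodedata


def normalize(heading):
    """Fold case, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", heading)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_marks)
    return " ".join(cleaned.lower().split())


def reconcile(local_headings, authority):
    """Pair each local heading with an authorized heading and URI, if any.

    `authority` maps authorized heading -> URI. Matching is exact on the
    normalized string, so near-misses (typos, missing dates) are not found.
    """
    index = {normalize(h): (h, uri) for h, uri in authority.items()}
    rows = []
    for heading in local_headings:
        authorized, uri = index.get(normalize(heading), (None, None))
        rows.append((heading, authorized, uri))
    return rows


# Made-up authority data; the URIs are placeholders, not real records.
AUTHORITY = {
    "Example, Ana María, 1950-": "https://example.org/authorities/names/n0001",
    "Sample, John Q.": "https://example.org/authorities/names/n0002",
}
local = ["Example, Ana Maria, 1950-", "sample, john q", "Unknown, Person"]

rows = reconcile(local, AUTHORITY)
for row in rows:
    print(row)
```

The unmatched third heading shows why string matching gives only "fair" results: anything the normalization does not absorb is left for manual review.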

A long list of nonlibrary sources that could enhance current authority data, or could be valuable to link to in certain contexts, has been identified. Wikidata and Wikipedia led the list. Other sources include AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, GeoNames, Goodreads, IMDb (Internet Movie Database), Internet Archive, LibraryThing, LinkedIn, MusicBrainz, ONIX (ONline Information eXchange), Open Library, ORCID, and Scopus ID. The document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources,[37] from the PCC's Task Group on URIs in MARC, provides valuable guidance for collecting data from these other sources. Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than current library systems allow.[38]

Identifiers for "works" represent a particular challenge, as there is no consensus on what represents a "distinctive work."[39] Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a "work" is could hamper the ability to reuse data created elsewhere, and they look to a central, trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.


FIGURE 3. Examples of some DOI (left) and ARK (right) identifiers[40]

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets, such as digital resources and collections in institutional repositories, include system-generated IDs, locally minted identifiers, PURLs, handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.[41] Ideally, libraries would contribute to a hub for the metadata describing their researchers' data sets, regardless of where the data sets are stored.
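The point that an identifier should be independent of where an object is stored can be sketched in Python: the same DOI or ARK may be written bare or wrapped in a resolver URL, and both forms should reduce to the same key. The patterns below are deliberate simplifications, not the full DOI or ARK syntax, and the function name is hypothetical. The DOI shown is this report's own; the ARK value and the `example.org` resolver are made up.

```python
import re

# Simplified patterns: an optional resolver or scheme prefix, a naming
# authority (DOI prefix or all-digit ARK NAAN), then the assigned name.
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9})/(\S+)$", re.IGNORECASE
)
ARK_PATTERN = re.compile(
    r"^(?:https?://[^/]+/)?ark:/?(\d{5,})/(\S+)$", re.IGNORECASE
)


def parse_identifier(value):
    """Return (scheme, authority, name), or None if the form is unrecognized."""
    text = value.strip()
    doi = DOI_PATTERN.match(text)
    if doi:
        # DOI names are case-insensitive, so fold case for comparison.
        return ("doi", doi.group(1), doi.group(2).lower())
    ark = ARK_PATTERN.match(text)
    if ark:
        return ("ark", ark.group(1), ark.group(2))
    return None


samples = [
    "https://doi.org/10.25333/rqgd-b343",    # resolver URL form
    "doi:10.25333/RQGD-B343",                # bare form, different case
    "https://example.org/ark:/12345/x9abc",  # made-up ARK behind a resolver
    "ark:/12345/x9abc",                      # the same ARK without a host
    "local-0042",                            # a system-generated local ID
]
for sample in samples:
    print(sample, "->", parse_identifier(sample))
```

Under these assumptions the first two samples yield one key and the next two another, while the local ID yields none: it carries no naming authority that another system could recognize.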


MOVING FROM "AUTHORITY CONTROL" TO "IDENTITY MANAGEMENT"

The emphasis in authority work is shifting from the construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.[42] The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future lies in identity management and in getting away from "managing text strings" as the basis of controlling headings in bibliographic records.[43] But identity management poses a change in focus, from providing access points in resource descriptions to describing the entities in the resource (works, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from "authority control" and "authorized access points" in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.[44] Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4. One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
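A minimal Python sketch of separating an identifier from its labels follows. The `Entity` class, the `display_label` function, the `ex:` identifiers, and the fallback rule are assumptions made for illustration; they are not a model of Wikidata or of any library system.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    identifier: str                               # stable, language-neutral key
    labels: dict = field(default_factory=dict)    # language tag -> label
    same_as: list = field(default_factory=list)   # identifiers in other sources


def display_label(entity, preferred_languages):
    """Return the first available label in the community's order of preference.

    No label is globally "preferred"; the identifier itself is the fallback.
    """
    for lang in preferred_languages:
        if lang in entity.labels:
            return entity.labels[lang]
    return entity.identifier


# Hypothetical entity: the identifiers are placeholders.
city = Entity(
    identifier="ex:E1",
    labels={"en": "Munich", "de": "München", "ja": "ミュンヘン"},
    same_as=["ex-gazetteer:123", "ex-authority:n456"],
)

print(display_label(city, ["de", "en"]))  # a German-language catalog
print(display_label(city, ["fr", "en"]))  # no French label, so English is used
print(display_label(city, ["fr"]))        # nothing matches: show the identifier
```

Matching and collocation would use `identifier` only, so two catalogs that display different labels still agree on which entity they describe.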

[Figure 4 image: Wikidata identifier Q19526.]

A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, the collocation point, and the key to whatever associated labels are displayed and indexed?[45]

Some Focus Group members are experimenting with Wikidata as another option for assigning identifiers to names not represented in authority files, which would broaden the potential pool of contributors.[46] Many libraries are looking toward Wikidata and Wikibase, the software platform underlying Wikidata, to solve some of the long-standing issues faced by technical services departments, archival units, and others.[47] Wikidata and Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution's resources. Focus Group members' experiments with Wikidata, and OCLC projects using the Wikibase platform, indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationships with each other, and how best to increase their discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, the representation of books in Wikidata focuses on "works" and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are "notable" are more likely to be represented in Wikidata. More recently, WikiCite,[48] an effort to support citations in Wikipedia articles, has demonstrated a need to register and support the identifiers that make up those citations, including information about a specific edition or document.

One of the most practical, and powerful, aspects of identity management is that it reduces the amount of copying and pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.[49] Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts, or subject headings, are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex: the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting "alternate" subject headings came to the fore when the Library of Congress's initial solution, changing the LC subject heading "Illegal aliens" to "Undocumented immigrants," failed to be implemented. This prompted one Focus Group member to comment, "Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive." End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.[50]


Some see Faceted Application of Subject Terminology (FAST)[51] as a means to engage the community and mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a middle ground between totally uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other.[52] FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. Because FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee[53] represents FAST users to oversee community engagement, term contributions, and procedures and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH, with tools and services that serve the needs of diverse communities and contexts.[54]

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.[55] For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.[56] There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects) project[57] built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d'autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.[58]


A growing percentage of data in institutions' discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities' research data and materials in institutional repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the "collision of name spaces" and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions' Ontology Management – Graphite tool[59] for creating and managing various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.[60]

Linked data could provide the means for local communities to prefer a different label for an established vocabulary's preferred term for a concept or entity. One might reference a local description of a concept or entity not represented, or not represented satisfactorily, in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, "Nothing is sadder than a vocabulary that someone invented that was left to go stale." Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)[61] spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.[62] Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution's EDI goals and principles.


FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership[63]

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and to replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. That a term is offensive to one community may not always be clear to others. (For example, "Dissident art" rather than "Non-conformist art.")

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can be a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading "Mentally handicapped" to "People with mental disabilities." Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.[64] Some noted it would be less labor-intensive to present a "cultural sensitivity" message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator's attitude or the period in which the item was created, and may be considered inappropriate today in some contexts.

[Figure 5 chart: "What Areas Have You Changed or Plan to Change Due to Your Institution's EDI Goals and Principles?" Bar chart on a 0 to 100 scale showing "Changed" and "Plan to change" for six areas: collection building; selecting materials for digitization; metadata in library catalogs; metadata in archival collections; metadata in digital collections; terminologies or vocabularies.]

• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may exclude audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than being included as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others' terminologies or mean different things. The PCC Linked Data Advisory Committee's Linked Data Infrastructure Models: Areas of Focus for PCC Strategies[65] describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences and to facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar, "Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives,"[66] described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of "community contribution" for new terms, with community voting, could be more inclusive. Libraries' current "consensus environment" excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made about them. How will libraries resolve or reconcile conflicts between statements?[67] Inconsistencies of a different kind than appear now may arise, for example, different birthdates for the same person. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of "preferred sources."[68] OCLC Research explored how libraries might apply Google's "Knowledge Vault" to identify statements that may be more "truthful" than others in the 2015 Works in Progress webinar "Looking Inside the Library Knowledge Vault."[69] Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
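One way to picture statements that carry provenance is the Python sketch below. It finds properties for which sources disagree and orders the competing statements by a local list of preferred sources. The `Statement` fields, the source names, and the ranking rule are assumptions for illustration only; this is not how Knowledge Vault, VIAF, or Wikidata resolve conflicts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str   # identifier of the entity described
    prop: str      # property being asserted
    value: str     # asserted value
    source: str    # provenance: who made the statement


def group_conflicts(statements):
    """Return {(subject, prop): [statements]} where sources give different values."""
    grouped = {}
    for s in statements:
        grouped.setdefault((s.subject, s.prop), []).append(s)
    return {
        key: group
        for key, group in grouped.items()
        if len({s.value for s in group}) > 1
    }


def rank_by_source(statements, preferred_sources):
    """Order statements by preferred source; unlisted sources sort last."""
    rank = {source: i for i, source in enumerate(preferred_sources)}
    return sorted(statements, key=lambda s: rank.get(s.source, len(rank)))


# Made-up statements about a hypothetical person "ex:P1".
statements = [
    Statement("ex:P1", "birth_date", "1901", "source-b"),
    Statement("ex:P1", "birth_date", "1902", "source-a"),
    Statement("ex:P1", "occupation", "poet", "source-a"),
    Statement("ex:P1", "occupation", "poet", "source-b"),
]
conflicts = group_conflicts(statements)
ranked = rank_by_source(conflicts[("ex:P1", "birth_date")], ["source-a", "source-b"])
for s in ranked:
    print(s)
```

The sketch orders conflicting statements without discarding any of them, so a display could show the first and still expose the rest together with their sources.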

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good-quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected in the use of inadequate vendor records, in the creation of minimal or less-than-full level descriptions for certain types of resources, and in limits on authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncataloged, often because of the large volume of materials and insufficient staff resources.[70] These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide "trustworthy provenance" or whether data should be distributed with peer-to-peer sharing.[71] Different statements might be correct in their own contexts. "Conflicting statements" might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative "Plan M" (where "M" stands for "metadata") seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.[72] Among the implications cited by stakeholders in the UK's National Bibliographic Knowledgebase (NBK) in Plan M's 10-year vision: "Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research."[73] Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing "Inside-Out" and "Facilitated" Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the "inside-out collection" (in contrast to the "outside-in collection," in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the "facilitated collection."[74] Focus Group members' activities have increasingly focused on metadata that will provide access to the resources unique to their institutions, as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to ldquoinside-outrdquo collections and present different challenges For example Focus Group members described efforts to retrieve metadata from completely different systems as ldquosuper challengingrdquo In addition many of these resources are not under any authority control Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills75 This reconciliation also will be needed in the previously discussed linked data environment

This section summarizes the discussions on these format types

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places and the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars, libraries, and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.[76]

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the "preferred" form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot[77] and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.[78]

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions' digitizing of their archival and distinctive collections that have been available only in physical form.[79] More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.[80]

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University's Human Rights; Historic Preservation and Urban Planning; and New York City Religions.[81] National libraries archive websites within their national domain. For example, the National Library of Australia's Archived websites (1996-now)[82] collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation's Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.[83]

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters. Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its "look-and-feel") is not considered significant.

• Practices vary. Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use. Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency. Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites. Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the "lack of descriptive metadata guidelines" as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.[84] The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.[85]


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique local collections.[86] However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, "For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials."[87] Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, which are often in deteriorating formats, in large backlogs, and sometimes require rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,[88] which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),[89] the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together blog posts "Assessing Needs of AV in Special Collections" and "Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP."[90] A subset of the Focus Group members responded to Weber's 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival workflows and into digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections

[Bar chart: "What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?" (n=137). Categories shown: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; selection, appraisal, and collection development. Response levels: very interested, interested, somewhat interested, not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects ("of-ness" and "about-ness"). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.[91]

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas. Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections' metadata, and others are using MODS (Metadata Object Description Schema).[92] Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),[93] a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).[94]

• Duplicate metadata for different objects. Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance. A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR'ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images. How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object. Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a "blob" of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators. It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections. Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution's own collection development, as librarians can see their contributions in the context of others' content and identify gaps that they may wish to fill locally.[95]

Aggregators often have different guidelines and input formats. Aggregators' very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors' very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,[96] which developed a method for hierarchically structuring cultural objects at different similarity levels to find "semantic clusters," meaning those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)[97] Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,[98] as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,[99] focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local, library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
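The string-to-identifier replacement described above can be sketched roughly as follows. This is a minimal illustration, not the pilot's actual method: the lookup tables, the `example.org` URIs, and the function names are all hypothetical, and a real workflow would query authority services rather than in-memory dictionaries.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical lookup tables standing in for an authority file and a local,
# library-defined vocabulary. The URIs are placeholders, not real identifiers.
AUTHORITY_FILE: Dict[str, str] = {
    "paris (france)": "https://example.org/authority/place/0001",
    "hugo, victor": "https://example.org/authority/person/0002",
}
LOCAL_VOCABULARY: Dict[str, str] = {
    "city maps": "https://example.org/local/genre/maps-city",
}

Statement = Tuple[str, str, str]


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(label.lower().split())


def reconcile(label: str) -> Optional[str]:
    """Return an identifier for a string, trying the authority file first."""
    key = normalize(label)
    return AUTHORITY_FILE.get(key) or LOCAL_VOCABULARY.get(key)


def record_to_statements(
    record_id: str, record: Dict[str, List[str]]
) -> Tuple[List[Statement], List[Tuple[str, str]]]:
    """Turn a field/value record into (subject, property, object) statements.

    Matched strings are replaced by identifiers. Unmatched strings are kept
    as literals and also reported, so a metadata specialist can review them.
    """
    statements: List[Statement] = []
    unmatched: List[Tuple[str, str]] = []
    for field, values in record.items():
        for value in values:
            identifier = reconcile(value)
            if identifier is None:
                unmatched.append((field, value))
                statements.append((record_id, field, value))
            else:
                statements.append((record_id, field, identifier))
    return statements, unmatched
```

Keeping the unmatched strings as literals, rather than dropping them, reflects the point made earlier in this section that many of these resources are not under any authority control.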

FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about "Paris Maps" across CONTENTdm collections


RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel's two-part blog entry, "Data Management and Curation in 21st Century Archives" (Sept 2015),[100] prompted the discussion among Focus Group members on the metadata needed for research data management.[101] To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).[102]
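As a rough sketch of what such a readme.txt might contain, the following generates one from the elements named in the quoted passage. The file layout, the "NOT PROVIDED" marker, and the function name are assumptions made for illustration; the report does not prescribe a format.

```python
from typing import Dict, List

# Field names follow the elements listed in the quoted passage. The layout of
# the generated file is an assumption made for illustration only.
PROJECT_FIELDS = [
    "title", "author", "date", "funder", "copyright holder", "description",
    "keywords", "observation unit", "kind of data", "type of data", "language",
]


def build_readme(project: Dict[str, str], files: List[Dict[str, str]]) -> str:
    """Return readme.txt text with project-level and file-level metadata.

    Missing project fields are written out as "NOT PROVIDED" so gaps are
    visible to the researcher instead of being silently omitted.
    """
    lines = ["PROJECT INFORMATION"]
    for field in PROJECT_FIELDS:
        lines.append(f"{field}: {project.get(field, 'NOT PROVIDED')}")
    lines.append("")
    lines.append("FILES")
    for entry in files:
        name = entry["file name"]
        file_format = entry.get("file format", "unknown format")
        software = entry.get("software used", "not stated")
        lines.append(f"- {name} ({file_format}); software used: {software}")
    return "\n".join(lines) + "\n"
```

Writing the result to disk is then a single call such as `Path("readme.txt").write_text(build_readme(project, files))` using `pathlib.Path`.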


All four of the 2017-2018 The Realities of Research Data Management series webinars[103] led by OCLC Senior Program Officer Rebecca Bryant mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment: beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.[104]

Libraries' expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post "Let's Cook Up Some Metadata Consistency":

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions' research investments.[105]


National contexts differ. For example, our Australian colleagues can take advantage of Australia's National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.[106] Canada has launched a national network called Portage for the "shared stewardship of research data."[107]

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020's Research Communications project.[108]

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative's metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.[109] The Research Data Alliance's Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.[110] The disparity of metadata schemas across disciplines represents a hurdle in institutions' discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
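A simple completeness check against the reuse elements listed above might look like the sketch below. The element names are taken from the list in the text, but the record structure (a plain dictionary), the split of geospatial and temporal data into two coverage keys, and the `needs_coverage` flag are assumptions introduced here for illustration.

```python
from typing import Any, Dict, List

# Elements the Focus Group listed as needed to support reuse.
REQUIRED_ELEMENTS = [
    "license", "processing steps", "tools", "data documentation",
    "data definitions", "data steward", "grant numbers",
]
# "Where relevant" elements, checked only when the caller asks for them.
CONDITIONAL_ELEMENTS = ["geospatial coverage", "temporal coverage"]


def is_blank(value: Any) -> bool:
    """Treat None, whitespace-only strings, and empty containers as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missing_elements(record: Dict[str, Any], needs_coverage: bool = False) -> List[str]:
    """List the reuse elements a dataset record lacks, in a stable order."""
    wanted = REQUIRED_ELEMENTS + (CONDITIONAL_ELEMENTS if needs_coverage else [])
    return [name for name in wanted if is_blank(record.get(name))]
```

A report of missing elements, rather than a hard rejection, fits the concern voiced above that forms should stay as simple as possible for researchers.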


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?[111] How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,[112] are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of "metadata governance" across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions' Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the "citation" or "metadata-only" records with a link to the full text or data set deposited in a disciplinary repository. "Metadata consultation services" may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the "expertise" function and flags some variations in its case studies.[113] At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.[114]

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer "research sprints," where researchers partner with a team of expert librarians that may include metadata creation, management, analysis, and preservation. The "Shared BigData Gateway for Research Libraries," hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.[115]

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty's research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.[116]

Evolution of "Metadata as a Service"

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as "foster discovery and use," "enrich the user experience," and "explore new ways to support the whole life cycle of scholarship," all of which are predicated on quality metadata. Usage metrics, such as how frequently items have been borrowed, cited, downloaded, or requested, could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers' publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring the impact of library usage on research, student success, or learning analytics.[117] The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students' use of the academic library, academic achievement, and retention.[118] The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.[119]

CONSULTANCY

Metadata's value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide "metadata as a service," that is, consultancy in the earliest stages of both library and research projects.[120] An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,[121] one duty cited was 20% for "metadata outreach and consultation": "Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community." Georgia Tech is recruiting a metadata librarian who will "serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects."[122]

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC's integration with StackMap).[123] Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis of collections and a view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.[124] MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution's bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.[125]
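One way to picture the enrichment step of generating missing language codes from related records is the sketch below. It is only an illustration under stated assumptions: the `cluster_id` key that links related records, the dictionary record shape, and the rule that a code is suggested only when all coded records in a cluster agree are all hypothetical, not a description of any institution's actual data-mining workflow.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


def infer_language_codes(records: List[Dict[str, Optional[str]]]) -> Dict[str, str]:
    """Suggest a language code for records that lack one.

    Records are grouped by a hypothetical "cluster_id" that links records
    assumed to describe the same resource. A suggestion is made only when
    every coded record in the cluster agrees; clusters with conflicting
    codes are left for manual review.
    """
    codes_by_cluster: Dict[str, Counter] = defaultdict(Counter)
    for rec in records:
        if rec.get("language"):
            codes_by_cluster[rec["cluster_id"]][rec["language"]] += 1

    suggestions: Dict[str, str] = {}
    for rec in records:
        if rec.get("language"):
            continue  # already coded, nothing to infer
        seen = codes_by_cluster.get(rec["cluster_id"])
        if seen and len(seen) == 1:
            suggestions[rec["id"]] = next(iter(seen))
    return suggestions
```

Returning suggestions instead of editing the records in place keeps a cataloger in the loop, which matters because related records such as translations can legitimately carry different language codes.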


Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom's second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette's publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by the nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette's River of Authors.)127


FIGURE 9 UK Hachette's "River of Authors," generated from the British Library's catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty's representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one "global domain," metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that Artificial Intelligence, or at least machine learning, could mitigate the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an "international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums."133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla's OCLC Research 2019 position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
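As a rough illustration of matching names with the help of related metadata, the Python sketch below combines string similarity with agreement on a birth year. It is far simpler than the machine-learning approaches the Focus Group has in mind; the record fields (`name`, `birth_year`), the 0.1 bonus, and the rule that conflicting years rule out a match are assumptions made for this example only.

```python
from difflib import SequenceMatcher

def normalize(name):
    """Lowercase, replace punctuation with spaces, and sort the name parts
    so that "Smith, John" and "John Smith" compare as equal."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(sorted(cleaned.split()))

def match_score(a, b):
    """Score two hypothetical name records between 0.0 and 1.0.

    String similarity comes from difflib; the birth year acts as the
    "related metadata" that either rules a match out or strengthens it.
    """
    name_sim = SequenceMatcher(
        None, normalize(a["name"]), normalize(b["name"])
    ).ratio()
    year_a, year_b = a.get("birth_year"), b.get("birth_year")
    if year_a and year_b:
        if year_a != year_b:
            return 0.0  # conflicting dates: treat as different people
        return min(1.0, name_sim + 0.1)  # agreeing dates: small bonus
    return name_sim  # no dates to compare: name similarity only

same = match_score(
    {"name": "Smith, John", "birth_year": 1950},
    {"name": "John Smith", "birth_year": 1950},
)
different = match_score(
    {"name": "Smith, John", "birth_year": 1950},
    {"name": "John Smith", "birth_year": 1872},
)
print(same, different)
```

A score like this could only rank candidate matches for human review; it says nothing about bias in a collection or about topical relationships, which the paragraph above also mentions.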

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. To prepare for the future, Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who "trail-blaze innovations," which are then routinized for nonprofessionals. These discussions reinforce Padilla's recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to "traditional cataloging activities" (such as original and copy cataloging and authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual's productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just "production machines."

Metadata managers faced with staff reductions, while still being expected to maintain production levels, must justify allocating staff time for R&D (or "play time") to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the "easier stuff," while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, change their mindsets about how they work, and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became "lively" because she brought her colleagues along. It created appreciation for "continuous learning," and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch "reading clubs," where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group "video-viewing brown-bag lunches" for staff on new developments such as linked data, so staff can "watch and learn" together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists' collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library's control. Some noted that "cultural differences" exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more "schematic." Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The "holy grail" is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members' experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the "technical services mindset."


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools, such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros, for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit's developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
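To give a sense of what a lightweight clustering script can look like, the Python sketch below uses "fingerprint" key collision, a technique popularized by OpenRefine. It is an illustration only and is not a description of how MarcEdit implements its clustering; the function names and sample publisher strings are invented for this example.

```python
import string
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Build a key that ignores case, diacritics, punctuation, word order,
    and repeated words, so that near-duplicate strings collide."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))

def cluster(values):
    """Group values sharing a fingerprint; return only groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]

# Invented sample data for illustration.
publishers = [
    "Oxford University Press",
    "oxford university press.",
    "Press, Oxford University",
    "Cambridge University Press",
]
print(cluster(publishers))
```

The three Oxford variants fall into one cluster, while the single Cambridge entry is left alone. Key collision is fast and predictable but misses misspellings, which is why tools that offer it usually pair it with edit-distance methods.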

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers' needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA's New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, "We bring order out of a vacuum."142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C's Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: "Don't wait; iterate." In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC's DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of "traditional" work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers "do more with less."

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond "traditional" cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution's collections. Focus Group members' linked data activities, including their participation in OCLC Research's Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research's linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for "works."


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as "statements" associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of "metadata as a service," and influencing future staffing requirements.


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1 OCLC Research Library Partnership Metadata Managers Focus Group httpswwwoclcorgresearchareasdata-sciencemetadata-managershtml

2 OCLC Research ldquoThe OCLC Research Library Partnershiprdquo httpswwwoclcorgresearchpartnershiphtml

3 Smith-Yoshimura 2017 ldquoMetadata Advocacyrdquo Hanging Together the OCLC Research Blog 17 October 2017 httpshangingtogetherorgp=6282

4 British Library. 2019. Foundations for the Future: The British Library's Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5 Ibid 4

6 Statistics as of 1 June 2020

7 Library of Congress. "Program for Cooperative Cataloging." https://www.loc.gov/aba/pcc/

8 Except for June 2020 when all discussions were held virtually only because of the COVID-19 pandemic

9 See Hanging Together The OCLC Research Blog search-category Metadata httpshangingtogetherorgcat=81

10 Benefits from affiliating with the RLP are cited in Smith-Yoshimura 2018 ldquoWhat Metadata Managers Expect from and Value about the Research Library Partnershiprdquo Hanging Together The OCLC Research Blog 16 April 2018 httpshangingtogetherorgp=6683

11 Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. "Linked Data." International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Holly Tomren and Craig Thomas 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

OCLC Research. 2020. "CONTENTdm Linked Data pilot." https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. "WorldCat® OCLC and Linked Data." Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress ldquoBIBFRAMErdquo Bibliographic Framework Initiative httpswwwlocgovbibframe


Futornick, Michelle. 2019. "LD4P2 Linked Data for Production: Pathway to Implementation." LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment) ldquoAn Effective Environment for the Use of Linked Data by Librariesrdquo Accessed 17 September 2019 httpswwwshare-vdeorg sharevdeclustersl=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13 Library of Congress 2019 PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices 2019 PCC Task Group on Linked Data Best Practices Final Report Submitted to PCC Policy Committee 12 September 2019 Washington DC Library of Congress httpswwwlocgovabapcctaskgrouplinked-data-best-practices-final-reportpdf

Library of Congress. 2018. "Charge for PCC Task Group on Identity Management in NACO." 5 American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress 2020 ldquoPCC Task Group on URIs in MARCrdquo Programs of the PCC Charge Accessed 19 September 2020 httpswwwlocgovabapccbibframeTaskGroupsURI-TaskGrouphtml

Library of Congress 2018 ldquoPCC Linked Data Advisory Committee Linked Data Advisory Committee Chargerdquo PCC Task Groups 2018 Task Groups Revised 24 July 2018 [Word doc 28KB] httpswwwlocgovabapcctaskgrouptask-groupshtml

14 Smith-Yoshimura Karen 2015 ldquoShift to Linked Data for Productionrdquo OCLC Research Hanging Together Blog 13 May 2015 httpshangingtogetherorgp=5195

15 OCLC Research. 2020. "Linked Data." Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16 Smith-Yoshimura Karen 2019 ldquolsquoFuture Proofingrsquo of Catalogingrdquo OCLC Research Hanging Together Blog 10 November 2019 httpshangingtogetherorgp=7526

17 ORCID Connecting Research and Researchers ldquoWhat is Orcidrdquo Our Vision Accessed 19 September 2020 httpsorcidorgaboutwhat-is-orcidmission

18 See for example the list of signatories of journal publishers requiring ORCID IDs for authors ORCID ldquoORCID Open Letter - Publishersrdquo Accessed 19 September 2020 httpsorcidorgcontentrequiring-orcid-publication-workflows-open-letter

19 ISNI ldquoWhat is ISNIrdquo Accessed 19 September 2020 httpsisniorgpagewhat-is-isni

20 HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. "Welcome to HathiTrust." Accessed 19 September 2020. https://www.hathitrust.org/about

21 GeoNames ldquoBrowse the Namesrdquo Accessed 19 September 2020 httpswwwgeonamesorg

22 Bryant Rebecca Annette Dortmund and Constance Malpas 2017 Convenience and Compliance Case Studies on Persistent Identifiers in European Research Information Dublin OH OCLC Research httpsdoiorg1025333C32K7M

23 ISNI currently holds 1102 million identities 1026 million individuals (of which 291 million are researchers) and 933039 organizations Statistics retrieved from ISNI See ISNI ldquoKey Statisticsrdquo Accessed 5 May 2020 httpsisniorg

24 Library of Congress 2020 ldquoNACO ndash Name Authority Cooperative Programrdquo Documents and Updates Programs for Cataloging and Acquisitions (PCC) Accessed 19 September 2020 httpwwwlocgovabapccnacoindexhtml

25 Smith-Yoshimura Karen 2015 ldquoGetting identifiers Created for Legacy Namesrdquo Hanging Together The OCLC Research Blog 30 October 2015 httpshangingtogetherorgp=5463

26 Smith-Yoshimura Karen 2013 ldquoIrreconcilable Differences Name Authority Control amp Humanities Scholarshiprdquo Hanging Together The OCLC Research Blog 27 March 2013 httpshangingtogetherorgp=2621

27 Smith-Yoshimura Karen 2017 ldquoUse Cases for Local Identifiersrdquo Hanging Together The OCLC Research Blog 5 May 2017 httpshangingtogetherorgp=5938

28 OCLC Research 2020 ldquoRegistering Researchers in Authority Filesrdquo httpswwwoclcorgresearchthemesresearch-collectionsregistering-researchershtml

29 Smith-Yoshimura Karen Janifer Gatenby Grace Agnew Christopher Brown Kate Byrne Matt Carruthers Peter Fletcher Stephen Hearn Xiaoli Li Marina Muilwijk Chew Chiat Naun John Riemer Roderick Sadler Jing Wang Glen Wiley and Kayla Willey 2016 Addressing the Challenges with Organizational Identifiers and ISNI Dublin Ohio OCLC Research httpsdoiorg1025333C3FC9Q

30 Research Organization Registry (ROR) ldquoAboutrdquo httpsrororgabout

31 V M Abazov B Abbott B S Acharya M Adams T Adams J P Agnew G D Alexeev et al (2014) 2020 ldquoPrecision Measurement of the Top-Quark Mass in Lepton+jets Final Statesrdquo (Archived 24 February 2020) ArXivorg 150107912 httpsarxivorgpdf14051756

32 Smith-Yoshimura Karen 2017 ldquoHow Much Metadata Is Practicalrdquo Hanging Together The OCLC Research Blog 14 November 2017 httpshangingtogetherorgp=6328

33 University of Minnesota 2020 ldquoExpertsMinnesotardquo Find Profiles httpsexpertsumneduenpersons or

University of Illinois at Urbana-Champaign 2020 ldquoIllinois Expertsrdquo Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34 The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. "ORCID iD Required for Some, Encouraged for All." NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis 2020 ldquoSciENcv and ORCID to Streamline NIH and NSF Grant Applicationsrdquo LyrasisNow (blog) 8 April 2020 httpslyrasisnoworgsciencv-and-orcid-to -streamline-nih-and-nsf-grant-applications

35 Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710.

36 Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37 Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38 Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

39 Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445.

40 See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History, digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793.

41 DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois.

42 “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management, as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43 Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805.

44 Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45 Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46 Watch the highly-rated webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47 Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48 Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49 Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50 Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51 OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52 Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53 OCLC. 2020. “FAST.” (See note 51.)

54 OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55 Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56 National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko.

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au.

57 Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58 Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

59 Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite.

60 Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61 OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62 Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63 Smith-Yoshimura, “Distributed Models.” (See note 60.)

64 Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65 Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66 Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. https://www.oclc.org/research/events/2017/10-19.html.

67 Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68 Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69 Washburn, Bruce, and Jeff Mixter. 2015. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2015. MP4 video presentation, 57:45:00. https://www.oclc.org/research/events/2015/08-12.html.

70 Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71 Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72 Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.

73 Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74 Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26 (4): 338–359. http://doi.org/10.18352/lq.10170.

75 Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76 Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77 Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78 The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79 Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80 Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81 Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945.

82 NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83 Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.

84 OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85 Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86 Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87 Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88 Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89 Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90 Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91 Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92 Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93 Ibid.

94 Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95 Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96 OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97 IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io.

98 OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99 OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

100 Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375.

101 Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102 Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103 See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104 Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105 Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106 NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au.

107 Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca.

108 Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109 Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110 RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111 NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit.

112 University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113 OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.

114 Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115 Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116 NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117 Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118 Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119 See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120 Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121 DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122 Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123 OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124 Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.

125 Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126 National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia, 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search, Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127 The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128 Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129 Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130 SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131 Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132 AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133 AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134 Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135 Ibid., 17-19.

136 Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137 Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138 Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.

139 Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140 Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141 Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142 Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143 “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144 “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145 Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146 OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147 Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148 OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395 T: 1-800-848-5878 T: +1-614-764-6000 F: +1-614-764-6096 www.oclc.org/research

ISBN: 978-1-55653-170-5 DOI: 10.25333/rqgd-b343 RM-PR-216787-WWAE-A4 2009

C O N T E N T S

Executive Summary

Introduction

The Transition to Linked Data and Identifiers
  Expanding the use of persistent identifiers
  Moving from “authority control” to “identity management”
  Addressing the need for multiple vocabularies and equity, diversity, and inclusion
  Linked data challenges

Describing “Inside-Out” and “Facilitated” Collections
  Archival collections
  Archived websites
  Audio and video collections
  Image collections
  Research data

Evolution of “Metadata as a Service”
  Metrics
  Consultancy
  New applications
  Bibliometrics
  Semantic indexing

Preparing for Future Staffing Requirements
  The culture shift
  Learning opportunities
  New tools and skills
  Self-education
  Addressing staff turnover

Impact

Acknowledgments

Appendix

Notes

F I G U R E S

FIGURE 1 “Changing Resource Description Workflows” by OCLC Research

FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters

FIGURE 3 Examples of some DOI and ARK identifiers

FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages

FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership

FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

FIGURE 9 Hachette UK’s “River of Authors” generated from the British Library’s catalog metadata

E X E C U T I V E S U M M A R Y

The OCLC Research Library Partners Metadata Managers Focus Group, first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP), a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

This report, Transitioning to the Next Generation of Metadata, synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions.

Yet metadata is changing. Format-specific metadata management, based on curated text strings in bibliographic records understood only by library systems, is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. This report traces how metadata is evolving and considers the impact this transition may have on library services, posing such questions as:

• Why is metadata changing?

• How is the creation process changing?

• How is the metadata itself changing?

• What impact will these changes have on future staffing requirements, and how can libraries prepare?

The future of linked data is tied to the future of metadata: the metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for future linked data innovations as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

I N T R O D U C T I O N

The OCLC Research Library Partners Metadata Managers Focus Group (hereafter referenced as the Focus Group),1 first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP),2 a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions. Metadata provides the research infrastructure necessary for all libraries’ “value delivery systems,” fulfilling their community’s requests for information and resources. Metadata is crucial for transitioning to next generations of library and discovery systems. Good metadata created today can easily be reused in a linked data environment in the future.3 As noted in the British Library’s Foundations for the Future, “Our vision is that by 2023 the Library’s collection metadata assets will be unified on a single, sustainable, standards-based infrastructure offering improved options for access, collaboration, and open reuse.”4


Format-specific metadata management based on curated text strings in bibliographic records understood only by library systems is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. "Traditional methods of metadata generation, management and dissemination," suggests the British Library's Collection Management Strategy, "are not scalable or appropriate to an era of rapid digital change, rising audience expectations and diminishing resources."5 Focus Group members are eager to unleash the power of metadata in legacy records for different interactions and uses by both machines and end-users in the future. Consistent metadata created according to past rules or standards needs to be transformed into new structures.


Why is metadata changing?

Traditional library metadata was, and is, made by librarians conforming to rules that are mainly used and understood by librarians. It is record-centered, expensive to produce, and has historic size limitations. Metadata is limited in its coverage, notably not including articles within scholarly journals or other scholarly outputs. The infrastructure has been inadequate for managing corrections and enhancements, inducing an emphasis on perfection that has exacerbated the slowness of metadata creation. In short, the metadata could be better, there is not enough of it, and the metadata that does exist is not used widely outside the library domain.

How is the creation process changing?

Metadata is no longer created by library staff alone. Today publishers, authors, and other interested parties are equally involved in metadata creation. Metadata creation has also been pushed forward in the scholarly life cycle, with publishers creating metadata records earlier than in the traditional cataloging process. Metadata can now be enhanced or corrected by machines or by crowdsourcing.

How is the metadata itself changing?

Machine-readable cataloging (MARC) was created to replicate the metadata traditionally found on library catalog cards. We are transitioning from MARC records to assemblages of well-coded and shareable, linkable components, with an emphasis on references, and we are eliminating anachronistic abbreviations not understood by machines. Instead of relying only on library vocabularies such as subject headings and coded lists, the developing assemblages can accommodate vocabularies created for specific domains, expanding the metadata's potential audiences.


The Focus Group's composition has fluctuated over time and currently comprises representatives from 63 RLP Partners in 11 countries spanning four continents.6 The group includes both past and incoming chairs of the Program for Cooperative Cataloging (PCC),7 providing cross-fertilization between the two. Topics for group discussions can be proposed by any Focus Group member and are selected by an eight-member Planning Group (see appendix), who then write "context statements" explaining why the topic is considered timely and important and then develop question sets that delve into the topic. Context statements and question sets are then distributed to all Focus Group members, who are given three to five weeks to submit their responses. Compilations of the Focus Group's responses inform face-to-face discussions held in conjunction with the American Library Association conferences8 and in subsequent virtual meetings.

As the Focus Group facilitator, I have summarized and synthesized these discussions in a series of OCLC Research Hanging Together Blog publications.9 Nearly 40 blog posts on a wide range of metadata-related topics have been published on this forum over the past six years.


The Metadata Managers Focus Group is just one activity within the broader OCLC Research Library Partnership, which is devoted to extensive professional development opportunities for library staff. Focus Group members value their affiliation with the Research Library Partnership as a channel to becoming the "change agents" of future metadata management.10 Focus Group members' responses to question sets have facilitated intra-institutional discussions and helped metadata managers understand how their institutions' situation compares with peers within the Partnership.

These Focus Group discussions identified a broad range of metadata-related issues, documented in this report. Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

Collectively, Focus Group members command a wide range of experiences with linked data. The Focus Group's keen interest in linked data implementations sparked the series of OCLC Research's International Linked Data Surveys for Implementers.11 A subset of Focus Group members have participated in various linked data projects, including the OCLC Research Project Passage and CONTENTdm Linked Data pilot, OCLC's Shared Entity Management Infrastructure, Library of Congress' Bibliographic Framework Initiative (BIBFRAME), the Mellon-funded Linked Data for Production (LD4P) project, the Share-VDE initiative, and the IMLS planning grant Shareable Local Name Authorities, which exposed issues raised by identifier hubs in the linked data environment.12 In addition, Focus Group members contribute to the PCC task groups addressing aspects of linked data work, including the PCC Task Group on Linked Data Best Practices, Task Group on Identity Management, Task Group on URIs in MARC, and the PCC Linked Data Advisory Committee.13 This cross-fertilization has prompted the Focus Group to examine issues around the entities represented in institutional resources.

This report synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the "next generation of metadata." The document is organized in the following sections, each representing an emerging trend identified in the Focus Group's discussions:

• The transition to linked data and identifiers: expanding the use of persistent identifiers as part of the shift from "authority control" to "identity management"

• Describing the "inside-out" and "facilitated" collections: challenges in creating and managing metadata for unique resources created or curated by institutions in various formats and shared with consortia

• Evolution of "metadata as a service": increased involvement with metadata creation beyond the traditional library catalog

• Preparing for future staffing requirements: the changing landscape calls for new skill sets needed by both new professionals entering the field and seasoned catalogers

The document concludes with some observations on the forecasted impact of the next generation of metadata on the wider library community.


The Transition to Linked Data and Identifiers

Linked data offers the ability to take advantage of structured data with an emphasis on context. It relies on language-neutral identifiers pointing to objects, with a focus on "things," replacing the "strings" inherent in current authority and catalog records. These identifiers can then be connected to related data, vocabularies, and terms in other languages, disciplines, and domains, including nonlibrary domains. Linked data applications can consume others' contributions and thus free metadata specialists from having to re-describe things already described elsewhere, allowing them instead to focus on providing access to their institutions' unique and distinctive collections. This promises a richer user experience and increased discoverability, with more contextual relationships than is possible with our current systems. Furthermore, linked data offers an opportunity to go beyond the library domain by drawing on information about entities from diverse sources.14

FIGURE 1 "Changing Resource Description Workflows" by OCLC Research15

The hope is that linked data will allow libraries to offer new, value-added services that current models cannot support, that outside parties will be able to make better use of library resource descriptions, and that the data will be richer because more parties share in its creation. Moving to a linked data environment portends changes to resource description workflows, as shown in figure 1.

The drive to move metadata operations to linked data depends on the availability of tools, access to linked data sources for reuse, documented best practices on identifiers and the metadata descriptions associated with them ("statements"), and a critical mass of implementations on a network level.

EXPANDING THE USE OF PERSISTENT IDENTIFIERS

The Focus Group discussed the "future-proofing" of cataloging, which refers to the opportunities to unleash the power of metadata in legacy records for different interactions and uses in the future. Persistent identifiers were viewed as crucial to transitioning from current metadata to future applications.16 Identifiers, in the form of language-neutral alphanumeric strings, serve as a shorthand for assembling the elements required to uniquely describe an object or resource. They can be resolved over networks with specific protocols for finding, identifying, and using that object or resource. In the nonlibrary domain, Social Security and employee numbers are examples of such identifiers. In the library and academic domains, Focus Group members pointed to ORCID (Open Researcher and Contributor ID)17 as a "glue" that holds together the four arms of scholarly work (publishing, repository, library catalog, and researchers), but ORCID is limited to only living researchers. ORCID is increasingly used in STEM (science, technology, engineering, mathematics) journals for all authors and contributors18 and included in institutions' Research Information Management systems. ISNI (International Standard Name Identifier)19 uniquely identifies persons and organizations involved in creative activities; it is used by libraries, publishers, databases, and rights management organizations, and it covers nonliving creators.

Persistent identifiers are used by parties such as Google and HathiTrust for service integration.20 More institutions are using geospatial coordinates in metadata, or URIs (Uniform Resource Identifiers) pointing to geospatial coordinates, that support API (Application Programming Interface) calls to GeoNames,21 enabling map visualizations. Research institutions are also adopting person identifiers such as ORCID to streamline the collection of the institutional research record, usually through a Research Information Management system, as documented in the 2017 OCLC Research Report Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information Management.22

While publishers serve as a key player in the metadata workstream, publisher data does not always meet library requirements. For example, publisher data for monographs usually does not include identifiers. The British Library is working with five UK publishers to add ISNIs23 to their metadata as a promising proof-of-concept for publishers and libraries working together earlier in the supply chain. The ability to batch load or algorithmically add identifiers in the future is on Focus Group members' wish list.

No single person identifier covers all use cases. Researchers' names have been only partially represented in national name authority files that identify persons, both living and dead. A sizable quantity of legacy names are represented only by text strings in bibliographic records. Authority records are created only by institutions involved in the PCC's Name Authority Cooperative Program (NACO)24 or in national library programs. Even then, authority records are created selectively for certain headings, or sometimes only when references are involved. The LC/NACO name authority file contained only 30% of the total names reflected in WorldCat's bibliographic record access points (9 million LC/NACO records compared to the 30 million total names reported on the WorldCat Identities project page as of 2012).25 By 2020 this percentage had decreased to 18%: 11 million LC/NACO authority records compared to 62 million in WorldCat Identities. These statistics illustrate that the number of names represented in bibliographic records is increasing more quickly than the number under authority control.
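The two coverage percentages follow from simple division. The short check below uses only the rounded record counts quoted in this paragraph; the function and variable names are illustrative and not part of any OCLC tool.

```python
def coverage_percent(authority_records: float, total_names: float) -> float:
    """Share of names in bibliographic records that also have an authority record."""
    return 100 * authority_records / total_names


# Rounded counts quoted in the text, in millions of records.
coverage_2012 = coverage_percent(9, 30)
coverage_2020 = coverage_percent(11, 62)

print(f"2012: {coverage_2012:.0f}% of names under authority control")
print(f"2020: {coverage_2020:.0f}% of names under authority control")
```

Because the inputs are rounded to the nearest million, the results are approximations that match the 30% and 18% reported above.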

Authority files focus on the "preferred form" of a name, which can vary depending on language, discipline, context, and time period. Scholars have objected to the very concept of a "preferred form," as the name may be referred to differently depending on the context.26 When a name has multiple forms, historians need to know the provenance of each name, following the citation practices commonly used in their field. An identifier linked to different forms of names, each associated with the provenance and context, could resolve this conundrum.


Researcher names are just one example of a need unmet by current identifier systems. Institutions have been minting their own "local identifiers" to meet this need. Use cases for local identifiers include registering all researchers on campus; representing entities that are underrepresented in national authority files, such as authors of electronic dissertations and theses, performers, events, local place names, and campus buildings; identifying entities in digital library projects and institutional repositories; reflecting multilingual needs of the community; and supporting "housekeeping" tasks such as recording archival collection titles.27

Focus Group members' consistent need to disambiguate names across disciplines and formats spurred the creation of the OCLC Research working group on Registering Researchers in Authority Files.28 The need to accurately record researchers' institutional affiliations (to reflect the institution's scholarly output, to promote cross-institutional collaborations, and to lead to more successful recruitment and funding) led to another working group on Addressing the Challenges with Organizational Identifiers and ISNI,29 which presented new data modeling of organizations that others could adapt for their own uses. Since then, the Research Organization Registry (ROR) was launched to develop an open, sustainable, usable, and unique identifier for every research organization in the world.30

Disambiguating names is the most labor-intensive part of authority work and will still be a prerequisite for assigning unique identifiers. Given the different name identifier systems already in use, libraries need a name reconciliation service. Authority work and algorithms based on text string matching have limits; the results will still need manual expert review. Tapping the expertise in user communities to verify if two identifiers represent the same person may help.

Disambiguation is particularly difficult for authors or contributors listed in journal articles, where names are often abbreviated and there may be dozens or even hundreds of contributors. For example, an article in Physical Review Letters, "Precision Measurement of the Top Quark Mass in Lepton + Jets Final State," has approximately 300 abbreviated author names for a five-page article (figure 2).31 This exemplifies the different practices among disciplines. By contrast, other objects with many contributors, such as feature films and orchestral recordings, are usually represented by only a relative handful of the associated names in library legacy metadata.32 Such differences make creating metadata that is uniform, understandable, and widely reusable a challenge.

FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form or an identifier, if it exists. Associating ORCIDs with article authors makes it easier to differentiate authors with the same abbreviated forms. Research Information Management (RIM) systems apply identity management for local researchers so that they are correctly associated with the articles they have written. Their articles are displayed as part of their profiles. (See, for example, Experts@Minnesota or University of Illinois at Urbana-Champaign's Experts research profiles.)33

[Figure 2 reproduces the first page of the article as posted on arXiv (arXiv:1405.1756v2 [hep-ex], 16 Jun 2014; FERMILAB-PUB-14-123-E), "Precision measurement of the top-quark mass in lepton+jets final states." Its author list begins with V.M. Abazov, B. Abbott, and B.S. Acharya and continues for roughly 300 abbreviated names.]


For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.34 Research Information Management Systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author's ORCID. Linked data could access information across many environments, including those in Research Information Systems, but would require accurately linking multiple identifiers for the same person to each other.
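One way to picture "linking multiple identifiers for the same person to each other" is as a clustering problem: each verified statement that two identifiers refer to the same person merges their clusters. The sketch below is a minimal illustration using a union-find structure; it is not a description of how any Research Information Management system works, and the identifier values are invented placeholders.

```python
from collections import defaultdict


def cluster_identifiers(same_person_links):
    """Group identifiers into clusters, one cluster per person.

    same_person_links: iterable of (id_a, id_b) pairs, each asserting that
    the two identifiers refer to the same person.
    """
    parent = {}

    def find(x):
        # Register unseen identifiers, then walk up to the cluster root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # shorten the path as we go
            x = parent[x]
        return x

    for a, b in same_person_links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for identifier in parent:
        clusters[find(identifier)].add(identifier)
    return list(clusters.values())


# Placeholder identifiers, invented for illustration only.
links = [
    ("orcid:A", "scopus:A1"),
    ("scopus:A1", "viaf:A2"),
    ("orcid:B", "isni:B1"),
]
clusters = cluster_identifiers(links)
```

Here the first two links chain three identifiers into one cluster even though "orcid:A" and "viaf:A2" were never directly paired. That transitivity is also why accuracy matters: a single wrong link merges two people.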

Some Focus Group members are performing metadata reconciliation work, such as searching matching terms from linked data sources and adding their URIs in metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.35 Improving the quality of the data improves users' experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC's Virtual International Authority File (VIAF); the Library of Congress's linked data service (id.loc.gov); ISNI; the Getty's Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC's Faceted Application of Subject Terminology (FAST); and various national authority files. Selection of the source depends on the trustworthiness of the organization responsible, subject matter, and richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed a LCNAF Named Entity Reconciliation program36 using Google's Open Refine that searches the VIAF file with the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading and a link to the URI of the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.

A long list of nonlibrary sources that could enhance current authority data or could be valuable to link to in certain contexts has been identified. Wikidata and Wikipedia led the list. Other sources include AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, GeoNames, GoodReads, IMDb (Internet Movie Database), Internet Archive, Library Thing, LinkedIn, MusicBrainz, ONIX (ONline Information eXchange), Open Library, ORCID, and Scopus ID. The PCC's Task Group on URIs in MARC's document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources37 provides valuable guidance for collecting data from these other sources. Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than with current library systems.38

Identifiers for "works" represent a particular challenge, as there is no consensus on what represents a "distinctive work."39 Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a "work" is could hamper the ability to reuse data created elsewhere, and look to a central trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.
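To make the idea of string-matching reconciliation concrete, the sketch below pairs local headings with an authorized heading and URI after light normalization. It is a generic illustration, not the University of Michigan program and not the VIAF API: the in-memory authority table, its headings, and the example.org URIs are all invented for the example.

```python
import re
import unicodedata


def normalize(heading: str) -> str:
    """Fold case and strip diacritics and punctuation so near-identical strings compare equal."""
    decomposed = unicodedata.normalize("NFKD", heading)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", without_marks.lower()).strip()


# Hypothetical stand-in for an authority source; headings and URIs are invented.
AUTHORITY = [
    {"heading": "Example, Ana María, 1950-", "uri": "https://example.org/authorities/names/n0001"},
    {"heading": "Sample, John Q.", "uri": "https://example.org/authorities/names/n0002"},
]
INDEX = {normalize(record["heading"]): record for record in AUTHORITY}


def reconcile(local_headings):
    """Pair each local heading with an authorized heading and URI, or None when unmatched."""
    results = []
    for original in local_headings:
        match = INDEX.get(normalize(original))
        results.append({
            "original": original,
            "authorized": match["heading"] if match else None,
            "uri": match["uri"] if match else None,
        })
    return results


rows = reconcile(["EXAMPLE, ANA MARIA 1950-", "Sample, John Q", "Unknown, Person"])
```

The output has the same shape as the dataset described above: original heading, authorized heading, and URI. Exact matching on normalized strings cannot tell apart two people with the same name form, which is one reason results of this kind still call for expert review.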


FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers40

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets such as digital resources and collections in institutional repositories include system-generated IDs, locally minted identifiers, PURLs, handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.41 Ideally, libraries would contribute to a hub for the metadata describing their researchers' data sets, regardless of where the data sets are stored.
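The separation between an identifier and its storage location can be seen in the syntax of DOIs and ARKs: the identifier proper is the part that remains once a resolver address is removed. The sketch below strips two commonly used resolver prefixes (doi.org and n2t.net) and classifies what is left. The patterns are deliberately simplified assumptions and do not cover everything the DOI and ARK specifications allow; the ARK value is a made-up example, while the DOI is the one given in this report's suggested citation.

```python
import re

# Simplified patterns; the DOI and ARK specifications permit more than these cover.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
ARK_PATTERN = re.compile(r"^ark:/?[0-9a-z]{5,}/\S+$")

# Resolver addresses that may be prepended to an identifier.
RESOLVER_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://n2t.net/",
    "http://n2t.net/",
)


def bare_identifier(value: str) -> str:
    """Return the identifier without a leading resolver address, if one is present."""
    value = value.strip()
    for prefix in RESOLVER_PREFIXES:
        if value.lower().startswith(prefix):
            return value[len(prefix):]
    return value


def identifier_scheme(value: str) -> str:
    """Classify a string as 'doi', 'ark', or 'unknown' using the simplified patterns."""
    bare = bare_identifier(value)
    if DOI_PATTERN.match(bare):
        return "doi"
    if ARK_PATTERN.match(bare):
        return "ark"
    return "unknown"


report_doi = bare_identifier("https://doi.org/10.25333/rqgd-b343")
```

Storing the bare form and attaching a resolver only at display time keeps the recorded identifier stable if the resolver address or the hosting repository changes.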


MOVING FROM "AUTHORITY CONTROL" TO "IDENTITY MANAGEMENT"

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.42 The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from "managing text strings" as the basis of controlling headings in bibliographic records.43 But identity management poses a change in focus, from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from "authority control" and "authorized access points" in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.44 Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
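Separating an identifier from its labels can be sketched as a small data structure: the identifier is the key, labels are stored per language, and each community chooses which label to display. The class below is a hypothetical illustration, not the Wikidata data model or any library system's schema, and the entity, labels, and identifier values are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    identifier: str                                         # language-neutral key
    labels: Dict[str, str] = field(default_factory=dict)    # language code -> label
    same_as: List[str] = field(default_factory=list)        # identifiers in other systems

    def label_for(self, preferred_languages: List[str]) -> str:
        """Return the first available label in the caller's order of preference."""
        for language in preferred_languages:
            if language in self.labels:
                return self.labels[language]
        return self.identifier  # no label available: fall back to the identifier


# Invented entity for illustration only.
author = Entity(
    identifier="local:entity-0001",
    labels={"en": "Example Author", "es": "Autora de Ejemplo", "ja": "例の著者"},
    same_as=["viaf:EXAMPLE", "isni:EXAMPLE"],
)

label_for_japanese_catalog = author.label_for(["ja", "en"])
label_for_french_catalog = author.label_for(["fr", "en"])
```

No label is marked as preferred in the record itself; the preference lives in the list each catalog passes in, which is the localized label choice described above.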

[Figure 4 depicts Wikidata identifier Q19526.]


A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?45

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.46 Many libraries are looking toward Wikidata and Wikibase, the software platform underlying Wikidata, to solve some of the long-standing issues faced by technical services departments, archival units, and others.47 Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution's resources. Focus Group members' experimentations with Wikidata and OCLC projects using the Wikibase platform indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationship with each other, and how best they can increase discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on "works" and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are "notable" are more likely to be represented in Wikidata. Recently, an effort to support citations in Wikipedia articles, WikiCite,48 demonstrates a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical, and powerful, aspects of identity management is to reduce the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.49 Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts or subject headings are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach to addressing it.

The issue of supporting "alternate" subject headings came to the fore when the Library of Congress' initial solution, changing the LC subject heading "Illegal aliens" to "Undocumented immigrants," failed to be implemented. This prompted one Focus Group member to comment, "Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive." End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.50


Some see Faceted Application of Subject Terminology (FAST)51 as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a middle ground between totally uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.52 FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. However, because FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee53 represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH, with tools and services that serve the needs of diverse communities and contexts.54

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.55 For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.56 There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects) project57 built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d'autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.58


A growing percentage of data in institutions' discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities' research data and materials in institutional repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the "collision of name spaces" and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions' Ontology Management – Graphite tool,59 which creates and manages various types of controlled vocabularies, seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.60

Linked data could provide the means for local communities to prefer a different label for an established vocabulary's preferred term for a concept or entity. One might reference a local description of a concept or entity not represented, or not represented satisfactorily, in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, "Nothing is sadder than a vocabulary that someone invented that was left to go stale." Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution's EDI goals and principles.


FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

[Figure 5: bar chart titled "What Areas Have You Changed or Plan to Change Due to Your Institution's EDI Goals and Principles?" Scale: 0 to 100. Series: Changed; Plan to change. Categories: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies.]

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and to replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. It may not always be clear to others that a term is offensive to one community (for example, "Dissident art" rather than "Non-conformist art").

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading "Mentally handicapped" to "People with mental disabilities." Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a "cultural sensitivity" message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator's attitude or the period in which the item was created and may be considered inappropriate today in some contexts.

• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may exclude audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than being included as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others' terminologies or mean different things. The PCC Linked Data Advisory Committee's Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences and to facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on "Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives"66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.
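The idea of one identifier carrying different labels depending on context can be sketched as follows. This is a hypothetical illustration only: the identifier `concept:0001`, the label tables, and the `display_label` function are assumptions made for the example, not part of any system described in this report. The sketch checks a community's local labels first, then a shared vocabulary, and falls back from a regional language tag to any label in the same base language.

```python
# Hypothetical shared vocabulary: one identifier, several context-dependent labels.
SHARED_LABELS = {
    "concept:0001": {"en-US": "Color", "en-GB": "Colour", "fr": "Couleur"},
}


def display_label(identifier, preferences, shared=SHARED_LABELS, local=None):
    """Pick a label for an identifier from ordered language preferences.

    Locally preferred labels (if any) are consulted before the shared vocabulary.
    """
    for labels in (local or {}, shared):
        available = labels.get(identifier, {})
        for tag in preferences:
            if tag in available:
                return available[tag]
            # Fall back from a regional tag such as "en-AU" to the same base language.
            base = tag.split("-")[0]
            for candidate, text in available.items():
                if candidate.split("-")[0] == base:
                    return text
    # No label in a preferred language: show the identifier rather than guess.
    return identifier


british = display_label("concept:0001", ["en-GB"])
french = display_label("concept:0001", ["de", "fr"])
community = display_label(
    "concept:0001",
    ["en-GB"],
    local={"concept:0001": {"en": "Locally preferred term"}},
)
```

Because the match point is the identifier, swapping or adding a label changes only what is displayed; records linked to the identifier do not need to be edited.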

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of "community contribution" for new terms and community voting could be more inclusive. Libraries' current "consensus environment" excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the set of associated statements made about them. How will libraries resolve or reconcile conflicts between statements?67 Different types of inconsistencies may appear than do now, for example, different birthdates for the same person. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of "preferred sources."68 OCLC Research explored how libraries might apply Google's "Knowledge Vault" to identify statements that may be more "truthful" than others in the 2015 "Works in Progress Webinar: Looking Inside the Library Knowledge Vault."69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
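To make the role of provenance concrete, here is a minimal sketch under stated assumptions: statements are modeled as simple (entity, property, value, source) records, and a library maintains an ordered list of preferred sources. The `Statement` class, the function names, and all sample values are hypothetical and do not reflect the Knowledge Vault or any specific system. The sketch detects properties on which sources disagree and orders the competing values by the best-ranked source asserting each, without discarding any of them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    entity: str   # identifier of the entity described
    prop: str     # property, for example "birth_date"
    value: str
    source: str   # provenance: who asserted the statement


def find_conflicts(statements):
    """Return {(entity, prop): {value: [sources]}} for properties where sources disagree."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s in statements:
        grouped[(s.entity, s.prop)][s.value].append(s.source)
    return {key: dict(values) for key, values in grouped.items() if len(values) > 1}


def rank_values(values_by_source, preferred_sources):
    """Order competing values by the best-ranked source asserting each; keep all values."""
    def best_rank(item):
        _, sources = item
        ranks = [preferred_sources.index(s) for s in sources if s in preferred_sources]
        return min(ranks, default=len(preferred_sources))  # unlisted sources sort last
    return sorted(values_by_source.items(), key=best_rank)


# Hypothetical statements; the entity, sources, and dates are placeholders.
statements = [
    Statement("person:0001", "birth_date", "1901", "source_a"),
    Statement("person:0001", "birth_date", "1902", "source_b"),
    Statement("person:0001", "birth_date", "1901", "source_c"),
    Statement("person:0001", "name", "Example, A.", "source_a"),
]
conflicts = find_conflicts(statements)
ranked = rank_values(conflicts[("person:0001", "birth_date")], ["source_b", "source_a"])
```

Ranking rather than deleting is a deliberate choice in this sketch: every competing statement and its sources remain visible, so a reviewer can still judge values that come from less-preferred sources.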

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries must compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected in the use of inadequate vendor records, the creation of minimal or less-than-full level descriptions for certain types of resources, and limits on authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of the large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide "trustworthy provenance" or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. "Conflicting statements" might represent different world views. Selecting statements based on provenance could challenge our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative "Plan M" (where "M" stands for "metadata") seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK's National Bibliographic Knowledgebase (NBK) in Plan M's 10-year vision: "Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research."73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing "Inside-Out" and "Facilitated" Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the "inside-out collection" (in contrast to the "outside-in collection," in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the "facilitated collection."74 Focus Group members' activities have increasingly focused on metadata that will provide access to the resources unique to their institutions as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to "inside-out" collections, and each presents different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as "super challenging." In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places and providing the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars, libraries, and archives alike. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the "preferred" form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions' digitization of archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University's Human Rights, Historic Preservation and Urban Planning, and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia's Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation's Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges of creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its "look-and-feel") is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the "lack of descriptive metadata guidelines" as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, "For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials."87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, which are often in deteriorating formats, in large backlogs, and sometimes require rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge of linking between items and the finding aid, and of maintaining the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information about the AV capture and digitization process, such as compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together Blog posts "Assessing Needs of AV in Special Collections" and "Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP."90 A subset of the Focus Group members responded to Weber's 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival workflows and into digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections

[Figure 6: bar chart titled "What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing? (n=137)" Scale: 0 to 140. Series: Very interested; Interested; Somewhat interested; Not interested. Categories: Resource allocation, assessment, and prioritization; Digital asset management and preservation; Physical collection management; Digitization and preservation reformatting; Incorporating into digital collection workflows; Incorporating into archival workflows; Selection, appraisal, and collection development.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects ("of-ness" and "about-ness"). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections' metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, and therefore limited (if not discredited) any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture, or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters,” that is, those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
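The general idea of replacing strings with identifiers can be illustrated with a minimal Python sketch. It is not the pilot's actual method: the lookup table, its example.org identifiers, the field names, and the function names are all hypothetical stand-ins for a real authority file and a real record structure.

```python
# Hypothetical authority lookup: normalized label -> identifier.
# The example.org URIs are placeholders, not real authority records.
AUTHORITY = {
    "paris (france)": "https://example.org/place/1",
    "eiffel tower": "https://example.org/entity/2",
}


def normalize(label):
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(label.lower().split())


def link_strings(record):
    """Return a copy of the record with matched strings replaced by identifiers.

    Strings with no match are kept unchanged so they remain available
    for manual review.
    """
    linked = {}
    for field, values in record.items():
        linked[field] = [AUTHORITY.get(normalize(v), v) for v in values]
    return linked


# Illustrative record; the field names are assumptions.
record = {
    "coverage": ["Paris  (France)"],
    "subject": ["Eiffel Tower", "Street scenes"],
}
linked_record = link_strings(record)
```

Keeping unmatched strings in place, rather than dropping them, reflects the point that only values found in a known vocabulary can be converted; the rest still need human attention.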

FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections


RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry, “Data Management and Curation in 21st Century Archives” (Sept. 2015),100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
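As an illustration only, a readme.txt skeleton covering the elements listed in the passage above could be generated with a short Python script. The one-line-per-field layout, the function names, and the sample values are assumptions made for this sketch; the report does not prescribe a format.

```python
# Field names follow the elements listed in the quoted passage; the layout
# ("Field: value" on each line) is an assumption, not a prescribed format.
README_FIELDS = [
    "Title", "Author", "Date", "Funder", "Copyright holder",
    "Description", "Keywords", "Observation unit", "Kind of data",
    "Type of data", "Language", "File names", "File formats",
    "Software used",
]


def missing_fields(metadata):
    """Return the listed fields that are absent or blank in metadata."""
    return [f for f in README_FIELDS if not str(metadata.get(f, "")).strip()]


def build_readme(metadata):
    """Render one 'Field: value' line per field, leaving gaps visibly empty."""
    lines = [f"{field}: {metadata.get(field, '')}".rstrip() for field in README_FIELDS]
    return "\n".join(lines) + "\n"


# Hypothetical sample values, for illustration only.
sample = {
    "Title": "Stream temperature survey",
    "Author": "A. Researcher",
    "Date": "2019-06-01",
    "File formats": "CSV",
}
readme_text = build_readme(sample)
gaps = missing_fields(sample)
```

Listing the unfilled fields separately gives a metadata consultant a quick way to see what to ask the researcher for, without blocking the deposit.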


All four of the 2017-2018 The Realities of Research Data Management series webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post, “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105


National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians, that may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; and improving relevancy ranking, personalizing search results, offering recommendation services in the discovery layer, and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis of collections and a view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking, and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125


Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8. Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics, statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127


FIGURE 9. UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that Artificial Intelligence, or at least machine learning, could mitigate the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for, and by libraries, archives, and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s OCLC Research 2019 position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to “traditional cataloging activities” (such as original and copy cataloging, authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed, from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.141
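To give a sense of what lightweight clustering for reconciliation or de-duplication can look like in a script, here is a minimal Python sketch of a "fingerprint" key-collision approach, one common technique for grouping variant strings. It is an illustration under stated assumptions, not a description of how MarcEdit or OpenRefine implement clustering; the function names and sample names are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a matching key: strip accents and punctuation, lowercase,
    then sort the unique tokens so word order does not matter."""
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = unaccented.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(no_punct.lower().split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; return only groups of 2 or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical name headings, for illustration only.
names = [
    "Smith, John",
    "John Smith",
    "smith john.",
    "Müller, Anna",
    "Muller Anna",
    "Jones, Mary",
]
clusters = cluster(names)
```

Each cluster is only a candidate set for review: a key collision suggests that the strings may refer to the same entity, but a person still needs to confirm the match before records are merged.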

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
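As one possible illustration of the "simple scripting" that reconciling equivalents across registries can involve, the Python sketch below merges pairwise "same entity" assertions into identity clusters using a union-find structure. It is a sketch under assumptions, not a method described by Focus Group members: the function name is hypothetical, and the identifiers are made-up placeholders rather than real registry entries.

```python
from collections import defaultdict


def reconcile(pairs):
    """Merge pairwise 'same entity' assertions into clusters (union-find)."""
    parent = {}

    def find(item):
        # Register unseen identifiers as their own root, then walk to the root.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving keeps trees shallow
            item = parent[item]
        return item

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = defaultdict(set)
    for item in list(parent):
        clusters[find(item)].add(item)
    return [sorted(members) for members in clusters.values()]


# Placeholder identifiers only; the prefixes name the registry each would come from.
same_as = [
    ("lcnaf:n0000001", "viaf:0000001"),
    ("viaf:0000001", "isni:0000000000000001"),
    ("orcid:0000-0000-0000-0001", "isni:0000000000000001"),
    ("lcnaf:n0000002", "viaf:0000002"),
]
identities = reconcile(same_as)
for members in identities:
    print(members)
```

Because the assertions are transitive, the ORCID placeholder ends up in the same cluster as the LCNAF one even though no single source links them directly, which is also why a wrong assertion can propagate and why such output warrants human review.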

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology to support library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog, 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog, 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe.


Futornick, Michelle. 2019. “LD4P2 Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee, 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog, 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog, 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is Orcid.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org.

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog, 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog, 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog, 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog, 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu.

34. The National Institute of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog), 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications.

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog, 28 September 2016. https://hangingtogether.org/?p=5710.

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog, 2 April 2019. https://hangingtogether.org/?p=5710.

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog, 27 October 2015. https://hangingtogether.org/?p=5445.

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793.

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref, The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois.

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog, 8 October 2018. https://hangingtogether.org/?p=6805.


44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog, 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog, 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog, 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog, 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog, 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog, 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku / Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko.

AIATSIS Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan. 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog, 17 March 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite.


60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog, 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog, 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog, 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog, 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog, 7 April 2015. https://hangingtogether.org/?p=5091.

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog, 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog, 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog, 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog), Jisc, 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.


73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog, 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog, 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog, 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog, 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.


84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog, 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog, 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog, 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog, 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog, 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io.

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.


100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research. 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430

118. Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University. 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395
T: 1-800-848-5878 T: +1-614-764-6000 F: +1-614-764-6096
www.oclc.org/research

ISBN: 978-1-55653-170-5 DOI: 10.25333/rqgd-b343 RM-PR-216787-WWAE-A4 2009

  • Executive Summary
  • Introduction
  • The Transition to Linked Data and Identifiers
    • Expanding the use of persistent identifiers
    • Moving from “authority control” to “identity management”
    • Addressing the need for multiple vocabularies and equity, diversity, and inclusion
    • Linked data challenges
  • Describing “Inside-Out” and “Facilitated” Collections
    • Archival collections
    • Archived websites
    • Audio and video collections
    • Image collections
    • Research data
  • Evolution of “Metadata as a Service”
    • Metrics
    • Consultancy
    • New applications
    • Bibliometrics
    • Semantic indexing
  • Preparing for Future Staffing Requirements
    • The culture shift
    • Learning opportunities
    • New tools and skills
    • Self-education
    • Addressing staff turnover
  • Impact
  • Acknowledgments
  • Appendix
  • Notes
  • Figures
    • FIGURE 1. “Changing Resource Description Workflows” by OCLC Research
    • FIGURE 2. Some 300 abbreviated author names for a five-page article in Physical Review Letters
    • FIGURE 3. Examples of some DOI (left) and ARK (right) identifiers
    • FIGURE 4. One Wikidata identifier links to other identifiers and labels in different languages
    • FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership
    • FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections
    • FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections
    • FIGURE 8. Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon
    • FIGURE 9. Hachette UK’s “River of Authors,” generated from the British Library’s catalog metadata

EXECUTIVE SUMMARY

The OCLC Research Library Partners Metadata Managers Focus Group, first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP), a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

This report, Transitioning to the Next Generation of Metadata, synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions.

Yet metadata is changing. Format-specific metadata management, based on curated text strings in bibliographic records understood only by library systems, is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. This report traces how metadata is evolving and considers the impact this transition may have on library services, posing such questions as:

• Why is metadata changing?

• How is the creation process changing?

• How is the metadata itself changing?

• What impact will these changes have on future staffing requirements, and how can libraries prepare?

The future of linked data is tied to the future of metadata: the metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for future linked data innovations as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

INTRODUCTION

The OCLC Research Library Partners Metadata Managers Focus Group (hereafter referenced as the Focus Group),1 first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP),2 a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions. Metadata provides the research infrastructure necessary for all libraries’ “value delivery systems,” fulfilling their community’s requests for information and resources. Metadata is crucial for transitioning to next generations of library and discovery systems. Good metadata created today can easily be reused in a linked data environment in the future.3 As noted in the British Library’s Foundations for the Future, “Our vision is that by 2023 the Library’s collection metadata assets will be unified on a single, sustainable, standards-based infrastructure offering improved options for access, collaboration and open reuse.”4

Format-specific metadata management, based on curated text strings in bibliographic records understood only by library systems, is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. “Traditional methods of metadata generation, management and dissemination,” suggests the British Library’s Collection Management Strategy, “are not scalable or appropriate to an era of rapid digital change, rising audience expectations and diminishing resources.”5 Focus Group members are eager to unleash the power of metadata in legacy records for different interactions and uses by both machines and end-users in the future. Consistent metadata created according to past rules or standards needs to be transformed into new structures.

Why is metadata changing?

Traditional library metadata was, and is, made by librarians conforming to rules that are mainly used and understood by librarians. It is record-centered, expensive to produce, and has historic size limitations. Metadata is limited in its coverage, notably not including articles within scholarly journals or other scholarly outputs. The infrastructure has been inadequate for managing corrections and enhancements, inducing an emphasis on perfection that has exacerbated the slowness of metadata creation. In short, the metadata could be better, there is not enough of it, and the metadata that does exist is not used widely outside the library domain.

How is the creation process changing?

Metadata is no longer created by library staff alone. Today, publishers, authors, and other interested parties are equally involved in metadata creation. Metadata creation has also been pushed forward in the scholarly life cycle, with publishers creating metadata records earlier than in the traditional cataloging process. Metadata can now be enhanced or corrected by machines or by crowdsourcing.

How is the metadata itself changing?

Machine-readable cataloging (MARC) was created to replicate the metadata traditionally found on library catalog cards. We are transitioning from MARC records to assemblages of well-coded and shareable, linkable components, with an emphasis on references, and we are eliminating anachronistic abbreviations not understood by machines. Instead of relying only on library vocabularies such as subject headings and coded lists, the developing assemblages can accommodate vocabularies created for specific domains, expanding the metadata’s potential audiences.

The Focus Group’s composition has fluctuated over time and currently comprises representatives from 63 RLP Partners in 11 countries spanning four continents.6 The group includes both past and incoming chairs of the Program for Cooperative Cataloging (PCC),7 providing cross-fertilization between the two. Topics for group discussions can be proposed by any Focus Group member and are selected by an eight-member Planning Group (see appendix), who then write “context statements” explaining why the topic is considered timely and important and then develop question sets that delve into the topic. Context statements and question sets are then distributed to all Focus Group members, who are given three to five weeks to submit their responses. Compilations of the Focus Group’s responses inform face-to-face discussions held in conjunction with the American Library Association conferences8 and in subsequent virtual meetings.

As the Focus Group facilitator, I have summarized and synthesized these discussions in a series of OCLC Research Hanging Together Blog publications.9 Nearly 40 blog posts on a wide range of metadata-related topics have been published on this forum over the past six years.

The Metadata Managers Focus Group is just one activity within the broader OCLC Research Library Partnership, which is devoted to extensive professional development opportunities for library staff. Focus Group members value their affiliation with the Research Library Partnership as a channel to becoming the “change agents” of future metadata management.10 Focus Group members’ responses to question sets have facilitated intra-institutional discussions and helped metadata managers understand how their institutions’ situation compares with peers within the Partnership.

These Focus Group discussions identified a broad range of metadata-related issues, documented in this report. Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

Collectively, Focus Group members command a wide range of experiences with linked data. The Focus Group’s keen interest in linked data implementations sparked the series of OCLC Research’s International Linked Data Surveys for Implementers.11 A subset of Focus Group members have participated in various linked data projects, including the OCLC Research Project Passage and CONTENTdm Linked Data pilot, OCLC’s Shared Entity Management Infrastructure, Library of Congress’ Bibliographic Framework Initiative (BIBFRAME), the Mellon-funded Linked Data for Production (LD4P) project, the Share-VDE initiative, and the IMLS planning grant Shareable Local Name Authorities, which exposed issues raised by identifier hubs in the linked data environment.12 In addition, Focus Group members contribute to the PCC task groups addressing aspects of linked data work, including the PCC Task Group on Linked Data Best Practices, Task Group on Identity Management, Task Group on URIs in MARC, and the PCC Linked Data Advisory Committee.13 This cross-fertilization has prompted the Focus Group to examine issues around the entities represented in institutional resources.

This report synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The document is organized in the following sections, each representing an emerging trend identified in the Focus Group’s discussions:

• The transition to linked data and identifiers: expanding the use of persistent identifiers as part of the shift from “authority control” to “identity management”

• Describing the “inside-out” and “facilitated” collections: challenges in creating and managing metadata for unique resources created or curated by institutions in various formats and shared with consortia

• Evolution of “metadata as a service”: increased involvement with metadata creation beyond the traditional library catalog

• Preparing for future staffing requirements: the changing landscape calls for new skill sets needed by both new professionals entering the field and seasoned catalogers

The document concludes with some observations on the forecasted impact of the next generation of metadata on the wider library community.

The Transition to Linked Data and Identifiers

Linked data offers the ability to take advantage of structured data with an emphasis on context. It relies on language-neutral identifiers pointing to objects, with a focus on “things,” replacing the “strings” inherent in current authority and catalog records. These identifiers can then be connected to related data, vocabularies, and terms in other languages, disciplines, and domains, including nonlibrary domains. Linked data applications can consume others’ contributions and thus free metadata specialists from having to re-describe things already described elsewhere, allowing them instead to focus on providing access to their institutions’ unique and distinctive collections. This promises a richer user experience and increased discoverability with more contextual relationships than is possible with our current systems. Furthermore, linked data offers an opportunity to go beyond the library domain by drawing on information about entities from diverse sources.14

FIGURE 1. “Changing Resource Description Workflows” by OCLC Research15

The hope is that linked data will allow libraries to offer new, value-added services that current models cannot support, that outside parties will be able to make better use of library resource descriptions, and that the data will be richer because more parties share in its creation. Moving to a linked data environment portends changes to resource description workflows, as shown in figure 1.

The drive to move metadata operations to linked data depends on the availability of tools, access to linked data sources for reuse, documented best practices on identifiers and the metadata descriptions associated with them (“statements”), and a critical mass of implementations on a network level.

EXPANDING THE USE OF PERSISTENT IDENTIFIERS

The Focus Group discussed the “future-proofing” of cataloging, which refers to the opportunities to unleash the power of metadata in legacy records for different interactions and uses in the future. Persistent identifiers were viewed as crucial to transitioning from current metadata to future applications.16 Identifiers, in the form of language-neutral alphanumeric strings, serve as a shorthand for assembling the elements required to uniquely describe an object or resource. They can be resolved over networks with specific protocols for finding, identifying, and using that object or resource. In the nonlibrary domain, Social Security and employee numbers are examples of such identifiers. In the library and academic domains, Focus Group members pointed to ORCID (Open Researcher and Contributor ID)17 as a “glue” that holds together the four arms of scholarly work: publishing, repository, library catalog, and researchers. ORCID, however, is limited to living researchers. ORCID is increasingly used in STEM (science, technology, engineering, mathematics) journals for all authors and contributors18 and included in institutions’ Research Information Management systems. ISNI (International Standard Name Identifier)19 uniquely identifies persons and organizations involved in creative activities, is used by libraries, publishers, databases, and rights management organizations, and covers nonliving creators.
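As a small illustration of why such identifiers are machine-friendly, the sketch below checks the final character of an ORCID iD, which ORCID documents as an ISO 7064 MOD 11-2 check character. The function names are invented for this example, and the check only confirms that the string is well formed; it does not confirm that the iD is registered or resolves to anyone.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if a hyphenated ORCID iD ends in the correct check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


# The ORCID iD given for the author on this report's title page.
print(is_well_formed_orcid("0000-0002-8757-2962"))
```

A check like this can catch transcription errors before an identifier is stored in a record, which is one reason identifiers are easier to batch load and validate than name strings.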

Persistent identifiers are used by parties such as Google and HathiTrust for service integration.20 More institutions are using geospatial coordinates in metadata or URIs (Uniform Resource Identifiers) pointing to geospatial coordinates that support API (Application Programming Interface) calls to GeoNames,21 enabling map visualizations. Research institutions are also adopting person identifiers such as ORCID to streamline the collection of the institutional research record, usually through a Research Information Management system, as documented in the 2017 OCLC Research Report Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information Management.22

While publishers serve as a key player in the metadata workstream, publisher data does not always meet library requirements. For example, publisher data for monographs usually does not include identifiers. The British Library is working with five UK publishers to add ISNIs23 to their metadata as a promising proof-of-concept for publishers and libraries working together earlier in the supply chain. The ability to batch load or algorithmically add identifiers in the future is on Focus Group members’ wish list.

No single person identifier covers all use cases. Researchers’ names have been only partially represented in national name authority files that identify persons both living and dead. A sizable quantity of legacy names are represented only by text strings in bibliographic records. Authority records are created only by institutions involved in the PCC’s Name Authority Cooperative Program (NACO)24 or in national library programs. Even then, authority records are created selectively for certain headings, or sometimes only when references are involved. The LC/NACO name authority file contained only 30% of the total names reflected in WorldCat’s bibliographic record access points (9 million LC/NACO records compared to the 30 million total names reported on the WorldCat Identities project page as of 2012).25 By 2020, this percentage had decreased to 18%: 11 million LC/NACO authority records compared to 62 million names in WorldCat Identities. These statistics illustrate that the number of names represented in bibliographic records is increasing more quickly than the number under authority control.
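The arithmetic behind these two snapshots can be checked with a few lines of Python. This is only a worked calculation on the rounded figures quoted above (9 million of 30 million names in 2012; 11 million of 62 million in 2020); the function and variable names are illustrative.

```python
def coverage_percent(authority_records: int, total_names: int) -> float:
    """Share of names under authority control, as a percentage."""
    return 100 * authority_records / total_names


# Rounded figures quoted in the text: (LC/NACO records, names in WorldCat Identities).
snapshots = {2012: (9_000_000, 30_000_000), 2020: (11_000_000, 62_000_000)}

for year, (authority, names) in snapshots.items():
    print(year, f"{coverage_percent(authority, names):.1f}%")

# Growth between the two snapshots for each count.
authority_growth = snapshots[2020][0] / snapshots[2012][0] - 1
names_growth = snapshots[2020][1] / snapshots[2012][1] - 1
print(f"authority records: +{authority_growth:.0%}, names: +{names_growth:.0%}")
```

On these figures the authority file grew by roughly a fifth while the pool of names roughly doubled, which is why the covered share fell even though the number of authority records rose.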

Authority files focus on the “preferred form” of a name, which can vary depending on language, discipline, context, and time period. Scholars have objected to the very concept of a “preferred form,” as the name may be referred to differently depending on the context.26 When a name has multiple forms, historians need to know the provenance of each name, following the citation practices commonly used in their field. An identifier linked to different forms of names, each associated with the provenance and context, could resolve this conundrum.
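One way to picture an identifier that carries several name forms, each with its own provenance and context, is the minimal data structure below. It is a sketch only: the class names, fields, identifier URI, person, and sources are all hypothetical and are not drawn from any existing authority file or standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NameForm:
    text: str      # the name as written
    language: str  # language or script tag for this form
    source: str    # provenance: where this form was recorded
    context: str   # discipline, period, or community in which it is used


@dataclass
class Identity:
    identifier: str  # language-neutral identifier for the "thing"
    forms: list[NameForm] = field(default_factory=list)

    def forms_for(self, context: str) -> list[str]:
        """Return the name forms used in a given context; no form is 'preferred'."""
        return [f.text for f in self.forms if f.context == context]


# Entirely hypothetical example data.
person = Identity("https://example.org/id/person/0001")
person.forms.append(NameForm("Example, Ada", "en", "hypothetical catalog record", "library catalog"))
person.forms.append(NameForm("A. Example", "en", "hypothetical journal byline", "physics journals"))

print(person.forms_for("physics journals"))
```

Because every form hangs off the same identifier, a display system could choose the form suited to its audience while a historian could still see where each form came from.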

Researcher names are just one example of a need unmet by current identifier systems Institutions have been minting their own ldquolocal identifiersrdquo to meet this need Use cases for local identifiers include registering all researchers on campus representing entities that are underrepresented in national authority files such as authors of electronic dissertations and theses performers events local place names and campus buildings identifying entities in digital library projects and institutional repositories reflecting multilingual needs of the community and supporting ldquohousekeepingrdquo tasks such as recording archival collection titles27

Focus Group members’ consistent need to disambiguate names across disciplines and formats spurred the creation of the OCLC Research working group on Registering Researchers in Authority Files.28 The need to accurately record researchers’ institutional affiliations (to reflect the institution’s scholarly output, to promote cross-institutional collaborations, and to lead to more successful recruitment and funding) led to another working group on Addressing the Challenges with Organizational Identifiers and ISNI,29 which presented new data modeling of organizations that others could adapt for their own uses. Since then, the Research Organization Registry (ROR) was launched to develop an open, sustainable, usable, and unique identifier for every research organization in the world.30

Disambiguating names is the most labor-intensive part of authority work and will still be a prerequisite for assigning unique identifiers. Given the different name identifier systems already in use, libraries need a name reconciliation service. Authority work and algorithms based on text string matching have limits; the results will still need manual expert review. Tapping the expertise in user communities to verify whether two identifiers represent the same person may help.

Disambiguation is particularly difficult for authors or contributors listed in journal articles, where names are often abbreviated and there may be dozens or even hundreds of contributors. For example, an article in Physical Review Letters, “Precision Measurement of the Top Quark Mass in Lepton + Jets Final State,” has approximately 300 abbreviated author names for a five-page article (figure 2).31 This exemplifies the different practices among disciplines. By contrast, other objects with many contributors, such as feature films and orchestral recordings, are usually represented by only a relative handful of the associated names in library legacy metadata.32 Such differences make creating metadata that is uniform, understandable, and widely reusable a challenge.

FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form or to an identifier, if one exists. Associating ORCIDs with article authors makes it easier to differentiate authors with the same abbreviated forms. Research Information Management (RIM) systems apply identity management for local researchers so that they are correctly associated with the articles they have written. Their articles are displayed as part of their profiles. (See, for example, Experts@Minnesota or the University of Illinois at Urbana-Champaign’s Experts research profiles.)33
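
The collision problem can be shown in a few lines. This is a minimal sketch under stated assumptions: the researchers are invented, the ORCID iDs are made-up placeholders rather than real iDs, and `abbreviate` is a hypothetical helper that mimics the "initials plus family name" style seen in figure 2.

```python
from collections import defaultdict


def abbreviate(full_name: str) -> str:
    """Reduce 'Given Middle Family' to initials plus family name, e.g. 'S.W. Lee'."""
    *given, family = full_name.split()
    initials = "".join(f"{part[0].upper()}." for part in given)
    return f"{initials} {family}".strip()


# Hypothetical researchers; the ORCID iDs are placeholders for illustration only.
researchers = [
    {"name": "Sung Woo Lee", "orcid": "0000-0000-0000-0001"},
    {"name": "Sarah Wen Lee", "orcid": "0000-0000-0000-0002"},
    {"name": "Hyun Soo Lee", "orcid": "0000-0000-0000-0003"},
]

# Group the distinct identifiers that share one abbreviated name string.
by_abbreviation = defaultdict(list)
for person in researchers:
    by_abbreviation[abbreviate(person["name"])].append(person["orcid"])

# Any abbreviation mapped to more than one identifier is ambiguous as a text string.
ambiguous = {abbr: ids for abbr, ids in by_abbreviation.items() if len(ids) > 1}
print(ambiguous)
```

Two of the three invented researchers collapse to the same string, so the string alone cannot tell them apart, while the identifiers remain distinct.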

[Figure 2 image: first page of the arXiv preprint arXiv:1405.1756v2 [hep-ex], 16 Jun 2014 (FERMILAB-PUB-14-123-E), “Precision measurement of the top-quark mass in lepton+jets final states,” showing the abbreviated author list, which begins V.M. Abazov, B. Abbott, B.S. Acharya, M. Adams, T. Adams and continues in the same form.]


For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.34 Research Information Management Systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author’s ORCID. Linked data could access information across many environments, including those in Research Information Systems, but would require accurately linking multiple identifiers for the same person to each other.

Some Focus Group members are performing metadata reconciliation work, such as searching for matching terms from linked data sources and adding their URIs to metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.35 Improving the quality of the data improves users’ experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC’s Virtual International Authority File (VIAF); the Library of Congress’s linked data service (id.loc.gov); ISNI; the Getty’s Union List of Artist Names (ULAN), Art & Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC’s Faceted Application of Subject Terminology (FAST); and various national authority files. Selection of the source depends on the trustworthiness of the organization responsible, the subject matter, and the richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed an LCNAF Named Entity Reconciliation program36 using Google’s Open Refine that searches the VIAF file with the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading and a link to the URI of the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.
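
The pairing step described above can be sketched without any network access. This is not the University of Michigan program and does not call the VIAF API: the cluster below is a hypothetical in-memory stand-in, its keys and identifiers are invented for illustration, and only the id.loc.gov/authorities/names/ URI pattern reflects how the LC linked data service addresses name authority records.

```python
from typing import Optional

LCNAF_BASE = "http://id.loc.gov/authorities/names/"

# Hypothetical stand-in for a matched VIAF cluster. The dictionary layout,
# keys, and identifiers are invented here; they are not the VIAF API format.
example_cluster = {
    "viaf_id": "000000000",
    "sources": [
        {"source": "DNB", "record_id": "000000000", "heading": "Example, Ada, 1900-1980"},
        {"source": "LC", "record_id": "n00000000", "heading": "Example, Ada, 1900-1980"},
    ],
}


def pair_with_lcnaf(original_heading: str, cluster: dict) -> Optional[dict]:
    """Pair the original heading with the LC authorized heading and its URI.

    Returns None when the cluster contains no Library of Congress source record.
    """
    for record in cluster["sources"]:
        if record["source"] == "LC":
            return {
                "original_heading": original_heading,
                "authorized_heading": record["heading"],
                "lcnaf_uri": LCNAF_BASE + record["record_id"],
            }
    return None


row = pair_with_lcnaf("Example, Ada", example_cluster)
print(row)
```

Swapping the returned URI for one built from `viaf_id` would correspond to the modification mentioned at the end of the paragraph; the upstream string-matching search is deliberately left out of this sketch.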

A long list of nonlibrary sources that could enhance current authority data or could be valuable to link to in certain contexts has been identified. Wikidata and Wikipedia led the list. Other sources include AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, GeoNames, GoodReads, IMDb (Internet Movie Database), Internet Archive, Library Thing, LinkedIn, MusicBrainz, ONIX (ONline Information eXchange), Open Library, ORCID, and Scopus ID. The document from the PCC’s Task Group on URIs in MARC, Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources,37 provides valuable guidance for collecting data from these other sources. Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than with current library systems.38

Identifiers for “works” represent a particular challenge, as there is no consensus on what represents a “distinctive work.”39 Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a “work” is could hamper the ability to reuse data created elsewhere, and they look to a central trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.


FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers40

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets, such as digital resources and collections in institutional repositories, include system-generated IDs, locally minted identifiers, PURL handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.41 Ideally, libraries would contribute to a hub for the metadata describing their researchers’ data sets, regardless of where the data sets are stored.
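
As a rough illustration of what such identifiers look like as strings, the sketch below classifies an identifier as a DOI or an ARK and builds a resolver URL for a DOI. The regular expressions are simplified assumptions, not the full syntax of either scheme; the DOI is the one printed on this report's title page, and the ARK is a made-up example.

```python
import re

# Simplified patterns (assumptions): a DOI starts with "10." plus a registrant
# code and a slash; an ARK starts with "ark:" plus a numeric name assigning
# authority number. Real-world syntax is broader than this.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
ARK_PATTERN = re.compile(r"^ark:/?\d{5,}/\S+$")


def classify(identifier: str) -> str:
    """Return 'doi', 'ark', or 'unknown' for a bare identifier string."""
    value = identifier.strip()
    if DOI_PATTERN.match(value):
        return "doi"
    if ARK_PATTERN.match(value):
        return "ark"
    return "unknown"


def doi_url(doi: str) -> str:
    """Build the resolver URL for a DOI; the DOI itself never encodes a storage location."""
    return "https://doi.org/" + doi.strip()


report_doi = "10.25333/rqgd-b343"      # DOI of this report
example_ark = "ark:/12345/x0example"   # hypothetical ARK, for illustration only

print(classify(report_doi), doi_url(report_doi))
print(classify(example_ark))
print(classify("local-id-42"))
```

The identifier string stays fixed while the resolver decides where it currently points, which is the independence from storage location that the paragraph calls for.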


MOVING FROM “AUTHORITY CONTROL” TO “IDENTITY MANAGEMENT”

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.42 The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from “managing text strings” as the basis of controlling headings in bibliographic records.43 But identity management poses a change in focus from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from “authority control” and “authorized access points” in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.44 Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
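
Separating an identifier from its labels can be pictured as a small data structure. This is a minimal sketch, assuming a hypothetical `Entity` class and an invented example.org identifier; it is not how Wikidata or any library system stores labels.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One stable identifier with any number of labels keyed by language tag."""
    identifier: str
    labels: dict = field(default_factory=dict)

    def label_for(self, preferred_languages: list) -> str:
        """Return the first available label in the caller's order of preference.

        Falls back to the identifier itself when no preferred language is present.
        """
        for lang in preferred_languages:
            if lang in self.labels:
                return self.labels[lang]
        return self.identifier


# Hypothetical identifier; the labels are the city's names in three languages.
place = Entity(
    identifier="https://example.org/entity/E1",
    labels={"en": "Munich", "de": "München", "it": "Monaco di Baviera"},
)

print(place.label_for(["de", "en"]))  # a German-language catalog
print(place.label_for(["fr", "en"]))  # no French label, falls back to English
print(place.label_for(["ja"]))        # no match at all, shows the identifier
```

No label is marked as the preferred form in the data itself; each community supplies its own preference order at display time, which is what localized label preference amounts to.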

[Figure 4 image: Wikidata identifier Q19526, linked to other identifiers and to labels in different languages.]


A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?45
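
One way to picture a reconciliation tool for multiple identifiers is to treat each "these two identifiers denote the same entity" assertion as an edge and group identifiers into clusters. The sketch below uses a union-find structure for that grouping; it is an illustrative technique chosen here, not one named in the report, and the identifiers are invented placeholders.

```python
def cluster_identifiers(same_as_pairs):
    """Group identifiers into clusters from pairwise 'same entity' assertions."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own cluster root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for ident in parent:
        clusters.setdefault(find(ident), set()).add(ident)
    return list(clusters.values())


# Hypothetical assertions; prefixes and values are placeholders.
pairs = [
    ("viaf:0001", "isni:000A"),
    ("isni:000A", "orcid:000X"),
    ("wikidata:Q0", "lcnaf:n0"),
]

for cluster in cluster_identifiers(pairs):
    print(sorted(cluster))
```

A system that indexes on the cluster rather than on any single label could then use it as the match point and collocation point, displaying whichever associated label suits the context. Whether each pairwise assertion is correct still requires the expert review discussed earlier.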

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.46 Many libraries are looking toward Wikidata and Wikibase (the software platform underlying Wikidata) to solve some of the long-standing issues faced by technical services departments, archival units, and others.47 Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution’s resources. Focus Group members’ experimentations with Wikidata and OCLC projects using the Wikibase platform indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationships with each other, and how best they can increase discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on “works” and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are “notable” are more likely to be represented in Wikidata. Recently, an effort to support citations in Wikipedia articles, WikiCite,48 demonstrates a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical, and powerful, aspects of identity management is to reduce the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.49 Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts or subject headings are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting “alternate” subject headings came to the fore when the Library of Congress’s initial solution to change the LC subject heading for “Illegal aliens” to “Undocumented immigrants” failed to be implemented. This prompted one Focus Group member to comment, “Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive.” End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.50


Some see Faceted Application of Subject Terminology (FAST)51 as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.52 FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee53 represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH, with tools and services that serve the needs of diverse communities and contexts.54

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.55 For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.56 There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)57 built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.58


A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in institutional repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool59 to create and manage various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.60

Linked data could provide the means for local communities to prefer a different label for an established vocabulary’s preferred term for a concept or entity. One might reference a local description of a concept or entity not represented (or not represented satisfactorily) in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, “Nothing is sadder than a vocabulary that someone invented that was left to go stale.” Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution’s EDI goals and principles.


FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and to replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, “Dissident art” rather than “Non-conformist art.”)

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading “Mentally handicapped” to “People with mental disabilities.” Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a “cultural sensitivity” message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator’s attitude or the period in which the item was created, and may be considered inappropriate today in some contexts.

[Figure 5 chart: “What Areas Have You Changed or Plan to Change Due to Your Institution’s EDI Goals and Principles?” Axis scale 0 to 100. Categories: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies. Series: Changed; Plan to change.]


• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may exclude audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than being included as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.
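
The automated part of the remediation described under “Capacity” above can be sketched as an exact-match replacement over subject headings. This is a minimal illustration, not a production workflow: the records are hypothetical, the record layout is an assumption, and only the heading change itself comes from the text.

```python
# Heading change cited in the text; additional changes would be added as further pairs.
HEADING_CHANGES = {"Mentally handicapped": "People with mental disabilities"}

# Hypothetical catalog records with a simplified layout.
records = [
    {"id": "rec1", "subjects": ["Mentally handicapped", "Education"]},
    {"id": "rec2", "subjects": ["Libraries"]},
    {"id": "rec3", "subjects": ["Mentally handicapped--Example subdivision"]},
]


def remediate(catalog, changes):
    """Replace exact-match headings in place and return the ids of updated records."""
    updated_ids = []
    for record in catalog:
        new_subjects = [changes.get(subject, subject) for subject in record["subjects"]]
        if new_subjects != record["subjects"]:
            record["subjects"] = new_subjects
            updated_ids.append(record["id"])
    return updated_ids


changed = remediate(records, HEADING_CHANGES)
print(changed)
print(records[2]["subjects"])
```

Because the match is exact, the subdivided string in the third record is left untouched. Cases like that are where the vended or manual remediation mentioned in the text would still be needed.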

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others’ terminologies or mean different things. The PCC Linked Data Advisory Committee’s Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences and to facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on “Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives”66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of “community contribution” for new terms and community voting could be more inclusive. Libraries’ current “consensus environment” excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made. How will libraries resolve or reconcile conflicts between statements?67 Different types of inconsistencies may appear than do now, with, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources.”68 OCLC Research explored how libraries might apply Google’s “Knowledge Vault” to identify statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
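One way to picture statement-level provenance is the following minimal sketch. It assumes a simple `Statement` record and a local “preferred sources” ranking; the ranking order, the `example.org` identifier, the `local-catalog` source name, and the sample values are all hypothetical and do not reflect any Focus Group member’s practice.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str  # identifier of the entity being described
    prop: str     # property name, e.g. "birthDate"
    value: str
    source: str   # provenance: who asserted the statement


# Hypothetical local ranking of "preferred sources"; earlier means more trusted.
PREFERRED_SOURCES = ["VIAF", "WorldCat", "Wikidata"]


def rank(source: str) -> int:
    """Lower is more trusted; unlisted sources sort after every listed one."""
    if source in PREFERRED_SOURCES:
        return PREFERRED_SOURCES.index(source)
    return len(PREFERRED_SOURCES)


def reconcile(statements):
    """Group statements by (subject, prop) and flag groups whose values disagree."""
    grouped = defaultdict(list)
    for statement in statements:
        grouped[(statement.subject, statement.prop)].append(statement)
    report = {}
    for key, group in grouped.items():
        ordered = sorted(group, key=lambda s: rank(s.source))
        report[key] = {
            "preferred": ordered[0],
            "conflict": len({s.value for s in group}) > 1,
            "all": ordered,  # every statement is retained, not only the preferred one
        }
    return report


person = "https://example.org/person/42"  # hypothetical identifier
report = reconcile([
    Statement(person, "birthDate", "1850", "local-catalog"),
    Statement(person, "birthDate", "1851", "Wikidata"),
    Statement(person, "name", "Example Person", "VIAF"),
])
birth = report[(person, "birthDate")]
print(birth["preferred"].value, birth["conflict"])
```

The sketch keeps every conflicting statement alongside the preferred one, so a display could still show alternative values together with their sources rather than discarding them.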

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected by using inadequate vendor records, by creating minimal or less-than-full level descriptions for certain types of resources, and by limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of the large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.

The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance” or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing “Inside-Out” and “Facilitated” Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the “inside-out collection” (in contrast to the “outside-in collection,” in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the “facilitated collection.”74 Focus Group members’ activities have increasingly focused on metadata that will provide access to the resources unique to their institutions as well as those in their consortia or national networks.

All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections
• Archived websites
• Audio and video collections
• Image collections
• Research data

All these content types can be categorized as belonging to “inside-out” collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places, providing the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for both scholars and libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions digitizing their archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.

ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights, Historic Preservation and Urban Planning, and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)82 collects websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters. Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary. Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use. Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency. Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites. Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85

AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, which are often in deteriorating formats, held in large backlogs, and sometimes require rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.

The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.

OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in OCLC Research Hanging Together Blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections

[Figure 6 chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Axis scale: 0 to 140. Categories: Resource allocation, assessment, and prioritization; Digital asset management and preservation; Physical collection management; Digitization and preservation reformatting; Incorporating into digital collection workflows; Incorporating into archival workflows; Selection, appraisal, and collection development. Legend: Very interested, Interested, Somewhat interested, Not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas. Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects. Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance. A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images. How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object. Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators. It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people, as well as large-scale discovery.

• Aggregating digital collections. Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters,” those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
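The general technique of replacing strings with identifiers can be sketched as follows. This is not the pilot’s method: the two lookup tables, the `example.org` URIs, and the exact-match normalization are assumptions chosen to keep the illustration small, and real reconciliation normally involves fuzzier matching and human review.

```python
# Hypothetical lookup tables standing in for an authority file and a
# local, library-defined vocabulary.
AUTHORITY_FILE = {"paris (france)": "https://example.org/authority/place/1"}
LOCAL_VOCABULARY = {"city maps": "https://example.org/local/term/7"}


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(text.lower().split())


def to_linked(record: dict) -> tuple[dict, list]:
    """Replace string values with identifiers where a match exists.

    Unmatched strings are kept as plain labels and reported for review,
    so nothing from the original record is lost.
    """
    linked, unmatched = {}, []
    for field_name, values in record.items():
        converted = []
        for value in values:
            key = normalize(value)
            uri = AUTHORITY_FILE.get(key) or LOCAL_VOCABULARY.get(key)
            if uri:
                converted.append({"id": uri, "label": value})
            else:
                converted.append({"label": value})
                unmatched.append((field_name, value))
        linked[field_name] = converted
    return linked, unmatched


record = {"subject": ["Paris (France)", "City  maps", "Unknown term"]}
linked, unmatched = to_linked(record)
print(linked["subject"][0])
print(unmatched)
```

Keeping the original string as a label next to the identifier means the conversion can be reviewed or reversed, and the list of unmatched values shows where authority work is still needed.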

FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections

RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry “Data Management and Curation in 21st Century Archives” (Sept. 2015)100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
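The readme.txt advice quoted above can be illustrated with a small sketch. The field names come from the quoted passage; the layout of the file, the `build_readme` helper, and the “[not provided]” placeholder are assumptions, not a format recommended by the report.

```python
# Field names taken from the passage quoted above; their order is arbitrary.
FIELDS = [
    "title", "author", "date", "funder", "copyright holder", "description",
    "keywords", "observation unit", "kind of data", "type of data",
    "language", "file names", "file formats and software used",
]


def build_readme(story: str, metadata: dict) -> tuple[str, list]:
    """Return readme text plus the names of fields the researcher left empty."""
    missing = [name for name in FIELDS if not metadata.get(name)]
    lines = [
        "PROJECT README",
        "",
        "How the data were gathered and used, and for what purpose:",
        story,
        "",
        "Metadata:",
    ]
    for name in FIELDS:
        lines.append(f"{name}: {metadata.get(name) or '[not provided]'}")
    return "\n".join(lines) + "\n", missing


text, missing = build_readme(
    "Survey responses collected in 2019 to study reading habits.",  # invented example
    {"title": "Example dataset", "author": "A. Researcher"},
)
print(text)
print("Still needed:", ", ".join(missing))
# To save the result: pathlib.Path("readme.txt").write_text(text, encoding="utf-8")
```

Reporting the missing fields separately gives a metadata consultant a short list to raise with the researcher without rejecting the deposit outright.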

All four of the 2017-2018 The Realities of Research Data Management series webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment: beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105

National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
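Creator identifiers can be checked for well-formedness before they are stored with a dataset. The sketch below validates the final check character of an ORCID iD using the ISO 7064 MOD 11-2 algorithm that ORCID documents for its identifiers; the function names are illustrative, and the check confirms only that an iD is well formed, not that it is registered or belongs to the named person. The first iD tested is the author’s, as printed on this report’s title page.

```python
def mod_11_2_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True when the iD has 16 characters and a correct check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return mod_11_2_check_character(compact[:15]) == compact[15]


print(is_well_formed_orcid("0000-0002-8757-2962"))  # iD from the title page
print(is_well_formed_orcid("0000-0002-8757-2963"))  # last character altered
```

A check like this catches transcription errors at the point of entry, which is cheaper than reconciling a mistyped identifier after the metadata has been shared with aggregators.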

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians that may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty's research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of "Metadata as a Service"

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as "foster discovery and use," "enrich the user experience," and "explore new ways to support the whole life cycle of scholarship," all of which is predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers' publications with what the library is not purchasing; and improving relevancy ranking, personalizing search results, offering recommendation services in the discovery layer, and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students' use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata's value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide "metadata as a service": consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for "metadata outreach and consultation": "Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community." Georgia Tech is recruiting a metadata librarian who will "serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects."122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC's integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis of collections and a view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution's bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125
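
As a rough illustration of the kind of enrichment described above, the following Python sketch fills in missing language codes from related records. It is a minimal sketch under stated assumptions, not a method reported by Focus Group members: the record layout, the field names (`cluster_id`, `lang`, `lang_inferred`), and the rule that a code is inferred only when all related records agree are hypothetical choices made for the example.

```python
from collections import Counter, defaultdict


def fill_missing_language_codes(records):
    """Return copies of the records with empty 'lang' values filled in.

    Assumption for this sketch: records that share a 'cluster_id' are
    "related". A missing code is filled only when every related record
    that carries a code agrees on the same one.
    """
    # Collect the language codes already present in each cluster.
    codes_by_cluster = defaultdict(Counter)
    for rec in records:
        if rec.get("lang"):
            codes_by_cluster[rec["cluster_id"]][rec["lang"]] += 1

    enriched = []
    for rec in records:
        new_rec = dict(rec)  # leave the input records untouched
        codes = codes_by_cluster[rec["cluster_id"]]
        if not rec.get("lang") and len(codes) == 1:
            new_rec["lang"] = next(iter(codes))
            new_rec["lang_inferred"] = True  # flag the value for human review
        enriched.append(new_rec)
    return enriched


# Hypothetical sample data.
records = [
    {"id": "r1", "cluster_id": "c1", "lang": "fre"},
    {"id": "r2", "cluster_id": "c1", "lang": ""},
    {"id": "r3", "cluster_id": "c2", "lang": "eng"},
    {"id": "r4", "cluster_id": "c2", "lang": "ger"},
    {"id": "r5", "cluster_id": "c2", "lang": ""},
]
enriched = fill_missing_language_codes(records)
```

Flagging inferred values rather than silently overwriting them keeps the provenance of each code visible, so a cataloger can review the results before they are loaded back into a catalog.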

Visualizations represent another type of metadata service. A striking example is from the Austlang national codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries: a national code-a-thon to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom's second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette's publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette's River of Authors.)127

FIGURE 9 UK Hachette's "River of Authors" generated from the British Library's catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty's representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one "global domain," metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an "international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums."133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla's OCLC Research 2019 position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
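
To make the idea of matching and disambiguating names "based on other metadata available" more concrete, here is a deliberately simple Python sketch. It is not machine learning and not a tool used by Focus Group members; it only illustrates the general pattern of combining string similarity with a second piece of evidence. The record fields (`name`, `birth_year`), the 0.85 threshold, and the function names are assumptions made for the example.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name and strip commas, periods, and extra spaces."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def name_similarity(a, b):
    """Similarity ratio between two names, ignoring word order.

    Tokens are sorted so that "Smith, Jane" and "Jane Smith" compare equal.
    """
    tokens_a = " ".join(sorted(normalize(a).split()))
    tokens_b = " ".join(sorted(normalize(b).split()))
    return SequenceMatcher(None, tokens_a, tokens_b).ratio()


def likely_same_person(rec_a, rec_b, threshold=0.85):
    """Treat two records as a candidate match if the names are similar
    and the birth years, when both are known, do not conflict."""
    if name_similarity(rec_a["name"], rec_b["name"]) < threshold:
        return False
    year_a = rec_a.get("birth_year")
    year_b = rec_b.get("birth_year")
    if year_a is not None and year_b is not None and year_a != year_b:
        return False  # same name, conflicting dates: keep them apart
    return True


# Hypothetical sample records.
a = {"name": "Smith, Jane", "birth_year": 1950}
b = {"name": "Jane Smith", "birth_year": 1950}
c = {"name": "Jane Smith", "birth_year": 1982}
d = {"name": "Robert Jones"}
```

In practice, a sketch like this would only propose candidate matches for a metadata specialist to confirm; the conflicting-date check shows how even one extra field can separate two people who share a name.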

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who "trail-blaze innovations," which are then routinized for nonprofessionals. These discussions reinforce Padilla's recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance between allocating staff to "traditional cataloging activities" (such as original and copy cataloging and authority work) and to more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual's productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just "production machines."

Metadata managers, faced with staff reductions while still being expected to maintain production levels, must justify allocating staff time for R&D (or "play time") to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the "easier stuff" while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists to change their mindsets about how they work and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became "lively" because she brought her colleagues along. It created appreciation for "continuous learning," and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch "reading clubs," where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group "video-viewing brown-bag lunches" for staff on new developments such as linked data, so staff can "watch and learn" together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists' collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library's control. Some noted that "cultural differences" exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more "schematic." Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The "holy grail" is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members' experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the "technical services mindset."

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit's developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.141
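
For readers unfamiliar with what a small batch-processing script looks like, the following Python sketch de-duplicates records within a file and merges duplicates into one record. It does not reproduce how MarcEdit or OpenRefine work; it is a generic illustration under stated assumptions. Records are plain dictionaries rather than MARC, and the match key (a normalized ISBN, falling back to normalized title plus year) and the field names are hypothetical choices for the example.

```python
import re


def match_key(record):
    """Build a key for spotting duplicates.

    Assumption for this sketch: two records are duplicates if they share
    a normalized ISBN or, lacking an ISBN, a normalized title and year.
    """
    isbn = re.sub(r"[^0-9Xx]", "", record.get("isbn", "")).upper()
    if isbn:
        return ("isbn", isbn)
    title = re.sub(r"[^a-z0-9 ]", "", record.get("title", "").lower())
    title = " ".join(title.split())  # collapse repeated spaces
    return ("title", title, record.get("year"))


def deduplicate(records):
    """Keep the first record for each key and merge in missing fields."""
    seen = {}
    for rec in records:
        key = match_key(rec)
        if key not in seen:
            seen[key] = dict(rec)
        else:
            # Merge: fill only the fields the retained record lacks.
            for field, value in rec.items():
                if not seen[key].get(field) and value:
                    seen[key][field] = value
    return list(seen.values())


# Sample batch; the second and fourth entries duplicate the first and third.
batch = [
    {"isbn": "978-1-55653-167-5",
     "title": "Transitioning to the Next Generation of Metadata",
     "year": 2020},
    {"isbn": "9781556531675",
     "title": "Transitioning to the next generation of metadata",
     "year": 2020, "publisher": "OCLC Research"},
    {"title": "A Sample Title!", "year": 2001},
    {"title": "a sample  title", "year": 2001},
]
unique = deduplicate(batch)
```

The design choice worth noting is that the merge never overwrites an existing value, so the first record in the file acts as the preferred source and later duplicates only contribute fields it is missing.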

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers' needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA's New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, "We bring order out of a vacuum."142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C's Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: "Don't wait, iterate." In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC's DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of "traditional" work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in the service of library service helps catalogers "do more with less."

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations, such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond "traditional" cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution's collections. Focus Group members' linked data activities, including their participation in OCLC Research's Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research's linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for "works."

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as "statements" associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of "metadata as a service," and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. "The OCLC Research Library Partnership." https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura. 2017. "Metadata Advocacy." Hanging Together: the OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library's Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. "Program for Cooperative Cataloging." https://www.loc.gov/aba/pcc/.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. "What Metadata Managers Expect from and Value about the Research Library Partnership." Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. "Linked Data." International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. "CONTENTdm Linked Data pilot." https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. "WorldCat® OCLC and Linked Data." Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. "BIBFRAME." Bibliographic Framework Initiative. https://www.loc.gov/bibframe/.

Futornick, Michelle. 2019. "LD4P2: Linked Data for Production: Pathway to Implementation." LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). "An Effective Environment for the Use of Linked Data by Libraries." Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13 Library of Congress 2019 PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices 2019 PCC Task Group on Linked Data Best Practices Final Report Submitted to PCC Policy Committee 12 September 2019 Washington DC Library of Congress httpswwwlocgovabapcctaskgrouplinked-data-best-practices-final-reportpdf

Library of Congress 2018 “Charge for PCC Task Group on Identity Management in NACO” 5 American Bar Association Program for Cooperative Cataloging revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf

Library of Congress 2020 ldquoPCC Task Group on URIs in MARCrdquo Programs of the PCC Charge Accessed 19 September 2020 httpswwwlocgovabapccbibframeTaskGroupsURI-TaskGrouphtml

Library of Congress 2018 ldquoPCC Linked Data Advisory Committee Linked Data Advisory Committee Chargerdquo PCC Task Groups 2018 Task Groups Revised 24 July 2018 [Word doc 28KB] httpswwwlocgovabapcctaskgrouptask-groupshtml

14 Smith-Yoshimura Karen 2015 ldquoShift to Linked Data for Productionrdquo OCLC Research Hanging Together Blog 13 May 2015 httpshangingtogetherorgp=5195

15 OCLC Research 2020 “Linked Data” Linked Data Overview httpswwwoclcorgresearchareasdata-sciencelinkeddatalinked-data-overviewhtml [All figures CC BY 4.0]

16 Smith-Yoshimura Karen 2019 “‘Future Proofing’ of Cataloging” OCLC Research Hanging Together Blog 10 November 2019 httpshangingtogetherorgp=7526

17 ORCID Connecting Research and Researchers ldquoWhat is Orcidrdquo Our Vision Accessed 19 September 2020 httpsorcidorgaboutwhat-is-orcidmission

18 See for example the list of signatories of journal publishers requiring ORCID IDs for authors ORCID ldquoORCID Open Letter - Publishersrdquo Accessed 19 September 2020 httpsorcidorgcontentrequiring-orcid-publication-workflows-open-letter

19 ISNI ldquoWhat is ISNIrdquo Accessed 19 September 2020 httpsisniorgpagewhat-is-isni

20 HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library “Welcome to HathiTrust” Accessed 19 September 2020 httpswwwhathitrustorgabout

21 GeoNames ldquoBrowse the Namesrdquo Accessed 19 September 2020 httpswwwgeonamesorg

22 Bryant Rebecca Annette Dortmund and Constance Malpas 2017 Convenience and Compliance Case Studies on Persistent Identifiers in European Research Information Dublin OH OCLC Research httpsdoiorg1025333C32K7M

23 ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI “Key Statistics” Accessed 5 May 2020 httpsisniorg

24 Library of Congress 2020 “NACO – Name Authority Cooperative Program” Documents and Updates Programs for Cataloging and Acquisitions (PCC) Accessed 19 September 2020 httpwwwlocgovabapccnacoindexhtml

25 Smith-Yoshimura Karen 2015 ldquoGetting identifiers Created for Legacy Namesrdquo Hanging Together The OCLC Research Blog 30 October 2015 httpshangingtogetherorgp=5463

26 Smith-Yoshimura Karen 2013 “Irreconcilable Differences Name Authority Control & Humanities Scholarship” Hanging Together The OCLC Research Blog 27 March 2013 httpshangingtogetherorgp=2621

27 Smith-Yoshimura Karen 2017 ldquoUse Cases for Local Identifiersrdquo Hanging Together The OCLC Research Blog 5 May 2017 httpshangingtogetherorgp=5938

28 OCLC Research 2020 ldquoRegistering Researchers in Authority Filesrdquo httpswwwoclcorgresearchthemesresearch-collectionsregistering-researchershtml

29 Smith-Yoshimura Karen Janifer Gatenby Grace Agnew Christopher Brown Kate Byrne Matt Carruthers Peter Fletcher Stephen Hearn Xiaoli Li Marina Muilwijk Chew Chiat Naun John Riemer Roderick Sadler Jing Wang Glen Wiley and Kayla Willey 2016 Addressing the Challenges with Organizational Identifiers and ISNI Dublin Ohio OCLC Research httpsdoiorg1025333C3FC9Q

30 Research Organization Registry (ROR) ldquoAboutrdquo httpsrororgabout

31 V M Abazov B Abbott B S Acharya M Adams T Adams J P Agnew G D Alexeev et al (2014) 2020 ldquoPrecision Measurement of the Top-Quark Mass in Lepton+jets Final Statesrdquo (Archived 24 February 2020) ArXivorg 150107912 httpsarxivorgpdf14051756

32 Smith-Yoshimura Karen 2017 ldquoHow Much Metadata Is Practicalrdquo Hanging Together The OCLC Research Blog 14 November 2017 httpshangingtogetherorgp=6328

33 University of Minnesota 2020 “ExpertsMinnesota” Find Profiles httpsexpertsumneduenpersons or University of Illinois at Urbana-Champaign 2020 “Illinois Experts” Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34 The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID 2019 “ORCID iD Required for Some Encouraged for All” NIAID Funding News Last reviewed 7 August 2019 httpswwwniaidnihgovgrants-contractsorcid-id-required-some-encouraged-all

See also Lyrasis 2020 “SciENcv and ORCID to Streamline NIH and NSF Grant Applications” LyrasisNow (blog) 8 April 2020 httpslyrasisnoworgsciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35 Smith-Yoshimura Karen 2016 ldquoMetadata Reconciliationrdquo Hanging Together The OCLC Research Blog 28 September 2016 httpshangingtogetherorgp=5710

36 Carruthers Matt (2014) 2020 mcarruthersLCNAF-Named-Entity-Reconciliation GitHub Repository httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37 Deliot Corine Steven Folsom Myung-Ja Han Nancy Lorimer Terry Reese and Adam Schiff 2019 Formulating and Obtaining URIs A Guide to Commonly used Vocabularies and Reference Sources Library of Congress PCC Task Group on URIs in MARC httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38 Smith-Yoshimura Karen 2019 “New Ways of Using and Enhancing Cataloging and Authority Records” Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

39 Smith-Yoshimura Karen 2015 ldquoPersistent Identifiers for Local Collectionsrdquo Hanging Together The OCLC Research Blog 27 October 2015 httpshangingtogetherorgp=5445

40 See DOI examples in detail from DOI 2020 “DOI System Examples” Accessed 20 September 2020 httpswwwdoiorgdemoshtml and see ARK examples in detail from Department Dallas (Tex.) Police 1963 “[Photographs of Identification Cards]” Collection University of North Texas The Portal to Texas History digital repository httpstexashistoryunteduark67531metapth346793

41 DataCite ldquoAssign DOIsrdquo httpsdataciteorgdoishtml

Wilkinson Laura J 2020 “Constructing your DOIs” Crossref The Crossref Curriculum Last updated 8 April 2020 httpswwwcrossreforgeducationmember-setupconstructing-your-dois

42 “Identity management” here reflects its usage among metadata specialists. (See for example Library of Congress 2018 “Charge for PCC Task Group on Identity Management in NACO” 5 American Bar Association Program for Cooperative Cataloging Revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience, for example identity access management as described in Wikiwand “Identity Management” httpswwwwikiwandcomenIdentity_management

43 Smith-Yoshimura Karen 2018 ldquoThe Coverage of Identity Management Workrdquo Hanging Together The OCLC Research Blog 8 October 2018 httpshangingtogetherorgp=6805

44 Smith-Yoshimura Karen 2017 “Beyond the Authorized Access Point” Hanging Together The OCLC Research Blog 10 October 2017 httpshangingtogetherorgp=6271

45 Smith-Yoshimura ldquoCoverage of Identity Managementrdquo (See note 43)

46 Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez 2018 ldquoWorks in Progress Webinar Introduction to Wikidata for Librarians Structuring Wikipedia and Beyondrdquo Produced by OCLC Research 12 June 2018 MP4 video presentation 1151 httpswwwoclcorgresearchevents201806-12html

47 Smith-Yoshimura Karen 2020 “Experimentations with Wikidata/Wikibase” Hanging Together The OCLC Research Blog 18 June 2020 httpshangingtogetherorgp=8002

48 Wikimedia ldquoWikiCiterdquo Home httpsmetawikimediaorgwikiWikiCite

49 Smith-Yoshimura Karen 2016 “Impact of Identifiers on Authority Workflows” Hanging Together The OCLC Research Blog 22 March 2016 httpshangingtogetherorgp=5603

50 Smith-Yoshimura Karen 2019 “Strategies for Alternate Subject Headings and Maintaining Subject Headings” Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

51 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo httpswwwoclcorgenfasthtml

52 Smith-Yoshimura Karen 2016 ldquoFaceted Vocabulariesrdquo Hanging Together The OCLC Research Blog 31 October 2016 httpshangingtogetherorgp=5739

53 OCLC 2020 ldquoFASTrdquo (See note 51)

54 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo Heading 3 FAST Policy and Outreach (FPOC) Committee httpswwwoclcorgenfasthtml

55 Smith-Yoshimura Karen 2017 ldquoVocabulary Control Data in Discovery Environmentsrdquo Hanging Together The OCLC Research Blog 5 October 2017 httpshangingtogetherorgp=6264

56 National Library New Zealand Government ldquoNgā Upoko Tukutuku Māori Subject Headingsrdquo httpmshupokonatlibgovtnzmshupoko

AIATSIS Pathways Gateway to the AIATSIS Thesauri ldquoPathwaysrdquo httpwww1aiatsisgovau

57 Deutsche Nationalbibliothek 2019 ldquoMACS - Multilingual Access to Subjectsrdquo (Archived 13 Jan 2019) httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58 Smith-Yoshimura Karen 2019 “Knowledge Organization Systems” Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

59 Synaptica “Ontology Management – Graphite” httpswwwsynapticacomgraphite

60 Smith-Yoshimura Karen 2018 ldquoAre Distributed Models for Vocabulary Maintenance Viablerdquo Hanging Together The OCLC Research Blog 12 April 2018 httpshangingtogetherorgp=6672

61 OCLC Research 2020 “Equity Diversity and Inclusion in the OCLC Research Library Partnership Survey” Overview Accessed 20 September 2020 httpswwwoclcorgresearchareascommunity-catalystsrlp-edihtml

62 Smith-Yoshimura Karen 2018 ldquoCreating Metadata for Equity Diversity and Inclusionrdquo Hanging Together The OCLC Research Blog 7 November 2018 httpshangingtogetherorgp=6833

63 Smith-Yoshimura ldquoDistributed Modelsrdquo (See note 60)

64 Smith-Yoshimura Karen 2019 ldquoStrategies for Alternate Subject Headings and Maintaining Subject Headingsrdquo Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

65 Baxmeyer Jennifer Karen Coyle Joanna Dyla MJ Han Steven Folsom Phil Schreur and Tim Thompson 2017 Linked Data Infrastructure Models Areas of Focus for PCC Strategies Library of Congress PCC Linked Data Advisory Committee httpswwwlocgovabapccdocumentsLinkedDataInfrastructureModelspdf

66 Bone Christine Sharon Farnel Sheila Laroque and Brett Lougheed 2017 “Works in Progress Webinar Decolonizing Descriptions Finding Naming and Changing the Relationship between Indigenous People Libraries and Archives” Produced by OCLC Research 19 October 2017 MP4 video presentation 543500 httpswwwoclcorgresearchevents201710-19html

67 Smith-Yoshimura Karen 2015 ldquoShift to Linked Data for Productionrdquo Hanging Together The OCLC Research Blog 13 May 2015 httpshangingtogetherorgp=5195

68 Smith-Yoshimura Karen 2015 ldquoWorking in Shared Filesrdquo Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

69 Bruce Washburn and Jeff Mixter 2018 ldquoWorks in Progress Webinar Looking Inside the Library Knowledge Vaultrdquo Produced by OCLC Research 12 August 2018 MP4 video presentation 574500 httpswwwoclcorgresearchevents201508-12html

70 Smith-Yoshimura Karen 2019 “Systematic Reviews of Our Metadata” Hanging Together The OCLC Research Blog 10 April 2019 httpshangingtogetherorgp=7117

71 Smith-Yoshimura Karen 2015 “Working in Shared Files” Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

72 Jisc Library Services nd “What Is ‘Plan M’” Accessed 21 September 2020 httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

To learn more about the current phase of “Plan M” (May–November 2020) see Grindley Neil “Moving Plan M Forwards – We Need Your Help” Library Services (PlanM) (blog) Jisc 6 May 2020 httpslibraryservicesjiscinvolveorgwp202005planm_nextphase

73 Grindley Neil 2019 “Plan M Definition Principles and Direction” Jisc (Word docx) httplibraryservicesjiscinvolveorgwpfiles201912Plan-M-Definition-and-Direction-1docx

74 Dempsey Lorcan 2016 ldquoLibrary Collections in the Life of the User Two Directionsrdquo LIBER Quarterly 26(4) 338ndash359 httpdoiorg1018352lq10170

75 Smith-Yoshimura Karen 2019 “Presenting Metadata from Different Sources in Discovery Layers” Hanging Together The OCLC Research Blog 16 April 2019 httpshangingtogetherorgp=7880

76 Smith-Yoshimura Karen 2017 ldquoMetadata for Archival Collectionsrdquo Hanging Together The OCLC Research Blog 30 May 2017 httpshangingtogetherorgp=5903

77 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Knudson Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Craig Thomas and Holly Tomren 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage 49-51 Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

78 The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at httpswwwoclcorgresearchpartnershipworking-groupsarchives-special-collections-linked-data-reviewhtml

79 Smith-Yoshimura Karen 2020 ldquoMetadata Management in Times of Uncertaintyrdquo Hanging Together The OCLC Research Blog 15 June 2020 httpshangingtogetherorgp=7998

80 Smith-Yoshimura Karen 2016 ldquoMetadata for Archived Websitesrdquo Hanging Together The OCLC Research Blog 14 March 2016 httpshangingtogetherorgp=5591

81 Archive-It 2008 ldquoHuman Rightsrdquo Columbia University Libraries Collection (Archived May 2008) httpsarchive-itorgcollections1068

Archive-It 2010 ldquoNew York City Places and Spacesrdquo Columbia University Libraries Collection (Archived January 2010) httpsarchive-itorgcollections1757

Archive-It 2010 ldquoBurke Library New York City Religionsrdquo Columbia University Libraries Collection (Archived May 2010) httpsarchive-itorgcollections1945

82 NLA ldquoTroverdquo Archived Websites Sub Collections Accessed 20 September 2020 httpstrovenlagovauwebsite

83 Archive-It 2014 ldquoCollaborative Architecture Urbanism and Sustainability Web Archive (CAUSEWAY)rdquo Ivy Plus Libraries Confederation Collection (Archived June 2014) httpsarchive-itorgcollections4638

Archive-It 2013 ldquoContemporary Composers Web Archive (CCWA)rdquo Ivy Plus Libraries Confederation Collection (Archived October 2013) httpsarchive-itorgcollections4019

NYARC New York Art Resources Consortium “Web Archiving” httpwwwnyarcorgcontentweb-archiving

84 OCLC Research 2020 “Web Archiving Metadata Working Group” The Problem Addressing the Problem Outputs httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 ldquoMetadata for Audio and Videosrdquo Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress ldquoStandardsrdquo Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress ldquoStandardsrdquo Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 ldquoAssessing Needs of AV in Special Collectionsrdquo Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 “Scale & Risk Discussing Challenges to Managing AV Collections in the RLP” Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 ldquoManaging Metadata for Image Collectionsrdquo Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress ldquoStandardsrdquo Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94 Library of Congress “Standards” Metadata Authority Description Schema (MADS) Accessed 21 September 2020 httpwwwlocgovstandardsmads

95 Smith-Yoshimura Karen 2016 ldquoSharing Digital Collections Workflowsrdquo Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 ldquoEuropeana Innovation Pilotsrdquo Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the World’s Images “Home” Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 “OCLC ResearchWorks IIIF Explorer” httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 ldquoCONTENTdm Linked Data Pilotrdquo Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

100 Smith-Yoshimura Karen 2015 “Data Management and Curation in 21st Century Archives – Part 1” 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 ldquoMetadata for Research Data Managementrdquo Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105 Faniel Ixchel M 2019 “Let’s Cook Up Some Metadata Consistency” Next (blog) OCLC 21 November 2019 httpwwwoclcorgblogmainlets-cook-up-some-metadata-consistency

106 NCI (National Computational Infrastructure) Australia ldquoHomerdquo Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) ldquoHomerdquo Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network ldquoHomerdquo Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a “collaboration advocating richer connected reusable open metadata for all research outputs” (httpwwwmetadata2020org). The Metadata 2020 Researcher Communications project is outlined here: httpwwwmetadata2020orgprojectsresearcher-communications

109 Digital Curation Centre ldquoDisciplinary Metadatardquo List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory ldquoMetadata Standards Directory Working Grouprdquo GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI “CRediT – Contributor Roles Taxonomy” Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 “Data Services” httpwwwlibumicheduresearch-data-services

113 OCLC Research 2020 “The Realities of Research Data Management” Overview httpswwwoclcorgresearchpublications2017oclcresearch-research-data-managementhtml

114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115 Indiana University 2018 “IU will Lead $2 Million Partnership to Expand Access to Research Data IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries” News at IU (Science and Technology) Indiana University 18 October 2018 httpsnewsiuedustories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft 2020 ldquoMicrosoft Academic Graphrdquo Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg Jamie and Valentin Pentchev “Works in Progress Webinar Democratizing Access to Large Datasets through Shared Infrastructure” Produced by OCLC Research 8 August 2019 MP4 video presentation 583400 httpswwwoclcorgresearchevents2019080819-democratizing-access-large-datasets-shared-infrastructurehtml

116 NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy Definitions and Recognition Badging Scheme Working Group | NISO Website” nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 ldquoServices Built on Usage Metricsrdquo Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 “Stacks Serials Search Engines and Students’ Success First-Year Undergraduate Students’ Library Use Academic Achievement and Retention” Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc ldquoLibrary Impact Data Project (LIDP)rdquo Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 ldquoMetadata Librarian Cornell University Libraryrdquo DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 “Metadata Librarian” Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-librarymetadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 ldquoLocate Items in the Library with StackMaprdquo httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 ldquoHomerdquo httpswwwyewnocom

125 Smith-Yoshimura Karen 2019 ldquoNew Ways of Using and Enhancing Cataloging and Authority Recordsrdquo Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) ldquoAustlang National Codeathonrdquo Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA ldquoTroverdquo Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company ndash Hachette UK ldquoRiver of Authorsrdquo Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 ldquoKnowledge Organization Systemsrdquo Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 ldquoKnowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Reviewrdquo International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) ldquoAbout SNACrdquo What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives & Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM ldquoAboutrdquo Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 “New Skill Sets for Metadata Management” Hanging Together The OCLC Research Blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 ldquoMarcEdit and Other Tools for Batch Processing and Metadata Reconciliationrdquo Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646

139 Reese Terry 2018 “MarcEdit 2017 Usage Information” Terry’s Worklog (blog) 9 September 2020 httpblogreesetnetarchives2572

140 Reese Terry 2020 “Working with Linked Data In MarcEdit” MarcEdit Development (blog) Accessed 21 September 2020 httpsmarceditreesetnetworking-with-linked-data-in-marcedit

141 Reese Terry 2018 ldquoMarcEdit Playlistrdquo 139 YouTube videos Last updated 26 December 2018 httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura Karen 2017 “New Skill Sets for Metadata Management” Hanging Together The OCLC Research Blog 17 April 2017 httpshangingtogetherorgp=5929

143 “XML and RDF-Based Systems Archives” nd Library Juice Academy (blog) Accessed 22 September 2020 httpslibraryjuiceacademycomcertificatexml-and-rdf-based-systems

Reese Terry 2013 ldquoTutorialsrdquo YouTube (selected) MarcEdit Development (blog) 14 March 2013 httpmarceditreesetnettutorials

ldquoLynda Online Courses Classes Training Tutorialsrdquo nd Lyndacom - from LinkedIn Learning Accessed 22 September 2020 httpswwwlyndacom

ldquoLearn to Code - for Freerdquo nd Codecademy Accessed 22 September 2020 httpswwwcodecademycom

Software Carpentry ldquoTeaching Basic Lab Skills for Research Computingrdquo Upcoming Workshops Accessed 22 September 2020 httpssoftware-carpentryorg

144 ldquoData on the Web Best Practicesrdquo nd Accessed 22 September 2020 httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist (2008) 2020 httpworkingontologistorg

145 Library Workflow Exchange nd ldquoAboutrdquo Accessed 21 September 2020 httpwwwlibraryworkflowexchangeorgabout

146 OCLC Developer Network 2020 “DevConnect Webinars” httpswwwoclcorgdevelopereventsdevconnect-workshopsenhtml

147 Smith-Yoshimura Karen 2019 ldquoStewardship of Professional FTEs In Metadata Work and Turnoverrdquo Hanging Together The OCLC Research Blog 18 October 2019 httpshangingtogetherorgp=7580

148 OCLC 2020 “WorldCat® OCLC and Linked Data” Shared Entity Management Infrastructure httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395 T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009

F I G U R E S

FIGURE 1 “Changing Resource Description Workflows” by OCLC Research 4

FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters 6

FIGURE 3 Examples of some DOI and ARK identifiers 8

FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages 9

FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership 13

FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections 19

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections 21

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon 26

FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata 27

EXECUTIVE SUMMARY

The OCLC Research Library Partners Metadata Managers Focus Group, first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP), a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

This report, Transitioning to the Next Generation of Metadata, synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions.

Yet metadata is changing. Format-specific metadata management, based on curated text strings in bibliographic records understood only by library systems, is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. This report traces how metadata is evolving and considers the impact this transition may have on library services, posing such questions as:

• Why is metadata changing?

• How is the creation process changing?

• How is the metadata itself changing?

• What impact will these changes have on future staffing requirements, and how can libraries prepare?

The future of linked data is tied to the future of metadata: the metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for future linked data innovations as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

INTRODUCTION

The OCLC Research Library Partners Metadata Managers Focus Group (hereafter referenced as the Focus Group),1 first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP),2 a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions. Metadata provides the research infrastructure necessary for all libraries’ “value delivery systems,” fulfilling their community’s requests for information and resources. Metadata is crucial for transitioning to next generations of library and discovery systems. Good metadata created today can easily be reused in a linked data environment in the future.3 As noted in the British Library’s Foundations for the Future, “Our vision is that by 2023 the Library’s collection metadata assets will be unified on a single, sustainable, standards-based infrastructure offering improved options for access, collaboration and open reuse.”4

Format-specific metadata management, based on curated text strings in bibliographic records understood only by library systems, is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. “Traditional methods of metadata generation, management and dissemination,” suggests the British Library’s Collection Management Strategy, “are not scalable or appropriate to an era of rapid digital change, rising audience expectations and diminishing resources.”5 Focus Group members are eager to unleash the power of metadata in legacy records for different interactions and uses by both machines and end-users in the future. Consistent metadata created according to past rules or standards needs to be transformed into new structures.

Why is metadata changing?

Traditional library metadata was, and is, made by librarians conforming to rules that are mainly used and understood by librarians. It is record-centered, expensive to produce, and has historic size limitations. Metadata is limited in its coverage, notably not including articles within scholarly journals or other scholarly outputs. The infrastructure has been inadequate for managing corrections and enhancements, inducing an emphasis on perfection that has exacerbated the slowness of metadata creation. In short, the metadata could be better, there is not enough of it, and the metadata that does exist is not used widely outside the library domain.

How is the creation process changing?

Metadata is no longer created by library staff alone. Today, publishers, authors, and other interested parties are equally involved in metadata creation. Metadata creation has also been pushed forward in the scholarly life cycle, with publishers creating metadata records earlier than in the traditional cataloging process. Metadata can now be enhanced or corrected by machines or by crowdsourcing.

How is the metadata itself changing?

Machine-readable cataloging (MARC) was created to replicate the metadata traditionally found on library catalog cards. We are transitioning from MARC records to assemblages of well-coded and shareable, linkable components, with an emphasis on references, and we are eliminating anachronistic abbreviations not understood by machines. Instead of relying only on library vocabularies, such as subject headings and coded lists, the developing assemblages can accommodate vocabularies created for specific domains, expanding the metadata’s potential audiences.

The Focus Group’s composition has fluctuated over time and currently comprises representatives from 63 RLP Partners in 11 countries spanning four continents.6 The group includes both past and incoming chairs of the Program for Cooperative Cataloging (PCC),7 providing cross-fertilization between the two. Topics for group discussions can be proposed by any Focus Group member and are selected by an eight-member Planning Group (see appendix), who then write “context statements” explaining why the topic is considered timely and important, and then develop question sets that delve into the topic. Context statements and question sets are then distributed to all Focus Group members, who are given three to five weeks to submit their responses. Compilations of the Focus Group’s responses inform face-to-face discussions held in conjunction with the American Library Association conferences8 and in subsequent virtual meetings.

As the Focus Group facilitator, I have summarized and synthesized these discussions in a series of OCLC Research Hanging Together Blog publications.9 Nearly 40 blog posts on a wide range of metadata-related topics have been published on this forum over the past six years.

The Metadata Managers Focus Group is just one activity within the broader OCLC Research Library Partnership, which is devoted to extensive professional development opportunities for library staff. Focus Group members value their affiliation with the Research Library Partnership as a channel to becoming the “change agents” of future metadata management.10 Focus Group members’ responses to question sets have facilitated intra-institutional discussions and helped metadata managers understand how their institutions’ situation compares with peers within the Partnership.

These Focus Group discussions identified a broad range of metadata-related issues, documented in this report. Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

Collectively, Focus Group members command a wide range of experiences with linked data. The Focus Group’s keen interest in linked data implementations sparked the series of OCLC Research’s International Linked Data Surveys for Implementers.11 A subset of Focus Group members have participated in various linked data projects, including the OCLC Research Project Passage and CONTENTdm Linked Data pilot; OCLC’s Shared Entity Management Infrastructure; Library of Congress’ Bibliographic Framework Initiative (BIBFRAME); the Mellon-funded Linked Data for Production (LD4P) project; the Share-VDE initiative; and the IMLS planning grant Shareable Local Name Authorities, which exposed issues raised by identifier hubs in the linked data environment.12 In addition, Focus Group members contribute to the PCC task groups addressing aspects of linked data work, including the PCC Task Group on Linked Data Best Practices, Task Group on Identity Management, Task Group on URIs in MARC, and the PCC Linked Data Advisory Committee.13 This cross-fertilization has prompted the Focus Group to examine issues around the entities represented in institutional resources.

This report synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The document is organized in the following sections, each representing an emerging trend identified in the Focus Group’s discussions:

• The transition to linked data and identifiers: expanding the use of persistent identifiers as part of the shift from “authority control” to “identity management”

• Describing the “inside-out” and “facilitated” collections: challenges in creating and managing metadata for unique resources created or curated by institutions in various formats and shared with consortia

• Evolution of “metadata as a service”: increased involvement with metadata creation beyond the traditional library catalog

• Preparing for future staffing requirements: the changing landscape calls for new skill sets needed by both new professionals entering the field and seasoned catalogers

The document concludes with some observations on the forecasted impact of the next generation of metadata on the wider library community.

The Transition to Linked Data and Identifiers

Linked data offers the ability to take advantage of structured data with an emphasis on context. It relies on language-neutral identifiers pointing to objects, with a focus on “things,” replacing the “strings” inherent in current authority and catalog records. These identifiers can then be connected to related data, vocabularies, and terms in other languages, disciplines, and domains, including nonlibrary domains. Linked data applications can consume others’ contributions and thus free metadata specialists from having to re-describe things already described elsewhere, allowing them instead to focus on providing access to their institutions’ unique and distinctive collections. This promises a richer user experience and increased discoverability, with more contextual relationships than is possible with our current systems. Furthermore, linked data offers an opportunity to go beyond the library domain by drawing on information about entities from diverse sources.14

FIGURE 1 “Changing Resource Description Workflows” by OCLC Research15

The hope is that linked data will allow libraries to offer new, value-added services that current models cannot support, that outside parties will be able to make better use of library resource descriptions, and that the data will be richer because more parties share in its creation. Moving to a linked data environment portends changes to resource description workflows, as shown in figure 1.

The drive to move metadata operations to linked data depends on the availability of tools, access to linked data sources for reuse, documented best practices on identifiers and the metadata descriptions associated with them (“statements”), and a critical mass of implementations on a network level.

EXPANDING THE USE OF PERSISTENT IDENTIFIERS

The Focus Group discussed the “future-proofing” of cataloging, which refers to the opportunities to unleash the power of metadata in legacy records for different interactions and uses in the future. Persistent identifiers were viewed as crucial to transitioning from current metadata to future applications.16 Identifiers, in the form of language-neutral alphanumeric strings, serve as a shorthand for assembling the elements required to uniquely describe an object or resource. They can be resolved over networks with specific protocols for finding, identifying, and using that object or resource. In the nonlibrary domain, Social Security and employee numbers are examples of such identifiers. In the library and academic domains, Focus Group members pointed to ORCID (Open Researcher and Contributor ID)17 as a “glue” that holds together the four arms of scholarly work: publishing, repository, library catalog, and researchers. ORCID, however, is limited to living researchers. It is increasingly used in STEM (science, technology, engineering, mathematics) journals for all authors and contributors,18 and is included in institutions’ Research Information Management systems. ISNI (International Standard Name Identifier)19 uniquely identifies persons and organizations involved in creative activities; it is used by libraries, publishers, databases, and rights management organizations, and it covers nonliving creators.

Persistent identifiers are used by parties such as Google and HathiTrust for service integration.20 More institutions are using geospatial coordinates in metadata, or URIs (Uniform Resource Identifiers) pointing to geospatial coordinates, that support API (Application Programming Interface) calls to GeoNames,21 enabling map visualizations. Research institutions are also adopting person identifiers such as ORCID to streamline the collection of the institutional research record, usually through a Research Information Management system, as documented in the 2017 OCLC Research Report Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information Management.22

While publishers serve as a key player in the metadata workstream, publisher data does not always meet library requirements. For example, publisher data for monographs usually does not include identifiers. The British Library is working with five UK publishers to add ISNIs23 to their metadata as a promising proof-of-concept for publishers and libraries working together earlier in the supply chain. The ability to batch load or algorithmically add identifiers in the future is on Focus Group members’ wish list.

No single person identifier covers all use cases. Researchers’ names have been only partially represented in national name authority files that identify persons, both living and dead. A sizable quantity of legacy names are represented only by text strings in bibliographic records. Authority records are created only by institutions involved in the PCC’s Name Authority Cooperative Program (NACO)24 or in national library programs. Even then, authority records are created selectively for certain headings, or sometimes only when references are involved. The LC/NACO name authority file contained only 30% of the total names reflected in WorldCat’s bibliographic record access points (9 million LC/NACO records compared to the 30 million total names reported on the WorldCat Identities project page as of 2012).25 By 2020, this percentage had decreased to 18%: 11 million LC/NACO authority records compared to 62 million in WorldCat Identities. These statistics illustrate that the number of names represented in bibliographic records is increasing more quickly than the number under authority control.
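The two coverage figures in the preceding paragraph (9 million of 30 million names in 2012; 11 million of 62 million in 2020) can be checked with a few lines of arithmetic. The sketch below is only a worked calculation on those four rounded counts; the function and variable names are introduced here for illustration.

```python
def coverage_percent(authority_records: float, total_names: float) -> float:
    """Share of names in bibliographic records that have an authority record."""
    return 100 * authority_records / total_names


# Rounded counts quoted in the text, in millions.
coverage_2012 = coverage_percent(9, 30)    # 30.0
coverage_2020 = coverage_percent(11, 62)   # roughly 17.7, reported as 18

# Growth of each population between the two dates.
authority_growth = 11 / 9   # authority records grew by roughly a fifth
names_growth = 62 / 30      # names in bibliographic records roughly doubled

print(f"2012 coverage: {coverage_2012:.0f}%")
print(f"2020 coverage: {coverage_2020:.0f}%")
```

On these numbers, the authority file grew while its share of names fell, because the count of names in bibliographic records grew considerably faster.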

Authority files focus on the “preferred form” of a name, which can vary depending on language, discipline, context, and time period. Scholars have objected to the very concept of a “preferred form,” as the name may be referred to differently depending on the context.26 When a name has multiple forms, historians need to know the provenance of each name, following the citation practices commonly used in their field. An identifier linked to different forms of names, each associated with the provenance and context, could resolve this conundrum.

Researcher names are just one example of a need unmet by current identifier systems. Institutions have been minting their own “local identifiers” to meet this need. Use cases for local identifiers include registering all researchers on campus; representing entities that are underrepresented in national authority files, such as authors of electronic dissertations and theses, performers, events, local place names, and campus buildings; identifying entities in digital library projects and institutional repositories; reflecting multilingual needs of the community; and supporting “housekeeping” tasks such as recording archival collection titles.27

Focus Group members’ consistent need to disambiguate names across disciplines and formats spurred the creation of the OCLC Research working group on Registering Researchers in Authority Files.28 The need to accurately record researchers’ institutional affiliations, in order to reflect the institution’s scholarly output, promote cross-institutional collaborations, and lead to more successful recruitment and funding, led to another working group on Addressing the Challenges with Organizational Identifiers and ISNI,29 which presented new data modeling of organizations that others could adapt for their own uses. Since then, the Research Organization Registry (ROR) was launched to develop an open, sustainable, usable, and unique identifier for every research organization in the world.30

Disambiguating names is the most labor-intensive part of authority work and will still be a prerequisite for assigning unique identifiers. Given the different name identifier systems already in use, libraries need a name reconciliation service. Authority work and algorithms based on text string matching have limits; the results will still need manual expert review. Tapping the expertise in user communities to verify if two identifiers represent the same person may help.

Disambiguation is particularly difficult for authors or contributors listed in journal articles, where names are often abbreviated and there may be dozens or even hundreds of contributors. For example, an article in Physical Review Letters, “Precision Measurement of the Top Quark Mass in Lepton + Jets Final State,” has approximately 300 abbreviated author names for a five-page article (figure 2).31 This exemplifies the different practices among disciplines. By contrast, other objects with many contributors, such as feature films and orchestral recordings, are usually represented by only a relative handful of the associated names in library legacy metadata.32 Such differences make creating metadata that is uniform, understandable, and widely reusable a challenge.

FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form, or to an identifier if it exists. Associating ORCIDs with article authors makes it easier to differentiate authors with the same abbreviated forms. Research Information Management (RIM) systems apply identity management for local researchers so that they are correctly associated with the articles they have written. Their articles are displayed as part of their profiles. (See, for example, Experts@Minnesota or University of Illinois at Urbana-Champaign’s Experts research profiles.)33
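Why abbreviation defeats string matching, and why a person identifier helps, can be shown with a small sketch. The researchers below are invented, and the identifiers are plain placeholders rather than real ORCID iDs; the abbreviation rule (initials plus family name) is an assumption chosen to mimic the byline style discussed above.

```python
from collections import defaultdict


def abbreviate(full_name: str) -> str:
    """Reduce 'Given Middle Family' to initials plus family name, e.g. 'J Smith'."""
    *given, family = full_name.split()
    initials = "".join(part[0].upper() for part in given)
    return f"{initials} {family}".strip()


# Hypothetical researchers; the "id" values are placeholders, not ORCID iDs.
researchers = [
    {"name": "Jane Smith", "id": "placeholder-id-1"},
    {"name": "John Smith", "id": "placeholder-id-2"},
    {"name": "Maria Lopez", "id": "placeholder-id-3"},
]

# Group the distinct people by the abbreviated string a byline would show.
by_abbreviation = defaultdict(list)
for person in researchers:
    by_abbreviation[abbreviate(person["name"])].append(person["id"])

# Abbreviated forms shared by more than one person cannot be matched by string alone.
ambiguous = {abbr: ids for abbr, ids in by_abbreviation.items() if len(ids) > 1}
print(ambiguous)
```

The abbreviated string collapses two people into one form, while the identifiers stay distinct; that is the sense in which an identifier attached to each author at publication time sidesteps the matching problem.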

[Figure 2 image: first page of arXiv:1405.1756v2 [hep-ex], 16 Jun 2014 (FERMILAB-PUB-14-123-E), “Precision measurement of the top-quark mass in lepton+jets final states,” showing the opening of the abbreviated author list.]

For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.34 Research Information Management Systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author’s ORCID. Linked data could access information across many environments, including those in Research Information Systems, but would require accurately linking multiple identifiers for the same person to each other.

Some Focus Group members are performing metadata reconciliation work, such as searching matching terms from linked data sources and adding their URIs in metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.35 Improving the quality of the data improves users’ experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC’s Virtual International Authority File (VIAF); the Library of Congress’s linked data service (id.loc.gov); ISNI; the Getty’s Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC’s Faceted Application of Subject Terminology (FAST); and various national authority files. Selection of the source depends on the trustworthiness of the organization responsible, the subject matter, and the richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed an LCNAF Named Entity Reconciliation program36 using Google’s Open Refine that searches the VIAF file with the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading and a link to the URI of the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.
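The reconciliation steps described above (search for a matching cluster, look for a Library of Congress source record inside it, extract the authorized heading, and pair it with the original heading and a URI) can be sketched as follows. This is not the University of Michigan program and it makes no calls to the VIAF API: the clusters are a hypothetical in-memory stand-in with invented headings and record numbers, and the matching rule is a deliberately simple string comparison. Only the id.loc.gov URI prefix reflects the real service.

```python
from typing import Dict, List, Optional

# Hypothetical, simplified stand-in for the clusters a VIAF search might return.
# Real VIAF responses are structured differently; every ID and heading is invented.
SAMPLE_CLUSTERS: List[dict] = [
    {
        "viaf_id": "viaf-0001",
        "sources": [
            {"source": "DNB", "heading": "Example, Ada", "record_id": "dnb-1"},
            {"source": "LC", "heading": "Example, Ada, 1900-1980", "record_id": "n00000001"},
        ],
    },
    {
        "viaf_id": "viaf-0002",
        "sources": [
            {"source": "BNF", "heading": "Sample, Bo", "record_id": "bnf-2"},
        ],
    },
]

# URI prefix of the Library of Congress name authority service (id.loc.gov).
LCNAF_URI_PREFIX = "http://id.loc.gov/authorities/names/"


def normalize(heading: str) -> str:
    """Lowercase a heading and collapse commas and whitespace for string matching."""
    return " ".join(heading.lower().replace(",", " ").split())


def find_clusters(heading: str, clusters: List[dict]) -> List[dict]:
    """Return clusters in which any source heading begins with the local heading."""
    key = normalize(heading)
    return [
        cluster
        for cluster in clusters
        if any(normalize(s["heading"]).startswith(key) for s in cluster["sources"])
    ]


def reconcile(heading: str, clusters: List[dict]) -> Optional[Dict[str, str]]:
    """Pair a local heading with the LC authorized heading and URI, if one is found."""
    for cluster in find_clusters(heading, clusters):
        for source in cluster["sources"]:
            if source["source"] == "LC":
                return {
                    "original_heading": heading,
                    "authorized_heading": source["heading"],
                    "lcnaf_uri": LCNAF_URI_PREFIX + source["record_id"],
                    "viaf_id": cluster["viaf_id"],
                }
    return None  # no LC record in any matching cluster: left for manual review


local_headings = ["Example, Ada", "Sample, Bo", "Unmatched, Cy"]
results = {h: reconcile(h, SAMPLE_CLUSTERS) for h in local_headings}
```

Keeping the cluster identifier alongside the LC URI illustrates the modification the text mentions: the same pass could emit the VIAF identifier instead. Headings that return nothing are exactly the cases that string matching leaves for expert review.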

A long list of nonlibrary sources that could enhance current authority data, or could be valuable to link to in certain contexts, has been identified. Wikidata and Wikipedia led the list. Other sources include AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, GeoNames, GoodReads, IMDb (Internet Movie Database), Internet Archive, Library Thing, LinkedIn, MusicBrainz, ONIX (ONline Information eXchange), Open Library, ORCID, and Scopus ID. The PCC Task Group on URIs in MARC’s document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources37 provides valuable guidance for collecting data from these other sources. Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than with current library systems.38

Identifiers for “works” represent a particular challenge, as there is no consensus on what represents a “distinctive work.”39 Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a “work” is could hamper the ability to reuse data created elsewhere, and they look to a central trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.

FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers40

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets, such as digital resources and collections in institutional repositories, include system-generated IDs, locally minted identifiers, PURL, handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.41 Ideally, libraries would contribute to a hub for the metadata describing their researchers’ data sets, regardless of where the data sets are stored.
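The requirement that an identifier be independent of where the object is stored can be illustrated by separating the persistent core of a DOI or ARK from the resolver address in front of it. The sketch below uses simplified patterns that are assumptions for illustration, not the full DOI or ARK syntax rules. The DOI shown is this report’s own; the ARK is a made-up example.

```python
import re
from typing import Tuple

# Simplified patterns (assumptions, not the complete specifications):
# a DOI core starts with "10.", a numeric registrant code, "/", then a suffix;
# an ARK core starts with "ark:", a name-assigning authority number, "/", then a name.
DOI_PATTERN = re.compile(r"(10\.\d{4,9}/\S+)$")
ARK_PATTERN = re.compile(r"(ark:/?[0-9a-z]+/\S+)$")


def classify_identifier(value: str) -> Tuple[str, str]:
    """Return (scheme, core), where core omits any resolver address."""
    value = value.strip()
    match = ARK_PATTERN.search(value)
    if match:
        return ("ark", match.group(1))
    match = DOI_PATTERN.search(value)
    if match:
        return ("doi", match.group(1))
    return ("unknown", value)


# The same DOI written three ways resolves to one persistent core.
doi_forms = [
    "https://doi.org/10.25333/rqgd-b343",
    "doi:10.25333/rqgd-b343",
    "10.25333/rqgd-b343",
]
doi_cores = {classify_identifier(form) for form in doi_forms}

# A made-up ARK behind a resolver address.
ark_result = classify_identifier("https://n2t.net/ark:/12345/fk1234")
```

Because the core is what persists, a citation that stores only the core survives a change of resolver or hosting repository. The sketch does not address the harder problem the text raises, in which several different cores point to the same object.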



MOVING FROM “AUTHORITY CONTROL” TO “IDENTITY MANAGEMENT”

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.42 The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from “managing text strings” as the basis of controlling headings in bibliographic records.43 But identity management poses a change in focus, from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from “authority control” and “authorized access points” in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.44 Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
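The separation of identifiers from labels described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical and for illustration only: the `Entity` class, the `ex:` identifiers, and the sample labels are assumptions, not data from Wikidata or from any library system.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An entity keyed by a stable identifier; labels are display data only."""

    identifier: str                                         # hypothetical persistent identifier
    labels: dict[str, str] = field(default_factory=dict)    # language tag -> label
    same_as: list[str] = field(default_factory=list)        # identifiers in other sources

    def label_for(self, preferred_languages: list[str]) -> str:
        """Return the first label matching the caller's language preferences.

        No label is globally "preferred"; each community supplies its own order.
        Falls back to the identifier itself when no label matches.
        """
        for lang in preferred_languages:
            if lang in self.labels:
                return self.labels[lang]
        return self.identifier


# Hypothetical sample data, invented for this sketch.
entity = Entity(
    identifier="ex:E1",
    labels={"en": "Example Person", "ja": "例の人物", "de": "Beispielperson"},
    same_as=["ex-authority:n0001", "ex-registry:0000-0000"],
)

print(entity.label_for(["ja", "en"]))  # a Japanese-language catalog's preference order
print(entity.label_for(["fr"]))        # no French label, so the identifier is shown
```

In this sketch the identifier is the only match point; which label is displayed is a local decision made at display time, which is one way to read "label preference becomes localized."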

[Figure 4 image: Wikidata identifier Q19526]


A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?45
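As a rough illustration of what reconciling multiple identifiers for the same entity could involve, the following Python sketch groups identifiers into clusters from pairwise "same as" assertions, using a standard union-find structure. The function name, the `ex-` identifier prefixes, and the sample pairs are all hypothetical; real reconciliation would also need to weigh the evidence behind each assertion.

```python
from collections import defaultdict


def cluster_identifiers(same_as_pairs):
    """Group identifiers into clusters, one cluster per presumed entity.

    `same_as_pairs` is an iterable of (identifier_a, identifier_b) tuples,
    each asserting that the two identifiers denote the same entity.
    """
    parent = {}

    def find(x):
        # Locate the cluster representative, shortening paths along the way.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for ident in list(parent):
        clusters[find(ident)].add(ident)
    return sorted(sorted(cluster) for cluster in clusters.values())


# Hypothetical assertions gathered from three imaginary sources.
pairs = [
    ("ex-a:1", "ex-b:7"),
    ("ex-b:7", "ex-c:Q9"),
    ("ex-a:2", "ex-c:Q5"),
]
clusters = cluster_identifiers(pairs)
print(clusters)
```

Each resulting cluster could then serve as the collocation point, with any of its member identifiers acting as a match point regardless of which labels are displayed.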

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.46 Many libraries are looking toward Wikidata and Wikibase (the software platform underlying Wikidata) to solve some of the long-standing issues faced by technical services departments, archival units, and others.47 Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution’s resources. Focus Group members’ experimentations with Wikidata, and OCLC projects using the Wikibase platform, indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationship with each other, and how best they can increase discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on “works” and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are “notable” are more likely to be represented in Wikidata. Recently, an effort to support citations in Wikipedia articles, WikiCite,48 demonstrates a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical (and powerful) aspects of identity management is to reduce the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.49 Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts, or subject headings, are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting “alternate” subject headings came to the fore when the Library of Congress’ initial solution to change the LC subject heading for “Illegal aliens” to “Undocumented immigrants” failed to be implemented. This prompted one Focus Group member to comment, “Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive.” End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.50


Some see Faceted Application of Subject Terminology (FAST)51 as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.52 FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee53 represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH with tools and services that serve the needs of diverse communities and contexts.54

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.55 For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.56 There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)57 built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.58


A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in Institutional Repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool59 to create and manage various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.60

Linked data could provide the means for local communities to prefer a different label for an established vocabulary’s preferred term for a concept or entity. One might reference a local description of a concept or entity not represented (or not represented satisfactorily) in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, “Nothing is sadder than a vocabulary that someone invented that was left to go stale.” Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution’s EDI goals and principles.


FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, “Dissident art” rather than “Non-conformist art.”)

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading “Mentally handicapped” to “People with mental disabilities.” Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a “cultural sensitivity” message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator’s attitude or the period in which the item was created, and may be considered inappropriate today in some contexts.

[Figure 5 chart: “What Areas Have You Changed or Plan to Change Due to Your Institution’s EDI Goals and Principles?” Vertical axis: 0 to 100. Categories: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies. Series: Changed; Plan to change.]


• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may be exclusive to audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than include them as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others’ terminologies or mean different things. The PCC Linked Data Advisory Committee’s Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences, and facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on “Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives”66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of “community contribution” for new terms and community voting could be more inclusive. Libraries’ current “consensus environment” excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made. How will libraries resolve or reconcile conflicts between statements?67 Different types of inconsistencies may appear than do now, with, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources.”68 OCLC Research explored how libraries might apply Google’s “Knowledge Vault” to identify statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected by using inadequate vendor records, by creating minimal or less-than-full level descriptions for certain types of resources, and by limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance” or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.
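To make the idea of sharing statements with provenance more concrete, here is a minimal Python sketch, offered under stated assumptions rather than as a description of any existing system. The `Statement` tuple, the `find_conflicts` function, the property names, and the source names are all hypothetical. The sketch surfaces conflicting values together with their sources and deliberately does not pick a winner, since conflicting statements might each be correct in their own contexts.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    """One assertion about an entity, carrying its provenance."""

    subject: str   # hypothetical entity identifier
    prop: str      # hypothetical property name
    value: str
    source: str    # who made the statement


def find_conflicts(statements):
    """Return {(subject, prop): {value: [sources]}} wherever sources disagree."""
    values = defaultdict(lambda: defaultdict(set))
    for s in statements:
        values[(s.subject, s.prop)][s.value].add(s.source)
    return {
        key: {value: sorted(sources) for value, sources in by_value.items()}
        for key, by_value in values.items()
        if len(by_value) > 1  # more than one distinct value means a conflict
    }


# Hypothetical statements from three imaginary sources.
statements = [
    Statement("ex:E1", "birthDate", "1901", "source-a"),
    Statement("ex:E1", "birthDate", "1901", "source-b"),
    Statement("ex:E1", "birthDate", "1902", "source-c"),
    Statement("ex:E1", "name", "Example Person", "source-a"),
    Statement("ex:E1", "name", "Example Person", "source-c"),
]
conflicts = find_conflicts(statements)
print(conflicts)
```

Whether such statements live in one centralized store or are exchanged peer to peer, keeping the source attached to every value is what allows a later decision, by a cataloger or a community, about which statement to display in a given context.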

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing “Inside-Out” and “Facilitated” Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the “inside-out collection” (in contrast to the “outside-in collection,” in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the “facilitated collection.”74 Focus Group members’ activities have increasingly focused on metadata that will provide access to the resources unique to their institutions, as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places, providing the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for both scholars and libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions digitizing their archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights, Historic Preservation and Urban Planning, and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, often in deteriorating formats, in large backlogs, and sometimes requiring rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process: compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections

[Bar chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Categories: Resource allocation, assessment, and prioritization; Digital asset management and preservation; Physical collection management; Digitization and preservation reformatting; Incorporating into digital collection workflows; Incorporating into archival workflows; Selection, appraisal, and collection development. Response scale: Very interested, Interested, Somewhat interested, Not interested. Horizontal axis: 0 to 140.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited (if not discredited) any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats, and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people, as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters” (those that include terms with a similar meaning). In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.
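To illustrate why a shared manifest format helps with cross-platform discovery, the following is a minimal sketch, not a description of the IIIF Explorer itself. It assumes a manifest shaped like the IIIF Presentation API 2.x (sequences, canvases, images); the sample manifest, its example.org URLs, and the function names are hypothetical.

```python
import json

# Hypothetical manifest, trimmed to the Presentation API 2.x fields used below.
SAMPLE_MANIFEST = json.loads("""
{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "https://example.org/iiif/maps-001/manifest.json",
  "@type": "sc:Manifest",
  "label": "Sample map",
  "sequences": [{
    "@type": "sc:Sequence",
    "canvases": [{
      "@id": "https://example.org/iiif/maps-001/canvas/1",
      "@type": "sc:Canvas",
      "label": "Sheet 1",
      "images": [{
        "@type": "oa:Annotation",
        "resource": {"@id": "https://example.org/iiif/maps-001/full/full/0/default.jpg"}
      }]
    }]
  }]
}
""")


def list_images(manifest):
    """Return (canvas label, image URL) pairs from a 2.x-style manifest."""
    pairs = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                pairs.append((canvas.get("label"), image["resource"]["@id"]))
    return pairs


def search_manifests(manifests, term):
    """Case-insensitive label search across manifests from any source system."""
    wanted = term.lower()
    return [m["@id"] for m in manifests if wanted in str(m.get("label", "")).lower()]
```

Because every compliant system exposes the same structure, the same two functions could in principle be run over manifests harvested from different platforms without per-platform parsing code.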

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
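The string-to-identifier replacement described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the pilot's actual method: the two lookup tables stand in for an authority file and a local vocabulary, and all URIs, field names, and function names are hypothetical.

```python
# Hypothetical lookup tables standing in for an authority file and a
# local library-defined vocabulary. The URIs are placeholders.
AUTHORITY = {"paris (france)": "https://example.org/authority/places/0001"}
LOCAL_VOCAB = {"maps": "https://example.org/vocab/genre/maps"}


def normalize(value):
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(value.lower().split())


def to_linked_data(record):
    """Replace matched strings with identifiers; keep the rest as literals.

    Returns the converted record plus the unmatched (field, value) pairs,
    which would need human review or a better matching step.
    """
    converted, unmatched = {}, []
    for field, values in record.items():
        converted[field] = []
        for value in values:
            key = normalize(value)
            uri = AUTHORITY.get(key) or LOCAL_VOCAB.get(key)
            if uri:
                converted[field].append({"id": uri, "label": value})
            else:
                converted[field].append({"literal": value})
                unmatched.append((field, value))
    return converted, unmatched
```

Keeping the original string as a label alongside the identifier is a design choice in this sketch: it preserves the display form while the identifier carries the link to contextual information held elsewhere.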

FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections


RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry “Data Management and Curation in 21st Century Archives” (Sept. 2015)100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
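A readme.txt of the kind recommended in the passage above could be generated from a simple template. The sketch below is one possible layout, not a prescribed format: the field names come from the quoted list, but which fields are treated as required, the section headings, and the function name are assumptions made for illustration.

```python
# Assumption: these five project-level fields are treated as mandatory here.
REQUIRED = ("title", "author", "date", "funder", "description")


def build_readme(project, files):
    """Return readme.txt content from project-level and file-level metadata.

    project: dict of project-level fields (title, author, date, ...).
    files: list of dicts with 'name', 'format', and 'software' keys.
    """
    missing = [key for key in REQUIRED if not project.get(key)]
    if missing:
        raise ValueError("missing project-level metadata: " + ", ".join(missing))
    lines = ["PROJECT INFORMATION"]
    lines += [f"{key}: {value}" for key, value in project.items()]
    lines += ["", "FILES"]
    for entry in files:
        lines.append(
            f"{entry['name']} | format: {entry['format']} | software: {entry['software']}"
        )
    return "\n".join(lines) + "\n"
```

Rejecting incomplete input up front reflects the concern, raised by the Focus Group, that researchers may leave such forms partly empty; a real template would likely be more forgiving and more detailed.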


All four of the webinars in the 2017-2018 series The Realities of Research Data Management,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105


National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
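One practical property of creator identifiers such as ORCID iDs is that they carry a check character, so a mistyped iD can be caught before it enters a metadata record. The sketch below assumes the ISO 7064 MOD 11-2 scheme used for the final character of the 16-character iD; the function names are illustrative, and the check confirms only that an iD is well formed, not that it is registered.

```python
def mod_11_2_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for a digit string."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return mod_11_2_check_character(compact[:15]) == compact[15]
```

For example, the author iD given in this report, 0000-0002-8757-2962, passes the check, while changing its last character to 3 fails it.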


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research report series The Realities of Research Data Management classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose work may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking, and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125
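As a rough illustration of how consistent MARC fields make such applications possible, the sketch below reads a MARCXML record with the Python standard library, pulls ISBNs from field 020 subfield a, and checks whether the language code in positions 35-37 of field 008 is blank. The sample record is hypothetical (its ISBN is the one printed for this report), and the function name is an assumption; a production workflow would more likely use a dedicated MARC library.

```python
import xml.etree.ElementTree as ET

NS = {"marc": "http://www.loc.gov/MARC21/slim"}

# Hypothetical 40-character 008 fields, built in pieces so the language
# code lands in positions 35-37.
FIELD_008_WITH_LANG = "200901s2020    ohu" + " " * 17 + "eng" + " d"
FIELD_008_NO_LANG = "200901s2020    ohu" + " " * 17 + "   " + " d"

RECORD_TEMPLATE = """<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="008">{field_008}</controlfield>
  <datafield tag="020" ind1=" " ind2=" ">
    <subfield code="a">9781556531675</subfield>
  </datafield>
</record>"""


def summarize(record_xml):
    """Return the ISBNs and language code (None if blank) of one record."""
    record = ET.fromstring(record_xml)
    field_008 = record.findtext(
        "marc:controlfield[@tag='008']", default="", namespaces=NS
    )
    language = field_008[35:38].strip() or None
    isbns = [
        subfield.text
        for subfield in record.findall(
            "marc:datafield[@tag='020']/marc:subfield[@code='a']", NS
        )
    ]
    return {"language": language, "isbns": isbns}
```

Records for which `language` comes back as `None` are candidates for the kind of enrichment mentioned above, and the extracted ISBNs are the sort of identifier that could be passed to an external service for cover images or tables of contents.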


Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story. Hachette asked the British Library for every author and book title published by its nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127


FIGURE 9. UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.
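A SPARQL query embedded in an external application might look something like the sketch below. It builds a query and a request URL but does not send it. The query uses only the generic SKOS `prefLabel` property, so it reflects none of the Getty-specific data model; the endpoint URL is an assumption to verify before use, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint URL; confirm against the vocabulary provider's documentation.
ENDPOINT = "http://vocab.getty.edu/sparql"


def build_query(term, limit=10):
    """Build a generic SKOS label search; not specific to any one vocabulary."""
    # Escape backslashes and double quotes so the term stays inside the literal.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {{
  ?concept skos:prefLabel ?label .
  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))
}} LIMIT {int(limit)}"""


def build_request_url(term, limit=10):
    """Return a GET URL carrying the query in the standard 'query' parameter."""
    return ENDPOINT + "?" + urlencode({"query": build_query(term, limit)})
```

An application would fetch the resulting URL, request a JSON results format, and use the returned concept identifiers as entry points for browsing, which is the person-centric or topic-centric pivot described above.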

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that artificial intelligence, or at least machine learning, could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to “traditional cataloging activities” (such as original and copy cataloging and authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed, from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats, and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions, while still being expected to maintain production levels, must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned and the whole unit became “lively,” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.¹³⁷

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills, or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.¹³⁸ MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.¹³⁹ Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.¹⁴⁰ Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.¹⁴¹
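
The kind of clustering mentioned above can be illustrated with a minimal Python sketch. It uses key-collision ("fingerprint") clustering, a technique familiar from tools such as OpenRefine; it is not MarcEdit's own implementation, and the function names and sample headings are assumptions made purely for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key: unaccented, lowercase, punctuation-free, tokens sorted."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a word character or whitespace with a space.
    cleaned = re.sub(r"[^\w\s]", " ", unaccented.lower())
    # Sorting the unique tokens makes the key independent of word order.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values whose fingerprints collide; keep groups with more than one distinct form."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [sorted(set(forms)) for forms in groups.values() if len(set(forms)) > 1]


# Illustrative headings only; not taken from any real authority file.
headings = [
    "Smith-Yoshimura, Karen",
    "Karen Smith-Yoshimura",
    "smith yoshimura karen",
    "Reese, Terry",
    "Hale, Dawn",
]
clusters = cluster(headings)
for forms in clusters:
    print(forms)
```

A reviewer would still decide whether each cluster really represents one entity; the key only surfaces candidates that differ in case, punctuation, accents, or word order.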

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification; descriptive standards used in various academic disciplines; describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”¹⁴²

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.¹⁴³ W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.¹⁴⁴ Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
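
As one illustration of the simple scripting described above, the following Python sketch groups identifiers that different registries assert to be equivalent. It is a sketch under stated assumptions: the `reconcile` function, the prefix scheme, and every identifier value are hypothetical and do not come from any real registry or tool.

```python
from collections import defaultdict


def reconcile(links):
    """Group identifiers asserted to be equivalent, using a simple union-find."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(set)
    for identifier in parent:
        groups[find(identifier)].add(identifier)
    return [sorted(group) for group in groups.values()]


# Hypothetical "same as" assertions; all identifier values are made up.
links = [
    ("local:p001", "orcid:0000-0000-0000-0001"),
    ("orcid:0000-0000-0000-0001", "isni:0000000000000001"),
    ("local:p002", "viaf:00000002"),
]
entities = reconcile(links)
for group in entities:
    print(group)
```

Because equivalence is treated as transitive, a single wrong assertion merges two entities, so in practice such a script would feed a review step and not write directly to an authority file.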

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,¹⁴⁵ a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars¹⁴⁶ to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.¹⁴⁷

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in the service of library service helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations, such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,¹⁴⁸ launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities, as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts, summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc/.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe/.

Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0.]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is Orcid.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org.

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org. 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu.

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications.

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710.

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710.

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445.

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793.

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois.

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management, as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805.

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko.

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan. 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite.

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.

73. Grindley, Neil. 2019. “Plan M: Definition, Principles and Direction.” Jisc (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog, 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog, 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog, 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog, 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog, 29 October 2018. https://hangingtogether.org/?p=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog, 23 July 2019. https://hangingtogether.org/?p=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog, 1 October 2019. https://hangingtogether.org/?p=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog, 9 April 2015. https://hangingtogether.org/?p=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog, 2 November 2016. https://hangingtogether.org/?p=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog, 18 April 2016. https://hangingtogether.org/?p=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdmguide.html

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog, 9 April 2020. https://hangingtogether.org/?p=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC, 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University, 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog, 30 September 2015. https://hangingtogether.org/?p=5430

118. Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog, 15 April 2019. https://hangingtogether.org/?p=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog), 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog, 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog, 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog, 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog, 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog, 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog, 26 March 2018. https://hangingtogether.org/?p=6646

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog), 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog, 17 April 2017. https://hangingtogether.org/?p=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog), 14 March 2013. http://marcedit.reeset.net/tutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog, 18 October 2019. https://hangingtogether.org/?p=7580

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections, please visit oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395
T: 1-800-848-5878  T: +1-614-764-6000  F: +1-614-764-6096
www.oclc.org/research

ISBN: 978-1-55653-170-5  DOI: 10.25333/rqgd-b343  RM-PR-216787-WWAE-A4 2009

• FIGURE 1. “Changing Resource Description Workflows” by OCLC Research
• FIGURE 2. Some 300 abbreviated author names for a five-page article in Physical Review Letters
• FIGURE 3. Examples of some DOI (left) and ARK (right) identifiers
• FIGURE 4. One Wikidata identifier links to other identifiers and labels in different languages
• FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership
• FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections
• FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections
• FIGURE 8. Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon
• FIGURE 9. UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

EXECUTIVE SUMMARY

The OCLC Research Library Partners Metadata Managers Focus Group, first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP), a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

This report, Transitioning to the Next Generation of Metadata, synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions.

Yet metadata is changing. Format-specific metadata management, based on curated text strings in bibliographic records understood only by library systems, is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. This report traces how metadata is evolving and considers the impact this transition may have on library services, posing such questions as:

• Why is metadata changing?
• How is the creation process changing?
• How is the metadata itself changing?
• What impact will these changes have on future staffing requirements, and how can libraries prepare?

The future of linked data is tied to the future of metadata: the metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for future linked data innovations as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

INTRODUCTION

The OCLC Research Library Partners Metadata Managers Focus Group (hereafter referenced as the Focus Group),1 first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP),2 a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

The firm belief that metadata underlies all discovery, regardless of format, now and in the future, permeates all Focus Group discussions. Metadata provides the research infrastructure necessary for all libraries’ “value delivery systems,” fulfilling their community’s requests for information and resources. Metadata is crucial for transitioning to next generations of library and discovery systems. Good metadata created today can easily be reused in a linked data environment in the future.3 As noted in the British Library’s Foundations for the Future, “Our vision is that by 2023 the Library’s collection metadata assets will be unified on a single, sustainable, standards-based infrastructure offering improved options for access, collaboration and open reuse.”4

Format-specific metadata management, based on curated text strings in bibliographic records understood only by library systems, is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. “Traditional methods of metadata generation, management and dissemination,” suggests the British Library’s Collection Management Strategy, “are not scalable or appropriate to an era of rapid digital change, rising audience expectations and diminishing resources.”5 Focus Group members are eager to unleash the power of metadata in legacy records for different interactions and uses by both machines and end-users in the future. Consistent metadata created according to past rules or standards needs to be transformed into new structures.

Why is metadata changing?

Traditional library metadata was, and is, made by librarians conforming to rules that are mainly used and understood by librarians. It is record-centered, expensive to produce, and has historic size limitations. Metadata is limited in its coverage, notably not including articles within scholarly journals or other scholarly outputs. The infrastructure has been inadequate for managing corrections and enhancements, inducing an emphasis on perfection that has exacerbated the slowness of metadata creation. In short, the metadata could be better, there is not enough of it, and the metadata that does exist is not used widely outside the library domain.

How is the creation process changing?

Metadata is no longer created by library staff alone. Today, publishers, authors, and other interested parties are equally involved in metadata creation. Metadata creation has also been pushed forward in the scholarly life cycle, with publishers creating metadata records earlier than in the traditional cataloging process. Metadata can now be enhanced or corrected by machines or by crowdsourcing.

How is the metadata itself changing?

Machine-readable cataloging (MARC) was created to replicate the metadata traditionally found on library catalog cards. We are transitioning from MARC records to assemblages of well-coded and shareable, linkable components, with an emphasis on references, and we are eliminating anachronistic abbreviations not understood by machines. Instead of relying only on library vocabularies, such as subject headings and coded lists, the developing assemblages can accommodate vocabularies created for specific domains, expanding the metadata’s potential audiences.

The Focus Group’s composition has fluctuated over time and currently comprises representatives from 63 RLP Partners in 11 countries spanning four continents.6 The group includes both past and incoming chairs of the Program for Cooperative Cataloging (PCC),7 providing cross-fertilization between the two. Topics for group discussions can be proposed by any Focus Group member and are selected by an eight-member Planning Group (see appendix), who then write “context statements” explaining why the topic is considered timely and important and then develop question sets that delve into the topic. Context statements and question sets are then distributed to all Focus Group members, who are given three to five weeks to submit their responses. Compilations of the Focus Group’s responses inform face-to-face discussions held in conjunction with the American Library Association conferences8 and in subsequent virtual meetings.

As the Focus Group facilitator, I have summarized and synthesized these discussions in a series of OCLC Research Hanging Together Blog publications.9 Nearly 40 blog posts on a wide range of metadata-related topics have been published on this forum over the past six years.

The Metadata Managers Focus Group is just one activity within the broader OCLC Research Library Partnership, which is devoted to extensive professional development opportunities for library staff. Focus Group members value their affiliation with the Research Library Partnership as a channel to becoming the “change agents” of future metadata management.10 Focus Group members’ responses to question sets have facilitated intra-institutional discussions and helped metadata managers understand how their institutions’ situation compares with peers within the Partnership.

These Focus Group discussions identified a broad range of metadata-related issues, documented in this report. Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

Collectively, Focus Group members command a wide range of experiences with linked data. The Focus Group’s keen interest in linked data implementations sparked the series of OCLC Research’s International Linked Data Surveys for Implementers.11 A subset of Focus Group members have participated in various linked data projects, including the OCLC Research Project Passage and CONTENTdm Linked Data pilot, OCLC’s Shared Entity Management Infrastructure, Library of Congress’ Bibliographic Framework Initiative (BIBFRAME), the Mellon-funded Linked Data for Production (LD4P) project, the Share-VDE initiative, and the IMLS planning grant Shareable Local Name Authorities, which exposed issues raised by identifier hubs in the linked data environment.12 In addition, Focus Group members contribute to the PCC task groups addressing aspects of linked data work, including the PCC Task Group on Linked Data Best Practices, Task Group on Identity Management, Task Group on URIs in MARC, and the PCC Linked Data Advisory Committee.13 This cross-fertilization has prompted the Focus Group to examine issues around the entities represented in institutional resources.

This report synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The document is organized in the following sections, each representing an emerging trend identified in the Focus Group’s discussions:

• The transition to linked data and identifiers: expanding the use of persistent identifiers as part of the shift from “authority control” to “identity management”
• Describing the “inside-out” and “facilitated” collections: challenges in creating and managing metadata for unique resources created or curated by institutions in various formats and shared with consortia
• Evolution of “metadata as a service”: increased involvement with metadata creation beyond the traditional library catalog
• Preparing for future staffing requirements: the changing landscape calls for new skill sets needed by both new professionals entering the field and seasoned catalogers

The document concludes with some observations on the forecasted impact of the next generation of metadata on the wider library community.

The Transition to Linked Data and Identifiers

Linked data offers the ability to take advantage of structured data with an emphasis on context. It relies on language-neutral identifiers pointing to objects, with a focus on “things,” replacing the “strings” inherent in current authority and catalog records. These identifiers can then be connected to related data, vocabularies, and terms in other languages, disciplines, and domains, including nonlibrary domains. Linked data applications can consume others’ contributions and thus free metadata specialists from having to re-describe things already described elsewhere, allowing them instead to focus on providing access to their institutions’ unique and distinctive collections. This promises a richer user experience and increased discoverability, with more contextual relationships than is possible with our current systems. Furthermore, linked data offers an opportunity to go beyond the library domain by drawing on information about entities from diverse sources.14

FIGURE 1. “Changing Resource Description Workflows” by OCLC Research15

The hope is that linked data will allow libraries to offer new, value-added services that current models cannot support, that outside parties will be able to make better use of library resource descriptions, and that the data will be richer because more parties share in its creation. Moving to a linked data environment portends changes to resource description workflows, as shown in figure 1.

The drive to move metadata operations to linked data depends on the availability of tools, access to linked data sources for reuse, documented best practices on identifiers and the metadata descriptions associated with them (“statements”), and a critical mass of implementations on a network level.
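The contrast between “strings” and “things” described in this section can be illustrated with a minimal sketch. Everything in it is hypothetical and chosen only for illustration: the `Entity` class, the `example.org` and `example.net` URIs, and the sample labels do not come from any real authority file or from the projects named in this report.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """A "thing": one language-neutral identifier with many labels and links."""
    uri: str                                                # hypothetical identifier
    labels: Dict[str, str] = field(default_factory=dict)    # language code -> label
    same_as: List[str] = field(default_factory=list)        # same thing in other sources

    def label(self, lang: str, fallback: str = "en") -> str:
        """Return the label for `lang`, else the fallback language, else the URI."""
        return self.labels.get(lang) or self.labels.get(fallback) or self.uri


# "String" style: the heading text itself is the only link between records.
string_record = {"title": "An Example Work", "creator": "Example, Author, 1900-1980"}

# "Thing" style: the description points to an identifier, not to a text string.
author = Entity(
    uri="https://example.org/entity/E1",
    labels={"en": "Author Example", "fr": "Auteur Exemple"},
    same_as=["https://example.net/authority/12345"],
)
thing_record = {"title": "An Example Work", "creator": author.uri}

# A tiny stand-in for an entity store keyed by identifier.
entities = {author.uri: author}


def display_creator(record: dict, lang: str) -> str:
    """Resolve the creator identifier to a label in the reader's language."""
    value = record["creator"]
    entity = entities.get(value)
    return entity.label(lang) if entity else value
```

With the identifier in place, the same description can be displayed in French or English without editing the record, while the string-based record can only ever show the text that was typed into it.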

EXPANDING THE USE OF PERSISTENT IDENTIFIERS

The Focus Group discussed the “future-proofing” of cataloging, which refers to the opportunities to unleash the power of metadata in legacy records for different interactions and uses in the future. Persistent identifiers were viewed as crucial to transitioning from current metadata to future applications.16 Identifiers, in the form of language-neutral alphanumeric strings, serve as a shorthand for assembling the elements required to uniquely describe an object or resource. They can be resolved over networks with specific protocols for finding, identifying, and using that object or resource. In the nonlibrary domain, Social Security and employee numbers are examples of such identifiers. In the library and academic domains, Focus Group members pointed to ORCID (Open Researcher and Contributor ID)17 as a “glue” that holds together the four arms of scholarly work (publishing, repository, library catalog, and researchers), but ORCID is limited to only living researchers. ORCID is increasingly used in STEM (science, technology, engineering, mathematics) journals for all authors and contributors18 and included in institutions’ Research Information Management systems. ISNI (International Standard Name Identifier)19 uniquely identifies persons and organizations involved in creative activities; it is used by libraries, publishers, databases, and rights management organizations, and it covers nonliving creators.

Persistent identifiers are used by parties such as Google and HathiTrust for service integration.20 More institutions are using geospatial coordinates in metadata, or URIs (Uniform Resource Identifiers) pointing to geospatial coordinates, that support API (Application Programming Interface) calls to GeoNames,21 enabling map visualizations. Research institutions are also adopting person identifiers such as ORCID to streamline the collection of the institutional research record, usually through a Research Information Management system, as documented in the 2017 OCLC Research Report Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information Management.22

While publishers serve as a key player in the metadata workstream, publisher data does not always meet library requirements. For example, publisher data for monographs usually does not include identifiers. The British Library is working with five UK publishers to add ISNIs23 to their metadata as a promising proof-of-concept for publishers and libraries working together earlier in the supply chain. The ability to batch load or algorithmically add identifiers in the future is on Focus Group members’ wish list.

No single person identifier covers all use cases. Researchers’ names have been only partially represented in national name authority files that identify persons, both living and dead. A sizable quantity of legacy names are represented only by text strings in bibliographic records. Authority records are created only by institutions involved in the PCC’s Name Authority Cooperative Program (NACO)24 or in national library programs. Even then, authority records are created selectively for certain headings, or sometimes only when references are involved. The LC/NACO name authority file contained only 30% of the total names reflected in WorldCat’s bibliographic record access points (9 million LC/NACO records compared to the 30 million total names reported on the WorldCat Identities project page as of 2012).25 By 2020, this percentage had decreased to 18%: 11 million LC/NACO authority records compared to 62 million names in WorldCat Identities. These statistics illustrate that the number of names represented in bibliographic records is increasing more quickly than the number under authority control.
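The percentages quoted above follow directly from the record counts. The short Python sketch below reruns that arithmetic; the variable and function names are illustrative, and the only inputs are the figures given in the paragraph (in millions).

```python
# Counts quoted in the paragraph above, in millions of names.
snapshots = {
    2012: {"lc_naco_records": 9, "worldcat_identities_names": 30},
    2020: {"lc_naco_records": 11, "worldcat_identities_names": 62},
}


def coverage_percent(controlled: float, total: float) -> float:
    """Share of names under authority control, as a percentage."""
    return 100.0 * controlled / total


coverage = {
    year: coverage_percent(c["lc_naco_records"], c["worldcat_identities_names"])
    for year, c in snapshots.items()
}

# Growth factors between the two snapshots.
authority_growth = snapshots[2020]["lc_naco_records"] / snapshots[2012]["lc_naco_records"]
names_growth = (snapshots[2020]["worldcat_identities_names"]
                / snapshots[2012]["worldcat_identities_names"])

for year, pct in coverage.items():
    print(f"{year}: about {pct:.0f}% of names under authority control")
print(f"Authority records grew {authority_growth:.2f}x; names grew {names_growth:.2f}x")
```

Over the same period the authority file grew by roughly a fifth while the pool of names roughly doubled, which is why coverage fell even though more authority records exist.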

Authority files focus on the “preferred form” of a name, which can vary depending on language, discipline, context, and time period. Scholars have objected to the very concept of a “preferred form,” as the name may be referred to differently depending on the context.26 When a name has multiple forms, historians need to know the provenance of each name, following the citation practices commonly used in their field. An identifier linked to different forms of names, each associated with the provenance and context, could resolve this conundrum.


Researcher names are just one example of a need unmet by current identifier systems. Institutions have been minting their own “local identifiers” to meet this need. Use cases for local identifiers include registering all researchers on campus; representing entities that are underrepresented in national authority files, such as authors of electronic dissertations and theses, performers, events, local place names, and campus buildings; identifying entities in digital library projects and institutional repositories; reflecting multilingual needs of the community; and supporting “housekeeping” tasks such as recording archival collection titles.27

Focus Group members’ consistent need to disambiguate names across disciplines and formats spurred the creation of the OCLC Research working group on Registering Researchers in Authority Files.28 The need to accurately record researchers’ institutional affiliations (to reflect the institution’s scholarly output, to promote cross-institutional collaborations, and to lead to more successful recruitment and funding) led to another working group on Addressing the Challenges with Organizational Identifiers and ISNI,29 which presented new data modeling of organizations that others could adapt for their own uses. Since then, the Research Organization Registry (ROR) was launched to develop an open, sustainable, usable, and unique identifier for every research organization in the world.30

Disambiguating names is the most labor-intensive part of authority work and will still be a prerequisite for assigning unique identifiers. Given the different name identifier systems already in use, libraries need a name reconciliation service. Authority work and algorithms based on text string matching have limits; the results will still need manual expert review. Tapping the expertise in user communities to verify if two identifiers represent the same person may help.
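The limits of string matching can be illustrated with a small sketch. The Python below is not any particular reconciliation service: it uses the standard library's difflib to score two name strings and routes them into three bins. The thresholds and the sample names are assumptions chosen purely for illustration.

```python
from difflib import SequenceMatcher

# Assumed thresholds, for illustration only; a real service would tune these.
CANDIDATE_MATCH = 0.95
NEEDS_REVIEW = 0.60


def normalize(name: str) -> str:
    """Lowercase and collapse whitespace before comparing."""
    return " ".join(name.lower().split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity between two name strings (0.0 to 1.0)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def triage(a: str, b: str) -> str:
    """Sort a pair of name strings into one of three review bins."""
    score = similarity(a, b)
    if score >= CANDIDATE_MATCH:
        # Identical strings can still belong to two different people.
        return "candidate match"
    if score >= NEEDS_REVIEW:
        return "needs expert review"
    return "probably distinct"


# Invented sample names.
print(triage("Smith, John", "Smith, John"))
print(triage("Smith, J.", "Smith, John"))
print(triage("Smith, J.", "Garcia, Maria"))
```

Even the top bin is labeled only a candidate: a similarity score compares strings, not people, so two researchers who share a name look identical to it. That is the gap that expert review and identifiers are meant to close.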

Disambiguation is particularly difficult for authors or contributors listed in journal articles, where names are often abbreviated and there may be dozens or even hundreds of contributors. For example, an article in Physical Review Letters, “Precision Measurement of the Top Quark Mass in Lepton + Jets Final State,” has approximately 300 abbreviated author names for a five-page article (figure 2).31 This exemplifies the different practices among disciplines. By contrast, other objects with many contributors, such as feature films and orchestral recordings, are usually represented by only a relative handful of the associated names in library legacy metadata.32 Such differences make creating metadata that is uniform, understandable, and widely reusable a challenge.

FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form or an identifier, if it exists. Associating ORCIDs with article authors makes it easier to differentiate authors with the same abbreviated forms. Research Information Management (RIM) systems apply identity management for local researchers so that they are correctly associated with the articles they have written. Their articles are displayed as part of their profiles. (See, for example, Experts@Minnesota or University of Illinois at Urbana-Champaign’s Experts research profiles.)33

[Figure 2 image: first page of the preprint “Precision measurement of the top-quark mass in lepton+jets final states” (arXiv:1405.1756v2 [hep-ex], 16 Jun 2014; FERMILAB-PUB-14-123-E), showing the list of abbreviated author names with affiliation numbers.]


For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.34 Research Information Management Systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author’s ORCID. Linked data could access information across many environments, including those in Research Information Systems, but would require accurately linking multiple identifiers for the same person to each other.
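One small, mechanical part of linking identifiers accurately is catching mistyped ones before they are stored. An ORCID iD is 16 characters long, and its final character is a check digit computed with the ISO 7064 MOD 11-2 algorithm. The Python sketch below implements that check; the function names are illustrative, and the sample iD is the author's ORCID as printed on this report's title page.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the string is a syntactically valid ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if not all(ch in "0123456789" for ch in base):
        return False
    return orcid_check_digit(base) == check


print(is_valid_orcid("0000-0002-8757-2962"))  # iD from this report's title page
print(is_valid_orcid("0000-0002-8757-2963"))  # same iD with the last digit altered
```

A passing checksum shows only that the string is well formed. It says nothing about whether the iD belongs to the person it has been attached to, which still has to be established by the researcher or by a metadata specialist.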

Some Focus Group members are performing metadata reconciliation work, such as searching matching terms from linked data sources and adding their URIs in metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.35 Improving the quality of the data improves users’ experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC’s Virtual International Authority File (VIAF); the Library of Congress’s linked data service (id.loc.gov); ISNI; the Getty’s Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC’s Faceted Application of Subject Terminology (FAST); and various national authority files. Selection of the source depends on the trustworthiness of the organization responsible, the subject matter, and the richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed an LCNAF Named Entity Reconciliation program36 using Google’s Open Refine that searches the VIAF file with the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading and a link to the URI of the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.
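The three steps described above (search for a cluster, look for the Library of Congress source record in it, extract the authorized heading) can be sketched in a few lines of Python. This is not the University of Michigan program and it does not call the VIAF API: the clusters are a small in-memory mock, and every name, cluster ID, source code, and URI below is an invented placeholder.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Cluster:
    """One cluster of headings from several sources (mock of a VIAF-style cluster)."""
    cluster_id: str
    headings: Dict[str, str]       # source code -> heading contributed by that source
    lc_uri: Optional[str] = None   # URI of the LC record, when the cluster has one


# Invented sample data: names, IDs, and URIs are placeholders, not real records.
MOCK_CLUSTERS = [
    Cluster("cluster-1",
            {"LC": "Example, Alice, 1900-1980", "DNB": "Example, Alice"},
            "https://id.loc.gov/authorities/names/PLACEHOLDER"),
    Cluster("cluster-2", {"BNF": "Sample, Robert"}),
]


def normalize(heading: str) -> str:
    """Lowercase and drop commas and periods so trivial differences compare equal."""
    cleaned = heading.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def search_clusters(heading: str, clusters: List[Cluster]) -> List[Cluster]:
    """Step 1: find clusters containing a heading that matches as a string."""
    key = normalize(heading)
    return [c for c in clusters
            if any(normalize(h) == key for h in c.headings.values())]


def reconcile(heading: str, clusters: List[Cluster]) -> Optional[Dict[str, str]]:
    """Steps 2 and 3: pick the LC source heading and pair it with the original."""
    for cluster in search_clusters(heading, clusters):
        lc_heading = cluster.headings.get("LC")
        if lc_heading and cluster.lc_uri:
            return {"original": heading, "authorized": lc_heading, "uri": cluster.lc_uri}
    return None


print(reconcile("example, alice", MOCK_CLUSTERS))
print(reconcile("Sample, Robert", MOCK_CLUSTERS))  # cluster has no LC source record
```

Because the lookup rests on normalized string equality, it inherits the weakness noted above: it pairs headings that look alike, so its output is best treated as a set of candidates to review.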

A long list of nonlibrary sources that could enhance current authority data or could be valuable to link to in certain contexts has been identified. Wikidata and Wikipedia led the list. Other sources include AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, GeoNames, GoodReads, IMDb (Internet Movie Database), Internet Archive, LibraryThing, LinkedIn, MusicBrainz, ONIX (ONline Information eXchange), Open Library, ORCID, and Scopus ID. The PCC’s Task Group on URIs in MARC’s document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources37 provides valuable guidance for collecting data from these other sources. Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than with current library systems.38

Identifiers for “works” represent a particular challenge, as there is no consensus on what represents a “distinctive work.”39 Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a “work” is could hamper the ability to reuse data created elsewhere, and they look to a central trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.


FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers40

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets such as digital resources and collections in institutional repositories include system-generated IDs, locally minted identifiers, PURLs, handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.41 Ideally, libraries would contribute to a hub for the metadata describing their researchers’ data sets, regardless of where the data sets are stored.
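Location independence in practice means that the identifier string stays fixed while a resolver service redirects to wherever the object currently lives. The Python sketch below recognizes DOI and ARK strings by their syntax and builds resolver URLs using the public doi.org and n2t.net resolvers. The regular expressions are deliberately simplified assumptions rather than full grammars, and the ARK shown uses the example NAAN 12345, not a real object.

```python
import re
from typing import Optional

# Simplified patterns (assumptions): real DOI and ARK syntax allows more variation.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
ARK_PATTERN = re.compile(r"^ark:/?[0-9A-Za-z]+/\S+$")


def strip_doi_prefix(identifier: str) -> str:
    """Remove an optional leading 'doi:' label."""
    s = identifier.strip()
    return s[4:] if s.lower().startswith("doi:") else s


def classify(identifier: str) -> str:
    """Return 'doi', 'ark', or 'unknown' based on the identifier's syntax."""
    s = strip_doi_prefix(identifier)
    if DOI_PATTERN.match(s):
        return "doi"
    if ARK_PATTERN.match(s):
        return "ark"
    return "unknown"


def resolver_url(identifier: str) -> Optional[str]:
    """Build a resolver URL; the identifier itself never encodes a storage location."""
    kind = classify(identifier)
    if kind == "doi":
        return "https://doi.org/" + strip_doi_prefix(identifier)
    if kind == "ark":
        return "https://n2t.net/" + identifier.strip()
    return None


print(resolver_url("10.25333/rqgd-b343"))   # this report's DOI, per its suggested citation
print(resolver_url("ark:/12345/x54xz321"))  # invented ARK using an example NAAN
print(resolver_url("local-id-0042"))        # a local ID has no global resolver
```

The last case reflects the concern raised about locally minted identifiers: without a shared resolver they cannot be followed from outside the system that created them.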


MOVING FROM “AUTHORITY CONTROL” TO “IDENTITY MANAGEMENT”

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.42 The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from “managing text strings” as the basis of controlling headings in bibliographic records.43 But identity management poses a change in focus from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from “authority control” and “authorized access points” in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.44 Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
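Separating identifiers from labels can be pictured as a small data structure: one stable identifier, a set of labels keyed by language, and links to the same entity in other identifier systems. The Python sketch below is a minimal illustration under that assumption. It is not Wikidata's data model or API, and the entity ID and scheme names are invented placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """An entity known by an identifier, with labels kept separate from it."""
    identifier: str
    labels: Dict[str, str] = field(default_factory=dict)   # language tag -> label
    same_as: Dict[str, str] = field(default_factory=dict)  # other scheme -> identifier


def display_label(entity: Entity, preferred_languages: List[str]) -> str:
    """Pick the label for a local community; no single form is globally preferred."""
    for lang in preferred_languages:
        if lang in entity.labels:
            return entity.labels[lang]
    return entity.identifier  # fall back to the language-neutral identifier


# Invented example: the identifier and scheme names are placeholders.
city = Entity(
    identifier="example:E1",
    labels={"en": "Vienna", "de": "Wien", "ja": "ウィーン"},
    same_as={"scheme-a": "A-0001", "scheme-b": "B-0001"},
)

print(display_label(city, ["de", "en"]))  # a German-language catalog
print(display_label(city, ["ja"]))        # a Japanese-language catalog
print(display_label(city, ["fr"]))        # no French label recorded
```

Matching and collocation happen on `identifier`; the label becomes a display decision that each catalog makes for its own users.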

[Figure 4 image: Wikidata identifier Q19526]

A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?45

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.46 Many libraries are looking toward Wikidata and Wikibase (the software platform underlying Wikidata) to solve some of the long-standing issues faced by technical services departments, archival units, and others.47 Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution’s resources. Focus Group members’ experimentations with Wikidata and OCLC projects using the Wikibase platform indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationship with each other, and how best they can increase discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on “works” and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are “notable” are more likely to be represented in Wikidata. Recently, an effort to support citations in Wikipedia articles, WikiCite,48 demonstrates a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical (and powerful) aspects of identity management is to reduce the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.49 Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts or subject headings are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting “alternate” subject headings came to the fore when the Library of Congress’ initial solution to change the LC subject heading for “Illegal aliens” to “Undocumented immigrants” failed to be implemented. This prompted one Focus Group member to comment, “Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive.” End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.50


Some see Faceted Application of Subject Terminology (FAST)51 as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.52 FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee53 represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH with tools and services that serve the needs of diverse communities and contexts.54

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.55 For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.56 There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)57 built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.58


A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in Institutional Repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool59 to create and manage various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.60

Linked data could provide the means for local communities to prefer a different label for an established vocabulary’s preferred term for a concept or entity. One might reference a local description of a concept or entity not represented, or not represented satisfactorily, in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, “Nothing is sadder than a vocabulary that someone invented that was left to go stale.” Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution’s EDI goals and principles.


FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and to replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, “Dissident art” rather than “Non-conformist art.”)

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading “Mentally handicapped” to “People with mental disabilities.” Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a “cultural sensitivity” message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator’s attitude or the period in which the item was created and may be considered inappropriate today in some contexts.

0

20

40

60

80

100

Collectionbuilding

Select materialsfor digitiation

Metadata inlibrary catalogs

Metadata inarchival

collections

Metadata indigital

collections

Terminologiesor vocabularies

Changed Plan to change

What Areas Have You Changed or Plan to Change Due to Your Institutions EDI Goals and Principals

14 Transitioning to the Next Generation of Metadata

• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may exclude audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than being included as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others' terminologies or mean different things. The PCC Linked Data Advisory Committee's Linked Data Infrastructure Models: Areas of Focus for PCC Strategies[65] describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences and to facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on "Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives"[66] described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.
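The idea of attaching several labels to one stable identifier can be sketched in a few lines of Python. This is only an illustration, not a description of any system discussed by the Focus Group: the `Concept` class, the `label_for` method, the example.org identifier, and the context tags are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Concept:
    """A vocabulary concept keyed by a persistent identifier, not a text string."""
    identifier: str  # hypothetical URI, for illustration only
    labels: Dict[str, str] = field(default_factory=dict)  # context tag -> label

    def label_for(self, *contexts: str) -> str:
        """Return the label for the first matching context, else the identifier."""
        for context in contexts:
            if context in self.labels:
                return self.labels[context]
        return self.identifier


color = Concept(
    identifier="https://example.org/concept/0001",
    labels={"en-US": "Color", "en-GB": "Colour", "fr": "Couleur"},
)

print(color.label_for("en-GB"))        # a British English interface
print(color.label_for("de", "en-US"))  # no German label, so fall back to en-US
```

Because records would store only the identifier, a label could be added or replaced in one place without touching the records that point to it. The same lookup could, in principle, key labels by community as well as by language.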

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of "community contribution" for new terms and community voting could be more inclusive. Libraries' current "consensus environment" excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made. How will libraries resolve or reconcile conflicts between statements?[67] Different types of inconsistencies may appear than do now, with, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of "preferred sources."[68] OCLC Research explored how libraries might apply Google's "Knowledge Vault" to identify statements that may be more "truthful" than others in the 2015 "Works in Progress Webinar: Looking Inside the Library Knowledge Vault."[69] Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected in using inadequate vendor records, in creating minimal or less-than-full level descriptions for certain types of resources, and in limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of the large volume of materials and insufficient staff resources.[70] These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide "trustworthy provenance" or whether data should be distributed with peer-to-peer sharing.[71] Different statements might be correct in their own contexts. "Conflicting statements" might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.
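One way to picture statements that carry their own provenance is the minimal Python sketch below. It assumes a simple four-part statement (entity, property, value, source) and invented sample data; the `Statement` type, the `find_conflicts` function, and the identifiers are hypothetical and do not represent any particular linked data store.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    entity: str  # identifier of the thing described (hypothetical)
    prop: str    # property name, e.g. "birth_date"
    value: str
    source: str  # provenance: who asserted the statement


def find_conflicts(statements):
    """Group statements by (entity, property) and keep those with more than one distinct value."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s in statements:
        grouped[(s.entity, s.prop)][s.value].append(s.source)
    return {
        key: dict(values)
        for key, values in grouped.items()
        if len(values) > 1
    }


statements = [
    Statement("person:1", "birth_date", "1901", "Source A"),
    Statement("person:1", "birth_date", "1902", "Source B"),
    Statement("person:1", "birth_date", "1901", "Source C"),
    Statement("person:1", "name", "Example, Pat", "Source A"),
]

conflicts = find_conflicts(statements)
for (entity, prop), values in conflicts.items():
    print(entity, prop, values)
```

The sketch deliberately reports every competing value with its sources instead of choosing a winner, since, as noted above, different statements might each be correct in their own contexts.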

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative "Plan M" (where "M" stands for "metadata") seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.[72] Among the implications cited by stakeholders in the UK's National Bibliographic Knowledgebase (NBK) in Plan M's 10-year vision: "Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research."[73] Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing "Inside-Out" and "Facilitated" Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the "inside-out collection" (in contrast to the "outside-in collection," in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the "facilitated collection."[74] Focus Group members' activities have increasingly focused on metadata that will provide access to the resources unique to their institutions as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to "inside-out" collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as "super challenging." In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.[75] This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places and providing the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars, libraries, and archives alike. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.[76]

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the "preferred" form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot[77] and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.[78]

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions' digitization of archival and distinctive collections that have been available only in physical form.[79] More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.[80]

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University's Human Rights, Historic Preservation and Urban Planning, and New York City Religions.[81] National libraries archive websites within their national domain. For example, the National Library of Australia's Archived websites (1996-now)[82] collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation's Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.[83]

The Focus Group discussed the challenges of creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its "look-and-feel") is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the "lack of descriptive metadata guidelines" as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.[84] The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.[85]


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.[86] However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, "For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials."[87] Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning, or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, which are often in deteriorating formats, in large backlogs, and sometimes require rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials with finding aids that are marked up in the Encoded Archival Description standard,[88] which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process, like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),[89] the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together Blog posts "Assessing Needs of AV in Special Collections" and "Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP."[90] A subset of the Focus Group members responded to Weber's 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival workflows and into digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections. [Stacked bar chart: "What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?" (n=137). Axis: 0 to 140. Categories: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; selection, appraisal, and collection development. Response levels: very interested; interested; somewhat interested; not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects ("of-ness" and "about-ness"). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.[91]

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or as unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections' metadata, and others are using MODS (Metadata Object Description Schema).[92] Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),[93] a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).[94]

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR'ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited (if not discredited) any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of the same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a "blob" of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution's own collection development, as librarians can see their contributions in the context of others' content and identify gaps that they may wish to fill locally.[95]

Aggregators often have different guidelines and input formats. Aggregators' very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors' very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,[96] which developed a method for hierarchically structuring cultural objects at different similarity levels to find "semantic clusters" (those that include terms with a similar meaning). In 2017 OCLC implemented the International Image Interoperability Framework (IIIF)[97] Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019 OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,[98] as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020 OCLC Research launched the CONTENTdm Linked Data Pilot,[99] focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
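The general technique of replacing strings with identifiers can be illustrated with a short Python sketch. It is not the pilot's actual method: the `normalize` and `reconcile` functions, the in-memory authority table, and the example.org identifiers are all hypothetical, and a real workflow would involve far richer matching and human review.

```python
def normalize(text):
    """Case-fold and collapse whitespace so trivial variants match."""
    return " ".join(text.casefold().split())


def reconcile(record, authority):
    """Replace each string value with an identifier when the authority has a match.

    Unmatched strings are kept as literals so that no information is lost.
    """
    linked = {}
    for field_name, values in record.items():
        linked[field_name] = [
            {"id": authority[normalize(v)], "label": v}
            if normalize(v) in authority
            else {"literal": v}
            for v in values
        ]
    return linked


# Hypothetical authority file: normalized label -> placeholder identifier.
authority = {
    "paris (france)": "https://example.org/place/1",
    "maps": "https://example.org/genre/7",
}

record = {"subject": ["Paris (France)", "Street  scenes"], "genre": ["Maps"]}
linked = reconcile(record, authority)
for field_name, values in linked.items():
    print(field_name, values)
```

Keeping the original string as a label next to the identifier, and leaving unmatched strings untouched, means the conversion can be rerun later as vocabularies grow.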

FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about "Paris Maps" across CONTENTdm collections.


RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel's two-part blog entry, "Data Management and Curation in 21st Century Archives" (Sept. 2015),[100] prompted the discussion among Focus Group members on the metadata needed for research data management.[101] To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).[102]
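A template for such a readme file can be generated and checked with very little code. The Python sketch below is one possible shape, not a recommended standard: the `build_readme` function, the choice of which elements to treat as project-level fields, the layout of the output, and the sample values are all assumptions for illustration.

```python
# Field names follow elements listed in the quoted passage; the exact
# spelling and the choice of this subset are assumptions for this sketch.
PROJECT_FIELDS = ["title", "author", "date", "funder", "copyright holder",
                  "description", "keywords", "language"]


def build_readme(project, files):
    """Return readme text plus the list of project fields left empty."""
    missing = [name for name in PROJECT_FIELDS if not project.get(name)]
    lines = ["PROJECT INFORMATION"]
    for name in PROJECT_FIELDS:
        lines.append(f"{name}: {project.get(name, '')}")
    lines.append("")
    lines.append("FILES (file name | file format | software used)")
    for f in files:
        lines.append(f"{f['file name']} | {f['file format']} | {f['software used']}")
    return "\n".join(lines) + "\n", missing


# Invented sample values, for illustration only.
project = {
    "title": "Sample survey",
    "author": "A. Researcher",
    "date": "2020",
    "description": "How the data were gathered and used, and for what purpose.",
}
files = [{"file name": "responses.csv", "file format": "CSV",
          "software used": "spreadsheet"}]

readme_text, missing = build_readme(project, files)
print(readme_text)
print("Still to be supplied:", missing)
```

Reporting the empty fields back to the researcher, instead of rejecting the submission, keeps the form simple while still showing what would make the dataset easier to reuse.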


All four of the 2017-2018 The Realities of Research Data Management series webinars,[103] led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment: beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.[104]

Libraries' expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post "Let's Cook Up Some Metadata Consistency":

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions' research investments.[105]


National contexts differ For example our Australian colleagues can take advantage of Australiarsquos National Computational Infrastructure for big data and the Australian Data Archive for the social sciences106 Canada has launched a national network called Portage for the ldquoshared stewardship of research datardquo107

Some institutions have developed templates to capture metadata in a structured form Some Focus Group members noted the need to keep such forms as simple as possible as it can be difficult to get researchers to fill them in All agreed data creators needed to be the main source of metadata But what will inspire data creators to produce quality metadata New ways of training and outreach are needed an area of exploration within Metadata 2020rsquos Research Communications project108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas in use include Dublin Core, MODS (Metadata Object Description Schema), and DDI (the Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.
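As an illustration only, a simple deposit template could check a researcher's submission against the elements listed above before it reaches a repository. The sketch below is a minimal example under stated assumptions: the field names (`license`, `data_steward`, and so on) and the `missing_elements` helper are hypothetical and are not drawn from Dublin Core, MODS, DDI, or any institution's actual form.

```python
# Hypothetical field names for the reuse-supporting elements named in the text.
# They are illustrative only and do not come from any metadata standard.
REQUIRED_ELEMENTS = (
    "license",
    "processing_steps",
    "tools",
    "documentation",
    "data_definitions",
    "data_steward",
    "grant_numbers",
)

# Elements the text describes as needed only "where relevant".
OPTIONAL_ELEMENTS = ("geospatial_coverage", "temporal_coverage")


def _is_blank(value) -> bool:
    """Treat None, empty strings, and empty collections as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def missing_elements(record: dict) -> list:
    """Return the required elements that a submitted record leaves blank."""
    return [name for name in REQUIRED_ELEMENTS if _is_blank(record.get(name))]


if __name__ == "__main__":
    submission = {
        "license": "CC BY 4.0",
        "tools": ["OpenRefine"],
        "data_steward": "  ",  # whitespace only, so treated as missing
        "grant_numbers": [],
    }
    print(missing_elements(submission))
```

Keeping the required list short reflects the Focus Group's point that forms should stay as simple as possible; a report of missing fields can be returned to the data creator rather than blocking the deposit outright.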

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full dataset level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
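Creator identifiers supplied by researchers are often mistyped, so a lightweight check at the point of capture can help. ORCID iDs carry a final check character computed with the ISO 7064 MOD 11-2 algorithm, and the sketch below verifies it. The function names are illustrative; the first iD in the usage example is the author's ORCID iD as printed on this report's title page. This checks only that an iD is well formed, not that it is registered or belongs to the named person.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a correct check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base = compact[:15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-8757-2962"))  # iD from the report's title page
    print(is_valid_orcid("0000-0002-8757-2963"))  # last character altered
```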

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or dataset deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose support may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog, and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; and improving relevancy ranking, personalizing search results, offering recommendation services in the discovery layer, and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty, cited at 20%, was for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking, and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125

Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the codeathon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story. The company asked the British Library for every author and book title published by the nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127

FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for, and by libraries, archives, and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed by both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to “traditional cataloging activities” (such as original and copy cataloging, authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats, and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions, while still being expected to maintain production levels, must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills, or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.141
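To give a sense of what clustering for reconciliation involves, the sketch below groups variant name strings by a normalized "fingerprint" key, in the spirit of the key-collision clustering that tools such as OpenRefine offer. It is a simplified illustration written for this purpose, not MarcEdit's or OpenRefine's actual implementation, and the `fingerprint` and `cluster` function names are assumptions made for the example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key: ASCII-fold, lowercase, drop punctuation,
    then sort and de-duplicate the remaining word tokens."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^\w\s]", " ", ascii_only.lower())
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; return only groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    headings = [
        "Smith-Yoshimura, Karen",
        "Karen Smith-Yoshimura",
        "Reese, Terry",
        "Terry Reese.",
        "Zeng, Marcia",
    ]
    for group in cluster(headings):
        print(group)
```

A key-collision approach like this only proposes candidate matches. Two different people can share a name, so a metadata specialist would still review each cluster before merging records.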

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in the service of the library helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category “Metadata.” https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe.


Futornick, Michelle. 2019. “LD4P2 Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5. American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is Orcid.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org.

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu.

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications.

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710.

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710.

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445.

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793.

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois.

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805.


44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura. “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko.

AIATSIS Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite.


60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura. “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.


73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.


84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem; Addressing the Problem; Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io.

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.


100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 583400. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.


125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395. T: 1-800-848-5878. T: +1-614-764-6000. F: +1-614-764-6096. www.oclc.org/research

ISBN: 978-1-55653-170-5. DOI: 10.25333/rqgd-b343. RM-PR-216787-WWAE-A4 2009


  • Executive Summary
  • Introduction
  • The Transition to Linked Data and Identifiers
    • Expanding the use of persistent identifiers
    • Moving from “authority control” to “identity management”
    • Addressing the need for multiple vocabularies and equity, diversity, and inclusion
    • Linked data challenges
  • Describing “Inside-Out” and “Facilitated” Collections
    • Archival collections
    • Archived websites
    • Audio and video collections
    • Image collections
    • Research data
  • Evolution of “Metadata as a Service”
    • Metrics
    • Consultancy
    • New applications
    • Bibliometrics
    • Semantic indexing
  • Preparing for Future Staffing Requirements
    • The culture shift
    • Learning opportunities
    • New tools and skills
    • Self-education
    • Addressing staff turnover
  • Impact
  • Acknowledgments
  • Appendix
  • Notes
  • FIGURE 1 “Changing Resource Description Workflows” by OCLC Research
  • FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters
  • FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers
  • FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages
  • FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership
  • FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections
  • FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections
  • FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon
  • FIGURE 9 UK Hatchette’s “River of Authors” generated from the British Library’s catalog metadata

I N T R O D U C T I O N

The OCLC Research Library Partners Metadata Managers Focus Group (hereafter referenced as the Focus Group),1 first established in 1993, is one of the longest-standing groups within the OCLC Research Library Partnership (RLP),2 a transnational network of research libraries. The Focus Group provides a forum for administrators responsible for creating and managing metadata in their institutions to share information about topics of common concern and to identify metadata management issues. The issues raised by the Focus Group are pursued by OCLC Research in support of the RLP and inform OCLC products and services.

The firm belief that metadata underlies all discovery, regardless of format, now and in the future permeates all Focus Group discussions. Metadata provides the research infrastructure necessary for all libraries’ “value delivery systems”: fulfilling their community’s requests for information and resources. Metadata is crucial for transitioning to next generations of library and discovery systems. Good metadata created today can easily be reused in a linked data environment in the future.3 As noted in the British Library’s Foundations for the Future, “Our vision is that by 2023 the Library’s collection metadata assets will be unified on a single, sustainable, standards-based infrastructure offering improved options for access, collaboration and open reuse.”4


Format-specific metadata management based on curated text strings in bibliographic records understood only by library systems is nearing obsolescence, both conceptually and technically. Innovations in librarianship are exerting pressure on metadata management practices to evolve, as librarians are required to provide metadata for far more resources of various types and to collaborate on institutional or multi-institutional projects with fewer staff. “Traditional methods of metadata generation, management and dissemination,” suggests the British Library’s Collection Management Strategy, “are not scalable or appropriate to an era of rapid digital change, rising audience expectations and diminishing resources.”5 Focus Group members are eager to unleash the power of metadata in legacy records for different interactions and uses by both machines and end-users in the future. Consistent metadata created according to past rules or standards needs to be transformed into new structures.


Why is metadata changing?

Traditional library metadata was, and is, made by librarians conforming to rules that are mainly used and understood by librarians. It is record-centered, expensive to produce, and has historic size limitations. Metadata is limited in its coverage, notably not including articles within scholarly journals or other scholarly outputs. The infrastructure has been inadequate for managing corrections and enhancements, inducing an emphasis on perfection that has exacerbated the slowness of metadata creation. In short, the metadata could be better, there is not enough of it, and the metadata that does exist is not used widely outside the library domain.

How is the creation process changing?

Metadata is no longer created by library staff alone. Today publishers, authors, and other interested parties are equally involved in metadata creation. Metadata creation has also been pushed forward in the scholarly life cycle, with publishers creating metadata records earlier than in the traditional cataloging process. Metadata can now be enhanced or corrected by machines or by crowdsourcing.

How is the metadata itself changing?

Machine-readable cataloging (MARC) was created to replicate the metadata traditionally found on library catalog cards. We are transitioning from MARC records to assemblages of well-coded and shareable, linkable components, with an emphasis on references, and we are eliminating anachronistic abbreviations not understood by machines. Instead of relying only on library vocabularies such as subject headings and coded lists, the developing assemblages can accommodate vocabularies created for specific domains, expanding the metadata’s potential audiences.


The Focus Group’s composition has fluctuated over time and currently comprises representatives from 63 RLP Partners in 11 countries spanning four continents.6 The group includes both past and incoming chairs of the Program for Cooperative Cataloging (PCC),7 providing cross-fertilization between the two. Topics for group discussions can be proposed by any Focus Group member and are selected by an eight-member Planning Group (see appendix), who then write “context statements” explaining why the topic is considered timely and important and then develop question sets that delve into the topic. Context statements and question sets are then distributed to all Focus Group members, who are given three to five weeks to submit their responses. Compilations of the Focus Group’s responses inform face-to-face discussions held in conjunction with the American Library Association conferences8 and in subsequent virtual meetings.

As the Focus Group facilitator, I have summarized and synthesized these discussions in a series of OCLC Research Hanging Together Blog publications.9 Nearly 40 blog posts on a wide range of metadata-related topics have been published on this forum over the past six years.


The Metadata Managers Focus Group is just one activity within the broader OCLC Research Library Partnership, which is devoted to extensive professional development opportunities for library staff. Focus Group members value their affiliation with the Research Library Partnership as a channel to becoming the “change agents” of future metadata management.10 Focus Group members’ responses to question sets have facilitated intra-institutional discussions and helped metadata managers understand how their institutions’ situation compares with peers within the Partnership.

These Focus Group discussions identified a broad range of metadata-related issues, documented in this report. Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

Collectively, Focus Group members command a wide range of experiences with linked data. The Focus Group’s keen interest in linked data implementations sparked the series of OCLC Research’s International Linked Data Surveys for Implementers.11 A subset of Focus Group members have participated in various linked data projects, including the OCLC Research Project Passage and CONTENTdm Linked Data pilot; OCLC’s Shared Entity Management Infrastructure; Library of Congress’ Bibliographic Framework Initiative (BIBFRAME); the Mellon-funded Linked Data for Production (LD4P) project; the Share-VDE initiative; and the IMLS planning grant Shareable Local Name Authorities, which exposed issues raised by identifier hubs in the linked data environment.12 In addition, Focus Group members contribute to the PCC task groups addressing aspects of linked data work, including the PCC Task Group on Linked Data Best Practices, Task Group on Identity Management, Task Group on URIs in MARC, and the PCC Linked Data Advisory Committee.13 This cross-fertilization has prompted the Focus Group to examine issues around the entities represented in institutional resources.

This report synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The document is organized in the following sections, each representing an emerging trend identified in the Focus Group’s discussions:

• The transition to linked data and identifiers: expanding the use of persistent identifiers as part of the shift from “authority control” to “identity management”

• Describing the “inside-out” and “facilitated” collections: challenges in creating and managing metadata for unique resources created or curated by institutions in various formats and shared with consortia

• Evolution of “metadata as a service”: increased involvement with metadata creation beyond the traditional library catalog

• Preparing for future staffing requirements: the changing landscape calls for new skill sets needed by both new professionals entering the field and seasoned catalogers

The document concludes with some observations on the forecasted impact of the next generation of metadata on the wider library community.


The Transition to Linked Data and Identifiers

Linked data offers the ability to take advantage of structured data with an emphasis on context. It relies on language-neutral identifiers pointing to objects, with a focus on “things,” replacing the “strings” inherent in current authority and catalog records. These identifiers can then be connected to related data, vocabularies, and terms in other languages, disciplines, and domains, including nonlibrary domains. Linked data applications can consume others’ contributions and thus free metadata specialists from having to re-describe things already described elsewhere, allowing them instead to focus on providing access to their institutions’ unique and distinctive collections. This promises a richer user experience and increased discoverability, with more contextual relationships than is possible with our current systems. Furthermore, linked data offers an opportunity to go beyond the library domain by drawing on information about entities from diverse sources.14

FIGURE 1 “Changing Resource Description Workflows” by OCLC Research15

The hope is that linked data will allow libraries to offer new, value-added services that current models cannot support, that outside parties will be able to make better use of library resource descriptions, and that the data will be richer because more parties share in its creation. Moving to a linked data environment portends changes to resource description workflows, as shown in figure 1.

The drive to move metadata operations to linked data depends on the availability of tools, access to linked data sources for reuse, documented best practices on identifiers and the metadata descriptions associated with them (“statements”), and a critical mass of implementations on a network level.

EXPANDING THE USE OF PERSISTENT IDENTIFIERS

The Focus Group discussed the “future-proofing” of cataloging, which refers to the opportunities to unleash the power of metadata in legacy records for different interactions and uses in the future. Persistent identifiers were viewed as crucial to transitioning from current metadata to future applications.16 Identifiers, in the form of language-neutral alphanumeric strings, serve as a shorthand for assembling the elements required to uniquely describe an object or resource. They can be resolved over networks with specific protocols for finding, identifying, and using that object or resource. In the nonlibrary domain, Social Security and employee numbers are examples of such identifiers. In the library and academic domains, Focus Group members pointed to ORCID (Open Researcher and Contributor ID)17 as a “glue” that holds together the four arms of scholarly work (publishing, repository, library catalog, and researchers), but ORCID is limited to only living researchers. ORCID is increasingly used in STEM (science, technology, engineering, mathematics) journals for all authors and contributors18 and included in institutions’ Research Information Management systems. ISNI (International Standard Name Identifier)19 uniquely identifies persons and organizations involved in creative activities; it is used by libraries, publishers, databases, and rights management organizations, and it covers nonliving creators.
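The notion of an identifier as a language-neutral alphanumeric string can be illustrated with ORCID itself. The following is a minimal sketch in Python, assuming the ISO 7064 MOD 11-2 check character that ORCID documents for its 16-character iDs. The function names are illustrative and are not part of any ORCID tooling, and the check only confirms that a string is well formed, not that the iD is registered or that it belongs to a particular person. The iD used is the one printed on this report's title page.

```python
def orcid_check_character(base_digits: str) -> str:
    """Return the final character of an ORCID iD computed from its first 15 digits.

    Uses the ISO 7064 MOD 11-2 scheme; a result of 10 is written as "X".
    """
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Check length, digits, and check character of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


# The iD from the report's title page passes; changing its last digit does not.
print(is_well_formed_orcid("0000-0002-8757-2962"))
print(is_well_formed_orcid("0000-0002-8757-2963"))
```

A check of this kind catches transcription errors when identifiers are batch loaded, but it does nothing to disambiguate people: deciding which researcher an iD should be attached to remains intellectual work.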

Persistent identifiers are used by parties such as Google and HathiTrust for service integration.20 More institutions are using geospatial coordinates in metadata, or URIs (Uniform Resource Identifiers) pointing to geospatial coordinates, that support API (Application Programming Interface) calls to GeoNames,21 enabling map visualizations. Research institutions are also adopting person identifiers such as ORCID to streamline the collection of the institutional research record, usually through a Research Information Management system, as documented in the 2017 OCLC Research Report Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information Management.22

While publishers serve as a key player in the metadata workstream, publisher data does not always meet library requirements. For example, publisher data for monographs usually does not include identifiers. The British Library is working with five UK publishers to add ISNIs23 to their metadata as a promising proof-of-concept for publishers and libraries working together earlier in the supply chain. The ability to batch load or algorithmically add identifiers in the future is on Focus Group members’ wish list.

No single person identifier covers all use cases. Researchers’ names have been only partially represented in national name authority files that identify persons, both living and dead. A sizable quantity of legacy names are represented only by text strings in bibliographic records. Authority records are created only by institutions involved in the PCC’s Name Authority Cooperative Program (NACO)24 or in national library programs. Even then, authority records are created selectively for certain headings, or sometimes only when references are involved. The LC/NACO name authority file contained only 30% of the total names reflected in WorldCat’s bibliographic record access points (9 million LC/NACO records compared to the 30 million total names reported on the WorldCat Identities project page as of 2012).25 By 2020, this percentage had decreased to 18%: 11 million LC/NACO authority records compared to 62 million in WorldCat Identities. These statistics illustrate that the number of names represented in bibliographic records is increasing more quickly than the number under authority control.
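The two percentages follow directly from the counts quoted above. As a short worked check in Python, using only the figures given in the paragraph (in millions); the variable and function names are illustrative:

```python
# Counts in millions, as quoted in the text for 2012 and 2020.
snapshots = {
    2012: {"authority_records": 9, "names_in_worldcat_identities": 30},
    2020: {"authority_records": 11, "names_in_worldcat_identities": 62},
}


def coverage_percent(authority_records: float, total_names: float) -> float:
    """Share of names that have an authority record, as a percentage."""
    return 100 * authority_records / total_names


coverage = {
    year: coverage_percent(c["authority_records"], c["names_in_worldcat_identities"])
    for year, c in snapshots.items()
}

# Growth factors between the two snapshots.
authority_growth = snapshots[2020]["authority_records"] / snapshots[2012]["authority_records"]
names_growth = (
    snapshots[2020]["names_in_worldcat_identities"]
    / snapshots[2012]["names_in_worldcat_identities"]
)

for year, pct in coverage.items():
    print(f"{year}: {pct:.0f}% of names under authority control")
print(f"authority records grew {authority_growth:.2f}x; names grew {names_growth:.2f}x")
```

Over the period, authority records grew by roughly a fifth while the number of names roughly doubled, which is why coverage fell even though the authority file itself kept growing.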

Authority files focus on the “preferred form” of a name, which can vary depending on language, discipline, context, and time period. Scholars have objected to the very concept of a “preferred form,” as the name may be referred to differently depending on the context.26 When a name has multiple forms, historians need to know the provenance of each name, following the citation practices commonly used in their field. An identifier linked to different forms of names, each associated with the provenance and context, could resolve this conundrum.


Researcher names are just one example of a need unmet by current identifier systems. Institutions have been minting their own “local identifiers” to meet this need. Use cases for local identifiers include registering all researchers on campus; representing entities that are underrepresented in national authority files, such as authors of electronic dissertations and theses, performers, events, local place names, and campus buildings; identifying entities in digital library projects and institutional repositories; reflecting multilingual needs of the community; and supporting “housekeeping” tasks such as recording archival collection titles.27

Focus Group members’ consistent need to disambiguate names across disciplines and formats spurred the creation of the OCLC Research working group on Registering Researchers in Authority Files.28 The need to accurately record researchers’ institutional affiliations (to reflect the institution’s scholarly output, to promote cross-institutional collaborations, and to lead to more successful recruitment and funding) led to another working group on Addressing the Challenges with Organizational Identifiers and ISNI,29 which presented new data modeling of organizations that others could adapt for their own uses. Since then, the Research Organization Registry (ROR) was launched to develop an open, sustainable, usable, and unique identifier for every research organization in the world.30

Disambiguating names is the most labor-intensive part of authority work and will still be a prerequisite for assigning unique identifiers. Given the different name identifier systems already in use, libraries need a name reconciliation service. Authority work and algorithms based on text string matching have limits; the results will still need manual expert review. Tapping the expertise in user communities to verify if two identifiers represent the same person may help.

Disambiguation is particularly difficult for authors or contributors listed in journal articles, where names are often abbreviated and there may be dozens or even hundreds of contributors. For example, an article in Physical Review Letters (Precision Measurement of the Top Quark Mass in Lepton + Jets Final State) has approximately 300 abbreviated author names for a five-page article (figure 2).31 This exemplifies the different practices among disciplines. By contrast, other objects with many contributors, such as feature films and orchestral recordings, are usually represented by only a relative handful of the associated names in library legacy metadata.32 Such differences make creating metadata that is uniform, understandable, and widely reusable a challenge.

FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form or an identifier, if it exists. Associating ORCIDs with article authors makes it easier to differentiate authors with the same abbreviated forms. Research Information Management (RIM) systems apply identity management for local researchers so that they are correctly associated with the articles they have written. Their articles are displayed as part of their profiles. (See, for example, Experts@Minnesota or University of Illinois at Urbana-Champaign’s Experts research profiles.)33

[Figure 2 image: first page of arXiv:1405.1756v2 [hep-ex], 16 Jun 2014 (FERMILAB-PUB-14-123-E), “Precision measurement of the top-quark mass in lepton+jets final states,” showing the abbreviated author list, which begins V.M. Abazov, B. Abbott, B.S. Acharya, M. Adams and continues for the rest of the page.]


For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.34 Research Information Management Systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author’s ORCID. Linked data could access information across many environments, including those in Research Information Systems, but would require accurately linking multiple identifiers for the same person to each other.

Some Focus Group members are performing metadata reconciliation work, such as searching matching terms from linked data sources and adding their URIs in metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.35 Improving the quality of the data improves users’ experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC’s Virtual International Authority File (VIAF); the Library of Congress’s linked data service (id.loc.gov); ISNI; the Getty’s Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC’s Faceted Application of Subject Terminology (FAST); and various national authority files. Selections of the source depend on the trustworthiness of the organization responsible, subject matter, and richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed a LCNAF Named Entity Reconciliation program36 using OpenRefine (formerly Google Refine) that searches the VIAF file with the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading and a link to the URI of the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.
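To make the string-matching step concrete, here is a minimal sketch in Python of pairing a local heading with an authorized heading and its URI. It is not the University of Michigan program and does not call the VIAF API: the small `AUTHORITY` table, its example.org URIs, the function names, and the 0.85 threshold are all assumptions made for illustration, and matching uses only the standard library's `difflib`.

```python
import unicodedata
from difflib import SequenceMatcher

# Hypothetical sample: authorized headings mapped to placeholder URIs.
AUTHORITY = {
    "Austen, Jane, 1775-1817": "https://example.org/authority/0001",
    "Twain, Mark, 1835-1910": "https://example.org/authority/0002",
}


def normalize(heading: str) -> str:
    """Lowercase, strip diacritics and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", heading)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_marks)
    return " ".join(kept.lower().split())


def reconcile(local_heading, authority=AUTHORITY, threshold=0.85):
    """Return (authorized heading, URI, score) for the best match, or None.

    A None result means no candidate was similar enough, so the heading
    should be routed to a person for review.
    """
    target = normalize(local_heading)
    best = None
    for heading, uri in authority.items():
        score = SequenceMatcher(None, target, normalize(heading)).ratio()
        if best is None or score > best[2]:
            best = (heading, uri, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


print(reconcile("AUSTEN, JANE, 1775-1817."))
print(reconcile("Smith, J."))
```

The threshold is the design choice that matters: set low, it pairs different people who share a surname and initials; set high, it sends more headings to manual review. Either way, string similarity alone cannot tell two people with the same name apart.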

A long list of nonlibrary sources that could enhance current authority data or could be valuable to link to in certain contexts has been identified. Wikidata and Wikipedia led the list. Other sources include AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, GeoNames, GoodReads, IMDb (Internet Movie Database), Internet Archive, Library Thing, LinkedIn, MusicBrainz, ONIX (ONline Information eXchange), Open Library, ORCID, and Scopus ID. The PCC’s Task Group on URIs in MARC’s document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources37 provides valuable guidance for collecting data from these other sources. Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than with current library systems.38

Identifiers for “works” represent a particular challenge, as there is no consensus on what represents a “distinctive work.”39 Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a “work” is could hamper the ability to reuse data created elsewhere, and look to a central trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.


FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers40

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets such as digital resources and collections in institutional repositories include system-generated IDs, locally minted identifiers, PURL handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.41 Ideally, libraries would contribute to a hub for the metadata describing their researchers’ data sets regardless of where the data sets are stored.
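Independence from storage location is what a resolver provides: the citation carries only the identifier, and a resolver redirects to wherever the object currently lives. A minimal sketch in Python, assuming the public resolvers doi.org and n2t.net; the function name is illustrative, the DOI is this report's own, and the ARK is a made-up placeholder rather than a real object.

```python
def resolver_url(identifier: str) -> str:
    """Turn a 'doi:' or 'ark:' identifier into a resolver URL.

    The returned URL names no repository host, so it stays valid if the
    object moves, provided the identifier's record is kept up to date.
    """
    ident = identifier.strip()
    lowered = ident.lower()
    if lowered.startswith("doi:"):
        return "https://doi.org/" + ident[4:]
    if lowered.startswith("ark:"):
        return "https://n2t.net/" + ident
    raise ValueError(f"unsupported identifier scheme: {identifier!r}")


print(resolver_url("doi:10.25333/rqgd-b343"))
# Hypothetical ARK, for illustration only.
print(resolver_url("ark:/99999/fk4example"))
```

The sketch also shows why multiple DOIs for one object are a problem: two different strings yield two different resolver URLs, and nothing in the identifiers themselves says they refer to the same thing.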


MOVING FROM "AUTHORITY CONTROL" TO "IDENTITY MANAGEMENT"

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities [42]. The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from "managing text strings" as the basis of controlling headings in bibliographic records [43]. But identity management poses a change in focus, from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from "authority control" and "authorized access points" in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another [44]. Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4. One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
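The separation of identifier from label can be sketched as a small data structure. This is an illustration only, written in Python; the `Entity` class, the `label_for` method, and the identifier `ex:entity/1` are hypothetical and do not represent any library system or the Wikidata data model.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An entity keyed by an identifier; labels are attributes, not the key."""

    identifier: str
    # Maps a language or community tag to the label preferred there.
    labels: dict[str, str] = field(default_factory=dict)

    def label_for(self, preferences: list[str]) -> str:
        """Return the first label that matches the local preference order."""
        for tag in preferences:
            if tag in self.labels:
                return self.labels[tag]
        # No preferred label exists: fall back to the identifier itself.
        return self.identifier


# Hypothetical entity with two labels; neither is "the" authorized form.
author = Entity("ex:entity/1", {"en": "Leo Tolstoy", "ru": "Лев Толстой"})
print(author.label_for(["ru", "en"]))
print(author.label_for(["fr", "en"]))
```

Each catalog supplies its own preference list, so the display label is a local choice while matching and collocation rely on the identifier alone.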

[Figure 4 image: Wikidata identifier Q19526]

A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed? [45]
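As a rough illustration of what reconciling multiple identifiers for one entity could involve, the Python sketch below groups identifiers into clusters from pairwise "same entity" assertions, using a standard union-find structure. It is one possible approach, not a description of any existing library tool; the function name and the identifier strings in the usage example are hypothetical placeholders.

```python
def cluster_identifiers(same_as_pairs):
    """Group identifiers into clusters from pairwise 'same entity' assertions."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own cluster root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for ident in list(parent):
        clusters.setdefault(find(ident), set()).add(ident)
    return list(clusters.values())


# Hypothetical identifiers from three different sources.
pairs = [("viaf:1", "wd:Q1"), ("wd:Q1", "isni:9"), ("viaf:2", "wd:Q2")]
for cluster in cluster_identifiers(pairs):
    print(sorted(cluster))
```

A system could then index each cluster under a single match point while still displaying the source of every identifier in it.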

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors [46]. Many libraries are looking toward Wikidata and Wikibase (the software platform underlying Wikidata) to solve some of the long-standing issues faced by technical services departments, archival units, and others [47]. Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution's resources. Focus Group members' experimentations with Wikidata, and OCLC projects using the Wikibase platform, indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationship with each other, and how best they can increase discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on "works" and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are "notable" are more likely to be represented in Wikidata. Recently, an effort to support citations in Wikipedia articles, WikiCite [48], demonstrates a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical, and powerful, aspects of identity management is to reduce the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains [49]. Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts or subject headings are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting "alternate" subject headings came to the fore when the Library of Congress' initial solution to change the LC subject heading for "Illegal aliens" to "Undocumented immigrants" failed to be implemented. This prompted one Focus Group member to comment, "Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive." End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context [50].


Some see Faceted Application of Subject Terminology (FAST) [51] as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end [52]. FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee [53] represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH with tools and services that serve the needs of diverse communities and contexts [54].

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data [55]. For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri [56]. There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects) [57] built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d'autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented [58].


A growing percentage of data in institutions' discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities' research data and materials in institutional repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the "collision of name spaces" and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions' Ontology Management – Graphite tool [59] to create and manage various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives [60].

Linked data could provide the means for local communities to prefer a different label for an established vocabulary's preferred term for a concept or entity. One might reference a local description of a concept or entity not represented (or not represented satisfactorily) in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, "Nothing is sadder than a vocabulary that someone invented that was left to go stale." Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI) [61] spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs [62]. Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution's EDI goals and principles.


FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership [63]

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and to replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, "Dissident art" rather than "Non-conformist art.")

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading "Mentally handicapped" to "People with mental disabilities." Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation [64]. Some noted it would be less labor-intensive to present a "cultural sensitivity" message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator's attitude or the period in which the item was created, and may be considered inappropriate today in some contexts.

[Figure 5 chart: "What Areas Have You Changed or Plan to Change Due to Your Institution's EDI Goals and Principles?" Scale: 0 to 100. Categories: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies. Series: Changed; Plan to change.]

• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may be exclusive to audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than include them as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others' terminologies or mean different things. The PCC Linked Data Advisory Committee's Linked Data Infrastructure Models: Areas of Focus for PCC Strategies [65] describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences and to facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on "Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives" [66] described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of "community contribution" for new terms and community voting could be more inclusive. Libraries' current "consensus environment" excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the set of associated statements made about them. How will libraries resolve or reconcile conflicts between statements? [67] Different types of inconsistencies may appear than occur now, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of "preferred sources" [68]. OCLC Research explored how libraries might apply Google's "Knowledge Vault" to identify statements that may be more "truthful" than others in the 2015 "Works in Progress Webinar: Looking Inside the Library Knowledge Vault" [69]. Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected in the use of inadequate vendor records, the creation of minimal or less-than-full level descriptions for certain types of resources, and limits on authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of the large volume of materials and insufficient staff resources [70]. These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide "trustworthy provenance" or whether data should be distributed with peer-to-peer sharing [71]. Different statements might be correct in their own contexts. "Conflicting statements" might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.
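To make the idea of sharing statements with provenance more concrete, the Python sketch below records each statement with its source and reports properties that carry more than one value, without choosing a winner. It is an illustration under stated assumptions: the `Statement` fields, the `find_conflicts` function, and the sample data are hypothetical and do not reflect any particular linked data store.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str  # entity identifier
    prop: str     # property name
    value: str
    source: str   # provenance: who asserted the statement


def find_conflicts(statements):
    """Map (subject, prop) to {value: [sources]} where more than one value exists."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s in statements:
        grouped[(s.subject, s.prop)][s.value].append(s.source)
    # Keep every competing value with its sources instead of picking one.
    return {key: dict(values) for key, values in grouped.items() if len(values) > 1}


# Hypothetical statements from three hypothetical sources.
statements = [
    Statement("ex:person/1", "birth_date", "1850", "source_a"),
    Statement("ex:person/1", "birth_date", "1851", "source_b"),
    Statement("ex:person/1", "birth_date", "1850", "source_c"),
]
print(find_conflicts(statements))
```

Retaining all values alongside their sources leaves the choice of what to display to each community, which fits the observation that conflicting statements may each be correct in their own context.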

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative "Plan M" (where "M" stands for "metadata") seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers [72]. Among the implications cited by stakeholders in the UK's National Bibliographic Knowledgebase (NBK) in Plan M's 10-year vision: "Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research" [73]. Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing "Inside-Out" and "Facilitated" Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the "inside-out collection" (in contrast to the "outside-in collection," in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the "facilitated collection" [74]. Focus Group members' activities have increasingly focused on metadata that will provide access to the resources unique to their institutions, as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to "inside-out" collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as "super challenging." In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills [75]. This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places and supplying the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars, libraries, and archives alike. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging [76].

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the "preferred" form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot [77] and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group [78].

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions' digitization of archival and distinctive collections that have been available only in physical form [79]. More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery [80].

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University's Human Rights, Historic Preservation and Urban Planning, and New York City Religions [81]. National libraries archive websites within their national domain. For example, the National Library of Australia's Archived websites (1996-now) [82] collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation's Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites [83].

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived, when the user experience of the site (its "look-and-feel") is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the "lack of descriptive metadata guidelines" as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group [84]. The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving [85].


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections [86]. However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, "For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials" [87]. Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment in which to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, which are often in deteriorating formats, in large backlogs, and sometimes require rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details of the AV capture and digitization process: compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. For some of their AV materials, some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability.


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together Blog posts "Assessing Needs of AV in Special Collections" and "Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP."90 A subset of the Focus Group members responded to Weber's 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership; incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections

[Bar chart: "What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?" (n=137). Challenges rated: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; selection, appraisal, and collection development. Response scale: very interested, interested, somewhat interested, not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects ("of-ness" and "about-ness"). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or as unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections' metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR'ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of the same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a "blob" of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people, as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution's own collection development, as librarians can see their contributions in the context of others' content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators' very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors' very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find "semantic clusters": those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.
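As a rough illustration of how a client might read an IIIF Presentation manifest, the sketch below pulls canvas labels and image URLs out of a small, made-up manifest. It assumes a Presentation API 2.x-style structure (sequences, canvases, images, resource); the sample manifest, its example.org URLs, and the function name list_images are hypothetical and do not describe CONTENTdm's actual output.

```python
import json

# Hypothetical, heavily trimmed manifest in the IIIF Presentation API 2.x shape.
SAMPLE_MANIFEST = json.loads("""
{
  "@id": "https://example.org/iiif/map-001/manifest.json",
  "@type": "sc:Manifest",
  "label": "Plan de Paris (sample)",
  "sequences": [{
    "@type": "sc:Sequence",
    "canvases": [
      {"@type": "sc:Canvas", "label": "Sheet 1",
       "images": [{"resource": {"@id": "https://example.org/iiif/map-001/1/full/full/0/default.jpg"}}]},
      {"@type": "sc:Canvas", "label": "Sheet 2",
       "images": [{"resource": {"@id": "https://example.org/iiif/map-001/2/full/full/0/default.jpg"}}]}
    ]
  }]
}
""")


def list_images(manifest):
    """Return (canvas label, image URL) pairs found in a 2.x-style manifest."""
    pairs = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                url = image.get("resource", {}).get("@id")
                if url:
                    pairs.append((canvas.get("label", ""), url))
    return pairs


if __name__ == "__main__":
    for label, url in list_images(SAMPLE_MANIFEST):
        print(label, url)
```

Because every compliant system exposes the same manifest shape, the same small reader could in principle be pointed at manifests from different platforms.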

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
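The string-to-identifier replacement described above can be sketched as follows. This is an illustration only, not the pilot's method: the lookup table, the example.org URIs, and the field names are all hypothetical stand-ins for real authority files and local vocabularies.

```python
# Hypothetical lookup table: normalized heading -> identifier.
# A real workflow would reconcile against authority files; these URIs are placeholders.
AUTHORITY_LOOKUP = {
    "paris (france)": "https://example.org/place/paris",
    "maps": "https://example.org/concept/maps",
}


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(text.lower().split())


def to_linked_values(record, lookup=AUTHORITY_LOOKUP):
    """Replace string values with {'id': ..., 'label': ...} where a match exists.

    Unmatched strings are kept as plain literals so nothing is lost.
    """
    converted = {}
    for field, values in record.items():
        converted[field] = []
        for value in values:
            uri = lookup.get(normalize(value))
            if uri:
                converted[field].append({"id": uri, "label": value})
            else:
                converted[field].append(value)
    return converted


if __name__ == "__main__":
    record = {"subject": ["Maps", "Paris  (France)", "Fortifications"]}
    print(to_linked_values(record))
```

Keeping the original string as a label alongside the identifier means the record still displays as before while gaining a link that other systems can follow.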

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about ldquoParis Mapsrdquo across CONTENTdm collections


RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel's two-part blog entry, "Data Management and Curation in 21st Century Archives" (Sept 2015),100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
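One way to make such a readme routine is to generate it from a small structured record. The sketch below is only an illustration: the field names echo the elements listed in the quotation, while the function name build_readme and the sample values are hypothetical.

```python
# Field order follows the elements named in the quotation above.
README_FIELDS = [
    "title", "author", "date", "funder", "copyright holder", "description",
    "keywords", "observation unit", "kind of data", "type of data", "language",
]


def build_readme(project, files):
    """Render project-level metadata and a per-file list as readme text.

    `project` maps field names to values; `files` is a list of
    (file name, file format, software used) tuples. Missing fields are
    written as 'not provided' so gaps stay visible to reusers.
    """
    lines = ["PROJECT METADATA"]
    for field in README_FIELDS:
        lines.append(f"{field}: {project.get(field, 'not provided')}")
    lines.append("")
    lines.append("FILES")
    for name, file_format, software in files:
        lines.append(f"{name} | format: {file_format} | software: {software}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample values are invented for illustration.
    print(build_readme(
        {"title": "Sample survey", "author": "A. Researcher", "language": "English"},
        [("responses.csv", "CSV", "spreadsheet software")],
    ))
```

The returned text can be saved as readme.txt alongside the data files; writing "not provided" rather than omitting a field keeps missing information obvious.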


All four of the 2017-2018 webinars in The Realities of Research Data Management series,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries' expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post "Let's Cook Up Some Metadata Consistency":

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions' research investments.105


National contexts differ. For example, our Australian colleagues can take advantage of Australia's National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the "shared stewardship of research data."107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020's Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative's metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance's Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions' discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
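Because identifiers such as ORCID iDs are often keyed in by hand, a basic sanity check can catch typos before they enter a record. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm; the sketch below implements that check. The function names are illustrative, and passing the check shows only that an iD is well formed, not that it is registered or belongs to the right person.

```python
import re


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """Return True if `orcid` has the 0000-0000-0000-000X shape and a matching checksum."""
    if not re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


if __name__ == "__main__":
    # The iD given for the author in this report's front matter.
    print(is_well_formed_orcid("0000-0002-8757-2962"))
```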


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of "metadata governance" across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions' Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the "citation" or "metadata-only" records, with a link to the full text or data set deposited in a disciplinary repository. "Metadata consultation services" may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize the data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research report series The Realities of Research Data Management classifies metadata support as part of the "expertise" function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer "research sprints," where researchers partner with a team of expert librarians whose support may include metadata creation, management, analysis, and preservation. The "Shared BigData Gateway for Research Libraries," hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty's research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of "Metadata as a Service"

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as "foster discovery and use," "enrich the user experience," and "explore new ways to support the whole life cycle of scholarship," all of which are predicated on quality metadata. Usage metrics, such as how frequently items have been borrowed, cited, downloaded, or requested, could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers' publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students' use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata's value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide "metadata as a service": consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. This metadata consultant role is becoming more visible in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for "metadata outreach and consultation": "Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community." Georgia Tech is recruiting a metadata librarian who will "serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects."122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC's integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution's bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125
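The enrichment idea mentioned above, filling in language codes that are missing from related records, might look something like the sketch below. It assumes records have already been clustered by a shared work identifier, and it proposes a code only when all related records that have one agree. The record layout, the id, work_id, and language keys, and the function name are hypothetical.

```python
from collections import defaultdict


def propose_language_codes(records):
    """Suggest language codes for records lacking one.

    Returns {record id: proposed code}. A proposal is made only when every
    related record (same work_id) that carries a code carries the same one;
    conflicting clusters are skipped and left for human review.
    """
    codes_by_work = defaultdict(set)
    for record in records:
        if record.get("language"):
            codes_by_work[record["work_id"]].add(record["language"])

    proposals = {}
    for record in records:
        if not record.get("language"):
            candidates = codes_by_work.get(record["work_id"], set())
            if len(candidates) == 1:
                proposals[record["id"]] = next(iter(candidates))
    return proposals
```

Skipping clusters with conflicting codes is a deliberately cautious choice, since related records may describe translations of the same work.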


Visualizations represent another type of metadata service. A striking example comes from the Austlang national code-a-thon held in 2019 to identify items in Indigenous Australian languages, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom's second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette's publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette's River of Authors.)127


FIGURE 9 UK Hachette's "River of Authors" generated from the British Library's catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty's representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one "global domain," metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that artificial intelligence, or at least machine learning, could reduce the manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an "international, participatory community focused on advancing the use of artificial intelligence in, for, and by libraries, archives, and museums."133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla's OCLC Research 2019 position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
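As a very small illustration of matching names with the help of related metadata, the sketch below combines string similarity from Python's standard difflib module with a birth-year comparison. It is a toy heuristic rather than machine learning; the 0.85 threshold, the record fields, and the function names are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize_name(name):
    """Lowercase, drop punctuation, and sort name parts so word order is ignored."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def likely_same_person(a, b, threshold=0.85):
    """Return True when two name records plausibly describe the same person.

    Records are dicts with 'name' and an optional 'birth_year'. Conflicting
    birth years veto a match, however similar the names are.
    """
    year_a, year_b = a.get("birth_year"), b.get("birth_year")
    if year_a and year_b and year_a != year_b:
        return False
    similarity = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    return similarity >= threshold
```

In practice a heuristic like this would only propose candidate matches for a metadata specialist to confirm, since similar names with no dates attached remain ambiguous.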

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who "trail-blaze innovations," which are then routinized for nonprofessionals. These discussions reinforce Padilla's recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance between allocating staff to "traditional cataloging activities" (such as original and copy cataloging and authority work) and to more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual's productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just "production machines."

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or "play time") to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the "easier stuff" while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult; they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
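
As a rough illustration of the kind of batch script mentioned above, the following Python sketch de-duplicates records within a file. It does not reproduce how MarcEdit works. The field names (`title`, `author`, `year`) and the matching rule (a normalized title/author/year key) are assumptions made for this example, and records are treated as plain dictionaries rather than MARC data.

```python
import re
import unicodedata


def normalize(value):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", without_marks.lower())
    return " ".join(cleaned.split())


def match_key(record):
    """Build a match key from title, author, and year (hypothetical field names)."""
    return tuple(normalize(record.get(field, "")) for field in ("title", "author", "year"))


def deduplicate(records):
    """Keep the first record seen for each match key, preserving input order."""
    seen = set()
    unique = []
    for record in records:
        key = match_key(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


if __name__ == "__main__":
    # Illustrative sample data only; not taken from any real catalog.
    sample = [
        {"title": "Linked Data for Libraries", "author": "Doe, Jane", "year": "2019"},
        {"title": "Linked data for libraries.", "author": "DOE, JANE", "year": "2019"},
        {"title": "Metadata Basics", "author": "Roe, Richard", "year": "2017"},
    ]
    for record in deduplicate(sample):
        print(record["title"])
```

A key this strict will miss near-duplicates such as variant spellings; clustering tools exist precisely to catch those cases, so a script like this would only be a first pass.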

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
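
To make “reconciling equivalents across multiple registries” more concrete, here is a minimal Python sketch of the simple scripting involved. Everything in it is an assumption made for illustration: the registry labels, the identifiers, and above all the matching rule, which compares names only. Different people can share a name, so the output should be read as candidate matches for human review, not as confirmed identities.

```python
from collections import defaultdict


def name_key(name):
    """Reduce 'Surname, Forename' or 'Forename Surname' to a sorted-token key."""
    tokens = name.replace(",", " ").replace(".", " ").lower().split()
    return " ".join(sorted(tokens))


def build_crosswalk(registries):
    """Group identifiers from several registries under a shared name key.

    `registries` maps a registry label to a {name: identifier} dictionary.
    """
    crosswalk = defaultdict(dict)
    for label, entries in registries.items():
        for name, identifier in entries.items():
            crosswalk[name_key(name)][label] = identifier
    return dict(crosswalk)


def reconcile(local_names, crosswalk):
    """Return candidate identifiers per local heading; an empty dict means no match."""
    return {name: crosswalk.get(name_key(name), {}) for name in local_names}


if __name__ == "__main__":
    # Hypothetical registries and identifiers, invented for this example.
    registries = {
        "registry_a": {"Doe, Jane": "A-0001"},
        "registry_b": {"Jane Doe": "B-0042", "Roe, Richard": "B-0007"},
    }
    crosswalk = build_crosswalk(registries)
    for heading, candidates in reconcile(["DOE, Jane", "Smith, Ann"], crosswalk).items():
        print(heading, candidates)
```

In practice, additional evidence such as dates, affiliations, or associated works would be needed before treating two identifiers as equivalent.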

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait, iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in the service of library service helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities, rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created, and will create, will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc/.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category “Metadata.” https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe/.

Futornick, Michelle. 2019. “LD4P2 Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018 [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is ORCID.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org.

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020.) ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu.

34. The National Institute of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications.

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710.

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly used Vocabularies and Reference Sources. Library of Congress: PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710.

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445.

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Dallas (Tex.). Police Department. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793.

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois.

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in:

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805.

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko.

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan. 2019.) https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite.

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress: PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Washburn, Bruce, and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.

73 Grindley, Neil. 2019. “Plan M: Definition, Principles and Direction.” Jisc (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74 Dempsey Lorcan 2016 ldquoLibrary Collections in the Life of the User Two Directionsrdquo LIBER Quarterly 26(4) 338ndash359 httpdoiorg1018352lq10170

75 Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76 Smith-Yoshimura Karen 2017 ldquoMetadata for Archival Collectionsrdquo Hanging Together The OCLC Research Blog 30 May 2017 httpshangingtogetherorgp=5903

77 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Knudson Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Craig Thomas and Holly Tomren 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage 49-51 Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

78 The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at httpswwwoclcorgresearchpartnershipworking-groups archives-special-collections-linked-data-reviewhtml

79 Smith-Yoshimura Karen 2020 ldquoMetadata Management in Times of Uncertaintyrdquo Hanging Together The OCLC Research Blog 15 June 2020 httpshangingtogetherorgp=7998

80 Smith-Yoshimura Karen 2016 ldquoMetadata for Archived Websitesrdquo Hanging Together The OCLC Research Blog 14 March 2016 httpshangingtogetherorgp=5591

81 Archive-It 2008 ldquoHuman Rightsrdquo Columbia University Libraries Collection (Archived May 2008) httpsarchive-itorgcollections1068

Archive-It 2010 ldquoNew York City Places and Spacesrdquo Columbia University Libraries Collection (Archived January 2010) httpsarchive-itorgcollections1757

Archive-It 2010 ldquoBurke Library New York City Religionsrdquo Columbia University Libraries Collection (Archived May 2010) httpsarchive-itorgcollections1945

82 NLA ldquoTroverdquo Archived Websites Sub Collections Accessed 20 September 2020 httpstrovenlagovauwebsite

83 Archive-It 2014 ldquoCollaborative Architecture Urbanism and Sustainability Web Archive (CAUSEWAY)rdquo Ivy Plus Libraries Confederation Collection (Archived June 2014) httpsarchive-itorgcollections4638

Archive-It 2013 ldquoContemporary Composers Web Archive (CCWA)rdquo Ivy Plus Libraries Confederation Collection (Archived October 2013) httpsarchive-itorgcollections4019

NYARC New York Art Resources Consortium ldquoWeb Archivingrdquo httpwwwnyarcorg contentweb-archiving

84 OCLC Research 2020 ldquoWeb Archiving Metadata Working Grouprdquo The Problem Addressing the Problem Outputs httpswwwoclcorgresearchthemesresearch -collectionswamhtml

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 ldquoMetadata for Audio and Videosrdquo Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress ldquoStandardsrdquo Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress ldquoStandardsrdquo Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 ldquoAssessing Needs of AV in Special Collectionsrdquo Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 ldquoScale amp Risk Discussing Challenges to Managing AV Collections in the RLPrdquo Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 ldquoManaging Metadata for Image Collectionsrdquo Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress ldquoStandardsrdquo Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94 Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95 Smith-Yoshimura Karen 2016 ldquoSharing Digital Collections Workflowsrdquo Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 ldquoEuropeana Innovation Pilotsrdquo Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the Worldrsquos Images ldquoHomerdquo Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 ldquoOCLC ResearchWorks IIIF Explorerrdquo httpswwwoclcorgresearch themesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 ldquoCONTENTdm Linked Data Pilotrdquo Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

100 Smith-Yoshimura Karen 2015 ldquoData Management and Curation in 21st Century Archives ndash Part 1rdquo 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 ldquoMetadata for Research Data Managementrdquo Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105 Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog), OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106 NCI (National Computational Infrastructure) Australia ldquoHomerdquo Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) ldquoHomerdquo Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network ldquoHomerdquo Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a ldquocollaboration advocating richer connected reusable open metadata for all research outputsrdquo (httpwwwmetadata2020org) The Metadata 2020 Researcher Communications project is outlined here httpwwwmetadata2020orgprojects researcher-communications

109 Digital Curation Centre ldquoDisciplinary Metadatardquo List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory ldquoMetadata Standards Directory Working Grouprdquo GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy)mdashwhich identifies 14 roles describing each contributorrsquos specific contribution to the scholarly outputmdasha standard CRediT was developed by CASRAI the Consortia Advancing Standards in Research Administration Information See CASRAI ldquoCRediT ndash Contributor Roles Taxonomyrdquo Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 ldquoData Servicesrdquo httpwwwlibumicheduresearch -data-services

113 OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html

114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115 Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology), Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft 2020 ldquoMicrosoft Academic Graphrdquo Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116 NISOrsquos Reproducibility Badging and Definitions now out for public comment may also help researchers extend the benefit of their research to others See ldquoTaxonomy Definitions and Recognition Badging Scheme Working Group | NISO Websiterdquo nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 ldquoServices Built on Usage Metricsrdquo Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 ldquoStacks Serials Search Engines and Studentsrsquo Success First-Year Undergraduate Studentsrsquo Library Use Academic Achievement and Retentionrdquo Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc ldquoLibrary Impact Data Project (LIDP)rdquo Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 ldquoMetadata Librarian Cornell University Libraryrdquo DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 ldquoMetadata Librarianrdquo Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-library metadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 ldquoLocate Items in the Library with StackMaprdquo httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 ldquoHomerdquo httpswwwyewnocom

125 Smith-Yoshimura Karen 2019 ldquoNew Ways of Using and Enhancing Cataloging and Authority Recordsrdquo Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) ldquoAustlang National Codeathonrdquo Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA ldquoTroverdquo Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company ndash Hachette UK ldquoRiver of Authorsrdquo Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 ldquoKnowledge Organization Systemsrdquo Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 ldquoKnowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Reviewrdquo International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) ldquoAbout SNACrdquo What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives amp Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAMrsquos mission is to organize share and elevate knowledge about and use of artificial intelligence by libraries archives and museums It was founded in 2018 inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs

See AI4LAM ldquoAboutrdquo Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 ldquoNew Skill Sets for Metadata Managementrdquo Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 ldquoMarcEdit and Other Tools for Batch Processing and Metadata Reconciliationrdquo Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646

139 Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140 Reese Terry 2020 ldquoWorking with Linked Data In MarcEditrdquo MarcEdit Development (blog) Accessed 21 September 2020 httpsmarceditreesetnetworking-with-linked -data-in-marcedit

141 Reese Terry 2018 ldquoMarcEdit Playlistrdquo 139 YouTube videos Last updated 26 December 2018 httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura Karen 2017 ldquoNew Skill Sets for Metadata Managementrdquo Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

143 “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese Terry 2013 ldquoTutorialsrdquo YouTube (selected) MarcEdit Development (blog) 14 March 2013 httpmarceditreesetnettutorials

ldquoLynda Online Courses Classes Training Tutorialsrdquo nd Lyndacom - from LinkedIn Learning Accessed 22 September 2020 httpswwwlyndacom

ldquoLearn to Code - for Freerdquo nd Codecademy Accessed 22 September 2020 httpswwwcodecademycom

Software Carpentry ldquoTeaching Basic Lab Skills for Research Computingrdquo Upcoming Workshops Accessed 22 September 2020 httpssoftware-carpentryorg

144 ldquoData on the Web Best Practicesrdquo nd Accessed 22 September 2020 httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist (2008) 2020 httpworkingontologistorg

145 Library Workflow Exchange nd ldquoAboutrdquo Accessed 21 September 2020 httpwwwlibraryworkflowexchangeorgabout

146 OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147 Smith-Yoshimura Karen 2019 ldquoStewardship of Professional FTEs In Metadata Work and Turnoverrdquo Hanging Together The OCLC Research Blog 18 October 2019 httpshangingtogetherorgp=7580

148 OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009

  • Executive Summary
  • Introduction
  • The Transition to Linked Data and Identifiers
    • Expanding the use of persistent identifiers
    • Moving from “authority control” to “identity management”
    • Addressing the need for multiple vocabularies and equity, diversity, and inclusion
    • Linked data challenges
  • Describing “Inside-Out” and “Facilitated” Collections
    • Archival collections
    • Archived websites
    • Audio and video collections
    • Image collections
    • Research data
  • Evolution of “Metadata as a Service”
    • Metrics
    • Consultancy
    • New applications
    • Bibliometrics
    • Semantic indexing
  • Preparing for Future Staffing Requirements
    • The culture shift
    • Learning opportunities
    • New tools and skills
    • Self-education
    • Addressing staff turnover
  • Impact
  • Acknowledgments
  • Appendix
  • Notes
  • FIGURE 1 “Changing Resource Description Workflows” by OCLC Research
  • FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters
  • FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers
  • FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages
  • FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership
  • FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections
  • FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections
  • FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon
  • FIGURE 9 Hachette UK’s “River of Authors” generated from the British Library’s catalog metadata

Why is metadata changing?

Traditional library metadata was, and is, made by librarians conforming to rules that are mainly used and understood by librarians. It is record-centered, expensive to produce, and has historic size limitations. Metadata is limited in its coverage, notably not including articles within scholarly journals or other scholarly outputs. The infrastructure has been inadequate for managing corrections and enhancements, inducing an emphasis on perfection that has exacerbated the slowness of metadata creation. In short, the metadata could be better, there is not enough of it, and the metadata that does exist is not used widely outside the library domain.

How is the creation process changing?

Metadata is no longer created by library staff alone. Today publishers, authors, and other interested parties are equally involved in metadata creation. Metadata creation has also been pushed forward in the scholarly life cycle, with publishers creating metadata records earlier than in the traditional cataloging process. Metadata can now be enhanced or corrected by machines or by crowdsourcing.

How is the metadata itself changing?

Machine-readable cataloging (MARC) was created to replicate the metadata traditionally found on library catalog cards. We are transitioning from MARC records to assemblages of well-coded, shareable, and linkable components, with an emphasis on references, and we are eliminating anachronistic abbreviations not understood by machines. Instead of relying only on library vocabularies such as subject headings and coded lists, the developing assemblages can accommodate vocabularies created for specific domains, expanding the metadata’s potential audiences.

The Focus Group’s composition has fluctuated over time and currently comprises representatives from 63 RLP Partners in 11 countries spanning four continents.6 The group includes both past and incoming chairs of the Program for Cooperative Cataloging (PCC),7 providing cross-fertilization between the two. Topics for group discussions can be proposed by any Focus Group member and are selected by an eight-member Planning Group (see appendix), who then write “context statements” explaining why the topic is considered timely and important, and then develop question sets that delve into the topic. Context statements and question sets are then distributed to all Focus Group members, who are given three to five weeks to submit their responses. Compilations of the Focus Group’s responses inform face-to-face discussions held in conjunction with the American Library Association conferences8 and in subsequent virtual meetings.

As the Focus Group facilitator, I have summarized and synthesized these discussions in a series of OCLC Research Hanging Together Blog publications.9 Nearly 40 blog posts on a wide range of metadata-related topics have been published on this forum over the past six years.

The Metadata Managers Focus Group is just one activity within the broader OCLC Research Library Partnership, which is devoted to extensive professional development opportunities for library staff. Focus Group members value their affiliation with the Research Library Partnership as a channel to becoming the “change agents” of future metadata management.10 Focus Group members’ responses to question sets have facilitated intra-institutional discussions and helped metadata managers understand how their institutions’ situation compares with that of peers within the Partnership.

These Focus Group discussions identified a broad range of metadata-related issues, documented in this report. Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

Collectively, Focus Group members command a wide range of experiences with linked data. The Focus Group’s keen interest in linked data implementations sparked the series of OCLC Research’s International Linked Data Surveys for Implementers.11 A subset of Focus Group members have participated in various linked data projects, including the OCLC Research Project Passage and CONTENTdm Linked Data pilot, OCLC’s Shared Entity Management Infrastructure, Library of Congress’ Bibliographic Framework Initiative (BIBFRAME), the Mellon-funded Linked Data for Production (LD4P) project, the Share-VDE initiative, and the IMLS planning grant Shareable Local Name Authorities, which exposed issues raised by identifier hubs in the linked data environment.12 In addition, Focus Group members contribute to the PCC task groups addressing aspects of linked data work, including the PCC Task Group on Linked Data Best Practices, Task Group on Identity Management, Task Group on URIs in MARC, and the PCC Linked Data Advisory Committee.13 This cross-fertilization has prompted the Focus Group to examine issues around the entities represented in institutional resources.

This report synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The document is organized in the following sections, each representing an emerging trend identified in the Focus Group’s discussions:

• The transition to linked data and identifiers: expanding the use of persistent identifiers as part of the shift from “authority control” to “identity management”

• Describing the “inside-out” and “facilitated” collections: challenges in creating and managing metadata for unique resources created or curated by institutions in various formats and shared with consortia

• Evolution of “metadata as a service”: increased involvement with metadata creation beyond the traditional library catalog

• Preparing for future staffing requirements: the changing landscape calls for new skill sets needed by both new professionals entering the field and seasoned catalogers

The document concludes with some observations on the forecasted impact of the next generation of metadata on the wider library community.

The Transition to Linked Data and Identifiers

Linked data offers the ability to take advantage of structured data with an emphasis on context. It relies on language-neutral identifiers pointing to objects, with a focus on “things” replacing the “strings” inherent in current authority and catalog records. These identifiers can then be connected to related data, vocabularies, and terms in other languages, disciplines, and domains, including nonlibrary domains. Linked data applications can consume others’ contributions and thus free metadata specialists from having to re-describe things already described elsewhere, allowing them instead to focus on providing access to their institutions’ unique and distinctive collections. This promises a richer user experience and increased discoverability, with more contextual relationships than is possible with our current systems. Furthermore, linked data offers an opportunity to go beyond the library domain by drawing on information about entities from diverse sources.14
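
The shift from “strings” to “things” can be illustrated with a small sketch. It is not drawn from any Focus Group implementation: the identifiers (`ex:person/1` and so on), the predicate names, and both data sets are hypothetical placeholders. It only shows how statements from a library and from an outside source can be combined because they share one language-neutral identifier.

```python
# Toy statements as (subject, predicate, object) triples.
# Every identifier, predicate, and value below is a hypothetical placeholder.
LIBRARY_STATEMENTS = [
    ("ex:person/1", "label@en", "Example Author"),
    ("ex:person/1", "authorOf", "ex:work/9"),
]
EXTERNAL_STATEMENTS = [
    ("ex:person/1", "label@es", "Autora de Ejemplo"),
    ("ex:person/1", "birthPlace", "ex:place/5"),
]


def describe(entity_id, *sources):
    """Collect every statement about one entity across all sources."""
    description = {}
    for source in sources:
        for subject, predicate, obj in source:
            if subject == entity_id:
                description.setdefault(predicate, []).append(obj)
    return description


# The library does not re-describe what the external source already states;
# it reuses those statements by following the shared identifier.
merged = describe("ex:person/1", LIBRARY_STATEMENTS, EXTERNAL_STATEMENTS)
print(merged)
```

Because the match is made on the identifier rather than on a name string, the English and Spanish labels end up attached to the same “thing” without any string comparison.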

FIGURE 1 “Changing Resource Description Workflows” by OCLC Research.15

The hope is that linked data will allow libraries to offer new, value-added services that current models cannot support, that outside parties will be able to make better use of library resource descriptions, and that the data will be richer because more parties share in its creation. Moving to a linked data environment portends changes to resource description workflows, as shown in figure 1.

The drive to move metadata operations to linked data depends on the availability of tools, access to linked data sources for reuse, documented best practices on identifiers and the metadata descriptions associated with them (“statements”), and a critical mass of implementations on a network level.

EXPANDING THE USE OF PERSISTENT IDENTIFIERS

The Focus Group discussed the “future-proofing” of cataloging, which refers to the opportunities to unleash the power of metadata in legacy records for different interactions and uses in the future. Persistent identifiers were viewed as crucial to transitioning from current metadata to future applications.16 Identifiers, in the form of language-neutral alphanumeric strings, serve as a shorthand for assembling the elements required to uniquely describe an object or resource. They can be resolved over networks with specific protocols for finding, identifying, and using that object or resource. In the nonlibrary domain, Social Security and employee numbers are examples of such identifiers. In the library and academic domains, Focus Group members pointed to ORCID (Open Researcher and Contributor ID)17 as a “glue” that holds together the four arms of scholarly work: publishing, repository, library catalog, and researchers. However, ORCID is limited to living researchers. ORCID is increasingly used in STEM (science, technology, engineering, mathematics) journals for all authors and contributors18 and included in institutions’ Research Information Management systems. ISNI (International Standard Name Identifier)19 uniquely identifies persons and organizations involved in creative activities; it is used by libraries, publishers, databases, and rights management organizations, and it covers nonliving creators.
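
As an illustration of what “language-neutral alphanumeric strings” look like in practice, the sketch below checks whether a string is a well-formed ORCID iD. ORCID documents its final character as an ISO 7064 MOD 11-2 check character; the function names here are illustrative choices, not part of any ORCID library. Passing the check only shows that the string is well formed, not that the iD is registered or belongs to a particular person.

```python
def mod11_2_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for a digit string."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """True if the value has 16 characters and a correct check character."""
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return mod11_2_check_char(base) == compact[15]


# The author's ORCID iD as given on this report's title page.
print(is_well_formed_orcid("0000-0002-8757-2962"))  # → True
# Same iD with the last character altered.
print(is_well_formed_orcid("0000-0002-8757-2963"))  # → False
```

A check like this can catch transcription errors when identifiers are batch loaded into records, before any network lookup is attempted.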

Persistent identifiers are used by parties such as Google and HathiTrust for service integration.20 More institutions are using geospatial coordinates in metadata, or URIs (Uniform Resource Identifiers) pointing to geospatial coordinates that support API (Application Programming Interface) calls to GeoNames,21 enabling map visualizations. Research institutions are also adopting person identifiers such as ORCID to streamline the collection of the institutional research record, usually through a Research Information Management system, as documented in the 2017 OCLC Research Report Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information Management.22
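
A minimal sketch of how a place URI stored in a record might be turned into map coordinates follows. Several things are assumptions rather than facts from the Focus Group discussions: the URI pattern `https://sws.geonames.org/<id>/`, the `getJSON` lookup with `geonameId` and `username` parameters, and the `lat`/`lng` field names reflect the GeoNames web service as commonly documented and should be checked against current GeoNames documentation. The identifier `123456` and the sample payload are invented, and no network call is made.

```python
import json
import re
from urllib.parse import urlencode

# Assumed URI pattern for GeoNames place identifiers.
GEONAMES_URI = re.compile(r"^https?://sws\.geonames\.org/(\d+)/?$")


def geoname_id_from_uri(uri: str) -> int:
    """Extract the numeric identifier from a GeoNames-style URI."""
    match = GEONAMES_URI.match(uri)
    if match is None:
        raise ValueError(f"not a GeoNames URI: {uri!r}")
    return int(match.group(1))


def build_lookup_url(geoname_id: int, username: str) -> str:
    """Build (but do not send) a lookup request for one place."""
    query = urlencode({"geonameId": geoname_id, "username": username})
    return f"http://api.geonames.org/getJSON?{query}"


def coordinates_from_payload(payload: str) -> tuple[float, float]:
    """Read latitude and longitude from a JSON response body."""
    record = json.loads(payload)
    return float(record["lat"]), float(record["lng"])


# Invented response body, standing in for what a real lookup might return.
SAMPLE_PAYLOAD = '{"geonameId": 123456, "name": "Sampletown", "lat": "48.85", "lng": "2.35"}'

place_id = geoname_id_from_uri("https://sws.geonames.org/123456/")
print(build_lookup_url(place_id, "my_account"))
print(coordinates_from_payload(SAMPLE_PAYLOAD))
```

Storing the URI rather than a place-name string is what makes the lookup possible; the coordinates can then be passed to whatever mapping library the institution uses.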

While publishers serve as a key player in the metadata workstream, publisher data does not always meet library requirements. For example, publisher data for monographs usually does not include identifiers. The British Library is working with five UK publishers to add ISNIs23 to their metadata, as a promising proof-of-concept for publishers and libraries working together earlier in the supply chain. The ability to batch load or algorithmically add identifiers in the future is on Focus Group members’ wish list.

No single person identifier covers all use cases. Researchers’ names have been only partially represented in national name authority files that identify persons both living and dead. A sizable quantity of legacy names are represented only by text strings in bibliographic records. Authority records are created only by institutions involved in the PCC’s Name Authority Cooperative Program (NACO)24 or in national library programs. Even then, authority records are created selectively for certain headings, or sometimes only when references are involved. The LC/NACO name authority file contained only 30% of the total names reflected in WorldCat’s bibliographic record access points (9 million LC/NACO records compared to the 30 million total names reported on the WorldCat Identities project page as of 2012).25 By 2020, this percentage had decreased to 18%: 11 million LC/NACO authority records compared to 62 million names in WorldCat Identities. These statistics illustrate that the number of names represented in bibliographic records is increasing more quickly than the number under authority control.
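
The percentages above follow directly from the record counts quoted in the paragraph. The short calculation below reproduces them and adds the growth factor of each file between 2012 and 2020; the function name is an illustrative choice, and the counts are the rounded figures from the text.

```python
def authority_coverage(authority_records: int, total_names: int) -> float:
    """Share of names under authority control, as a percentage."""
    return 100 * authority_records / total_names


coverage_2012 = authority_coverage(9_000_000, 30_000_000)
coverage_2020 = authority_coverage(11_000_000, 62_000_000)
print(round(coverage_2012), round(coverage_2020))  # → 30 18

# Growth of each file between the two snapshots.
authority_growth = 11_000_000 / 9_000_000   # about 1.2 times
names_growth = 62_000_000 / 30_000_000      # about 2.1 times
print(round(authority_growth, 1), round(names_growth, 1))  # → 1.2 2.1
```

The authority file grew by roughly a fifth while the pool of names roughly doubled, which is why coverage fell even though authority work continued.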

Authority files focus on the “preferred form” of a name, which can vary depending on language, discipline, context, and time period. Scholars have objected to the very concept of a “preferred form,” as the name may be referred to differently depending on the context.26 When a name has multiple forms, historians need to know the provenance of each name, following the citation practices commonly used in their field. An identifier linked to different forms of names, each associated with the provenance and context, could resolve this conundrum.
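
One way to picture an identifier linked to several name forms, each carrying its own provenance and context, is the data structure sketched below. This is an assumption-laden illustration, not a model used by any authority file: the class names, field names, identifier, and sample names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class NameForm:
    text: str      # the name as it appears in the source
    language: str  # language tag for this form
    context: str   # where the form is used (discipline, period, ...)
    source: str    # provenance: the document that attests this form


@dataclass
class Identity:
    identifier: str  # language-neutral; no form is marked "preferred"
    forms: list[NameForm] = field(default_factory=list)

    def forms_for(self, language: Optional[str] = None,
                  context: Optional[str] = None) -> list[NameForm]:
        """Return the forms matching the requested language and context."""
        return [
            form for form in self.forms
            if (language is None or form.language == language)
            and (context is None or form.context == context)
        ]


# Hypothetical person with two attested forms of name.
person = Identity("local:person/42", [
    NameForm("Example, Ana", "en", "library catalog",
             "hypothetical national authority file"),
    NameForm("Ana de Exemplo", "pt", "17th-century correspondence",
             "hypothetical archival letter"),
])

for form in person.forms_for(language="pt"):
    print(form.text, "|", form.source)
```

A display layer could then choose the form that suits the user’s language or discipline, while the provenance travels with each form for citation.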

Researcher names are just one example of a need unmet by current identifier systems. Institutions have been minting their own “local identifiers” to meet this need. Use cases for local identifiers include registering all researchers on campus; representing entities that are underrepresented in national authority files, such as authors of electronic dissertations and theses, performers, events, local place names, and campus buildings; identifying entities in digital library projects and institutional repositories; reflecting multilingual needs of the community; and supporting “housekeeping” tasks such as recording archival collection titles.27

Focus Group members’ consistent need to disambiguate names across disciplines and formats spurred the creation of the OCLC Research working group on Registering Researchers in Authority Files.28 The need to accurately record researchers’ institutional affiliations (to reflect the institution’s scholarly output, to promote cross-institutional collaborations, and to lead to more successful recruitment and funding) led to another working group on Addressing the Challenges with Organizational Identifiers and ISNI,29 which presented new data modeling of organizations that others could adapt for their own uses. Since then, the Research Organization Registry (ROR) was launched to develop an open, sustainable, usable, and unique identifier for every research organization in the world.30

Disambiguating names is the most labor-intensive part of authority work and will still be a prerequisite for assigning unique identifiers. Given the different name identifier systems already in use, libraries need a name reconciliation service. Authority work and algorithms based on text string matching have limits; the results will still need manual expert review. Tapping the expertise in user communities to verify if two identifiers represent the same person may help.

Disambiguation is particularly difficult for authors or contributors listed in journal articles, where names are often abbreviated and there may be dozens or even hundreds of contributors. For example, an article in Physical Review Letters, “Precision Measurement of the Top Quark Mass in Lepton + Jets Final State,” has approximately 300 abbreviated author names for a five-page article (figure 2).31 This exemplifies the different practices among disciplines. By contrast, other objects with many contributors, such as feature films and orchestral recordings, are usually represented by only a relative handful of the associated names in library legacy metadata.32 Such differences make creating metadata that is uniform, understandable, and widely reusable a challenge.

FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form or to an identifier, if one exists. Associating ORCIDs with article authors makes it easier to differentiate authors with the same abbreviated forms. Research Information Management (RIM) systems apply identity management for local researchers so that they are correctly associated with the articles they have written. Their articles are displayed as part of their profiles. (See, for example, Experts@Minnesota or University of Illinois at Urbana-Champaign’s Experts research profiles.)33
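The collision described above can be illustrated with a minimal sketch. It assumes a simple "initials plus family name" abbreviation rule, and the names and identifier strings are invented placeholders, not real researchers or real ORCID iDs.

```python
from collections import defaultdict


def abbreviate(full_name: str) -> str:
    """Reduce 'Given [Middle] Family' to initials plus family name."""
    *given, family = full_name.split()
    initials = " ".join(f"{part[0]}." for part in given)
    return f"{initials} {family}".strip()


# Hypothetical records: (full name, identifier). All values are invented.
authors = [
    ("Jane Smith", "id:0001"),
    ("John Smith", "id:0002"),
    ("Julia Smith", "id:0003"),
    ("Wei Chen", "id:0004"),
]

# Group identifiers by the abbreviated form that would appear on an article.
by_abbreviation = defaultdict(set)
for name, identifier in authors:
    by_abbreviation[abbreviate(name)].add(identifier)

# Abbreviated forms shared by more than one identifier cannot be matched by string alone.
ambiguous = {abbr: ids for abbr, ids in by_abbreviation.items() if len(ids) > 1}
```

In this toy data three distinct people collapse to the single string "J. Smith", while their identifiers remain distinct, which is the property that makes an identifier attached to the article byline useful.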

[Figure 2 image: first page of arXiv:1405.1756v2 [hep-ex], 16 Jun 2014 (FERMILAB-PUB-14-123-E), “Precision measurement of the top-quark mass in lepton+jets final states,” showing the long list of abbreviated author names with affiliation numbers.]


For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.34 Research Information Management Systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author’s ORCID. Linked data could access information across many environments, including those in Research Information Systems, but would require accurately linking multiple identifiers for the same person to each other.

Some Focus Group members are performing metadata reconciliation work, such as searching matching terms from linked data sources and adding their URIs in metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.35 Improving the quality of the data improves users’ experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC’s Virtual International Authority File (VIAF); the Library of Congress’s linked data service (id.loc.gov); ISNI; the Getty’s Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC’s Faceted Application of Subject Terminology (FAST); and various national authority files. Selection of the source depends on the trustworthiness of the organization responsible, subject matter, and richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed an LCNAF Named Entity Reconciliation program36 using Google’s Open Refine that searches the VIAF file with the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading and a link to the URI of the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.
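The general shape of such a reconciliation step (match a local heading against a cluster, pick the record from one source, return its authorized heading and URI) might be sketched as follows. This is not the University of Michigan program and it does not call the VIAF API: the cluster data, URIs, and function names are hypothetical stand-ins held in memory, and matching is plain normalized string comparison.

```python
import unicodedata


def normalize(heading: str) -> str:
    """Fold case and strip diacritics and punctuation before comparing strings."""
    decomposed = unicodedata.normalize("NFKD", heading)
    kept = [c for c in decomposed if not unicodedata.combining(c)]
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in kept)
    return " ".join(cleaned.casefold().split())


# Hypothetical clusters: each groups headings from several sources for one entity.
CLUSTERS = [
    {
        "cluster_id": "cluster-1",
        "headings": [
            {"source": "LC", "heading": "Example, Ana, 1950-",
             "uri": "https://example.org/lc/n0001"},
            {"source": "DNB", "heading": "Example, Ana",
             "uri": "https://example.org/dnb/1"},
        ],
    },
]


def reconcile(local_heading, clusters=CLUSTERS, source="LC"):
    """Return the chosen source's authorized heading for a matching cluster, or None."""
    key = normalize(local_heading)
    for cluster in clusters:
        if any(normalize(h["heading"]) == key for h in cluster["headings"]):
            for h in cluster["headings"]:
                if h["source"] == source:
                    return {
                        "original": local_heading,
                        "authorized": h["heading"],
                        "uri": h["uri"],
                        "cluster_id": cluster["cluster_id"],
                    }
    return None
```

Calling `reconcile("éxample, ana")` would match the cluster through the second heading and return the LC form paired with the original string. Exact comparison after normalization is deliberately naive; it shows why string-based results still need expert review.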

A long list of nonlibrary sources that could enhance current authority data or could be valuable to link to in certain contexts has been identified. Wikidata and Wikipedia led the list. Other sources include AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, GeoNames, GoodReads, IMDb (Internet Movie Database), Internet Archive, LibraryThing, LinkedIn, MusicBrainz, ONIX (ONline Information eXchange), Open Library, ORCID, and Scopus ID. The document from the PCC’s Task Group on URIs in MARC, Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources,37 provides valuable guidance for collecting data from these other sources. Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than with current library systems.38

Identifiers for “works” represent a particular challenge, as there is no consensus on what represents a “distinctive work.”39 Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a “work” is could hamper the ability to reuse data created elsewhere, and they look to a central trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.


FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers40

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets, such as digital resources and collections in institutional repositories, include system-generated IDs, locally minted identifiers, PURLs, handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.41 Ideally, libraries would contribute to a hub for the metadata describing their researchers’ data sets regardless of where the data sets are stored.


MOVING FROM “AUTHORITY CONTROL” TO “IDENTITY MANAGEMENT”

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.42 The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from “managing text strings” as the basis of controlling headings in bibliographic records.43 But identity management poses a change in focus, from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from “authority control” and “authorized access points” in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.44 Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
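Separating an identifier from its labels, with preference resolved locally, could look something like the sketch below. The `Entity` class, the `ex:E1` identifier, and the language-tag keys are illustrative assumptions, not any particular system's data model.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An identifier with any number of labels; no label is globally preferred."""
    identifier: str
    labels: dict = field(default_factory=dict)  # language or community tag -> label

    def label_for(self, preferences):
        """Return the first label matching the caller's ordered preferences."""
        for tag in preferences:
            if tag in self.labels:
                return self.labels[tag]
        # No preferred label is available: fall back to the identifier itself.
        return self.identifier


# Hypothetical entity; the identifier is a placeholder.
city = Entity("ex:E1", {"en": "Vienna", "de": "Wien", "fr": "Vienne"})
```

A German-language catalog might call `city.label_for(["de", "en"])` and an English-language one `city.label_for(["en"])`; both display different strings while matching and collocating on the same identifier.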


Wikidata Identifier Q19526


A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?45
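One way a tool might reconcile multiple identifiers that point to the same entity is to treat asserted equivalences as edges and merge them into clusters. The sketch below uses a small union-find structure; the identifier strings are invented, and it assumes every "same as" assertion is trustworthy, which real data would not guarantee.

```python
def cluster_identifiers(same_as_pairs):
    """Merge pairwise 'same as' assertions into clusters of equivalent identifiers."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for identifier in list(parent):
        clusters.setdefault(find(identifier), set()).add(identifier)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical equivalence assertions gathered from different sources.
pairs = [
    ("viaf:1", "isni:A"),
    ("isni:A", "wikidata:Qx"),
    ("orcid:9", "scopus:7"),
]
clusters = cluster_identifiers(pairs)
```

Here the first three identifiers end up in one cluster even though no source links `viaf:1` directly to `wikidata:Qx`. A single wrong assertion would merge two people just as readily, so a production tool would need review or confidence scoring on top of this.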

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.46 Many libraries are looking toward Wikidata and Wikibase (the software platform underlying Wikidata) to solve some of the long-standing issues faced by technical services departments, archival units, and others.47 Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution’s resources. Focus Group members’ experimentations with Wikidata and OCLC projects using the Wikibase platform indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationship with each other, and how best they can increase discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on “works” and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are “notable” are more likely to be represented in Wikidata. Recently, an effort to support citations in Wikipedia articles, WikiCite,48 demonstrates a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical, and powerful, aspects of identity management is to reduce the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.49 Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts or subject headings are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting “alternate” subject headings came to the fore when the Library of Congress’ initial solution to change the LC subject heading for “Illegal aliens” to “Undocumented immigrants” failed to be implemented. This prompted one Focus Group member to comment, “Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive.” End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.50


Some see Faceted Application of Subject Terminology (FAST)51 as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.52 FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee53 represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH with tools and services that serve the needs of diverse communities and contexts.54

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.55 For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.56 There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)57 project built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.58


A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in institutional repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool,59 which is used to create and manage various types of controlled vocabularies, seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.60

Linked data could provide the means for local communities to prefer a different label for an established vocabulary’s preferred term for a concept or entity. One might reference a local description of a concept or entity not represented, or not represented satisfactorily, in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, “Nothing is sadder than a vocabulary that someone invented that was left to go stale.” Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution’s EDI goals and principles.


FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, “Dissident art” rather than “Non-conformist art.”)

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading “Mentally handicapped” to “People with mental disabilities.” Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a “cultural sensitivity” message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator’s attitude or the period in which the item was created and may be considered inappropriate today in some contexts.

[Figure 5 chart: “What Areas Have You Changed or Plan to Change Due to Your Institution’s EDI Goals and Principles?” Vertical axis from 0 to 100. Categories: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies. Series: Changed; Plan to change.]


• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may be exclusive to audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than include them as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others’ terminologies or mean different things. The PCC Linked Data Advisory Committee’s Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences, and facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on “Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives”66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of “community contribution” for new terms and community voting could be more inclusive. Libraries’ current “consensus environment” excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made. How will libraries resolve or reconcile conflicts between statements?67 Different types of inconsistencies may appear than do now, with, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources.”68 OCLC Research explored how libraries might apply Google’s “Knowledge Vault” to identify statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected by using inadequate vendor records, by creating minimal or less-than-full level descriptions for certain types of resources, and by limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of a large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.

The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance” or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing “Inside-Out” and “Facilitated” Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the “inside-out collection” (in contrast to the “outside-in collection,” in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the “facilitated collection.”74 Focus Group members’ activities have increasingly focused on metadata that will provide access to the resources unique to their institutions, as well as those in their consortia or national networks.

All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places, providing the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars, libraries, and archives alike. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions’ digitization of archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.

ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights, Historic Preservation and Urban Planning, and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85

AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, which are often in deteriorating formats, in large backlogs, and sometimes require rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.

The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process, like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.

OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in OCLC Research Hanging Together Blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership; incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

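FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections
[Chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Categories: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; selection, appraisal, and collection development. Responses: very interested, interested, somewhat interested, not interested.]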

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog; archive management, digital asset, and preservation systems; the institutional repository; research management systems; and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture, or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters,” those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
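The general idea of replacing strings with identifiers can be sketched in a few lines. This is not the pilot’s actual method, which the report does not detail; the lookup tables, the `example.org` identifiers, and the function names below are all hypothetical placeholders. The sketch checks an authority table first, then a local vocabulary, and keeps the original string as a literal when no identifier is known, so that no description is lost.

```python
# Hypothetical lookup tables; labels and identifiers are placeholders only.
AUTHORITY_IDS = {
    "paris (france)": "https://example.org/authority/place/0001",
}
LOCAL_VOCAB_IDS = {
    "campus maps": "https://example.org/local-vocab/0042",
}


def normalize(label: str) -> str:
    """Collapse whitespace and case so trivially different strings match."""
    return " ".join(label.split()).casefold()


def to_identifier(label: str):
    """Return ("id", identifier) when one is known, else ("literal", label)."""
    key = normalize(label)
    for table in (AUTHORITY_IDS, LOCAL_VOCAB_IDS):
        if key in table:
            return ("id", table[key])
    return ("literal", label)


def record_to_statements(record_id: str, record: dict):
    """Turn a flat record into (subject, property, kind, value) statements."""
    statements = []
    for prop, values in record.items():
        for value in values:
            kind, obj = to_identifier(value)
            statements.append((record_id, prop, kind, obj))
    return statements
```

Statements that remain literals are natural candidates for later review, either to match them against additional authority files or to add them to a local vocabulary.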

FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections

RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry “Data Management and Curation in 21st Century Archives” (Sept 2015)100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
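As an illustration of how such a readme file might be generated and checked, the following minimal sketch renders the elements named in the quoted passage as labeled lines and reports which ones a researcher has left blank. The field labels, their order, and the file layout are assumptions made for this example; the quoted report does not prescribe a format.

```python
# Field names follow the elements listed in the quoted passage; the labels
# and layout are assumptions for illustration, not a standard.
README_FIELDS = [
    "title", "author", "date", "funder", "copyright holder",
    "description", "keywords", "file names",
    "file formats and software used", "observation unit",
    "kind of data", "type of data", "language",
]


def _text(metadata: dict, field: str) -> str:
    """Return the stripped value for a field, or "" when absent or None."""
    value = metadata.get(field)
    return "" if value is None else str(value).strip()


def missing_fields(metadata: dict) -> list:
    """List the readme elements that have no usable value yet."""
    return [f for f in README_FIELDS if not _text(metadata, f)]


def render_readme(metadata: dict) -> str:
    """Render one 'LABEL: value' line per element, flagging gaps visibly."""
    lines = []
    for field in README_FIELDS:
        value = _text(metadata, field) or "[not provided]"
        lines.append(f"{field.upper()}: {value}")
    return "\n".join(lines) + "\n"
```

A deposit form could call `missing_fields` before accepting a dataset and show the researcher only the gaps, which keeps the form short.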

All four of the 2017-2018 The Realities of Research Data Management series webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105

National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
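One practical benefit of identifiers such as ORCID iDs is that they can be checked mechanically before they enter a metadata record. An ORCID iD consists of 16 characters whose last character is a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below is a minimal validator; the function names are chosen for this example, and it tests only the form of an iD, not whether the iD is actually registered.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits, and check character of a hyphenated iD."""
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15]
```

For instance, the author’s iD given in this report’s front matter, 0000-0002-8757-2962, passes this check, while changing its final character to 3 makes it fail.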

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize the data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose expertise may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics, such as how frequently items have been borrowed, cited, downloaded, or requested, could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; and improving relevancy ranking, personalizing search results, offering recommendation services in the discovery layer, and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125
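
As an illustration of the enrichment idea, generating language codes missing in related records might look roughly like the following. This is a minimal Python sketch under stated assumptions, not a method reported by Focus Group members: the dictionary record layout, the `cluster_id` key that groups related records, and the `lang_inferred` flag are all hypothetical names introduced here. To stay conservative, it fills a missing code only when every related record that carries a code agrees on it.

```python
from collections import Counter, defaultdict


def fill_missing_language_codes(records):
    """Return copies of the records with missing 'lang' values filled in.

    Assumes (hypothetically) that each record is a dict with a 'cluster_id'
    identifying a group of related records and an optional 'lang' code.
    """
    # Tally the language codes seen in each cluster of related records.
    seen = defaultdict(Counter)
    for rec in records:
        if rec.get("lang"):
            seen[rec["cluster_id"]][rec["lang"]] += 1

    enriched = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are not modified
        codes = seen.get(rec["cluster_id"])
        # Fill only when the related records unanimously carry one code;
        # clusters with conflicting codes are left for human review.
        if not rec.get("lang") and codes and len(codes) == 1:
            rec["lang"] = next(iter(codes))
            rec["lang_inferred"] = True  # keep inferred values traceable
        enriched.append(rec)
    return enriched
```

Flagging inferred values keeps them distinguishable from codes assigned by a cataloger, so they can be reviewed or reverted later.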


Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8. Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database, from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics, statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors).127


FIGURE 9. UK Hachette’s “River of Authors,” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review”:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than pursuing one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s OCLC Research 2019 position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
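
To make the idea of disambiguating names based on other available metadata concrete, here is a minimal Python sketch. It is a simple rule-based heuristic rather than machine learning, and it is an illustration only: the `name`, `context`, and `id` fields, the 0.2 threshold, and the sample data are all hypothetical. It first compares names after normalizing order, case, accents, and punctuation, and then prefers the candidate whose related metadata (affiliations, subjects, co-authors) overlaps most with the mention.

```python
import unicodedata


def normalize_name(name):
    """Lowercase, strip accents and punctuation, and sort the name parts."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    # Sorting makes "Doe, Jane" and "Jane Doe" compare equal.
    return " ".join(sorted(cleaned.split()))


def jaccard(a, b):
    """Share of context terms the two sets have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def best_match(mention, candidates, threshold=0.2):
    """Return the id of the best candidate, or None if evidence is too weak."""
    key = normalize_name(mention["name"])
    scored = [
        (jaccard(mention["context"], cand["context"]), cand["id"])
        for cand in candidates
        if normalize_name(cand["name"]) == key
    ]
    if not scored:
        return None
    score, cand_id = max(scored)
    # Below the (arbitrary) threshold, leave the decision to a person.
    return cand_id if score >= threshold else None
```

Returning `None` when the overlap is weak reflects the division of labor discussed here: the script handles the clear cases and routes ambiguous ones to a metadata specialist.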

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance between allocating staff to “traditional cataloging activities” (such as original and copy cataloging and authority work) and allocating them to more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just “production machines.”

Metadata managers, faced with staff reductions while still being expected to maintain production levels, must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists to change their mindsets about how they work and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult because they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features, such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
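
For readers unfamiliar with what such a script involves, the following minimal Python sketch shows one way the de-duplicating and merging tasks listed above could be approached for simple, spreadsheet-like records. It does not reflect how MarcEdit or OpenRefine work internally; the `isbn`, `title`, and `year` field names and the matching rule are assumptions made for illustration, and real MARC data would need a dedicated MARC library and more careful matching.

```python
import re


def match_key(record):
    """Build a comparison key: ISBN characters if present, else title plus year."""
    isbn = re.sub(r"[^0-9Xx]", "", record.get("isbn") or "").upper()
    if isbn:
        return ("isbn", isbn)
    # ASCII-only title normalization, a deliberate simplification.
    title = re.sub(r"[^a-z0-9]+", " ", (record.get("title") or "").lower()).strip()
    return ("title", title, record.get("year"))


def deduplicate(records):
    """Merge records sharing a key, keeping the first non-empty value per field."""
    merged = {}
    for record in records:
        target = merged.setdefault(match_key(record), {})
        for field, value in record.items():
            if value and not target.get(field):
                target[field] = value
    # Dicts preserve insertion order, so output follows first appearance.
    return list(merged.values())
```

Keeping the first non-empty value per field is only one possible merge policy; a production workflow would more likely rank sources or flag conflicting values for review.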

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Round Table. Emphasizing that metadata encompasses much more than library cataloging (entity identification; descriptive standards used in various academic disciplines; and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations, such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc/.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe/.


Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC, Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups, 2018 Task Groups. Revised 24 July 2018. [Word doc: 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is ORCID.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni/.

20 HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library “Welcome to HathiTrust” Accessed 19 September 2020 httpswwwhathitrustorgabout

21 GeoNames ldquoBrowse the Namesrdquo Accessed 19 September 2020 httpswwwgeonamesorg

22 Bryant Rebecca Annette Dortmund and Constance Malpas 2017 Convenience and Compliance Case Studies on Persistent Identifiers in European Research Information Dublin OH OCLC Research httpsdoiorg1025333C32K7M

23 ISNI currently holds 1102 million identities 1026 million individuals (of which 291 million are researchers) and 933039 organizations Statistics retrieved from ISNI See ISNI ldquoKey Statisticsrdquo Accessed 5 May 2020 httpsisniorg

24 Library of Congress 2020 ldquoNACO ndash Name Authority Cooperative Programrdquo Documents and Updates Programs for Cataloging and Acquisitions (PCC) Accessed 19 September 2020 httpwwwlocgovabapccnacoindexhtml

25 Smith-Yoshimura Karen 2015 ldquoGetting identifiers Created for Legacy Namesrdquo Hanging Together The OCLC Research Blog 30 October 2015 httpshangingtogetherorgp=5463

26 Smith-Yoshimura Karen 2013 ldquoIrreconcilable Differences Name Authority Control amp Humanities Scholarshiprdquo Hanging Together The OCLC Research Blog 27 March 2013 httpshangingtogetherorgp=2621

27 Smith-Yoshimura Karen 2017 ldquoUse Cases for Local Identifiersrdquo Hanging Together The OCLC Research Blog 5 May 2017 httpshangingtogetherorgp=5938

28 OCLC Research 2020 ldquoRegistering Researchers in Authority Filesrdquo httpswwwoclcorgresearchthemesresearch-collectionsregistering-researchershtml

29 Smith-Yoshimura Karen Janifer Gatenby Grace Agnew Christopher Brown Kate Byrne Matt Carruthers Peter Fletcher Stephen Hearn Xiaoli Li Marina Muilwijk Chew Chiat Naun John Riemer Roderick Sadler Jing Wang Glen Wiley and Kayla Willey 2016 Addressing the Challenges with Organizational Identifiers and ISNI Dublin Ohio OCLC Research httpsdoiorg1025333C3FC9Q

30 Research Organization Registry (ROR) ldquoAboutrdquo httpsrororgabout

31 V M Abazov B Abbott B S Acharya M Adams T Adams J P Agnew G D Alexeev et al (2014) 2020 ldquoPrecision Measurement of the Top-Quark Mass in Lepton+jets Final Statesrdquo (Archived 24 February 2020) ArXivorg 150107912 httpsarxivorgpdf14051756

32 Smith-Yoshimura Karen 2017 ldquoHow Much Metadata Is Practicalrdquo Hanging Together The OCLC Research Blog 14 November 2017 httpshangingtogetherorgp=6328

33 University of Minnesota 2020 ldquoExpertsMinnesotardquo Find Profiles httpsexpertsumneduenpersons or

University of Illinois at Urbana-Champaign 2020 ldquoIllinois Expertsrdquo Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34 The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID 2019 “ORCID iD Required for Some Encouraged for All” NIAID Funding News Last reviewed 7 August 2019 httpswwwniaidnihgovgrants-contractsorcid-id-required-some-encouraged-all

See also Lyrasis 2020 “SciENcv and ORCID to Streamline NIH and NSF Grant Applications” LyrasisNow (blog) 8 April 2020 httpslyrasisnoworgsciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35 Smith-Yoshimura Karen 2016 ldquoMetadata Reconciliationrdquo Hanging Together The OCLC Research Blog 28 September 2016 httpshangingtogetherorgp=5710

36 Carruthers Matt (2014) 2020 mcarruthersLCNAF-Named-Entity-Reconciliation GitHub Repository httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37 Deliot Corine Steven Folsom Myung-Ja Han Nancy Lorimer Terry Reese and Adam Schiff 2019 Formulating and Obtaining URIs A Guide to Commonly used Vocabularies and Reference Sources Library of Congress PCC Task Group on URIs in MARC httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38 Smith-Yoshimura Karen 2019 “New Ways of Using and Enhancing Cataloging and Authority Records” Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

39 Smith-Yoshimura Karen 2015 ldquoPersistent Identifiers for Local Collectionsrdquo Hanging Together The OCLC Research Blog 27 October 2015 httpshangingtogetherorgp=5445

40 See DOI examples in detail from DOI 2020 ldquoDOI System Examplesrdquo Accessed 20 September 2020 httpswwwdoiorgdemoshtml and

See ARK examples in detail from Department Dallas (Tex ) Police 1963 ldquo[Photographs of Identification Cards]rdquo Collection University of North Texas The Portal to Texas History digital repository httpstexashistoryunteduark67531metapth346793

41 DataCite ldquoAssign DOIsrdquo httpsdataciteorgdoishtml

Wilkinson Laura J 2020 “Constructing your DOIs” Crossref The Crossref Curriculum Last updated 8 April 2020 httpswwwcrossreforgeducationmember-setupconstructing-your-dois

42 ldquoIdentity managementrdquo here reflects its usage among metadata specialists (See for example Library of Congress 2018 ldquoCharge for PCC Task Group on Identity Management in NACOrdquo 5 American Bar Association Program for Cooperative Cataloging Revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience for example identity access management as described in

Wikiwand ldquoIdentity Managementrdquo httpswwwwikiwandcomenIdentity_management

43 Smith-Yoshimura Karen 2018 ldquoThe Coverage of Identity Management Workrdquo Hanging Together The OCLC Research Blog 8 October 2018 httpshangingtogetherorgp=6805

44 Smith-Yoshimura Karen 2017 “Beyond the Authorized Access Point” Hanging Together The OCLC Research Blog 10 October 2017 httpshangingtogetherorgp=6271

45 Smith-Yoshimura ldquoCoverage of Identity Managementrdquo (See note 43)

46 Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez 2018 ldquoWorks in Progress Webinar Introduction to Wikidata for Librarians Structuring Wikipedia and Beyondrdquo Produced by OCLC Research 12 June 2018 MP4 video presentation 1151 httpswwwoclcorgresearchevents201806-12html

47 Smith-Yoshimura Karen 2020 “Experimentations with Wikidata/Wikibase” Hanging Together The OCLC Research Blog 18 June 2020 httpshangingtogetherorgp=8002

48 Wikimedia “WikiCite” Home httpsmetawikimediaorgwikiWikiCite

49 Smith-Yoshimura Karen 2016 “Impact of Identifiers on Authority Workflows” Hanging Together The OCLC Research Blog 22 March 2016 httpshangingtogetherorgp=5603

50 Smith-Yoshimura Karen 2019 “Strategies for Alternate Subject Headings and Maintaining Subject Headings” Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

51 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo httpswwwoclcorgenfasthtml

52 Smith-Yoshimura Karen 2016 ldquoFaceted Vocabulariesrdquo Hanging Together The OCLC Research Blog 31 October 2016 httpshangingtogetherorgp=5739

53 OCLC 2020 ldquoFASTrdquo (See note 51)

54 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo Heading 3 FAST Policy and Outreach (FPOC) Committee httpswwwoclcorgenfasthtml

55 Smith-Yoshimura Karen 2017 ldquoVocabulary Control Data in Discovery Environmentsrdquo Hanging Together The OCLC Research Blog 5 October 2017 httpshangingtogetherorgp=6264

56 National Library New Zealand Government ldquoNgā Upoko Tukutuku Māori Subject Headingsrdquo httpmshupokonatlibgovtnzmshupoko

AIATSIS Pathways Gateway to the AIATSIS Thesauri ldquoPathwaysrdquo httpwww1aiatsisgovau

57 Deutsche Nationalbibliothek 2019 ldquoMACS - Multilingual Access to Subjectsrdquo (Archived 13 Jan 2019) httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58 Smith-Yoshimura Karen 2019 “Knowledge Organization Systems” Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

59 Synaptica ldquoOntology Management ndash Graphiterdquo httpswwwsynapticacomgraphite

60 Smith-Yoshimura Karen 2018 ldquoAre Distributed Models for Vocabulary Maintenance Viablerdquo Hanging Together The OCLC Research Blog 12 April 2018 httpshangingtogetherorgp=6672

61 OCLC Research 2020 “Equity Diversity and Inclusion in the OCLC Research Library Partnership Survey” Overview Accessed 20 September 2020 httpswwwoclcorgresearchareascommunity-catalystsrlp-edihtml

62 Smith-Yoshimura Karen 2018 ldquoCreating Metadata for Equity Diversity and Inclusionrdquo Hanging Together The OCLC Research Blog 7 November 2018 httpshangingtogetherorgp=6833

63 Smith-Yoshimura ldquoDistributed Modelsrdquo (See note 60)

64 Smith-Yoshimura Karen 2019 ldquoStrategies for Alternate Subject Headings and Maintaining Subject Headingsrdquo Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

65 Baxmeyer Jennifer Karen Coyle Joanna Dyla MJ Han Steven Folsom Phil Schreur and Tim Thompson 2017 Linked Data Infrastructure Models Areas of Focus for PCC Strategies Library of Congress PCC Linked Data Advisory Committee httpswwwlocgovabapccdocumentsLinkedDataInfrastructureModelspdf

66 Bone Christine Sharon Farnel Sheila Laroque and Brett Lougheed 2017 “Works in Progress Webinar Decolonizing Descriptions Finding Naming and Changing the Relationship between Indigenous People Libraries and Archives” Produced by OCLC Research 19 October 2017 MP4 video presentation 543500 httpswwwoclcorgresearchevents201710-19html

67 Smith-Yoshimura Karen 2015 ldquoShift to Linked Data for Productionrdquo Hanging Together The OCLC Research Blog 13 May 2015 httpshangingtogetherorgp=5195

68 Smith-Yoshimura Karen 2015 ldquoWorking in Shared Filesrdquo Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

69 Bruce Washburn and Jeff Mixter 2018 ldquoWorks in Progress Webinar Looking Inside the Library Knowledge Vaultrdquo Produced by OCLC Research 12 August 2018 MP4 video presentation 574500 httpswwwoclcorgresearchevents201508-12html

70 Smith-Yoshimura Karen 2019 “Systematic Reviews of Our Metadata” Hanging Together The OCLC Research Blog 10 April 2019 httpshangingtogetherorgp=7117

71 Smith-Yoshimura Karen 2015 “Working in Shared Files” Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

72 Jisc Library Services nd ldquoWhat Is lsquoPlan Mrsquordquo Accessed 21 September 2020 httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

To learn more about the current phase of ldquoPlan Mrdquo (MayndashNovember 2020) see Grindley Neil ldquoMoving Plan M Forwards ndash We Need Your Helprdquo Library Services (PlanM) (blog) Jisc 6 May 2020 httpslibraryservicesjiscinvolveorgwp202005planm_nextphase

73 Grindley Neil 2019 “Plan M Definition Principles and Direction” Jisc (Word docx) httplibraryservicesjiscinvolveorgwpfiles201912Plan-M-Definition-and-Direction-1docx

74 Dempsey Lorcan 2016 ldquoLibrary Collections in the Life of the User Two Directionsrdquo LIBER Quarterly 26(4) 338ndash359 httpdoiorg1018352lq10170

75 Smith-Yoshimura Karen 2019 “Presenting Metadata from Different Sources in Discovery Layers” Hanging Together The OCLC Research Blog 16 April 2019 httpshangingtogetherorgp=7880

76 Smith-Yoshimura Karen 2017 ldquoMetadata for Archival Collectionsrdquo Hanging Together The OCLC Research Blog 30 May 2017 httpshangingtogetherorgp=5903

77 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Knudson Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Craig Thomas and Holly Tomren 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage 49-51 Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

78 The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at httpswwwoclcorgresearchpartnershipworking-groupsarchives-special-collections-linked-data-reviewhtml

79 Smith-Yoshimura Karen 2020 ldquoMetadata Management in Times of Uncertaintyrdquo Hanging Together The OCLC Research Blog 15 June 2020 httpshangingtogetherorgp=7998

80 Smith-Yoshimura Karen 2016 ldquoMetadata for Archived Websitesrdquo Hanging Together The OCLC Research Blog 14 March 2016 httpshangingtogetherorgp=5591

81 Archive-It 2008 ldquoHuman Rightsrdquo Columbia University Libraries Collection (Archived May 2008) httpsarchive-itorgcollections1068

Archive-It 2010 ldquoNew York City Places and Spacesrdquo Columbia University Libraries Collection (Archived January 2010) httpsarchive-itorgcollections1757

Archive-It 2010 ldquoBurke Library New York City Religionsrdquo Columbia University Libraries Collection (Archived May 2010) httpsarchive-itorgcollections1945

82 NLA ldquoTroverdquo Archived Websites Sub Collections Accessed 20 September 2020 httpstrovenlagovauwebsite

83 Archive-It 2014 ldquoCollaborative Architecture Urbanism and Sustainability Web Archive (CAUSEWAY)rdquo Ivy Plus Libraries Confederation Collection (Archived June 2014) httpsarchive-itorgcollections4638

Archive-It 2013 ldquoContemporary Composers Web Archive (CCWA)rdquo Ivy Plus Libraries Confederation Collection (Archived October 2013) httpsarchive-itorgcollections4019

NYARC New York Art Resources Consortium “Web Archiving” httpwwwnyarcorgcontentweb-archiving

84 OCLC Research 2020 “Web Archiving Metadata Working Group” The Problem Addressing the Problem Outputs httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 ldquoMetadata for Audio and Videosrdquo Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress ldquoStandardsrdquo Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress ldquoStandardsrdquo Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 ldquoAssessing Needs of AV in Special Collectionsrdquo Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 ldquoScale amp Risk Discussing Challenges to Managing AV Collections in the RLPrdquo Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 ldquoManaging Metadata for Image Collectionsrdquo Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress ldquoStandardsrdquo Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94 Library of Congress “Standards” Metadata Authority Description Schema (MADS) Accessed 21 September 2020 httpwwwlocgovstandardsmads

95 Smith-Yoshimura Karen 2016 ldquoSharing Digital Collections Workflowsrdquo Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 ldquoEuropeana Innovation Pilotsrdquo Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the Worldrsquos Images ldquoHomerdquo Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 “OCLC ResearchWorks IIIF Explorer” httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 ldquoCONTENTdm Linked Data Pilotrdquo Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

100 Smith-Yoshimura Karen 2015 ldquoData Management and Curation in 21st Century Archives ndash Part 1rdquo 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 ldquoMetadata for Research Data Managementrdquo Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105 Faniel Ixchel M 2019 “Let’s Cook Up Some Metadata Consistency” Next (blog) OCLC 21 November 2019 httpwwwoclcorgblogmainlets-cook-up-some-metadata-consistency

106 NCI (National Computational Infrastructure) Australia ldquoHomerdquo Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) ldquoHomerdquo Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network ldquoHomerdquo Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a “collaboration advocating richer connected reusable open metadata for all research outputs” (httpwwwmetadata2020org). The Metadata 2020 Researcher Communications project is outlined here: httpwwwmetadata2020orgprojectsresearcher-communications

109 Digital Curation Centre ldquoDisciplinary Metadatardquo List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory ldquoMetadata Standards Directory Working Grouprdquo GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy)mdashwhich identifies 14 roles describing each contributorrsquos specific contribution to the scholarly outputmdasha standard CRediT was developed by CASRAI the Consortia Advancing Standards in Research Administration Information See CASRAI ldquoCRediT ndash Contributor Roles Taxonomyrdquo Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 “Data Services” httpwwwlibumicheduresearch-data-services

113 OCLC Research 2020 “The Realities of Research Data Management” Overview httpswwwoclcorgresearchpublications2017oclcresearch-research-data-managementhtml

114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115 Indiana University 2018 “IU will Lead $2 Million Partnership to Expand Access to Research Data IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries” News at IU (Science and Technology) Indiana University 18 October 2018 httpsnewsiuedustories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft 2020 ldquoMicrosoft Academic Graphrdquo Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg Jamie and Valentin Pentchev “Works in Progress Webinar Democratizing Access to Large Datasets through Shared Infrastructure” Produced by OCLC Research 8 August 2019 MP4 video presentation 583400 httpswwwoclcorgresearchevents2019080819-democratizing-access-large-datasets-shared-infrastructurehtml

116 NISOrsquos Reproducibility Badging and Definitions now out for public comment may also help researchers extend the benefit of their research to others See ldquoTaxonomy Definitions and Recognition Badging Scheme Working Group | NISO Websiterdquo nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 ldquoServices Built on Usage Metricsrdquo Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 ldquoStacks Serials Search Engines and Studentsrsquo Success First-Year Undergraduate Studentsrsquo Library Use Academic Achievement and Retentionrdquo Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc ldquoLibrary Impact Data Project (LIDP)rdquo Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 ldquoMetadata Librarian Cornell University Libraryrdquo DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 “Metadata Librarian” Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-librarymetadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 ldquoLocate Items in the Library with StackMaprdquo httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 ldquoHomerdquo httpswwwyewnocom

125 Smith-Yoshimura Karen 2019 ldquoNew Ways of Using and Enhancing Cataloging and Authority Recordsrdquo Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) ldquoAustlang National Codeathonrdquo Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA ldquoTroverdquo Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company ndash Hachette UK ldquoRiver of Authorsrdquo Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 ldquoKnowledge Organization Systemsrdquo Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 ldquoKnowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Reviewrdquo International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) ldquoAbout SNACrdquo What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives amp Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAMrsquos mission is to organize share and elevate knowledge about and use of artificial intelligence by libraries archives and museums It was founded in 2018 inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs

See AI4LAM ldquoAboutrdquo Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 ldquoNew Skill Sets for Metadata Managementrdquo Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 ldquoMarcEdit and Other Tools for Batch Processing and Metadata Reconciliationrdquo Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646

139 Reese Terry 2018 “MarcEdit 2017 Usage Information” Terry’s Worklog (blog) 9 September 2020 httpblogreesetnetarchives2572

140 Reese Terry 2020 “Working with Linked Data In MarcEdit” MarcEdit Development (blog) Accessed 21 September 2020 httpsmarceditreesetnetworking-with-linked-data-in-marcedit

141 Reese Terry 2018 ldquoMarcEdit Playlistrdquo 139 YouTube videos Last updated 26 December 2018 httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura Karen 2017 ldquoNew Skill Sets for Metadata Managementrdquo Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

143 “XML and RDF-Based Systems Archives” nd Library Juice Academy (blog) Accessed 22 September 2020 httpslibraryjuiceacademycomcertificatexml-and-rdf-based-systems

Reese Terry 2013 ldquoTutorialsrdquo YouTube (selected) MarcEdit Development (blog) 14 March 2013 httpmarceditreesetnettutorials

ldquoLynda Online Courses Classes Training Tutorialsrdquo nd Lyndacom - from LinkedIn Learning Accessed 22 September 2020 httpswwwlyndacom

ldquoLearn to Code - for Freerdquo nd Codecademy Accessed 22 September 2020 httpswwwcodecademycom

Software Carpentry ldquoTeaching Basic Lab Skills for Research Computingrdquo Upcoming Workshops Accessed 22 September 2020 httpssoftware-carpentryorg

144 ldquoData on the Web Best Practicesrdquo nd Accessed 22 September 2020 httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist (2008) 2020 httpworkingontologistorg

145 Library Workflow Exchange nd ldquoAboutrdquo Accessed 21 September 2020 httpwwwlibraryworkflowexchangeorgabout

146 OCLC Developer Network 2020 “DevConnect Webinars” httpswwwoclcorgdevelopereventsdevconnect-workshopsenhtml

147 Smith-Yoshimura Karen 2019 ldquoStewardship of Professional FTEs In Metadata Work and Turnoverrdquo Hanging Together The OCLC Research Blog 18 October 2019 httpshangingtogetherorgp=7580

148 OCLC 2020 “WorldCat® OCLC and Linked Data” Shared Entity Management Infrastructure httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009

The Metadata Managers Focus Group is just one activity within the broader OCLC Research Library Partnership, which is devoted to extensive professional development opportunities for library staff. Focus Group members value their affiliation with the Research Library Partnership as a channel to becoming the “change agents” of future metadata management.10 Focus Group members’ responses to question sets have facilitated intra-institutional discussions and helped metadata managers understand how their institutions’ situation compares with peers within the Partnership.

These Focus Group discussions identified a broad range of metadata-related issues, documented in this report. Transitioning to the next generation of metadata is an evolving process, intertwined with changing standards, infrastructures, and tools. Together, Focus Group members came to a common understanding of the challenges, shared possible approaches to address them, and inoculated these ideas into other communities that they interact with.

Collectively, Focus Group members command a wide range of experiences with linked data. The Focus Group’s keen interest in linked data implementations sparked the series of OCLC Research’s International Linked Data Surveys for Implementers.11 A subset of Focus Group members have participated in various linked data projects, including the OCLC Research Project Passage and CONTENTdm Linked Data pilot, OCLC’s Shared Entity Management Infrastructure, Library of Congress’ Bibliographic Framework Initiative (BIBFRAME), the Mellon-funded Linked Data for Production (LD4P) project, the Share-VDE initiative, and the IMLS planning grant Shareable Local Name Authorities, which exposed issues raised by identifier hubs in the linked data environment.12 In addition, Focus Group members contribute to the PCC task groups addressing aspects of linked data work, including the PCC Task Group on Linked Data Best Practices, Task Group on Identity Management, Task Group on URIs in MARC, and the PCC Linked Data Advisory Committee.13 This cross-fertilization has prompted the Focus Group to examine issues around the entities represented in institutional resources.

This report synthesizes six years (2015-2020) of OCLC Research Library Partners Metadata Managers Focus Group discussions and what they may foretell for the “next generation of metadata.” The document is organized in the following sections, each representing an emerging trend identified in the Focus Group’s discussions:

• The transition to linked data and identifiers: expanding the use of persistent identifiers as part of the shift from “authority control” to “identity management”

• Describing the “inside-out” and “facilitated” collections: challenges in creating and managing metadata for unique resources created or curated by institutions in various formats and shared with consortia

• Evolution of “metadata as a service”: increased involvement with metadata creation beyond the traditional library catalog

• Preparing for future staffing requirements: the changing landscape calls for new skill sets needed by both new professionals entering the field and seasoned catalogers

The document concludes with some observations on the forecasted impact of the next generation of metadata on the wider library community.

The Transition to Linked Data and Identifiers

Linked data offers the ability to take advantage of structured data with an emphasis on context. It relies on language-neutral identifiers pointing to objects, with a focus on “things,” replacing the “strings” inherent in current authority and catalog records. These identifiers can then be connected to related data, vocabularies, and terms in other languages, disciplines, and domains, including nonlibrary domains. Linked data applications can consume others’ contributions and thus free metadata specialists from having to re-describe things already described elsewhere, allowing them instead to focus on providing access to their institutions’ unique and distinctive collections. This promises a richer user experience and increased discoverability, with more contextual relationships than is possible with our current systems. Furthermore, linked data offers an opportunity to go beyond the library domain by drawing on information about entities from diverse sources.14

FIGURE 1. “Changing Resource Description Workflows” by OCLC Research15

The hope is that linked data will allow libraries to offer new, value-added services that current models cannot support, that outside parties will be able to make better use of library resource descriptions, and that the data will be richer because more parties share in its creation. Moving to a linked data environment portends changes to resource description workflows, as shown in figure 1.

The drive to move metadata operations to linked data depends on the availability of tools, access to linked data sources for reuse, documented best practices on identifiers and the metadata descriptions associated with them (“statements”), and a critical mass of implementations on a network level.

EXPANDING THE USE OF PERSISTENT IDENTIFIERS

The Focus Group discussed the “future-proofing” of cataloging, which refers to the opportunities to unleash the power of metadata in legacy records for different interactions and uses in the future. Persistent identifiers were viewed as crucial to transitioning from current metadata to future applications.16 Identifiers, in the form of language-neutral alphanumeric strings, serve as a shorthand for assembling the elements required to uniquely describe an object or resource. They can be resolved over networks with specific protocols for finding, identifying, and using that object or resource. In the nonlibrary domain, Social Security and employee numbers are examples of such identifiers. In the library and academic domains, Focus Group members pointed to ORCID (Open Researcher and Contributor ID)17 as a “glue” that holds together the four arms of scholarly work: publishing, repository, library catalog, and researchers. ORCID, however, is limited to living researchers. ORCID is increasingly used in STEM (science, technology, engineering, mathematics) journals for all authors and contributors18 and included in institutions’ Research Information Management systems. ISNI (International Standard Name Identifier)19 uniquely identifies persons and organizations involved in creative activities; it is used by libraries, publishers, databases, and rights management organizations, and it covers nonliving creators.

Persistent identifiers are used by parties such as Google and HathiTrust for service integration.20 More institutions are using geospatial coordinates in metadata, or URIs (Uniform Resource Identifiers) pointing to geospatial coordinates, that support API (Application Programming Interface) calls to GeoNames,21 enabling map visualizations. Research institutions are also adopting person identifiers such as ORCID to streamline the collection of the institutional research record, usually through a Research Information Management system, as documented in the 2017 OCLC Research Report Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information Management.22

While publishers serve as a key player in the metadata workstream, publisher data does not always meet library requirements. For example, publisher data for monographs usually does not include identifiers. The British Library is working with five UK publishers to add ISNIs23 to their metadata as a promising proof-of-concept for publishers and libraries working together earlier in the supply chain. The ability to batch load or algorithmically add identifiers in the future is on Focus Group members’ wish list.

No single person identifier covers all use cases. Researchers’ names have been only partially represented in national name authority files that identify persons both living and dead. A sizable quantity of legacy names are represented only by text strings in bibliographic records. Authority records are created only by institutions involved in the PCC’s Name Authority Cooperative Program (NACO)24 or in national library programs. Even then, authority records are created selectively for certain headings, or sometimes only when references are involved. The LC/NACO name authority file contained only 30% of the total names reflected in WorldCat’s bibliographic record access points (9 million LC/NACO records compared to the 30 million total names reported on the WorldCat Identities project page as of 2012).25 By 2020, this percentage had decreased to 18%: 11 million LC/NACO authority records compared to 62 million in WorldCat Identities. These statistics illustrate that the number of names represented in bibliographic records is increasing more quickly than the number under authority control.
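The percentages quoted here follow directly from the reported counts. A minimal sketch of the arithmetic, using only the figures given in this paragraph (the function and variable names are illustrative):

```python
def coverage_percent(authority_records: int, total_names: int) -> float:
    """Share of names under authority control, as a percentage."""
    return 100 * authority_records / total_names


# Counts as quoted in the text.
pct_2012 = coverage_percent(9_000_000, 30_000_000)
pct_2020 = coverage_percent(11_000_000, 62_000_000)

# Growth factors between the two reporting dates.
names_growth = 62_000_000 / 30_000_000
authority_growth = 11_000_000 / 9_000_000

print(f"2012 coverage: {pct_2012:.0f}%")
print(f"2020 coverage: {pct_2020:.0f}%")
print(f"names grew {names_growth:.2f}x, authority records {authority_growth:.2f}x")
```

The total number of names roughly doubled while the authority file grew by less than a quarter, which is why coverage fell even though the authority file itself expanded.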

Authority files focus on the “preferred form” of a name, which can vary depending on language, discipline, context, and time period. Scholars have objected to the very concept of a “preferred form,” as the name may be referred to differently depending on the context.26 When a name has multiple forms, historians need to know the provenance of each name, following the citation practices commonly used in their field. An identifier linked to different forms of names, each associated with the provenance and context, could resolve this conundrum.

Researcher names are just one example of a need unmet by current identifier systems. Institutions have been minting their own “local identifiers” to meet this need. Use cases for local identifiers include registering all researchers on campus; representing entities that are underrepresented in national authority files, such as authors of electronic dissertations and theses, performers, events, local place names, and campus buildings; identifying entities in digital library projects and institutional repositories; reflecting multilingual needs of the community; and supporting “housekeeping” tasks such as recording archival collection titles.27

Focus Group members’ consistent need to disambiguate names across disciplines and formats spurred the creation of the OCLC Research working group on Registering Researchers in Authority Files.28 The need to accurately record researchers’ institutional affiliations (to reflect the institution’s scholarly output, to promote cross-institutional collaborations, and to lead to more successful recruitment and funding) led to another working group on Addressing the Challenges with Organizational Identifiers and ISNI,29 which presented new data modeling of organizations that others could adapt for their own uses. Since then, the Research Organization Registry (ROR) was launched to develop an open, sustainable, usable, and unique identifier for every research organization in the world.30

Disambiguating names is the most labor-intensive part of authority work and will still be a prerequisite for assigning unique identifiers. Given the different name identifier systems already in use, libraries need a name reconciliation service. Authority work and algorithms based on text string matching have limits; the results will still need manual expert review. Tapping the expertise in user communities to verify if two identifiers represent the same person may help.

Disambiguation is particularly difficult for authors or contributors listed in journal articles, where names are often abbreviated and there may be dozens or even hundreds of contributors. For example, an article in Physical Review Letters, “Precision Measurement of the Top Quark Mass in Lepton + Jets Final State,” has approximately 300 abbreviated author names for a five-page article (figure 2).31 This exemplifies the different practices among disciplines. By contrast, other objects with many contributors, such as feature films and orchestral recordings, are usually represented by only a relative handful of the associated names in library legacy metadata.32 Such differences make creating metadata that is uniform, understandable, and widely reusable a challenge.

FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form, or an identifier if it exists. Associating ORCIDs with article authors makes it easier to differentiate authors with the same abbreviated forms. Research Information Management (RIM) systems apply identity management for local researchers so that they are correctly associated with the articles they have written. Their articles are displayed as part of their profiles. (See, for example, Experts@Minnesota or University of Illinois at Urbana-Champaign’s Experts research profiles.)33

[Figure 2 image: first page of arXiv:1405.1756v2 [hep-ex], 16 Jun 2014 (FERMILAB-PUB-14-123-E), “Precision measurement of the top-quark mass in lepton+jets final states,” showing the abbreviated author list beginning V.M. Abazov, B. Abbott, B.S. Acharya, M. Adams, ...]

For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.34 Research Information Management Systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author’s ORCID. Linked data could access information across many environments, including those in Research Information Systems, but would require accurately linking multiple identifiers for the same person to each other.

Some Focus Group members are performing metadata reconciliation work, such as searching matching terms from linked data sources and adding their URIs in metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.35 Improving the quality of the data improves users’ experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC’s Virtual International Authority File (VIAF); the Library of Congress’s linked data service (id.loc.gov); ISNI; the Getty’s Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC’s Faceted Application of Subject Terminology (FAST); and various national authority files. Selection of the source depends on the trustworthiness of the organization responsible, subject matter, and richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed an LCNAF Named Entity Reconciliation program36 using Google’s Open Refine that searches the VIAF file with the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading and a link to the URI of the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.
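The general shape of such a workflow (find candidate clusters, keep only the Library of Congress source record in each cluster, extract the authorized heading, and pair it with the original heading and a URI) can be sketched as follows. This is not the University of Michigan program and it makes no calls to the VIAF API: the cluster data, identifiers, URIs, and similarity threshold below are all invented for illustration, and the string matching uses Python’s standard difflib module.

```python
from difflib import SequenceMatcher

# Hypothetical stand-in for search results: each cluster lists source
# records as (source code, heading, URI). All values are invented.
SAMPLE_CLUSTERS = [
    {
        "cluster_id": "cluster-example-1",
        "sources": [
            ("LC", "Doe, Jane, 1950-", "http://id.loc.gov/authorities/names/nEXAMPLE1"),
            ("DNB", "Doe, Jane", "https://example.org/dnb/EXAMPLE1"),
        ],
    },
    {
        "cluster_id": "cluster-example-2",
        "sources": [
            ("BNF", "Doe, John", "https://example.org/bnf/EXAMPLE2"),
        ],
    },
]


def normalize(heading: str) -> str:
    """Lowercase and drop punctuation so minor differences do not block a match."""
    kept = "".join(ch for ch in heading.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def reconcile(original: str, clusters=SAMPLE_CLUSTERS, threshold: float = 0.7):
    """Return (original heading, authorized LC heading, LC URI) or None."""
    best, best_score = None, threshold
    for cluster in clusters:
        for source, heading, uri in cluster["sources"]:
            if source != "LC":
                continue  # only Library of Congress source records are considered
            score = SequenceMatcher(None, normalize(original), normalize(heading)).ratio()
            if score >= best_score:
                best, best_score = (original, heading, uri), score
    return best
```

With this sample data, `reconcile("Doe, Jane")` pairs the original string with the heading “Doe, Jane, 1950-” and its placeholder URI, while `reconcile("Doe, John")` returns `None` because the only similar cluster has no Library of Congress source record. Pure string similarity of this kind is exactly why results still need expert review.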

A long list of nonlibrary sources that could enhance current authority data or could be valuable to link to in certain contexts has been identified. Wikidata and Wikipedia led the list. Other sources include AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, GeoNames, GoodReads, IMDb (Internet Movie Database), Internet Archive, Library Thing, LinkedIn, MusicBrainz, ONIX (ONline Information eXchange), Open Library, ORCID, and Scopus ID. The PCC’s Task Group on URIs in MARC’s document, Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources,37 provides valuable guidance for collecting data from these other sources. Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than with current library systems.38

Identifiers for “works” represent a particular challenge, as there is no consensus on what represents a “distinctive work.”39 Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a “work” is could hamper the ability to reuse data created elsewhere, and look to a central trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.

FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers40

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets such as digital resources and collections in institutional repositories include system-generated IDs, locally minted identifiers, PURL, handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.41 Ideally, libraries would contribute to a hub for the metadata describing their researchers’ data sets regardless of where the data sets are stored.
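One way to picture location independence: a DOI or ARK names the object, and a resolver service turns that name into whatever the current location is. The sketch below only builds resolver URLs, using the public doi.org and n2t.net resolvers. The DOI shown is this report’s own; the ARK is a made-up placeholder, and the function name is illustrative.

```python
def resolver_url(identifier: str) -> str:
    """Build a resolver URL for a DOI or an ARK; raise ValueError otherwise."""
    ident = identifier.strip()
    if ident.lower().startswith("doi:"):
        ident = ident[4:]
    if ident.startswith("10.") and "/" in ident:
        # DOI names begin with "10." and have a prefix/suffix structure.
        return f"https://doi.org/{ident}"
    if ident.startswith("ark:"):
        # ARKs carry their own "ark:" label; N2T is one global resolver for them.
        return f"https://n2t.net/{ident}"
    raise ValueError(f"not a DOI or ARK: {identifier!r}")


report_doi = resolver_url("10.25333/rqgd-b343")
placeholder_ark = resolver_url("ark:/12345/x1example")  # invented ARK
```

Because citations store only the identifier, the object can move between repositories without breaking the link, provided the registered location is kept up to date.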

MOVING FROM “AUTHORITY CONTROL” TO “IDENTITY MANAGEMENT”

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.42 The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from “managing text strings” as the basis of controlling headings in bibliographic records.43 But identity management poses a change in focus, from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from “authority control” and “authorized access points” in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.44 Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
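A minimal data-structure sketch of this separation, assuming nothing about any particular system: the entity is keyed by its identifier, labels are stored per language with no global “preferred form,” and each community supplies its own order of preference at display time. The class, field names, and identifiers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One identified 'thing': an identifier, many labels, links to other identifiers."""
    identifier: str
    labels: dict = field(default_factory=dict)    # language tag -> label
    same_as: dict = field(default_factory=dict)   # scheme name -> identifier

    def label_for(self, preferred_languages):
        """Return the first available label in the caller's order of preference."""
        for lang in preferred_languages:
            if lang in self.labels:
                return self.labels[lang]
        return self.identifier  # no suitable label: fall back to the identifier


# Illustrative values; the identifiers are placeholders.
place = Entity(
    identifier="ex:E1",
    labels={"en": "Vienna", "de": "Wien", "fr": "Vienne"},
    same_as={"other-scheme": "EXAMPLE-123"},
)

german_display = place.label_for(["de", "en"])
french_display = place.label_for(["ja", "fr"])
```

Matching and collocation would operate on `identifier` and `same_as`, so the choice of display label never affects which records are brought together.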

[Figure 4 image: Wikidata identifier Q19526, shown with its links to other identifiers and its labels in different languages]

A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?45
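As one illustration of what reconciling multiple identifiers could involve, the sketch below groups identifiers into clusters from a list of “these two denote the same entity” assertions, using a standard union-find structure. The assertions themselves would have to come from expert review or a trusted source; the identifier strings here are invented, and this is a sketch of the general technique rather than a description of any existing tool.

```python
def cluster_identifiers(same_entity_pairs):
    """Group identifiers asserted to denote the same entity (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_entity_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for ident in list(parent):
        clusters.setdefault(find(ident), set()).add(ident)
    # Sort for a stable, readable result.
    return sorted((sorted(members) for members in clusters.values()), key=lambda c: c[0])


# Invented identifiers standing in for entries in different systems.
pairs = [
    ("orcid:A", "isni:B"),
    ("isni:B", "viaf:C"),
    ("orcid:D", "scopus:E"),
]
clusters = cluster_identifiers(pairs)
```

Here the first two assertions chain together, so `orcid:A`, `isni:B`, and `viaf:C` end up in one cluster even though the first and last were never compared directly. A single wrong assertion would merge two people, which is why the quality of the input pairs matters more than the clustering step.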

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.46 Many libraries are looking toward Wikidata and Wikibase, the software platform underlying Wikidata, to solve some of the long-standing issues faced by technical services departments, archival units, and others.47 Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution’s resources. Focus Group members’ experimentations with Wikidata, and OCLC projects using the Wikibase platform, indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationship with each other, and how best they can increase discoverability by end-users.

Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on “works” and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are “notable” are more likely to be represented in Wikidata. Recently, an effort to support citations in Wikipedia articles, WikiCite,48 demonstrates a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical, and powerful, aspects of identity management is to reduce the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.49 Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.

ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts or subject headings are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting “alternate” subject headings came to the fore when the Library of Congress’ initial solution to change the LC subject heading for “Illegal aliens” to “Undocumented immigrants” failed to be implemented. This prompted one Focus Group member to comment, “Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive.” End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.50

Some see Faceted Application of Subject Terminology (FAST)51 as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally-uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.52 FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee53 represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH with tools and services that serve the needs of diverse communities and contexts.54

Multiple, overlapping, and sometimes conflicting vocabularies already exist in legacy library data.55 For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.56 There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)57 built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.58

A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in Institutional Repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool59 to create and manage various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.60

Linked data could provide the means for local communities to prefer a different label for an established vocabulary’s preferred term for a concept or entity. One might reference a local description of a concept or entity not represented, or not represented satisfactorily, in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, “Nothing is sadder than a vocabulary that someone invented that was left to go stale.” Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution’s EDI goals and principles.


FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, “Dissident art” rather than “Non-conformist art.”)

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading “Mentally handicapped” to “People with mental disabilities.” Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a “cultural sensitivity” message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator’s attitude or the period in which the item was created and may be considered inappropriate today in some contexts.

[Figure 5 chart: “What Areas Have You Changed or Plan to Change Due to Your Institution’s EDI Goals and Principles?” Vertical axis: 0 to 100. Categories: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies. Series: Changed; Plan to change.]


• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may be exclusive to audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than include them as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others’ terminologies or mean different things. The PCC Linked Data Advisory Committee’s Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences and facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on “Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives”66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.
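The idea of one identifier carrying several context-dependent labels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Concept` class, the example URI, and the sample labels are all hypothetical and are not drawn from any vocabulary or system named in this report.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Concept:
    """A concept keyed by a persistent identifier instead of a text string."""
    identifier: str                      # hypothetical persistent URI
    labels: Dict[str, str] = field(default_factory=dict)  # context tag -> label

    def label_for(self, *contexts: str) -> str:
        """Return the label for the first context that has one.

        Falls back to the identifier itself so that a missing label
        never silently substitutes another community's preferred term.
        """
        for context in contexts:
            if context in self.labels:
                return self.labels[context]
        return self.identifier


# Illustrative data only; the URI and labels are invented for this sketch.
concept = Concept(
    identifier="https://vocab.example.org/concept/0001",
    labels={
        "en-US": "Labor unions",
        "en-GB": "Trade unions",
        "fr": "Syndicats",
    },
)

print(concept.label_for("en-GB"))        # label chosen by the display context
print(concept.label_for("de", "en-US"))  # falls through to the next context
```

Because records would store only the identifier, a community could add or change its preferred label without editing the records that point to the concept.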

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of “community contribution” for new terms and community voting could be more inclusive. Libraries’ current “consensus environment” excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made. How will libraries resolve or reconcile conflicts between statements?67 Different types of inconsistencies may appear than do now, with, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources.”68 OCLC Research explored how libraries might apply Google’s “Knowledge Vault” to identify statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
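A minimal Python sketch of statements that carry their provenance may make the reconciliation question concrete. Everything here is an assumption made for illustration: the `Statement` structure, the source names, and the ranking policy are hypothetical, and ranking by a preferred-source list is only one possible policy, not a method described by the Focus Group.

```python
from typing import Dict, List, NamedTuple, Optional


class Statement(NamedTuple):
    subject: str   # identifier of the entity described
    prop: str      # property being asserted, e.g. "birthDate"
    value: str
    source: str    # provenance: who made the statement


# Hypothetical "preferred sources" list, most trusted first.
PREFERRED_SOURCES = ["national-authority-file", "aggregation-a", "local-catalog"]


def group_values(statements: List[Statement], subject: str, prop: str) -> Dict[str, List[str]]:
    """Keep every asserted value, mapped to the sources that assert it."""
    grouped: Dict[str, List[str]] = {}
    for s in statements:
        if s.subject == subject and s.prop == prop:
            grouped.setdefault(s.value, []).append(s.source)
    return grouped


def rank(source: str) -> int:
    """Position in the preferred list; unknown sources rank last."""
    if source in PREFERRED_SOURCES:
        return PREFERRED_SOURCES.index(source)
    return len(PREFERRED_SOURCES)


def preferred_statement(statements: List[Statement], subject: str, prop: str) -> Optional[Statement]:
    """Pick the statement from the highest-ranked source, if any exist."""
    candidates = [s for s in statements if s.subject == subject and s.prop == prop]
    return min(candidates, key=lambda s: rank(s.source)) if candidates else None


# Invented example: conflicting birthdates for one person.
statements = [
    Statement("person:1", "birthDate", "1901-03-02", "local-catalog"),
    Statement("person:1", "birthDate", "1902-03-02", "national-authority-file"),
    Statement("person:1", "birthDate", "1901-03-02", "aggregation-a"),
]

print(group_values(statements, "person:1", "birthDate"))
print(preferred_statement(statements, "person:1", "birthDate"))
```

Note that `group_values` discards nothing: the conflicting values and their sources stay visible, and the preference list only decides which one to display by default.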

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected by using inadequate vendor records, by creating minimal or less-than-full level descriptions for certain types of resources, and by limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of a large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance” or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing “Inside-Out” and “Facilitated” Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the “inside-out collection” (in contrast to the “outside-in collection,” in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the “facilitated collection.”74 Focus Group members’ activities have increasingly focused on metadata that will provide access to the resources unique to their institutions as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places, providing the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for both scholars and libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions digitizing their archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights; Historic Preservation and Urban Planning; and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, often in deteriorating formats, in large backlogs, and sometimes requiring rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in OCLC Research Hanging Together Blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership; incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

[Figure 6 chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Horizontal axis: 0 to 140. Categories: Resource allocation, assessment, and prioritization; Digital asset management and preservation; Physical collection management; Digitization and preservation reformatting; Incorporating into digital collection workflows; Incorporating into archival workflows; Selection, appraisal, and collection development. Series: Very interested; Interested; Somewhat interested; Not interested.]

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical, even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited (if not discredited) any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters” (those that include terms with a similar meaning). In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.
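To illustrate why a shared manifest format helps aggregation, the following Python sketch walks a manifest and lists the image behind each canvas. It assumes the IIIF Presentation API 2.x layout (sequences, then canvases, then images, each with a resource); version 3.0 nests its content differently. The sample manifest, its URLs, and the function name are invented for this sketch and do not describe how the IIIF Explorer prototype works.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical, heavily trimmed manifest in the Presentation 2.x shape.
SAMPLE_MANIFEST: Dict[str, Any] = {
    "@id": "https://iiif.example.org/manifest/paris-maps.json",
    "@type": "sc:Manifest",
    "label": "Paris maps (sample)",
    "sequences": [
        {
            "canvases": [
                {
                    "label": "Sheet 1",
                    "images": [
                        {"resource": {"@id": "https://iiif.example.org/image/map-1/full/full/0/default.jpg"}}
                    ],
                },
                {
                    "label": "Sheet 2",
                    "images": [
                        {"resource": {"@id": "https://iiif.example.org/image/map-2/full/full/0/default.jpg"}}
                    ],
                },
            ]
        }
    ],
}


def list_canvas_images(manifest: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (canvas label, image URL) pairs found in a 2.x-style manifest."""
    results: List[Tuple[str, str]] = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image.get("resource", {})
                if "@id" in resource:
                    results.append((canvas.get("label", ""), resource["@id"]))
    return results


for label, url in list_canvas_images(SAMPLE_MANIFEST):
    print(label, url)
```

Because every compliant system exposes the same structure, the same small function could in principle read manifests from many platforms without per-platform mapping.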

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
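The step of replacing strings with identifiers can be sketched as a simple lookup. This is a toy illustration, not the pilot's method: the authority index, its URIs, the field names, and the normalization rules are all assumptions made for the example, and real reconciliation would need far more careful matching and human review.

```python
import unicodedata
from typing import Dict, List, Tuple


def normalize(text: str) -> str:
    """Fold case, strip diacritics, and collapse whitespace for matching."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


# Hypothetical authority index: normalized heading -> identifier.
AUTHORITY_INDEX: Dict[str, str] = {
    normalize("Paris (France)"): "https://authority.example.org/place/paris",
    normalize("Māori"): "https://authority.example.org/concept/maori",
}


def reconcile(record: Dict[str, str], fields=("subject", "place")) -> Tuple[dict, List[Tuple[str, str]]]:
    """Replace matched strings with identifier/label pairs.

    Returns the converted record plus the (field, value) pairs that found
    no match, so they can be routed to manual review instead of guessed.
    """
    linked: dict = dict(record)
    unmatched: List[Tuple[str, str]] = []
    for name in fields:
        value = record.get(name)
        if value is None:
            continue
        uri = AUTHORITY_INDEX.get(normalize(value))
        if uri:
            linked[name] = {"id": uri, "label": value}
        else:
            unmatched.append((name, value))
    return linked, unmatched


linked, unmatched = reconcile({"title": "Map", "place": "  PARIS (france)", "subject": "Maori"})
print(linked)
print(unmatched)
```

Keeping the original string as the label alongside the identifier preserves what the cataloger wrote while still making the entity linkable.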

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about ldquoParis Mapsrdquo across CONTENTdm collections



RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry, “Data Management and Curation in 21st Century Archives” (Sept 2015),100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
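
As an illustration of the readme.txt advice quoted above, the following Python sketch assembles such a file from structured input. The field list follows the elements named in the quotation; the layout, the `build_readme` function, and the sample values are assumptions for illustration only, not a format prescribed by the report.

```python
# Project-level elements named in the quoted passage, in a fixed order.
PROJECT_FIELDS = (
    "title", "author", "date", "funder", "copyright holder", "description",
    "keywords", "observation unit", "kind of data", "type of data", "language",
)


def build_readme(project: dict, files: list) -> str:
    """Render project-level and file-level metadata as plain readme text.

    Missing project fields are flagged rather than dropped, so gaps stay visible.
    """
    lines = ["PROJECT INFORMATION"]
    for key in PROJECT_FIELDS:
        lines.append(f"{key}: {project.get(key, '[not provided]')}")
    lines.append("")
    lines.append("FILES")
    for f in files:
        lines.append(
            f"{f['name']} | format: {f['format']} | software: {f['software']}"
        )
    return "\n".join(lines) + "\n"


# Hypothetical sample values, for illustration only.
project = {"title": "Hypothetical survey", "author": "A. Researcher"}
files = [{"name": "survey.csv", "format": "CSV", "software": "LibreOffice Calc"}]
readme_text = build_readme(project, files)
```

Writing `readme_text` to a readme.txt file alongside the dataset gives depositors a simple template, in keeping with the Focus Group's observation that such forms need to stay as simple as possible.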


All four of the 2017-2018 The Realities of Research Data Management series webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105


National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
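
One practical consequence of using ORCID iDs for data creators is that mistyped identifiers can be caught before they reach a repository. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm; the Python sketch below implements that check. The function names are illustrative, and the sketch validates only the form of an iD, not whether it is actually registered.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check length, digits, and check character of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


# The author's ORCID iD as given in this report's front matter.
author_id_ok = is_valid_orcid("0000-0002-8757-2962")
# The same iD with the final character altered should fail the check.
typo_id_ok = is_valid_orcid("0000-0002-8757-2963")
```

A check like this could run when researchers fill in a deposit template, where a single transposed digit would otherwise attach a dataset to no creator or to the wrong one.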


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services such as those offered at the University of Michigan112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians that may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; and improving relevancy ranking, personalizing search results, offering recommendation services in the discovery layer, and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125


Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127


FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that Artificial Intelligence (or at least machine learning) could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for, and by libraries, archives, and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s OCLC Research 2019 position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to “traditional cataloging activities” (such as original and copy cataloging, authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists to change their mindsets about how they work and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills, or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
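
To give a sense of what a short script for clustering and de-duplication can look like, here is a minimal Python sketch of key-collision clustering. It is loosely modeled on the "fingerprint" keying idea familiar from tools such as OpenRefine; it is not MarcEdit's or OpenRefine's actual implementation, and the function names and sample headings are illustrative assumptions.

```python
import string
import unicodedata
from collections import defaultdict

# Translation table that turns every ASCII punctuation mark into a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def fingerprint(value: str) -> str:
    """Build a normalized key: no accents, no case, no punctuation, sorted tokens."""
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = unaccented.lower().translate(_PUNCT_TO_SPACE)
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a fingerprint; return only groups with 2+ members."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [members for members in groups.values() if len(members) > 1]


# Illustrative name headings in inconsistent forms.
names = [
    "Smith-Yoshimura, Karen",
    "Karen Smith-Yoshimura",
    "karen smith yoshimura",
    "Reese, Terry",
]
clusters = cluster(names)
```

The three variant forms of the first name fall into one cluster for human review, while the single "Reese, Terry" heading is left alone. A keying approach like this is fast and predictable, but it will not catch misspellings or abbreviated forenames, which need fuzzier matching and more careful review.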

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait, iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in the service of library service helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.
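The entity-centered model described above can be illustrated with a minimal sketch. It is not the Shared Entity Management Infrastructure's data model or API, which this report does not specify. The class names, property names, and `entities.example.org` identifiers below are all hypothetical, chosen only to show one persistent, language-neutral identifier carrying labels in several languages and "statements" contributed by different institutions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Statement:
    """One assertion about an entity, plus the institution that supplied it."""
    prop: str
    value: str
    source: str


@dataclass
class Entity:
    """An entity keyed by a persistent identifier rather than by a text string."""
    uri: str
    labels: Dict[str, str] = field(default_factory=dict)  # language code -> label
    statements: List[Statement] = field(default_factory=list)

    def label(self, lang: str, fallback: str = "en") -> str:
        """Return the label for lang, else the fallback language, else the URI."""
        return self.labels.get(lang) or self.labels.get(fallback) or self.uri

    def add_statement(self, prop: str, value: str, source: str) -> None:
        """Attach a statement contributed by one institution."""
        self.statements.append(Statement(prop, value, source))


# Hypothetical identifiers: example.org stands in for a real entity service.
work = Entity(
    uri="https://entities.example.org/work/W0001",
    labels={"en": "The Tale of Genji", "ja": "源氏物語"},
)
work.add_statement("creator", "https://entities.example.org/person/P0001", "Library A")
work.add_statement("heldBy", "Library B special collections", "Library B")

print(work.label("ja"))  # label chosen by the reader's language
print(work.label("fr"))  # no French label, so the English fallback is used
```

In this sketch the identifier never changes, while the display label depends on the language requested and the statements accumulate from multiple contributors.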


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe.


Futornick, Michelle. 2019. “LD4P2 Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee, 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association. Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” Hanging Together: The OCLC Research Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is ORCID?” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI?” Accessed 19 September 2020. https://isni.org/page/what-is-isni.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org.

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org. 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu.

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications.

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710.

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710.

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445.

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History, digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793.

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois.

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association. Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805.


44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko.

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite.


60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Washburn, Bruce, and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.


73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.


84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io.

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.


100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” Hanging Together: The OCLC Research Blog. 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University, 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 583400. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.


125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132 AI4LAM (Artificial Intelligence for Libraries Archives amp Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAMrsquos mission is to organize share and elevate knowledge about and use of artificial intelligence by libraries archives and museums It was founded in 2018 inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs

See AI4LAM ldquoAboutrdquo Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 ldquoNew Skill Sets for Metadata Managementrdquo Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 ldquoMarcEdit and Other Tools for Batch Processing and Metadata Reconciliationrdquo Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646

46 Transitioning to the Next Generation of Metadata

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems/.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com/.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com/.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org/.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp/.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org/.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about/.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395
T: 1-800-848-5878
T: +1-614-764-6000
F: +1-614-764-6096
www.oclc.org/research

ISBN: 978-1-55653-170-5
DOI: 10.25333/rqgd-b343
RM-PR-216787-WWAE-A4 2009


The Transition to Linked Data and Identifiers

Linked data offers the ability to take advantage of structured data with an emphasis on context. It relies on language-neutral identifiers pointing to objects, with a focus on “things” replacing the “strings” inherent in current authority and catalog records. These identifiers can then be connected to related data, vocabularies, and terms in other languages, disciplines, and domains, including nonlibrary domains. Linked data applications can consume others’ contributions and thus free metadata specialists from having to re-describe things already described elsewhere, allowing them instead to focus on providing access to their institutions’ unique and distinctive collections. This promises a richer user experience and increased discoverability, with more contextual relationships than is possible with our current systems. Furthermore, linked data offers an opportunity to go beyond the library domain by drawing on information about entities from diverse sources.14

FIGURE 1. “Changing Resource Description Workflows” by OCLC Research.15

The hope is that linked data will allow libraries to offer new, value-added services that current models cannot support, that outside parties will be able to make better use of library resource descriptions, and that the data will be richer because more parties share in its creation. Moving to a linked data environment portends changes to resource description workflows, as shown in figure 1.

The drive to move metadata operations to linked data depends on the availability of tools, access to linked data sources for reuse, documented best practices on identifiers and the metadata descriptions associated with them (“statements”), and a critical mass of implementations on a network level.

EXPANDING THE USE OF PERSISTENT IDENTIFIERS

The Focus Group discussed the “future-proofing” of cataloging, which refers to the opportunities to unleash the power of metadata in legacy records for different interactions and uses in the future. Persistent identifiers were viewed as crucial to transitioning from current metadata to future applications.16 Identifiers, in the form of language-neutral alphanumeric strings, serve as a shorthand for assembling the elements required to uniquely describe an object or resource. They can be resolved over networks with specific protocols for finding, identifying, and using that object or resource. In the nonlibrary domain, Social Security and employee numbers are examples of such identifiers. In the library and academic domains, Focus Group members pointed to ORCID (Open Researcher and Contributor ID)17 as a “glue” that holds together the four arms of scholarly work: publishing, repository, library catalog, and researchers. ORCID, however, is limited to living researchers. ORCID is increasingly used in STEM (science, technology, engineering, mathematics) journals for all authors and contributors18 and is included in institutions’ Research Information Management systems. ISNI (International Standard Name Identifier)19 uniquely identifies persons and organizations involved in creative activities; it is used by libraries, publishers, databases, and rights management organizations, and it covers nonliving creators.
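To make the idea of a “language-neutral alphanumeric string” concrete, the following minimal sketch checks whether an ORCID iD is well formed. It assumes the ISO 7064 MOD 11-2 check character that ORCID documents for its 16-character iDs (ISNI is understood to use the same scheme). The function names are illustrative only, and the check says nothing about whether an iD is actually registered or whom it identifies.

```python
def mod11_2_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for a string of digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character.

    This validates syntax only; it does not look the iD up anywhere.
    """
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return mod11_2_check_char(base) == compact[15]


# The iD printed in this report's front matter passes the check;
# altering its final character makes it fail.
print(is_well_formed_orcid("0000-0002-8757-2962"))
print(is_well_formed_orcid("0000-0002-8757-2963"))
```

A check of this kind catches transcription errors before an identifier is stored in a record, which is one reason identifiers are easier to manage in bulk than name strings.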

Persistent identifiers are used by parties such as Google and HathiTrust for service integration.20 More institutions are using geospatial coordinates in metadata, or URIs (Uniform Resource Identifiers) pointing to geospatial coordinates, that support API (Application Programming Interface) calls to GeoNames,21 enabling map visualizations. Research institutions are also adopting person identifiers such as ORCID to streamline the collection of the institutional research record, usually through a Research Information Management system, as documented in the 2017 OCLC Research Report Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information Management.22

While publishers serve as a key player in the metadata workstream, publisher data does not always meet library requirements. For example, publisher data for monographs usually does not include identifiers. The British Library is working with five UK publishers to add ISNIs23 to their metadata as a promising proof-of-concept for publishers and libraries working together earlier in the supply chain. The ability to batch load or algorithmically add identifiers in the future is on Focus Group members’ wish list.

No single person identifier covers all use cases. Researchers’ names have been only partially represented in national name authority files that identify persons, both living and dead. A sizable quantity of legacy names are represented only by text strings in bibliographic records. Authority records are created only by institutions involved in the PCC’s Name Authority Cooperative Program (NACO)24 or in national library programs. Even then, authority records are created selectively for certain headings, or sometimes only when references are involved. The LC/NACO name authority file contained only 30% of the total names reflected in WorldCat’s bibliographic record access points (9 million LC/NACO records compared to the 30 million total names reported on the WorldCat Identities project page as of 2012).25 By 2020, this percentage had decreased to 18%: 11 million LC/NACO authority records compared to 62 million in WorldCat Identities. These statistics illustrate that the number of names represented in bibliographic records is increasing more quickly than the number under authority control.
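The arithmetic behind these percentages can be reproduced directly from the rounded counts quoted above. This is a small worked sketch; the variable and function names are illustrative, and the inputs are the figures in millions as stated in the text.

```python
def coverage_percent(authority_records: float, total_names: float) -> float:
    """Share of names in bibliographic records that have an authority record."""
    return 100.0 * authority_records / total_names


# Counts in millions, as quoted in the paragraph above.
coverage_2012 = coverage_percent(9, 30)
coverage_2020 = coverage_percent(11, 62)

# Growth factors between the two dates for each population.
names_growth = 62 / 30
authority_growth = 11 / 9

print(f"2012 coverage: {coverage_2012:.1f}%")
print(f"2020 coverage: {coverage_2020:.1f}%")
print(f"Names grew {names_growth:.2f}x; authority records grew {authority_growth:.2f}x")
```

On these rounded inputs, the pool of names roughly doubled while the authority file grew by about a fifth, which is why the covered share fell even though the file itself grew.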

Authority files focus on the “preferred form” of a name, which can vary depending on language, discipline, context, and time period. Scholars have objected to the very concept of a “preferred form,” as the name may be referred to differently depending on the context.26 When a name has multiple forms, historians need to know the provenance of each name, following the citation practices commonly used in their field. An identifier linked to different forms of names, each associated with the provenance and context, could resolve this conundrum.


Researcher names are just one example of a need unmet by current identifier systems. Institutions have been minting their own “local identifiers” to meet this need. Use cases for local identifiers include registering all researchers on campus; representing entities that are underrepresented in national authority files, such as authors of electronic dissertations and theses, performers, events, local place names, and campus buildings; identifying entities in digital library projects and institutional repositories; reflecting multilingual needs of the community; and supporting “housekeeping” tasks such as recording archival collection titles.27

Focus Group members’ consistent need to disambiguate names across disciplines and formats spurred the creation of the OCLC Research working group on Registering Researchers in Authority Files.28 The need to accurately record researchers’ institutional affiliations (to reflect the institution’s scholarly output, to promote cross-institutional collaborations, and to lead to more successful recruitment and funding) led to another working group on Addressing the Challenges with Organizational Identifiers and ISNI,29 which presented new data modeling of organizations that others could adapt for their own uses. Since then, the Research Organization Registry (ROR) was launched to develop an open, sustainable, usable, and unique identifier for every research organization in the world.30

Disambiguating names is the most labor-intensive part of authority work and will still be a prerequisite for assigning unique identifiers. Given the different name identifier systems already in use, libraries need a name reconciliation service. Authority work and algorithms based on text string matching have limits; the results will still need manual expert review. Tapping the expertise in user communities to verify whether two identifiers represent the same person may help.

Disambiguation is particularly difficult for authors or contributors listed in journal articles, where names are often abbreviated and there may be dozens or even hundreds of contributors. For example, an article in Physical Review Letters, “Precision Measurement of the Top Quark Mass in Lepton + Jets Final State,” has approximately 300 abbreviated author names for a five-page article (figure 2).31 This exemplifies the different practices among disciplines. By contrast, other objects with many contributors, such as feature films and orchestral recordings, are usually represented by only a relative handful of the associated names in library legacy metadata.32 Such differences make creating metadata that is uniform, understandable, and widely reusable a challenge.

FIGURE 2. Some 300 abbreviated author names for a five-page article in Physical Review Letters

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form or to an identifier, if one exists. Associating ORCIDs with article authors makes it easier to differentiate authors with the same abbreviated forms. Research Information Management (RIM) systems apply identity management for local researchers so that they are correctly associated with the articles they have written. Their articles are displayed as part of their profiles. (See, for example, Experts@Minnesota or University of Illinois at Urbana-Champaign’s Experts research profiles.)33
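Why abbreviated bylines defeat string matching can be shown with a few lines of code. The sketch below is illustrative only: the researcher names are invented, and the abbreviation rule (initials followed by the last word of the name) is a simplifying assumption that ignores compound surnames and particles such as “van” or “de la.”

```python
from collections import defaultdict


def abbreviate(full_name: str) -> str:
    """Reduce 'Given Middle Surname' to 'GM Surname', as in many journal bylines."""
    *given, surname = full_name.split()
    initials = "".join(part[0].upper() for part in given)
    return f"{initials} {surname}".strip()


def group_by_abbreviation(full_names):
    """Map each abbreviated form to every full name that collapses into it."""
    groups = defaultdict(list)
    for name in full_names:
        groups[abbreviate(name)].append(name)
    return dict(groups)


# Hypothetical researchers, invented for illustration only.
researchers = ["Jane Lee", "Jun Lee", "Hannah S Lee", "Dmitri Smirnov"]

groups = group_by_abbreviation(researchers)
collisions = {abbr: names for abbr, names in groups.items() if len(names) > 1}
print(collisions)
```

Once two people collapse into the same abbreviated string, no amount of string comparison can separate them again; an identifier carried alongside the name, such as an ORCID, is what preserves the distinction.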

[Figure 2 image: first page of “Precision measurement of the top-quark mass in lepton+jets final states” (arXiv:1405.1756v2 [hep-ex], 16 Jun 2014; FERMILAB-PUB-14-123-E), filled by a list of abbreviated author names beginning “V.M. Abazov, B. Abbott, B.S. Acharya, M. Adams, ...”]

For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.34 Research Information Management systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author’s ORCID. Linked data could access information across many environments, including those in Research Information Systems, but would require accurately linking multiple identifiers for the same person to each other.

Some Focus Group members are performing metadata reconciliation work, such as searching for matching terms in linked data sources and adding their URIs to metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.35 Improving the quality of the data improves users’ experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC’s Virtual International Authority File (VIAF); the Library of Congress’s linked data service (id.loc.gov); ISNI; the Getty’s Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC’s Faceted Application of Subject Terminology (FAST); and various national authority files. Selection of the source depends on the trustworthiness of the organization responsible, the subject matter, and the richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed an LCNAF Named Entity Reconciliation program36 using Google’s Open Refine that searches the VIAF file with the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading and a link to the URI of the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.
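The general shape of such a reconciliation pass (match a local heading against a cluster, then pull the preferred source’s authorized heading and URI) can be sketched as follows. This is not the University of Michigan program and does not call the VIAF API: the cluster data is an invented in-memory stand-in, the example.org URIs are placeholders, and the normalization rule is an assumption chosen for illustration.

```python
import unicodedata


def normalize(heading: str) -> str:
    """Case-fold and strip diacritics and punctuation so near-identical strings compare equal."""
    decomposed = unicodedata.normalize("NFKD", heading)
    kept = [c for c in decomposed if not unicodedata.combining(c)]
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in kept)
    return " ".join(cleaned.casefold().split())


# Hypothetical clusters standing in for the results of a name-authority search.
CLUSTERS = [
    {
        "cluster_id": "cluster-1",
        "sources": [
            {"source": "LC", "heading": "Example, Ada, 1901-1980",
             "uri": "https://example.org/lc/n0001"},
            {"source": "OTHER", "heading": "Example, Ada",
             "uri": "https://example.org/other/42"},
        ],
    },
]


def reconcile(local_heading, clusters, preferred_source="LC"):
    """Pair a local heading with the preferred source's heading and URI, if any cluster matches."""
    target = normalize(local_heading)
    for cluster in clusters:
        if not any(normalize(s["heading"]) == target for s in cluster["sources"]):
            continue
        for s in cluster["sources"]:
            if s["source"] == preferred_source:
                return {"original": local_heading,
                        "authorized": s["heading"],
                        "uri": s["uri"]}
    return None


print(reconcile("Éxample, Ada", CLUSTERS))
print(reconcile("Unknown, Person", CLUSTERS))
```

Because the match is still a string comparison, a sketch like this inherits the limits noted above: two different people with the same normalized heading would be conflated, so the output is a candidate list for expert review rather than a finished result.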

A long list of nonlibrary sources that could enhance current authority data or could be valuable to link to in certain contexts has been identified. Wikidata and Wikipedia led the list. Other sources include AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, GeoNames, GoodReads, IMDb (Internet Movie Database), Internet Archive, Library Thing, LinkedIn, MusicBrainz, ONIX (ONline Information eXchange), Open Library, ORCID, and Scopus ID. The document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources,37 from the PCC’s Task Group on URIs in MARC, provides valuable guidance for collecting data from these other sources. Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than with current library systems.38

Identifiers for “works” represent a particular challenge, as there is no consensus on what represents a “distinctive work.”39 Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a “work” is could hamper the ability to reuse data created elsewhere, and they look to a central trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.


FIGURE 3. Examples of some DOI (left) and ARK (right) identifiers.40

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets, such as digital resources and collections in institutional repositories, include system-generated IDs, locally minted identifiers, PURLs, handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.41 Ideally, libraries would contribute to a hub for the metadata describing their researchers’ data sets, regardless of where the data sets are stored.
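Because a persistent identifier is independent of where the object is stored, the same DOI or ARK may arrive as a bare string, with a scheme prefix, or wrapped in a resolver URL. The following sketch separates the identifier from its wrapper. The regular expressions are simplified assumptions (for example, the ARK pattern expects an all-digit name assigning authority number), the function name is illustrative, and the ARK shown is a made-up example.

```python
import re

# Simplified patterns; real-world DOIs and ARKs admit more variation than this.
DOI_PATTERN = re.compile(
    r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE
)
ARK_PATTERN = re.compile(
    r"^(?:https?://[^/\s]+/)?(ark:/?\d{5,}/\S+)$", re.IGNORECASE
)


def classify_identifier(value: str):
    """Return (scheme, identifier) with any resolver prefix removed."""
    value = value.strip()
    match = DOI_PATTERN.match(value)
    if match:
        return ("doi", match.group(1))
    match = ARK_PATTERN.match(value)
    if match:
        return ("ark", match.group(1))
    return ("unknown", value)


# The first value is this report's own DOI; the ARK is invented for illustration.
for candidate in [
    "https://doi.org/10.25333/rqgd-b343",
    "doi:10.25333/rqgd-b343",
    "https://example.org/ark:/12345/x9abc",
    "local-id-0042",
]:
    print(classify_identifier(candidate))
```

Normalizing to the bare identifier before comparison is one way to notice that two differently formatted strings refer to the same object; it does not help with the harder case, noted above, of distinct DOIs minted for the same object.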


MOVING FROM “AUTHORITY CONTROL” TO “IDENTITY MANAGEMENT”

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.42 The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from “managing text strings” as the basis of controlling headings in bibliographic records.43 But identity management poses a change in focus, from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from “authority control” and “authorized access points” in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.44 Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4. One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
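Separating an identifier from its labels can be pictured as a small data structure in which the identifier is the key and the displayed label is chosen per community. The sketch below is a hypothetical model, not Wikidata’s data model or any library system’s schema; the entity, its labels, and the external identifier scheme are all invented.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An entity keyed by identifier, with labels held separately by language tag."""
    identifier: str
    labels: dict = field(default_factory=dict)    # language tag -> label
    same_as: dict = field(default_factory=dict)   # external scheme -> external identifier

    def label_for(self, preferred_languages):
        """Return the first available label in the caller's order of preference."""
        for lang in preferred_languages:
            if lang in self.labels:
                return self.labels[lang]
        return self.identifier  # fall back to the identifier itself


# Hypothetical entity, invented for illustration only.
entity = Entity(
    identifier="https://example.org/entity/E1",
    labels={"en": "Example Author", "de": "Beispielautor"},
    same_as={"hypothetical-scheme": "0001"},
)

print(entity.label_for(["de", "en"]))  # a German-language catalog
print(entity.label_for(["fr", "en"]))  # no French label, so English is used
print(entity.label_for(["fr"]))        # no acceptable label at all
```

No label here is “preferred” globally: each catalog passes its own language order, which is the localized label preference described above.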

[Figure 4 image: Wikidata identifier Q19526]

A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?45
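One way to picture a tool that reconciles multiple identifiers for the same entity is to treat each asserted equivalence as a link and group everything that is connected. The sketch below uses a union-find structure for this; it is an illustration under stated assumptions, not a description of any existing service, and the identifier strings are placeholders rather than real ORCID, ISNI, VIAF, or Scopus values.

```python
def cluster_identifiers(same_as_pairs):
    """Group identifiers into clusters from pairwise 'same entity' assertions."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for ident in list(parent):
        clusters.setdefault(find(ident), set()).add(ident)
    return sorted(sorted(members) for members in clusters.values())


# Placeholder identifiers, invented for illustration only.
assertions = [
    ("orcid:A1", "isni:B1"),
    ("isni:B1", "viaf:C1"),
    ("orcid:D2", "scopus:E2"),
]

for cluster in cluster_identifiers(assertions):
    print(cluster)
```

Note that the grouping is only as good as the assertions fed into it: a single wrong equivalence merges two people into one cluster, which is why the expert review mentioned earlier remains part of the workflow.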

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.46 Many libraries are looking toward Wikidata and Wikibase, the software platform underlying Wikidata, to solve some of the long-standing issues faced by technical services departments, archival units, and others.47 Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution’s resources. Focus Group members’ experimentations with Wikidata, and OCLC projects using the Wikibase platform, indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationships with each other, and how best they can increase discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on “works” and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are “notable” are more likely to be represented in Wikidata. Recently, an effort to support citations in Wikipedia articles, WikiCite,48 demonstrates a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical, and powerful, aspects of identity management is to reduce the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.49 Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts, or subject headings, are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting “alternate” subject headings came to the fore when the Library of Congress’ initial solution to change the LC subject heading for “Illegal aliens” to “Undocumented immigrants” failed to be implemented. This prompted one Focus Group member to comment, “Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive.” End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.50


Some see Faceted Application of Subject Terminology (FAST)51 as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.52 FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee53 represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH with tools and services that serve the needs of diverse communities and contexts.54

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.55 For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.56 There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)57 built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.58


A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in institutional repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool59 to create and manage various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.60

Linked data could provide the means for local communities to prefer a different label for an established vocabulary’s preferred term for a concept or entity. One might reference a local description of a concept or entity not represented, or not represented satisfactorily, in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, “Nothing is sadder than a vocabulary that someone invented that was left to go stale.” Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution’s EDI goals and principles.

FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

[Bar chart: “What Areas Have You Changed or Plan to Change Due to Your Institution’s EDI Goals and Principles?” Vertical axis: 0 to 100. Categories: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies. Series: Changed; Plan to change.]

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. That a term is offensive to one community may not always be clear to others. (For example, “Dissident art” rather than “Non-conformist art.”)

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading “Mentally handicapped” to “People with mental disabilities.” Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a “cultural sensitivity” message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator’s attitude or the period in which the item was created, and may be considered inappropriate today in some contexts.

• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may be exclusive to audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than include them as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others’ terminologies or mean different things. The PCC Linked Data Advisory Committee’s Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences and facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on “Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives”66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.
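The idea of attaching several context-dependent labels to one persistent identifier can be sketched in a few lines of Python. This is only an illustrative sketch, not a description of any system discussed by the Focus Group: the `Concept` class, the `label_for` method, the context tags, and the `example.org` identifier are all hypothetical names introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept keyed by a persistent identifier instead of a text string."""
    identifier: str  # hypothetical URI, used for illustration only
    labels: dict[str, str] = field(default_factory=dict)  # context tag -> label

    def label_for(self, *contexts: str) -> str:
        """Return the label for the first requested context that has one."""
        for context in contexts:
            if context in self.labels:
                return self.labels[context]
        # No label matched: fall back to the identifier so the link is never lost.
        return self.identifier


concept = Concept(
    identifier="https://example.org/concept/0001",
    labels={"en-US": "Color", "en-GB": "Colour", "fr": "Couleur"},
)

# The record stores only the identifier; the display label is chosen at
# presentation time according to the reader's context.
record = {"title": "Example title", "subjects": [concept.identifier]}

print(concept.label_for("en-GB"))        # label for British English readers
print(concept.label_for("de", "en-US"))  # no German label, so the next context is used
```

Because the record carries the identifier rather than the string, a community could add or change its preferred label without any edit to the records that point at the concept.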

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of “community contribution” for new terms and community voting could be more inclusive. Libraries’ current “consensus environment” excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made. How will libraries resolve or reconcile conflicts between statements?67 Different types of inconsistencies may appear than do now, with, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources.”68 OCLC Research explored how libraries might apply Google’s “Knowledge Vault” to identify statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected by using inadequate vendor records, by creating minimal or less-than-full level descriptions for certain types of resources, and by limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of the large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance” or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.
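To make the notion of statements with provenance more concrete, the following Python sketch shows one possible way to keep every asserted value together with the sources that assert it, and to flag properties where sources disagree rather than silently choosing a winner. It is a sketch under stated assumptions only: the `Statement` class, the `group_values` and `conflicts` functions, the `ex:` identifiers, and the source names are hypothetical and do not describe any existing library system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    subject: str  # entity identifier (hypothetical)
    prop: str     # property being described
    value: str    # asserted value
    source: str   # provenance: who made the statement


def group_values(statements):
    """Map (subject, prop) to {value: sorted list of sources asserting it}."""
    grouped = defaultdict(lambda: defaultdict(set))
    for s in statements:
        grouped[(s.subject, s.prop)][s.value].add(s.source)
    return {
        key: {value: sorted(sources) for value, sources in values.items()}
        for key, values in grouped.items()
    }


def conflicts(statements):
    """Keep only the (subject, prop) pairs that have more than one distinct value."""
    return {k: v for k, v in group_values(statements).items() if len(v) > 1}


statements = [
    Statement("ex:person/1", "birth_date", "1901-03-04", "source_a"),
    Statement("ex:person/1", "birth_date", "1901-03-04", "source_b"),
    Statement("ex:person/1", "birth_date", "1902-03-04", "source_c"),
    Statement("ex:person/1", "name", "Example Person", "source_a"),
]

found = conflicts(statements)
for (subject, prop), values in found.items():
    print(subject, prop, values)
```

The sketch deliberately reports all competing values with their sources instead of ranking them, since the text notes that conflicting statements may each be correct in their own context.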

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing “Inside-Out” and “Facilitated” Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the “inside-out collection” (in contrast to the “outside-in collection,” in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the “facilitated collection.”74 Focus Group members’ activities have increasingly focused on metadata that will provide access to the resources unique to their institutions, as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places, providing the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for both scholars and libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions digitizing their archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights, Historic Preservation and Urban Planning, and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, often in deteriorating formats, in large backlogs, and sometimes requiring rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in OCLC Research Hanging Together Blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections

[Bar chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Axis: 0 to 140. Categories: Resource allocation, assessment, and prioritization; Digital asset management and preservation; Physical collection management; Digitization and preservation reformatting; Incorporating into digital collection workflows; Incorporating into archival workflows; Selection, appraisal, and collection development. Response levels: Very interested; Interested; Somewhat interested; Not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical, even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats, and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators' very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors' very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find "semantic clusters" (those that include terms with a similar meaning). In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.
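As a rough illustration of what any client has to do with a IIIF Presentation manifest before it can display or aggregate images, the following Python sketch walks a Presentation API 2.x manifest and lists each canvas label with its image URL. The sample manifest, its example.org URLs, and the function name are invented for illustration; they are not taken from CONTENTdm or the IIIF Explorer, and real manifests carry many more properties.

```python
import json

# Hypothetical, heavily trimmed IIIF Presentation 2.x manifest.
# All example.org URLs are placeholders.
SAMPLE_MANIFEST = json.loads("""
{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "https://example.org/iiif/paris-map/manifest.json",
  "@type": "sc:Manifest",
  "label": "Plan de Paris",
  "sequences": [{
    "@type": "sc:Sequence",
    "canvases": [{
      "@id": "https://example.org/iiif/paris-map/canvas/1",
      "@type": "sc:Canvas",
      "label": "Sheet 1",
      "images": [{
        "@type": "oa:Annotation",
        "resource": {"@id": "https://example.org/iiif/paris-map/1/full/full/0/default.jpg"}
      }]
    }]
  }]
}
""")


def list_canvas_images(manifest):
    """Return (canvas label, image URL) pairs from a Presentation 2.x manifest."""
    pairs = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                url = image.get("resource", {}).get("@id")
                if url:
                    pairs.append((canvas.get("label", ""), url))
    return pairs


if __name__ == "__main__":
    for label, url in list_canvas_images(SAMPLE_MANIFEST):
        print(label, url)
```

Because every IIIF-compliant system publishes the same manifest structure, the same traversal can be pointed at manifests from different platforms, which is what makes cross-platform aggregation plausible.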

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
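The core move described here, replacing strings with identifiers, can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the pilot's actual method: the lookup table, the example.org URIs, and the field names are all hypothetical stand-ins for a real authority file.

```python
# Hypothetical authority lookup: normalized heading -> identifier URI.
# The example.org URIs are placeholders, not real authority identifiers.
AUTHORITY = {
    "paris (france)": "https://example.org/place/1",
    "maps": "https://example.org/subject/42",
}


def normalize(heading):
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(heading.lower().split())


def to_linked_data(record):
    """Replace string values with identifiers where a match exists.

    Unmatched strings are kept as literals so no information is lost.
    """
    converted = {}
    for field, values in record.items():
        converted[field] = [
            {"id": AUTHORITY[normalize(v)], "label": v}
            if normalize(v) in AUTHORITY
            else {"literal": v}
            for v in values
        ]
    return converted


record = {"coverage": ["Paris  (France)"], "subject": ["Maps", "Cadastral surveys"]}
linked = to_linked_data(record)
```

Keeping the original string as a label next to the identifier means the conversion can be reviewed, and unmatched values remain visible as candidates for a local vocabulary.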

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about "Paris Maps" across CONTENTdm collections

RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel's two-part blog entry, "Data Management and Curation in 21st Century Archives" (Sept. 2015),100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
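One way to make the readme.txt advice easy for researchers to follow is to generate the file from a small structured record. The Python sketch below is illustrative only: the field names follow the elements listed in the quoted passage, while the sample values, the file layout, and the function name are assumptions of this example.

```python
# Field names follow the elements listed in the quoted passage; the sample
# values and the layout of the generated file are hypothetical.
PROJECT_FIELDS = ["title", "author", "date", "funder", "copyright holder",
                  "description", "keywords", "language"]


def build_readme(project, files):
    """Render project-level and file-level metadata as plain text."""
    lines = ["PROJECT INFORMATION"]
    for field in PROJECT_FIELDS:
        # Flag gaps explicitly rather than silently omitting them.
        lines.append(f"{field}: {project.get(field, 'NOT PROVIDED')}")
    lines.append("")
    lines.append("FILES")
    for f in files:
        lines.append(f"{f['name']} | format: {f['format']} | software: {f['software']}")
    return "\n".join(lines) + "\n"


project = {"title": "Example survey", "author": "A. Researcher", "date": "2020"}
files = [{"name": "responses.csv", "format": "CSV", "software": "R"}]
readme_text = build_readme(project, files)
```

Printing "NOT PROVIDED" for empty fields gives a metadata consultant an immediate list of what to ask the researcher for.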

All four of the 2017-2018 webinars in The Realities of Research Data Management series,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment: beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries' expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post "Let's Cook Up Some Metadata Consistency":

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions' research investments.105

National contexts differ. For example, our Australian colleagues can take advantage of Australia's National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the "shared stewardship of research data."107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020's Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative's metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance's Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions' discovery layers.
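The agreed list of reuse elements lends itself to a simple completeness check at deposit time. The following Python sketch is one possible shape for such a check; the key spellings, the `needs_coverage` flag, and the sample record are this example's own conventions, not part of any schema named above.

```python
# Elements the Focus Group named as supporting reuse; the key spellings
# are this sketch's own convention, not a published schema.
REQUIRED = ["license", "processing_steps", "tools", "data_documentation",
            "data_definitions", "data_steward", "grant_numbers"]
# Geospatial and temporal data are only expected where relevant.
CONDITIONAL = ["geospatial_coverage", "temporal_coverage"]


def missing_elements(record, needs_coverage=False):
    """Return the expected elements that are absent or empty in a record."""
    expected = REQUIRED + (CONDITIONAL if needs_coverage else [])
    return [name for name in expected if not record.get(name)]


record = {
    "license": "CC BY 4.0",
    "tools": ["OpenRefine"],
    "data_steward": "University Library",
    "grant_numbers": [],  # present but empty, so still reported as missing
}
```

A check like this keeps the deposit form short for researchers while still telling curators which gaps remain.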

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
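Identifiers are only useful if they are recorded correctly, and some carry a built-in check for that. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a typo can often be caught before a record is saved. The Python sketch below validates the iD listed for the author on this report's title page; the function names are this example's own.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the length and check character of an iD such as 0000-0002-8757-2962."""
    compact = orcid.replace("-", "")
    body = compact[:15]
    if len(compact) != 16 or not (body.isascii() and body.isdigit()):
        return False
    return orcid_check_digit(body) == compact[15].upper()


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-8757-2962"))  # iD from the title page
    print(is_valid_orcid("0000-0002-8757-2963"))  # last character altered
```

A passing checksum shows only that the string is well formed; confirming that it belongs to the intended person still requires a lookup against the ORCID registry.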

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of "metadata governance" across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions' Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the "citation" or "metadata-only" records, with a link to the full text or data set deposited in a disciplinary repository. "Metadata consultation services" may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research report series The Realities of Research Data Management classifies metadata support as part of the "expertise" function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer "research sprints," where researchers partner with a team of expert librarians whose expertise may include metadata creation, management, analysis, and preservation. The "Shared BigData Gateway for Research Libraries," hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty's research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of "Metadata as a Service"

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as "foster discovery and use," "enrich the user experience," and "explore new ways to support the whole life cycle of scholarship," all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers' publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students' use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata's value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide "metadata as a service," that is, consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for "metadata outreach and consultation": "Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community." Georgia Tech is recruiting a metadata librarian who will "serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects."122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC's integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution's bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125

Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom's second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette's publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette's River of Authors.)127

FIGURE 9 UK Hachette's "River of Authors" generated from the British Library's catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty's representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than aiming for one "global domain," metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an "international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums."133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla's OCLC Research 2019 position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
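Name matching does not have to start with machine learning. As a baseline for comparison, the Python sketch below normalizes personal names and scores candidate matches with the standard library's `difflib.SequenceMatcher`. It is a deliberately simple stand-in for the algorithms the Focus Group has in mind: the 0.85 threshold is an arbitrary assumption, and string similarity alone cannot disambiguate two different people who share a name.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name):
    """Strip accents, punctuation, and case; turn 'Last, First' into 'first last'."""
    if "," in name:
        last, first = name.split(",", 1)
        name = f"{first} {last}"
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in unaccented)
    return " ".join(cleaned.lower().split())


def name_similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 for two normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def candidate_matches(name, authority_names, threshold=0.85):
    """Return (score, candidate) pairs at or above the threshold, best first."""
    scored = [(name_similarity(name, cand), cand) for cand in authority_names]
    return sorted((s for s in scored if s[0] >= threshold), reverse=True)


if __name__ == "__main__":
    print(candidate_matches("Padilla, Thomas", ["Thomas Padilla", "Marcia Zeng"]))
```

In practice the "other metadata available" mentioned above, such as dates, affiliations, or co-authors, would be needed to confirm or reject the candidates this kind of matching proposes.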

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who "trail-blaze innovations," which are then routinized for nonprofessionals. These discussions reinforce Padilla's recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to "traditional cataloging activities" (such as original and copy cataloging, authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual's productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just "production machines."

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or "play time") to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the "easier stuff" while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage metadata specialists to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became "lively" because she brought her colleagues along. It created appreciation for "continuous learning," and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch "reading clubs," where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group "video-viewing brown-bag lunches" for staff on new developments such as linked data, so staff can "watch and learn" together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists' collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library's control. Some noted that "cultural differences" exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more "schematic." Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The "holy grail" is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members' experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the "technical services mindset."

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit's developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
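To give a feel for the kind of "simple scripting" involved in de-duplicating records within a file, the Python sketch below groups records by a key-collision fingerprint of their titles. This is a simplified version of the fingerprint keying popularized by OpenRefine's clustering feature, offered only as an assumption-laden illustration; it is not how MarcEdit implements clustering, and the sample records are invented.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Key-collision fingerprint: lowercase, drop punctuation, sort unique tokens."""
    table = str.maketrans("", "", string.punctuation)
    tokens = value.strip().lower().translate(table).split()
    return " ".join(sorted(set(tokens)))


def cluster_records(records, field="title"):
    """Group records whose chosen field shares a fingerprint."""
    clusters = defaultdict(list)
    for rec in records:
        clusters[fingerprint(rec[field])].append(rec)
    # Only clusters with more than one member are duplicate candidates.
    return [group for group in clusters.values() if len(group) > 1]


# Invented sample records for illustration.
records = [
    {"id": 1, "title": "Transitioning to the Next Generation of Metadata"},
    {"id": 2, "title": "transitioning to the next generation of metadata."},
    {"id": 3, "title": "Responsible Operations"},
]
duplicates = cluster_records(records)
```

The clusters are candidates for human review rather than automatic merges, since different works can share a title.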

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments Desirable soft skills include problem-solving effective collaboration willingnessmdasheven eagernessmdashto try new things understanding researchersrsquo needs and advocacy Although some metadata specialists have always enjoyed experimenting with new approaches often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALArsquos New Members Roundtable Emphasizing that metadata encompasses much more than library catalogingmdashentity identification descriptive standards used in various academic disciplines describing born-digital archival and research data that can interact with the semantic Webmdashcan increase its appeal As one Focus Group member noted ldquoWe bring order out of a vacuumrdquo142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
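
As an illustration of the kind of simple scripting mentioned above, the sketch below shows one possible way to reconcile equivalents across registries: records that share any identifier are treated as describing the same entity, and their identifiers are merged. The record layout, the `reconcile` function, and all identifier values are hypothetical placeholders; real reconciliation would also need human review and rules for conflicting data.

```python
from collections import defaultdict


def reconcile(records):
    """Merge the identifiers of records that share at least one identifier."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the representative record of the group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the earliest record as the representative.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}
    for index, record in enumerate(records):
        for key in record["ids"].items():  # key is (registry, value)
            if key in first_seen:
                union(index, first_seen[key])
            else:
                first_seen[key] = index

    merged = defaultdict(dict)
    for index, record in enumerate(records):
        # Later values overwrite earlier ones if a registry conflicts.
        merged[find(index)].update(record["ids"])
    return [merged[root] for root in sorted(merged)]


# Hypothetical records with made-up identifier values.
records = [
    {"name": "Example, Ada", "ids": {"orcid": "0000-0000-0000-0001"}},
    {"name": "Ada Example", "ids": {"orcid": "0000-0000-0000-0001",
                                    "isni": "0000000000000001"}},
    {"name": "Example, A.", "ids": {"isni": "0000000000000001", "viaf": "1"}},
    {"name": "Other, Bo", "ids": {"viaf": "2"}},
]
entities = reconcile(records)
```

In this sample the first three records are linked transitively (the first two share an ORCID, the second and third share an ISNI), so they collapse into one entity carrying all three identifiers, while the fourth record stays separate. Matching on shared identifiers rather than on name strings is a deliberate choice here, since name forms vary across registries.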

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait, iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: the OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe.

Futornick, Michelle. 2019. “LD4P2 Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee, 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC. Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups. 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is Orcid.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org.

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org. 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical.” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons, or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu.

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications.

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710.

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710.

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445.

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html, and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas. The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793.

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref. The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois.

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805.

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research. 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko.

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite.

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research. 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research. 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.

73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io.

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

100. Smith-Yoshimura, Karen. 2015. "Data Management and Curation in 21st Century Archives – Part 1." 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. "Metadata for Research Data Management." Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. "Knowledge Management and Metadata." Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. "Let's Cook Up Some Metadata Consistency." Next (blog), OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. "Home." Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). "Home." Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. "Home." Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a "collaboration advocating richer, connected, reusable, open metadata for all research outputs" (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. "Disciplinary Metadata." List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. "Metadata Standards Directory Working Group." GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor's specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. "CRediT – Contributor Roles Taxonomy." Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. "Data Services." http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. "The Realities of Research Data Management." Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. "IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries." News at IU (Science and Technology), Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. "Microsoft Academic Graph." Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of "Democratizing Access to Large Datasets through Shared Infrastructure." See Wittenberg, Jamie, and Valentin Pentchev. "Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure." Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO's Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See "Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website." n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. "Services Built on Usage Metrics." Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. "Stacks, Serials, Search Engines, and Students' Success: First-Year Undergraduate Students' Library Use, Academic Achievement, and Retention." Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. "Library Impact Data Project (LIDP)." Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. "Alternatives to Statistics for Measuring Success and Value of Cataloging." Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. "Metadata Librarian, Cornell University Library." DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. "Metadata Librarian." Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. "Locate Items in the Library with StackMap." https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. "Home." https://www.yewno.com.


125. Smith-Yoshimura, Karen. 2019. "New Ways of Using and Enhancing Cataloging and Authority Records." Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). "Austlang National Codeathon." Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. "Trove." Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. "River of Authors." Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. "Knowledge Organization Systems." Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. "Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review." International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). "About SNAC." What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. "Knowledge Management and Metadata." Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM's mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. "About." Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. "Alternatives to Statistics for Measuring Success and Value of Cataloging." Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. "New Skill Sets for Metadata Management." Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. "MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation." Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.


139. Reese, Terry. 2018. "MarcEdit 2017 Usage Information." Terry's Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. "Working with Linked Data In MarcEdit." MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. "MarcEdit Playlist." 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. "New Skill Sets for Metadata Management." Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. "XML and RDF-Based Systems Archives." n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. "Tutorials." YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

"Lynda: Online Courses, Classes, Training, Tutorials." n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

"Learn to Code - for Free." n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. "Teaching Basic Lab Skills for Research Computing." Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. "Data on the Web Best Practices." n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp/.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. "About." Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. "DevConnect Webinars." https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. "Stewardship of Professional FTEs In Metadata Work and Turnover." Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. "WorldCat® OCLC and Linked Data." Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395. T: 1-800-848-5878. T: +1-614-764-6000. F: +1-614-764-6096. www.oclc.org/research

ISBN: 978-1-55653-170-5. DOI: 10.25333/rqgd-b343. RM-PR-216787-WWAE-A4 2009


  • Executive Summary
  • Introduction
  • The Transition to Linked Data and Identifiers
    • Expanding the use of persistent identifiers
    • Moving from "authority control" to "identity management"
    • Addressing the need for multiple vocabularies and equity, diversity, and inclusion
    • Linked data challenges
  • Describing "Inside-Out" and "Facilitated" Collections
    • Archival collections
    • Archived websites
    • Audio and video collections
    • Image collections
    • Research data
  • Evolution of "Metadata as a Service"
    • Metrics
    • Consultancy
    • New applications
    • Bibliometrics
    • Semantic indexing
  • Preparing for Future Staffing Requirements
    • The culture shift
    • Learning opportunities
    • New tools and skills
    • Self-education
    • Addressing staff turnover
  • Impact
  • Acknowledgments
  • Appendix
  • Notes
  • FIGURE 1 "Changing Resource Description Workflows" by OCLC Research
  • FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters
  • FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers
  • FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages
  • FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership
  • FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections
  • FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about "Paris Maps" across CONTENTdm collections
  • FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon
  • FIGURE 9 Hachette UK's "River of Authors" generated from the British Library's catalog metadata


Focus Group members pointed to ORCID (Open Researcher and Contributor ID)[17] as a "glue" that holds together the four arms of scholarly work (publishing, repository, library catalog, and researchers), but ORCID is limited to living researchers. ORCID is increasingly used in STEM (science, technology, engineering, mathematics) journals for all authors and contributors[18] and is included in institutions' Research Information Management systems. ISNI (International Standard Name Identifier)[19] uniquely identifies persons and organizations involved in creative activities; it is used by libraries, publishers, databases, and rights management organizations, and it covers nonliving creators.

Persistent identifiers were viewed as crucial to transitioning from current metadata to future applications. Persistent identifiers are used by parties such as Google and HathiTrust for service integration.[20] More institutions are using geospatial coordinates in metadata, or URIs (Uniform Resource Identifiers) pointing to geospatial coordinates, that support API (Application Programming Interface) calls to GeoNames,[21] enabling map visualizations. Research institutions are also adopting person identifiers such as ORCID to streamline the collection of the institutional research record, usually through a Research Information Management system, as documented in the 2017 OCLC Research Report Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information Management.[22]
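One practical property of person identifiers such as ORCID is that they can be checked mechanically before being loaded into a system. The sketch below assumes the ISO 7064 MOD 11-2 check character that ORCID documents for the final position of an iD; the function names are illustrative, and the sample iD is the author's own as printed on this report's title page.

```python
def orcid_check_character(first_15_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a correct check character."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact.isascii() or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-8757-2962"))
```

A passing check only shows that the string is well formed; it says nothing about whether the iD belongs to the person named in a record.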

While publishers are key players in the metadata workstream, publisher data does not always meet library requirements. For example, publisher data for monographs usually does not include identifiers. The British Library is working with five UK publishers to add ISNIs[23] to their metadata as a promising proof of concept for publishers and libraries working together earlier in the supply chain. The ability to batch load or algorithmically add identifiers in the future is on Focus Group members' wish list.

No single person identifier covers all use cases. Researchers' names have been only partially represented in national name authority files that identify persons, both living and dead. A sizable quantity of legacy names are represented only by text strings in bibliographic records. Authority records are created only by institutions involved in the PCC's Name Authority Cooperative Program (NACO)[24] or in national library programs. Even then, authority records are created selectively for certain headings, or sometimes only when references are involved. The LC/NACO name authority file contained only 30% of the total names reflected in WorldCat's bibliographic record access points (9 million LC/NACO records compared to the 30 million total names reported on the WorldCat Identities project page as of 2012).[25] By 2020, this percentage had decreased to 18%: 11 million LC/NACO authority records compared to 62 million in WorldCat Identities. These statistics illustrate that the number of names represented in bibliographic records is increasing more quickly than the number under authority control.
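The percentages above follow directly from the four counts quoted in the paragraph. The short calculation below reproduces them and adds the growth rate of each file over the period, which is derived here from the same figures rather than stated in the report.

```python
# Counts quoted in the text (LC/NACO authority records vs. WorldCat Identities names).
lcnaco_2012, names_2012 = 9_000_000, 30_000_000
lcnaco_2020, names_2020 = 11_000_000, 62_000_000

share_2012 = lcnaco_2012 / names_2012
share_2020 = lcnaco_2020 / names_2020

# Relative growth of each file between the two snapshots.
growth_authority = lcnaco_2020 / lcnaco_2012 - 1
growth_names = names_2020 / names_2012 - 1

print(f"Share under authority control: {share_2012:.0%} (2012), {share_2020:.0%} (2020)")
print(f"Growth: authority records {growth_authority:.0%}, names {growth_names:.0%}")
```

The authority file grew by roughly a fifth while the pool of names more than doubled, which is why the share under authority control fell.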

Authority files focus on the "preferred form" of a name, which can vary depending on language, discipline, context, and time period. Scholars have objected to the very concept of a "preferred form," as the name may be referred to differently depending on the context.[26] When a name has multiple forms, historians need to know the provenance of each name, following the citation practices commonly used in their field. An identifier linked to different forms of names, each associated with its provenance and context, could resolve this conundrum.
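As a rough illustration of what "an identifier linked to different forms of names" might look like as data, the sketch below models one identity with several name forms, none of them marked as preferred. The class names, fields, identifier, and sample values are all invented for this example and do not reflect any particular system's schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class NameForm:
    text: str        # the name as it appears in the source
    language: str    # language or script tag for this form
    source: str      # provenance: where this form was found
    context: str     # discipline, period, or usage note


@dataclass
class Identity:
    identifier: str                      # hypothetical persistent identifier
    forms: list[NameForm] = field(default_factory=list)

    def forms_for(self, language: str) -> list[NameForm]:
        """Return every recorded form in a language, with its provenance intact."""
        return [f for f in self.forms if f.language == language]


person = Identity("https://example.org/identity/0001")   # placeholder URI
person.forms.append(NameForm("Example, Anna", "en", "title page, 1901 edition", "publishing"))
person.forms.append(NameForm("A. Example", "en", "journal byline, 1910", "scholarly articles"))
person.forms.append(NameForm("Анна Пример", "ru", "parish register", "archival"))

for form in person.forms_for("en"):
    print(form.text, "|", form.source)
```

Because every form carries its own source and context, a historian can cite the form relevant to their field without the system having to declare one of them authoritative.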


Researcher names are just one example of a need unmet by current identifier systems. Institutions have been minting their own "local identifiers" to meet this need. Use cases for local identifiers include registering all researchers on campus; representing entities that are underrepresented in national authority files, such as authors of electronic dissertations and theses, performers, events, local place names, and campus buildings; identifying entities in digital library projects and institutional repositories; reflecting multilingual needs of the community; and supporting "housekeeping" tasks such as recording archival collection titles.[27]

Focus Group members' consistent need to disambiguate names across disciplines and formats spurred the creation of the OCLC Research working group on Registering Researchers in Authority Files.[28] The need to accurately record researchers' institutional affiliations (to reflect the institution's scholarly output, to promote cross-institutional collaborations, and to lead to more successful recruitment and funding) led to another working group on Addressing the Challenges with Organizational Identifiers and ISNI,[29] which presented new data modeling of organizations that others could adapt for their own uses. Since then, the Research Organization Registry (ROR) was launched to develop an open, sustainable, usable, and unique identifier for every research organization in the world.[30]

Disambiguating names is the most labor-intensive part of authority work and will still be a prerequisite for assigning unique identifiers. Given the different name identifier systems already in use, libraries need a name reconciliation service. Authority work and algorithms based on text string matching have limits; the results will still need manual expert review. Tapping the expertise in user communities to verify whether two identifiers represent the same person may help.
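To make the limits of text string matching concrete, the sketch below scores two name strings with Python's standard-library difflib and sorts the pair into one of three bins. The normalization steps and both thresholds are arbitrary assumptions chosen for illustration, not values taken from any reconciliation service.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Strip diacritics and punctuation, casefold, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c if c.isalnum() or c.isspace() else " " for c in no_marks.casefold())
    return " ".join(sorted(kept.split()))


def triage(a: str, b: str, accept: float = 0.95, review: float = 0.75) -> str:
    """Bin a pair of name strings by similarity; thresholds are illustrative only."""
    score = SequenceMatcher(None, normalize(a), normalize(b)).ratio()
    if score >= accept:
        return "likely match"
    if score >= review:
        return "needs expert review"
    return "likely different"


print(triage("Smith-Yoshimura, Karen", "Karen Smith-Yoshimura"))
print(triage("A. Meyer", "J. Meyer"))
```

Even the "likely match" bin is only a statement about strings: two different people can share an identical name, which is why the paragraph above still calls for expert review.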

Disambiguation is particularly difficult for authors or contributors listed in journal articles, where names are often abbreviated and there may be dozens or even hundreds of contributors. For example, an article in Physical Review Letters, "Precision Measurement of the Top Quark Mass in Lepton + Jets Final State," has approximately 300 abbreviated author names for a five-page article (figure 2).[31] This exemplifies the different practices among disciplines. By contrast, other objects with many contributors, such as feature films and orchestral recordings, are usually represented by only a relative handful of the associated names in library legacy metadata.[32] Such differences make creating metadata that is uniform, understandable, and widely reusable a challenge.

FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form or an identifier, if it exists. Associating ORCIDs with article authors makes it easier to differentiate authors with the same abbreviated forms. Research Information Management (RIM) systems apply identity management for local researchers so that they are correctly associated with the articles they have written. Their articles are displayed as part of their profiles. (See, for example, Experts@Minnesota or the University of Illinois at Urbana-Champaign's Experts research profiles.)[33]
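The collision problem described above can be shown in a few lines. In this sketch the full names and the identifier values are invented; it simply abbreviates given names to initials and reports which abbreviated forms are shared by more than one identifier.

```python
from collections import defaultdict


def abbreviate(full_name: str) -> str:
    """Reduce 'Given Middle Family' to 'G M Family' (family name assumed to come last)."""
    *given, family = full_name.split()
    initials = " ".join(part[0] for part in given)
    return f"{initials} {family}".strip()


def collisions(authors: dict[str, str]) -> dict[str, list[str]]:
    """Map each abbreviated form to the identifiers that share it, keeping only clashes."""
    by_abbreviation = defaultdict(list)
    for identifier, full_name in authors.items():
        by_abbreviation[abbreviate(full_name)].append(identifier)
    return {form: ids for form, ids in by_abbreviation.items() if len(ids) > 1}


# Invented people and placeholder identifiers, for illustration only.
authors = {
    "id-1": "Sung Woo Lee",
    "id-2": "Sang Won Lee",
    "id-3": "Hyun Su Lee",
}
print(collisions(authors))
```

The byline alone cannot separate the first two authors, whereas the identifiers attached to them can.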

[Figure 2 image: first page of arXiv:1405.1756v2 [hep-ex], 16 Jun 2014 (FERMILAB-PUB-14-123-E), "Precision measurement of the top-quark mass in lepton+jets final states," showing a list of abbreviated author names that begins V.M. Abazov, B. Abbott, B.S. Acharya, M. Adams, T. Adams, J.P. Agnew, G.D. Alexeev, ...]


For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.[34] Research Information Management systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author's ORCID. Linked data could access information across many environments, including those in Research Information Systems, but would require accurately linking multiple identifiers for the same person to each other.
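Linking multiple identifiers for the same person amounts to grouping identifiers that have been asserted equivalent, directly or through a chain. One common way to do that is a union-find structure, sketched below; the identifier strings are placeholders, and the sketch assumes every asserted link is correct, which in practice is the hard part.

```python
def cluster_identifiers(links):
    """Group identifiers connected by 'same person' assertions (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    clusters = {}
    for identifier in list(parent):
        clusters.setdefault(find(identifier), set()).add(identifier)
    return list(clusters.values())


# Placeholder identifiers; each pair asserts that both refer to the same person.
links = [
    ("orcid:AAAA", "scopus:1111"),
    ("scopus:1111", "wos:X-1"),
    ("orcid:BBBB", "scopus:2222"),
]
for cluster in cluster_identifiers(links):
    print(sorted(cluster))
```

A single wrong assertion merges two people into one cluster, so the accuracy requirement in the paragraph above applies to every individual link.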

Some Focus Group members are performing metadata reconciliation work, such as searching for matching terms from linked data sources and adding their URIs to metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.[35] Improving the quality of the data improves users' experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC's Virtual International Authority File (VIAF); the Library of Congress's linked data service (id.loc.gov); ISNI; the Getty's Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC's Faceted Application of Subject Terminology (FAST); and various national authority files. Selection of the source depends on the trustworthiness of the organization responsible, the subject matter, and the richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed an LCNAF Named Entity Reconciliation program[36] using Google's Open Refine that searches the VIAF file with the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading and a link to the URI of the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.
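The last two steps of that workflow (find the Library of Congress source record in a cluster, then pair its heading and URI with the original heading) can be sketched without any network calls. This is not the University of Michigan program: the dictionary layout, the key names, and the record number below are assumptions made for illustration and do not reproduce the real VIAF API response format.

```python
from typing import Optional

# Base of the LC name authority URIs on id.loc.gov.
LC_NAMES_BASE = "http://id.loc.gov/authorities/names/"


def reconcile(original_heading: str, cluster: dict) -> Optional[dict]:
    """Pair the original heading with the LC heading found in a cluster, if any."""
    for source in cluster.get("sources", []):
        if source.get("agency") == "LC":
            return {
                "original": original_heading,
                "authorized": source["heading"],
                "uri": LC_NAMES_BASE + source["local_id"],
            }
    return None   # no LC record in this cluster: leave the heading for manual work


# Hypothetical, simplified cluster as it might look after a search step has run.
cluster = {
    "cluster_id": "0000000",
    "sources": [
        {"agency": "DNB", "heading": "Example, Anna", "local_id": "000000000"},
        {"agency": "LC", "heading": "Example, Anna, 1870-1940", "local_id": "n00000000"},
    ],
}
print(reconcile("Example, Anna", cluster))
```

Returning the cluster identifier instead of the LC URI would be a small change to the returned dictionary, which matches the remark above that the service could bring in the VIAF identifier instead.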

A long list of nonlibrary sources that could enhance current authority data, or could be valuable to link to in certain contexts, has been identified. Wikidata and Wikipedia led the list. Other sources include AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, GeoNames, GoodReads, IMDb (Internet Movie Database), Internet Archive, Library Thing, LinkedIn, MusicBrainz, ONIX (ONline Information eXchange), Open Library, ORCID, and Scopus ID. The document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources,[37] from the PCC's Task Group on URIs in MARC, provides valuable guidance for collecting data from these other sources. Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than with current library systems.[38]

Identifiers for "works" represent a particular challenge, as there is no consensus on what represents a "distinctive work."[39] Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a "work" is could hamper the ability to reuse data created elsewhere, and they look to a central trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.


FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers[40]

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets such as digital resources and collections in institutional repositories include system-generated IDs, locally minted identifiers, PURLs, handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.[41] Ideally, libraries would contribute to a hub for the metadata describing their researchers' data sets, regardless of where the data sets are stored.
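Before identifiers from different repositories can be compared, the same identifier written in different ways has to be recognized as one. The sketch below classifies a string as a DOI or an ARK and returns a normalized form; the regular expressions are simplified assumptions rather than full syntax definitions, the DOI shown is this report's own, and the ARK is a made-up example.

```python
import re

# Simplified patterns: a DOI starts with "10." plus a registrant code; an ARK
# starts with "ark:" plus a numeric name-assigning-authority number.
DOI_RE = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:)?(10\.\d{4,9}/\S+)$", re.IGNORECASE)
ARK_RE = re.compile(r"^(?:https?://[^/]+/)?(ark:/?\d{5,}/\S+)$", re.IGNORECASE)


def classify(identifier: str) -> tuple[str, str]:
    """Return (scheme, normalized value) for a DOI or ARK, else ('unknown', input)."""
    text = identifier.strip()
    match = DOI_RE.match(text)
    if match:
        return ("doi", match.group(1).lower())   # DOIs are case-insensitive
    match = ARK_RE.match(text)
    if match:
        return ("ark", match.group(1))
    return ("unknown", text)


print(classify("https://doi.org/10.25333/RQGD-B343"))
print(classify("doi:10.25333/rqgd-b343"))
print(classify("https://example.org/ark:/12345/x1example"))
```

Normalization only catches different spellings of one identifier. Two genuinely different DOIs minted for the same object, the problem raised in the paragraph above, still need metadata comparison or human judgment to detect.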


MOVING FROM "AUTHORITY CONTROL" TO "IDENTITY MANAGEMENT"

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.[42] The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from "managing text strings" as the basis of controlling headings in bibliographic records.[43] But identity management poses a change in focus, from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from "authority control" and "authorized access points" in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.[44] Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.

[Figure 4 image: Wikidata identifier Q19526, shown with its linked identifiers and its labels in different languages]

A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?[45]
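As a rough sketch of what reconciling identifiers might involve, the Python below groups identifiers that have been asserted to denote the same entity, using a simple union-find structure. The function name `reconcile`, the pair-based input format, and the `sourceA:`/`sourceB:`/`sourceC:` identifiers are assumptions made for illustration; a real reconciliation tool would also need to decide which equivalence assertions to trust.

```python
from collections import defaultdict


def reconcile(same_as_pairs):
    """Group identifiers that are asserted to denote the same entity.

    `same_as_pairs` is an iterable of (identifier_a, identifier_b) tuples.
    Returns a list of sets, one set of equivalent identifiers per entity.
    """
    parent = {}

    def find(x):
        # Locate the representative identifier for x's cluster.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for identifier in parent:
        clusters[find(identifier)].add(identifier)
    return list(clusters.values())


# Hypothetical equivalence assertions from three made-up sources.
pairs = [
    ("sourceA:123", "sourceB:abc"),
    ("sourceB:abc", "sourceC:xyz"),
    ("sourceA:456", "sourceB:def"),
]
clusters = reconcile(pairs)
for cluster in clusters:
    print(sorted(cluster))
```

Each resulting cluster could then serve as a single collocation point, with any of its member identifiers acting as a key to the same set of labels.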

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.[46] Many libraries are looking toward Wikidata and Wikibase (the software platform underlying Wikidata) to solve some of the long-standing issues faced by technical services departments, archival units, and others.[47] Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution’s resources. Focus Group members’ experimentations with Wikidata and OCLC projects using the Wikibase platform indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationships with each other, and how best they can increase discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on “works” and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are “notable” are more likely to be represented in Wikidata. A recent effort to support citations in Wikipedia articles, WikiCite,[48] demonstrates a need to register and support the identifiers that make up those citations, including information about a specific edition or document.

One of the most practical (and powerful) aspects of identity management is that it reduces the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.[49] Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts or subject headings are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting “alternate” subject headings came to the fore when the Library of Congress’s initial solution to change the LC subject heading for “Illegal aliens” to “Undocumented immigrants” failed to be implemented. This prompted one Focus Group member to comment, “Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive.” End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.[50]


Some see Faceted Application of Subject Terminology (FAST)[51] as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally-uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.[52] FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee[53] represents FAST users to oversee community engagement, term contributions, and procedures and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH, with tools and services that serve the needs of diverse communities and contexts.[54]

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.[55] For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.[56] There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)[57] project built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.[58]


A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in institutional repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described, so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool,[59] which creates and manages various types of controlled vocabularies, seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.[60]

Linked data could provide the means for local communities to prefer a different label for an established vocabulary’s preferred term for a concept or entity. One might reference a local description of a concept or entity not represented (or not represented satisfactorily) in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, “Nothing is sadder than a vocabulary that someone invented that was left to go stale.” Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)[61] spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.[62] Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution’s EDI goals and principles.


FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership[63]

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. That a term is offensive to one community may not always be clear to others (for example, “Dissident art” rather than “Non-conformist art”).

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading “Mentally handicapped” to “People with mental disabilities.” Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.[64] Some noted it would be less labor-intensive to present a “cultural sensitivity” message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator’s attitude or the period in which the item was created and may be considered inappropriate today in some contexts.

[Figure 5 chart: “What Areas Have You Changed or Plan to Change Due to Your Institution’s EDI Goals and Principles?” Categories: collection building; select materials for digitization; metadata in library catalogs; metadata in archival collections; metadata in digital collections; terminologies or vocabularies. Series: Changed; Plan to change. Scale: 0 to 100.]

• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may be exclusive to audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than include them as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others’ terminologies or mean different things. The PCC Linked Data Advisory Committee’s Linked Data Infrastructure Models: Areas of Focus for PCC Strategies[65] describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences and facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on “Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives”[66] described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of “community contribution” for new terms and community voting could be more inclusive. Libraries’ current “consensus environment” excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the set of statements made about them. How will libraries resolve or reconcile conflicts between statements?[67] Inconsistencies of different types than those seen now may appear, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources.”[68] OCLC Research explored how libraries might apply Google’s “Knowledge Vault” to identify statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”[69] Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
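To make the idea of statements with provenance concrete, the following Python sketch records each statement together with its source and reports where sources disagree. It is a minimal illustration under stated assumptions: the `Statement` structure, the `find_conflicts` function, the `ex:person1` identifier, and the sample values and source names are hypothetical and are not drawn from any particular linked data store.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str  # identifier of the entity described (hypothetical)
    prop: str     # property name, such as "birth_date"
    value: str
    source: str   # provenance: who asserted this statement


def find_conflicts(statements):
    """Return {(subject, prop): {value: [sources]}} where sources disagree."""
    grouped = defaultdict(lambda: defaultdict(list))
    for s in statements:
        grouped[(s.subject, s.prop)][s.value].append(s.source)
    # Keep only the properties that carry more than one distinct value.
    return {
        key: dict(values)
        for key, values in grouped.items()
        if len(values) > 1
    }


# Hypothetical statements about one person from three made-up sources.
statements = [
    Statement("ex:person1", "birth_date", "1850", "source A"),
    Statement("ex:person1", "birth_date", "1851", "source B"),
    Statement("ex:person1", "birth_date", "1850", "source C"),
    Statement("ex:person1", "birth_place", "Lyon", "source A"),
]
conflicts = find_conflicts(statements)
print(conflicts)
```

The sketch deliberately keeps every competing value alongside its sources rather than choosing a winner; any ranking of sources, such as a list of preferred sources, would be a separate local policy applied on top of it.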

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise: speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected in the use of inadequate vendor records, the creation of minimal or less-than-full level descriptions for certain types of resources, and limits on authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of the large volume of materials and insufficient staff resources.[70] These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance” or whether data should be distributed with peer-to-peer sharing.[71] Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.[72] Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”[73] Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing “Inside-Out” and “Facilitated” Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the “inside-out collection” (in contrast to the “outside-in collection,” in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the “facilitated collection.”[74] Focus Group members’ activities have increasingly focused on metadata that will provide access to the resources unique to their institutions as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.[75] This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places, providing the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for both scholars and libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.[76]

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot[77] and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.[78]

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions’ digitization of archival and distinctive collections that have been available only in physical form.[79] More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.[80]

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights; Historic Preservation and Urban Planning; and New York City Religions.[81] National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)[82] collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.[83]

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.[84] The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.[85]


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.[86] However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”[87] Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. Managing AV resources requires knowledge of the use context as well as of technical metadata issues, which makes for a complex environment in which to think through requirements for description and access. Further, libraries must deal with current time-based media, whether produced locally as part of research and learning or commercially licensed as streaming media.

Focus Group discussions focused on the AV resources within archival collections: often in deteriorating formats, in large backlogs, and sometimes requiring rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,[88] which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, including details about the AV capture and digitization process such as compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),[89] the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”[90] A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival workflows and into digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections

[Bar chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Categories: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; selection, appraisal, and collection development. Response scale: very interested, interested, somewhat interested, not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited (if not discredited) any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of the same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people, as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters” (those that include terms with a similar meaning). In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
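The replacement of strings with identifiers described above can be illustrated with a minimal Python sketch. It is not the pilot's actual code: the lookup table, the example.org URIs, the field names, and the function names are all assumptions made for illustration, and a real workflow would draw identifiers from an authority file or a local vocabulary.

```python
# Hypothetical lookup table: normalized label -> identifier.
# The example.org URIs are placeholders, not real authority identifiers.
AUTHORITY = {
    "paris (france)": "https://example.org/place/paris",
    "maps": "https://example.org/concept/maps",
}


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(label.lower().split())


def reconcile(record: dict, fields: tuple = ("subject", "place")) -> dict:
    """Return a copy of the record in which each string gains an identifier.

    Strings with no match keep their label and get an id of None,
    so no original information is lost.
    """
    out = dict(record)
    for field in fields:
        values = record.get(field, [])
        out[field] = [
            {"label": v, "id": AUTHORITY.get(normalize(v))} for v in values
        ]
    return out


# A made-up record with one unmatched place name.
record = {
    "title": "Plan de Paris",
    "subject": ["Maps"],
    "place": ["Paris  (France)", "Montmartre"],
}
linked = reconcile(record)
```

Keeping unmatched strings as literals, rather than dropping them, leaves a visible queue of values that still need manual review.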

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections

RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry, “Data Management and Curation in 21st Century Archives” (Sept. 2015),100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
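As an illustration of the readme.txt advice quoted above, the following Python sketch assembles such a file from project-level and file-level metadata. The field selection follows the elements listed in the quotation, but the layout, the function name, and the "NOT PROVIDED" marker are assumptions, not a format prescribed by the report.

```python
# Project-level fields taken from the list in the quoted passage.
PROJECT_FIELDS = (
    "title", "author", "date", "funder", "copyright holder",
    "description", "keywords", "language",
)


def build_readme(project: dict, files: list) -> str:
    """Render project and file metadata as plain readme text.

    Missing project fields are flagged rather than silently skipped,
    so gaps are visible to whoever reviews the deposit.
    """
    lines = ["PROJECT INFORMATION"]
    for key in PROJECT_FIELDS:
        lines.append(f"{key}: {project.get(key, 'NOT PROVIDED')}")
    lines.append("")
    lines.append("FILES")
    for f in files:
        lines.append(
            f"{f['name']} | format: {f['format']} | software: {f['software']}"
        )
    return "\n".join(lines) + "\n"


# Made-up example values.
readme_text = build_readme(
    {"title": "Reading room survey", "author": "A. Researcher"},
    [{"name": "responses.csv", "format": "CSV", "software": "R"}],
)
```

Writing `readme_text` to a readme.txt file alongside the data files is then a single file-write step.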

All four of the 2017-2018 webinars in The Realities of Research Data Management series,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment: beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105

National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
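One practical consequence of using identifiers such as ORCID iDs is that they can be checked mechanically before they enter a metadata record. ORCID iDs and ISNIs end in a check character computed with the ISO 7064 MOD 11-2 algorithm; the Python sketch below validates that character. The function names are illustrative, and the sketch checks only the checksum, not whether the identifier is actually registered.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True if a 16-character ORCID iD has a correct check character.

    Hyphens are ignored, so both 0000-0002-8757-2962 and
    0000000287572962 are accepted as input forms.
    """
    compact = orcid.replace("-", "").upper()
    base = compact[:15]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == compact[15]


# The author's ORCID iD as given in this report's front matter.
author_ok = is_valid_orcid("0000-0002-8757-2962")
# The same iD with its last character altered fails the check.
typo_ok = is_valid_orcid("0000-0002-8757-2963")
```

A check like this catches most transcription errors at the point of entry, which is cheaper than reconciling mismatched creators later.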

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose support may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking, and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125

Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127

FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric, or topic-centric) rather than (only) collection-centric.

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for, and by libraries, archives, and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s OCLC Research 2019 position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to “traditional cataloging activities” (such as original and copy cataloging and authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed, from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff, to learn new workflows for processing multiple formats, and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, change their mindsets about how they work, and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult: they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
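To make the idea of lightweight clustering for batch clean-up concrete, the following is a minimal Python sketch that groups variant headings under a normalized "fingerprint" key, similar in spirit to the key-collision clustering offered in OpenRefine. It is an illustration only and does not reproduce how MarcEdit implements clustering; the function names and the sample headings are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key: fold accents and case, drop punctuation, sort unique tokens."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, replace punctuation with spaces, then split on whitespace.
    tokens = re.sub(r"[^\w\s]", " ", unaccented.lower()).split()
    return " ".join(sorted(set(tokens)))


def cluster(values):
    """Group values whose fingerprints collide; keep groups with more than one distinct variant."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(set(members)) > 1]


# Hypothetical name headings, as they might appear in a vendor file.
headings = [
    "Brontë, Charlotte",
    "Bronte, Charlotte",
    "charlotte bronte",
    "Austen, Jane",
]
clusters = cluster(headings)
```

Each cluster is a candidate for human review rather than an automatic merge: headings that share a key may still refer to different entities, so a cataloger would confirm the match before records are de-duplicated or merged.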

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
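As an illustration of the kind of simple scripting involved in reconciling equivalents across registries, the sketch below groups records that share at least one identifier, so that a match on one registry links to a match on another. It is a hedged example, not a description of any Focus Group member's workflow: the record layout, the source names, and the identifier values are hypothetical placeholders.

```python
from collections import defaultdict


def reconcile(records):
    """Group records that share at least one (scheme, identifier) pair.

    A small union-find lets links chain: if A shares an ORCID with B and
    B shares an ISNI with C, all three end up in the same group.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[max(root_a, root_b)] = min(root_a, root_b)

    first_seen = {}  # (scheme, identifier) -> index of first record carrying it
    for index, record in enumerate(records):
        for pair in record["ids"].items():
            if pair in first_seen:
                union(index, first_seen[pair])
            else:
                first_seen[pair] = index

    groups = defaultdict(list)
    for index, record in enumerate(records):
        groups[find(index)].append(record["source"])
    return sorted(groups.values())


# Hypothetical records; the identifier values are placeholders, not real IDs.
records = [
    {"source": "local-authority-file", "ids": {"orcid": "0000-0000-0000-0001"}},
    {"source": "repository", "ids": {"orcid": "0000-0000-0000-0001", "isni": "0000000000000002"}},
    {"source": "catalog", "ids": {"isni": "0000000000000002"}},
    {"source": "vendor-file", "ids": {"viaf": "999"}},
]
groups = reconcile(records)
```

Here the first three records fall into one group because the repository record carries both the ORCID and the ISNI, while the vendor record stays on its own. Matching on shared identifiers is deliberately conservative; name-string matching would need separate review.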

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe


Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. “What is Orcid?” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. “What is ISNI?” Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). “About.” https://ror.org/about

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org. 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu

34. The National Institute of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805


44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point?” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite


60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase


73. Grindley, Neil. 2019. “Plan M: Definition, Principles and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945

82. NLA. “Trove.” Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving


84 OCLC Research 2020 “Web Archiving Metadata Working Group” The Problem Addressing the Problem Outputs httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 “Metadata for Audio and Videos” Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress “Standards” Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress “Standards” Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 “Assessing Needs of AV in Special Collections” Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 “Scale & Risk Discussing Challenges to Managing AV Collections in the RLP” Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 “Managing Metadata for Image Collections” Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress “Standards” Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94 Library of Congress “Standards” Metadata Authority Description Schema (MADS) Accessed 21 September 2020 httpwwwlocgovstandardsmads

95 Smith-Yoshimura Karen 2016 “Sharing Digital Collections Workflows” Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 “Europeana Innovation Pilots” Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the World’s Images “Home” Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 “OCLC ResearchWorks IIIF Explorer” httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 “CONTENTdm Linked Data Pilot” Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml


100 Smith-Yoshimura Karen 2015 “Data Management and Curation in 21st Century Archives – Part 1” 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 “Metadata for Research Data Management” Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 “Knowledge Management and Metadata” Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105 Faniel Ixchel M 2019 “Let’s Cook Up Some Metadata Consistency” Next (blog) OCLC 21 November 2019 httpwwwoclcorgblogmainlets-cook-up-some-metadata-consistency

106 NCI (National Computational Infrastructure) Australia “Home” Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) “Home” Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network “Home” Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a “collaboration advocating richer connected reusable open metadata for all research outputs” (httpwwwmetadata2020org) The Metadata 2020 Researcher Communications project is outlined here httpwwwmetadata2020orgprojectsresearcher-communications

109 Digital Curation Centre “Disciplinary Metadata” List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory “Metadata Standards Directory Working Group” GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI “CRediT – Contributor Roles Taxonomy” Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 “Data Services” httpwwwlibumicheduresearch-data-services

113 OCLC Research 2020 “The Realities of Research Data Management” Overview httpswwwoclcorgresearchpublications2017oclcresearch-research-data-managementhtml


114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115 Indiana University 2018 “IU will Lead $2 Million Partnership to Expand Access to Research Data IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries” News at UI (Science and Technology) Indiana University 18 October 2018 httpsnewsiuedustories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft 2020 “Microsoft Academic Graph” Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure” See Wittenberg Jamie and Valentin Pentchev “Works in Progress Webinar Democratizing Access to Large Datasets through Shared Infrastructure” Produced by OCLC Research 8 August 2019 MP4 video presentation 583400 httpswwwoclcorgresearchevents2019080819-democratizing-access-large-datasets-shared-infrastructurehtml

116 NISO’s Reproducibility Badging and Definitions now out for public comment may also help researchers extend the benefit of their research to others See “Taxonomy Definitions and Recognition Badging Scheme Working Group | NISO Website” nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 “Services Built on Usage Metrics” Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 “Stacks Serials Search Engines and Students’ Success First-Year Undergraduate Students’ Library Use Academic Achievement and Retention” Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc “Library Impact Data Project (LIDP)” Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 “Alternatives to Statistics for Measuring Success and Value of Cataloging” Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 “Metadata Librarian Cornell University Library” DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 “Metadata Librarian” Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-librarymetadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 “Locate Items in the Library with StackMap” httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 “Home” httpswwwyewnocom


125 Smith-Yoshimura Karen 2019 “New Ways of Using and Enhancing Cataloging and Authority Records” Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) “Austlang National Codeathon” Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA “Trove” Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company – Hachette UK “River of Authors” Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 “Knowledge Organization Systems” Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 “Knowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Review” International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) “About SNAC” What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 “Knowledge Management and Metadata” Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives & Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAM’s mission is to organize share and elevate knowledge about and use of artificial intelligence by libraries archives and museums It was founded in 2018 inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs

See AI4LAM “About” Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 “Alternatives to Statistics for Measuring Success and Value of Cataloging” Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 “New Skill Sets for Metadata Management” Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation” Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646


139 Reese Terry 2018 “MarcEdit 2017 Usage Information” Terry’s Worklog (blog) 9 September 2020 httpblogreesetnetarchives2572

140 Reese Terry 2020 “Working with Linked Data In MarcEdit” MarcEdit Development (blog) Accessed 21 September 2020 httpsmarceditreesetnetworking-with-linked-data-in-marcedit

141 Reese Terry 2018 “MarcEdit Playlist” 139 YouTube videos Last updated 26 December 2018 httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura Karen 2017 “New Skill Sets for Metadata Management” Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

143 “XML and RDF-Based Systems Archives” nd Library Juice Academy (blog) Accessed 22 September 2020 httpslibraryjuiceacademycomcertificatexml-and-rdf-based-systems

Reese Terry 2013 “Tutorials” YouTube (selected) MarcEdit Development (blog) 14 March 2013 httpmarceditreesetnettutorials

“Lynda Online Courses Classes Training Tutorials” nd Lyndacom - from LinkedIn Learning Accessed 22 September 2020 httpswwwlyndacom

“Learn to Code - for Free” nd Codecademy Accessed 22 September 2020 httpswwwcodecademycom

Software Carpentry “Teaching Basic Lab Skills for Research Computing” Upcoming Workshops Accessed 22 September 2020 httpssoftware-carpentryorg

144 “Data on the Web Best Practices” nd Accessed 22 September 2020 httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist (2008) 2020 httpworkingontologistorg

145 Library Workflow Exchange nd “About” Accessed 21 September 2020 httpwwwlibraryworkflowexchangeorgabout

146 OCLC Developer Network 2020 “DevConnect Webinars” httpswwwoclcorgdevelopereventsdevconnect-workshopsenhtml

147 Smith-Yoshimura Karen 2019 “Stewardship of Professional FTEs In Metadata Work and Turnover” Hanging Together The OCLC Research Blog 18 October 2019 httpshangingtogetherorgp=7580

148 OCLC 2020 “WorldCat® OCLC and Linked Data” Shared Entity Management Infrastructure httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009


  • Executive Summary
  • Introduction
  • The Transition to Linked Data and Identifiers
    • Expanding the use of persistent identifiers
    • Moving from “authority control” to “identity management”
    • Addressing the need for multiple vocabularies and equity, diversity, and inclusion
    • Linked data challenges
  • Describing “Inside-Out” and “Facilitated” Collections
    • Archival collections
    • Archived websites
    • Audio and video collections
    • Image collections
    • Research data
  • Evolution of “Metadata as a Service”
    • Metrics
    • Consultancy
    • New applications
    • Bibliometrics
    • Semantic indexing
  • Preparing for Future Staffing Requirements
    • The culture shift
    • Learning opportunities
    • New tools and skills
    • Self-education
    • Addressing staff turnover
  • Impact
  • Acknowledgments
  • Appendix
  • Notes
  • FIGURE 1 “Changing Resource Description Workflows” by OCLC Research
  • FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters
  • FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers
  • FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages
  • FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership
  • FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections
  • FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections
  • FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon
  • FIGURE 9 Hachette UK’s “River of Authors” generated from the British Library’s catalog metadata


Researcher names are just one example of a need unmet by current identifier systems. Institutions have been minting their own “local identifiers” to meet this need. Use cases for local identifiers include registering all researchers on campus; representing entities that are underrepresented in national authority files, such as authors of electronic dissertations and theses, performers, events, local place names, and campus buildings; identifying entities in digital library projects and institutional repositories; reflecting multilingual needs of the community; and supporting “housekeeping” tasks, such as recording archival collection titles.27

Focus Group members’ consistent need to disambiguate names across disciplines and formats spurred the creation of the OCLC Research working group on Registering Researchers in Authority Files.28 The need to accurately record researchers’ institutional affiliations (to reflect the institution’s scholarly output, to promote cross-institutional collaborations, and to lead to more successful recruitment and funding) led to another working group on Addressing the Challenges with Organizational Identifiers and ISNI,29 which presented new data modeling of organizations that others could adapt for their own uses. Since then, the Research Organization Registry (ROR) was launched to develop an open, sustainable, usable, and unique identifier for every research organization in the world.30

Disambiguating names is the most labor-intensive part of authority work and will still be a prerequisite for assigning unique identifiers. Given the different name identifier systems already in use, libraries need a name reconciliation service. Authority work and algorithms based on text string matching have limits; the results will still need manual expert review. Tapping the expertise in user communities to verify whether two identifiers represent the same person may help.

Disambiguation is particularly difficult for authors or contributors listed in journal articles, where names are often abbreviated and there may be dozens or even hundreds of contributors. For example, an article in Physical Review Letters, “Precision Measurement of the Top Quark Mass in Lepton + Jets Final State,” has approximately 300 abbreviated author names for a five-page article (figure 2).31 This exemplifies the different practices among disciplines. By contrast, other objects with many contributors, such as feature films and orchestral recordings, are usually represented by only a relative handful of the associated names in library legacy metadata.32 Such differences make creating metadata that is uniform, understandable, and widely reusable a challenge.

FIGURE 2 Some 300 abbreviated author names for a five-page article in Physical Review Letters

Abbreviated forms of author names on journal articles make it difficult, and often impossible, to match them to the correct authority form or an identifier, if it exists. Associating ORCIDs with article authors makes it easier to differentiate authors with the same abbreviated forms. Research Information Management (RIM) systems apply identity management for local researchers so that they are correctly associated with the articles they have written. Their articles are displayed as part of their profiles. (See, for example, Experts@Minnesota or University of Illinois at Urbana-Champaign’s Experts research profiles.)33
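The collision problem described above can be illustrated with a minimal Python sketch. It is not drawn from the report: the researcher names, the `abbreviate` helper, and the ORCID iDs are all made-up placeholders, and the abbreviation rule (initials plus family name) is only one assumed convention among many used by journals.

```python
from collections import defaultdict


def abbreviate(full_name: str) -> str:
    """Reduce 'Given Middle Family' to initials plus family name."""
    *given, family = full_name.split()
    initials = " ".join(f"{part[0]}." for part in given)
    return f"{initials} {family}".strip()


# Hypothetical researchers; these ORCID iDs are placeholders, not real records.
researchers = [
    {"name": "Sung Won Lee", "orcid": "0000-0000-0000-0001"},
    {"name": "Sarah Wei Lee", "orcid": "0000-0000-0000-0002"},
    {"name": "Hyun Soo Lee", "orcid": "0000-0000-0000-0003"},
]

# Group the persistent identifiers under each abbreviated byline form.
by_abbreviation = defaultdict(list)
for person in researchers:
    by_abbreviation[abbreviate(person["name"])].append(person["orcid"])

# Any abbreviated form shared by more than one identifier is ambiguous
# as a text string, yet the identifiers still tell the people apart.
ambiguous = {abbr: ids for abbr, ids in by_abbreviation.items() if len(ids) > 1}
print(ambiguous)
```

Two different people collapse to the same abbreviated string, which is why string matching alone cannot attach an article to the right author, while the identifier carried alongside the name can.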

[Figure 2 image: first page of arXiv:1405.1756v2 [hep-ex], 16 Jun 2014 (FERMILAB-PUB-14-123-E), “Precision measurement of the top-quark mass in lepton+jets final states,” showing the abbreviated author list, which begins “V.M. Abazov, B. Abbott, B.S. Acharya, M. Adams, T. Adams, J.P. Agnew, G.D. Alexeev ...”]


For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.34 Research Information Management Systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author’s ORCID. Linked data could access information across many environments, including those in Research Information Systems, but would require accurately linking multiple identifiers for the same person to each other.

Some Focus Group members are performing metadata reconciliation work, such as searching matching terms from linked data sources and adding their URIs in metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.35 Improving the quality of the data improves users’ experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC’s Virtual International Authority File (VIAF); the Library of Congress’s linked data service (id.loc.gov); ISNI; the Getty’s Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC’s Faceted Application of Subject Terminology (FAST); and various national authority files. Selection of the source depends on the trustworthiness of the organization responsible, the subject matter, and the richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed an LCNAF Named Entity Reconciliation program36 using Google’s Open Refine that searches the VIAF file with the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading and a link to the URI of the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.

A long list of nonlibrary sources that could enhance current authority data or could be valuable to link to in certain contexts has been identified. Wikidata and Wikipedia led the list. Other sources include AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, GeoNames, GoodReads, IMDb (Internet Movie Database), Internet Archive, LibraryThing, LinkedIn, MusicBrainz, ONIX (ONline Information eXchange), Open Library, ORCID, and Scopus ID. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources,37 a document from the PCC’s Task Group on URIs in MARC, provides valuable guidance for collecting data from these other sources. Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than with current library systems.38

Identifiers for “works” represent a particular challenge, as there is no consensus on what represents a “distinctive work.”39 Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a “work” is could hamper the ability to reuse data created elsewhere, and they look to a central trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.
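The three reconciliation steps described for the University of Michigan program (search for a match, look for a Library of Congress source record in the matching cluster, extract the authorized heading) can be sketched as follows. This is an illustration under stated assumptions, not the Michigan code: it does not call the VIAF API or Open Refine, the `Cluster` type and `FAKE_VIAF` data are hypothetical stand-ins for search results, and the headings, identifiers, and `example.org` URIs are invented.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cluster:
    """Hypothetical stand-in for one VIAF cluster."""
    viaf_id: str
    headings: dict  # source code -> (authorized heading, source record URI)


# Invented sample data; no real VIAF or LCNAF records are represented here.
FAKE_VIAF = [
    Cluster("viaf-0001", {
        "LC": ("Example, Author A., 1900-1980",
               "http://id.example.org/authorities/names/n00000001"),
        "DNB": ("Example, Author A.", "http://d-nb.example.org/gnd/0000001"),
    }),
    Cluster("viaf-0002", {
        "BNF": ("Sample, Writer B.", "http://data.example.org/bnf/0000002"),
    }),
]


def normalize(text: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace (plain string matching)."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def reconcile(heading: str, clusters: list, source: str = "LC") -> Optional[dict]:
    """Pair a local heading with the authorized heading from `source`, if any."""
    wanted = normalize(heading)
    for cluster in clusters:
        variants = {normalize(h) for h, _uri in cluster.headings.values()}
        if wanted in variants and source in cluster.headings:
            authorized, uri = cluster.headings[source]
            return {"original": heading, "authorized": authorized,
                    "uri": uri, "viaf_id": cluster.viaf_id}
    return None  # no match, or the matching cluster lacks that source


print(reconcile("example, author a", FAKE_VIAF))
print(reconcile("Sample, Writer B.", FAKE_VIAF))
```

Because the match is made on normalized strings, the sketch shares the limitation the text notes: it can give fair results, but a near-miss spelling or a shared name will still need expert review. Returning `viaf_id` alongside the heading reflects the remark that the service could bring in the VIAF identifier instead.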


FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers40

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets such as digital resources and collections in institutional repositories include system-generated IDs, locally minted identifiers, PURL, handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.41 Ideally, libraries would contribute to a hub for the metadata describing their researchers’ data sets, regardless of where the data sets are stored.
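One way to see why an identifier should be independent of storage location is to strip the resolver from a citation and compare what remains. The Python sketch below is an assumption-laden illustration rather than a validator: the regular expressions are simplified approximations of DOI and ARK syntax, the `normalize_identifier` helper is hypothetical, and the sample DOI and ARK values are placeholders rather than identifiers of real objects.

```python
import re

# Simplified patterns (assumptions): a DOI starts with "10.", a numeric
# registrant prefix, a slash, and a suffix; an ARK starts with "ark:",
# a numeric name-assigning authority number, a slash, and a name.
ARK_RE = re.compile(r"(ark:/?\d{5,}/\S+)$", re.IGNORECASE)
DOI_RE = re.compile(r"(10\.\d{4,9}/\S+)$")


def normalize_identifier(value: str) -> tuple:
    """Return (scheme, identifier) with any resolver prefix removed."""
    value = value.strip()
    ark = ARK_RE.search(value)
    if ark:
        # ARKs may be case-sensitive, so the matched text is kept as written.
        return ("ark", ark.group(1))
    doi = DOI_RE.search(value)
    if doi:
        # DOI names are case-insensitive, so lowercase them for comparison.
        return ("doi", doi.group(1).lower())
    return ("unknown", value)


# Placeholder citations: the first two are the same DOI written two ways.
citations = [
    "https://doi.org/10.5555/Example-1",
    "doi:10.5555/example-1",
    "https://n2t.net/ark:/99999/fk4example",
]

distinct = {normalize_identifier(c) for c in citations}
print(sorted(distinct))
```

The two DOI citations collapse to one identifier once the resolver and letter case are set aside. The harder problem raised in the text, two different DOIs minted for the same object, cannot be detected this way and needs metadata about the object itself.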



MOVING FROM “AUTHORITY CONTROL” TO “IDENTITY MANAGEMENT”

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.42 The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from “managing text strings” as the basis of controlling headings in bibliographic records.43 But identity management poses a change in focus, from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from “authority control” and “authorized access points” in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.44 Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
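Separating an identifier from its labels can be pictured as a small data structure in which the identifier is the stable key and the displayed label is chosen per community. The sketch below is hypothetical: the `Entity` class, its field names, and the identifier values are invented for illustration and do not reproduce the Wikidata data model or any real record.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An entity keyed by identifier, with labels as replaceable attributes."""
    identifier: str                                # stable match point
    labels: dict = field(default_factory=dict)     # language tag -> label
    same_as: dict = field(default_factory=dict)    # scheme -> other identifier

    def label_for(self, preferred_languages: list) -> str:
        """Return the first available label in the community's order of preference."""
        for language in preferred_languages:
            if language in self.labels:
                return self.labels[language]
        return self.identifier  # no suitable label: fall back to the identifier


# Placeholder identifiers; only the labels are real-world name forms.
author = Entity(
    identifier="Q-EXAMPLE-1",
    labels={"en": "Leo Tolstoy", "ru": "Лев Толстой", "ja": "レフ・トルストイ"},
    same_as={"viaf": "viaf-0001", "isni": "isni-0001"},
)

print(author.label_for(["ja", "en"]))  # a Japanese-language catalog
print(author.label_for(["ru"]))        # a Russian-language catalog
print(author.label_for(["fr"]))        # no French label recorded
```

No label is marked as the preferred form for everyone. Each display context supplies its own language order, which is one reading of label preference becoming localized rather than homogenized.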


Wikidata Identifier Q19526


A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?45
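One way to picture the reconciliation of multiple identifiers that point to the same entity is to treat each "same as" assertion as a link and group identifiers into clusters. The sketch below uses a union-find structure for this; the function name `cluster_identifiers` and all identifier strings are hypothetical, and it assumes the "same as" assertions themselves are already trusted.

```python
def cluster_identifiers(same_as_pairs):
    """Group identifiers into clusters, one cluster per real-world entity.

    same_as_pairs: iterable of (identifier_a, identifier_b) assertions that
    both identifiers denote the same entity.
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own cluster root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = {}
    for ident in list(parent):
        clusters.setdefault(find(ident), set()).add(ident)
    return list(clusters.values())


# Hypothetical assertions gathered from different sources.
pairs = [
    ("authfile:n001", "aggregator:123"),
    ("aggregator:123", "wiki:Q-example"),
    ("authfile:n002", "aggregator:456"),
]
clusters = cluster_identifiers(pairs)
print(sorted(sorted(c) for c in clusters))
```

Each resulting cluster could then serve as the match point and collocation point, whichever identifier a given record happens to carry.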

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.46 Many libraries are looking toward Wikidata and Wikibase (the software platform underlying Wikidata) to solve some of the long-standing issues faced by technical services departments, archival units, and others.47 Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution’s resources. Focus Group members’ experimentations with Wikidata and OCLC projects using the Wikibase platform indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationship with each other, and how best they can increase discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on “works” and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are “notable” are more likely to be represented in Wikidata. Recently, an effort to support citations in Wikipedia articles, WikiCite,48 demonstrates a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical (and powerful) aspects of identity management is to reduce the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.49 Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts or subject headings are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting “alternate” subject headings came to the fore when the Library of Congress’ initial solution to change the LC subject heading for “Illegal aliens” to “Undocumented immigrants” failed to be implemented. This prompted one Focus Group member to comment, “Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive.” End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.50


Some see Faceted Application of Subject Terminology (FAST)51 as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally-uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.52 FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee53 represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH, with tools and services that serve the needs of diverse communities and contexts.54

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.55 For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.56 There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)57 built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.58


A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in Institutional Repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool59 to create and manage various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.60

Linked data could provide the means for local communities to prefer a different label for an established vocabulary’s preferred term for a concept or entity. One might reference a local description of a concept or entity not represented (or not represented satisfactorily) in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, “Nothing is sadder than a vocabulary that someone invented that was left to go stale.” Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution’s EDI goals and principles.


FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, “Dissident art” rather than “Non-conformist art.”)

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading “Mentally handicapped” to “People with mental disabilities.” Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a “cultural sensitivity” message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator’s attitude or the period in which the item was created and may be considered inappropriate today in some contexts.

Chart for figure 5: “What Areas Have You Changed or Plan to Change Due to Your Institution’s EDI Goals and Principles?” Scale: 0 to 100. Categories: collection building; select materials for digitization; metadata in library catalogs; metadata in archival collections; metadata in digital collections; terminologies or vocabularies. Series: Changed; Plan to change.


• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may be exclusive to audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than include them as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others’ terminologies or mean different things. The PCC Linked Data Advisory Committee’s Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences and facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on “Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives”66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of “community contribution” for new terms and community voting could be more inclusive. Libraries’ current “consensus environment” excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made. How will libraries resolve or reconcile conflicts between statements?67 Different types of inconsistencies may appear than do now, with, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources.”68 OCLC Research explored how libraries might apply Google’s “Knowledge Vault” to identify statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected by using inadequate vendor records, by creating minimal or less-than-full level descriptions for certain types of resources, and by limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance” or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.
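To make the idea of sharing statements with provenance more concrete, the following sketch records each statement together with its source and reports the claims on which sources disagree, without choosing a winner. It is an assumption-laden illustration: the `Statement` type, the `find_conflicts` function, and every subject, property, value, and source name are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str   # identifier of the entity described
    prop: str      # property being asserted, e.g. a birth date
    value: str     # asserted value
    source: str    # provenance: who made the statement


def find_conflicts(statements):
    """Return claims where sources disagree, keeping every value and its sources.

    Nothing is discarded: differing values may each be correct in context,
    so the result only surfaces the disagreement for human review.
    """
    by_claim = defaultdict(lambda: defaultdict(set))
    for s in statements:
        by_claim[(s.subject, s.prop)][s.value].add(s.source)
    return {
        claim: {value: sorted(sources) for value, sources in values.items()}
        for claim, values in by_claim.items()
        if len(values) > 1
    }


# Hypothetical statements from three sources about one person.
statements = [
    Statement("ex:person/42", "birthDate", "1901", "source-a"),
    Statement("ex:person/42", "birthDate", "1902", "source-b"),
    Statement("ex:person/42", "birthDate", "1901", "source-c"),
    Statement("ex:person/42", "birthPlace", "ex:place/7", "source-a"),
]
conflicts = find_conflicts(statements)
print(conflicts)
```

Whether such statements live in a centralized store or are exchanged peer to peer, keeping the source on every statement is what allows the disagreement to be shown rather than silently resolved.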

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing “Inside-Out” and “Facilitated” Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the “inside-out collection” (in contrast to the “outside-in collection,” in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the “facilitated collection.”74 Focus Group members’ activities have increasingly focused on metadata that will provide access to the resources unique to their institutions, as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places, providing the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for both scholars and libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions digitizing their archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights, Historic Preservation and Urban Planning, and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, often in deteriorating formats, in large backlogs, and sometimes requiring rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in OCLC Research Hanging Together Blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections

[Bar chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137); horizontal axis from 0 to 140. Categories: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; selection, appraisal, and collection development. Response scale: very interested, interested, somewhat interested, not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.
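One way to make such updates traceable is to compare the source record with the aggregator's copy field by field. The following is a minimal sketch, assuming records can be represented as flat dictionaries; the function name `diff_records` and the sample field names and values are hypothetical and are not drawn from any aggregator's actual interface.

```python
def diff_records(source, aggregated):
    """Return {field: (source_value, aggregated_value)} for every field that differs."""
    changes = {}
    for field in sorted(set(source) | set(aggregated)):
        before = source.get(field)
        after = aggregated.get(field)
        if before != after:
            changes[field] = (before, after)
    return changes


# Hypothetical records: the aggregated copy has a different date and an added rights statement.
source = {"title": "Street scene", "date": "1902", "creator": "Unknown"}
aggregated = {"title": "Street scene", "date": "1920", "creator": "Unknown",
              "rights": "No known copyright"}

changes = diff_records(source, aggregated)
for field, (before, after) in changes.items():
    print(f"{field}: {before!r} -> {after!r}")
```

A report like this answers only "what changed"; recording when and by whom would require the systems involved to keep their own change history.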

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters,” those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.
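To illustrate why a shared manifest format helps aggregation, the following minimal sketch reads a small manifest and lists the images it points to. It assumes version 2 of the IIIF Presentation API, in which a manifest holds sequences, canvases, and image annotations; the manifest content and the example.org URLs are hypothetical placeholders, and actual CONTENTdm manifests may be structured differently.

```python
import json

# A tiny, hypothetical IIIF Presentation API 2.x manifest (example.org URLs are placeholders).
MANIFEST_JSON = """
{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "https://example.org/iiif/map1/manifest.json",
  "@type": "sc:Manifest",
  "label": "Plan de Paris",
  "sequences": [
    {
      "@type": "sc:Sequence",
      "canvases": [
        {
          "@id": "https://example.org/iiif/map1/canvas/1",
          "@type": "sc:Canvas",
          "label": "Sheet 1",
          "images": [
            {
              "@type": "oa:Annotation",
              "resource": {"@id": "https://example.org/iiif/map1/full/full/0/default.jpg"}
            }
          ]
        }
      ]
    }
  ]
}
"""


def list_images(manifest):
    """Yield (canvas label, image URL) pairs from a Presentation 2.x manifest."""
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                yield canvas.get("label"), image["resource"]["@id"]


manifest = json.loads(MANIFEST_JSON)
images = list(list_images(manifest))
print(manifest["label"], images)
```

Because every compliant system exposes the same structure, the same few lines could in principle walk manifests from many repositories.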

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
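The string-to-identifier replacement described above can be sketched in a few lines. This is an illustration only, not the pilot's method: the lookup table, the placeholder URIs, and the function names are all hypothetical, and a real workflow would query authority files rather than a hard-coded dictionary.

```python
# Hypothetical lookup table mapping normalized strings to identifiers.
# The URIs below are placeholders, not real authority-file identifiers.
AUTHORITY = {
    "paris (france)": "https://example.org/place/0001",
    "maps": "https://example.org/concept/0002",
}


def normalize(value):
    """Lowercase and collapse whitespace so trivially different strings match."""
    return " ".join(value.lower().split())


def reconcile(record):
    """Replace each string value with an identifier when the lookup table has one."""
    linked, unmatched = {}, []
    for field, value in record.items():
        uri = AUTHORITY.get(normalize(value))
        if uri is None:
            linked[field] = value      # keep the original string
            unmatched.append(field)    # flag the field for manual review
        else:
            linked[field] = uri
    return linked, unmatched


record = {"coverage": "Paris  (France)", "subject": "Maps", "creator": "Unknown cartographer"}
linked, unmatched = reconcile(record)
print(linked, unmatched)
```

Keeping the unmatched strings, rather than discarding them, leaves a work list for staff to resolve by hand.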

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections

RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry, “Data Management and Curation in 21st Century Archives” (Sept. 2015),100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
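A readme.txt of the kind described above could be generated from structured input rather than typed freehand. The following is a minimal sketch under that assumption; the layout, the function name `build_readme`, and every sample value are hypothetical, and the report does not prescribe a particular format.

```python
def build_readme(project, files):
    """Render project-level and file-level metadata as plain text."""
    lines = ["PROJECT INFORMATION"]
    for key, value in project.items():
        lines.append(f"{key}: {value}")
    lines.append("")
    lines.append("FILES")
    for entry in files:
        lines.append(f"{entry['name']} ({entry['format']}; software: {entry['software']})")
    return "\n".join(lines) + "\n"


# Hypothetical project; every value here is a placeholder.
project = {
    "Title": "Example survey of reading habits",
    "Author": "A. Researcher",
    "Date": "2020-01-15",
    "Funder": "Example Foundation",
    "Language": "English",
}
files = [{"name": "responses.csv", "format": "CSV", "software": "any spreadsheet tool"}]

readme_text = build_readme(project, files)
print(readme_text)
```

Generating the file from a short form keeps the field names consistent from one dataset to the next, which is what makes the descriptions comparable.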


All four of the 2017-2018 The Realities of Research Data Management series webinars103 led by OCLC Senior Program Officer Rebecca Bryant mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105


National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.
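A simple completeness check shows how an agreed list of elements could be applied to a deposit form. This is a minimal sketch: the element list follows the paragraph above (leaving out the geospatial and temporal data, which apply only where relevant), but the key spellings, the function name, and the sample record are hypothetical.

```python
# Elements the Focus Group named as needed for reuse; the key spellings are hypothetical.
REQUIRED = [
    "license", "processing_steps", "tools", "data_documentation",
    "data_definitions", "data_steward", "grant_numbers",
]


def missing_elements(record):
    """Return the required elements that are absent or empty in a record."""
    return [name for name in REQUIRED if not record.get(name)]


# Hypothetical dataset description with two gaps.
record = {
    "license": "CC BY 4.0",
    "processing_steps": "Removed incomplete responses",
    "tools": "Python",
    "data_documentation": "codebook.pdf",
    "data_definitions": "",
    "data_steward": "University Library",
}
missing = missing_elements(record)
print(missing)
```

A check like this could run when a researcher submits a template, so that gaps are raised while the data creator is still available to fill them.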

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
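Persistent identifiers are easier to trust when they can be checked mechanically. ORCID iDs, for instance, end in a check character computed with the ISO 7064 MOD 11-2 algorithm. The following minimal sketch validates the length and check character of an iD; the function names are hypothetical, and the first iD tested is the author's, as printed on this report's title page.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for char in base_digits:
        total = (total + int(char)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the length and the final check character of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


print(is_valid_orcid("0000-0002-8757-2962"))  # iD from the title page
print(is_valid_orcid("0000-0002-8757-2963"))  # last character altered
```

A passing check shows only that the iD is well formed, not that it belongs to the person named in the record.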


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose work may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service,” that is, consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis of collections and a view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125


Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics, statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story. Hachette asked the British Library for every author and book title published by its nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127


FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s OCLC Research 2019 position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
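Even without machine learning, simple string techniques can propose candidate name matches for a person to confirm. The following minimal sketch uses Python's standard `difflib` module; the normalization rule, the 0.85 threshold, and the function names are hypothetical choices for illustration, not a method the Focus Group described.

```python
from difflib import SequenceMatcher


def name_key(name):
    """Normalize 'Surname, Forename' and 'Forename Surname' to one sorted key."""
    cleaned = name.replace(",", " ").replace(".", " ").lower()
    return " ".join(sorted(cleaned.split()))


def similarity(a, b):
    """Return a ratio between 0 and 1 for two normalized names."""
    return SequenceMatcher(None, name_key(a), name_key(b)).ratio()


def candidate_matches(name, candidates, threshold=0.85):
    """List candidates whose similarity to `name` meets the threshold."""
    return [c for c in candidates if similarity(name, c) >= threshold]


matches = candidate_matches(
    "Smith-Yoshimura, Karen",
    ["Karen Smith-Yoshimura", "Smith, Karen", "Yoshimura, K."],
)
print(matches)
```

String similarity alone cannot tell two people with the same name apart, which is why the paragraph above points to related metadata (affiliations, dates, co-authors) as the basis for real disambiguation.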

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to “traditional cataloging activities” (such as original and copy cataloging, authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D, or “play time,” to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills, or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background who is interested in metadata services. Retaining staff with IT skills is difficult because they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or who at least know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.¹³⁸ MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.¹³⁹ Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.¹⁴⁰ Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.¹⁴¹
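To make the idea of clustering for de-duplication more concrete, the following is a minimal Python sketch of the "key collision" approach that tools such as OpenRefine have popularized. It is not MarcEdit's implementation; the function names `fingerprint` and `cluster` and the sample headings are illustrative assumptions only.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a heading to a normalized key so near-identical forms collide."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, turn punctuation into spaces, and split into tokens.
    tokens = re.sub(r"[^\w\s]", " ", unaccented.lower()).split()
    # Sort the unique tokens so that word order does not matter.
    return " ".join(sorted(set(tokens)))


def cluster(values):
    """Group values that share a fingerprint; keep only groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Hypothetical headings as they might appear in a batch of vendor records.
headings = [
    "Brontë, Charlotte",
    "Bronte, Charlotte",
    "Charlotte Brontë",
    "Austen, Jane",
]
clusters = cluster(headings)
for group in clusters:
    print(group)
```

A sketch like this only proposes candidates. A metadata specialist would still review each cluster before merging, because different people can share the same normalized name.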

Managers want to focus less on specific schemas and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”¹⁴²

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.¹⁴³ W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.¹⁴⁴ Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
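As an illustration of the kind of simple scripting involved in reconciling equivalents across multiple registries, the sketch below groups identifiers that have been asserted to refer to the same entity. It assumes the equivalence pairs have already been gathered from somewhere; the function name `merge_equivalents` and the identifier strings are hypothetical placeholders, not real registry values.

```python
def merge_equivalents(pairs):
    """Group identifiers linked by 'same entity' assertions (union-find)."""
    parent = {}

    def find(ident):
        # Register unseen identifiers as their own group representative.
        parent.setdefault(ident, ident)
        while parent[ident] != ident:
            # Path halving keeps later lookups short.
            parent[ident] = parent[parent[ident]]
            ident = parent[ident]
        return ident

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups = {}
    for ident in parent:
        groups.setdefault(find(ident), set()).add(ident)
    # Return sorted lists so the output is stable from run to run.
    return sorted(sorted(group) for group in groups.values())


# Placeholder identifiers; real values would come from each registry.
assertions = [
    ("orcid:PERSON-A", "isni:PERSON-A"),
    ("isni:PERSON-A", "viaf:PERSON-A"),
    ("viaf:PERSON-B", "lcnaf:PERSON-B"),
]
merged = merge_equivalents(assertions)
for group in merged:
    print(group)
```

Because equivalence is treated as transitive here, one wrong assertion can merge two distinct people, so the resulting groups are best treated as candidates for review.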

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,¹⁴⁵ a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars¹⁴⁶ to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.
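One of the approaches above mentions using Dublin Core for digital collections, and another mentions short scripts. As a hedged illustration of how the two can meet, the following Python sketch turns one spreadsheet row into a simple Dublin Core record. It assumes that the column headers already match Dublin Core element names and that repeated values are separated by a vertical bar; the `record` wrapper element, the function name `row_to_dc`, and the sample data are all hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def row_to_dc(row):
    """Build one XML record from a spreadsheet row keyed by DC element name."""
    record = ET.Element("record")  # hypothetical wrapper element
    for column, value in row.items():
        # Assumed local convention: "|" separates repeated values in a cell.
        for part in value.split("|"):
            part = part.strip()
            if part:
                ET.SubElement(record, f"{{{DC_NS}}}{column}").text = part
    return record


# Hypothetical spreadsheet content, inlined so the sketch is self-contained.
SAMPLE = (
    "title,creator,subject\n"
    "Campus map,Example University,Maps|Campus planning\n"
)
rows = list(csv.DictReader(io.StringIO(SAMPLE)))
record = row_to_dc(rows[0])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A production workflow would also validate the column names against the element set and handle encoding and date formats, which this sketch leaves out.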

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.¹⁴⁷

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities, rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,¹⁴⁸ launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities, as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.
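The relationship between a language-neutral identifier, its labels, and the “statements” attached to it can be pictured with a small data structure. The Python sketch below is purely illustrative and does not describe the Shared Entity Management Infrastructure’s actual data model or API; the class names, the property name, the contributing library, and the example.org URI are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One assertion about an entity, with the institution that contributed it."""
    prop: str
    value: str
    source: str


@dataclass
class Entity:
    """An entity addressed by a persistent, language-neutral identifier."""
    uri: str
    labels: dict = field(default_factory=dict)      # language code -> label
    statements: list = field(default_factory=list)  # contributed context

    def label(self, lang, fallback="en"):
        # Display in the user's language when possible; the URI never changes.
        return self.labels.get(lang) or self.labels.get(fallback) or self.uri

    def add_statement(self, prop, value, source):
        self.statements.append(Statement(prop, value, source))


# Hypothetical identifier on a reserved example domain.
work = Entity(
    uri="https://example.org/entity/W0001",
    labels={"en": "The Tale of Genji", "ja": "源氏物語"},
)
work.add_statement("author", "Murasaki Shikibu", source="Example Library")

print(work.label("ja"))
print(work.label("fr"))  # no French label recorded, so the English one is used
```

The point of the sketch is that the link stays the same regardless of display language, while each institution's metadata accumulates as separately attributed statements.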


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc/.

8. Except for June 2020, when all discussions were held virtually only, because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe/.


Futornick, Michelle. 2019. “LD4P2 Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0.]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is ORCID.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni/.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org/.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org/.

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about/.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons/; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu/.

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications/.

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710.

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710.

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445.

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas. The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793/.

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois/.

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805.


44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku / Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko/.

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au/.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite/.


60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Washburn, Bruce, and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m/.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase/.


73. Grindley, Neil. 2019. “Plan M: Definition, Principles and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.


84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. httpsdoiorg1025333C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog, 29 October 2018. httpshangingtogetherorgp=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. httpsdoiorg1025333C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. httpswwwlocgovead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. httpswwwlocgovstandardspremis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog, 23 July 2019. httpshangingtogetherorgp=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog, 1 October 2019. httpshangingtogetherorgp=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog, 9 April 2015. httpshangingtogetherorgp=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. httpwwwlocgovstandardsmods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. httpwwwlocgovstandardsmads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog, 2 November 2016. httpshangingtogetherorgp=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. httpsiiifio

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml


100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. httphangingtogetherorgp=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog, 18 April 2016. httpshangingtogetherorgp=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. httpsdoiorg1025333C39P86

103. See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog, 9 April 2020. httpshangingtogetherorgp=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. httpwwwoclcorgblogmainlets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. httpnciorgau

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. httpswwwadaeduau

107. Portage Network. “Home.” Accessed 21 September 2020. httpsportagenetworkca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (httpwwwmetadata2020org). The Metadata 2020 Researcher Communications project is outlined here: httpwwwmetadata2020orgprojectsresearcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. httpwwwdccacukresourcesmetadata-standardslist

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. httprd-alliancegithubiometadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. httpscasraiorgcredit

112. University of Michigan Library. 2020. “Data Services.” httpwwwlibumicheduresearch-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. httpswwwoclcorgresearchpublications2017oclcresearch-research-data-managementhtml


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. httpsdoiorg1025333C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at UI (Science and Technology). Indiana University. 18 October 2018. httpsnewsiuedustories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 583400. httpswwwoclcorgresearchevents2019080819-democratizing-access-large-datasets-shared-infrastructurehtml

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. httpswwwnisoorgstandards-committeesreproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog, 30 September 2015. httpshangingtogetherorgp=5430

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. httpsdoiorg101016jacalib201312002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. httpwwwactivitydataorgLIDPhtml

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog, 15 April 2019. httpshangingtogetherorgp=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. httpswwwdigliborgmetadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-librarymetadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” httpswwwyewnocom


125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog, 2 April 2019. httpshangingtogetherorgp=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA. “Trove.” Search Uncover Australia. Accessed 21 September 2020. httpstrovenlagovau

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog, 17 April 2019. httpshangingtogetherorgp=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. httpsdoiorg101007s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? httpsportalsnaccooperativeorgabout

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog, 9 April 2020. httpshangingtogetherorgp=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. httpssitesgooglecomviewai4lamhome

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. httpssitesgooglecomviewai4lamabout

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. httpsdoiorg1025333xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog, 15 April 2019. httpshangingtogetherorgp=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog, 17 April 2017. httpshangingtogetherorgp=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog, 26 March 2018. httpshangingtogetherorgp=6646


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. httpblogreesetnetarchives2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. httpsmarceditreesetnetworking-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog, 17 April 2017. httpshangingtogetherorgp=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. httpslibraryjuiceacademycomcertificatexml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. httpmarceditreesetnettutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. httpswwwlyndacom

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. httpswwwcodecademycom

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. httpssoftware-carpentryorg

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist. (2008) 2020. httpworkingontologistorg

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. httpwwwlibraryworkflowexchangeorgabout

146. OCLC Developer Network. 2020. “DevConnect Webinars.” httpswwwoclcorgdevelopereventsdevconnect-workshopsenhtml

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog, 18 October 2019. httpshangingtogetherorgp=7580

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml


ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009


For researcher identity management to work, individuals must create and maintain their own ORCIDs. Institutions have been encouraging their researchers to include an ORCID in their profiles. Researchers have greater incentives to adopt ORCID to meet national and funder requirements, such as those of the National Science Foundation and the National Institutes of Health in the United States.³⁴ Research Information Management Systems harvest metadata from abstract and indexing databases such as Scopus, Web of Science, and PubMed, each of which has its own person identifiers that help with disambiguation; they may also be linked to an author’s ORCID. Linked data could access information across many environments, including those in Research Information Systems, but would require accurately linking multiple identifiers for the same person to each other.
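As an illustration of what linking multiple identifiers for the same person could involve, the following is a minimal sketch that groups identifiers into clusters from pairwise "same person" assertions, using a union-find structure. The function name, the prefix scheme (`orcid:`, `scopus:`, and so on), and all identifier values are hypothetical and chosen only for this example; the report does not describe any particular implementation.

```python
from collections import defaultdict


def cluster_identifiers(links):
    """Group identifiers asserted to denote the same person.

    `links` is an iterable of (identifier_a, identifier_b) pairs.
    Returns a list of sets, one set of identifiers per person.
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root, then walk to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two clusters

    clusters = defaultdict(set)
    for ident in list(parent):
        clusters[find(ident)].add(ident)
    return list(clusters.values())


# Hypothetical identifier values, for illustration only.
links = [
    ("orcid:0000-0000-0000-0001", "scopus:1111"),
    ("scopus:1111", "wos:A-1000-2020"),
    ("orcid:0000-0000-0000-0002", "pubmed:2222"),
]
people = cluster_identifiers(links)
```

In this sketch the first three identifiers end up in one cluster because they are connected through the shared Scopus identifier. The accuracy concern raised above still applies: a single wrong pair would merge two people.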

Some Focus Group members are performing metadata reconciliation work, such as searching for matching terms in linked data sources and adding their URIs to metadata records, as a necessary first step toward a linked data environment or as part of metadata enhancement work.³⁵ Improving the quality of the data improves users’ experiences in the short term and will help with the transition to linked data later. Most metadata reconciliation is done on personal names, subjects, and geographic names. Sources used for such reconciliation include OCLC’s Virtual International Authority File (VIAF); the Library of Congress’s linked data service (id.loc.gov); ISNI; the Getty’s Union List of Artist Names (ULAN), Art and Architecture Thesaurus (AAT), and Thesaurus of Geographic Names (TGN); OCLC’s Faceted Application of Subject Terminology (FAST); and various national authority files. Selection of the source depends on the trustworthiness of the organization responsible, the subject matter, and the richness of the information. Such metadata reconciliation work is labor intensive and does not scale well.

Some members of the Focus Group have experimented with obtaining identifiers (persistent URIs from linked data sources) to eventually replace their current reliance on text strings. Institutions concluded that it is more efficient to create URIs in authority records at the outset rather than reconcile them later. The University of Michigan has developed an LCNAF Named Entity Reconciliation program³⁶ using Google’s Open Refine that searches the VIAF file with the VIAF API for matches, looks for Library of Congress source records within a VIAF cluster, and extracts the authorized heading. This results in a dataset pairing the authorized LC Name Authority File heading with the original heading and a link to the URI of the LCNAF linked data service. This service could be modified to bring in the VIAF identifier instead; it gets fair results even though it uses string matching.
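The general shape of such a string-matching reconciliation can be sketched as follows. This is not the University of Michigan program and does not call the VIAF API. It works on a small in-memory stand-in for VIAF clusters whose structure (`sources`, `scheme`, `heading`, `id`) is invented for this example, and the record numbers are placeholders. Only the `http://id.loc.gov/authorities/names/` URI prefix reflects the real LC linked data service.

```python
import unicodedata


def normalize(heading):
    """Fold case and strip diacritics and punctuation for string matching."""
    decomposed = unicodedata.normalize("NFKD", heading)
    kept = [c for c in decomposed if not unicodedata.combining(c)]
    cleaned = "".join(c if c.isalnum() else " " for c in kept)
    return " ".join(cleaned.lower().split())


def reconcile(heading, clusters):
    """Return the LC authorized heading and URI for a local heading, or None."""
    key = normalize(heading)
    for cluster in clusters:
        forms = {normalize(s["heading"]) for s in cluster["sources"]}
        if key not in forms:
            continue
        # Within the matching cluster, look for the Library of Congress record.
        for source in cluster["sources"]:
            if source["scheme"] == "LC":
                return {
                    "original": heading,
                    "authorized": source["heading"],
                    "uri": "http://id.loc.gov/authorities/names/" + source["id"],
                }
    return None


# Hypothetical cluster data; identifiers are placeholders, not real records.
MOCK_CLUSTERS = [
    {
        "viaf_id": "0000000",
        "sources": [
            {"scheme": "LC", "id": "n00000000",
             "heading": "Example, Émile, 1900-1980"},
            {"scheme": "DNB", "id": "000000000", "heading": "Example, Emile"},
        ],
    }
]

match = reconcile("example, emile", MOCK_CLUSTERS)
miss = reconcile("Nobody, Nemo", MOCK_CLUSTERS)
```

Here the local heading matches the cluster through a variant form from another source, and the output pairs it with the LC form and its URI. Because the match is purely on normalized strings, two different people with the same name form would be conflated, which is one reason such results are only "fair."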

A long list of nonlibrary sources that could enhance current authority data, or could be valuable to link to in certain contexts, has been identified. Wikidata and Wikipedia led the list. Other sources include AllMusic, author and fan sites, Discogs, EAC-CPF (Encoded Archival Context for Corporate Bodies, Persons, and Families), EAD (Encoded Archival Description), family trees, GeoNames, GoodReads, IMDb (Internet Movie Database), Internet Archive, Library Thing, LinkedIn, MusicBrainz, ONIX (ONline Information eXchange), Open Library, ORCID, and Scopus ID. The PCC’s Task Group on URIs in MARC’s document, Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources,³⁷ provides valuable guidance for collecting data from these other sources. Wikidata is viewed as an important source for expanding the language range and providing multilingual metadata more easily than with current library systems.³⁸

Identifiers for “works” represent a particular challenge, as there is no consensus on what represents a “distinctive work.”³⁹ Local work identifiers cannot be shared or reused. Focus Group members voiced concern that differing interpretations of what a “work” is could hamper the ability to reuse data created elsewhere, and look to a central trusted repository like OCLC to publish persistent Work Identifiers that could be used throughout the community.


FIGURE 3 Examples of some DOI (left) and ARK (right) identifiers⁴⁰

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets such as digital resources and collections in institutional repositories include system-generated IDs, locally minted identifiers, PURL, handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.⁴¹ Ideally, libraries would contribute to a hub for the metadata describing their researchers’ data sets, regardless of where the data sets are stored.
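To make the independence of an identifier from its storage location concrete, here is a small sketch that separates a DOI or ARK from the resolver or host it happens to be wrapped in. The regular expressions are simplifying assumptions (a DOI prefix of `10.` followed by four to nine digits; an ARK with a numeric name assigning authority number of at least five digits) and not a full implementation of either specification. The `example.org` host and the ARK value are made up. The DOI used is this report's own.

```python
import re

# Simplified patterns, assumed for illustration; real-world syntax is broader.
DOI_RE = re.compile(r"^10\.\d{4,9}/\S+$")
ARK_RE = re.compile(r"^ark:/?\d{5,}/\S+$")


def classify(identifier):
    """Return (scheme, bare_identifier), dropping any resolver or host prefix."""
    value = identifier.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if value.lower().startswith(prefix):
            value = value[len(prefix):]
            break
    if DOI_RE.match(value):
        return ("doi", value)
    # An ARK stays the same whichever host serves it, so keep only the ark: part.
    ark_pos = value.find("ark:")
    if ark_pos != -1 and ARK_RE.match(value[ark_pos:]):
        return ("ark", value[ark_pos:])
    return ("unknown", identifier)


doi_result = classify("https://doi.org/10.25333/rqgd-b343")
ark_result = classify("https://example.org/ark:/12345/abc123")
other_result = classify("local-id-0042")
```

Comparing the bare identifiers rather than full URLs is one way a repository could notice that two records point at the same object even when they were deposited through different hosts.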


MOVING FROM “AUTHORITY CONTROL” TO “IDENTITY MANAGEMENT”

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.⁴² The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from “managing text strings” as the basis of controlling headings in bibliographic records.⁴³ But identity management poses a change in focus from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from “authority control” and “authorized access points” in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.⁴⁴ Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4 One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
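A minimal sketch of this separation of identifier from label might look like the following. The `Entity` class, its field names, and the `ex:` identifiers are hypothetical and serve only to show one identifier carrying several labels and several external identifiers, with the displayed label chosen by a local preference order.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """One entity: a stable identifier plus labels and links that may vary."""
    identifier: str
    labels: dict = field(default_factory=dict)   # language tag -> label
    same_as: dict = field(default_factory=dict)  # scheme -> external identifier

    def label_for(self, preferred_languages):
        """Pick a display label by local preference; no form is globally preferred."""
        for lang in preferred_languages:
            if lang in self.labels:
                return self.labels[lang]
        # Fall back to the identifier when no preferred label exists.
        return self.identifier


# Hypothetical entity; the identifiers are placeholders, not real records.
city = Entity(
    identifier="ex:place/1",
    labels={"en": "Vienna", "de": "Wien", "fr": "Vienne"},
    same_as={"example-gazetteer": "0000001"},
)

german_display = city.label_for(["de", "en"])
fallback_display = city.label_for(["ja"])
```

Two catalogs using different preference lists would show different labels for the same entity while still matching on the one identifier.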

[Figure 4 image: Wikidata identifier Q19526]

A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?⁴⁵
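The difference between collocating on a heading string and collocating on an identifier can be shown with a toy index. The record structure, field names, and the `ex:person/42` identifier below are assumptions made for this sketch and do not describe any existing library system.

```python
from collections import defaultdict

# Hypothetical records: the same author under headings in two scripts,
# both carrying the same (made-up) identifier.
RECORDS = [
    {"title": "War and Peace", "heading": "Tolstoy, Leo",
     "agent_id": "ex:person/42"},
    {"title": "Война и мир", "heading": "Толстой, Лев",
     "agent_id": "ex:person/42"},
]


def build_index(records, key):
    """Collocate titles under whichever field is used as the match point."""
    index = defaultdict(list)
    for record in records:
        index[record[key]].append(record["title"])
    return dict(index)


by_heading = build_index(RECORDS, "heading")        # splits the author in two
by_identifier = build_index(RECORDS, "agent_id")    # keeps the works together
```

With the heading as the key, the two records fall under separate entries. With the identifier as the key, they collocate, and either label can be displayed as needed.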

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.⁴⁶ Many libraries are looking toward Wikidata and Wikibase, the software platform underlying Wikidata, to solve some of the long-standing issues faced by technical services departments, archival units, and others.⁴⁷ Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution’s resources. Focus Group members’ experimentations with Wikidata and OCLC projects using the Wikibase platform indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationship with each other, and how best they can increase discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on “works” and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are “notable” are more likely to be represented in Wikidata. Recently, an effort to support citations in Wikipedia articles, WikiCite,⁴⁸ demonstrates a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical, and powerful, aspects of identity management is to reduce the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.⁴⁹ Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts or subject headings are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting “alternate” subject headings came to the fore when the Library of Congress’ initial solution to change the LC subject heading for “Illegal aliens” to “Undocumented immigrants” failed to be implemented. This prompted one Focus Group member to comment, “Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive.” End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.⁵⁰


Some see Faceted Application of Subject Terminology (FAST)⁵¹ as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally-uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.⁵² FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee⁵³ represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH, with tools and services that serve the needs of diverse communities and contexts.⁵⁴

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.⁵⁵ For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.⁵⁶ There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)⁵⁷ built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.⁵⁸


A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in Institutional Repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool⁵⁹ to create and manage various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.⁶⁰

Linked data could provide the means for local communities to prefer a different label for an established vocabulary's preferred term for a concept or entity. One might reference a local description of a concept or entity not represented (or not represented satisfactorily) in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, "Nothing is sadder than a vocabulary that someone invented that was left to go stale." Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution's EDI goals and principles.


FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, "Dissident art" rather than "Non-conformist art.")

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading "Mentally handicapped" to "People with mental disabilities." Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a "cultural sensitivity" message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator's attitude or the period in which the item was created and may be considered inappropriate today in some contexts.

[Figure 5 chart: "What Areas Have You Changed or Plan to Change Due to Your Institution's EDI Goals and Principles?" Axis scale: 0 to 100. Categories: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies. Series: Changed; Plan to change.]

• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may be exclusive to audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than being included as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others' terminologies or mean different things. The PCC Linked Data Advisory Committee's Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences, and facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on "Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives"66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.
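As a minimal sketch of this idea, the Python below stores one concept under a stable identifier and resolves a display label by context. The identifier `ex:concept/0001`, the context keys, and the function name `label_for` are hypothetical illustrations and are not drawn from any real vocabulary or system.

```python
from typing import Dict, Optional

# Hypothetical concept store: one stable identifier, many context-specific labels.
# The identifier and the context keys are illustrative assumptions only.
CONCEPT_LABELS: Dict[str, Dict[str, str]] = {
    "ex:concept/0001": {
        "en-US": "Color",
        "en-GB": "Colour",
        "fr": "Couleur",
    },
}


def label_for(concept_id: str, context: str, fallback: str = "en-US") -> Optional[str]:
    """Return the label preferred in `context`, else the fallback label, else None."""
    labels = CONCEPT_LABELS.get(concept_id, {})
    return labels.get(context, labels.get(fallback))


# The record stores only the identifier; the label is chosen when it is displayed.
record = {"title": "A history of pigments", "subject": "ex:concept/0001"}
print(label_for(record["subject"], "en-GB"))
print(label_for(record["subject"], "de"))
```

Because the record holds only the identifier, a community could add or change its preferred label without touching the records that reference the concept.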

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of "community contribution" for new terms and community voting could be more inclusive. Libraries' current "consensus environment" excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made. How will libraries resolve or reconcile conflicts between statements?67 Different types of inconsistencies may appear than do now, with, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of "preferred sources."68 OCLC Research explored how libraries might apply Google's "Knowledge Vault" to identify statements that may be more "truthful" than others in the 2015 "Works in Progress Webinar: Looking Inside the Library Knowledge Vault."69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
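One way to picture reconciliation against a list of "preferred sources" is the sketch below. It is only an illustration under stated assumptions: the ranking in `PREFERRED_SOURCES`, the entity identifier, the property name, and the sample dates are all invented, and no library system is known to work exactly this way.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Statement:
    entity: str   # identifier of the entity being described
    prop: str     # property name, e.g. "birthDate"
    value: str
    source: str   # provenance: where the statement came from


# Hypothetical local ranking of preferred sources (earlier = more trusted).
PREFERRED_SOURCES = ["VIAF", "Wikidata", "local-catalog"]


def rank(source: str) -> int:
    """Unknown sources sort after every listed source."""
    if source in PREFERRED_SOURCES:
        return PREFERRED_SOURCES.index(source)
    return len(PREFERRED_SOURCES)


def reconcile(statements: List[Statement]) -> Dict[Tuple[str, str], dict]:
    """Pick one preferred statement per (entity, property) and keep the conflicts."""
    grouped: Dict[Tuple[str, str], List[Statement]] = {}
    for st in statements:
        grouped.setdefault((st.entity, st.prop), []).append(st)

    result = {}
    for key, group in grouped.items():
        ordered = sorted(group, key=lambda s: rank(s.source))
        preferred = ordered[0]
        # Conflicting values are retained with their provenance, not discarded.
        conflicts = [s for s in ordered[1:] if s.value != preferred.value]
        result[key] = {"preferred": preferred, "conflicts": conflicts}
    return result


sample = [
    Statement("ex:person/42", "birthDate", "1901-01-01", "Wikidata"),
    Statement("ex:person/42", "birthDate", "1900-01-01", "VIAF"),
    Statement("ex:person/42", "birthDate", "1900-01-01", "local-catalog"),
]
outcome = reconcile(sample)[("ex:person/42", "birthDate")]
print(outcome["preferred"].value, [c.source for c in outcome["conflicts"]])
```

The sketch keeps conflicting statements alongside the preferred one, so that a differing value remains visible with its source rather than being silently dropped.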

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected by using inadequate vendor records, by creating minimal or less-than-full level descriptions for certain types of resources, and by limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of a large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide "trustworthy provenance" or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. "Conflicting statements" might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative "Plan M" (where "M" stands for "metadata") seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK's National Bibliographic Knowledgebase (NBK) in Plan M's 10-year vision: "Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research."73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing "Inside-Out" and "Facilitated" Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the "inside-out collection" (in contrast to the "outside-in collection," in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the "facilitated collection."74 Focus Group members' activities have increasingly focused on metadata that will provide access to the resources unique to their institutions as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to "inside-out" collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as "super challenging." In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places, providing the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for both scholars and libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the "preferred" form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions digitizing their archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University's Human Rights; Historic Preservation and Urban Planning; and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia's Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation's Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its "look-and-feel") is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?
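To make the last challenge concrete, the sketch below merges capture records from several institutions into one timeline per site. The record layout, institution names, and URLs are invented for illustration, and matching is done on the exact URL string only; real web archives would also need URL normalization and a way to compare crawl specifications.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical capture records contributed by two institutions.
captures = [
    {"institution": "Library A", "url": "https://example.org/", "captured": date(2018, 3, 1)},
    {"institution": "Library B", "url": "https://example.org/", "captured": date(2017, 11, 15)},
    {"institution": "Library A", "url": "https://example.org/", "captured": date(2019, 6, 30)},
    {"institution": "Library B", "url": "https://example.net/", "captured": date(2018, 1, 5)},
]


def build_timelines(records: List[dict]) -> Dict[str, List[Tuple[date, str]]]:
    """Group captures by URL and sort each group chronologically."""
    timelines: Dict[str, List[Tuple[date, str]]] = defaultdict(list)
    for rec in records:
        timelines[rec["url"]].append((rec["captured"], rec["institution"]))
    return {url: sorted(items) for url, items in timelines.items()}


for url, items in build_timelines(captures).items():
    print(url)
    for captured, institution in items:
        print(f"  {captured.isoformat()}  {institution}")
```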

A 2015 survey of the OCLC Research Library Partnership revealed the "lack of descriptive metadata guidelines" as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, "For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials."87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, often in deteriorating formats, in large backlogs, and sometimes requiring rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.
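A minimal sketch of a technical metadata record covering the elements named above might look like the following. The class and field names are hypothetical and are not PREMIS semantic units; an actual implementation would follow the element definitions of whichever standard the institution adopts.

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class AVTechnicalMetadata:
    """Illustrative record; field names are assumptions, not a standard's elements."""
    item_id: str
    year_digitized: Optional[int]
    capture_technology: str   # equipment or software used for capture
    compression: str          # codec or compression scheme
    file_format: str          # container or format, relevant to file compatibility


def missing_fields(record: AVTechnicalMetadata) -> List[str]:
    """List the fields left empty, so gaps can be flagged before ingest."""
    return [name for name, value in asdict(record).items() if value in ("", None)]


tape = AVTechnicalMetadata(
    item_id="local-av-0001",
    year_digitized=2019,
    capture_technology="example capture deck",
    compression="",            # not yet recorded
    file_format="example container format",
)
print(missing_fields(tape))
```

A completeness check of this kind is one simple way to catch missing technical details while the people who performed the digitization can still supply them.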


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in OCLC Research Hanging Together Blog posts "Assessing Needs of AV in Special Collections" and "Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP."90 A subset of the Focus Group members responded to Weber's 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership; incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections

[Figure 6 chart: "What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?" (n=137). Axis scale: 0 to 140. Categories: Resource allocation, assessment, and prioritization; Digital asset management and preservation; Physical collection management; Digitization and preservation reformatting; Incorporating into digital collection workflows; Incorporating into archival workflows; Selection, appraisal, and collection development. Response levels: Very interested; Interested; Somewhat interested; Not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects ("of-ness" and "about-ness"). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections' metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR'ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited (if not discredited) any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a "blob" of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people, as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored topic-based digital discovery services to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture, or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution's own collection development, as librarians can see their contributions in the context of others' content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators' very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors' very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find "semantic clusters" (those that include terms with a similar meaning). In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.
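To suggest why a shared manifest format helps aggregation, the sketch below walks a manifest and lists the image resources it points to. It assumes a manifest shaped like the IIIF Presentation API 2.x (sequences, canvases, images, resource); the report does not say which version CONTENTdm exposes, and the sample manifest and its URLs are made up.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical, heavily trimmed manifest in the Presentation API 2.x shape.
sample_manifest: Dict[str, Any] = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": "https://example.org/iiif/item1/manifest.json",
    "@type": "sc:Manifest",
    "label": "Plan of Paris (illustrative)",
    "sequences": [{
        "@type": "sc:Sequence",
        "canvases": [{
            "@id": "https://example.org/iiif/item1/canvas/1",
            "@type": "sc:Canvas",
            "label": "Sheet 1",
            "images": [{
                "@type": "oa:Annotation",
                "motivation": "sc:painting",
                "resource": {
                    "@id": "https://example.org/iiif/item1/full/full/0/default.jpg",
                    "@type": "dctypes:Image",
                },
            }],
        }],
    }],
}


def image_urls(manifest: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (canvas label, image URL) pairs found in the manifest."""
    results: List[Tuple[str, str]] = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for annotation in canvas.get("images", []):
                resource = annotation.get("resource", {})
                if "@id" in resource:
                    results.append((canvas.get("label", ""), resource["@id"]))
    return results


print(sample_manifest["label"], image_urls(sample_manifest))
```

Because every compliant system publishes the same structure, the same function could in principle be run over manifests from many platforms without per-platform parsing code.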

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
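The string-to-identifier step can be pictured with the small sketch below. It is not the pilot's method: the lookup table, the `ex:` identifiers, and the normalization rules are all hypothetical, and a real workflow would query authority files and involve human review of unmatched or ambiguous strings.

```python
import unicodedata
from typing import Dict, List


def normalize(text: str) -> str:
    """Fold case, strip diacritics, and collapse whitespace before matching."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


# Hypothetical authority lookup: normalized heading -> identifier.
AUTHORITY: Dict[str, str] = {
    "paris (france)": "ex:place/paris",
    "maps": "ex:topic/maps",
}


def to_linked(record: Dict[str, object]) -> Dict[str, object]:
    """Replace matched subject strings with identifiers; keep the rest for review."""
    subject_ids: List[str] = []
    unmatched: List[str] = []
    for heading in record["subjects"]:
        identifier = AUTHORITY.get(normalize(heading))
        if identifier:
            subject_ids.append(identifier)
        else:
            unmatched.append(heading)
    return {"title": record["title"], "subject_ids": subject_ids, "unmatched": unmatched}


legacy = {"title": "Plan de Paris", "subjects": ["Paris (France)", "  MAPS ", "Cartes anciennes"]}
print(to_linked(legacy))
```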

FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about "Paris Maps" across CONTENTdm collections

RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel's two-part blog entry, "Data Management and Curation in 21st Century Archives" (Sept. 2015),100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
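
As an illustration of the readme.txt advice quoted above, the following Python sketch renders a few of the listed elements as labeled lines. The field selection, the `[not provided]` marker, and the function name are assumptions made for this example; the report does not prescribe a layout.

```python
# A subset of the elements named in the quotation; the order is an assumption.
FIELDS = [
    "title", "author", "date", "funder",
    "copyright holder", "description", "keywords", "language",
]


def render_readme(metadata: dict) -> str:
    """Render project-level metadata as plain 'Label: value' lines."""
    lines = []
    for field in FIELDS:
        value = metadata.get(field, "")
        if isinstance(value, (list, tuple)):
            value = "; ".join(value)  # flatten multi-valued fields
        # Flag gaps visibly instead of silently omitting the element.
        lines.append(f"{field.title()}: {value or '[not provided]'}")
    return "\n".join(lines) + "\n"


readme_text = render_readme(
    {"title": "Survey data", "keywords": ["metadata", "reuse"]}
)
```

Printing an explicit marker for missing elements is one way to prompt researchers to complete them before deposit.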

All four of the 2017-2018 The Realities of Research Data Management series webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries' expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post, "Let's Cook Up Some Metadata Consistency":

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions' research investments.105

National contexts differ. For example, our Australian colleagues can take advantage of Australia's National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the "shared stewardship of research data."107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020's Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative's metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance's Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions' discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
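
Creator identifiers of this kind can be checked mechanically before they are stored. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, and the Python sketch below validates the structure of an iD on that basis. The function names are chosen for this example; the check confirms only that an iD is well formed, not that it is registered or belongs to a given person.

```python
def orcid_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Return True when a hyphenated ORCID iD is structurally well formed."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_char(compact[:15]) == compact[15].upper()


# The ORCID iD printed on this report's title page passes the check.
report_author_ok = is_valid_orcid("0000-0002-8757-2962")
```

A check like this catches most transcription errors when iDs are keyed into deposit forms by hand.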

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of "metadata governance" across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions' Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the "citation" or "metadata-only" records, with a link to the full text or data set deposited in a disciplinary repository. "Metadata consultation services" may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the "expertise" function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer "research sprints," where researchers partner with a team of librarians whose expertise may include metadata creation, management, analysis, and preservation. The "Shared BigData Gateway for Research Libraries," hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty's research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of "Metadata as a Service"

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as "foster discovery and use," "enrich the user experience," and "explore new ways to support the whole life cycle of scholarship," all of which are predicated on quality metadata. Usage metrics, such as how frequently items have been borrowed, cited, downloaded, or requested, could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers' publications with what the library is not purchasing; improving relevancy ranking, personalizing search results, and offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students' use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata's value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide "metadata as a service": consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for "metadata outreach and consultation": "Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community." Georgia Tech is recruiting a metadata librarian who will "serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects."122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC's integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution's bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125

Visualizations represent another type of metadata service. A striking example is from the Austlang national code-a-thon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8. Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom's second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette's publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette's River of Authors.)127

FIGURE 9. UK Hachette's "River of Authors," generated from the British Library's catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty's representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one "global domain," metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an "international participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums."133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla's OCLC Research 2019 position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
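
Name matching does not always need machine learning; a simple string-similarity baseline shows the kind of automation under discussion. The Python sketch below uses the standard library's `difflib.SequenceMatcher` on order-insensitive name keys. The 0.85 threshold and the function names are assumptions for illustration, and a production approach would also weigh related metadata such as dates, affiliations, or co-authors.

```python
import re
from difflib import SequenceMatcher


def name_key(name: str) -> str:
    """Lowercase, drop punctuation, and sort tokens so that
    'Surname, Forename' and 'Forename Surname' compare as equal."""
    return " ".join(sorted(re.findall(r"\w+", name.lower())))


def similarity(a: str, b: str) -> float:
    """Similarity of two names on a 0.0 to 1.0 scale."""
    return SequenceMatcher(None, name_key(a), name_key(b)).ratio()


def best_match(name: str, candidates: list, threshold: float = 0.85):
    """Return the closest candidate, or None if nothing clears the threshold."""
    if not candidates:
        return None
    score, candidate = max((similarity(name, c), c) for c in candidates)
    return candidate if score >= threshold else None


match = best_match(
    "Smith-Yoshimura, Karen",
    ["Karen Smith-Yoshimura", "Rebecca Bryant"],
)
```

Returning `None` below the threshold routes uncertain cases to a person, which keeps manual effort for the genuinely ambiguous names.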

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who "trail-blaze innovations," which are then routinized for nonprofessionals. These discussions reinforce Padilla's recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to "traditional cataloging activities" (such as original and copy cataloging, authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual's productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just "production machines."

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or "play time") to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the "easier stuff" while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became "lively" because she brought her colleagues along. It created appreciation for "continuous learning," and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch "reading clubs," where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group "video-viewing brown-bag lunches" for staff on new developments such as linked data, so staff can "watch and learn" together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists' collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library's control. Some noted that "cultural differences" exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more "schematic." Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The "holy grail" is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members' experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the "technical services mindset."

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit's developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.141
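
Two of the batch tasks listed above, de-duplicating records within a file and merging records into one, can be sketched in a short script of the kind Focus Group members describe. The Python example below is an illustrative assumption, not MarcEdit's algorithm: records are plain dictionaries rather than MARC, and the match key (normalized title plus ISBN) is a deliberately simple choice.

```python
import re


def match_key(record: dict) -> tuple:
    """Build a comparison key from a normalized title and ISBN."""
    title = re.sub(r"[^\w\s]", "", record.get("title", "").lower())
    title = " ".join(title.split())
    isbn = re.sub(r"[^0-9Xx]", "", record.get("isbn", "")).upper()
    return (title, isbn)


def deduplicate(records: list) -> list:
    """Keep the first record per key; fill its empty fields from duplicates."""
    merged = {}
    for rec in records:
        key = match_key(rec)
        if key not in merged:
            merged[key] = dict(rec)
            continue
        for field, value in rec.items():
            if not merged[key].get(field):
                merged[key][field] = value
    return list(merged.values())


records = [
    {"title": "Transitioning to the Next Generation of Metadata",
     "isbn": "978-1-55653-167-5", "language": ""},
    {"title": "Transitioning to the next generation of metadata.",
     "isbn": "9781556531675", "language": "eng"},
    {"title": "Building Blocks", "isbn": ""},
]
unique = deduplicate(records)
```

Here the second record is treated as a duplicate of the first, and its language code fills the gap in the record that is kept. Real catalog data would need a more cautious key to avoid merging distinct editions.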

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers' needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA's New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, "We bring order out of a vacuum."142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C's Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: "Don't wait; iterate." In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC's DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of "traditional" work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers "do more with less."

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond "traditional" cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities, rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities, as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.
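The relationship described above, in which a language-neutral identifier stands for the entity and descriptive metadata attaches to it as “statements,” can be illustrated with a minimal sketch. It is not the Shared Entity Management Infrastructure's data model or API; the class names, property names, identifier, and source labels below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    """One claim about an entity, with the source that asserts it."""
    prop: str                   # hypothetical property name, e.g. "label"
    value: str
    source: str                 # institution contributing the statement
    lang: Optional[str] = None  # language tag for text values, if any


@dataclass
class Entity:
    """An entity identified by a persistent, language-neutral URI."""
    uri: str
    statements: List[Statement] = field(default_factory=list)

    def add(self, prop: str, value: str, source: str,
            lang: Optional[str] = None) -> None:
        self.statements.append(Statement(prop, value, source, lang))

    def values(self, prop: str, lang: Optional[str] = None) -> List[str]:
        """Return values for a property, optionally limited to one language."""
        return [s.value for s in self.statements
                if s.prop == prop and (lang is None or s.lang == lang)]


# Placeholder identifier and sources (assumptions, not real records).
author = Entity("https://example.org/entity/E0001")
author.add("label", "Murasaki Shikibu", source="Library A", lang="en")
author.add("label", "紫式部", source="Library B", lang="ja")
author.add("occupation", "novelist", source="Archive C", lang="en")

print(author.uri, author.values("label", lang="ja"))
```

In this sketch the identifier never changes with the language of description; each institution contributes its own labels and context as separate statements attached to the same link.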


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts, summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group, https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc.

8. Except for June 2020, when all discussions were held virtually only, because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata, https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat®: OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe.


Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc: 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” Hanging Together: The OCLC Research Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is Orcid?” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI?” Accessed 19 September 2020. https://isni.org/page/what-is-isni.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org.

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu.

34. The National Institute of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications.

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710.

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710.

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445.

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.). Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas. The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793.

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois.

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805.


44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research. 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko.

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite.


60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research. 19 October 2017. MP4 video presentation, 54:35.00. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research. 12 August 2018. MP4 video presentation, 57:45.00. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.


73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.


84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io.

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.


100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” Hanging Together: The OCLC Research Blog. 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make a standard of CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research. 8 August 2019. MP4 video presentation, 58:34.00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Krista M. Soria, Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.


125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia, 2020 HERE, Bing, Microsoft Corporation.]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs. See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data in MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com, from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code, for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp/.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs in Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.


6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009



FIGURE 3. Examples of some DOI (left) and ARK (right) identifiers.40

Identifiers need to be both unchanging over time and independent of where the digital object is or will be stored. For instance, identifiers for data sets such as digital resources and collections in institutional repositories include system-generated IDs, locally minted identifiers, PURLs, handles, DOIs (Digital Object Identifiers), URIs, URNs, and ARKs (Archival Resource Keys). A few examples of DOI and ARK identifiers are shown in figure 3. Resources can have both multiple copies and versions that change over time. Institutional repositories used as collaborative spaces can lead to multiple publications from the same data sets, a problem compounded by self-deposits from coauthors at different institutions into different repositories. Furthermore, libraries (as well as funders and national assessment efforts) want to be able to link related pieces (such as preprints, supplementary data, and images) with the publication. Multiple DOIs pointing to the same object pose a problem. Some libraries use DataCite or Crossref to mint and publish unique, long-term identifiers and thus minimize the potential for broken citation links.41 Ideally, libraries would contribute to a hub for the metadata describing their researchers’ data sets, regardless of where the data sets are stored.


MOVING FROM “AUTHORITY CONTROL” TO “IDENTITY MANAGEMENT”

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.42 The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and getting away from “managing text strings” as the basis of controlling headings in bibliographic records.43 But identity management poses a change in focus from providing access points in resource descriptions to describing the entities in the resource (work, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from “authority control” and “authorized access points” in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.44 Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4. One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
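A minimal sketch of what separating an identifier from its labels can look like in practice follows. It assumes a simple in-memory structure; the `Entity` class, the `display_label` function, and the identifier values are hypothetical and do not reflect the data model of Wikidata or of any library system.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An entity keyed by an identifier; no single label is 'the' heading."""
    identifier: str
    labels: dict[str, str] = field(default_factory=dict)    # language tag -> label
    same_as: dict[str, str] = field(default_factory=dict)   # scheme -> external identifier


def display_label(entity: Entity, preferred_languages: list[str]) -> str:
    """Pick the label for a given community's language preferences.

    Falls back to the identifier itself when no preferred label exists,
    so the entity is still displayable and matchable.
    """
    for language in preferred_languages:
        if language in entity.labels:
            return entity.labels[language]
    return entity.identifier


# Hypothetical identifiers; the labels are the common names of one city.
city = Entity(
    identifier="ex:place/1",
    labels={"en": "Munich", "de": "München", "it": "Monaco di Baviera"},
    same_as={"example-authority": "ex-auth-0001"},
)

print(display_label(city, ["de", "en"]))   # label for a German-language catalog
print(display_label(city, ["mi", "en"]))   # falls through to the English label
```

The design choice illustrated here is that matching and collocation depend only on `identifier`, while the displayed label is chosen at the point of use.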

[Figure 4 image: Wikidata identifier Q19526, linked to other identifiers and to labels in different languages]


A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?45

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.46 Many libraries are looking toward Wikidata and Wikibase, the software platform underlying Wikidata, to solve some of the long-standing issues faced by technical services departments, archival units, and others.47 Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution’s resources. Focus Group members’ experimentations with Wikidata and OCLC projects using the Wikibase platform indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationship with each other, and how best they can increase discoverability by end-users.
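One way to picture the reconciliation of multiple identifiers that point to the same entity is to treat each asserted equivalence as a link and group identifiers into clusters. The sketch below uses a standard union-find (disjoint-set) structure for that grouping. It is an illustrative assumption, not a description of how any existing reconciliation tool works, and the identifier strings are made up.

```python
def cluster_identifiers(same_as_pairs):
    """Group identifiers into clusters, given pairs asserted to be equivalent.

    Uses union-find with path halving; each returned set holds identifiers
    believed to name the same entity.
    """
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]   # path halving
            item = parent[item]
        return item

    for left, right in same_as_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = {}
    for identifier in list(parent):
        clusters.setdefault(find(identifier), set()).add(identifier)
    return list(clusters.values())


# Hypothetical equivalence assertions from different sources.
pairs = [
    ("viaf:1", "wikidata:Q1"),
    ("wikidata:Q1", "isni:1"),
    ("viaf:2", "lcnaf:n2"),
]
clusters = cluster_identifiers(pairs)
```

Any member of a cluster could then serve as the match point, with labels drawn from whichever sources the local community prefers. A real tool would also need to record who asserted each equivalence and to handle retractions, which this sketch omits.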


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on “works” and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are “notable” are more likely to be represented in Wikidata. Recently, WikiCite,48 an effort to support citations in Wikipedia articles, has demonstrated a need to register and support the identifiers that make up those citations, including information about a specific edition or document.

One of the most practical (and powerful) aspects of identity management is that it reduces the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.49 Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts or subject headings are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting “alternate” subject headings came to the fore when the Library of Congress’ initial solution to change the LC subject heading for “Illegal aliens” to “Undocumented immigrants” failed to be implemented. This prompted one Focus Group member to comment, “Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive.” End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.50


Some see Faceted Application of Subject Terminology (FAST)51 as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.52 FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee53 represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH with tools and services that serve the needs of diverse communities and contexts.54

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.55 For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.56 There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)57 project built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.58


A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in institutional repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool,59 used to create and manage various types of controlled vocabularies, seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.60

Linked data could provide the means for local communities to prefer a different label for an established vocabulary’s preferred term for a concept or entity. One might reference a local description of a concept or entity not represented (or not represented satisfactorily) in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, “Nothing is sadder than a vocabulary that someone invented that was left to go stale.” Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution’s EDI goals and principles.


FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and to replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, “Dissident art” rather than “Non-conformist art.”)

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading “Mentally handicapped” to “People with mental disabilities.” Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a “cultural sensitivity” message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator’s attitude or the period in which the item was created and may be considered inappropriate today in some contexts.

[Figure 5 chart: “What Areas Have You Changed or Plan to Change Due to Your Institution’s EDI Goals and Principles?” Vertical axis from 0 to 100. Categories: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies. Series: Changed; Plan to change.]


• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may be exclusive to audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than include them as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others’ terminologies or mean different things. The PCC Linked Data Advisory Committee’s Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences, and facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on “Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives”66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of “community contribution” for new terms and community voting could be more inclusive. Libraries’ current “consensus environment” excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the set of statements made about them. How will libraries resolve or reconcile conflicts between statements?67 Different types of inconsistencies may appear than do now, with, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources.”68 OCLC Research explored how libraries might apply Google’s “Knowledge Vault” to identify statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
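The role of provenance when statements conflict can be sketched in a few lines. The example below assumes that each statement carries the name of its source and that a library keeps an ordered list of preferred sources. The `Statement` class, both functions, the source names, and the dates are hypothetical, and the ranking rule is only one of many possible policies.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    """A single claim about an entity, kept together with its provenance."""
    subject: str
    prop: str
    value: str
    source: str


def find_conflicts(statements):
    """Return {(subject, prop): set of values} where sources disagree."""
    values = defaultdict(set)
    for statement in statements:
        values[(statement.subject, statement.prop)].add(statement.value)
    return {key: found for key, found in values.items() if len(found) > 1}


def preferred_statement(statements, source_ranking):
    """Pick the statement from the highest-ranked source.

    Sources missing from the ranking sort last. The other statements are
    not discarded; they remain available for display or review.
    Expects a non-empty list of statements.
    """
    rank = {source: position for position, source in enumerate(source_ranking)}
    return min(statements, key=lambda s: rank.get(s.source, len(rank)))


# Hypothetical data: two sources disagree on a birth date.
claims = [
    Statement("ex:person/1", "birth_date", "1901-05-02", "source_a"),
    Statement("ex:person/1", "birth_date", "1902-05-02", "source_b"),
]
conflicts = find_conflicts(claims)
chosen = preferred_statement(claims, ["source_b", "source_a"])
```

Keeping every statement with its source, rather than overwriting the less-preferred value, leaves room for the possibility that conflicting statements reflect different contexts rather than errors.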

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected by using inadequate vendor records, by creating minimal or less-than-full level descriptions for certain types of resources, and by limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of a large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance” or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing “Inside-Out” and “Facilitated” Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the “inside-out collection” (in contrast to the “outside-in collection,” in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the “facilitated collection.”74 Focus Group members’ activities have increasingly focused on metadata that will provide access to the resources unique to their institutions as well as those in their consortia or national networks.

All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources that provide insights into the world across many centuries and places and supply the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars, libraries, and archives alike. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions’ digitization of archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.

ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights, Historic Preservation and Urban Planning, and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85

AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. Managing AV resources requires knowledge of the use context as well as of technical metadata issues, which makes for a complex environment in which to think through requirements for description and access. Further, libraries must deal with current time-based media, whether it is produced locally as part of research and learning or is streaming media that is commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, which are often in deteriorating formats and in large backlogs, and which sometimes require rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.

The overarching challenge was how much effort needs to be invested in describing these AV materials, because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process: compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. For some of their AV materials, some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability.

OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in OCLC Research Hanging Together Blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership; incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections

[Bar chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Categories: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; selection, appraisal, and collection development. Response levels: very interested, interested, somewhat interested, not interested. Horizontal axis: 0 to 140.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people, as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters,” that is, those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.
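To make the manifest-based approach more concrete, the following is a minimal sketch of how a client might collect image URLs from a IIIF Presentation manifest. It assumes the version 2 manifest layout (sequences, then canvases, then images); the sample manifest, its example.org URLs, and the function name `image_urls` are invented for illustration and are not taken from CONTENTdm or the IIIF Explorer.

```python
# Hypothetical sample shaped like a IIIF Presentation API 2.x manifest.
# Every identifier and URL below is a placeholder, not real CONTENTdm data.
sample_manifest = {
    "@id": "https://example.org/iiif/paris-map/manifest.json",
    "@type": "sc:Manifest",
    "label": "Paris map (sample)",
    "sequences": [
        {
            "canvases": [
                {
                    "label": "Sheet 1",
                    "images": [
                        {"resource": {"@id": "https://example.org/iiif/paris-map/1/full/full/0/default.jpg"}}
                    ],
                },
                {
                    "label": "Sheet 2",
                    "images": [
                        {"resource": {"@id": "https://example.org/iiif/paris-map/2/full/full/0/default.jpg"}}
                    ],
                },
            ]
        }
    ],
}


def image_urls(manifest):
    """Return (canvas label, image URL) pairs found in a 2.x-style manifest."""
    pairs = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                url = image.get("resource", {}).get("@id")
                if url:
                    pairs.append((canvas.get("label", ""), url))
    return pairs


pairs = image_urls(sample_manifest)
for label, url in pairs:
    print(label, url)
```

Because every compliant system exposes the same manifest structure, the same small function could in principle be pointed at manifests from different platforms, which is the property that makes cross-collection search feasible.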

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
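The string-to-identifier conversion described above can be sketched in a few lines. This is an illustration under stated assumptions, not the pilot's actual method: the lookup table `AUTHORITY`, the example.org URIs, the field names, and the function names are all hypothetical stand-ins for a real authority file or local vocabulary.

```python
# Hypothetical lookup table standing in for an authority file or local vocabulary.
# The URIs are placeholders on example.org, not real identifiers.
AUTHORITY = {
    "paris (france)": "https://example.org/id/place/paris",
    "maps": "https://example.org/id/subject/maps",
}


def normalize(text):
    """Lowercase and collapse whitespace so minor string variants still match."""
    return " ".join(text.lower().split())


def record_to_statements(record_id, record):
    """Turn one flat record into (subject, property, object) statements.

    Strings found in AUTHORITY are replaced by identifiers; unmatched
    strings are kept as literals so that nothing is silently dropped.
    """
    statements = []
    for field, values in record.items():
        for value in values:
            identifier = AUTHORITY.get(normalize(value))
            statements.append((record_id, field, identifier or value))
    return statements


record = {
    "coverage": ["Paris  (France)"],
    "subject": ["Maps", "Street plans"],
}
statements = record_to_statements("https://example.org/id/item/42", record)
for statement in statements:
    print(statement)
```

Keeping unmatched strings as literals is a deliberate choice in this sketch: it leaves a visible list of values that still need reconciliation rather than discarding them.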

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections

RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry “Data Management and Curation in 21st Century Archives” (Sept. 2015)100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
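As a rough illustration of the readme.txt advice quoted above, the sketch below assembles such a file from a small set of fields. The field names are drawn from the elements listed in the quotation; the layout of the file, the `TODO` marker for missing values, the function name, and the sample project are assumptions made for this example only.

```python
# Field lists follow the elements named in the quotation above; the file
# layout and the "TODO" marker for missing values are assumptions.
PROJECT_FIELDS = [
    "title", "author", "date", "funder", "copyright holder",
    "description", "keywords", "language",
]
FILE_FIELDS = ["file name", "file format", "software used"]


def build_readme(project, files):
    """Return readme text; any field without a value is marked TODO."""
    lines = ["PROJECT-LEVEL METADATA"]
    for field in PROJECT_FIELDS:
        lines.append(f"{field}: {project.get(field, 'TODO')}")
    lines.append("")
    lines.append("FILES")
    for entry in files:
        parts = [f"{field}={entry.get(field, 'TODO')}" for field in FILE_FIELDS]
        lines.append("; ".join(parts))
    return "\n".join(lines) + "\n"


# Invented sample values, for illustration only.
example_project = {
    "title": "Sample survey of reading habits",
    "author": "A. Researcher",
    "date": "2020-01-15",
    "keywords": "reading; survey",
}
example_files = [{"file name": "responses.csv", "file format": "CSV"}]

readme_text = build_readme(example_project, example_files)
print(readme_text)
```

Marking gaps explicitly, rather than omitting empty fields, keeps the form short while still showing a researcher what remains to be supplied.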

All four of the 2017-2018 The Realities of Research Data Management series webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment: beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105

National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
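One practical consequence of relying on identifiers is that they can be checked mechanically before they enter a metadata record. The sketch below validates the check character of an ORCID iD, which ORCID documents as an ISO 7064 MOD 11-2 checksum over the first 15 digits. The function names are illustrative, and the iD used in the usage line is the author's own, as printed in the front matter of this report.

```python
import re

# Expected surface form: four hyphen-separated groups, last character 0-9 or X.
ORCID_PATTERN = re.compile(r"[0-9]{4}-[0-9]{4}-[0-9]{4}-[0-9]{3}[0-9X]")


def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True when the iD has the expected form and a correct check character."""
    if not ORCID_PATTERN.fullmatch(orcid):
        return False
    compact = orcid.replace("-", "")
    return orcid_check_character(compact[:15]) == compact[15]


print(is_valid_orcid("0000-0002-8757-2962"))
```

A passing checksum only shows that the string is well formed; it does not confirm that the iD is registered or that it belongs to the person named in the record.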

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians on work that may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics, such as how frequently items have been borrowed, cited, downloaded, or requested, could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; and improving relevancy ranking, personalizing search results, offering recommendation services in the discovery layer, and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service,” that is, consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking, and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125

Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127

FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in “Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review”:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
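
To make the idea of matching and disambiguating names from related metadata more concrete, here is a minimal sketch in Python. It is not a machine-learning method and not a tool used by Focus Group members; it simply combines a string-similarity score from the standard library with one piece of related metadata. The field names (`name`, `birth_year`), the 0.1 bonus for agreeing dates, and the function names are assumptions made for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, strip accents and punctuation, and sort the name tokens
    so that inverted and direct forms of a name compare as equal."""
    decomposed = unicodedata.normalize("NFKD", name)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in unaccented.lower())
    return " ".join(sorted(cleaned.split()))


def match_score(a, b):
    """Score two name entries between 0.0 and 1.0.

    Each entry is a dict with a 'name' and an optional 'birth_year'
    (hypothetical fields). Conflicting birth years are treated as
    evidence of two different people; agreeing years add a small bonus.
    """
    score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    year_a, year_b = a.get("birth_year"), b.get("birth_year")
    if year_a and year_b:
        if year_a != year_b:
            return 0.0
        score = min(1.0, score + 0.1)  # assumed bonus, not a tuned value
    return score


if __name__ == "__main__":
    local = {"name": "García, María", "birth_year": 1950}
    registry_same = {"name": "Maria Garcia", "birth_year": 1950}
    registry_other = {"name": "Maria Garcia", "birth_year": 1982}
    print(match_score(local, registry_same))
    print(match_score(local, registry_other))
```

In practice a score like this would only rank candidate matches for human review; the threshold for accepting a match, and which related metadata to trust, are local policy decisions.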

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to “traditional cataloging activities” (such as original and copy cataloging, authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats, and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D, or “play time,” to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage metadata specialists to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists use tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
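
Two of the batch tasks listed above, de-duplicating records within a file and merging two or more records into one, are the kind of work that a short script can handle. The Python sketch below is an illustration only and does not reproduce how MarcEdit or OpenRefine perform these tasks. It assumes records have already been exported to simple dictionaries; the field names (`title`, `author`, `year`, `isbn`) and the choice of match key are hypothetical.

```python
import re


def match_key(record):
    """Build a normalized key from title, author, and year (assumed fields)."""
    def squash(text):
        # Lowercase and collapse punctuation and whitespace into single spaces.
        return re.sub(r"[^a-z0-9]+", " ", (text or "").lower()).strip()

    return (
        squash(record.get("title")),
        squash(record.get("author")),
        str(record.get("year") or ""),
    )


def deduplicate(records):
    """Keep the first record for each key and merge in any fields
    that only a later duplicate supplies."""
    merged = {}
    for rec in records:
        key = match_key(rec)
        if key not in merged:
            merged[key] = dict(rec)
            continue
        kept = merged[key]
        for field, value in rec.items():
            if value and not kept.get(field):
                kept[field] = value
    return list(merged.values())


if __name__ == "__main__":
    batch = [
        {"title": "Sample Title: A Study", "author": "Doe, Jane", "year": 2001, "isbn": ""},
        {"title": "Sample title - a study.", "author": "doe jane", "year": 2001, "isbn": "0000000000"},
        {"title": "Another Work", "author": "Roe, Richard", "year": 1999, "isbn": ""},
    ]
    for rec in deduplicate(batch):
        print(rec)
```

A key built from normalized title, author, and year is deliberately conservative; a real workflow would likely add identifiers such as control numbers and send near-matches to a cataloger rather than merge them automatically.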

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait, iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search category “Metadata.” https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe.

Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is Orcid.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org.

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org. 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu.

34. The National Institute of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications.

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710.

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710.

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445.

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793.

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing Your DOIs.” Crossref, The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois.

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO,” 5. American Bar Association, Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management, as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805.

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly rated webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku / Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko.

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan. 2019.) https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite.

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 57:45:00. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.

73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx.) http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008.) https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010.) https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010.) https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014.) https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013.) https://archive-it.org/collections/4019.

NYARC, New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io.

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. “IU Will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia 2020: HERE, Bing, Microsoft Corporation.]

NLA. “Trove.” Search, Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data in MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs in Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395. T: 1-800-848-5878. T: +1-614-764-6000. F: +1-614-764-6096. www.oclc.org/research

ISBN: 978-1-55653-170-5. DOI: 10.25333/rqgd-b343. RM-PR-216787-WWAE-A4 2009


MOVING FROM “AUTHORITY CONTROL” TO “IDENTITY MANAGEMENT”

The emphasis in authority work is shifting from construction of text strings to identity management: differentiating entities, creating identifiers, and establishing relationships among entities.[42] The intellectual work required to differentiate names is the same for both current authority work and identity management. Focus Group members agree that the future is in identity management and in getting away from “managing text strings” as the basis of controlling headings in bibliographic records.[43] But identity management shifts the focus from providing access points in resource descriptions to describing the entities in the resource (works, persons, corporate bodies, places, events) and establishing the relationships and links among them.

The transition from “authority control” and “authorized access points” in our legacy systems to identity management requires us to separate identifiers from their associated labels. A unique identifier could be associated with an aggregate of attributes that would enable users to distinguish one entity from another.[44] Ideally, libraries could take advantage of the identifiers and attributes from other, nonlibrary sources. Wikidata, for example, aggregates a variety of identifiers as well as labels in different languages, as shown in figure 4.

FIGURE 4. One Wikidata identifier links to other identifiers and labels in different languages

Providing contextual information is more important than providing one unique label. Labels could differ depending on communities (such as various spellings of names and terms, different languages and writing systems, and different disciplines) without requiring that one form be preferred over another. Label preference becomes localized rather than homogenized for global use.
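To make the separation of identifiers from labels concrete, here is a minimal Python sketch, assuming a simple in-memory model. The `Entity` class, the `display_label` function, the identifier `Q0000001`, and all label and external identifier values are hypothetical placeholders for illustration; they are not drawn from Wikidata or from any library system.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    """An entity keyed by its identifier; labels are attributes, not the key."""
    identifier: str                                          # hypothetical local or Wikidata-style ID
    labels: Dict[str, str] = field(default_factory=dict)     # language tag -> label
    same_as: Dict[str, str] = field(default_factory=dict)    # source name -> identifier in that source


def display_label(entity: Entity, preferred_languages: List[str]) -> str:
    """Return the first label matching a community's language preferences.

    Falls back to the bare identifier when no preferred label exists,
    so no single label form is treated as globally preferred.
    """
    for lang in preferred_languages:
        if lang in entity.labels:
            return entity.labels[lang]
    return entity.identifier


# Hypothetical sample data (placeholders, not real identifiers).
entity = Entity(
    identifier="Q0000001",
    labels={"en": "Example Author", "fr": "Exemple d'auteur", "es": "Autor de ejemplo"},
    same_as={"viaf": "000000000", "isni": "0000000000000000"},
)

print(display_label(entity, ["fr", "en"]))  # a French-first community
print(display_label(entity, ["de", "en"]))  # German preferred, English fallback
```

In this sketch the identifier, not any one label, is the key. Each community supplies its own language preferences, so label preference is resolved locally at display time.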

[Figure 4 image: Wikidata identifier Q19526]


A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?[45]

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.[46] Many libraries are looking toward Wikidata and Wikibase (the software platform underlying Wikidata) to solve some of the long-standing issues faced by technical services departments, archival units, and others.[47] Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution’s resources. Focus Group members’ experimentations with Wikidata and OCLC projects using the Wikibase platform indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationships with each other, and how best they can increase discoverability by end-users.
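One way to picture the reconciliation step is as grouping identifiers that are asserted to refer to the same entity. The sketch below assumes the input is a list of "same as" pairs and uses a union-find structure, a technique chosen here for illustration rather than one named by the Focus Group. The function name `reconcile` and every identifier value are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def reconcile(same_as_pairs: Iterable[Tuple[str, str]]) -> List[Set[str]]:
    """Group identifiers into clusters, one cluster per underlying entity.

    Each input pair asserts that two identifiers point to the same entity.
    Union-find merges the pairs transitively.
    """
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen identifiers as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters: Dict[str, Set[str]] = {}
    for ident in list(parent):
        clusters.setdefault(find(ident), set()).add(ident)
    return list(clusters.values())


# Hypothetical "same as" assertions gathered from different sources.
pairs = [
    ("wikidata:Q0000001", "viaf:000000000"),
    ("viaf:000000000", "isni:0000000000000000"),
    ("wikidata:Q0000002", "viaf:111111111"),
]

for cluster in reconcile(pairs):
    print(sorted(cluster))
```

Once identifiers are clustered this way, any member of a cluster can serve as the match point for the same entity, and labels can be attached to the cluster rather than to a single source's heading.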


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on “works” and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are “notable” are more likely to be represented in Wikidata. Recently, an effort to support citations in Wikipedia articles, WikiCite,[48] demonstrates a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical, and powerful, aspects of identity management is to reduce the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.[49] Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.


ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts, or subject headings, are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex, and the vocabulary used in subject headings is just one aspect; language-neutral identifiers represent one approach.

The issue of supporting "alternate" subject headings came to the fore when the Library of Congress' initial solution to change the LC subject heading for "Illegal aliens" to "Undocumented immigrants" failed to be implemented. This prompted one Focus Group member to comment, "Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive." End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.[50]


Some see Faceted Application of Subject Terminology (FAST)[51] as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.[52] FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee[53] represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH, with tools and services that serve the needs of diverse communities and contexts.[54]

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.[55] For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.[56] There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)[57] project built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d'autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.[58]


A growing percentage of data in institutions' discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities' research data and materials in institutional repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the "collision of name spaces" and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions' Ontology Management - Graphite tool[59] to create and manage various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.[60]

Linked data could provide the means for local communities to prefer a different label for an established vocabulary's preferred term for a concept or entity. One might reference a local description of a concept or entity not represented (or not represented satisfactorily) in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, "Nothing is sadder than a vocabulary that someone invented that was left to go stale." Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)[61] spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.[62] Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution's EDI goals and principles.

FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership.[63]

[Bar chart: "What Areas Have You Changed or Plan to Change Due to Your Institution's EDI Goals and Principles?" Axis scale: 0 to 100. Areas: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies. Series: Changed; Plan to change.]

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and to replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, "Dissident art" rather than "Non-conformist art.")

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading "Mentally handicapped" to "People with mental disabilities." Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.[64] Some noted it would be less labor-intensive to present a "cultural sensitivity" message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator's attitude or the period in which the item was created and may be considered inappropriate today in some contexts.

• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may be exclusive to audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than include them as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others' terminologies or mean different things. The PCC Linked Data Advisory Committee's Linked Data Infrastructure Models: Areas of Focus for PCC Strategies[65] describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences, and facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on "Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives"[66] described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.
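The idea of keying a concept on an identifier and choosing its display label by context can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any existing system: the identifier `ex:concept/0001`, the context names `lcsh`, `local`, and `fr`, and the `Concept` class with its `label_for` method are all hypothetical. The two labels are the headings discussed earlier in this section.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Concept:
    """A concept keyed on a persistent identifier rather than a text string."""
    identifier: str                                        # hypothetical identifier
    labels: Dict[str, str] = field(default_factory=dict)  # context -> display label
    default_context: str = "lcsh"                          # assumed fallback context

    def label_for(self, *contexts: str) -> str:
        """Return the label for the first requested context that has one,
        falling back to the default context's label."""
        for context in contexts:
            if context in self.labels:
                return self.labels[context]
        return self.labels[self.default_context]


# One concept, one identifier, two labels (both taken from the report's example).
concept = Concept(
    identifier="ex:concept/0001",
    labels={
        "lcsh": "Illegal aliens",
        "local": "Undocumented immigrants",
    },
)

print(concept.label_for("local"))        # a catalog that prefers the local label
print(concept.label_for("fr", "local"))  # no French label, so the next preference is used
print(concept.label_for("fr"))           # nothing matches, so the default label is shown
```

Because records would point at the identifier, changing or adding a label touches one entry rather than every record that uses the heading.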

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of "community contribution" for new terms and community voting could be more inclusive. Libraries' current "consensus environment" excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made. How will libraries resolve or reconcile conflicts between statements?[67] Different types of inconsistencies may appear than do now, with, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of "preferred sources."[68] OCLC Research explored how libraries might apply Google's "Knowledge Vault" to identify statements that may be more "truthful" than others in the 2015 "Works in Progress Webinar: Looking Inside the Library Knowledge Vault."[69] Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
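As a rough sketch of what recording provenance per statement might look like, the Python below groups statements about the same entity and property, flags those with differing values, and orders them by a "preferred sources" list. Everything here is an assumption made for illustration: the `Statement` structure, the entity identifier `ex:person/42`, the dates, and the order of `PREFERRED_SOURCES` are invented, and no library system or Knowledge Vault method is being described.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class Statement:
    subject: str   # entity identifier (hypothetical)
    prop: str      # property name, e.g. "birth_date"
    value: str
    source: str    # provenance: where the statement came from


# Hypothetical local "preferred sources" list, most trusted first.
PREFERRED_SOURCES = ["WorldCat", "VIAF", "Wikidata"]


def find_conflicts(
    statements: Sequence[Statement],
) -> Dict[Tuple[str, str], List[Statement]]:
    """Group statements by (subject, property) and keep only the groups
    whose statements disagree on the value."""
    grouped = defaultdict(list)
    for st in statements:
        grouped[(st.subject, st.prop)].append(st)
    return {
        key: group
        for key, group in grouped.items()
        if len({st.value for st in group}) > 1
    }


def rank_by_source(group, preferred=PREFERRED_SOURCES):
    """Order statements by preferred source; unlisted sources sort last.
    Nothing is discarded, so every point of view stays visible."""
    def rank(st):
        return preferred.index(st.source) if st.source in preferred else len(preferred)
    return sorted(group, key=rank)


# Invented sample data for illustration only.
statements = [
    Statement("ex:person/42", "birth_date", "1901", "Wikidata"),
    Statement("ex:person/42", "birth_date", "1902", "VIAF"),
    Statement("ex:person/42", "birth_date", "1901", "local finding aid"),
    Statement("ex:person/42", "name", "Example, Person", "VIAF"),
]

conflicts = find_conflicts(statements)
for (subject, prop), group in conflicts.items():
    for st in rank_by_source(group):
        print(subject, prop, st.value, "from", st.source)
```

The ranking only orders the statements for display. Which source to trust, and whether a single "winner" should be chosen at all, remains a policy decision for the institution.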

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise: speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected in the use of inadequate vendor records, the creation of minimal or less-than-full level descriptions for certain types of resources, and limits on authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncataloged, often because of the large volume of materials and insufficient staff resources.[70] These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide "trustworthy provenance" or whether data should be distributed with peer-to-peer sharing.[71] Different statements might be correct in their own contexts. "Conflicting statements" might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative "Plan M" (where "M" stands for "metadata") seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.[72] Among the implications cited by stakeholders in the UK's National Bibliographic Knowledgebase (NBK) in Plan M's 10-year vision: "Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research."[73] Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing "Inside-Out" and "Facilitated" Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the "inside-out collection" (in contrast to the "outside-in collection," in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the "facilitated collection."[74] Focus Group members' activities have increasingly focused on metadata that will provide access to the resources unique to their institutions, as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to "inside-out" collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as "super challenging." In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.[75] This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places and the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for both scholars and libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.[76]

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the "preferred" form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot[77] and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.[78]

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions' digitization of archival and distinctive collections that have been available only in physical form.[79] More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.[80]

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University's Human Rights, Historic Preservation and Urban Planning, and New York City Religions.[81] National libraries archive websites within their national domain. For example, the National Library of Australia's Archived websites (1996-now)[82] collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation's Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.[83]

The Focus Group discussed the challenges of creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its "look-and-feel") is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the "lack of descriptive metadata guidelines" as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.[84] The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.[85]


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique local collections.[86] However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, "For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials."[87] Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, often in deteriorating formats, in large backlogs, and sometimes requiring rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,[88] which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),[89] the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.
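A small Python sketch can make the list of technical details concrete. It records the four kinds of information named above and reports which are still missing for an item. The class and field names are hypothetical choices made for this illustration; they are not PREMIS element names, and the sample values are invented.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class AVTechnicalMetadata:
    """Technical details for one digitized AV item.
    Field names are illustrative only, not PREMIS semantic units."""
    compression: Optional[str] = None         # e.g. "lossless" or a codec name
    year_digitized: Optional[int] = None
    capture_technology: Optional[str] = None  # equipment or software used for capture
    file_format: Optional[str] = None         # bears on file compatibility

    def missing(self) -> List[str]:
        """Return the names of fields that have not been recorded yet,
        in the order they are declared."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


# Invented sample record: two of the four details are known.
record = AVTechnicalMetadata(compression="lossless", year_digitized=2019)
print(record.missing())  # ['capture_technology', 'file_format']
```

A check of this kind could help prioritize which backlog items need further technical description before preservation work begins.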


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in OCLC Research Hanging Together Blog posts "Assessing Needs of AV in Special Collections" and "Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP."[90] A subset of the Focus Group members responded to Weber's 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival workflows and into digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections

[Bar chart: "What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?" (n=137). Axis scale: 0 to 140. Challenges: Resource allocation, assessment, and prioritization; Digital asset management and preservation; Physical collection management; Digitization and preservation reformatting; Incorporating into digital collection workflows; Incorporating into archival workflows; Selection, appraisal, and collection development. Responses: Very interested; Interested; Somewhat interested; Not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects ("of-ness" and "about-ness"). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.[91]

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections' metadata, and others are using MODS (Metadata Object Description Schema).[92] Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),[93] a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).[94]

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR'ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a "blob" of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution's own collection development, as librarians can see their contributions in the context of others' content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators' very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors' very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find "semantic clusters" (those that include terms with a similar meaning). In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.
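To make the role of a manifest more concrete, the following is a minimal sketch of how a client might list the images described by a IIIF manifest. It assumes the nesting used by version 2.x of the IIIF Presentation API (sequences, canvases, images); the function name, the sample manifest, and the example.org URLs are hypothetical placeholders, and nothing here is taken from CONTENTdm or the IIIF Explorer itself.

```python
from typing import Dict, Iterator, Tuple


def iter_canvas_images(manifest: Dict) -> Iterator[Tuple[str, str]]:
    """Yield (canvas label, image URL) pairs from a Presentation 2.x style manifest."""
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            label = canvas.get("label", "")
            for image in canvas.get("images", []):
                url = image.get("resource", {}).get("@id")
                if url:
                    yield label, url


# Hypothetical manifest fragment; real manifests carry many more properties.
sample_manifest = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@type": "sc:Manifest",
    "@id": "https://example.org/iiif/paris-map/manifest.json",
    "label": "Plan de Paris",
    "sequences": [{
        "@type": "sc:Sequence",
        "canvases": [{
            "@type": "sc:Canvas",
            "label": "Sheet 1",
            "images": [{
                "@type": "oa:Annotation",
                "resource": {
                    "@id": "https://example.org/iiif/paris-map/sheet1/full/full/0/default.jpg"
                },
            }],
        }],
    }],
}

images = list(iter_canvas_images(sample_manifest))
```

Because every compliant system exposes the same structure, one short routine of this kind could in principle be pointed at manifests from many platforms, which is the appeal of aggregating across IIIF-compliant systems.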

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local, library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
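The idea of replacing strings with identifiers can be sketched in a few lines. The example below is only an illustration under stated assumptions: the tiny authority table, its example.org URIs, and the normalization rules are all hypothetical, and the pilot's actual reconciliation methods are not described in this report.

```python
import unicodedata
from typing import Dict, List, Optional


def normalize(heading: str) -> str:
    """Fold case, strip accents and punctuation so near-identical strings match."""
    decomposed = unicodedata.normalize("NFKD", heading)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in no_marks)
    return " ".join(kept.casefold().split())


# Hypothetical authority file: heading -> identifier (placeholder URIs).
AUTHORITY = {
    "Paris (France)": "https://example.org/place/0001",
    "Brown University": "https://example.org/org/0002",
}
INDEX = {normalize(heading): uri for heading, uri in AUTHORITY.items()}


def reconcile(values: List[str]) -> List[Dict[str, Optional[str]]]:
    """Pair each string with an identifier, or None when no match is found."""
    return [{"label": v, "id": INDEX.get(normalize(v))} for v in values]


record_subjects = ["PARIS (France)", "Brown university.", "Unknown place"]
linked = reconcile(record_subjects)
```

Strings that fail to match (the third value here) are left with no identifier rather than guessed at, so that they can be routed to a person for review.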

FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about "Paris Maps" across CONTENTdm collections

RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel's two-part blog entry "Data Management and Curation in 21st Century Archives" (Sept. 2015)100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
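One way to lower the barrier for researchers is to generate the readme from a simple set of answers. The sketch below is a hypothetical illustration: the field list is copied from the elements named in the passage above, while the function name, the "field: value" layout, and the sample values are assumptions rather than a recommended standard.

```python
from typing import Dict, List

# Field names taken from the elements listed in the quoted passage.
README_FIELDS: List[str] = [
    "title", "author", "date", "funder", "copyright holder", "description",
    "keywords", "observation unit", "kind of data", "type of data", "language",
    "file names", "file formats and software used",
]


def build_readme(metadata: Dict[str, str]) -> str:
    """Render one 'field: value' line per element, flagging gaps explicitly."""
    lines = []
    for field in README_FIELDS:
        value = metadata.get(field, "").strip()
        lines.append(f"{field}: {value if value else '[not provided]'}")
    return "\n".join(lines) + "\n"


# Hypothetical project; all values are placeholders.
example = {
    "title": "Survey of campus library use",
    "author": "A. Researcher",
    "date": "2020-01-15",
    "file names": "responses.csv",
}
readme_text = build_readme(example)
```

Writing the result to disk is then a single step, for example `open("readme.txt", "w", encoding="utf-8").write(readme_text)`. Marking missing elements as "[not provided]" keeps gaps visible to whoever later curates the deposit.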

All four webinars in the 2017-2018 series The Realities of Research Data Management,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries' expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post "Let's Cook Up Some Metadata Consistency":

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions' research investments.105

National contexts differ. For example, our Australian colleagues can take advantage of Australia's National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the "shared stewardship of research data."107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020's Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative's metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance's Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions' discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
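As a small illustration of working with one of the identifiers named above, the sketch below checks the final character of an ORCID iD using the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers. The function names are illustrative. The check confirms only that an iD is well formed, not that it is registered or that it belongs to a particular person.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 16-character shape and the final check character of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base = compact[:15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == compact[15]


# The author iD printed in this report's front matter passes the check.
author_ok = is_valid_orcid("0000-0002-8757-2962")
# Changing the last character makes the same iD fail.
typo_ok = is_valid_orcid("0000-0002-8757-2963")
```

A check of this kind is cheap enough to run whenever a researcher types an iD into a deposit form, catching transcription errors before they reach the repository.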

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services such as those offered at the University of Michigan112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of "metadata governance" across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions' Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the "citation" or "metadata-only" records, with a link to the full text or data set deposited in a disciplinary repository. "Metadata consultation services" may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research report series The Realities of Research Data Management classifies metadata support as part of the "expertise" function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer "research sprints," where researchers partner with a team of expert librarians whose support may include metadata creation, management, analysis, and preservation. The "Shared BigData Gateway for Research Libraries," hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty's research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of "Metadata as a Service"

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as "foster discovery and use," "enrich the user experience," and "explore new ways to support the whole life cycle of scholarship," all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers' publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students' use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata's value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide "metadata as a service": consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for "metadata outreach and consultation": "Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community." Georgia Tech is recruiting a metadata librarian who will "serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects."122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC's integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution's bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125
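The enrichment idea mentioned above, supplying language codes that are missing from related records, might look something like the following sketch. It is an assumption-laden illustration: the record layout, the "cluster" key that groups related records, and the rule of suggesting a code only when all coded records in a cluster agree are all hypothetical choices, not a description of any institution's actual workflow.

```python
from collections import Counter
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def suggest_languages(records: List[Record]) -> List[Dict[str, str]]:
    """Suggest a language code for uncoded records from coded records in the same cluster."""
    codes_by_cluster: Dict[str, Counter] = {}
    for rec in records:
        if rec.get("lang"):
            codes_by_cluster.setdefault(rec["cluster"], Counter())[rec["lang"]] += 1

    suggestions = []
    for rec in records:
        if rec.get("lang"):
            continue  # already coded, nothing to suggest
        counts = codes_by_cluster.get(rec["cluster"])
        # Only suggest when the related records agree on exactly one code.
        if counts and len(counts) == 1:
            suggestions.append({"id": rec["id"], "suggested_lang": next(iter(counts))})
    return suggestions


# Hypothetical records; "cluster" stands in for whatever links related records.
sample = [
    {"id": "r1", "cluster": "c1", "lang": "fre"},
    {"id": "r2", "cluster": "c1", "lang": "fre"},
    {"id": "r3", "cluster": "c1", "lang": None},
    {"id": "r4", "cluster": "c2", "lang": "eng"},
    {"id": "r5", "cluster": "c2", "lang": "ger"},
    {"id": "r6", "cluster": "c2", "lang": None},
]
suggested = suggest_languages(sample)
```

Record r6 receives no suggestion because its related records disagree, which may simply mean the cluster mixes an original with translations. Treating the output as suggestions for review, rather than automatic updates, keeps a cataloger in the loop.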

Visualizations represent another type of metadata service. A striking example is from the Austlang national code-a-thon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8. Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom's second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette's publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette's River of Authors.)127

FIGURE 9. UK Hachette's "River of Authors" generated from the British Library's catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade but could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty's representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one "global domain," metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an "international, participatory community focused on advancing the use of artificial intelligence in, for, and by libraries, archives, and museums."133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla's 2019 OCLC Research position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who "trail-blaze innovations," which are then routinized for nonprofessionals. These discussions reinforce Padilla's recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to "traditional cataloging activities" (such as original and copy cataloging, authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual's productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just "production machines."

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or "play time") to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the "easier stuff" while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned and the whole unit became "lively" because she brought her colleagues along. It created appreciation for "continuous learning," and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch "reading clubs," where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group "video-viewing brown-bag lunches" for staff on new developments such as linked data so staff can "watch and learn" together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists' collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library's control. Some noted that "cultural differences" exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more "schematic." Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The "holy grail" is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult; they are in demand for higher-paying jobs in the private sector. Focus Group members' experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the "technical services mindset."

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit's developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.141
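For readers unfamiliar with clustering as a cleanup technique, the following is a minimal sketch of one common approach, key-collision clustering, in which values that reduce to the same normalized "fingerprint" are grouped for review. It is written in the spirit of the fingerprint method popularized by OpenRefine and is not a description of how MarcEdit implements its clustering; the function names and sample headings are illustrative only.

```python
import string
import unicodedata
from collections import defaultdict
from typing import Dict, List


def fingerprint(value: str) -> str:
    """Fold case and accents, drop ASCII punctuation, then sort the unique tokens."""
    decomposed = unicodedata.normalize("NFKD", value.strip().casefold())
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = no_marks.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(no_punct.split())))


def cluster(values: List[str]) -> List[List[str]]:
    """Group values sharing a fingerprint; keep only groups with more than one member."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]


headings = ["Reese, Terry", "Terry Reese", "reese terry.", "Faniel, Ixchel"]
clusters = cluster(headings)
```

The three variant forms of the first name land in one cluster, which a metadata specialist could then merge to a single preferred form. Because the method only compares normalized keys, it stays fast on large files, at the cost of missing variants that differ by more than case, punctuation, or word order.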

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers' needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA's New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, "We bring order out of a vacuum."142

SELF-EDUCATION

Metadata is increasingly being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
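To make "reconciling equivalents across multiple registries" concrete, here is a minimal Python sketch of the kind of simple script a metadata specialist might write. It assumes the equivalences have already been asserted pairwise (for example, in an export of "same as" statements) and merges them into identity clusters with a union-find structure. The function name and all identifier values are hypothetical placeholders, not real registry records.

```python
from collections import defaultdict


def group_equivalents(same_as_pairs):
    """Merge pairwise 'same entity' statements into identity clusters (union-find)."""
    parent = {}

    def find(item):
        # Register unseen identifiers as their own cluster root.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving keeps lookups short
            item = parent[item]
        return item

    for left, right in same_as_pairs:
        parent[find(left)] = find(right)

    clusters = defaultdict(set)
    for item in list(parent):
        clusters[find(item)].add(item)
    # Sort members and clusters so the output is stable across runs.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical identifiers, invented for illustration only.
pairs = [
    ("orcid:EXAMPLE-1", "isni:EXAMPLE-1"),
    ("isni:EXAMPLE-1", "viaf:EXAMPLE-1"),
    ("lcnaf:EXAMPLE-2", "viaf:EXAMPLE-2"),
]
identity_clusters = group_equivalents(pairs)
for members in identity_clusters:
    print(members)
```

The sketch only propagates equivalences that someone has already asserted. Deciding whether two registry entries really describe the same person or organization remains a matter of professional judgment.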

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait, iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe.

Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC, Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups, 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” Hanging Together: The OCLC Research Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is ORCID.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org.

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org. 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu.

34. The National Institute of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications.

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710.

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710.

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445.

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793.

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref, The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois.

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805.

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku / Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko.

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite.

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.

73. Grindley, Neil. 2019. “Plan M: Definition, Principles and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io.

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105 Faniel Ixchel M 2019 ldquoLetrsquos Cook Up Some Metadata Consistencyrdquo Next (blog) OCLC 21 November 2019 httpwwwoclcorgblogmainlets-cook-up-some-metadata

-consistency

106 NCI (National Computational Infrastructure) Australia ldquoHomerdquo Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) ldquoHomerdquo Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network ldquoHomerdquo Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a ldquocollaboration advocating richer connected reusable open metadata for all research outputsrdquo (httpwwwmetadata2020org) The Metadata 2020 Researcher Communications project is outlined here httpwwwmetadata2020orgprojects researcher-communications

109 Digital Curation Centre ldquoDisciplinary Metadatardquo List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory ldquoMetadata Standards Directory Working Grouprdquo GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy)mdashwhich identifies 14 roles describing each contributorrsquos specific contribution to the scholarly outputmdasha standard CRediT was developed by CASRAI the Consortia Advancing Standards in Research Administration Information See CASRAI ldquoCRediT ndash Contributor Roles Taxonomyrdquo Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 ldquoData Servicesrdquo httpwwwlibumicheduresearch -data-services

113 OCLC Research 2020 ldquoThe Realities of Research Data Managementrdquo Overview httpswwwoclcorgresearchpublications2017oclcresearch-research-data

-managementhtml


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research. 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430

118. Krista M. Soria, Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com


125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections, please visit oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395. T: 1-800-848-5878. T: +1-614-764-6000. F: +1-614-764-6096. www.oclc.org/research

ISBN: 978-1-55653-170-5. DOI: 10.25333/rqgd-b343. RM-PR-216787-WWAE-A4 2009


A key barrier to moving from text strings to identity management is the lack of technology and infrastructure to support it. New tools are needed to index and display information about the entities described, with links to the sources of the identifiers. Since multiple identifiers may point to the same entity, tools to reconcile them will also be needed. Some systems index only the controlled access points, which is a problem when dealing with names represented in different languages. Can library systems be reconfigured to deal with identifiers as the match point, collocation point, and the key to whatever associated labels are displayed and indexed?45
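As a rough illustration of what such reconciliation could look like, the sketch below groups identifiers that have been asserted to point to the same entity and then chooses a display label by language. It is a minimal sketch under stated assumptions, not a description of any existing library system: the identifier strings, the labels, and the function names `cluster_identifiers` and `display_label` are all hypothetical.

```python
from collections import defaultdict


def cluster_identifiers(same_as_pairs):
    """Group identifiers asserted to denote the same entity (simple union-find)."""
    parent = {}

    def find(ident):
        parent.setdefault(ident, ident)
        while parent[ident] != ident:
            parent[ident] = parent[parent[ident]]  # path halving
            ident = parent[ident]
        return ident

    for left, right in same_as_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Keep the lexicographically smallest identifier as the cluster key.
            parent[max(root_left, root_right)] = min(root_left, root_right)

    clusters = defaultdict(set)
    for ident in list(parent):
        clusters[find(ident)].add(ident)
    return dict(clusters)


def display_label(labels, cluster, language, fallback="en"):
    """Choose a display label for a cluster, preferring the requested language."""
    for lang in (language, fallback):
        for ident in sorted(cluster):
            if (ident, lang) in labels:
                return labels[(ident, lang)]
    return None


# Hypothetical identifiers and labels, for illustration only.
pairs = [
    ("viaf:0001", "wikidata:Q0001"),
    ("wikidata:Q0001", "isni:0001"),
    ("viaf:0002", "wikidata:Q0002"),
]
labels = {
    ("viaf:0001", "en"): "Example, Author",
    ("wikidata:Q0001", "es"): "Autora de ejemplo",
}

clusters = cluster_identifiers(pairs)
entity = clusters["isni:0001"]
print(sorted(entity))
print(display_label(labels, entity, "es"))
print(display_label(labels, entity, "fr"))
```

In this sketch the cluster, rather than any single text string, serves as the match point, and the label shown is only a view chosen at display time.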

Some Focus Group members are experimenting with Wikidata as another option to assign identifiers for names not represented in authority files, which would broaden the potential pool of contributors.46 Many libraries are looking toward Wikidata and Wikibase (the software platform underlying Wikidata) to solve some of the long-standing issues faced by technical services departments, archival units, and others.47 Wikidata/Wikibase are viewed as a possible alternative to traditional authority control and have other potential benefits, such as embedded multilingual support and bridging the silos describing the institution’s resources. Focus Group members’ experimentations with Wikidata and OCLC projects using the Wikibase platform indicate that Wikibase is a plausible framework for realizing linked data implementations. This infrastructure could enable the Focus Group and the wider bibliographic and archival communities to focus on the entities that need to be created, their relationship with each other, and how best they can increase discoverability by end-users.


Because Wikidata was originally seeded by drawing data from Wikipedia, representation of books in Wikidata has a focus on “works” and their authors. This focus on works and authors could be viewed as an alternate version of the traditional author/title entries in authority files. Books that are “notable” are more likely to be represented in Wikidata. Recently, an effort to support citations in Wikipedia articles, WikiCite,48 demonstrates a need to register and support identifiers that make up those citations, including information about a specific edition or document.

One of the most practical, and powerful, aspects of identity management is to reduce the amount of copying/pasting in library metadata workflows when an identifier is stewarded in an external location. Identifiers could provide a bridge between MARC and non-MARC environments and to nonlibrary resources. Librarians would not have to be the experts in all domains.49 Many resources curated or managed by libraries are not under authority control, such as digital and archival collections, institutional repositories, and research data. Identifiers could provide links to these resources. Identity management could also bridge the variations of names found in journal articles, scholarly profile services, and library catalogs, transcending these now siloed domains. This bridge is a requirement to fulfill the promises of linked data.

ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts, or subject headings, are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting “alternate” subject headings came to the fore when the Library of Congress’ initial solution to change the LC subject heading for “Illegal aliens” to “Undocumented immigrants” failed to be implemented. This prompted one Focus Group member to comment, “Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive.” End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.50

Some see Faceted Application of Subject Terminology (FAST)51 as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally-uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.52 FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. As FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee53 represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH with tools and services that serve the needs of diverse communities and contexts.54

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.55 For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.56 There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)57 built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.58

A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in Institutional Repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool59 to create and manage various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.60

Linked data could provide the means for local communities to prefer a different label for an established vocabulary’s preferred term for a concept or entity. One might reference a local description of a concept or entity not represented, or not represented satisfactorily, in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, “Nothing is sadder than a vocabulary that someone invented that was left to go stale.” Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution’s EDI goals and principles.

FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, “Dissident art” rather than “Non-conformist art.”)

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading “Mentally handicapped” to “People with mental disabilities.” Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a “cultural sensitivity” message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator’s attitude or the period in which the item was created and may be considered inappropriate today in some contexts.

[Figure 5 chart: “What Areas Have You Changed or Plan to Change Due to Your Institution’s EDI Goals and Principles?” Scale: 0 to 100. Categories: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies. Series: Changed; Plan to change.]

• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may exclude audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than include them as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others’ terminologies or mean different things. The PCC Linked Data Advisory Committee’s Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences, and facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on “Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives”66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.
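One way to picture context-dependent labels is a concept keyed by a stable identifier, with labels stored per vocabulary and language and chosen at display time. The sketch below is illustrative only: the `Concept` class, the identifier `example:concept/0001`, and the vocabulary names are assumptions, and the two labels are borrowed from the “Illegal aliens” and “Undocumented immigrants” case discussed earlier in this section.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    identifier: str  # stable, language-neutral key (hypothetical value below)
    labels: dict = field(default_factory=dict)  # (vocabulary, language) -> label

    def label_for(self, preferences):
        """Return the first label matching an ordered list of (vocabulary, language)."""
        for key in preferences:
            if key in self.labels:
                return self.labels[key]
        # No matching label: show the identifier rather than guess a term.
        return self.identifier


concept = Concept(
    identifier="example:concept/0001",
    labels={
        ("lcsh", "en"): "Illegal aliens",
        ("local", "en"): "Undocumented immigrants",
    },
)

# A catalog that prefers its local vocabulary and falls back to the national one.
print(concept.label_for([("local", "en"), ("lcsh", "en")]))
# A context with no matching label falls back to the identifier.
print(concept.label_for([("lcsh", "fr")]))
```

Because records would carry only the identifier, changing the preferred label in this sketch means changing one preference list, not every record.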

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of “community contribution” for new terms and community voting could be more inclusive. Libraries’ current “consensus environment” excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.
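A change history with provenance could be as simple as an append-only list of label changes attached to each term. The following is a minimal sketch of that idea, not a model of how Wikidata stores its edit history: the class names, the identifier, the contributor names, the rationales, and the dates are hypothetical placeholders, while the two labels come from the heading change mentioned above.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass(frozen=True)
class LabelChange:
    label: str
    contributor: str
    changed_on: date
    rationale: str


@dataclass
class Term:
    identifier: str
    history: list = field(default_factory=list)  # append-only record of changes

    def propose(self, label, contributor, changed_on, rationale):
        """Record a new label together with who proposed it, when, and why."""
        self.history.append(LabelChange(label, contributor, changed_on, rationale))

    @property
    def current_label(self):
        return self.history[-1].label if self.history else None


# Placeholder contributors and dates, for illustration only.
term = Term("example:term/0001")
term.propose("Mentally handicapped", "contributor-a", date(2000, 1, 1), "initial heading")
term.propose("People with mental disabilities", "contributor-b", date(2020, 1, 1),
             "more respectful terminology")

print(term.current_label)
for change in term.history:
    print(change.changed_on, change.contributor, change.label)
```

Earlier labels are never overwritten in this sketch, so anyone consulting the term can see what changed, who changed it, and the stated reason.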

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made. How will libraries resolve or reconcile conflicts between statements?67 Different types of inconsistencies may appear than those seen now, for example different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources.”68 OCLC Research explored how libraries might apply Google’s “Knowledge Vault” to identify statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
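To make the role of provenance concrete, the sketch below groups conflicting statements about the same entity and property, then orders the competing values by a local list of preferred sources while keeping every value and its sources visible. It is an illustrative sketch only and not the Knowledge Vault approach: the function name `rank_statements`, the entity and source names, and the values are hypothetical.

```python
from collections import defaultdict


def rank_statements(statements, preferred_sources):
    """Order competing values for each (entity, property) by best-ranked source.

    No statement is discarded; lower-ranked values stay available with their sources.
    """
    rank = {source: position for position, source in enumerate(preferred_sources)}
    grouped = defaultdict(lambda: defaultdict(list))
    for entity, prop, value, source in statements:
        grouped[(entity, prop)][value].append(source)

    ranked = {}
    for key, values in grouped.items():
        ranked[key] = sorted(
            values.items(),
            # Sources missing from the preferred list rank after all listed ones.
            key=lambda item: min(rank.get(src, len(rank)) for src in item[1]),
        )
    return ranked


# Hypothetical statements: (entity, property, value, source).
statements = [
    ("entity:1", "birthDate", "1901", "source_a"),
    ("entity:1", "birthDate", "1902", "source_b"),
    ("entity:1", "birthDate", "1901", "source_c"),
]
preferred = ["source_b", "source_a"]

ranked = rank_statements(statements, preferred)
for value, sources in ranked[("entity:1", "birthDate")]:
    print(value, sources)
```

Ordering rather than filtering is a deliberate choice in this sketch: a value asserted only by a less-preferred source remains available with its provenance attached.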

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise: speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected by using inadequate vendor records, by creating minimal or less-than-full level descriptions for certain types of resources, and by limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of the large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.

The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance” or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing ldquoInside-Outrdquo and ldquoFacilitatedrdquo CollectionsOCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation curation and discoverability of institutional resources as the ldquoinside-out collectionrdquo (in contrast to the ldquooutside-in collectionrdquo in which the library buys or licenses materials from external providers to make them accessible to a local audience) Providing access to a broader range of local external and collaborative resources around user needs is the ldquofacilitated collectionrdquo74 Focus Group membersrsquo activities have increasingly focused on metadata that will provide access to the resources unique to their institutions as well as those in their consortia or national networks

All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections, and they present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources that provide insights into the world across many centuries and places and supply the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars as well as for libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files. This use case was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions’ digitization of archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.

ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional theme-based collections supporting a specific research area, such as Columbia University’s Human Rights; Historic Preservation and Urban Planning; and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges of creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters. Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived, when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary. Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use. Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency. Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites. Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85

AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. Managing AV resources requires knowledge of the use context as well as of technical metadata issues, which makes for a complex environment in which to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or being commercially licensed as streaming media.

Focus Group discussions focused on the AV resources within archival collections, which are often in deteriorating formats, in large backlogs, and sometimes in need of rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.

The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process: compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.

OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival workflows and into digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections

[Bar chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Challenges shown: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; selection, appraisal, and collection development. Response scale: very interested, interested, somewhat interested, not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas. Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects. Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance. A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images. How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object. Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators. It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people, as well as large-scale discovery.

• Aggregating digital collections. Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog; archive management, digital asset, and preservation systems; the institutional repository; research management systems; and external subscription-based repositories. Targets for sharing this metadata range from tailored topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters” (those that include terms with a similar meaning). In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and to make visible the connections that were formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
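The general idea of replacing strings with identifiers can be illustrated with a minimal Python sketch. This is not the pilot's actual method: the lookup table, the example.org URIs, and the function names `normalize` and `link_record` are all hypothetical, chosen only to show how text values in a record might be swapped for identifiers while unmatched strings are left as plain text.

```python
# Hypothetical lookup table standing in for an authority file or a local
# vocabulary. The example.org URIs are placeholders, not real identifiers.
VOCABULARY = {
    "paris (france)": "https://example.org/place/paris",
    "maps": "https://example.org/subject/maps",
}


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(label.lower().split())


def link_record(record: dict) -> dict:
    """Return a copy of the record with matched strings replaced by identifiers.

    Strings with no match in the vocabulary are kept unchanged, so nothing
    is lost when the vocabulary is incomplete.
    """
    linked = {}
    for field, values in record.items():
        linked[field] = [VOCABULARY.get(normalize(v), v) for v in values]
    return linked


# Illustrative record with one unmatched subject and an irregularly spaced place.
record = {
    "subject": ["Maps", "Cartography"],
    "place": ["Paris  (France)"],
}
linked = link_record(record)
print(linked)
```

Keeping unmatched strings in place is a deliberate choice in this sketch: a value such as "Cartography" stays visible for later manual reconciliation instead of being dropped.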

FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections

RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry “Data Management and Curation in 21st Century Archives” (Sept. 2015)100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
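As a rough illustration of the readme.txt approach quoted above, the following Python sketch renders a small set of project-level and file-level metadata as plain text. The field names come from the elements listed in the quotation; the layout, the sample values, and the function name `build_readme` are assumptions made for this example and are not prescribed by the report.

```python
# Project-level fields taken from the elements named in the quoted passage.
# The ordering and the plain-text layout are illustrative assumptions.
PROJECT_FIELDS = [
    "title", "author", "date", "funder", "copyright holder",
    "description", "keywords", "language",
]


def build_readme(project: dict, files: list) -> str:
    """Render project-level and file-level metadata as readme text."""
    lines = ["PROJECT INFORMATION"]
    for field in PROJECT_FIELDS:
        # Flag gaps explicitly rather than silently omitting a field.
        lines.append(f"{field}: {project.get(field, 'NOT PROVIDED')}")
    lines.append("")
    lines.append("FILES")
    for entry in files:
        lines.append(
            f"{entry['name']} ({entry['format']}; software: {entry['software']})"
        )
    return "\n".join(lines) + "\n"


# Invented sample values, for illustration only.
project = {
    "title": "Example survey of reading habits",
    "author": "A. Researcher",
    "date": "2020-01-15",
    "description": "Responses to a short questionnaire.",
    "keywords": "reading; survey",
    "language": "English",
}
files = [{"name": "survey.csv", "format": "CSV", "software": "R"}]

readme_text = build_readme(project, files)
print(readme_text)
```

Marking missing fields as "NOT PROVIDED" keeps gaps visible to whoever reviews the deposit. Saving the result is a single call such as `pathlib.Path("readme.txt").write_text(readme_text, encoding="utf-8")`.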

All four of the 2017-2018 The Realities of Research Data Management series webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105

National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (the Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full dataset level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry (ROR) to address institutional affiliations.
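One practical property of ORCID iDs is that the final character is a check character, which ORCID documents as ISO 7064 MOD 11-2. The short Python sketch below shows how a metadata workflow might catch mistyped iDs before they are stored. The function names are hypothetical, and the sample iD is the author's, as printed in this report's front matter. A passing check shows only that an iD is well formed, not that it is registered to anyone.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check length, digits, and the final check character (hyphens ignored)."""
    compact = orcid.replace("-", "")
    base, check = compact[:15], compact[15:]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == check.upper()


# The iD printed in this report's front matter passes the check;
# changing its last character makes the check fail.
print(is_valid_orcid("0000-0002-8757-2962"))
print(is_valid_orcid("0000-0002-8757-2963"))
```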

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or dataset deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose support may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service,” that is, consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC's integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution's bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125
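
As a minimal sketch of the kind of catalog data mining mentioned above, the Python below fills in a missing language code only when every related record that carries a code agrees on it. The dictionary record layout, the `group` key that ties related records together, and the function name `fill_missing_language` are assumptions made for illustration; they are not drawn from any Focus Group member's workflow, and real MARC data would need a MARC parser and a more careful definition of "related."

```python
from collections import defaultdict


def fill_missing_language(records):
    """Return copies of the records, inferring 'lang' where related records agree.

    Each record is a dict with a hypothetical 'group' key (identifying a set of
    related records) and an optional 'lang' key holding a language code.
    """
    # First pass: collect the language codes already present in each group.
    langs_by_group = defaultdict(set)
    for rec in records:
        if rec.get("lang"):
            langs_by_group[rec["group"]].add(rec["lang"])

    # Second pass: fill a gap only when the group's known codes are unanimous.
    enriched = []
    for rec in records:
        rec = dict(rec)  # copy, so the input records are left untouched
        known = langs_by_group[rec["group"]]
        if not rec.get("lang") and len(known) == 1:
            rec["lang"] = next(iter(known))
            rec["lang_inferred"] = True  # flag machine-generated values for review
        enriched.append(rec)
    return enriched
```

Flagging inferred values rather than silently writing them is a deliberate choice in this sketch: a group of related records can legitimately mix languages (translations, for instance), so the unanimous-agreement rule leaves those gaps for a cataloger to resolve.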


Visualizations represent another type of metadata service. A striking example is from the Austlang national code-a-thon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom's second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette's publishing houses and weave them into a cohesive story. To do so, it asked the British Library for every author and book title published by the nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette's River of Authors.)127


FIGURE 9 UK Hachette's "River of Authors" generated from the British Library's catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty's representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one "global domain," metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that Artificial Intelligence (or at least machine learning) could reduce the amount of manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an "international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums."133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla's OCLC Research 2019 position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
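
To make the idea of matching names "based on related metadata" concrete, here is a minimal Python sketch that combines plain string similarity with two pieces of supporting metadata. The `NameRecord` fields, the 0.2 bonus weights, and the rule that conflicting birth years veto a match are all assumptions chosen for illustration; this is a hand-written heuristic, not machine learning and not a method reported by the Focus Group.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    """A hypothetical, simplified name record; real authority data is far richer."""
    name: str
    birth_year: Optional[int] = None
    affiliation: Optional[str] = None


def match_score(a: NameRecord, b: NameRecord) -> float:
    """Score from 0.0 to 1.0 for how likely two records describe the same person."""
    # Start from plain string similarity of the lowercased names.
    score = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()

    # Related metadata, part 1: conflicting birth years rule the match out,
    # while agreeing birth years add a small (arbitrary) bonus.
    if a.birth_year is not None and b.birth_year is not None:
        if a.birth_year != b.birth_year:
            return 0.0
        score += 0.2

    # Related metadata, part 2: a shared affiliation adds the same bonus.
    if a.affiliation and b.affiliation:
        if a.affiliation.lower() == b.affiliation.lower():
            score += 0.2

    return min(score, 1.0)
```

In practice a score like this would only rank candidate pairs for human review; the threshold for accepting a match automatically is a local policy decision.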

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who "trail-blaze innovations," which are then routinized for nonprofessionals. These discussions reinforce Padilla's recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to "traditional cataloging activities" (such as original and copy cataloging, authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed, from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual's productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just "production machines."

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or "play time") to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the "easier stuff," while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists to change their mindsets about how they work and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became "lively" because she brought her colleagues along. It created appreciation for "continuous learning," and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch "reading clubs," where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group "video-viewing brown-bag lunches" for staff on new developments such as linked data, so staff can "watch and learn" together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists' collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library's control. Some noted that "cultural differences" exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more "schematic." Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The "holy grail" is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult; they are in demand for higher-paying jobs in the private sector. Focus Group members' experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the "technical services mindset."


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available, to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit's developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
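
The clustering and de-duplication tasks mentioned above can be illustrated with a short Python sketch of "key collision" clustering, the fingerprint-style approach popularized by OpenRefine. This is not MarcEdit's implementation, which the report does not describe; the function names `fingerprint` and `cluster` and the sample headings are assumptions for illustration only.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a heading to a normalized key so that variant forms collide."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and turn punctuation into spaces.
    cleaned = re.sub(r"[^\w\s]", " ", unaccented.lower())
    # Sort the unique tokens so that word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a fingerprint; return only groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    headings = [
        "Smith-Yoshimura, Karen",
        "Karen Smith-Yoshimura",
        "smith yoshimura karen.",
        "Reese, Terry",
    ]
    for group in cluster(headings):
        print(group)
```

Key collision is cheap and transparent, which suits batch review of large files, but it only catches variants that differ in case, punctuation, accents, or word order; misspellings need a distance-based method instead.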

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers' needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA's New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, "We bring order out of a vacuum."142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C's Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: "Don't wait; iterate." In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC's DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of "traditional" work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in the service of library service helps catalogers "do more with less."

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond "traditional" cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution's collections. Focus Group members' linked data activities, including their participation in OCLC Research's Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research's linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for "works."


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as "statements" associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of "metadata as a service," and influencing future staffing requirements.


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. "The OCLC Research Library Partnership." https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura, Karen. 2017. "Metadata Advocacy." Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library's Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. "Program for Cooperative Cataloging." https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category: Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. "What Metadata Managers Expect from and Value about the Research Library Partnership." Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. "Linked Data." International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. "CONTENTdm Linked Data pilot." https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. "WorldCat® OCLC and Linked Data." Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. "BIBFRAME." Bibliographic Framework Initiative. https://www.loc.gov/bibframe

Futornick, Michelle. 2019. "LD4P2: Linked Data for Production: Pathway to Implementation." LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). "An Effective Environment for the Use of Linked Data by Libraries." Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. "Charge for PCC Task Group on Identity Management in NACO." 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. "PCC Task Group on URIs in MARC." Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. "PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge." PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc: 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. "Shift to Linked Data for Production." OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. "Linked Data." Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. "'Future Proofing' of Cataloging." OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. "What is ORCID." Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. "ORCID Open Letter - Publishers." Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. "What is ISNI." Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. "Welcome to HathiTrust." Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. "Browse the Names." Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. "Key Statistics." Accessed 5 May 2020. https://isni.org

24 Library of Congress 2020 “NACO – Name Authority Cooperative Program” Documents and Updates Programs for Cataloging and Acquisitions (PCC) Accessed 19 September 2020 httpwwwlocgovabapccnacoindexhtml

25 Smith-Yoshimura Karen 2015 “Getting Identifiers Created for Legacy Names” Hanging Together The OCLC Research Blog 30 October 2015 httpshangingtogetherorgp=5463

26 Smith-Yoshimura Karen 2013 “Irreconcilable Differences Name Authority Control & Humanities Scholarship” Hanging Together The OCLC Research Blog 27 March 2013 httpshangingtogetherorgp=2621

27 Smith-Yoshimura Karen 2017 “Use Cases for Local Identifiers” Hanging Together The OCLC Research Blog 5 May 2017 httpshangingtogetherorgp=5938

28 OCLC Research 2020 “Registering Researchers in Authority Files” httpswwwoclcorgresearchthemesresearch-collectionsregistering-researchershtml

29 Smith-Yoshimura Karen Janifer Gatenby Grace Agnew Christopher Brown Kate Byrne Matt Carruthers Peter Fletcher Stephen Hearn Xiaoli Li Marina Muilwijk Chew Chiat Naun John Riemer Roderick Sadler Jing Wang Glen Wiley and Kayla Willey 2016 Addressing the Challenges with Organizational Identifiers and ISNI Dublin Ohio OCLC Research httpsdoiorg1025333C3FC9Q

30 Research Organization Registry (ROR) “About” httpsrororgabout

31 V M Abazov B Abbott B S Acharya M Adams T Adams J P Agnew G D Alexeev et al (2014) 2020 “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States” (Archived 24 February 2020) ArXivorg 14051756 httpsarxivorgpdf14051756

32 Smith-Yoshimura Karen 2017 “How Much Metadata Is Practical” Hanging Together The OCLC Research Blog 14 November 2017 httpshangingtogetherorgp=6328

33 University of Minnesota 2020 “ExpertsMinnesota” Find Profiles httpsexpertsumneduenpersons or

University of Illinois at Urbana-Champaign 2020 “Illinois Experts” Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34 The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training fellowship education or career development awards in FY20 See NIH NIAID 2019 “ORCID iD Required for Some Encouraged for All” NIAID Funding News Last reviewed 7 August 2019 httpswwwniaidnihgovgrants-contractsorcid-id-required-some-encouraged-all

See also Lyrasis 2020 “SciENcv and ORCID to Streamline NIH and NSF Grant Applications” LyrasisNow (blog) 8 April 2020 httpslyrasisnoworgsciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35 Smith-Yoshimura Karen 2016 “Metadata Reconciliation” Hanging Together The OCLC Research Blog 28 September 2016 httpshangingtogetherorgp=5710

36 Carruthers Matt (2014) 2020 mcarruthersLCNAF-Named-Entity-Reconciliation GitHub Repository httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37 Deliot Corine Steven Folsom Myung-Ja Han Nancy Lorimer Terry Reese and Adam Schiff 2019 Formulating and Obtaining URIs A Guide to Commonly used Vocabularies and Reference Sources Library of Congress PCC Task Group on URIs in MARC httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38 Smith-Yoshimura Karen 2019 “New Ways of Using and Enhancing Cataloging and Authority Records” Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

39 Smith-Yoshimura Karen 2015 “Persistent Identifiers for Local Collections” Hanging Together The OCLC Research Blog 27 October 2015 httpshangingtogetherorgp=5445

40 See DOI examples in detail from DOI 2020 “DOI System Examples” Accessed 20 September 2020 httpswwwdoiorgdemoshtml and

See ARK examples in detail from Department Dallas (Tex ) Police 1963 “[Photographs of Identification Cards]” Collection University of North Texas The Portal to Texas History digital repository httpstexashistoryunteduark67531metapth346793

41 DataCite “Assign DOIs” httpsdataciteorgdoishtml

Wilkinson Laura J 2020 “Constructing your DOIs” Crossref The Crossref Curriculum Last updated 8 April 2020 httpswwwcrossreforgeducationmember-setupconstructing-your-dois

42 “Identity management” here reflects its usage among metadata specialists (See for example Library of Congress 2018 “Charge for PCC Task Group on Identity Management in NACO” 5 American Bar Association Program for Cooperative Cataloging Revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience for example identity access management as described in Wikiwand “Identity Management” httpswwwwikiwandcomenIdentity_management

43 Smith-Yoshimura Karen 2018 “The Coverage of Identity Management Work” Hanging Together The OCLC Research Blog 8 October 2018 httpshangingtogetherorgp=6805

44 Smith-Yoshimura Karen 2017 “Beyond the Authorized Access Point” Hanging Together The OCLC Research Blog 10 October 2017 httpshangingtogetherorgp=6271

45 Smith-Yoshimura “Coverage of Identity Management” (See note 43)

46 Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez 2018 “Works in Progress Webinar Introduction to Wikidata for Librarians Structuring Wikipedia and Beyond” Produced by OCLC Research 12 June 2018 MP4 video presentation 1151 httpswwwoclcorgresearchevents201806-12html

47 Smith-Yoshimura Karen 2020 “Experimentations with Wikidata/Wikibase” Hanging Together The OCLC Research Blog 18 June 2020 httpshangingtogetherorgp=8002

48 Wikimedia “WikiCite” Home httpsmetawikimediaorgwikiWikiCite

49 Smith-Yoshimura Karen 2016 “Impact of Identifiers on Authority Workflows” Hanging Together The OCLC Research Blog 22 March 2016 httpshangingtogetherorgp=5603

50 Smith-Yoshimura Karen 2019 “Strategies for Alternate Subject Headings and Maintaining Subject Headings” Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

51 OCLC 2020 “FAST (Faceted Application of Subject Terminology)” httpswwwoclcorgenfasthtml

52 Smith-Yoshimura Karen 2016 “Faceted Vocabularies” Hanging Together The OCLC Research Blog 31 October 2016 httpshangingtogetherorgp=5739

53 OCLC 2020 “FAST” (See note 51)

54 OCLC 2020 “FAST (Faceted Application of Subject Terminology)” Heading 3 FAST Policy and Outreach (FPOC) Committee httpswwwoclcorgenfasthtml

55 Smith-Yoshimura Karen 2017 “Vocabulary Control Data in Discovery Environments” Hanging Together The OCLC Research Blog 5 October 2017 httpshangingtogetherorgp=6264

56 National Library New Zealand Government “Ngā Upoko Tukutuku Māori Subject Headings” httpmshupokonatlibgovtnzmshupoko

AIATSIS Pathways Gateway to the AIATSIS Thesauri “Pathways” httpwww1aiatsisgovau

57 Deutsche Nationalbibliothek 2019 “MACS - Multilingual Access to Subjects” (Archived 13 Jan 2019) httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58 Smith-Yoshimura Karen 2019 “Knowledge Organization Systems” Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

59 Synaptica “Ontology Management – Graphite” httpswwwsynapticacomgraphite

60 Smith-Yoshimura Karen 2018 “Are Distributed Models for Vocabulary Maintenance Viable” Hanging Together The OCLC Research Blog 12 April 2018 httpshangingtogetherorgp=6672

61 OCLC Research 2020 “Equity Diversity and Inclusion in the OCLC Research Library Partnership Survey” Overview Accessed 20 September 2020 httpswwwoclcorgresearchareascommunity-catalystsrlp-edihtml

62 Smith-Yoshimura Karen 2018 “Creating Metadata for Equity Diversity and Inclusion” Hanging Together The OCLC Research Blog 7 November 2018 httpshangingtogetherorgp=6833

63 Smith-Yoshimura “Distributed Models” (See note 60)

64 Smith-Yoshimura Karen 2019 “Strategies for Alternate Subject Headings and Maintaining Subject Headings” Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

65 Baxmeyer Jennifer Karen Coyle Joanna Dyla MJ Han Steven Folsom Phil Schreur and Tim Thompson 2017 Linked Data Infrastructure Models Areas of Focus for PCC Strategies Library of Congress PCC Linked Data Advisory Committee httpswwwlocgovabapccdocumentsLinkedDataInfrastructureModelspdf

66 Bone Christine Sharon Farnel Sheila Laroque and Brett Lougheed 2017 “Works in Progress Webinar Decolonizing Descriptions Finding Naming and Changing the Relationship between Indigenous People Libraries and Archives” Produced by OCLC Research 19 October 2017 MP4 video presentation 543500 httpswwwoclcorgresearchevents201710-19html

67 Smith-Yoshimura Karen 2015 “Shift to Linked Data for Production” Hanging Together The OCLC Research Blog 13 May 2015 httpshangingtogetherorgp=5195

68 Smith-Yoshimura Karen 2015 “Working in Shared Files” Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

69 Bruce Washburn and Jeff Mixter 2015 “Works in Progress Webinar Looking Inside the Library Knowledge Vault” Produced by OCLC Research 12 August 2015 MP4 video presentation 574500 httpswwwoclcorgresearchevents201508-12html

70 Smith-Yoshimura Karen 2019 “Systematic Reviews of Our Metadata” Hanging Together The OCLC Research Blog 10 April 2019 httpshangingtogetherorgp=7117

71 Smith-Yoshimura Karen 2015 “Working in Shared Files” Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

72 Jisc Library Services nd “What Is ‘Plan M’” Accessed 21 September 2020 httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura Karen 2020 “Knowledge Management and Metadata” Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

To learn more about the current phase of “Plan M” (May–November 2020) see Grindley Neil “Moving Plan M Forwards – We Need Your Help” Library Services (PlanM) (blog) Jisc 6 May 2020 httpslibraryservicesjiscinvolveorgwp202005planm_nextphase

73 Grindley Neil 2019 “Plan M Definition Principles and Direction” Jisc (Word docx) httplibraryservicesjiscinvolveorgwpfiles201912Plan-M-Definition-and-Direction-1docx

74 Dempsey Lorcan 2016 “Library Collections in the Life of the User Two Directions” LIBER Quarterly 26(4) 338–359 httpdoiorg1018352lq10170

75 Smith-Yoshimura Karen 2019 “Presenting Metadata from Different Sources in Discovery Layers” Hanging Together The OCLC Research Blog 16 April 2019 httpshangingtogetherorgp=7880

76 Smith-Yoshimura Karen 2017 “Metadata for Archival Collections” Hanging Together The OCLC Research Blog 30 May 2017 httpshangingtogetherorgp=5903

77 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Knudson Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Craig Thomas and Holly Tomren 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage 49-51 Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

78 The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at httpswwwoclcorgresearchpartnershipworking-groupsarchives-special-collections-linked-data-reviewhtml

79 Smith-Yoshimura Karen 2020 “Metadata Management in Times of Uncertainty” Hanging Together The OCLC Research Blog 15 June 2020 httpshangingtogetherorgp=7998

80 Smith-Yoshimura Karen 2016 “Metadata for Archived Websites” Hanging Together The OCLC Research Blog 14 March 2016 httpshangingtogetherorgp=5591

81 Archive-It 2008 “Human Rights” Columbia University Libraries Collection (Archived May 2008) httpsarchive-itorgcollections1068

Archive-It 2010 “New York City Places and Spaces” Columbia University Libraries Collection (Archived January 2010) httpsarchive-itorgcollections1757

Archive-It 2010 “Burke Library New York City Religions” Columbia University Libraries Collection (Archived May 2010) httpsarchive-itorgcollections1945

82 NLA “Trove” Archived Websites Sub Collections Accessed 20 September 2020 httpstrovenlagovauwebsite

83 Archive-It 2014 “Collaborative Architecture Urbanism and Sustainability Web Archive (CAUSEWAY)” Ivy Plus Libraries Confederation Collection (Archived June 2014) httpsarchive-itorgcollections4638

Archive-It 2013 “Contemporary Composers Web Archive (CCWA)” Ivy Plus Libraries Confederation Collection (Archived October 2013) httpsarchive-itorgcollections4019

NYARC New York Art Resources Consortium “Web Archiving” httpwwwnyarcorgcontentweb-archiving

84 OCLC Research 2020 “Web Archiving Metadata Working Group” The Problem Addressing the Problem Outputs httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 “Metadata for Audio and Videos” Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress “Standards” Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress “Standards” Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 “Assessing Needs of AV in Special Collections” Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 “Scale & Risk Discussing Challenges to Managing AV Collections in the RLP” Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 “Managing Metadata for Image Collections” Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress “Standards” Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94 Library of Congress “Standards” Metadata Authority Description Schema (MADS) Accessed 21 September 2020 httpwwwlocgovstandardsmads

95 Smith-Yoshimura Karen 2016 “Sharing Digital Collections Workflows” Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 “Europeana Innovation Pilots” Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the World’s Images “Home” Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 “OCLC ResearchWorks IIIF Explorer” httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 “CONTENTdm Linked Data Pilot” Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

100 Smith-Yoshimura Karen 2015 “Data Management and Curation in 21st Century Archives – Part 1” 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 “Metadata for Research Data Management” Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 “Knowledge Management and Metadata” Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105 Faniel Ixchel M 2019 “Let’s Cook Up Some Metadata Consistency” Next (blog) OCLC 21 November 2019 httpwwwoclcorgblogmainlets-cook-up-some-metadata-consistency

106 NCI (National Computational Infrastructure) Australia “Home” Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) “Home” Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network “Home” Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a “collaboration advocating richer connected reusable open metadata for all research outputs” (httpwwwmetadata2020org) The Metadata 2020 Researcher Communications project is outlined here httpwwwmetadata2020orgprojectsresearcher-communications

109 Digital Curation Centre “Disciplinary Metadata” List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory “Metadata Standards Directory Working Group” GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI “CRediT – Contributor Roles Taxonomy” Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 “Data Services” httpwwwlibumicheduresearch-data-services

113 OCLC Research 2020 “The Realities of Research Data Management” Overview httpswwwoclcorgresearchpublications2017oclcresearch-research-data-managementhtml

114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115 Indiana University 2018 “IU will Lead $2 Million Partnership to Expand Access to Research Data IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries” News at IU (Science and Technology) Indiana University 18 October 2018 httpsnewsiuedustories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft 2020 “Microsoft Academic Graph” Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure” See Wittenberg Jamie and Valentin Pentchev “Works in Progress Webinar Democratizing Access to Large Datasets through Shared Infrastructure” Produced by OCLC Research 8 August 2019 MP4 video presentation 583400 httpswwwoclcorgresearchevents2019080819-democratizing-access-large-datasets-shared-infrastructurehtml

116 NISO’s Reproducibility Badging and Definitions now out for public comment may also help researchers extend the benefit of their research to others See “Taxonomy Definitions and Recognition Badging Scheme Working Group | NISO Website” nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 “Services Built on Usage Metrics” Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 “Stacks Serials Search Engines and Students’ Success First-Year Undergraduate Students’ Library Use Academic Achievement and Retention” Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc “Library Impact Data Project (LIDP)” Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 “Alternatives to Statistics for Measuring Success and Value of Cataloging” Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 “Metadata Librarian Cornell University Library” DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 “Metadata Librarian” Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-librarymetadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 “Locate Items in the Library with StackMap” httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 “Home” httpswwwyewnocom

125 Smith-Yoshimura Karen 2019 “New Ways of Using and Enhancing Cataloging and Authority Records” Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) “Austlang National Codeathon” Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA “Trove” Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company – Hachette UK “River of Authors” Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 “Knowledge Organization Systems” Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 “Knowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Review” International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) “About SNAC” What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 “Knowledge Management and Metadata” Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives & Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAM’s mission is to organize share and elevate knowledge about and use of artificial intelligence by libraries archives and museums It was founded in 2018 inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs

See AI4LAM “About” Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 “Alternatives to Statistics for Measuring Success and Value of Cataloging” Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 “New Skill Sets for Metadata Management” Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation” Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646

139 Reese Terry 2018 “MarcEdit 2017 Usage Information” Terry’s Worklog (blog) 9 September 2020 httpblogreesetnetarchives2572

140 Reese Terry 2020 “Working with Linked Data In MarcEdit” MarcEdit Development (blog) Accessed 21 September 2020 httpsmarceditreesetnetworking-with-linked-data-in-marcedit

141 Reese Terry 2018 “MarcEdit Playlist” 139 YouTube videos Last updated 26 December 2018 httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura Karen 2017 “New Skill Sets for Metadata Management” Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

143 “XML and RDF-Based Systems Archives” nd Library Juice Academy (blog) Accessed 22 September 2020 httpslibraryjuiceacademycomcertificatexml-and-rdf-based-systems

Reese Terry 2013 “Tutorials” YouTube (selected) MarcEdit Development (blog) 14 March 2013 httpmarceditreesetnettutorials

“Lynda Online Courses Classes Training Tutorials” nd Lyndacom - from LinkedIn Learning Accessed 22 September 2020 httpswwwlyndacom

“Learn to Code - for Free” nd Codecademy Accessed 22 September 2020 httpswwwcodecademycom

Software Carpentry “Teaching Basic Lab Skills for Research Computing” Upcoming Workshops Accessed 22 September 2020 httpssoftware-carpentryorg

144 “Data on the Web Best Practices” nd Accessed 22 September 2020 httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist (2008) 2020 httpworkingontologistorg

145 Library Workflow Exchange nd “About” Accessed 21 September 2020 httpwwwlibraryworkflowexchangeorgabout

146 OCLC Developer Network 2020 “DevConnect Webinars” httpswwwoclcorgdevelopereventsdevconnect-workshopsenhtml

147 Smith-Yoshimura Karen 2019 “Stewardship of Professional FTEs In Metadata Work and Turnover” Hanging Together The OCLC Research Blog 18 October 2019 httpshangingtogetherorgp=7580

148 OCLC 2020 “WorldCat® OCLC and Linked Data” Shared Entity Management Infrastructure httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009

ADDRESSING THE NEED FOR MULTIPLE VOCABULARIES AND EQUITY, DIVERSITY, AND INCLUSION

Concepts, or subject headings, are particularly thorny, as terminology can differ depending on the time period and discipline. In some cases, terms may be considered pejorative, harmful, or even racist by some communities. Addressing language issues is important as libraries seek to develop relationships and build trust with marginalized communities. The issues around equity, diversity, and inclusion are complex; the vocabulary used in subject headings is just one aspect, and language-neutral identifiers represent one approach.

The issue of supporting “alternate” subject headings came to the fore when the Library of Congress’ initial solution to change the LC subject heading for “Illegal aliens” to “Undocumented immigrants” failed to be implemented. This prompted one Focus Group member to comment, “Being held hostage to a national system slow to change in the face of changing semantics is damaging to libraries, as generally we pride ourselves on being welcoming and inclusive.” End-users hold their libraries accountable for what appears in their catalogs. Although LCSH is the Library of Congress Subject Headings, it is used worldwide, sometimes losing its context.50

Some see Faceted Application of Subject Terminology (FAST)51 as a means to engage the community to mitigate the issues that have driven attempts to develop alternate subject headings for LCSH. A subset of the Focus Group has been applying FAST to records that would otherwise lack any subjects. FAST was originally developed by OCLC as a medium between totally uncontrolled keywords at one end of the spectrum and difficult-to-learn-and-apply precoordinated subject strings at the other end.52 FAST headings provide an easy transition to a linked data environment, since each FAST heading has a unique identifier. Because FAST headings are generated from Library of Congress precoordinated subject headings, they can also include the same terminology that some consider inappropriate or disrespectful.

The recently launched FAST Policy and Outreach Committee53 represents FAST users to oversee community engagement, term contributions, and procedures, and to recommend improvements. Its vision statement reads:

FAST will be a fully supported, widely adopted, and community developed general subject vocabulary derived from LCSH, with tools and services that serve the needs of diverse communities and contexts.54

Multiple overlapping and sometimes conflicting vocabularies already exist in legacy library data.55 For example, Focus Group members in New Zealand add terms from the Māori Subject Headings thesaurus (Ngā Upoko Tukutuku) to the same records as LC subject headings. Focus Group members in Australia add terms authorized in the Australian Institute of Aboriginal and Torres Strait Islander Studies (AIATSIS) Thesauri.56 There may be no satisfactory equivalences across languages. Different concepts in national library vocabularies cannot always be mapped unequivocally to English concepts. The multiyear MACS (Multilingual Access to Subjects)57 project built relationships across three subject vocabularies: Library of Congress Subject Headings, the German GND integrated authority file, and the French RAMEAU (Répertoire d’autorité-matière encyclopédique et alphabétique unifié). It has been a labor-intensive process and is not known to be widely implemented.58
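
As a rough illustration of why such cross-vocabulary links are rarely one-to-one, the sketch below stores each link with a relation type, so a lookup can return an exact match, several close matches, or nothing at all. The relation names follow the SKOS mapping properties as an assumption; the report does not say how MACS encodes its links. The `Crosswalk` class and every term shown are hypothetical placeholders, not real headings.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Link(NamedTuple):
    """One directed link from a source term to a term in another vocabulary."""
    target_vocab: str
    target_term: str
    relation: str  # assumed SKOS-style label: "exactMatch", "closeMatch", ...


class Crosswalk:
    """Hypothetical in-memory store of links between subject vocabularies."""

    def __init__(self) -> None:
        self._links: Dict[Tuple[str, str], List[Link]] = defaultdict(list)

    def add(self, vocab: str, term: str, link: Link) -> None:
        self._links[(vocab, term)].append(link)

    def lookup(self, vocab: str, term: str, target_vocab: str) -> List[Link]:
        """Return every recorded link into target_vocab; the list may be empty."""
        return [
            link
            for link in self._links.get((vocab, term), [])
            if link.target_vocab == target_vocab
        ]


# Placeholder terms only, not real headings from LCSH, GND, or RAMEAU.
crosswalk = Crosswalk()
crosswalk.add("LCSH", "term-a", Link("GND", "term-a-de", "exactMatch"))
crosswalk.add("LCSH", "term-a", Link("RAMEAU", "term-a-fr-1", "closeMatch"))
crosswalk.add("LCSH", "term-a", Link("RAMEAU", "term-a-fr-2", "closeMatch"))

for link in crosswalk.lookup("LCSH", "term-a", "RAMEAU"):
    print(link.relation, link.target_term)
```

Returning a list rather than a single term keeps the "no unequivocal mapping" case visible to whatever consumes the data, instead of silently picking one candidate.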

A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in Institutional Repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool,59 used to create and manage various types of controlled vocabularies, seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.60

Linked data could provide the means for local communities to prefer a different label for an established vocabulary’s preferred term for a concept or entity. One might reference a local description of a concept or entity not represented (or not represented satisfactorily) in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, “Nothing is sadder than a vocabulary that someone invented that was left to go stale.” Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)61 spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.62 Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution’s EDI goals and principles.


FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership63

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, “Dissident art” rather than “Non-conformist art.”)

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading “Mentally handicapped” to “People with mental disabilities.” Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a “cultural sensitivity” message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator’s attitude or the period in which the item was created and may be considered inappropriate today in some contexts.

[Figure 5 chart: “What Areas Have You Changed or Plan to Change Due to Your Institution’s EDI Goals and Principles?” Bars on a scale of 0 to 100 show “Changed” and “Plan to change” for: collection building; selecting materials for digitization; metadata in library catalogs; metadata in archival collections; metadata in digital collections; terminologies or vocabularies.]


• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may exclude audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than being included as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others’ terminologies or mean different things. The PCC Linked Data Advisory Committee’s Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences, and facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on “Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives”66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.
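The idea of a fixed identifier with context-dependent labels can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Concept` class, the example.org URI, and the context tags are hypothetical and are not drawn from any existing vocabulary service.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept known by a persistent identifier, not by a text string."""
    identifier: str                              # hypothetical URI, for illustration only
    labels: dict = field(default_factory=dict)   # context tag -> display label
    fallback: str = "en-US"                      # assumed default context

    def label_for(self, *contexts):
        """Return the label for the first requested context that has one."""
        for context in contexts:
            if context in self.labels:
                return self.labels[context]
        return self.labels[self.fallback]


# Illustrative labels: the identifier never changes, only the display string does.
color = Concept(
    identifier="https://example.org/concept/0001",
    labels={"en-US": "Color", "en-GB": "Colour", "fr": "Couleur"},
)

print(color.label_for("en-GB"))      # a British English catalog
print(color.label_for("de", "fr"))   # no German label, so the French one is used
print(color.label_for("de"))         # nothing matches, so the fallback is used
```

A community-preferred term could be added as one more entry in `labels` without touching the identifier or any record that links to it.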

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of “community contribution” for new terms and community voting could be more inclusive. Libraries’ current “consensus environment” excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the set of associated statements made. How will libraries resolve or reconcile conflicts between statements?67 Different types of inconsistencies may appear than do now, with, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources.”68 OCLC Research explored how libraries might apply Google’s “Knowledge Vault” to identify statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected by using inadequate vendor records, by creating minimal or less-than-full level descriptions for certain types of resources, and by limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of the large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance” or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.
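One way to picture statements that carry their provenance, and conflicts that are surfaced rather than silently resolved, is the following Python sketch. It is not a description of any existing system: the `Statement` structure, the `find_conflicts` helper, the "ex:" identifier, and the source names are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str   # identifier of the entity described (hypothetical)
    prop: str      # property name, e.g. "birthDate"
    value: str
    source: str    # provenance: who asserted the statement


def find_conflicts(statements):
    """Group statements by (subject, property) and keep groups with several values.

    Each distinct value keeps the list of sources asserting it, so no
    statement is discarded and a reviewer can weigh the provenance.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for s in statements:
        grouped[(s.subject, s.prop)][s.value].append(s.source)
    return {
        key: dict(values)
        for key, values in grouped.items()
        if len(values) > 1
    }


# Illustrative data only: two sources agree on a birth year, one differs.
statements = [
    Statement("ex:person/1", "birthDate", "1850", "Source A"),
    Statement("ex:person/1", "birthDate", "1851", "Source B"),
    Statement("ex:person/1", "birthDate", "1850", "Source C"),
    Statement("ex:person/1", "name", "Example, Person", "Source A"),
]
conflicts = find_conflicts(statements)
for (subject, prop), values in conflicts.items():
    print(subject, prop, values)
```

The sketch deliberately reports every competing value with its sources instead of picking a winner, since different statements might be correct in their own contexts.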

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing “Inside-Out” and “Facilitated” Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the “inside-out collection” (in contrast to the “outside-in collection,” in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the “facilitated collection.”74 Focus Group members’ activities have increasingly focused on metadata that will provide access to the resources unique to their institutions, as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places, providing the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for both scholars and libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions digitizing their archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights; Historic Preservation and Urban Planning; and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, often in deteriorating formats, in large backlogs, and sometimes requiring rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in OCLC Research Hanging Together Blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership; incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

[Figure 6 chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Bars on a scale of 0 to 140 show responses of “Very interested,” “Interested,” “Somewhat interested,” and “Not interested” for: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; selection, appraisal, and collection development.]

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical, even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material and therefore limited, if not discredited, any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people, as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture, or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters” (those that include terms with a similar meaning). In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
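The general technique of replacing strings with identifiers can be sketched as a simple lookup. The Python below is a generic illustration, not the method used in the pilot: the `normalize` and `reconcile` helpers, the lookup table, and the example.org identifiers are all hypothetical, and real reconciliation would need far more than exact matching on normalized strings.

```python
def normalize(text):
    """Fold case and collapse whitespace so trivial variants match."""
    return " ".join(text.casefold().split())


def reconcile(record, lookup):
    """Replace strings with identifiers where the lookup table has a match.

    Returns the linked record and the list of strings left unmatched,
    which would need manual review or a local vocabulary entry.
    """
    linked = {}
    unmatched = []
    for field_name, values in record.items():
        linked[field_name] = []
        for value in values:
            identifier = lookup.get(normalize(value))
            if identifier is None:
                unmatched.append(value)
                linked[field_name].append({"label": value})
            else:
                linked[field_name].append({"id": identifier, "label": value})
    return linked, unmatched


# Hypothetical lookup table standing in for an authority file.
LOOKUP = {
    normalize("Paris (France)"): "https://example.org/place/paris",
    normalize("Maps"): "https://example.org/concept/maps",
}

record = {"place": ["paris  (france)"], "subject": ["Maps", "Street plans"]}
linked, unmatched = reconcile(record, LOOKUP)
print(linked)
print(unmatched)
```

Keeping the original string as a label alongside the identifier preserves what the cataloger wrote, and the unmatched list makes visible how much work remains for local vocabularies.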

FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections.


RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry, “Data Management and Curation in 21st Century Archives” (Sept. 2015),100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102


All four of the 2017-2018 The Realities of Research Data Management series webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment: beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105


National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.
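A simple deposit template could check a dataset description against the reuse elements listed above. The Python sketch below is illustrative only: the field names are hypothetical and do not come from Dublin Core, MODS, DDI, or any particular repository.

```python
# Element names are hypothetical labels for the reuse elements named in the
# text; a real repository would map them to its own schema.
REQUIRED = (
    "license",
    "processing_steps",
    "tools",
    "documentation",
    "data_definitions",
    "data_steward",
    "grant_numbers",
)
# Only expected where relevant, for example for field observations.
CONDITIONAL = ("geospatial_coverage", "temporal_coverage")


def missing_elements(metadata: dict, spatiotemporal: bool = False) -> list:
    """List the reuse elements that are absent or empty in a description."""
    wanted = REQUIRED + (CONDITIONAL if spatiotemporal else ())
    return [name for name in wanted if not metadata.get(name)]


sample = {"license": "CC BY 4.0", "tools": ["R"], "data_steward": ""}
gaps = missing_elements(sample)
```

Reporting gaps as a short list, instead of rejecting the deposit outright, fits the concern raised above that forms should stay as simple as possible for researchers.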

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose support may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; and improving relevancy ranking, personalizing search results, offering recommendation services in the discovery layer, and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis of collections and a view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125
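As one hedged illustration of the enrichment idea above, the Python sketch below fills in a missing language code only when every related record that has a code agrees on it. The record layout (`id`, `group`, `lang`) and the unanimity rule are assumptions made for this example; they are not taken from any project described in this report.

```python
from collections import defaultdict


def fill_language_codes(records):
    """Copy a language code into records that lack one.

    Records are treated as related when they share a "group" key (a
    hypothetical stand-in for, say, a shared edition identifier). A code is
    copied only when the related records agree on a single value, and every
    inferred value is flagged so that it can be reviewed.
    """
    codes_by_group = defaultdict(set)
    for rec in records:
        if rec.get("lang"):
            codes_by_group[rec["group"]].add(rec["lang"])

    filled = []
    for rec in records:
        rec = dict(rec)  # work on a copy; the input is left unchanged
        codes = codes_by_group.get(rec["group"], set())
        if not rec.get("lang") and len(codes) == 1:
            rec["lang"] = next(iter(codes))
            rec["lang_inferred"] = True
        filled.append(rec)
    return filled


records = [
    {"id": 1, "group": "a", "lang": "fre"},
    {"id": 2, "group": "a", "lang": ""},
    {"id": 3, "group": "b", "lang": "eng"},
    {"id": 4, "group": "b", "lang": "ger"},
    {"id": 5, "group": "b"},
]
enriched = fill_language_codes(records)
```

Record 5 is deliberately left without a code because its related records disagree; ambiguous cases like that are better routed to a cataloger than guessed.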


Visualizations represent another type of metadata service. A striking example is from the Austlang national codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8. Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon.

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127


FIGURE 9. UK Hachette’s “River of Authors,” generated from the British Library’s catalog metadata.

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than aiming for one “global domain,” metadata specialists could add value by building bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s OCLC Research 2019 position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to “traditional cataloging activities” (such as original and copy cataloging, authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, change their mindsets about how they work, and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult; they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features, such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
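To give a sense of what lightweight clustering can look like in a short script, here is a Python sketch of the key-collision “fingerprint” approach popularized by tools such as OpenRefine. It is an illustration under that assumption only and is not a description of how MarcEdit implements its clustering; the function names and sample headings are invented for the example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a heading to a key that ignores case, accents, punctuation,
    and word order."""
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = unaccented.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a fingerprint; return groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


names = [
    "Smith-Yoshimura, Karen",
    "Karen Smith-Yoshimura",
    "Reese, Terry",
    "Terry Reese.",
    "Zeng, Marcia",
]
candidates = cluster(names)
```

Each group is only a candidate set of variant forms. Two different people can share a fingerprint, so a person (or additional metadata such as dates or affiliations) should confirm a match before records are merged.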

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification; descriptive standards used in various academic disciplines; describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in the service of library service helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,¹⁴⁸ launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University
• Suzanne Pilsk, Smithsonian Institution
• Greg Reeve, Brigham Young University
• Alexander Whelan, Columbia University
• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University
• Sharon Farnel, University of Alberta
• Steven Folsom, Harvard University and Cornell University
• Erin Grant, University of Washington
• Dawn Hale, Johns Hopkins University
• Myung-Ja Han, University of Illinois Urbana-Champaign
• Kate Harcourt, Columbia University
• Corey Harper, New York University
• Stephen Hearn, University of Minnesota
• Daniel Lovins, Yale University
• Roxanne Missingham, Australian National University
• Chew Chiat Naun, Cornell University and Harvard University
• Suzanne Pilsk, Smithsonian
• John Riemer, University of California, Los Angeles
• Carlen Ruschoff, University of Maryland
• Philip Schreur, Stanford University
• Jackie Shieh, George Washington University
• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category: Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe

Futornick, Michelle. 2019. “LD4P2 Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee, 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC, Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups, 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. “What is Orcid?” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. “What is ISNI?” Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). “About.” https://ror.org/about

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org, 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection, University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref, The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf) But the term has other meanings depending on the audience, for example, identity access management as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura. “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku / Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833

63. Smith-Yoshimura. “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase

73. Grindley, Neil. 2019. “Plan M: Definition, Principles and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 583400. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com


125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia, 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections, please visit oc.lc/digitizing.

6565 Kilgour Place, Dublin, Ohio 43017-3395 | T: 1-800-848-5878 | T: +1-614-764-6000 | F: +1-614-764-6096 | www.oclc.org/research

ISBN: 978-1-55653-170-5 | DOI: 10.25333/rqgd-b343 | RM-PR-216787-WWAE-A4 2009



A growing percentage of data in institutions’ discovery layers comes from non-MARC, nonlibrary sources. Metadata describing universities’ research data and materials in Institutional Repositories is usually treated differently, and separately. How should institutions provide normalization and access to the entities described so users do not experience the “collision of name spaces” and ambiguous terms (or terms meaning different things depending on the source)? Synaptica Knowledge Solutions’ Ontology Management – Graphite tool[59] to create and manage various types of controlled vocabularies seems promising in this context.

Focus Group members cited examples of established vocabularies or datasets that have become outdated or do not provide for local needs or sensibilities. Slow or unresponsive maintenance models for established vocabularies have tempted some to consider distributed models. High training thresholds to participate in current models have contributed to a desire for alternatives.[60]

Linked data could provide the means for local communities to prefer a different label for an established vocabulary’s preferred term for a concept or entity. One might reference a local description of a concept or entity not represented (or not represented satisfactorily) in established vocabularies or linked data sources. If these kinds of amendments and additions are made possible in a linked data environment, others could agree (or disagree) with the point of view by linking to the new resource. Such a distributed model for managing both terminology and entity description raises issues around metadata stability expectations, metadata interoperability, and metadata maintenance. How could a distributed model avoid people duplicating work on the same entity or concept? How would a distributed model record the trustworthiness of the contributors, or determine who would be allowed to contribute?


Stability and permanence issues have been highlighted by the numerous vocabularies created for specific projects that, once funding ended, remain frozen in time. As one Focus Group member noted, “Nothing is sadder than a vocabulary that someone invented that was left to go stale.” Such examples provide a major reason for librarians wanting to rely on international authority files rather than on local solutions. They also exemplify the value of the Library of Congress taking on the entire cost of creating and maintaining LCSH.

The OCLC Research report on the findings from a 2017 survey of the Research Library Partnership on equity, diversity, and inclusion (EDI)[61] spurred discussions on the complexity of embedding equity, diversity, and inclusion in controlled vocabularies in library catalogs.[62] Educational institutions and libraries have undertaken EDI initiatives, and metadata departments have been struggling to support them. The excerpt from the EDI survey in figure 5 shows that metadata in library catalogs lags behind other areas in support of the institution’s EDI goals and principles.


FIGURE 5 Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership[63]

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and to replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, “Dissident art” rather than “Non-conformist art.”)

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading “Mentally handicapped” to “People with mental disabilities.” Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.[64] Some noted it would be less labor-intensive to present a “cultural sensitivity” message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator’s attitude or the period in which the item was created, and may be considered inappropriate today in some contexts.

[Figure 5 chart: “What Areas Have You Changed or Plan to Change Due to Your Institution’s EDI Goals and Principles?” Vertical axis: 0 to 100. Categories: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies. Series: Changed; Plan to change.]


• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may exclude audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than being included as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others’ terminologies or mean different things. The PCC Linked Data Advisory Committee’s Linked Data Infrastructure Models: Areas of Focus for PCC Strategies[65] describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences, and facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate both different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on “Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries, and Archives”[66] described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.
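As a minimal sketch of the idea of separating an identifier from its labels (an illustration only, not a description of any system the Focus Group used), a concept can be keyed by an identifier and carry several labels, with the display label chosen by context. The `Concept` class, the `example:concept/0001` identifier, and the context tags are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A concept keyed by a persistent identifier rather than by a text string."""
    identifier: str                                        # hypothetical, not a real URI
    labels: dict[str, str] = field(default_factory=dict)   # context tag -> label
    default_context: str = "en-US"

    def label_for(self, context: str) -> str:
        """Return the label for a context, falling back to the default label."""
        return self.labels.get(context, self.labels[self.default_context])


# Illustrative data only: the identifier and context tags are made up.
colour = Concept(
    identifier="example:concept/0001",
    labels={"en-US": "Color", "en-GB": "Colour", "fr": "Couleur"},
)

print(colour.label_for("en-GB"))
print(colour.label_for("de"))  # no German label recorded, so the default is used
```

Because records would point at the identifier, adding or changing a label in this sketch touches only the `labels` mapping and leaves every reference to the concept unchanged.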

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of “community contribution” for new terms and community voting could be more inclusive. Libraries’ current “consensus environment” excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made. How will libraries resolve or reconcile conflicts between statements?[67] Different types of inconsistencies may appear than do now, with, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources.”[68] OCLC Research explored how libraries might apply Google’s “Knowledge Vault” to identify statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”[69] Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
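One way to picture statement-level provenance is sketched below, under stated assumptions: each statement records its source, conflicting values for the same entity and property are grouped, and a local “preferred sources” list only orders the group without discarding anything. The `Statement` class, the source names, the identifier, and the dates are all invented for illustration, and ranking by a preferred list is just one possible policy.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    entity: str    # identifier of the entity being described (hypothetical)
    prop: str      # property name, e.g. "birth_date"
    value: str     # asserted value
    source: str    # provenance: who made the statement (hypothetical names)


def find_conflicts(statements):
    """Return {(entity, prop): [statements]} where sources assert different values."""
    grouped = defaultdict(list)
    for st in statements:
        grouped[(st.entity, st.prop)].append(st)
    return {
        key: group
        for key, group in grouped.items()
        if len({st.value for st in group}) > 1
    }


def rank_by_preferred_source(group, preferred_sources):
    """Order statements by a local preferred-source list; nothing is discarded."""
    def rank(st):
        if st.source in preferred_sources:
            return preferred_sources.index(st.source)
        return len(preferred_sources)  # unlisted sources sort last
    return sorted(group, key=rank)


# Invented example data.
statements = [
    Statement("example:person/42", "birth_date", "1901-03-04", "source-a"),
    Statement("example:person/42", "birth_date", "1902-03-04", "source-b"),
    Statement("example:person/42", "name", "A. Example", "source-a"),
]

conflicts = find_conflicts(statements)
ranked = rank_by_preferred_source(
    conflicts[("example:person/42", "birth_date")],
    preferred_sources=["source-b", "source-a"],
)
for st in ranked:
    print(st.source, st.value)
```

Keeping every statement, with its source attached, leaves room for a different institution to apply a different preferred-source list to the same data.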

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise: speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected by using inadequate vendor records, by creating minimal or less-than-full level descriptions for certain types of resources, and by limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of the large volume of materials and insufficient staff resources.[70] These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance,” or whether data should be distributed with peer-to-peer sharing.[71] Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.[72] Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”[73] Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing “Inside-Out” and “Facilitated” Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the “inside-out collection” (in contrast to the “outside-in collection,” in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the “facilitated collection.”[74] Focus Group members’ activities have increasingly focused on metadata that will provide access to the resources unique to their institutions, as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.[75] This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources providing insights into the world across many centuries and places, and the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars, libraries, and archives alike. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.[76]

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot[77] and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.[78]

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions digitizing their archival and distinctive collections that have been available only in physical form.[79] More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.[80]

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights, Historic Preservation and Urban Planning, and New York City Religions.[81] National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)[82] collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.[83]

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?
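The multi-institutional question in the last bullet can be made concrete with a small sketch: if each capture record carries the site URL, the capturing institution, and the capture date, captures of the same site can be merged into one date-ordered timeline. Everything here is an assumption for illustration, including the `Capture` fields, the crude URL normalization in `site_key`, and the sample data; real web archives record far more (crawl scope, capture completeness, and so on).

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Capture:
    url: str
    institution: str
    captured_on: date


def site_key(url):
    """Crude normalization (an assumption): drop scheme, 'www.' and trailing slash."""
    parts = urlsplit(url)
    host = (parts.hostname or "").removeprefix("www.")
    return host + parts.path.rstrip("/")


def merged_timelines(captures):
    """Group captures by normalized site and sort each group by capture date."""
    by_site = defaultdict(list)
    for cap in captures:
        by_site[site_key(cap.url)].append(cap)
    return {
        site: sorted(caps, key=lambda cap: cap.captured_on)
        for site, caps in by_site.items()
    }


# Invented example data.
captures = [
    Capture("https://www.example.org/", "Library A", date(2019, 5, 1)),
    Capture("http://example.org", "Library B", date(2018, 3, 15)),
    Capture("https://example.org/events/", "Library A", date(2020, 1, 10)),
]

timelines = merged_timelines(captures)
for cap in timelines["example.org"]:
    print(cap.captured_on.isoformat(), cap.institution)
```

The sketch needs Python 3.9 or later for `str.removeprefix`. The hard part in practice is the step that `site_key` glosses over: deciding when two institutions' seeds refer to the same site.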

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.[84] The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.[85]


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique local collections.[86] However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”[87] Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning, or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, often in deteriorating formats, in large backlogs, and sometimes requiring rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,[88] which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),[89] the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.
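As a rough illustration of the technical details listed above, the sketch below models them as a small record and reports which ones are still missing for an item. The field names are hypothetical and chosen only to mirror the paragraph; they are not PREMIS element names, and the sample values are invented.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AVTechnicalMetadata:
    """Hypothetical record for the technical details named in the text."""
    compression: Optional[str] = None
    year_digitized: Optional[int] = None
    capture_technology: Optional[str] = None
    file_format: Optional[str] = None


def missing_fields(record):
    """List the technical details that have not been recorded yet."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Invented example: only two of the four details are known for this item.
record = AVTechnicalMetadata(compression="lossless", year_digitized=2019)
print(missing_fields(record))
```

A check like this could help prioritize a backlog by showing which items lack the information needed to plan for continued access.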

OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership; incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections

[Figure 6: bar chart titled “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). The horizontal axis runs from 0 to 140. Response levels: Very interested, Interested, Somewhat interested, Not interested. Challenges charted: Resource allocation, assessment, and prioritization; Digital asset management and preservation; Physical collection management; Digitization and preservation reformatting; Incorporating into digital collection workflows; Incorporating into archival workflows; Selection, appraisal, and collection development.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources at the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or as unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries use Dublin Core for their image collections’ metadata, and others use MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress and designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences among those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of the same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people, as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.
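The “variety of systems and schemas” challenge above, where departmental metadata arrives in spreadsheets, is often handled with a crosswalk from local column names to a shared schema. The sketch below is one minimal way to do that with Python's standard library, mapping spreadsheet rows to simple Dublin Core elements. The column names, the `CROSSWALK` table, the `record` wrapper element, and the sample row are all hypothetical; only the Dublin Core element namespace is taken from the standard.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical crosswalk: local spreadsheet column -> Dublin Core element.
CROSSWALK = {"Image title": "title", "Photographer": "creator", "Year": "date"}


def row_to_dc(row: dict) -> ET.Element:
    """Build one record, skipping empty cells and trimming stray whitespace."""
    record = ET.Element("record")  # hypothetical wrapper element
    for column, element in CROSSWALK.items():
        value = (row.get(column) or "").strip()
        if value:
            ET.SubElement(record, f"{{{DC_NS}}}{element}").text = value
    return record


# Invented sample data standing in for a departmental spreadsheet export.
sheet = io.StringIO("Image title,Photographer,Year\nMarket square, J. Doe ,1931\n")
records = [row_to_dc(row) for row in csv.DictReader(sheet)]
xml_text = ET.tostring(records[0], encoding="unicode")
```

Keeping the crosswalk in a single table makes the mapping easy to review with the originating department, which is where much of the “editing, massaging, and manual review” tends to be negotiated.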

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters,” that is, those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and to make visible connections that were formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
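The idea of replacing strings with identifiers can be illustrated with a small sketch. This is not the pilot's code: the `AUTHORITY` lookup table, the field names, and the `reconcile` helper are hypothetical, and the URIs are placeholders rather than real authority identifiers. Strings that find no match are kept as literals and reported, since unmatched values usually need human review.

```python
# Illustrative sketch only: AUTHORITY, the field names, and reconcile() are
# hypothetical, and the URIs are placeholders rather than real identifiers.
AUTHORITY = {
    ("place", "paris (france)"): "https://example.org/place/0001",
    ("subject", "maps"): "https://example.org/subject/0042",
}


def normalize(value: str) -> str:
    """Collapse whitespace and case so trivial variants match the same key."""
    return " ".join(value.split()).lower()


def reconcile(record: dict) -> tuple[dict, list]:
    """Replace matched strings with identifiers; collect unmatched strings."""
    linked, unmatched = {}, []
    for field, values in record.items():
        linked[field] = []
        for value in values:
            uri = AUTHORITY.get((field, normalize(value)))
            if uri is None:
                unmatched.append((field, value))  # left for manual review
                linked[field].append({"literal": value})
            else:
                linked[field].append({"id": uri, "label": value})
    return linked, unmatched


record = {"place": ["Paris  (France)"], "subject": ["Maps", "Street plans"]}
linked, unmatched = reconcile(record)
```

Keeping the original string as a label alongside the identifier preserves what the cataloger recorded while letting the identifier carry the link to external context.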

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections

RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry “Data Management and Curation in 21st Century Archives” (Sept 2015)100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
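A template for such a readme file can be very small. The sketch below shows one possible way to generate it; the field list follows a subset of the elements named in the quotation, while the function name, the layout, and the sample values are invented for illustration. Missing fields are flagged explicitly so that gaps are visible to whoever reviews the deposit.

```python
# Hypothetical sketch: the field list follows elements named in the quotation
# above; the function name, layout, and sample values are invented.
PROJECT_FIELDS = ["title", "author", "date", "funder", "copyright holder",
                  "description", "keywords", "language"]


def build_readme(project: dict, files: list[dict]) -> str:
    """Render project-level and file-level metadata as plain readme text."""
    lines = ["PROJECT INFORMATION"]
    for field in PROJECT_FIELDS:
        # Flag gaps explicitly so missing metadata is visible to a reviewer.
        lines.append(f"{field.title()}: {project.get(field, '[not provided]')}")
    lines.append("")
    lines.append("FILES")
    for f in files:
        lines.append(f"{f['name']} | format: {f['format']} | software: {f['software']}")
    return "\n".join(lines) + "\n"


readme = build_readme(
    {"title": "Sample survey", "author": "A. Researcher", "date": "2020"},
    [{"name": "responses.csv", "format": "CSV", "software": "any spreadsheet tool"}],
)
```

The resulting text could be saved next to the data with `pathlib.Path("readme.txt").write_text(readme)`.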

All four webinars in the 2017-2018 series The Realities of Research Data Management,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, from the planning our researchers do before and during the creation of data, to managing the data, to disseminating the knowledge gained, and finally to understanding the impact, engagement, and resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105

National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed that data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (the Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
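One practical benefit of structured identifiers is that obvious typing errors can be caught before a record is saved. ORCID iDs, for example, end in a check character computed with the ISO 7064 MOD 11-2 algorithm. The sketch below checks only the shape and the check character of an iD; it cannot confirm that the iD is registered or belongs to a given person. The function names are illustrative, and the sample value is the author iD printed on this report's title page.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 0000-0000-0000-000X shape and the final check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# The author iD from this report's title page passes the check;
# changing the last character makes it fail.
ok = is_valid_orcid("0000-0002-8757-2962")
bad = is_valid_orcid("0000-0002-8757-2963")
```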

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize the data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research report series The Realities of Research Data Management classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose support may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring the impact of library usage on research, student success, or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service,” that is, consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125
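As a rough illustration of the enrichment idea, the sketch below proposes a language code for a record that lacks one when all related records that do carry a code agree. It is a simplified assumption about how such data mining might work, not a description of any library's actual process: the record layout (`id`, `work_id`, `lang`) and the sample records are invented, and proposals would still need review, since related records can legitimately differ in language (translations, for instance).

```python
from collections import defaultdict


def propose_language_codes(records: list[dict]) -> dict[str, str]:
    """Map record id -> proposed language code for records lacking one."""
    codes_by_work = defaultdict(set)
    for rec in records:
        if rec.get("lang"):
            codes_by_work[rec["work_id"]].add(rec["lang"])
    proposals = {}
    for rec in records:
        known = codes_by_work[rec["work_id"]]
        # Only propose when the related records agree on a single code.
        if not rec.get("lang") and len(known) == 1:
            proposals[rec["id"]] = next(iter(known))
    return proposals


# Invented sample: r2 can inherit "fre"; r5 cannot, because r3 and r4 disagree.
sample = [
    {"id": "r1", "work_id": "w1", "lang": "fre"},
    {"id": "r2", "work_id": "w1", "lang": ""},
    {"id": "r3", "work_id": "w2", "lang": "eng"},
    {"id": "r4", "work_id": "w2", "lang": "ger"},
    {"id": "r5", "work_id": "w2"},
]
proposals = propose_language_codes(sample)
```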

Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics, statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story. Hachette asked the British Library for every author and book title published by its nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors and featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127

FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade but could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
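Even without machine learning, a simple heuristic shows what matching names “based on related metadata” can mean in practice. The sketch below is a deliberately naive stand-in for the approaches Focus Group members have in mind: it compares normalized name strings with Python's standard `difflib.SequenceMatcher` and uses one piece of related metadata, a birth year, to rule out conflicting candidates. The record layout and the sample names are invented for illustration.

```python
from difflib import SequenceMatcher


def name_key(name: str) -> str:
    """Normalize 'Surname, Forename' and 'Forename Surname' to one sorted form."""
    tokens = name.replace(",", " ").replace(".", " ").lower().split()
    return " ".join(sorted(tokens))


def match_score(a: dict, b: dict) -> float:
    """Name similarity in [0, 1], zeroed when known birth years conflict."""
    if a.get("birth") and b.get("birth") and a["birth"] != b["birth"]:
        return 0.0
    return SequenceMatcher(None, name_key(a["name"]), name_key(b["name"])).ratio()


# Invented examples: same person in two citation styles, then a namesake
# whose birth year conflicts and is therefore ruled out.
same = match_score({"name": "Smith, Jane", "birth": 1950},
                   {"name": "Jane Smith", "birth": 1950})
namesake = match_score({"name": "Smith, Jane", "birth": 1950},
                       {"name": "Jane Smith", "birth": 1982})
```

In a real workflow a score like this would only rank candidates for a person to confirm; the contextual metadata does the disambiguating, not the string comparison.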

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance in allocating staff between “traditional cataloging activities” (such as original and copy cataloging and authority work) and more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists to change their mindsets about how they work and stimulate interest in learning opportunities Focus Group members have used several approaches

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists use tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background who is interested in metadata services. Retaining staff with IT skills is difficult because they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists who must then learn the “technical services mindset.”

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools, such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros, for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
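To make the idea of lightweight clustering more concrete, the following Python sketch shows one common approach, key-collision ("fingerprint") clustering of the kind OpenRefine offers. It is an illustration only and is not taken from MarcEdit: the function names `fingerprint` and `cluster` and the sample headings are assumptions made for this example, and real tools apply more refined normalization.

```python
import re
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key: drop accents and punctuation, lowercase, sort unique tokens."""
    # Decompose accented characters and discard the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then turn every ASCII punctuation character into a space.
    cleaned = re.sub(f"[{re.escape(string.punctuation)}]", " ", unaccented.lower())
    # Sorting the unique tokens makes the key independent of word order.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a fingerprint; keep only groups with several distinct forms."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


if __name__ == "__main__":
    # Hypothetical name headings as they might appear in a batch of records.
    sample = [
        "Smith-Yoshimura, Karen",
        "Karen Smith-Yoshimura",
        "Reese, Terry",
        "Terry Reese.",
        "Godby, Jean",
    ]
    for key, forms in cluster(sample).items():
        print(key, "->", forms)
```

Each cluster is only a candidate for merging. A metadata specialist still needs to confirm that the variant forms refer to the same entity, since two different people can share a fingerprint.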

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Round Table. Emphasizing that metadata encompasses much more than library cataloging (entity identification; descriptive standards used in various academic disciplines; describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Codecademy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
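As an illustration of the kind of simple scripting mentioned above, the following Python sketch groups identifiers from different registries that have been asserted to describe the same entity. It is a minimal sketch under stated assumptions: the function name `merge_equivalents` is hypothetical, the identifiers are placeholders rather than real registry values, and the equivalence pairs are assumed to have been verified already.

```python
def merge_equivalents(pairs):
    """Return sorted clusters of identifiers linked by 'same entity' assertions (union-find)."""
    parent = {}

    def find(ident):
        # Register unseen identifiers as their own cluster root.
        parent.setdefault(ident, ident)
        while parent[ident] != ident:
            parent[ident] = parent[parent[ident]]  # path halving keeps chains short
            ident = parent[ident]
        return ident

    # Join the clusters of every asserted pair.
    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    # Collect the members of each cluster under its root.
    clusters = {}
    for ident in list(parent):
        clusters.setdefault(find(ident), set()).add(ident)
    return sorted(sorted(members) for members in clusters.values())


if __name__ == "__main__":
    # Placeholder identifiers; real ISNI or ORCID values would go here.
    asserted = [
        ("local:0001", "isni:PLACEHOLDER-A"),
        ("isni:PLACEHOLDER-A", "orcid:PLACEHOLDER-A"),
        ("local:0002", "orcid:PLACEHOLDER-B"),
    ]
    for group in merge_equivalents(asserted):
        print(group)
```

Because equivalence is transitive, a local record linked to one registry, which is in turn linked to a second registry, ends up in a single cluster even though the local record never cited the second registry directly. That is also why one wrong assertion can merge two distinct identities, so the input pairs deserve review.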

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait, iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

IMPACT

The next generation of metadata will become even more focused on entities, rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

ACKNOWLEDGMENTS

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

APPENDIX

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois at Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

NOTES

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe

Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO,” 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. “What is ORCID?” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. “What is ISNI?” Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). “About.” https://ror.org/about

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org: 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection, University of North Texas. The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO,” 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf) But the term has other meanings depending on the audience, for example, identity access management, as described in Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly rated webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research. 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan. 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research. 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research. 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase

73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79 Smith-Yoshimura Karen 2020 ldquoMetadata Management in Times of Uncertaintyrdquo Hanging Together The OCLC Research Blog 15 June 2020 httpshangingtogetherorgp=7998

80 Smith-Yoshimura Karen 2016 ldquoMetadata for Archived Websitesrdquo Hanging Together The OCLC Research Blog 14 March 2016 httpshangingtogetherorgp=5591

81. Archive-It. 2008. "Human Rights." Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068

Archive-It. 2010. "New York City Places and Spaces." Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757

Archive-It. 2010. "Burke Library New York City Religions." Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945

82. NLA. "Trove." Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. "Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY)." Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638

Archive-It. 2013. "Contemporary Composers Web Archive (CCWA)." Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. "Web Archiving." http://www.nyarc.org/content/web-archiving


84. OCLC Research. 2020. "Web Archiving Metadata Working Group." The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C

86. Smith-Yoshimura, Karen. 2018. "Metadata for Audio and Videos." Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F

88. Library of Congress. "Standards." Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead

89. Library of Congress. "Standards." Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis

90. Weber, Chela Scott. 2019. "Assessing Needs of AV in Special Collections." Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405

Weber, Chela Scott. 2019. "Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP." Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479

91. Smith-Yoshimura, Karen. 2015. "Managing Metadata for Image Collections." Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130

92. Library of Congress. "Standards." Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods

93. Ibid.

94. Library of Congress. "Standards." Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95. Smith-Yoshimura, Karen. 2016. "Sharing Digital Collections Workflows." Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744

96. OCLC Research. 2020. "Europeana Innovation Pilots." Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World's Images. "Home." Accessed 20 September 2020. https://iiif.io

98. OCLC Research. 2020. "OCLC ResearchWorks IIIF Explorer." https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99. OCLC Research. 2020. "CONTENTdm Linked Data Pilot." Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html


100. Smith-Yoshimura, Karen. 2015. "Data Management and Curation in 21st Century Archives - Part 1." Hanging Together: The OCLC Research Blog. 21 September 2015. http://hangingtogether.org/?p=5375

101. Smith-Yoshimura, Karen. 2016. "Metadata for Research Data Management." Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html

104. Smith-Yoshimura, Karen. 2020. "Knowledge Management and Metadata." Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

105. Faniel, Ixchel M. 2019. "Let's Cook Up Some Metadata Consistency." Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. "Home." Accessed 21 September 2020. http://nci.org.au

ADA (Australian Data Archive). "Home." Accessed 21 September 2020. https://www.ada.edu.au

107. Portage Network. "Home." Accessed 21 September 2020. https://portagenetwork.ca

108. Metadata 2020 is a "collaboration advocating richer, connected, reusable, open metadata for all research outputs" (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109. Digital Curation Centre. "Disciplinary Metadata." List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list

110. RDA Metadata Directory. "Metadata Standards Directory Working Group." GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor's specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. "CRediT - Contributor Roles Taxonomy." Accessed 21 September 2020. https://casrai.org/credit

112. University of Michigan Library. 2020. "Data Services." http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. "The Realities of Research Data Management." Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039

115. Indiana University. 2018. "IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries." News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft. 2020. "Microsoft Academic Graph." Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph

For more details, watch the August 2019 recording of "Democratizing Access to Large Datasets through Shared Infrastructure." See Wittenberg, Jamie, and Valentin Pentchev. "Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure." Produced by OCLC Research. 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116. NISO's Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See "Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website." n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging

117. Smith-Yoshimura, Karen. 2015. "Services Built on Usage Metrics." Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430

118. Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. "Stacks, Serials, Search Engines, and Students' Success: First-Year Undergraduate Students' Library Use, Academic Achievement, and Retention." Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002

119. See Jisc. "Library Impact Data Project (LIDP)." Accessed 21 September 2020. http://www.activitydata.org/LIDP.html

120. Smith-Yoshimura, Karen. 2019. "Alternatives to Statistics for Measuring Success and Value of Cataloging." Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

121. DLF (Digital Library Federation). 2015. "Metadata Librarian, Cornell University Library." DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. "Metadata Librarian." Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. "Locate Items in the Library with StackMap." https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. "Home." https://www.yewno.com


125. Smith-Yoshimura, Karen. 2019. "New Ways of Using and Enhancing Cataloging and Authority Records." Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). "Austlang National Codeathon." Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA. "Trove." Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company - Hachette UK. "River of Authors." Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. "Knowledge Organization Systems." Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. "Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review." International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). "About SNAC." What is SNAC? https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. "Knowledge Management and Metadata." Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM's mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. "About." Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. "Alternatives to Statistics for Measuring Success and Value of Cataloging." Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. "New Skill Sets for Metadata Management." Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. "MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation." Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646


139. Reese, Terry. 2018. "MarcEdit 2017 Usage Information." Terry's Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. "Working with Linked Data In MarcEdit." MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141. Reese, Terry. 2018. "MarcEdit Playlist." 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. "New Skill Sets for Metadata Management." Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

143. "XML and RDF-Based Systems Archives." n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. "Tutorials." YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials

"Lynda: Online Courses, Classes, Training, Tutorials." n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

"Learn to Code - for Free." n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. "Teaching Basic Lab Skills for Research Computing." Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144. "Data on the Web Best Practices." n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145. Library Workflow Exchange. n.d. "About." Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146. OCLC Developer Network. 2020. "DevConnect Webinars." https://www.oclc.org/developer/events/devconnect-workshops.en.html

147. Smith-Yoshimura, Karen. 2019. "Stewardship of Professional FTEs In Metadata Work and Turnover." Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580

148. OCLC. 2020. "WorldCat®, OCLC, and Linked Data." Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections, please visit oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395
T: 1-800-848-5878 T: +1-614-764-6000 F: +1-614-764-6096
www.oclc.org/research

ISBN: 978-1-55653-170-5 DOI: 10.25333/rqgd-b343 RM-PR-216787-WWAE-A4 2009



FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership.63

Focus Group members are eager to provide more detailed subject access than is currently offered by national subject heading systems such as LCSH, which has more granularity for Western European places than for Southeast Asia and Africa. They see the need to offer more accurate and current terms and to replace terms that reflect bias or are considered offensive with more neutral terms.

Challenges that Focus Group members identified in offering more respectful terminology in subject access for users:

• Discovery: Using other, less-offensive vocabularies locally can split collections in the library catalog, thus hampering discovery of all relevant materials.

• Lack of consensus: Focus Group members doubt that there can ever be complete consensus about any given text string. Terms that may be offensive to one community may not always be clear to others. (For example, "Dissident art" rather than "Non-conformist art.")

• Speed: The process of changing standard subject headings can be very slow.

• Capacity: Changing headings in existing records can require a massive undertaking. Targeted access point maintenance occurs in the context of access point maintenance generally. For example, the Library of Congress recently changed the heading "Mentally handicapped" to "People with mental disabilities." Implementing such changes in the catalog can involve a mix of automated, vended, and manual remediation methods, as well as decisions about resource allocation.64 Some noted it would be less labor-intensive to present a "cultural sensitivity" message as part of the search interface to alert users that terms and annotations they find in a catalog may reflect the creator's attitude or the period in which the item was created and may be considered inappropriate today in some contexts.

[Figure 5 chart: "What Areas Have You Changed or Plan to Change Due to Your Institution's EDI Goals and Principles?" Vertical axis from 0 to 100; two series, Changed and Plan to change, across six areas: Collection building; Select materials for digitization; Metadata in library catalogs; Metadata in archival collections; Metadata in digital collections; Terminologies or vocabularies.]


• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may exclude audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than being included as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others' terminologies or mean different things. The PCC Linked Data Advisory Committee's Linked Data Infrastructure Models: Areas of Focus for PCC Strategies65 describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences and to facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on "Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives"66 described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.
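The idea of one identifier carrying several context-dependent labels can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the identifier is a placeholder on example.org, the labels are not drawn from any real vocabulary, and the fallback rule (exact tag, then same base language, then a default) is only one of many possible policies.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Concept:
    """A concept known by a persistent identifier rather than by one text string."""
    identifier: str                                        # hypothetical identifier
    labels: Dict[str, str] = field(default_factory=dict)   # context tag -> display label
    default_tag: str = "en-US"                             # assumed fallback context

    def label_for(self, tag: str) -> str:
        """Pick a label for a context: exact tag, then same base language, then default."""
        if tag in self.labels:
            return self.labels[tag]
        base = tag.split("-")[0]
        for candidate, label in self.labels.items():
            if candidate.split("-")[0] == base:
                return label
        return self.labels[self.default_tag]


# Illustrative data only; neither the identifier nor the labels come from a real source.
concept = Concept(
    identifier="https://example.org/concept/0001",
    labels={
        "en-US": "Color photography",
        "en-GB": "Colour photography",
        "fr": "Photographie en couleurs",
    },
)

print(concept.label_for("en-GB"))
print(concept.label_for("fr-CA"))
```

Because records would point to the identifier, a label could be added or replaced (for example with a term a community considers more respectful) without touching the records themselves.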

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of "community contribution" for new terms and community voting could be more inclusive. Libraries' current "consensus environment" excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the associated statements made. How will libraries resolve or reconcile conflicts between statements?67 Inconsistencies of different types than those seen now may appear, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of "preferred sources."68 OCLC Research explored how libraries might apply Google's "Knowledge Vault" to identify statements that may be more "truthful" than others in the 2015 "Works in Progress Webinar: Looking Inside the Library Knowledge Vault."69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
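One way to picture statements with provenance and a "preferred sources" list is the following Python sketch. It is a hedged illustration, not a description of any existing system: the entity identifier, the birth dates, the "local-catalog" source, and the ranking of sources are all hypothetical, and ranking by source is only one possible reconciliation policy.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Statement:
    subject: str   # entity identifier (hypothetical)
    prop: str      # property being asserted
    value: str     # asserted value
    source: str    # provenance: where the statement came from


# Assumed ranking for illustration; a library would supply its own list.
PREFERRED_SOURCES = ["VIAF", "WorldCat", "Wikidata"]


def source_rank(source: str) -> int:
    """Lower is more preferred; unlisted sources sort last."""
    if source in PREFERRED_SOURCES:
        return PREFERRED_SOURCES.index(source)
    return len(PREFERRED_SOURCES)


def matching(statements: List[Statement], subject: str, prop: str) -> List[Statement]:
    return [s for s in statements if s.subject == subject and s.prop == prop]


def preferred_statement(statements: List[Statement], subject: str, prop: str) -> Optional[Statement]:
    """Return the statement from the most preferred source, or None if there is none."""
    return min(matching(statements, subject, prop), key=lambda s: source_rank(s.source), default=None)


def conflicting_statements(statements: List[Statement], subject: str, prop: str) -> List[Statement]:
    """Return every statement for the property when the sources disagree on its value."""
    found = matching(statements, subject, prop)
    return found if len({s.value for s in found}) > 1 else []


# Invented example data: three sources, two different birth dates.
statements = [
    Statement("ex:person/1", "birthDate", "1850", "Wikidata"),
    Statement("ex:person/1", "birthDate", "1851", "VIAF"),
    Statement("ex:person/1", "birthDate", "1851", "local-catalog"),
]

print(preferred_statement(statements, "ex:person/1", "birthDate"))
print(len(conflicting_statements(statements, "ex:person/1", "birthDate")))
```

The sketch keeps every conflicting statement with its source instead of discarding the less preferred ones, so that a display or a cataloger can still see the alternatives.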

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries must compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected by using inadequate vendor records, by creating minimal or less-than-full level descriptions for certain types of resources, and by limiting authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of the large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.


The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide "trustworthy provenance" or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. "Conflicting statements" might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative "Plan M" (where "M" stands for "metadata") seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK's National Bibliographic Knowledgebase (NBK) in Plan M's 10-year vision: "Linked data instances of the NBK will need to be created and maintained requiring convincing business-cases around the impact this could have on research."73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing "Inside-Out" and "Facilitated" Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the "inside-out collection" (in contrast to the "outside-in collection," in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the "facilitated collection."74 Focus Group members' activities have increasingly focused on metadata that will provide access to the resources unique to their institutions, as well as those in their consortia or national networks.


All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to "inside-out" collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as "super challenging." In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources that provide insights into the world across many centuries and places and supply the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars as well as for libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the "preferred" form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions' digitization of archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University's Human Rights, Historic Preservation and Urban Planning, and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia's Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation's Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters: Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary: Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use: Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency: Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites: Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.[84] The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.[85]


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.[86] However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”[87] Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. Managing AV resources requires knowledge of the use context as well as technical metadata issues, which makes for a complex environment in which to think through requirements for description and access. Further, libraries must deal with current time-based media, whether produced locally as part of research and learning or streamed under a commercial license.

Focus Group discussions focused on the AV resources within archival collections, which are often in deteriorating formats, held in large backlogs, and sometimes require rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,[88] which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),[89] the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in OCLC Research Hanging Together Blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”[90] A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership; incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

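FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections. Survey question: “What challenges related to managing AV collections would you be interested in the RLP addressing?” (n=137). Bar chart of response counts (very interested, interested, somewhat interested, not interested) for seven challenges: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; and selection, appraisal, and collection development.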

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.[91]

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).[92] Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),[93] a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).[94]

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical, even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.[95]

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,[96] which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters” (those that include terms with a similar meaning). In 2017 OCLC implemented the International Image Interoperability Framework (IIIF)[97] Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019 OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,[98] as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020 OCLC Research launched the CONTENTdm Linked Data Pilot,[99] focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
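The conversion step described above, replacing strings with identifiers, can be illustrated with a minimal Python sketch. It is not the pilot's actual method: the lookup table, the example.org URIs, and the function names are all hypothetical stand-ins for an authority file or local vocabulary, and real reconciliation would need fuzzier matching and human review.

```python
# Hypothetical lookup table standing in for an authority file or local
# vocabulary. The URIs are placeholders, not real identifiers.
AUTHORITY = {
    "paris (france)": "https://example.org/place/paris",
    "maps": "https://example.org/concept/maps",
}


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(label.lower().split())


def reconcile(record: dict[str, list[str]]) -> dict[str, list[dict[str, str]]]:
    """Pair each string value with an identifier when the lookup finds one.

    Unmatched strings are kept as labels without an "id" so that nothing
    from the original record is lost.
    """
    linked: dict[str, list[dict[str, str]]] = {}
    for field, values in record.items():
        linked[field] = []
        for value in values:
            entry = {"label": value}
            uri = AUTHORITY.get(normalize(value))
            if uri is not None:
                entry["id"] = uri
            linked[field].append(entry)
    return linked


record = {"subject": ["Maps", "Unmatched heading"], "place": ["Paris  (France)"]}
linked = reconcile(record)
```

Keeping the original label next to the identifier is a deliberate choice in this sketch: it leaves a trail for reviewing matches and for headings that have no identifier yet.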

FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections


RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry “Data Management and Curation in 21st Century Archives” (Sept. 2015)[100] prompted the discussion among Focus Group members on the metadata needed for research data management.[101] To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).[102]
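As an illustration of the readme.txt advice quoted above, the following Python sketch assembles such a file from project-level and file-level metadata. The choice of required fields, the layout, and every sample value are assumptions made for this example; the quoted report does not prescribe a format.

```python
# Project-level fields this sketch treats as mandatory (an assumption;
# the quoted report lists these among others without ranking them).
REQUIRED = ["title", "author", "date", "funder", "description"]


def build_readme(project: dict[str, str], files: list[dict[str, str]]) -> str:
    """Return readme text with a project section and one line per data file."""
    missing = [key for key in REQUIRED if not project.get(key)]
    if missing:
        raise ValueError(f"missing project metadata: {', '.join(missing)}")

    lines = ["PROJECT"]
    for key, value in project.items():
        lines.append(f"{key}: {value}")
    lines.append("")
    lines.append("FILES (name | format | software used)")
    for item in files:
        lines.append(f"{item['name']} | {item['format']} | {item['software']}")
    return "\n".join(lines) + "\n"


# Hypothetical sample values, for illustration only.
readme_text = build_readme(
    {
        "title": "Stream temperature observations",
        "author": "Example Researcher",
        "date": "2020-09",
        "funder": "Example Funder",
        "description": "Hourly readings gathered to study seasonal variation.",
        "language": "eng",
    },
    [{"name": "readings.csv", "format": "CSV", "software": "any spreadsheet"}],
)
```

Rejecting a readme that lacks core fields reflects the Focus Group's concern that forms stay simple: a short list of required elements is easier to enforce than a long optional one.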


All four of the 2017-2018 The Realities of Research Data Management series webinars,[103] led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, from the planning our researchers do before and during the creation of data, to managing the data, to disseminating the knowledge gained, and finally to understanding the impact, engagement, and resulting reputation of our home institutions.[104]

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.[105]


National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.[106] Canada has launched a national network called Portage for the “shared stewardship of research data.”[107]

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.[108]

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (the Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.[109] The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.[110] The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
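One practical property of creator identifiers is that they can be checked for transcription errors before any lookup is made. The Python sketch below validates the final check character of an ORCID iD, which is computed with the ISO 7064 MOD 11-2 algorithm. The function names are illustrative, and passing the check shows only that the iD is well formed, not that it is registered or belongs to the named person. The sample value is the author's ORCID iD as printed in this report's front matter.

```python
def orcid_check_character(base15: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for digit in base15:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True when a hyphenated or compact ORCID iD has a valid check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


# ORCID iD taken from this report's front matter.
author_orcid_ok = is_well_formed_orcid("0000-0002-8757-2962")
```

A check like this could run when researchers fill in a metadata template, catching mistyped identifiers at the point of entry.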


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?[111] How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,[112] are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.[113] At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.[114]

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians on work that may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.[115]

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.[116]

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success, or learning analytics.[117] The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.[118] The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.[119]

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.[120] An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,[121] one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”[122]

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).[123] Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.[124] MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.[125]
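The enrichment idea mentioned above, generating language codes missing in related records, might look like the following Python sketch. It makes several assumptions not found in the report: records are plain dictionaries rather than MARC, a hypothetical `cluster` key already groups records for the same expression of a work, and a code is filled in only when every known code in the cluster agrees. Clusters with conflicting codes (for example, translations grouped together) are left untouched for human review.

```python
from collections import defaultdict


def fill_missing_language(records: list[dict]) -> list[dict]:
    """Return copies of records with missing language codes inferred where safe."""
    # Collect the language codes already present in each cluster.
    known: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        if rec.get("lang"):
            known[rec["cluster"]].add(rec["lang"])

    enriched = []
    for rec in records:
        rec = dict(rec)  # copy so the input records are not modified
        codes = known[rec["cluster"]]
        if not rec.get("lang") and len(codes) == 1:
            rec["lang"] = next(iter(codes))
            rec["lang_inferred"] = True  # flag inferred values for later review
        enriched.append(rec)
    return enriched


# Hypothetical sample records, for illustration only.
sample = [
    {"id": "a", "cluster": "c1", "lang": "fre"},
    {"id": "b", "cluster": "c1", "lang": ""},
    {"id": "c", "cluster": "c2", "lang": "eng"},
    {"id": "d", "cluster": "c2", "lang": "ger"},
    {"id": "e", "cluster": "c2", "lang": None},
]
enriched = fill_missing_language(sample)
```

Flagging inferred values keeps machine-generated codes distinguishable from cataloger-supplied ones, which matters if the inference later proves wrong.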


Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.[126] Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8. Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics, statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)[127]


FIGURE 9. Hachette UK’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).[128] As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.[129]

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),[130] which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than aiming for one “global domain,” metadata specialists could add value by building bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that artificial intelligence, or at least machine learning, could reduce the manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
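As a rough illustration of matching names "based on related metadata," the sketch below combines a string-similarity score with agreement on other fields. It is a simple rule-based heuristic rather than machine learning, and the field names (name, birth_year, affiliation) and the weights are hypothetical choices made for this example only.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, drop punctuation, and sort tokens so 'Smith, Jane' equals 'Jane Smith'."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(sorted(cleaned.split()))


def match_score(a: dict, b: dict) -> float:
    """Combine name similarity with agreement on related metadata (illustrative weights)."""
    score = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    # Matching birth years support the match; conflicting ones count strongly against it.
    if a.get("birth_year") and b.get("birth_year"):
        score += 0.2 if a["birth_year"] == b["birth_year"] else -0.5
    # A shared affiliation is weaker supporting evidence.
    if a.get("affiliation") and a.get("affiliation") == b.get("affiliation"):
        score += 0.1
    # Clamp to the 0..1 range so scores stay comparable.
    return max(0.0, min(1.0, score))
```

Under these assumptions, two records with the same normalized name but conflicting birth years score much lower than two that agree, which is the kind of signal a metadata specialist could use to decide which candidate matches still need human review.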

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed by both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance between allocating staff to “traditional cataloging activities” (such as original and copy cataloging and authority work) and to more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, change their mindsets about how they work, and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists use tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background who is interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.141
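For readers unfamiliar with what clustering does during de-duplication, the following is a minimal sketch of key-collision ("fingerprint") clustering, the general approach popularized by tools such as OpenRefine. It is not MarcEdit's implementation, and the function names are hypothetical; it only shows how variant forms of the same heading can be grouped for review.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a key that ignores case, accents, punctuation, and word order."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Remove ASCII punctuation, lowercase, then sort the unique tokens.
    stripped = unaccented.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(stripped.split())))


def cluster(values: list[str]) -> list[list[str]]:
    """Group values sharing a fingerprint; return only groups with more than one member."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]
```

Given the headings "Brontë, Charlotte", "Charlotte Bronte", and "charlotte bronte.", all three share one key and land in the same cluster, while an unrelated heading stays out. The method is deliberately conservative: it catches formatting variants but not misspellings, which need a distance-based method and closer human review.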

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Codecademy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
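The kind of "simple scripting" that helps reconcile equivalents across registries can be quite small. The sketch below assumes the starting point is a list of pairwise "same as" assertions between identifiers and merges them into identity clusters with a union-find structure. The function name is hypothetical, and the identifiers in the usage note are invented placeholders, not real registry entries.

```python
from collections import defaultdict


def merge_equivalents(pairs: list[tuple[str, str]]) -> list[set[str]]:
    """Group identifiers into identity clusters from pairwise 'same as' assertions."""
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        # Register unseen identifiers as their own cluster root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in pairs:
        parent[find(a)] = find(b)  # union the two clusters

    clusters = defaultdict(set)
    for identifier in parent:
        clusters[find(identifier)].add(identifier)
    return list(clusters.values())
```

For example, the placeholder pairs ("viaf:1", "isni:A") and ("isni:A", "wikidata:Q1") end up in one cluster even though no single assertion links the first and last identifiers directly. Because one wrong assertion can merge two distinct people, the output is best treated as candidate clusters for a specialist to confirm.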

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or from staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology to support library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations, such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all the Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts, summarized here. We also thank the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members, who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements explaining why each topic was important and timely, and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only, because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search category “Metadata.” https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe

Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee, 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. “What is ORCID?” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. “What is ISNI?” Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). “About.” https://ror.org/about

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress: PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and see ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas: The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf) But the term has other meanings depending on the audience, for example, identity access management, as described in Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point?” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly rated webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku / Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan. 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research. 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Washburn, Bruce, and Jeff Mixter. 2015. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research. 12 August 2015. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.

73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx.) http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26 (4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008.) https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010.) https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010.) https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014.) https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013.) https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem; Addressing the Problem; Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io.

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research. 8 August 2019. MP4 video presentation, 583400. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University. 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia 2020 HERE, Bing, Microsoft Corporation.]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data in MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs in Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections, please visit oc.lc/digitizing.

6565 Kilgour Place, Dublin, Ohio 43017-3395. T: 1-800-848-5878. T: +1-614-764-6000. F: +1-614-764-6096. www.oclc.org/research

ISBN: 978-1-55653-170-5. DOI: 10.25333/rqgd-b343. RM-PR-216787-WWAE-A4 2009

• Sharing: Local vocabularies cannot be shared with other systems.

• Maintenance: Some who have tried to use local vocabularies more suitable for their context and communities found them too burdensome to maintain and abandoned them.

• Language barriers: The language of our controlled vocabularies may exclude audiences who do not read that language. The Ohio State University Libraries has tried to address this by developing some non-Latin script equivalents of English subject terms.

• Classification: Current classification systems are apt to segregate ethnic groups. Rather than being included as part of an overall concept like history, education, or literature, they tend to be grouped together as one lump. As institutions store more publications off-site, the need to shelve materials together and have just one classification in a record has subsided, but few apply multiple classifications in one record.

Requirements for a distributed system that accommodates multiple vocabularies and could also support EDI converged around the need to support semantic relationships among different vocabularies. Communities of practice need a hub to aggregate and reconcile terms within their own domains. It was noted that different communities of practice might use terms that conflict with others’ terminologies or mean different things. The PCC Linked Data Advisory Committee’s Linked Data Infrastructure Models: Areas of Focus for PCC Strategies⁶⁵ describes high-level functional requirements and a spectrum of models anticipated as cultural heritage institutions adopt linked data as a strategy for data sharing. The model must be both scalable and extensible, with the ability to accommodate the proliferation of new topics and terms symptomatic of the humanities and sciences and to facilitate contributions by the researchers themselves. It needs to be flexible enough to coexist with other vocabularies.

Replacing text strings with stable, persistent identifiers would facilitate using different labels depending on context. This would accommodate different languages and scripts (and different spellings within a language, such as American vs. British English) as well as terms that are more respectful to marginalized communities. The 19 October 2017 OCLC Research Works in Progress webinar on “Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives”⁶⁶ described the process launched by the Association for Manitoba Archives and the University of Alberta Libraries to examine subject headings and classification schemes and consider how they might be more respectful and inclusive of the experiences of Indigenous peoples.
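The idea of one identifier carrying several context-dependent labels can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a description of any existing vocabulary service: the `Concept` class, the example identifier URI, and the context keys are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Concept:
    """One concept, keyed by a persistent identifier rather than a text string."""
    identifier: str  # hypothetical URI, for illustration only
    labels: Dict[str, str] = field(default_factory=dict)  # context -> label

    def label_for(self, context: str, fallback: str = "en-US") -> str:
        """Return the label for a context, then the fallback, then the identifier."""
        if context in self.labels:
            return self.labels[context]
        return self.labels.get(fallback, self.identifier)


# Hypothetical concept with labels for different spellings and languages.
concept = Concept(
    identifier="https://example.org/concept/0001",
    labels={"en-US": "Color", "en-GB": "Colour", "fr": "Couleur"},
)

for context in ("en-GB", "fr", "de"):
    print(context, "->", concept.label_for(context))
```

Because records would store only the identifier, a community could add or change its preferred label without touching the descriptions that point to the concept.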

Expanding vocabularies to include those used in other communities requires building trust relationships. A model of “community contribution” for new terms and community voting could be more inclusive. Libraries’ current “consensus environment” excludes a lot of people. Much metadata is currently created according to Western knowledge constructs, and systems have been designed around them. Communicating the history of changes and the provenance of each new or modified term would provide transparency that could contribute to the trustworthiness of the source. The edit history and discussion pages that are part of each Wikidata entity description are a possible model to follow. Requiring provenance as part of a distributed vocabulary model may help in creating an alternative environment that is more equitable, diverse, and inclusive.

LINKED DATA CHALLENGES

Identifiers and vocabularies are just two components required in the transition to linked data. A vital part of describing entities is the set of statements made about them. How will libraries resolve or reconcile conflicts between statements?⁶⁷ Different types of inconsistencies may appear than do now, for example, different birthdates for persons. The provenance of each statement becomes more critical. Even in the current environment, certain sources are more trusted and give catalogers confidence in their accuracy. Libraries often have a list of “preferred sources.”⁶⁸ OCLC Research explored how libraries might apply Google’s “Knowledge Vault” to identify statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”⁶⁹ Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.
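A minimal Python sketch may make the role of provenance more concrete. It assumes a simple statement shape (entity, property, value, source) and a local “preferred sources” list; the `Statement` class, the source names, and the ranking rule are hypothetical and do not represent any library system’s actual model. Conflicting values are kept and ordered, not discarded.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Statement:
    entity: str   # identifier of the thing described
    prop: str     # property, e.g. "birth_date"
    value: str
    source: str   # provenance: who asserted this statement


# Hypothetical local list of preferred sources, most trusted first.
PREFERRED_SOURCES = ["source-a", "source-b"]


def group_values(statements: List[Statement], entity: str, prop: str) -> Dict[str, List[str]]:
    """Map each asserted value to the sources that assert it."""
    values: Dict[str, List[str]] = defaultdict(list)
    for s in statements:
        if s.entity == entity and s.prop == prop:
            values[s.value].append(s.source)
    return dict(values)


def rank_values(grouped: Dict[str, List[str]], preferred: List[str]) -> List[str]:
    """Order values by their best-ranked source; every value is retained."""
    def best_rank(item):
        _, sources = item
        ranks = [preferred.index(src) for src in sources if src in preferred]
        return min(ranks) if ranks else len(preferred)
    return [value for value, _ in sorted(grouped.items(), key=best_rank)]


statements = [
    Statement("person:1", "birth_date", "1901", "source-c"),
    Statement("person:1", "birth_date", "1902", "source-b"),
    Statement("person:1", "birth_date", "1901", "source-a"),
    Statement("person:2", "birth_date", "1950", "source-a"),
]

grouped = group_values(statements, "person:1", "birth_date")
has_conflict = len(grouped) > 1
ranked = rank_values(grouped, PREFERRED_SOURCES)
print(has_conflict, ranked)
```

Keeping every value alongside its sources, and only ordering them for display, leaves room for statements that are correct in their own contexts.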

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good-quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end user. Many libraries need to compromise, favoring speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected in the use of inadequate vendor records, the creation of minimal or less-than-full level descriptions for certain types of resources, and limits on authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncataloged, often because of the large volume of materials and insufficient staff resources.⁷⁰ These less-than-full descriptions will result in fewer and less accurate linked data statements.

The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance” or whether data should be distributed with peer-to-peer sharing.⁷¹ Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could be challenging to our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.⁷² Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”⁷³ Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing “Inside-Out” and “Facilitated” Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the “inside-out collection” (in contrast to the “outside-in collection,” in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the “facilitated collection.”⁷⁴ Focus Group members’ activities have increasingly focused on metadata that will provide access to the resources unique to their institutions, as well as those in their consortia or national networks.

All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections, and each presents different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and mapping metadata require technical services expertise and skills.⁷⁵ This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections, as they are unique research resources that provide insights into the world across many centuries and places and supply the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars as well as for libraries and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.⁷⁶

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions’ digitization of archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights; Historic Preservation and Urban Planning; and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)82 collects websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These collections, curated by subject, provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges of creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters. Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived, when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary. Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use. Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency. Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites. Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. Managing AV resources requires knowledge of the use context as well as of technical metadata issues, which makes for a complex environment in which to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions focused on the AV resources within archival collections, which are often in deteriorating formats and in large backlogs, and which sometimes require rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But an aggregated approach to description can often lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge of linking between items and the finding aid, and of maintaining the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.
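As a rough illustration of the technical information listed above, the following Python sketch models one digitized AV file and reports which technical fields are still empty. The class name, field names, and sample values are hypothetical and chosen only for illustration; this is not PREMIS or any other standard's data model.

```python
from dataclasses import dataclass, asdict
from typing import List, Optional


@dataclass
class AVTechnicalMetadata:
    """Hypothetical technical metadata record for one digitized AV file."""
    identifier: str
    compression: Optional[str] = None         # e.g. "lossless"
    year_digitized: Optional[int] = None
    capture_technology: Optional[str] = None  # equipment or software used
    file_format: Optional[str] = None         # relevant to file compatibility


def missing_fields(record: AVTechnicalMetadata) -> List[str]:
    """Return the names of technical fields that have no value yet."""
    return [name for name, value in asdict(record).items() if value is None]


# Illustrative record with two technical fields left unfilled.
record = AVTechnicalMetadata(
    identifier="av-0001", compression="lossless", year_digitized=2019
)
print(missing_fields(record))
```

A check of this kind could help flag backlog items whose technical description is incomplete before they are ingested into a preservation system.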


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival workflows and into digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

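FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections

[Bar chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Categories: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; selection, appraisal, and collection development. Responses: very interested, interested, somewhat interested, not interested.]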

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas. Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or as unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects. Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance. A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited (if not discredited) any further use.

• Maintaining links between metadata and images. How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object. Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators. It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people, as well as large-scale discovery.

• Aggregating digital collections. Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture, or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters,” those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
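The general idea of replacing strings with identifiers can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pilot's actual method: the lookup table, the `example.org` URIs, and the function names are all hypothetical, and a real workflow would match against actual authority files and handle ambiguous matches.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup table standing in for an authority file or a local
# vocabulary. The URIs are placeholders, not real identifiers.
AUTHORITY: Dict[str, str] = {
    "paris (france)": "https://example.org/place/paris",
    "maps": "https://example.org/concept/maps",
}

Triple = Tuple[str, str, str]


def normalize(label: str) -> str:
    """Lowercase a string and collapse runs of whitespace before lookup."""
    return " ".join(label.lower().split())


def to_triples(
    record_uri: str, predicate: str, values: List[str]
) -> Tuple[List[Triple], List[str]]:
    """Turn matched strings into (subject, predicate, object) triples.

    Strings with no match are returned separately for manual review
    rather than being guessed at.
    """
    triples: List[Triple] = []
    unmatched: List[str] = []
    for value in values:
        uri = AUTHORITY.get(normalize(value))
        if uri is None:
            unmatched.append(value)
        else:
            triples.append((record_uri, predicate, uri))
    return triples, unmatched


triples, unmatched = to_triples(
    "https://example.org/item/1",
    "https://example.org/prop/subject",
    ["Paris  (France)", "Maps", "Unknown term"],
)
print(triples)
print(unmatched)
```

Keeping unmatched strings in a separate list reflects the point made earlier in this section that many access points are not under authority control and still need human reconciliation.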

FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections


RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry, “Data Management and Curation in 21st Century Archives” (Sept. 2015),100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
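One way such a readme.txt might be assembled is sketched below in Python. The two-section layout, the function name, and every sample value are assumptions made for illustration; the report quoted above names the kinds of information to record but does not prescribe a file layout.

```python
from typing import Dict


def build_readme(project: Dict[str, str], files: Dict[str, str]) -> str:
    """Assemble plain-text readme content from project-level metadata
    and a short description of each data file."""
    lines = ["PROJECT INFORMATION"]
    for key, value in project.items():
        lines.append(f"{key}: {value}")
    lines.append("")  # blank line between the two sections
    lines.append("FILES")
    for name, description in files.items():
        lines.append(f"{name}: {description}")
    return "\n".join(lines) + "\n"


# All values below are placeholders for illustration only.
readme_text = build_readme(
    project={
        "Title": "Example project",
        "Author": "A. Researcher",
        "Date": "2020",
        "Funder": "Example funder",
        "Language": "English",
    },
    files={"data.csv": "CSV; hypothetical survey responses"},
)
print(readme_text)
```

The returned string can then be written next to the data files, for example with `pathlib.Path("readme.txt").write_text(readme_text, encoding="utf-8")`.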


All four of the 2017-2018 The Realities of Research Data Management webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105


National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed that data creators need to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
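Creator identifiers entered by hand into metadata forms are easy to mistype, and a simple structural check can catch many such errors before deposit. The Python sketch below validates the final check character of an ORCID iD, which ORCID documents as an ISO 7064 MOD 11-2 checksum over the first 15 digits. The function names are hypothetical, and the check confirms only that an iD is well formed, not that it is registered or belongs to the named person.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if a hyphenated 16-character ORCID iD has a valid
    check character. Does not query the ORCID registry."""
    compact = orcid.replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15].upper()


# 0000-0002-1825-0097 is the sample iD used in ORCID's own documentation.
print(is_well_formed_orcid("0000-0002-1825-0097"))
print(is_well_formed_orcid("0000-0002-1825-0098"))
```

A deposit form could run a check like this on entry and ask the researcher to re-enter an iD that fails, which keeps the burden on data creators low.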


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institution’s Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize the data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose support may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics, such as how frequently items have been borrowed, cited, downloaded, or requested, could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service,” that is, consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125


Visualizations represent another type of metadata service. A striking example is from the Austlang national code-a-thon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata came from Hachette UK, the United Kingdom's second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette's publishing houses and weave them into a cohesive story. Hachette asked the British Library for every author and book title published by its nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette's River of Authors.)127

FIGURE 9 UK Hachette's "River of Authors" generated from the British Library's catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review:

"a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine."129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty's representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than aiming for one "global domain," metadata specialists could add value by building bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in the Wikipedias of various languages. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that artificial intelligence, or at least machine learning, could reduce the manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an "international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums."133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla's 2019 OCLC Research position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
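To make the idea of matching and disambiguating names from related metadata more concrete, here is a deliberately simple, rule-based sketch. It is not machine learning and not a method described by the Focus Group. The field names (`name`, `birth_year`, `affiliations`), the 0.2 context bonus, and the sample people are all hypothetical choices made for illustration.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase a name and replace punctuation with spaces before comparing."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(kept.split())


def match_score(a, b):
    """Score two person descriptions in [0, 1] using name similarity plus context."""
    # Conflicting birth years are treated as strong evidence of different people.
    if a.get("birth_year") and b.get("birth_year") and a["birth_year"] != b["birth_year"]:
        return 0.0
    name_sim = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    # A shared affiliation nudges the score upward (the bonus size is an arbitrary choice).
    shared = set(a.get("affiliations", [])) & set(b.get("affiliations", []))
    bonus = 0.2 if shared else 0.0
    return min(1.0, name_sim + bonus)


same_form = match_score({"name": "Smith, Jane"}, {"name": "smith jane"})
conflict = match_score(
    {"name": "Smith, Jane", "birth_year": 1950},
    {"name": "Smith, Jane", "birth_year": 1982},
)
different = match_score({"name": "Smith, Jane"}, {"name": "Okafor, Chidi"})
```

A real workflow would tune the evidence and thresholds against reviewed examples, and would route borderline scores to a metadata specialist rather than merging automatically.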

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed by both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who "trail-blaze innovations," which are then routinized for nonprofessionals. These discussions reinforce Padilla's recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance between allocating staff to "traditional cataloging activities" (such as original and copy cataloging and authority work) and more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed, from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual's productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats, and to view metadata specialists as more than just "production machines."

Metadata managers faced with staff reductions, while still being expected to maintain production levels, must justify allocating staff time for R&D (or "play time") to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the "easier stuff," while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, change their mindsets about how they work, and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became "lively" because she brought her colleagues along. It created appreciation for "continuous learning," and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch "reading clubs" where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group "video-viewing brown-bag lunches" for staff on new developments such as linked data, so staff can "watch and learn" together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists use tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists' collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library's control. Some noted that "cultural differences" exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more "schematic." Which is more likely to be successful: teaching metadata specialists IT skills, or teaching IT staff metadata principles? The "holy grail" is to recruit someone with an IT background who is interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members' experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the "technical services mindset."

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or who at least know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit's developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.141
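As an illustration of what lightweight clustering for reconciliation and de-duplication can look like, the sketch below groups variant strings by a normalized "fingerprint" key, a key-collision approach similar in spirit to the clustering offered by tools such as OpenRefine. It is a simplified assumption-laden example, not MarcEdit's or OpenRefine's actual implementation, and the function names are invented here.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key: lowercase, no accents or punctuation, sorted unique tokens."""
    text = unicodedata.normalize("NFKD", value.strip().lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(text.split())))


def cluster(values):
    """Group values whose fingerprints collide; return only groups with more than one member."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]


headings = ["Reese, Terry", "Terry Reese", "Müller, J.", "OCLC"]
candidates = cluster(headings)
```

Here "Reese, Terry" and "Terry Reese" collide on the same key and surface as one candidate cluster for review. Key collision only proposes matches: it cannot tell two different people with the same name apart, so a person still decides whether to merge.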

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers' needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, they often lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA's New Members Round Table. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic web) can increase its appeal. As one Focus Group member noted, "We bring order out of a vacuum."142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C's Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
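The kind of simple scripting mentioned above can be sketched for one identity management task: reconciling equivalents across multiple registries. Given pairwise "same as" assertions between identifiers, the example groups them into identity clusters using a small union-find structure. The identifier strings are made-up placeholders, not real ORCID, ISNI, VIAF, or LCCN values, and the function name is hypothetical.

```python
def group_equivalents(pairs):
    """Group identifiers into clusters from pairwise equivalence assertions (union-find)."""
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving keeps lookups short
            item = parent[item]
        return item

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = {}
    for ident in parent:
        clusters.setdefault(find(ident), set()).add(ident)
    # Sort members and clusters so the output is stable across runs.
    return sorted((sorted(members) for members in clusters.values()), key=lambda c: c[0])


assertions = [
    ("orcid:EXAMPLE-1", "isni:EXAMPLE-2"),
    ("isni:EXAMPLE-2", "viaf:EXAMPLE-3"),
    ("orcid:EXAMPLE-9", "lccn:EXAMPLE-8"),
]
identities = group_equivalents(assertions)
```

Because equivalence is treated as transitive, the first two assertions yield a single cluster of three identifiers. That is also the main risk: one wrong assertion merges two identities, so the source of each link is worth recording.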

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: "Don't wait; iterate." In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC's DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of "traditional" work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or from staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Putting technology to work for library services helps catalogers "do more with less."

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond "traditional" cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution's collections. Focus Group members' linked data activities, including their participation in OCLC Research's Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research's linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for "works."

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities, as "statements" associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of "metadata as a service," and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts, summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote the context statements explaining why each topic was important and timely, and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. "The OCLC Research Library Partnership." https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura. 2017. "Metadata Advocacy." Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library's Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. "Program for Cooperative Cataloging." https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. "What Metadata Managers Expect from and Value about the Research Library Partnership." Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. "Linked Data." International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. "CONTENTdm Linked Data pilot." https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. "WorldCat® OCLC and Linked Data." Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. "BIBFRAME." Bibliographic Framework Initiative. https://www.loc.gov/bibframe

Futornick, Michelle. 2019. "LD4P2 Linked Data for Production: Pathway to Implementation." LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). "An Effective Environment for the Use of Linked Data by Libraries." Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. "Charge for PCC Task Group on Identity Management in NACO." 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. "PCC Task Group on URIs in MARC." Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. "PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge." PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. "Shift to Linked Data for Production." OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. "Linked Data." Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. "'Future Proofing' of Cataloging." OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. "What is ORCID?" Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. "ORCID Open Letter - Publishers." Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. "What is ISNI?" Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. "Welcome to HathiTrust." Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. "Browse the Names." Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. "Key Statistics." Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. "NACO - Name Authority Cooperative Program." Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. "Getting Identifiers Created for Legacy Names." Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. "Irreconcilable Differences? Name Authority Control & Humanities Scholarship." Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. "Use Cases for Local Identifiers." Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. "Registering Researchers in Authority Files." https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29 Smith-Yoshimura Karen Janifer Gatenby Grace Agnew Christopher Brown Kate Byrne Matt Carruthers Peter Fletcher Stephen Hearn Xiaoli Li Marina Muilwijk Chew Chiat Naun John Riemer Roderick Sadler Jing Wang Glen Wiley and Kayla Willey 2016 Addressing the Challenges with Organizational Identifiers and ISNI Dublin Ohio OCLC Research httpsdoiorg1025333C3FC9Q

30 Research Organization Registry (ROR) ldquoAboutrdquo httpsrororgabout

31 V M Abazov B Abbott B S Acharya M Adams T Adams J P Agnew G D Alexeev et al (2014) 2020 ldquoPrecision Measurement of the Top-Quark Mass in Lepton+jets Final Statesrdquo (Archived 24 February 2020) ArXivorg 150107912 httpsarxivorgpdf14051756

32 Smith-Yoshimura Karen 2017 ldquoHow Much Metadata Is Practicalrdquo Hanging Together The OCLC Research Blog 14 November 2017 httpshangingtogetherorgp=6328

33 University of Minnesota 2020 ldquoExpertsMinnesotardquo Find Profiles httpsexpertsumneduenpersons or

University of Illinois at Urbana-Champaign 2020 ldquoIllinois Expertsrdquo Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34 The National Institute of Health (NIH) National Institute of Allergy and Infectious Diseases

38 Transitioning to the Next Generation of Metadata

(NIAID) on 7 April 2020 mandates ORCIDs for training fellowship education or career development awards in FY20 See NIH NIAID 2019 ldquoORCID iD Required for Some Encouraged for Allrdquo NIAID Funding News Last reviewed 7 August 2019 httpswwwniaidnihgovgrants-contractsorcid-id-required-some-encouraged-all

See also Lyrasis 2020 ldquoSciENcv and ORCID to Streamline NIH and NSF Grant Applicationsrdquo LyrasisNow (blog) 8 April 2020 httpslyrasisnoworgsciencv-and-orcid-to -streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas. The Portal to Texas History, digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO,” 5. American Bar Association, Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management, as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan. 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. https://www.oclc.org/research/events/2017/10-19.html

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

69. Bruce Washburn and Jeff Mixter. 2015. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2015. MP4 video presentation, 57:45:00. https://www.oclc.org/research/events/2015/08-12.html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase

73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945

82. NLA. “Trove.” Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdmguide.html

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395
T: 1-800-848-5878 T: +1-614-764-6000 F: +1-614-764-6096
www.oclc.org/research

ISBN: 978-1-55653-170-5
DOI: 10.25333/rqgd-b343
RM-PR-216787-WWAE-A4 2009

statements that may be more “truthful” than others in the 2015 “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.”69 Focus Group members posited that aggregations such as WorldCat, the Virtual International Authority File (VIAF), and Wikidata may allow the library community to view statements from these sources with more confidence than others. Librarians could share their expertise by establishing the relationships between and among statements from different sources.

But good linked data requires good metadata. Administrators are well aware of the tension between delivering access to library collections in a timely manner and providing good-quality description. The metadata descriptions must be full enough to allow libraries to manage their collections and to support accessibility and discoverability for the end-user. Many libraries need to compromise, choosing speed over accuracy, speed over depth, or brevity over nothing. These compromises are reflected in the use of inadequate vendor records, the creation of minimal or less-than-full level descriptions for certain types of resources, and limits on authority work. Minimal-level cataloging is commonly used as an alternative to leaving materials uncatalogued, often because of the large volume of materials and insufficient staff resources.70 These less-than-full descriptions will result in fewer and less accurate linked data statements.

The transition period from legacy cataloging systems reliant on MARC to a new linked data environment with entities and statements has many challenges, since both standards and practices are moving targets. It is unclear how libraries will share statements rather than records in a linked data environment. Focus Group members were divided on whether a centralized linked data store would be needed to provide “trustworthy provenance” or whether data should be distributed with peer-to-peer sharing.71 Different statements might be correct in their own contexts. “Conflicting statements” might represent different world views. Selecting statements based on provenance could challenge our principles of equity, diversity, and inclusion.

The Focus Group members wondered how to involve the many vendors that supply or process MARC records in the transition to linked data. In the United Kingdom, the Jisc initiative “Plan M” (where “M” stands for “metadata”) seeks to streamline the metadata supply among libraries, publishers, data suppliers, and infrastructure providers.72 Among the implications cited by stakeholders in the UK’s National Bibliographic Knowledgebase (NBK) in Plan M’s 10-year vision: “Linked data instances of the NBK will need to be created and maintained, requiring convincing business-cases around the impact this could have on research.”73 Working with others in the linked data environment involves people unfamiliar with the library environment, requiring metadata specialists to explain what their needs are in terms nonlibrarians can understand.

Describing “Inside-Out” and “Facilitated” Collections

OCLC Vice President and Chief Strategist Lorcan Dempsey refers to the shifting emphasis of libraries to support the creation, curation, and discoverability of institutional resources as the “inside-out collection” (in contrast to the “outside-in collection,” in which the library buys or licenses materials from external providers to make them accessible to a local audience). Providing access to a broader range of local, external, and collaborative resources around user needs is the “facilitated collection.”74 Focus Group members’ activities have increasingly focused on metadata that will provide access to the resources unique to their institutions as well as those in their consortia or national networks.

All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections and present different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections. As unique research resources, they offer insights into the world across many centuries and places and provide the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars, libraries, and archives alike. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers However the contextual information that archivists provide for person and organization entities could enrich the information provided in authority filesmdasha use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partners Archives and Special Collections Linked Data Review Group78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions digitizing their archival and distinctive collections that have been available only in physical form79 More metadata may be created from digitized versions of these resources

ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights; Historic Preservation and Urban Planning; and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters. Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary. Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use. Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency. Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites. Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85

AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either being produced locally as part of research and learning or streaming media that is being commercially licensed.

Focus Group discussions centered on the AV resources within archival collections, often in deteriorating formats, in large backlogs, and sometimes requiring rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.

The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.

OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in OCLC Research Hanging Together Blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership; incorporating AV collections into archival and digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections

[Chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Scale: 0 to 140. Categories: Resource allocation, assessment, and prioritization; Digital asset management and preservation; Physical collection management; Digitization and preservation reformatting; Incorporating into digital collection workflows; Incorporating into archival workflows; Selection, appraisal, and collection development. Response levels: Very interested, Interested, Somewhat interested, Not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas. Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects. Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance. A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images. How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object. Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators. It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections. Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters,” those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.
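To make the idea of searching across manifests more concrete, the following is a minimal sketch of reading a manifest and listing its images. It assumes a manifest laid out in the style of the IIIF Presentation API 2.x (sequences containing canvases containing images); the sample manifest, its labels, and its example.org URLs are invented for illustration, and the function names are hypothetical. It is not a description of how the IIIF Explorer prototype itself works.

```python
import json

# A tiny, made-up manifest in the style of IIIF Presentation API 2.x.
# All identifiers and labels below are hypothetical.
SAMPLE_MANIFEST = json.loads("""
{
  "@context": "http://iiif.io/api/presentation/2/context.json",
  "@id": "https://example.org/iiif/maps/1/manifest.json",
  "@type": "sc:Manifest",
  "label": "Plan de Paris",
  "sequences": [
    {
      "@type": "sc:Sequence",
      "canvases": [
        {
          "@id": "https://example.org/iiif/maps/1/canvas/1",
          "@type": "sc:Canvas",
          "label": "Sheet 1",
          "images": [
            {"resource": {"@id": "https://example.org/iiif/maps/1/full/full/0/default.jpg"}}
          ]
        }
      ]
    }
  ]
}
""")


def list_images(manifest):
    """Return (canvas label, image URL) pairs from a 2.x-style manifest."""
    pairs = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                url = image.get("resource", {}).get("@id")
                if url:
                    pairs.append((canvas.get("label", ""), url))
    return pairs


def matches(manifest, term):
    """Case-insensitive match of a search term against the manifest label."""
    return term.lower() in str(manifest.get("label", "")).lower()
```

Because every compliant system exposes the same manifest structure, a loop like `list_images` could in principle be run over manifests from many platforms without per-platform parsing code, which is the property that makes cross-collection discovery plausible.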

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
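The step of replacing strings with identifiers can be sketched as follows. This is an illustration under stated assumptions, not the pilot's actual method: the lookup table stands in for an authority file or local vocabulary, the example.org identifiers are placeholders, and the record layout and function names are hypothetical.

```python
# Hypothetical lookup table standing in for an authority file or a
# local vocabulary. The identifiers are placeholders, not real URIs.
AUTHORITY = {
    "paris (france)": "https://example.org/place/0001",
    "seine river": "https://example.org/place/0002",
}


def normalize(value):
    """Lowercase and collapse whitespace so trivial variants match."""
    return " ".join(value.lower().split())


def reconcile(record, fields, authority=AUTHORITY):
    """Replace matched strings with {"label", "id"} pairs.

    Unmatched strings are left untouched and reported, so a person can
    review them instead of the code guessing.
    """
    linked = dict(record)
    unmatched = []
    for field in fields:
        values = []
        for value in record.get(field, []):
            uri = authority.get(normalize(value))
            if uri:
                values.append({"label": value, "id": uri})
            else:
                values.append(value)
                unmatched.append((field, value))
        linked[field] = values
    return linked, unmatched


record = {"title": "Plan de Paris", "subject": ["Paris (France)", "Bridges"]}
linked, unmatched = reconcile(record, ["subject"])
```

Keeping the original label alongside the identifier preserves what the cataloger wrote, while the list of unmatched strings gives metadata staff a worklist for terms that need a new vocabulary entry or a manual match.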

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections

RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry “Data Management and Curation in 21st Century Archives” (Sept. 2015)100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
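A readme file of the kind described in the quoted passage could be generated from a simple form, as in the sketch below. The field names are taken from the elements listed in the quotation; the layout of the file, the function names, and the sample values are assumptions made for this illustration rather than a recommended template.

```python
# Field names follow the elements listed in the quoted passage; the
# layout of the file itself is an assumption made for this sketch.
PROJECT_FIELDS = [
    "title", "author", "date", "funder", "copyright holder",
    "description", "keywords", "language",
]


def missing_fields(project):
    """List project-level fields that are absent or empty."""
    return [field for field in PROJECT_FIELDS if not project.get(field)]


def build_readme(project, files):
    """Render project-level and file-level metadata as plain text."""
    lines = ["PROJECT INFORMATION"]
    for field in PROJECT_FIELDS:
        # Keep empty fields visible so gaps are easy to spot and fill in.
        lines.append(f"{field}: {project.get(field, '')}")
    lines.append("")
    lines.append("FILES (name | format | software used)")
    for entry in files:
        lines.append(f"{entry['name']} | {entry['format']} | {entry['software']}")
    return "\n".join(lines) + "\n"


project = {"title": "Sample survey", "author": "A. Researcher"}
files = [{"name": "responses.csv", "format": "CSV", "software": "R"}]
readme_text = build_readme(project, files)
```

Reporting the missing fields separately lets a consultant point a researcher at exactly what remains to be supplied, which keeps the form itself short.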

All four of the 2017-2018 The Realities of Research Data Management series webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105

National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
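One practical benefit of structured identifiers is that simple transcription errors can be caught before a record is saved. As a small illustration, an ORCID iD ends in a check character calculated with the ISO 7064 MOD 11-2 algorithm, so a form that collects creator identifiers can verify it locally. The sketch below uses hypothetical function names and tests the ORCID iD printed on this report's title page; it only checks that an iD is well formed, not that it is registered or belongs to a given person.

```python
def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the shape and check character of an ORCID iD string.

    Accepts the hyphenated form (0000-0000-0000-0000). This confirms
    only that the iD is well formed, not that it is registered.
    """
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


# The ORCID iD given for this report's author passes the check.
author_orcid_ok = is_valid_orcid("0000-0002-8757-2962")
```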

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians on work that may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics, such as how frequently items have been borrowed, cited, downloaded, or requested, could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis of collections and a view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125
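As one illustration of the enrichment mentioned above, a missing language code might be suggested from related records that already carry one. The sketch below is a simplified assumption of how such a step could look: the records are plain dictionaries rather than MARC, the `work_id` grouping key and the field names are hypothetical, and a real workflow would need its own definition of which records count as related.

```python
from collections import defaultdict


def fill_missing_languages(records):
    """Suggest a language code for records that lack one.

    A code is suggested only when every related record that has a code
    agrees on it; conflicting groups are left alone for manual review.
    """
    codes_by_work = defaultdict(set)
    for rec in records:
        if rec.get("lang"):
            codes_by_work[rec["work_id"]].add(rec["lang"])

    suggestions = {}
    for rec in records:
        known = codes_by_work.get(rec["work_id"], set())
        if not rec.get("lang") and len(known) == 1:
            suggestions[rec["id"]] = next(iter(known))
    return suggestions


# Hypothetical sample: w1 has one agreed code, w2 has conflicting codes.
sample_records = [
    {"id": "r1", "work_id": "w1", "lang": "fre"},
    {"id": "r2", "work_id": "w1", "lang": ""},
    {"id": "r3", "work_id": "w2", "lang": "eng"},
    {"id": "r4", "work_id": "w2", "lang": "ger"},
    {"id": "r5", "work_id": "w2"},
]
suggested = fill_missing_languages(sample_records)
```

Returning suggestions instead of editing the records in place keeps a person in the loop, which matters because related records can legitimately differ in language, as translations do.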

Visualizations represent another type of metadata service. A striking example is from the Austlang national codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics, statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story. To do so, it asked the British Library for every author and book title published by the nine houses over a span of 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)[127]

FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).[128] As Marcia Zeng points out in “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review”:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.[129]

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),[130] which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than aiming for one “global domain,” metadata specialists could provide added value by building bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in the various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.[131] A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),[132] an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”[133] Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.[134]

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.[135]

THE CULTURE SHIFT

Focus Group members reported a delicate balance between allocating staff to “traditional cataloging activities” (such as original and copy cataloging and authority work) and to more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff, to learn new workflows for processing multiple formats, and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions, while still being expected to maintain production levels, must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.[136]

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, change their mindsets about how they work, and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.[137]

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.[138] MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.[139] Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.[140] Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.[141]
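To give a sense of what lightweight clustering and de-duplication can look like in a short script, here is a minimal Python sketch of a key-collision ("fingerprint") approach, similar in spirit to the keying method popularized by OpenRefine. It is not a description of how MarcEdit implements its clustering; the function names and sample headings are assumptions chosen for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a key that ignores case, punctuation, accents, and word order."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, turn punctuation into spaces, then sort the unique tokens.
    cleaned = re.sub(r"[^\w\s]", " ", unaccented.lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep groups with more than one distinct form."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [forms for forms in groups.values() if len(set(forms)) > 1]


# Sample headings for illustration only.
names = [
    "Smith-Yoshimura, Karen",
    "Karen Smith-Yoshimura",
    "smith yoshimura karen.",
    "Reese, Terry",
    "Zeng, Marcia",
]
clusters = cluster(names)
```

Each cluster is a candidate set of variant forms for a person to review. Key collision is deliberately conservative: it catches differences in order, case, and punctuation, but not misspellings, which would need a fuzzier method.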

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Round Table. Emphasizing that metadata encompasses much more than library cataloging (entity identification; descriptive standards used in various academic disciplines; describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”[142]

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Codecademy, Software Carpentry, and conferences such as Code4Lib and Mashcat.[143] W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.[144] Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
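As one small example of the simple scripting mentioned above, the following Python sketch groups identifiers from different registries into sets of equivalents, given pairwise "same as" assertions. It uses a union-find structure, which is only one possible approach. The identifier strings are placeholders invented for this example, not real registry entries, and the function name is an assumption.

```python
def merge_equivalents(same_as_pairs):
    """Group identifiers into sets linked by any chain of same-as pairs."""
    parent = {}

    def find(item):
        # Register unseen identifiers as their own group, then walk to the root.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # shorten the path as we go
            item = parent[item]
        return item

    for left, right in same_as_pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # merge the two groups

    groups = {}
    for item in list(parent):
        groups.setdefault(find(item), set()).add(item)
    # Sort members and groups so the result is stable for comparison.
    return sorted((sorted(group) for group in groups.values()), key=lambda g: g[0])


# Placeholder identifiers only; the prefixes indicate the registry each came from.
pairs = [
    ("orcid:PLACEHOLDER-A", "isni:PLACEHOLDER-A"),
    ("isni:PLACEHOLDER-A", "viaf:PLACEHOLDER-A"),
    ("local:PLACEHOLDER-B", "viaf:PLACEHOLDER-B"),
]
equivalents = merge_equivalents(pairs)
```

Because equivalence is followed transitively, a single incorrect assertion can merge two distinct identities, so in practice the input pairs would need to come from trusted sources or be reviewed.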

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,[145] a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars[146] to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.[147]

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

Impact

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining funding from the Andrew W. Mellon Foundation for its two-year Shared Entity Management Infrastructure project,[148] launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

Acknowledgments

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Library Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

Appendix

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

Notes

1. OCLC Research Library Partnership. Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category “Metadata.” https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe

Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee, 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5. American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. “What is ORCID?” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. “What is ISNI?” Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). “About.” https://ror.org/about

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org. 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress: PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40 See DOI examples in detail from DOI 2020 ldquoDOI System Examplesrdquo Accessed 20 September 2020 httpswwwdoiorgdemoshtml and

See ARK examples in detail from Department Dallas (Tex ) Police 1963 ldquo[Photographs of Identification Cards]rdquo Collection University of North Texas The Portal to Texas History digital repository httpstexashistoryunteduark67531metapth346793

41 DataCite ldquoAssign DOIsrdquo httpsdataciteorgdoishtml

Wilkinson Laura J 2020 ldquoConstructing your DOIsrdquo Crossref The Crossref Curriculum Last updated 8 April 2020 httpswwwcrossreforgeducationmember-setupconstructing

-your-dois

42 ldquoIdentity managementrdquo here reflects its usage among metadata specialists (See for example Library of Congress 2018 ldquoCharge for PCC Task Group on Identity Management in NACOrdquo 5 American Bar Association Program for Cooperative Cataloging Revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience for example identity access management as described in

Wikiwand ldquoIdentity Managementrdquo httpswwwwikiwandcomenIdentity_management

43 Smith-Yoshimura Karen 2018 ldquoThe Coverage of Identity Management Workrdquo Hanging Together The OCLC Research Blog 8 October 2018 httpshangingtogetherorgp=6805

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019.) https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. https://www.oclc.org/research/events/2017/10-19.html

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

69. Washburn, Bruce, and Jeff Mixter. 2015. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2015. MP4 video presentation, 57:45:00. https://www.oclc.org/research/events/2015/08-12.html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase

73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx.) http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008.) https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010.) https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010.) https://archive-it.org/collections/1945

82. NLA. “Trove.” Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014.) https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013.) https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430

118. Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395
T: 1-800-848-5878 T: +1-614-764-6000 F: +1-614-764-6096 www.oclc.org/research

ISBN: 978-1-55653-170-5 DOI: 10.25333/rqgd-b343 RM-PR-216787-WWAE-A4 2009

All resources collected, created, and curated by libraries require metadata to make them discoverable. However, Focus Group members concentrated on the challenges and issues related to specific formats:

• Archival collections

• Archived websites

• Audio and video collections

• Image collections

• Research data

All these content types can be categorized as belonging to “inside-out” collections, and each presents different challenges. For example, Focus Group members described efforts to retrieve metadata from completely different systems as “super challenging.” In addition, many of these resources are not under any authority control. Reconciling access points from various thesauri and metadata mapping work requires technical services expertise and skills.75 This reconciliation also will be needed in the previously discussed linked data environment.

This section summarizes the discussions on these format types.

ARCHIVAL COLLECTIONS

Archival collections are in many ways the crown jewels of collections: they are unique research resources that offer insights into the world across many centuries and places and provide the primary sources for new knowledge creation. Increasing visibility for these collections reaps significant benefits for scholars, libraries, and archives. Archives are, however, complex and present different metadata issues compared with traditional library collections. As institutions turn to ArchivesSpace and other content management systems to provide infrastructures for structured archival metadata, various issues are emerging.76

Archives have had more autonomy than libraries within their institutions because they have unique collections with their own population of users, their own metadata standards, and their own systems. While some institutions have integrated archival processing within technical services, most maintain a separate unit. Archivists do not have the tradition of creating authority records and sharing identifiers for the same entity, as is common among librarians. They also tend to use the fullest form of a name based on the information found in collections, while librarians focus on the “preferred” form found in publications. Even so, a significant shift from artisanal archival approaches to metadata standardization has been occurring.

So how can archivists and librarians better integrate their metadata and name authority practices? The number of personal names in archival collections can be so large that most are uncontrolled and without identifiers. However, the contextual information that archivists provide for person and organization entities could enrich the information provided in authority files, a use case that was explored in the 2017-2018 Project Passage pilot77 and examined in more detail in 2019-2020 by the OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group.78

The increased reliance on electronic and digital resources during the COVID-19 pandemic will likely accelerate institutions’ digitization of archival and distinctive collections that have been available only in physical form.79 More metadata may be created from digitized versions of these resources.

ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.80

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights, Historic Preservation and Urban Planning, and New York City Religions.81 National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)82 collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.83

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters. Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary. Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use. Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency. Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites. Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?
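The multi-institutional question can be made concrete with a small illustration. The Python below is a minimal sketch under stated assumptions: the record fields (institution, seed_url, capture_date, crawl_scope), the URL normalization rule, and the sample values are all hypothetical and are not drawn from any web archiving system or from the Focus Group discussions. It shows only one possible way that captures of the same site by different institutions might be brought together for joint searching.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Capture:
    """One archived capture of a website (all field names are hypothetical)."""
    institution: str
    seed_url: str
    capture_date: date
    crawl_scope: str  # free-text crawl specification, e.g. "full site"


def normalize_seed(url: str) -> str:
    """Reduce a seed URL to a comparable key.

    Assumed rule: lowercase host without a leading 'www.', plus the path
    without a trailing slash; scheme and query string are ignored.
    """
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def group_captures(captures):
    """Group captures of the same seed across institutions, oldest first."""
    groups = defaultdict(list)
    for capture in captures:
        groups[normalize_seed(capture.seed_url)].append(capture)
    return {
        key: sorted(items, key=lambda c: c.capture_date)
        for key, items in groups.items()
    }


# Invented sample data: two institutions captured the same site differently.
captures = [
    Capture("Library A", "https://www.example.org/", date(2016, 3, 1), "full site"),
    Capture("Library B", "http://example.org", date(2015, 11, 20), "homepage only"),
    Capture("Library A", "https://example.org/events/", date(2016, 3, 1), "full site"),
]

for site, items in group_captures(captures).items():
    print(site, [(c.institution, c.capture_date.isoformat()) for c in items])
```

Grouping on a normalized seed is only one design choice; a real service would also need to account for the differing crawl specifications that the Focus Group noted.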

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.84 The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.85

AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique local collections.86 However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”87 Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. The nature of the management of AV resources requires knowledge of the use context as well as technical metadata issues, providing a complex environment to think through requirements for description and access. Further, libraries must deal with current time-based media that is either produced locally as part of research and learning or commercially licensed as streaming media.

Focus Group discussions focused on the AV resources within archival collections, often in deteriorating formats, in large backlogs, and sometimes requiring rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.

The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,88 which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),89 the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.

OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”90 A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival workflows and into digital collection workflows were two of the challenges that most interested respondents, as shown in figure 6.

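FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections

[Bar chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Scale: 0 to 140. Challenges charted: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; selection, appraisal, and collection development. Response levels: very interested, interested, somewhat interested, not interested.]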

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters,” those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
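
The replacement of character strings with identifiers can be pictured with a minimal sketch. It is an illustration only, not the pilot's actual method: the lookup table, the field names, and the example.org URIs are hypothetical placeholders standing in for real authority files and local vocabularies.

```python
# Hypothetical lookup table standing in for an authority file or a local
# vocabulary. The example.org URIs are placeholders, not real identifiers.
AUTHORITY = {
    "paris (france)": "https://example.org/place/0001",
    "maps": "https://example.org/concept/0002",
}


def normalize(value):
    """Collapse whitespace and case so trivially different strings match."""
    return " ".join(value.split()).lower()


def reconcile(record, authority=AUTHORITY):
    """Return a copy of the record with matched strings replaced by identifiers.

    Strings without a match are kept as literals so no information is lost.
    """
    linked = {}
    for field, values in record.items():
        linked[field] = []
        for value in values:
            uri = authority.get(normalize(value))
            if uri is not None:
                linked[field].append({"id": uri, "label": value})
            else:
                linked[field].append({"literal": value})
    return linked


# A hypothetical record-based description with plain strings as values.
record = {"coverage": ["Paris  (France)"], "subject": ["Maps", "Street plans"]}
linked_record = reconcile(record)
```

Keeping unmatched strings as literals reflects one possible design choice: a partial conversion remains usable, and the leftover literals give staff a work list for later reconciliation.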

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections

RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry, “Data Management and Curation in 21st Century Archives” (Sept. 2015),100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
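
As a rough illustration of what such a readme file could contain, the sketch below assembles one from structured values. The element names come from the passage quoted above; the file layout, the sample values, and the function name build_readme are assumptions made for this example and are not prescribed by the Building Blocks report.

```python
def build_readme(project, files):
    """Render project-level and file-level metadata as plain readme text."""
    lines = ["PROJECT INFORMATION"]
    for key, value in project.items():
        lines.append(f"{key}: {value}")
    lines.append("")
    lines.append("FILES")
    for entry in files:
        lines.append(
            f"- {entry['file name']} ({entry['file format']}; "
            f"software: {entry['software used']})"
        )
    return "\n".join(lines) + "\n"


# All values below are invented sample data for illustration.
project = {
    "title": "Example survey of reading habits",
    "author": "A. Researcher",
    "date": "2020-01-15",
    "funder": "Example Foundation",
    "copyright holder": "Example University",
    "description": "Responses to a questionnaire on reading habits.",
    "keywords": "reading; survey",
    "observation unit": "individual respondent",
    "kind of data": "survey data",
    "language": "English",
}
files = [
    {
        "file name": "responses.csv",
        "file format": "CSV",
        "software used": "LibreOffice Calc",
    }
]
readme_text = build_readme(project, files)
```

The resulting text could then be saved next to the data with a call such as pathlib.Path("readme.txt").write_text(readme_text, encoding="utf-8").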

All four of the 2017-2018 The Realities of Research Data Management series webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105

National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
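
One practical property of such identifiers is that some can be checked for transcription errors before they are stored. As a small sketch, an ORCID iD ends in a check character computed with the ISO 7064 MOD 11-2 algorithm, so a mistyped iD can often be caught locally. The function names below are chosen for this example, and the first iD tested is the author's iD printed in the front matter of this report. A passing check shows only that the iD is well formed, not that it is registered to a particular person.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True if the 16-character iD has a correct check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    if not all(ch in "0123456789" for ch in compact[:15]):
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


author_id_ok = is_valid_orcid("0000-0002-8757-2962")
mistyped_id_ok = is_valid_orcid("0000-0002-8757-2963")
```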

Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services such as those offered at the University of Michigan112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.

Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians on work that may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125

Visualizations represent another type of metadata service. A striking example is from the Austlang national code-a-thon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127

FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade but could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for, and by libraries, archives, and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s OCLC Research 2019 position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to “traditional cataloging activities” (such as original and copy cataloging and authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just “production machines.”

Metadata managers, faced with staff reductions while still being expected to maintain production levels, must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work, or training support staff to create metadata for the “easier stuff,” while mandating that catalogers do only what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage metadata specialists to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers, solve problems, and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists use tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills, or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background who is interested in metadata services. Retaining staff with IT skills is difficult because they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or who at least know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
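As a rough illustration of the kind of batch script mentioned above, the following Python sketch groups variant forms of a heading by a normalized "fingerprint" key, a simple key-collision approach to clustering. It is not MarcEdit's or OpenRefine's actual implementation; the function names and the sample headings are assumptions made up for this example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key: drop accents and punctuation, lowercase, sort unique tokens."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = re.sub(r"[^\w\s]", " ", ascii_only.lower())
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with more than one distinct form."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [sorted(set(group)) for group in groups.values() if len(set(group)) > 1]


# Hypothetical sample headings for illustration; not taken from any real file.
names = [
    "Smith-Yoshimura, Karen",
    "Karen Smith-Yoshimura",
    "smith yoshimura, karen.",
    "Reese, Terry",
    "Padilla, Thomas",
]
clusters = cluster(names)
```

A key-collision method like this only catches variants that differ in case, punctuation, accents, or word order; misspellings and abbreviated forenames would need fuzzier matching and human review.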

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification; descriptive standards used in various academic disciplines; describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
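The "simple scripting" for reconciling equivalents across registries might look something like the sketch below. It assumes the specialist has already collected pairs of identifiers asserted to refer to the same entity, and it merges those pairs into groups with a small union-find routine. The function name and the placeholder identifiers are hypothetical and are not drawn from any registry.

```python
from collections import defaultdict


def group_equivalents(pairs):
    """Merge 'same as' pairs into sets of identifiers for one entity (union-find)."""
    parent = {}

    def find(item):
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]  # path halving keeps lookups short
            item = parent[item]
        return item

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups = defaultdict(set)
    for item in parent:
        groups[find(item)].add(item)
    return sorted(sorted(group) for group in groups.values())


# Placeholder identifiers for illustration only; none are real registry entries.
same_as = [
    ("orcid:EXAMPLE-A", "isni:EXAMPLE-1"),
    ("isni:EXAMPLE-1", "viaf:EXAMPLE-X"),
    ("orcid:EXAMPLE-B", "lcnaf:EXAMPLE-Y"),
]
entities = group_equivalents(same_as)
```

Because equivalence is treated as transitive here, a single wrong pair merges two entities, so in practice the asserted pairs would need review before being merged.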

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities, as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc/.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe/.

Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” Hanging Together: The OCLC Research Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is ORCID.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni/.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org/.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org/.

24. Library of Congress. 2020. “NACO - Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about/.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons/; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu/.

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID), on 7 April 2020, mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications.

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710.

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710.

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445.

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.). Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793/.

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois.

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in:

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805.

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura. “Coverage of Identity Management.” (See note 43.)

46. Watch the highly rated webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research. 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko/.

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au/.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management - Graphite.” https://www.synaptica.com/graphite/.

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura. “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives.” Produced by OCLC Research. 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research. 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m/.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May-November 2020), see Grindley, Neil. “Moving Plan M Forwards - We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase/.

73. Grindley, Neil. 2019. “Plan M: Definition, Principles and Direction.” Jisc (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49–51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945

82. NLA. “Trove.” Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdmguide.html

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC, 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University, 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84–91. https://doi.org/10.1016/j.acalib.2013.12.002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209–230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17–19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395
T: 1-800-848-5878 T: +1-614-764-6000 F: +1-614-764-6096 www.oclc.org/research

ISBN: 978-1-55653-170-5 DOI: 10.25333/rqgd-b343 RM-PR-216787-WWAE-A4 2009


ARCHIVED WEBSITES

For some years, archives and libraries have been archiving web resources of scholarly or institutional interest to ensure their continuing access and long-term survival. Some websites are ephemeral or intentionally temporary, such as those created for a specific event. Institutions would like to archive and preserve the content of their websites as part of their historical record. A large majority of web content is harvested by web crawlers, but the metadata generated by harvesting alone is considered insufficient to support discovery.[80]

Some archived websites are institutional, theme-based collections supporting a specific research area, such as Columbia University’s Human Rights, Historic Preservation and Urban Planning, and New York City Religions.[81] National libraries archive websites within their national domain. For example, the National Library of Australia’s Archived websites (1996-now)[82] collect websites in partnership with cultural institutions around Australia, government websites formerly accessible through the Australian Government Web Archive, and websites from the .au domain collected annually through large-scale crawl harvests. These curated collections by subject provide snapshots of Australian cultural and social history. Examples of consortia-based archived websites include the Ivy Plus Libraries Confederation’s Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY) and Contemporary Composers Web Archive (CCWA), and the New York Art Resources Consortium (NYARC), which captures dynamic web-based versions of auction catalogs and artist, gallery, and museum websites.[83]

The Focus Group discussed the challenges for creating and managing the metadata needed to enhance machine-harvested metadata from websites. Some of the challenges identified:

• Type of website matters. Descriptive metadata requirements may depend on the type of website archived (e.g., transient sites, research data, social media, or organizational sites). Sometimes only the content of the sites is archived when the user experience of the site (its “look-and-feel”) is not considered significant.

• Practices vary. Some characteristics of websites are not addressed by existing descriptive rules such as RDA (Resource Description and Access) and DACS (Describing Archives: A Content Standard). Metadata tends to follow bibliographic description traditions or archival practice, depending on who creates the metadata.

• Consider scale and projected use. Metadata requirements may differ depending on the scale of material being archived and its projected use. For example, digital humanists look at web content as data and analyze it for purposes such as identifying trends, while other users merely need individual pages. The level of metadata granularity (collection, seed/URL, document) may also vary based on anticipated user needs, scale of material being crawled, and available staffing.

• Update frequency. Many websites are updated repeatedly, requiring re-crawling when the content has changed. Some types of change can result in capture failures.

• Multi-institutional websites. Some websites are archived by multiple institutions. Each may have captured the same site on different dates and with varying crawl specifications. How can they be searched and used in conjunction with one another?
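One possible way to approach the multi-institutional question is to bring capture records from different institutions together under a shared key. The following is a minimal sketch only, not a method described by the Focus Group: the `Capture` record shape, its field names, and the normalization rule in `normalize_seed` are assumptions made for illustration, and a production web archive would likely need far more careful URL canonicalization.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Capture:
    """One institution's capture of a seed URL (hypothetical record shape)."""
    institution: str
    seed_url: str
    captured_on: date


def normalize_seed(url: str) -> str:
    """Build a comparison key: lowercase host without a leading 'www.',
    followed by the path without a trailing slash (illustrative rule only)."""
    parts = urlsplit(url.strip())
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def group_captures(captures):
    """Group captures by normalized seed; each group is sorted by capture date."""
    groups = defaultdict(list)
    for capture in captures:
        groups[normalize_seed(capture.seed_url)].append(capture)
    return {
        key: sorted(items, key=lambda c: c.captured_on)
        for key, items in groups.items()
    }
```

Grouping on a normalized seed keeps each institution's record intact while letting a discovery layer present all captures of one site as a single timeline. Differences in crawl specifications would still need to be recorded per capture.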

A 2015 survey of the OCLC Research Library Partnership revealed the “lack of descriptive metadata guidelines” as the biggest challenge related to website archiving, leading to the formation of the OCLC Research Library Partnership Web Archiving Metadata Working Group.[84] The challenges that the Focus Group identified were explored in depth by this working group, which issued a report of its recommendations in 2018, Descriptive Metadata for Web Archiving.[85]

AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.[86] However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”[87] Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. Managing AV resources requires knowledge of the use context as well as of technical metadata issues, which makes it a complex environment in which to think through requirements for description and access. Further, libraries must deal with current time-based media, whether produced locally as part of research and learning or commercially licensed as streaming media.

Focus Group discussions centered on the AV resources within archival collections, which are often in deteriorating formats, in large backlogs, and sometimes require rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.

The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,[88] which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),[89] the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.
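As an illustration of the technical details listed above, a simple completeness check could flag AV records that lack them before ingest. This is a hedged sketch: the field names in `REQUIRED_TECHNICAL_FIELDS` are hypothetical labels for the details named in the text (compression, year digitized, technology used, file format) and are not PREMIS semantic units or any institution's actual schema.

```python
# Hypothetical field names standing in for the technical details named above;
# they are assumptions for illustration, not PREMIS element names.
REQUIRED_TECHNICAL_FIELDS = (
    "compression",
    "year_digitized",
    "capture_technology",
    "file_format",
)


def missing_technical_fields(record: dict) -> list:
    """Return the required technical fields that are absent or blank in a record."""
    missing = []
    for field in REQUIRED_TECHNICAL_FIELDS:
        value = record.get(field)
        # Treat None, empty strings, and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(field)
    return missing
```

A report built from such a check could help prioritize which legacy recordings most need technical description before playback formats become unavailable.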

OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”[90] A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival workflows and into digital collection workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections

[Bar chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Challenges charted: resource allocation, assessment, and prioritization; digital asset management and preservation; physical collection management; digitization and preservation reformatting; incorporating into digital collection workflows; incorporating into archival workflows; selection, appraisal, and collection development. Response levels: very interested, interested, somewhat interested, not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.[91]

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas. Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).[92] Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),[93] a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).[94]

bull Duplicate metadata for different objects Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings Duplicating the metadata across similar objects is likely due to limited staff Possibly the faculty or the photographers could add more details

bull Lack of provenance A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance For example metadata staff at one institution were given OCRrsquoed text retrieved by a researcher from HathiTrust Millions of images lacked the location of the original source material and therefore limitedmdashif not discreditedmdashany further use

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of the same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people, as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset, and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture, or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters,” those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
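The idea of replacing strings with identifiers can be illustrated with a minimal Python sketch. It is not the pilot's method: the lookup table, the `example.org` URIs, the field names, and the `reconcile` helper are all hypothetical, and a real workflow would query authority files rather than a hard-coded dictionary.

```python
from typing import Dict, Optional

# Hypothetical local authority table: normalized label -> identifier URI.
# The example.org URIs are placeholders, not real authority-file identifiers.
AUTHORITY: Dict[str, str] = {
    "paris (france)": "https://example.org/place/0001",
    "maps": "https://example.org/subject/0002",
}


def normalize(label: str) -> str:
    """Collapse whitespace and case so trivially different strings compare equal."""
    return " ".join(label.lower().split())


def reconcile(record: Dict[str, str]) -> Dict[str, Dict[str, Optional[str]]]:
    """Pair each string value with an identifier when the table has a match.

    Unmatched values keep their original string and get id=None, so nothing
    from the source record is lost and the gaps remain visible for review.
    """
    out = {}
    for field, value in record.items():
        out[field] = {"label": value, "id": AUTHORITY.get(normalize(value))}
    return out


# Invented sample record for illustration only.
record = {"coverage": "Paris  (France)", "subject": "Maps", "creator": "Unknown"}
linked = reconcile(record)
```

Keeping the original label next to the identifier is a deliberate choice in this sketch: the string stays available for display, while the identifier is what would connect the record to external sources.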

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections



RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry “Data Management and Curation in 21st Century Archives” (Sept. 2015)100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
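As a rough illustration of how such a readme file might be assembled from a simple form, the following Python sketch writes one line per element and reports which elements were left empty. The field labels paraphrase the elements quoted above; their exact wording and order, the `build_readme` helper, and the sample values are assumptions made for this example, not part of the report cited.

```python
from typing import Dict, List, Tuple

# Field names paraphrase the elements listed in the quoted passage;
# the exact labels and their order are assumptions made for this sketch.
FIELDS: List[str] = [
    "title", "author", "date", "funder", "copyright holder", "description",
    "keywords", "file names", "file formats and software used",
    "observation unit", "kind of data", "type of data", "language",
]


def build_readme(metadata: Dict[str, str]) -> Tuple[str, List[str]]:
    """Return readme text plus the list of fields the researcher left empty."""
    lines, missing = [], []
    for field in FIELDS:
        value = metadata.get(field, "").strip()
        if not value:
            missing.append(field)
            value = "[not provided]"
        lines.append(f"{field.upper()}: {value}")
    return "\n".join(lines) + "\n", missing


# Invented sample values for illustration only.
text, missing = build_readme({"title": "Survey of reading habits", "language": "English"})
```

Returning the list of missing fields alongside the text gives whoever reviews the deposit a short, concrete list of follow-up questions for the researcher.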


All four webinars in the 2017-2018 series The Realities of Research Data Management,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105


National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
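One practical consequence of using structured identifiers is that some transcription errors can be caught before a record is shared. ORCID iDs end in a check character computed with the ISO 7064 MOD 11-2 algorithm, and the Python sketch below verifies it. The function names are illustrative, and the check only confirms that an iD is well formed, not that it is registered or belongs to a given person.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the 0000-0000-0000-000X shape and the final check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdecimal():
        return False
    return orcid_check_digit(compact[:15]) == compact[15]


# The iD listed for this report's author passes the check;
# changing its last character makes the check fail.
author_ok = is_valid_orcid("0000-0002-8757-2962")
typo_ok = is_valid_orcid("0000-0002-8757-2963")
```

A check like this could run when a researcher fills in a deposit form, so that a mistyped iD is flagged at entry rather than discovered later during reconciliation.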


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose expertise may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics, such as how frequently items have been borrowed, cited, downloaded, or requested, could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; and improving relevancy ranking, personalizing search results, offering recommendation services in the discovery layer, and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis of collections and a view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125


Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127


FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: a multi-dimensional review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s OCLC Research 2019 position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
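To make the idea of matching names "based on related metadata" concrete, here is a deliberately simple Python sketch that needs no machine learning. It blends string similarity of two name forms with agreement on other fields. The field names (`affiliation`, `field`), the 0.7/0.3 weights, and the sample records are all assumptions chosen for illustration; the approach Focus Group members have in mind would be considerably more sophisticated.

```python
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Put 'Surname, Forename' into 'forename surname' order, lowercased."""
    if "," in name:
        surname, forename = name.split(",", 1)
        name = f"{forename} {surname}"
    return " ".join(name.lower().replace(".", " ").split())


def match_score(a: dict, b: dict) -> float:
    """Blend string similarity of names with agreement on related metadata.

    The 0.7 / 0.3 weights are arbitrary assumptions for illustration only.
    """
    name_sim = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    shared = [k for k in ("affiliation", "field") if a.get(k) and b.get(k)]
    if not shared:
        return name_sim  # no corroborating metadata to draw on
    agree = sum(a[k].lower() == b[k].lower() for k in shared) / len(shared)
    return 0.7 * name_sim + 0.3 * agree


# Invented records: same name form, but only one shares affiliation and field.
rec_a = {"name": "Rivera, Ana M.", "affiliation": "Example University", "field": "Physics"}
rec_b = {"name": "Ana M. Rivera", "affiliation": "Example University", "field": "physics"}
rec_c = {"name": "Ana M. Rivera", "affiliation": "Another Institute", "field": "History"}
same_person_score = match_score(rec_a, rec_b)
different_context_score = match_score(rec_a, rec_c)
```

In this sketch, identical name strings with conflicting context score lower than identical names with matching context, which is the behavior that would let a reviewer prioritize likely matches and examine ambiguous ones by hand.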

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to “traditional cataloging activities” (such as original and copy cataloging, authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
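For readers unfamiliar with clustering as a batch-cleanup technique, the Python sketch below shows one common lightweight approach, key-collision clustering with a "fingerprint" key, similar in spirit to a method offered in OpenRefine. It is an illustration of the general technique under that assumption and makes no claim about how MarcEdit implements its own clustering; the function names and sample headings are invented.

```python
import string
import unicodedata
from collections import defaultdict
from typing import Dict, List


def fingerprint(value: str) -> str:
    """Build a key that ignores case, accents, punctuation, and word order."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = ascii_only.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(cleaned.split())))


def cluster(values: List[str]) -> Dict[str, List[str]]:
    """Group values sharing a fingerprint; keep only groups with variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return {key: vals for key, vals in groups.items() if len(set(vals)) > 1}


# Invented sample headings: three variant forms of one name, plus one singleton.
names = ["Brontë, Charlotte", "Charlotte Bronte", "bronte charlotte.", "Austen, Jane"]
clusters = cluster(names)
```

Each resulting cluster is a candidate set for review rather than an automatic merge: a cataloger would still decide which form to keep, which is why only groups containing more than one distinct form are returned.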

Managers want to focus less on specific schemas and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts, summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe


Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5. Acquisitions and Bibliographic Access, Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” Hanging Together: The OCLC Research Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. “What is ORCID.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). “About.” https://ror.org/about

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress: PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection, University of North Texas. The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5. Acquisitions and Bibliographic Access, Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf) But the term has other meanings depending on the audience, for example, identity access management as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805


44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite


60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress: PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. https://www.oclc.org/research/events/2017/10-19.html

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

69. Washburn, Bruce, and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 57:45:00. https://www.oclc.org/research/events/2015/08-12.html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase


73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection (Archived May 2008). https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection (Archived January 2010). https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection (Archived May 2010). https://archive-it.org/collections/1945

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection (Archived June 2014). https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection (Archived October 2013). https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving


84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html


100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” Hanging Together: The OCLC Research Blog. 21 September 2015. http://hangingtogether.org/?p=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research. 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.


125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia: 2020 HERE, Bing, Microsoft Corporation.]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs. See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data in MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395. T: 1-800-848-5878, T: +1-614-764-6000, F: +1-614-764-6096. www.oclc.org/research

ISBN: 978-1-55653-170-5. DOI: 10.25333/rqgd-b343. RM-PR-216787-WWAE-A4 2009


AUDIO AND VIDEO COLLECTIONS

Focus Group members reported that their institutions had repositories filled with large amounts of audiovisual (AV) materials, which often represent unique, local collections.⁸⁶ However, as Chela Scott Weber states in the publication Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries, “For decades, AV materials in our collections were largely either separated from related manuscript material (often shunted away to be dealt with at a later date) or treated at the item level. Both have served to create sizeable backlogs of un-quantified and un-described AV materials.”⁸⁷ Much of this audiovisual material urgently requires preservation, digitization, clarification of conditions of use, and description.

In addition, the needed skill sets and stakeholders across institutions are complex. Managing AV resources requires knowledge of the use context as well as of technical metadata issues, which makes for a complex environment in which to think through requirements for description and access. Further, libraries must deal with current time-based media, whether produced locally as part of research and learning or commercially licensed as streaming media.

Focus Group discussions concentrated on the AV resources within archival collections, which are often in deteriorating formats, held in large backlogs, and sometimes require rare and expensive equipment to access and assess the files. For locally generated content, institutions prefer that the creators describe their own resources.


The overarching challenge was how much effort needs to be invested in describing these AV materials because they are unique. Institutions have used hierarchical structures to aggregate similar materials, with finding aids that are marked up in the Encoded Archival Description standard,⁸⁸ which provides useful contextual information for individual items within a specific collection. But often an aggregated approach to description can lack important details about individual items needed for discovery, such as transcribed title and date broadcast. This is a particularly acute issue for legacy data describing recordings from years past. Metadata describing the same AV materials may differ across library, archival, and digital asset management systems. Some hope that better discovery layers will alleviate the need to repeat the same information across databases, but presenting the information to users would require using consistent access points across systems. The same will be true in a linked data environment. But the challenge to link between items and the finding aid, and to maintain the links over time despite changes in systems, will remain.

Metadata for AV materials needs to include important technical information, such as details about the AV capture and digitization process like compression, year digitized, the technology used, and file compatibility. This data is critical to ensure perpetual access for such enormous files and mercurial playback formats. Some Focus Group members have implemented PREMIS (Preservation Metadata: Implementation Strategies),⁸⁹ the international standard for metadata to support the preservation of digital objects and ensure their long-term usability, for some of their AV materials.


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together blog posts “Assessing Needs of AV in Special Collections” and “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.”⁹⁰ A subset of the Focus Group members responded to Weber’s 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival workflows and into digital collections workflows were two of the challenges that most interested respondents, as shown in figure 6.

[Bar chart: “What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?” (n=137). Categories: Resource allocation, assessment, and prioritization; Digital asset management and preservation; Physical collection management; Digitization and preservation reformatting; Incorporating into digital collection workflows; Incorporating into archival workflows; Selection, appraisal, and collection development. Response levels: Very interested, Interested, Somewhat interested, Not interested.]

FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects (“of-ness” and “about-ness”). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.⁹¹

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or as unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).⁹² Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),⁹³ a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).⁹⁴

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited (if not discredited) any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.⁹⁵

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.
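The editing and mapping work described under “Variety of systems and schemas” can be pictured with a minimal Python sketch. It assumes a hypothetical departmental spreadsheet whose column names (“Image Title,” “Photographer,” and so on) are invented for illustration, and it maps them to a handful of Dublin Core element names. It is not drawn from any Focus Group member’s workflow; a real crosswalk would need many more elements and local rules.

```python
# Hypothetical crosswalk from one department's spreadsheet headers to
# Dublin Core element names. The column names are assumptions made for
# this sketch; real collections will differ.
CROSSWALK = {
    "Image Title": "title",
    "Photographer": "creator",
    "Date Taken": "date",
    "Keywords": "subject",
}


def to_dublin_core(row):
    """Map one spreadsheet row (a dict) to Dublin Core element lists.

    Returns the mapped record plus the columns that had no mapping,
    so that they can be routed to manual review.
    """
    record, unmapped = {}, []
    for column, value in row.items():
        value = value.strip()
        if not value:
            continue  # ignore empty cells
        element = CROSSWALK.get(column)
        if element is None:
            unmapped.append(column)  # no rule yet: flag for review
        elif element == "subject":
            # split a semicolon-delimited cell into repeatable values
            record.setdefault(element, []).extend(
                part.strip() for part in value.split(";") if part.strip()
            )
        else:
            record.setdefault(element, []).append(value)
    return record, unmapped


row = {
    "Image Title": " Plan of Paris ",
    "Photographer": "",
    "Keywords": "maps; Paris",
    "Box No": "12",
}
record, unmapped = to_dublin_core(row)
print(record)
print(unmapped)
```

Returning the unmapped columns alongside the record reflects the point made above that metadata from other departments usually needs manual review rather than silent loss of information.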

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,⁹⁶ which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters,” those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)⁹⁷ Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,⁹⁸ as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,⁹⁹ focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
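The idea of replacing strings with identifiers can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pilot’s actual method: the lookup table, the `example.org` URIs, and the function names are all hypothetical, and a real reconciliation service would query authority files rather than a hard-coded dictionary.

```python
# Hypothetical authority lookup: normalized label -> identifier URI.
# The URIs are placeholders, not real GeoNames or Wikidata identifiers.
AUTHORITY = {
    "paris (france)": "https://example.org/place/0001",
    "maps": "https://example.org/concept/0002",
}


def normalize(label):
    """Lowercase a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def reconcile(record):
    """Attach an identifier to every string value that has a match.

    Unmatched strings are kept as plain labels and also reported,
    so that staff can review them or add them to a local vocabulary.
    """
    linked, unmatched = {}, []
    for field, values in record.items():
        linked[field] = []
        for value in values:
            uri = AUTHORITY.get(normalize(value))
            if uri:
                linked[field].append({"id": uri, "label": value})
            else:
                linked[field].append({"label": value})
                unmatched.append((field, value))
    return linked, unmatched


linked, unmatched = reconcile({"subject": ["Maps", "Paris  (France)", "Sewers"]})
print(linked)
print(unmatched)
```

Keeping the original label next to the identifier is a deliberate choice in this sketch: the display string survives even if the identifier later needs to be corrected.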

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about ldquoParis Mapsrdquo across CONTENTdm collections


RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry, “Data Management and Curation in 21st Century Archives” (Sept. 2015),¹⁰⁰ prompted the discussion among Focus Group members on the metadata needed for research data management.¹⁰¹ To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).¹⁰²
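As one way to picture the readme.txt recommendation, the following Python sketch builds a plain-text template from the fields named in the passage above. The field order, the label format, and the “[not provided]” marker are assumptions made for illustration; the Building Blocks report does not prescribe a layout.

```python
# Fields taken from the quoted passage; their order and formatting
# here are assumptions of this sketch, not part of the report.
README_FIELDS = [
    "title", "author", "date", "funder", "copyright holder",
    "description", "keywords", "observation unit", "kind of data",
    "type of data", "language", "file names", "file formats",
    "software used",
]


def build_readme(metadata):
    """Return readme.txt text with one labeled line per field.

    Missing fields are marked rather than dropped, so gaps stay
    visible to whoever reviews the deposit.
    """
    lines = []
    for field in README_FIELDS:
        value = metadata.get(field, "")
        if isinstance(value, (list, tuple)):
            value = "; ".join(value)  # flatten repeatable values
        lines.append(f"{field.upper()}: {value or '[not provided]'}")
    return "\n".join(lines) + "\n"


text = build_readme({"title": "Stream survey", "keywords": ["water", "pH"]})
print(text)
```

Marking absent fields explicitly may make it easier for a metadata consultant to see at a glance what a researcher still needs to supply.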


All four of the 2017-2018 The Realities of Research Data Management webinars,¹⁰³ led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment: beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and resulting reputation of our home institutions.¹⁰⁴

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.¹⁰⁵


National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.¹⁰⁶ Canada has launched a national network called Portage for the “shared stewardship of research data.”¹⁰⁷

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Researcher Communications project.¹⁰⁸

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.¹⁰⁹ The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.¹¹⁰ The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
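A short Python sketch can show how some of these identifier types might be recognized in incoming metadata. The regular expressions are simplified assumptions, not the full syntax rules of each scheme, and Handles are left out because a bare prefix/suffix string is hard to tell apart from other text. The ORCID check digit follows the ISO 7064 MOD 11-2 calculation that ORCID documents; the function names are invented for this example.

```python
import re


def orcid_check_digit(base_digits):
    """Compute the final ORCID character from the first 15 digits
    using the ISO 7064 MOD 11-2 calculation."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def classify_identifier(value):
    """Return 'orcid', 'doi', 'ark', or 'unknown' for a bare identifier.

    The patterns are deliberately loose. ISNI uses the same
    16-character form and check digit as ORCID, so this pattern
    alone cannot tell the two apart.
    """
    value = value.strip()
    if re.fullmatch(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]", value):
        digits = value.replace("-", "")
        if orcid_check_digit(digits[:-1]) == digits[-1]:
            return "orcid"
        return "unknown"  # right shape, wrong check digit
    if re.fullmatch(r"10\.\d{4,9}/\S+", value):
        return "doi"
    if re.fullmatch(r"ark:/?\d{5,}/\S+", value):
        return "ark"
    return "unknown"


# The DOI of this report and the ORCID iD given on its title page.
print(classify_identifier("10.25333/rqgd-b343"))
print(classify_identifier("0000-0002-8757-2962"))
```

Validating the check digit before accepting an ORCID iD is a cheap way to catch transcription errors at the point of deposit, before a mistyped identifier propagates to aggregators.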


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?¹¹¹ How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,¹¹² are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle Researchers need to understand the importance of metadata in their data management plans The lack of ldquometadata governancerdquo across an institution makes integrating workflows between repositories and discovery layers problematic

Some libraries have started to provide research data management support in a variety of ways For example metadata specialists work with their institutionsrsquo Scholarly Communications and Publishing Division which also manages the Institutional Repository These institutional repositories may have only the ldquocitationrdquo or ldquometadata-onlyrdquo records with a link to the full text or data set deposited in a disciplinary repository ldquoMetadata consultation servicesrdquo may be provided to advise on the data management plan which includes appropriate metadata standards and controlled vocabularies a strategy to effectively organize their data and an approach that will facilitate reuse of the data years after the research is completed The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the ldquoexpertiserdquo function and flags some variations in its case studies113 At the University of Illinois at Urbana-Champaign metadata consultants help researchers with metadata regardless of where the research data is deposited Monash University supports metadata curation only for local deposits114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians on work that may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.[115]

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.[116]

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog, and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking, personalizing search results, and offering recommendation services in the discovery layer; and measuring the impact of library usage on research, student success, or learning analytics.[117] The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.[118] The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.[119]

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.[120] An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. This metadata consultant role is becoming more visible in recent library job postings. In one Metadata Librarian job posting at Cornell,[121] one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”[122]

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).[123] Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.[124] MARC metadata is also being used to inform institutional output measures and affiliation tracking, and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Data mining of catalog data can also be used to enrich the metadata, such as by generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.[125]
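The enrichment idea mentioned above, supplying a language code that is missing from one record but present in related records, can be illustrated with a minimal Python sketch. It is not taken from any Focus Group member's workflow. The record layout (plain dictionaries with `id`, `work_id`, and `lang` keys) and the rule "fill only when all related records agree" are assumptions made for illustration; real MARC data would need a parser and a more careful definition of "related."

```python
from collections import Counter, defaultdict


def fill_missing_language_codes(records):
    """Return copies of records with an empty 'lang' filled from related records.

    Records are plain dicts with hypothetical keys 'id', 'work_id', and 'lang'.
    A missing code is filled only when every related record that carries a
    code agrees on it; conflicting cases are left blank for human review.
    """
    # Tally the language codes observed for each group of related records.
    codes_by_work = defaultdict(Counter)
    for rec in records:
        if rec.get("lang"):
            codes_by_work[rec["work_id"]][rec["lang"]] += 1

    enriched = []
    for rec in records:
        new_rec = dict(rec)  # never mutate the caller's data
        seen = codes_by_work.get(rec["work_id"])
        if not rec.get("lang") and seen and len(seen) == 1:
            new_rec["lang"] = next(iter(seen))
            new_rec["lang_inferred"] = True  # flag machine-supplied values
        enriched.append(new_rec)
    return enriched


sample = [
    {"id": "a1", "work_id": "w1", "lang": "fre"},
    {"id": "a2", "work_id": "w1", "lang": ""},    # will be filled with "fre"
    {"id": "b1", "work_id": "w2", "lang": "eng"},
    {"id": "b2", "work_id": "w2", "lang": "ger"},
    {"id": "b3", "work_id": "w2", "lang": None},  # conflict: left blank
]
result = fill_missing_language_codes(sample)
```

Flagging inferred values separately keeps machine-supplied codes distinguishable from cataloger-supplied ones, so they can be reviewed or rolled back later.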


Visualizations represent another type of metadata service. A striking example is from the Austlang national code-a-thon held in 2019 to identify items in Indigenous Australian languages, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries.[126] Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story. Hachette asked the British Library for every author and book title published by its nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)[127]


FIGURE 9. Hachette UK’s “River of Authors,” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).[128] As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review:

“a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.”[129]

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),[130] which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than working toward one “global domain,” metadata specialists could add value by building bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform that aggregates entities from different sources and links to more details in the Wikipedias of various languages. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.[131] A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),[132] an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”[133] Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.[134]
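As a rough illustration of what "matching names based on related metadata" might look like at its simplest, the Python sketch below combines string similarity of normalized names with the overlap of one contextual field. It is an assumption-laden toy, not a method reported by the Focus Group: the record keys (`name`, `affiliations`), the 0.7 weighting, and the sample names are all hypothetical, and production identity management would draw on far richer evidence and human review.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, drop commas and periods, and sort the name parts so that
    'Surname, Forename' and 'Forename Surname' compare as equal."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return " ".join(sorted(cleaned.split()))


def match_score(a, b, name_weight=0.7):
    """Score between 0 and 1 that two records describe the same person.

    Blends name similarity with the Jaccard overlap of affiliations.
    The weighting is an illustrative assumption, not a tuned value.
    """
    name_sim = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    aff_a = set(a.get("affiliations", []))
    aff_b = set(b.get("affiliations", []))
    union = aff_a | aff_b
    context_sim = len(aff_a & aff_b) / len(union) if union else 0.0
    return name_weight * name_sim + (1 - name_weight) * context_sim


# Hypothetical records for illustration only.
rec_a = {"name": "Rivera, Maria L.", "affiliations": ["Example University"]}
rec_b = {"name": "Maria L. Rivera", "affiliations": ["Example University"]}
rec_c = {"name": "M. Rivers", "affiliations": ["Another Institute"]}

same_person = match_score(rec_a, rec_b)
different_person = match_score(rec_a, rec_c)
```

A score like this is best treated as a way to rank candidate matches for a metadata specialist to confirm, rather than as an automatic merge decision.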

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.[135]

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to “traditional cataloging activities” (such as original and copy cataloging, authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.[136]


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, change their mindsets about how they work, and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.[137]

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills, or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background who is interested in metadata services. Retaining staff with IT skills is difficult because they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.[138] MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.[139] Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.[140] Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.[141]
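To give a sense of what lightweight clustering means in practice, here is a minimal Python sketch of key-collision ("fingerprint") clustering, a general technique offered by tools such as OpenRefine. It is an illustration only and makes no claim about how MarcEdit implements its own clustering; the function names and sample values are hypothetical.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a normalized key so that trivially different spellings collide.

    Steps: fold accented characters to ASCII, lowercase, strip punctuation,
    then sort and de-duplicate the remaining tokens.
    """
    folded = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    no_punct = folded.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(no_punct.split())))


def cluster(values):
    """Group values that share a fingerprint; keep only groups with variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical publisher and institution strings as they might appear in a batch.
names = [
    "Hachette UK",
    "hachette, UK.",
    "UK Hachette",
    "British Library",
    "Bibliothèque nationale",
    "Bibliotheque Nationale",
]
clusters = cluster(names)
```

Each resulting cluster is a candidate set of variant forms that a metadata specialist can review and collapse to a single preferred form in one batch action.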

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, they often lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”[142]

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Codecademy, Software Carpentry, and conferences such as Code4Lib and Mashcat.[143] W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.[144] Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
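The kind of simple scripting mentioned above for reconciling equivalents across multiple registries can be sketched in a few lines of Python. The example below groups identifiers that have been asserted as equivalent, directly or through a chain of assertions, using a small union-find structure. The identifier strings and registry prefixes are made up for illustration, and the sketch assumes the equivalence assertions themselves are trustworthy, which in practice is the hard part.

```python
def reconcile(equivalences):
    """Group identifiers connected by pairwise 'same entity' assertions.

    `equivalences` is an iterable of (identifier, identifier) pairs.
    Returns a sorted list of clusters, each a sorted list of identifiers.
    """
    parent = {}

    def find(ident):
        # Follow parent links to the cluster root, shortening the path as we go.
        parent.setdefault(ident, ident)
        while parent[ident] != ident:
            parent[ident] = parent[parent[ident]]
            ident = parent[ident]
        return ident

    for left, right in equivalences:
        parent[find(left)] = find(right)  # merge the two clusters

    clusters = {}
    for ident in parent:
        clusters.setdefault(find(ident), set()).add(ident)
    return sorted((sorted(members) for members in clusters.values()), key=lambda c: c[0])


# Hypothetical identifiers; the prefixes only suggest which registry each came from.
pairs = [
    ("viaf:111", "isni:AAA"),
    ("isni:AAA", "wikidata:Q1"),
    ("viaf:222", "lcnaf:n222"),
]
entity_clusters = reconcile(pairs)
```

Because the grouping is transitive, "viaf:111" and "wikidata:Q1" end up in the same cluster even though no pair links them directly, which is why a single bad assertion can merge two distinct entities and why review remains essential.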

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,[145] a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars[146] to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.[147]

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,[148] launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote the context statements explaining why each topic was important and timely, and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership. Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura. 2017. “Metadata Advocacy.” Hanging Together: the OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe


Futornick Michelle 2019 ldquoLD4P2 Linked Data for Production Pathway to Implementationrdquo LS4P2 Project Background and Goals Lyrasis Posted 14 January 2019 httpswikilyrasisorgdisplayLD4P2LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13 Library of Congress 2019 PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices 2019 PCC Task Group on Linked Data Best Practices Final Report Submitted to PCC Policy Committee 12 September 2019 Washington DC Library of Congress httpswwwlocgovabapcctaskgrouplinked-data-best-practices-final-reportpdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5. American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress 2020 ldquoPCC Task Group on URIs in MARCrdquo Programs of the PCC Charge Accessed 19 September 2020 httpswwwlocgovabapccbibframeTaskGroupsURI-TaskGrouphtml

Library of Congress 2018 ldquoPCC Linked Data Advisory Committee Linked Data Advisory Committee Chargerdquo PCC Task Groups 2018 Task Groups Revised 24 July 2018 [Word doc 28KB] httpswwwlocgovabapcctaskgrouptask-groupshtml

14 Smith-Yoshimura Karen 2015 ldquoShift to Linked Data for Productionrdquo OCLC Research Hanging Together Blog 13 May 2015 httpshangingtogetherorgp=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16 Smith-Yoshimura Karen 2019 ldquolsquoFuture Proofingrsquo of Catalogingrdquo OCLC Research Hanging Together Blog 10 November 2019 httpshangingtogetherorgp=7526

17 ORCID Connecting Research and Researchers ldquoWhat is Orcidrdquo Our Vision Accessed 19 September 2020 httpsorcidorgaboutwhat-is-orcidmission

18 See for example the list of signatories of journal publishers requiring ORCID IDs for authors ORCID ldquoORCID Open Letter - Publishersrdquo Accessed 19 September 2020 httpsorcidorgcontentrequiring-orcid-publication-workflows-open-letter

19 ISNI ldquoWhat is ISNIrdquo Accessed 19 September 2020 httpsisniorgpagewhat-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about

21 GeoNames ldquoBrowse the Namesrdquo Accessed 19 September 2020 httpswwwgeonamesorg

22 Bryant Rebecca Annette Dortmund and Constance Malpas 2017 Convenience and Compliance Case Studies on Persistent Identifiers in European Research Information Dublin OH OCLC Research httpsdoiorg1025333C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org

24 Library of Congress 2020 ldquoNACO ndash Name Authority Cooperative Programrdquo Documents and Updates Programs for Cataloging and Acquisitions (PCC) Accessed 19 September 2020 httpwwwlocgovabapccnacoindexhtml

25 Smith-Yoshimura Karen 2015 ldquoGetting identifiers Created for Legacy Namesrdquo Hanging Together The OCLC Research Blog 30 October 2015 httpshangingtogetherorgp=5463

26 Smith-Yoshimura Karen 2013 ldquoIrreconcilable Differences Name Authority Control amp Humanities Scholarshiprdquo Hanging Together The OCLC Research Blog 27 March 2013 httpshangingtogetherorgp=2621

27 Smith-Yoshimura Karen 2017 ldquoUse Cases for Local Identifiersrdquo Hanging Together The OCLC Research Blog 5 May 2017 httpshangingtogetherorgp=5938

28 OCLC Research 2020 ldquoRegistering Researchers in Authority Filesrdquo httpswwwoclcorgresearchthemesresearch-collectionsregistering-researchershtml

29 Smith-Yoshimura Karen Janifer Gatenby Grace Agnew Christopher Brown Kate Byrne Matt Carruthers Peter Fletcher Stephen Hearn Xiaoli Li Marina Muilwijk Chew Chiat Naun John Riemer Roderick Sadler Jing Wang Glen Wiley and Kayla Willey 2016 Addressing the Challenges with Organizational Identifiers and ISNI Dublin Ohio OCLC Research httpsdoiorg1025333C3FC9Q

30 Research Organization Registry (ROR) ldquoAboutrdquo httpsrororgabout

31 V M Abazov B Abbott B S Acharya M Adams T Adams J P Agnew G D Alexeev et al (2014) 2020 ldquoPrecision Measurement of the Top-Quark Mass in Lepton+jets Final Statesrdquo (Archived 24 February 2020) ArXivorg 150107912 httpsarxivorgpdf14051756

32 Smith-Yoshimura Karen 2017 ldquoHow Much Metadata Is Practicalrdquo Hanging Together The OCLC Research Blog 14 November 2017 httpshangingtogetherorgp=6328

33 University of Minnesota 2020 ldquoExpertsMinnesotardquo Find Profiles httpsexpertsumneduenpersons or

University of Illinois at Urbana-Champaign 2020 ldquoIllinois Expertsrdquo Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog), 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35 Smith-Yoshimura Karen 2016 ldquoMetadata Reconciliationrdquo Hanging Together The OCLC Research Blog 28 September 2016 httpshangingtogetherorgp=5710

36 Carruthers Matt (2014) 2020 mcarruthersLCNAF-Named-Entity-Reconciliation GitHub Repository httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37 Deliot Corine Steven Folsom Myung-Ja Han Nancy Lorimer Terry Reese and Adam Schiff 2019 Formulating and Obtaining URIs A Guide to Commonly used Vocabularies and Reference Sources Library of Congress PCC Task Group on URIs in MARC httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38 Smith-Yoshimura Karen 2019 ldquoNew Ways of Using and Enhancing Cataloging and Authority Recordsrdquo Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=5710

39 Smith-Yoshimura Karen 2015 ldquoPersistent Identifiers for Local Collectionsrdquo Hanging Together The OCLC Research Blog 27 October 2015 httpshangingtogetherorgp=5445

40 See DOI examples in detail from DOI 2020 ldquoDOI System Examplesrdquo Accessed 20 September 2020 httpswwwdoiorgdemoshtml and

See ARK examples in detail from Department Dallas (Tex ) Police 1963 ldquo[Photographs of Identification Cards]rdquo Collection University of North Texas The Portal to Texas History digital repository httpstexashistoryunteduark67531metapth346793

41 DataCite ldquoAssign DOIsrdquo httpsdataciteorgdoishtml

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management, as described in Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43 Smith-Yoshimura Karen 2018 ldquoThe Coverage of Identity Management Workrdquo Hanging Together The OCLC Research Blog 8 October 2018 httpshangingtogetherorgp=6805


44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog, 10 October 2017. https://hangingtogether.org/?p=6271

45 Smith-Yoshimura ldquoCoverage of Identity Managementrdquo (See note 43)

46 Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez 2018 ldquoWorks in Progress Webinar Introduction to Wikidata for Librarians Structuring Wikipedia and Beyondrdquo Produced by OCLC Research 12 June 2018 MP4 video presentation 1151 httpswwwoclcorgresearchevents201806-12html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog, 18 June 2020. https://hangingtogether.org/?p=8002

48 Wikimedia ldquoWikiCiterdquo Home httpsmetawikimediaorgwikiWikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog, 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog, 29 October 2019. https://hangingtogether.org/?p=7591

51 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo httpswwwoclcorgenfasthtml

52 Smith-Yoshimura Karen 2016 ldquoFaceted Vocabulariesrdquo Hanging Together The OCLC Research Blog 31 October 2016 httpshangingtogetherorgp=5739

53 OCLC 2020 ldquoFASTrdquo (See note 51)

54 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo Heading 3 FAST Policy and Outreach (FPOC) Committee httpswwwoclcorgenfasthtml

55 Smith-Yoshimura Karen 2017 ldquoVocabulary Control Data in Discovery Environmentsrdquo Hanging Together The OCLC Research Blog 5 October 2017 httpshangingtogetherorgp=6264

56 National Library New Zealand Government ldquoNgā Upoko Tukutuku Māori Subject Headingsrdquo httpmshupokonatlibgovtnzmshupoko

AIATSIS Pathways Gateway to the AIATSIS Thesauri ldquoPathwaysrdquo httpwww1aiatsisgovau

57 Deutsche Nationalbibliothek 2019 ldquoMACS - Multilingual Access to Subjectsrdquo (Archived 13 Jan 2019) httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog, 17 April 2019. https://hangingtogether.org/?p=7135

59 Synaptica ldquoOntology Management ndash Graphiterdquo httpswwwsynapticacomgraphite


60 Smith-Yoshimura Karen 2018 ldquoAre Distributed Models for Vocabulary Maintenance Viablerdquo Hanging Together The OCLC Research Blog 12 April 2018 httpshangingtogetherorgp=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62 Smith-Yoshimura Karen 2018 ldquoCreating Metadata for Equity Diversity and Inclusionrdquo Hanging Together The OCLC Research Blog 7 November 2018 httpshangingtogetherorgp=6833

63 Smith-Yoshimura ldquoDistributed Modelsrdquo (See note 60)

64 Smith-Yoshimura Karen 2019 ldquoStrategies for Alternate Subject Headings and Maintaining Subject Headingsrdquo Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress: PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. https://www.oclc.org/research/events/2017/10-19.html

67 Smith-Yoshimura Karen 2015 ldquoShift to Linked Data for Productionrdquo Hanging Together The OCLC Research Blog 13 May 2015 httpshangingtogetherorgp=5195

68 Smith-Yoshimura Karen 2015 ldquoWorking in Shared Filesrdquo Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

69. Bruce Washburn and Jeff Mixter. 2015. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2015. MP4 video presentation, 57:45:00. https://www.oclc.org/research/events/2015/08-12.html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog, 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog, 7 April 2015. https://hangingtogether.org/?p=5091

72 Jisc Library Services nd ldquoWhat Is lsquoPlan Mrsquordquo Accessed 21 September 2020 httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

To learn more about the current phase of ldquoPlan Mrdquo (MayndashNovember 2020) see Grindley Neil ldquoMoving Plan M Forwards ndash We Need Your Helprdquo Library Services (PlanM) (blog) Jisc 6 May 2020 httpslibraryservicesjiscinvolveorgwp202005planm_nextphase


73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74 Dempsey Lorcan 2016 ldquoLibrary Collections in the Life of the User Two Directionsrdquo LIBER Quarterly 26(4) 338ndash359 httpdoiorg1018352lq10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog, 16 April 2019. https://hangingtogether.org/?p=7880

76 Smith-Yoshimura Karen 2017 ldquoMetadata for Archival Collectionsrdquo Hanging Together The OCLC Research Blog 30 May 2017 httpshangingtogetherorgp=5903

77 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Knudson Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Craig Thomas and Holly Tomren 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage 49-51 Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79 Smith-Yoshimura Karen 2020 ldquoMetadata Management in Times of Uncertaintyrdquo Hanging Together The OCLC Research Blog 15 June 2020 httpshangingtogetherorgp=7998

80 Smith-Yoshimura Karen 2016 ldquoMetadata for Archived Websitesrdquo Hanging Together The OCLC Research Blog 14 March 2016 httpshangingtogetherorgp=5591

81 Archive-It 2008 ldquoHuman Rightsrdquo Columbia University Libraries Collection (Archived May 2008) httpsarchive-itorgcollections1068

Archive-It 2010 ldquoNew York City Places and Spacesrdquo Columbia University Libraries Collection (Archived January 2010) httpsarchive-itorgcollections1757

Archive-It 2010 ldquoBurke Library New York City Religionsrdquo Columbia University Libraries Collection (Archived May 2010) httpsarchive-itorgcollections1945

82 NLA ldquoTroverdquo Archived Websites Sub Collections Accessed 20 September 2020 httpstrovenlagovauwebsite

83 Archive-It 2014 ldquoCollaborative Architecture Urbanism and Sustainability Web Archive (CAUSEWAY)rdquo Ivy Plus Libraries Confederation Collection (Archived June 2014) httpsarchive-itorgcollections4638

Archive-It 2013 ldquoContemporary Composers Web Archive (CCWA)rdquo Ivy Plus Libraries Confederation Collection (Archived October 2013) httpsarchive-itorgcollections4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving


84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 ldquoMetadata for Audio and Videosrdquo Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress ldquoStandardsrdquo Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress ldquoStandardsrdquo Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 ldquoAssessing Needs of AV in Special Collectionsrdquo Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 ldquoScale amp Risk Discussing Challenges to Managing AV Collections in the RLPrdquo Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 ldquoManaging Metadata for Image Collectionsrdquo Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress ldquoStandardsrdquo Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95 Smith-Yoshimura Karen 2016 ldquoSharing Digital Collections Workflowsrdquo Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 ldquoEuropeana Innovation Pilotsrdquo Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the Worldrsquos Images ldquoHomerdquo Accessed 20 September 2020 httpsiiifio

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99 OCLC Research 2020 ldquoCONTENTdm Linked Data Pilotrdquo Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml


100 Smith-Yoshimura Karen 2015 ldquoData Management and Curation in 21st Century Archives ndash Part 1rdquo 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 ldquoMetadata for Research Data Managementrdquo Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC, 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106 NCI (National Computational Infrastructure) Australia ldquoHomerdquo Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) ldquoHomerdquo Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network ldquoHomerdquo Accessed 21 September 2020 httpsportagenetworkca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109 Digital Curation Centre ldquoDisciplinary Metadatardquo List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory ldquoMetadata Standards Directory Working Grouprdquo GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy)mdashwhich identifies 14 roles describing each contributorrsquos specific contribution to the scholarly outputmdasha standard CRediT was developed by CASRAI the Consortia Advancing Standards in Research Administration Information See CASRAI ldquoCRediT ndash Contributor Roles Taxonomyrdquo Accessed 21 September 2020 httpscasraiorgcredit

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html


114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University, 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft 2020 ldquoMicrosoft Academic Graphrdquo Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116 NISOrsquos Reproducibility Badging and Definitions now out for public comment may also help researchers extend the benefit of their research to others See ldquoTaxonomy Definitions and Recognition Badging Scheme Working Group | NISO Websiterdquo nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 ldquoServices Built on Usage Metricsrdquo Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 ldquoStacks Serials Search Engines and Studentsrsquo Success First-Year Undergraduate Studentsrsquo Library Use Academic Achievement and Retentionrdquo Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc ldquoLibrary Impact Data Project (LIDP)rdquo Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 ldquoMetadata Librarian Cornell University Libraryrdquo DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 ldquoLocate Items in the Library with StackMaprdquo httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 ldquoHomerdquo httpswwwyewnocom


125 Smith-Yoshimura Karen 2019 ldquoNew Ways of Using and Enhancing Cataloging and Authority Recordsrdquo Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) ldquoAustlang National Codeathonrdquo Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA ldquoTroverdquo Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company ndash Hachette UK ldquoRiver of Authorsrdquo Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 ldquoKnowledge Organization Systemsrdquo Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 ldquoKnowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Reviewrdquo International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) ldquoAbout SNACrdquo What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives amp Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAMrsquos mission is to organize share and elevate knowledge about and use of artificial intelligence by libraries archives and museums It was founded in 2018 inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs

See AI4LAM ldquoAboutrdquo Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 ldquoNew Skill Sets for Metadata Managementrdquo Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 ldquoMarcEdit and Other Tools for Batch Processing and Metadata Reconciliationrdquo Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog), 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141 Reese Terry 2018 ldquoMarcEdit Playlistrdquo 139 YouTube videos Last updated 26 December 2018 httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura Karen 2017 ldquoNew Skill Sets for Metadata Managementrdquo Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese Terry 2013 ldquoTutorialsrdquo YouTube (selected) MarcEdit Development (blog) 14 March 2013 httpmarceditreesetnettutorials

ldquoLynda Online Courses Classes Training Tutorialsrdquo nd Lyndacom - from LinkedIn Learning Accessed 22 September 2020 httpswwwlyndacom

ldquoLearn to Code - for Freerdquo nd Codecademy Accessed 22 September 2020 httpswwwcodecademycom

Software Carpentry ldquoTeaching Basic Lab Skills for Research Computingrdquo Upcoming Workshops Accessed 22 September 2020 httpssoftware-carpentryorg

144 ldquoData on the Web Best Practicesrdquo nd Accessed 22 September 2020 httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist (2008) 2020 httpworkingontologistorg

145 Library Workflow Exchange nd ldquoAboutrdquo Accessed 21 September 2020 httpwwwlibraryworkflowexchangeorgabout

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147 Smith-Yoshimura Karen 2019 ldquoStewardship of Professional FTEs In Metadata Work and Turnoverrdquo Hanging Together The OCLC Research Blog 18 October 2019 httpshangingtogetherorgp=7580

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009


OCLC Senior Program Officer Chela Scott Weber continues working with the Research Library Partnership on the needs and challenges of managing AV collections, summarized in the OCLC Research Hanging Together blog posts "Assessing Needs of AV in Special Collections" and "Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP."90 A subset of the Focus Group members responded to Weber's 2019 survey to assess the needs of audiovisual materials in special collections within the Research Library Partnership. Incorporating AV collections into archival workflows and into digital collection workflows were two of the challenges that most interested respondents, as shown in figure 6.

FIGURE 6 Responses to 2019 survey on challenges related to managing AV collections

[Bar chart: "What Challenges Related to Managing AV Collections Would You Be Interested in the RLP Addressing?" (n=137). Scale: 0 to 140. Categories: Resource allocation, assessment, and prioritization; Digital asset management and preservation; Physical collection management; Digitization and preservation reformatting; Incorporating into digital collection workflows; Incorporating into archival workflows; Selection, appraisal, and collection development. Legend: Very interested, Interested, Somewhat interested, Not interested.]

IMAGE COLLECTIONS

Focus Group members manage a wide variety of image collections, presenting challenges for metadata management. In some cases, image collections that developed outside the library and its data models need to be integrated with other collections or into new search environments. Depending on the nature of the collection and its users, questions arise concerning identification of works, depiction of entities, chronology, geography, provenance, genre, and subjects ("of-ness" and "about-ness"). Image collections also offer opportunities for crowdsourcing and interdisciplinary research.91

Many libraries describe their digital image resources on the collection level while selectively describing items. As much as possible, enhancements are done in batch. Some do authority work, depending on the quality of the accompanying metadata. Some libraries have disseminated metadata guidelines to help bring more consistency to the data.

Among the challenges discussed by the Focus Group:

• Variety of systems and schemas: Image collections created in different parts of the institution, such as art or anthropology departments, serve different purposes and use different systems and schemas than those used by the library. The metadata often comes in spreadsheets or unstructured accompanying data. Often the metadata created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections' metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR'ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of the same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a "blob" of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog; archive management, digital asset, and preservation systems; the institutional repository; research management systems; and external subscription-based repositories. Targets for sharing this metadata range from tailored topic-based digital discovery services, to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution's own collection development, as librarians can see their contributions in the context of others' content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators' very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors' very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find "semantic clusters," those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
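The idea of replacing strings with identifiers can be illustrated with a minimal sketch. It is not the pilot's actual method: the lookup tables, the field names, and every example.org URI below are hypothetical placeholders, and a real conversion would query actual authority files rather than in-memory dictionaries.

```python
# Hypothetical lookup tables standing in for an authority file and a local
# library-defined vocabulary. The example.org URIs are placeholders only.
AUTHORITY_FILE = {
    "paris (france)": "http://example.org/authority/place/0001",
    "maps": "http://example.org/authority/subject/0042",
}
LOCAL_VOCABULARY = {
    "city planning office": "http://example.org/local/org/0007",
}


def normalize(label):
    """Collapse whitespace and case so trivially different strings match."""
    return " ".join(label.split()).lower()


def reconcile(label):
    """Return an identifier for a string, preferring the authority file.

    Returns None when neither source knows the string.
    """
    key = normalize(label)
    return AUTHORITY_FILE.get(key) or LOCAL_VOCABULARY.get(key)


def to_statements(record_id, record):
    """Turn field/value strings into (subject, predicate, object) statements.

    Values that reconcile are replaced by identifiers; the rest are kept as
    literal strings so that no information is lost.
    """
    statements = []
    for field, values in record.items():
        for value in values:
            identifier = reconcile(value)
            statements.append((record_id, field, identifier or value))
    return statements


# A made-up record-based description of one digitized map.
record = {
    "coverage": ["Paris (France)"],
    "subject": ["Maps", "Fortifications"],
    "contributor": ["City  Planning Office"],
}
triples = to_statements("http://example.org/item/123", record)
for triple in triples:
    print(triple)
```

Strings that cannot be reconciled (here, "Fortifications") stay as literals, which leaves them visible as candidates for later review or for a local vocabulary.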

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about "Paris Maps" across CONTENTdm collections


RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel's two-part blog entry "Data Management and Curation in 21st Century Archives" (Sept 2015)100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
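As a rough illustration of the readme.txt advice quoted above, the sketch below renders project-level metadata into a plain-text file body. The choice of which fields are required, the field labels, and the sample values are assumptions made for this example; the report does not prescribe a template.

```python
# Assumption: these four fields are treated as mandatory for illustration.
REQUIRED_FIELDS = ("title", "author", "date", "description")
# Other elements named in the quoted passage are treated as optional here.
OPTIONAL_FIELDS = (
    "funder",
    "copyright_holder",
    "keywords",
    "language",
    "file_formats",
    "software_used",
)


def build_readme(metadata):
    """Render a metadata dictionary as 'Label: value' lines.

    Raises ValueError if any required field is missing or empty.
    """
    missing = [name for name in REQUIRED_FIELDS if not metadata.get(name)]
    if missing:
        raise ValueError("Missing required metadata: " + ", ".join(missing))

    lines = []
    for name in REQUIRED_FIELDS + OPTIONAL_FIELDS:
        value = metadata.get(name)
        if not value:
            continue  # skip optional fields that were not supplied
        if isinstance(value, (list, tuple)):
            value = "; ".join(str(item) for item in value)
        label = name.replace("_", " ").capitalize()
        lines.append(f"{label}: {value}")
    return "\n".join(lines) + "\n"


# Made-up sample values for a fictional project.
sample = {
    "title": "Stream temperature survey",
    "author": "A. Researcher",
    "date": "2020-06-01",
    "description": "Hourly temperature readings from three sites.",
    "keywords": ["hydrology", "temperature"],
    "file_formats": ["CSV"],
}
readme_text = build_readme(sample)
print(readme_text)
```

Keeping the required list short reflects the Focus Group's observation elsewhere in this section that researchers are more likely to complete simple forms.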


All four webinars in the 2017-2018 series The Realities of Research Data Management,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries' expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post "Let's Cook Up Some Metadata Consistency":

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions' research investments.105


National contexts differ. For example, our Australian colleagues can take advantage of Australia's National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the "shared stewardship of research data."107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020's Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (the Data Documentation Initiative's metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance's Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions' discovery layers.

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
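Because creator identifiers are often typed or pasted into deposit forms, a simple structural check can catch transcription errors before they reach a repository. The sketch below validates the final check character of an ORCID iD using the ISO 7064 MOD 11-2 calculation that ORCID documents for its identifiers. The function names are made up for this example, the iD tested is the author's own from this report's front matter, and the check confirms only that an iD is well formed, not that it is registered.

```python
def orcid_check_character(base_digits):
    """Compute the check character for the first 15 digits of an ORCID iD
    using the ISO 7064 MOD 11-2 algorithm."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid):
    """Return True if the string has 16 characters (hyphens ignored) and
    its last character matches the computed check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if not all(ch in "0123456789" for ch in base):
        return False
    return orcid_check_character(base) == check


# The iD listed for the author in this report's front matter.
print(is_well_formed_orcid("0000-0002-8757-2962"))
# The same iD with its last digit mistyped.
print(is_well_formed_orcid("0000-0002-8757-2963"))
```

A check like this is cheap enough to run at the point of data entry, which fits the Focus Group's point that metadata quality depends on involvement early in the research life cycle.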


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of "metadata governance" across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions' Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the "citation" or "metadata-only" records, with a link to the full text or data set deposited in a disciplinary repository. "Metadata consultation services" may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the "expertise" function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer "research sprints," where researchers partner with a team of expert librarians that may include metadata creation, management, analysis, and preservation. The "Shared BigData Gateway for Research Libraries," hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty's research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of "Metadata as a Service"

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as "foster discovery and use," "enrich the user experience," and "explore new ways to support the whole life cycle of scholarship," all of which are predicated on quality metadata. Usage metrics, such as how frequently items have been borrowed, cited, downloaded, or requested, could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers' publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success, or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students' use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata's value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide "metadata as a service": consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for "metadata outreach and consultation": "Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community." Georgia Tech is recruiting a metadata librarian who will "serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects."122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC's integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis of collections and a view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution's bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125


Visualizations represent another type of metadata service. A striking example is from the Austlang national code-a-thon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics, statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom's second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette's publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette's River of Authors.)127


FIGURE 9 UK Hachette's "River of Authors" generated from the British Library's catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade but could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty's representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one "global domain," metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an "international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums."133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla's 2019 OCLC Research position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
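A toy version of disambiguating names based on other available metadata might look like the following. It is a simple string-similarity heuristic rather than machine learning, and the record layout, the 0.2 affiliation bonus, the 0.85 threshold, and the sample entries are all arbitrary assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize_name(name):
    """Lowercase, replace punctuation with spaces, and sort the name parts
    so that 'Chen, Wei' and 'Wei Chen' compare as equal."""
    cleaned = "".join(
        ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower()
    )
    return " ".join(sorted(cleaned.split()))


def match_score(candidate, authority):
    """Name similarity (0 to 1) plus a bonus when affiliations overlap."""
    similarity = SequenceMatcher(
        None, normalize_name(candidate["name"]), normalize_name(authority["name"])
    ).ratio()
    shared = set(candidate.get("affiliations", [])) & set(
        authority.get("affiliations", [])
    )
    return similarity + (0.2 if shared else 0.0)  # bonus size is arbitrary


def best_match(candidate, authorities, threshold=0.85):
    """Return the best-scoring authority entry, or None when nothing clears
    the threshold or the top two scores tie (left for human review)."""
    scored = sorted(
        ((match_score(candidate, entry), entry) for entry in authorities),
        key=lambda pair: pair[0],
        reverse=True,
    )
    if not scored or scored[0][0] < threshold:
        return None
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None
    return scored[0][1]


# Made-up authority entries for two different people sharing one name.
authorities = [
    {"id": "auth-1", "name": "Chen, Wei", "affiliations": ["Example University"]},
    {"id": "auth-2", "name": "Chen, Wei", "affiliations": ["Sample Institute"]},
]
with_context = {"name": "Wei Chen", "affiliations": ["Sample Institute"]}
without_context = {"name": "Wei Chen"}

print(best_match(with_context, authorities))
print(best_match(without_context, authorities))
```

Returning no match when the evidence is tied keeps ambiguous cases in front of a metadata specialist, consistent with the report's view that automation should reduce manual effort rather than replace judgment.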

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who "trail-blaze innovations," which are then routinized for nonprofessionals. These discussions reinforce Padilla's recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to "traditional cataloging activities" (such as original and copy cataloging and authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual's productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff, to learn new workflows for processing multiple formats, and to view metadata specialists as more than just "production machines."

Metadata managers, faced with staff reductions while still being expected to maintain production levels, must justify allocating staff time for R&D (or "play time") to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the "easier stuff," while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists to change their mindsets about how they work and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became "lively" because she brought her colleagues along. It created appreciation for "continuous learning," and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch "reading clubs," where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group "video-viewing brown-bag lunches" for staff on new developments such as linked data, so staff can "watch and learn" together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available, to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments, and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.141
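
To make the idea of lightweight clustering more concrete, the following is a minimal Python sketch of key-collision (“fingerprint”) clustering, a technique commonly associated with OpenRefine. It is an illustration under stated assumptions only: it does not reproduce MarcEdit’s implementation, and the function names and sample headings are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so that variant forms of a heading collide."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then replace punctuation with spaces.
    cleaned = re.sub(r"[^\w\s]", " ", stripped.lower())
    # Sort and de-duplicate tokens so word order does not matter.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values whose fingerprints collide; keep only multi-member groups."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    # Hypothetical name headings, as they might appear in a batch of records.
    headings = [
        "Smith-Yoshimura, Karen",
        "Karen Smith-Yoshimura",
        "smith yoshimura, karen.",
        "Reese, Terry",
    ]
    for members in cluster(headings):
        print(members)
```

A cluster produced this way is only a candidate for review. Two different people can share the same name tokens, so a metadata specialist would still need to confirm a match before merging records.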

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
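
As an illustration of the kind of simple scripting that supports identity management, the Python sketch below merges pairwise “same entity” assertions into sets of equivalent identifiers using a small union-find structure. This is a hedged example and does not describe any Focus Group member’s workflow: the function name is hypothetical, and the identifiers are made-up placeholders, not real ORCID or ISNI values.

```python
from collections import defaultdict


def merge_equivalents(pairs):
    """Merge pairwise 'same entity' assertions into sets of equivalent IDs."""
    parent = {}

    def find(item):
        # Register unseen identifiers as their own root.
        parent.setdefault(item, item)
        # Walk up to the root, shortening the path along the way.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups = defaultdict(set)
    for item in list(parent):
        groups[find(item)].add(item)
    return list(groups.values())


if __name__ == "__main__":
    # Placeholder identifiers from three hypothetical registries.
    assertions = [
        ("local:a1", "orcid:EXAMPLE-1"),
        ("orcid:EXAMPLE-1", "isni:EXAMPLE-1"),
        ("local:b2", "isni:EXAMPLE-2"),
    ]
    for group in merge_equivalents(assertions):
        print(sorted(group))
```

Because equivalence is treated as transitive here, one wrong assertion can chain unrelated identities together, so the merged sets are best treated as suggestions for human review.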

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait, iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in the service of library service helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations, such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities, as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category: Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe

Futornick, Michelle. 2019. “LD4P2 Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee, 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. “What is Orcid?” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. “What is ISNI?” Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. “Getting identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). “About.” https://ror.org/about

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org, 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu

34. The National Institute of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH, NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History, digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref. The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf) But the term has other meanings depending on the audience, for example, identity access management as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase

73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving


84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. httpsdoiorg1025333C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. httpshangingtogetherorgp=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. httpsdoiorg1025333C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. httpswwwlocgovead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. httpswwwlocgovstandardspremis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. httpshangingtogetherorgp=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. httpshangingtogetherorgp=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. httpshangingtogetherorgp=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. httpwwwlocgovstandardsmods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. httpwwwlocgovstandardsmads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. httpshangingtogetherorgp=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. httpsiiifio

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml


100. Faniel, Ixchel M. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” Hanging Together: The OCLC Research Blog. 21 September 2015. httphangingtogetherorgp=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. httpshangingtogetherorgp=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. httpsdoiorg1025333C39P86

103. See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. httpshangingtogetherorgp=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog), OCLC. 21 November 2019. httpwwwoclcorgblogmainlets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. httpnciorgau

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. httpswwwadaeduau

107. Portage Network. “Home.” Accessed 21 September 2020. httpsportagenetworkca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (httpwwwmetadata2020org). The Metadata 2020 Researcher Communications project is outlined here: httpwwwmetadata2020orgprojectsresearcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. httpwwwdccacukresourcesmetadata-standardslist

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. httprd-alliancegithubiometadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. httpscasraiorgcredit

112. University of Michigan Library. 2020. “Data Services.” httpwwwlibumicheduresearch-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. httpswwwoclcorgresearchpublications2017oclcresearch-research-data-managementhtml


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. httpsdoiorg1025333C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology), Indiana University. 18 October 2018. httpsnewsiuedustories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. httpswwwoclcorgresearchevents2019080819-democratizing-access-large-datasets-shared-infrastructurehtml

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. httpswwwnisoorgstandards-committeesreproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. httpshangingtogetherorgp=5430

118. Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. httpsdoiorg101016jacalib201312002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. httpwwwactivitydataorgLIDPhtml

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. httpshangingtogetherorgp=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. httpswwwdigliborgmetadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-librarymetadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” httpswwwyewnocom


125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. httpshangingtogetherorgp=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search, Uncover Australia. Accessed 21 September 2020. httpstrovenlagovau

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. httpshangingtogetherorgp=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. httpsdoiorg101007s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? httpsportalsnaccooperativeorgabout

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. httpshangingtogetherorgp=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. httpssitesgooglecomviewai4lamhome

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. httpssitesgooglecomviewai4lamabout

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. httpsdoiorg1025333xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. httpshangingtogetherorgp=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. httpshangingtogetherorgp=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. httpshangingtogetherorgp=6646


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. httpblogreesetnetarchives2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. httpsmarceditreesetnetworking-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. httpshangingtogetherorgp=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. httpslibraryjuiceacademycomcertificatexml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. httpmarceditreesetnettutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. httpswwwlyndacom

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. httpswwwcodecademycom

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. httpssoftware-carpentryorg

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist. (2008) 2020. httpworkingontologistorg

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. httpwwwlibraryworkflowexchangeorgabout

146. OCLC Developer Network. 2020. “DevConnect Webinars.” httpswwwoclcorgdevelopereventsdevconnect-workshopsenhtml

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. httpshangingtogetherorgp=7580

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009



created by other departments requires much editing, massaging, and manual review. The situation is simpler when all digitization is handled through one centralized location and the library does all the metadata creation. Some libraries are using Dublin Core for their image collections’ metadata, and others are using MODS (Metadata Object Description Schema).92 Some wrap the metadata records in METS (Metadata Encoding and Transmission Standard),93 a schema maintained by the Library of Congress designed to express the hierarchical nature of digital library objects, the names and locations of the files that comprise those objects, and the associated metadata. Some suggested that MODS be used in conjunction with MADS (Metadata Authority Description Schema).94

• Duplicate metadata for different objects: Metadata for a scanned set of drawings may be identical, even though there are slight differences in those drawings. Duplicating the metadata across similar objects is likely due to limited staff. Possibly the faculty or the photographers could add more details.

• Lack of provenance: A common challenge is receiving image collections with scanty metadata and with no information regarding their provenance. For example, metadata staff at one institution were given OCR’ed text retrieved by a researcher from HathiTrust. Millions of images lacked the location of the original source material, which limited, if not discredited, any further use.

• Maintaining links between metadata and images: How should libraries store images and keep them in sync with the metadata? There may be rights issues from relying on a specific platform to maintain links between metadata and images. Where should thumbnails live?

• Relating multiple views and versions of the same object: Multiple versions of the same object taken over time can be very useful for disciplines like forensics. For example, Brown University decided to describe a “blob” of various images of the same thing in different formats and then describe the specific versions included. This work was done even though there is no system yet that displays relationships among images, such as components of a piece, even when the metadata in records are wrapped and stored in METS.

• Managing relationships with faculty and curators: It is important to ensure that faculty feel their needs are met. Collaboration is necessary among holders of the materials, metadata specialists, and developers, as all come from different perspectives. The challenge is to support both a specific purpose and groups of people as well as large-scale discovery.

• Aggregating digital collections: Institutions have been sharing the metadata for their digital collections with both national and international discovery services. Within individual organizations, librarians create and recreate metadata for digital and digitized resources in a plethora of systems: the library catalog, archive management, digital asset and preservation systems, the institutional repository, research management systems, and external subscription-based repositories. Targets for sharing this metadata range from tailored, topic-based digital discovery services to national and international aggregations such as Google Scholar, HathiTrust, Digital Public Library of America (DPLA), Internet Archive, Trove, and WorldCat, to online exhibitions such as Google Arts and Culture or image banks such as Flickr or Unsplash. Such aggregations can help inform an institution’s own collection development, as librarians can see their contributions in the context of others’ content and identify gaps that they may wish to fill locally.95

Aggregators often have different guidelines and input formats. Aggregators’ very reasonable contention that they cannot support many variations in submitted metadata conflicts with contributors’ very reasonable contention that they cannot support the different needs of a wide range of aggregators. Disseminating corrections or updates between the source and the aggregation can be problematic. Information that may have been corrected in the chain leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.
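The practice noted above of wrapping MODS records in METS can be illustrated with a minimal Python sketch. It uses only the standard library and the published METS and MODS namespace URIs; the function name, the `DMD1` identifier, and the sample title are hypothetical, and a complete METS document would also need a structural map, which is omitted here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs published by the Library of Congress for METS and MODS.
METS_NS = "http://www.loc.gov/METS/"
MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("mods", MODS_NS)


def wrap_mods_in_mets(title, dmd_id="DMD1"):
    """Build a METS element whose descriptive section wraps one MODS record.

    Hypothetical helper for illustration: only a title is recorded, and the
    structMap that a valid METS document requires is left out.
    """
    mets = ET.Element(f"{{{METS_NS}}}mets")
    dmd_sec = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": dmd_id})
    md_wrap = ET.SubElement(dmd_sec, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "MODS"})
    xml_data = ET.SubElement(md_wrap, f"{{{METS_NS}}}xmlData")
    mods = ET.SubElement(xml_data, f"{{{MODS_NS}}}mods")
    title_info = ET.SubElement(mods, f"{{{MODS_NS}}}titleInfo")
    ET.SubElement(title_info, f"{{{MODS_NS}}}title").text = title
    return mets


if __name__ == "__main__":
    record = wrap_mods_in_mets("Site drawing, sheet 1")
    print(ET.tostring(record, encoding="unicode"))
```

Because each descriptive section carries its own ID, several views or versions of one object could each receive a section in the same wrapper, although, as the Focus Group observed, display systems may not yet expose those relationships.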

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find “semantic clusters,” those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
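The general idea of replacing strings with identifiers can be sketched in Python as follows. This is not the pilot's actual method: the lookup table, the `example.org` URIs, and the function names are hypothetical placeholders, and a real workflow would query authority files rather than an in-memory dictionary.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical authority table: normalized heading -> identifier.
# The example.org URIs are placeholders, not real authority identifiers.
AUTHORITY = {
    "paris (france)": "https://example.org/place/paris",
    "maps": "https://example.org/concept/maps",
}


@dataclass
class Reconciled:
    label: str            # the string as it appeared in the record
    uri: Optional[str]    # matched identifier, or None if unmatched


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial variants still match."""
    return " ".join(value.lower().split())


def reconcile(values: List[str]) -> List[Reconciled]:
    """Pair each string with an identifier when the authority table has one."""
    return [Reconciled(label=v, uri=AUTHORITY.get(normalize(v))) for v in values]


if __name__ == "__main__":
    for item in reconcile(["Paris  (France)", "Maps", "Unknown place"]):
        print(item.label, "->", item.uri or "needs manual review")
```

Keeping the original label alongside the identifier leaves unmatched strings visible for manual review rather than silently dropping them.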

FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections



RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry “Data Management and Curation in 21st Century Archives” (Sept. 2015)100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats, and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102
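One way a library might help researchers act on this advice is to supply a readme template. The Python sketch below assembles a readme.txt body from the elements the report lists; the layout, the `NOT PROVIDED` marker, and the function name are assumptions made for illustration, not a format prescribed by the report.

```python
# Project-level elements named in the Building Blocks report.
README_FIELDS = [
    "title", "author", "date", "funder", "copyright holder",
    "description", "keywords", "observation unit", "kind of data",
    "type of data", "language",
]


def build_readme(project, files):
    """Return readme text: project-level elements, then one line per data file.

    `project` maps element names to values; `files` is a list of dicts with
    the keys 'name', 'format', and 'software'. Both shapes are hypothetical.
    """
    lines = ["PROJECT INFORMATION"]
    for field in README_FIELDS:
        # Flag gaps explicitly so a reviewer can follow up with the researcher.
        lines.append(f"{field}: {project.get(field, 'NOT PROVIDED')}")
    lines.append("")
    lines.append("FILES")
    for entry in files:
        lines.append(
            f"{entry['name']} | format: {entry['format']} | software: {entry['software']}"
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = build_readme(
        {"title": "Campus survey", "author": "A. Researcher", "language": "English"},
        [{"name": "responses.csv", "format": "CSV", "software": "R"}],
    )
    print(text)
```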


All four of the 2017-2018 The Realities of Research Data Management series webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105


National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Researcher Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.
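A deposit workflow could check a dataset description against the reuse elements the Focus Group named. The following Python sketch shows one such check; the field names are hypothetical renderings of those elements and do not come from any particular schema.

```python
# Elements the Focus Group considered necessary for reuse
# (field names are hypothetical, chosen for this illustration).
REUSE_ELEMENTS = (
    "license", "processing_steps", "tools", "data_documentation",
    "data_definitions", "data_steward", "grant_numbers",
)
# Expected only where relevant, so their absence is not reported.
WHERE_RELEVANT = ("geospatial", "temporal")


def missing_elements(record):
    """List the reuse elements that are absent or empty in a description."""
    return [name for name in REUSE_ELEMENTS if not record.get(name)]


if __name__ == "__main__":
    deposit = {"license": "CC BY 4.0", "tools": ["R"], "data_steward": ""}
    print("Follow up with the data creator about:", missing_elements(deposit))
```

Reporting gaps instead of rejecting a deposit fits the observation that forms must stay simple if researchers are to complete them.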

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
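Identifiers for data creators can be checked for transcription errors before they are stored. An ORCID iD ends in a check character computed with the ISO 7064 MOD 11-2 algorithm, and the Python sketch below validates it. The function names are hypothetical, the first sample iD is the author's as printed in this report's front matter, and the check confirms only that an iD is well formed, not that it is registered.

```python
def orcid_check_character(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Return True when a hyphenated or compact ORCID iD has a correct checksum."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


if __name__ == "__main__":
    for candidate in ("0000-0002-8757-2962", "0000-0002-8757-2963"):
        print(candidate, is_valid_orcid(candidate))
```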


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited; Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose skills may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty's research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of "Metadata as a Service"

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as "foster discovery and use," "enrich the user experience," and "explore new ways to support the whole life cycle of scholarship," all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers' publications with what the library is not purchasing; improving relevancy ranking, personalizing search results, and offering recommendation services in the discovery layer; and measuring the impact of library usage on research, student success, or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students' use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119
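As a rough illustration of the kind of analysis behind such findings, the sketch below computes a Pearson correlation coefficient between a usage metric and grade point averages. The numbers are invented for illustration only and are not taken from the Minnesota or Jisc studies; the function name `pearson` is likewise an assumption, not part of any cited tool.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical, made-up values: book loans per student and that student's GPA
book_loans = [0, 2, 5, 8, 12, 15]
gpa = [2.4, 2.9, 2.8, 3.2, 3.5, 3.6]

r = pearson(book_loans, gpa)
print(f"r = {r:.2f}")
```

A coefficient near 1 indicates that the two measures rise together; it does not by itself show that library use causes higher grades.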

CONSULTANCY

Metadata's value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide "metadata as a service": consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. This metadata consultant role is becoming more visible in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for "metadata outreach and consultation": "Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community." Georgia Tech is recruiting a metadata librarian who will "serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects."122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC's integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and overview of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking, and serves as a source to build organization histories. The provenance implicit in an institution's bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125
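One way to picture the enrichment of missing language codes from related records is the minimal sketch below. It assumes records have already been parsed into dictionaries and grouped by a hypothetical `cluster_id` key (for example, records describing the same edition); neither that key nor the `lang_inferred` flag comes from the report or from any MARC tool. A code is proposed only when every coded record in the cluster agrees, and the result is flagged for human review.

```python
from collections import Counter, defaultdict


def fill_missing_language_codes(records):
    """Return copies of records with an empty 'lang' filled from related records.

    Records count as related when they share a 'cluster_id' (a hypothetical
    grouping key). A code is proposed only if all coded records in the
    cluster carry the same language code.
    """
    codes_by_cluster = defaultdict(Counter)
    for rec in records:
        if rec.get("lang"):
            codes_by_cluster[rec["cluster_id"]][rec["lang"]] += 1

    enriched = []
    for rec in records:
        new_rec = dict(rec)  # copy so the input list is left unchanged
        seen = codes_by_cluster.get(rec["cluster_id"])
        if not rec.get("lang") and seen and len(seen) == 1:
            new_rec["lang"] = next(iter(seen))
            new_rec["lang_inferred"] = True  # mark for human review
        enriched.append(new_rec)
    return enriched


# Hypothetical sample data
sample = [
    {"id": 1, "cluster_id": "a", "lang": "fre"},
    {"id": 2, "cluster_id": "a", "lang": ""},     # can be filled with "fre"
    {"id": 3, "cluster_id": "b", "lang": "eng"},
    {"id": 4, "cluster_id": "b", "lang": "ger"},
    {"id": 5, "cluster_id": "b", "lang": ""},     # conflicting codes: left empty
]
for rec in fill_missing_language_codes(sample):
    print(rec)
```

Declining to guess when the cluster disagrees keeps the enrichment conservative; a production workflow would likely add further checks before writing anything back to the catalog.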

Visualizations represent another type of metadata service. A striking example is from the Austlang national code-a-thon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics, which apply statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom's second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette's publishing houses and weave them into a cohesive story. To do so, the British Library was asked for every author and book title published by the nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette's River of Authors.)127

FIGURE 9 UK Hachette's "River of Authors" generated from the British Library's catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in "Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review":

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty's representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one "global domain," metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an "international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums."133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla's 2019 OCLC Research position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
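The idea of matching names "based on related metadata" can be sketched with nothing more than string similarity plus a few corroborating fields. The example below is a minimal, rule-based stand-in, not a machine-learning model and not a method described by the Focus Group. The field names (`name`, `birth_year`, `affiliations`) and the weights 0.7, 0.2, and 0.1 are arbitrary assumptions chosen for illustration.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, strip diacritics and punctuation, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_marks.lower())
    # Sorting tokens makes "Surname, Forename" and "Forename Surname" compare equal
    return " ".join(sorted(cleaned.split()))


def match_score(a, b):
    """Score two person records from name similarity plus shared metadata.

    The weights are illustrative assumptions, not tuned values.
    """
    name_sim = SequenceMatcher(None, normalize(a["name"]), normalize(b["name"])).ratio()
    score = 0.7 * name_sim
    if a.get("birth_year") and a.get("birth_year") == b.get("birth_year"):
        score += 0.2  # same birth year corroborates the match
    if set(a.get("affiliations", [])) & set(b.get("affiliations", [])):
        score += 0.1  # at least one shared affiliation
    return score


# Hypothetical records
rec_a = {"name": "Müller, Anna", "birth_year": 1950, "affiliations": ["Example University"]}
rec_b = {"name": "Anna Muller", "birth_year": 1950}
rec_c = {"name": "Peter Schmidt", "birth_year": 1980}

print(match_score(rec_a, rec_b))
print(match_score(rec_a, rec_c))
```

Pairs scoring above some threshold could be queued for a specialist to confirm, which keeps a human in the loop for the ambiguous cases.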

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed by both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who "trail-blaze innovations," which are then routinized for nonprofessionals. These discussions reinforce Padilla's recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to "traditional cataloging activities" (such as original and copy cataloging and authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual's productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats, and to view metadata specialists as more than just "production machines."

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or "play time") to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the "easier stuff," while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, change their mindsets about how they work, and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became "lively" because she brought her colleagues along. It created appreciation for "continuous learning," and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch "reading clubs," where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group "video-viewing brown-bag lunches" for staff on new developments such as linked data, so staff can "watch and learn" together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists use tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists' collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library's control. Some noted that "cultural differences" exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more "schematic." Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The "holy grail" is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members' experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the "technical services mindset."

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available, to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit's developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
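To give a sense of what a small batch-processing script can look like, the sketch below de-duplicates and merges records within a file. It is not how MarcEdit works internally; it assumes records have already been read into plain dictionaries (for example, from a spreadsheet export), and the field names `isbn`, `title`, `year`, and `subject` are hypothetical.

```python
import re


def dedup_key(record):
    """Build a match key: a normalized ISBN if present, else title plus year."""
    isbn = re.sub(r"[^0-9Xx]", "", record.get("isbn") or "").upper()
    if isbn:
        return ("isbn", isbn)
    title = re.sub(r"[^a-z0-9 ]", "", (record.get("title") or "").lower())
    title = " ".join(title.split())  # collapse repeated whitespace
    return ("title", title, record.get("year") or "")


def deduplicate(records):
    """Merge records sharing a key, keeping the first non-empty value per field."""
    merged = {}
    for rec in records:
        key = dedup_key(rec)
        if key not in merged:
            merged[key] = dict(rec)
            continue
        for field, value in rec.items():
            # Fill only gaps; never overwrite data from the earlier record
            if value and not merged[key].get(field):
                merged[key][field] = value
    return list(merged.values())


# Hypothetical sample records with a made-up ISBN
sample = [
    {"isbn": "0-00-000000-0", "title": "A Title", "year": "2020", "subject": ""},
    {"isbn": "0000000000", "title": "A title.", "year": "2020", "subject": "Metadata"},
    {"title": "Other Work", "year": "1999"},
    {"title": "other  work!", "year": "1999"},
]
for rec in deduplicate(sample):
    print(rec)
```

Keeping the first non-empty value per field is only one possible merge policy; real workflows often prefer the record from a trusted source or route conflicts to a cataloger.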

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers' needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA's New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, "We bring order out of a vacuum."142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C's Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: "Don't wait; iterate." In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC's DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of "traditional" work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology to support library services helps catalogers "do more with less."

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond "traditional" cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution's collections. Focus Group members' linked data activities, including their participation in OCLC Research's Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research's linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for "works."

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as "statements" associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of "metadata as a service," and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership. Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. "The OCLC Research Library Partnership." https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura. 2017. "Metadata Advocacy." Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library's Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. "Program for Cooperative Cataloging." https://www.loc.gov/aba/pcc/

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. "What Metadata Managers Expect from and Value about the Research Library Partnership." Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. "Linked Data." International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. "CONTENTdm Linked Data pilot." https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. "WorldCat® OCLC and Linked Data." Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. "BIBFRAME." Bibliographic Framework Initiative. https://www.loc.gov/bibframe/

Futornick Michelle 2019 ldquoLD4P2 Linked Data for Production Pathway to Implementationrdquo LS4P2 Project Background and Goals Lyrasis Posted 14 January 2019 httpswikilyrasisorgdisplayLD4P2LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment) ldquoAn Effective Environment for the Use of Linked Data by Librariesrdquo Accessed 17 September 2019 httpswwwshare-vdeorg sharevdeclustersl=en

Casalini Michele Chiat Naun Chew Chad Cluff Michelle Durocher Steven Folsom Paul Frank Janifer Gatenby Jean Godby Jason Kovari Nancy Lorimer Clifford Lynch Peter Murray Jeremy Myntti Anna Neatrour Cory Nimer Suzanne Pilsk Daniel Pitti Isabel Quintana Jing Wang and Simeon Warner 2018 National Strategy for Shareable Local Name Authorities National Forum White Paper Ithaka New York Cornell University Library eCommons digital repository httpshdlhandlenet181356343

13 Library of Congress 2019 PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices 2019 PCC Task Group on Linked Data Best Practices Final Report Submitted to PCC Policy Committee 12 September 2019 Washington DC Library of Congress httpswwwlocgovabapcctaskgrouplinked-data-best-practices-final-reportpdf

Library of Congress 2018 "Charge for PCC Task Group on Identity Management in NACO" 5 American Bar Association Program for Cooperative Cataloging revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf

Library of Congress 2020 "PCC Task Group on URIs in MARC" Programs of the PCC Charge Accessed 19 September 2020 httpswwwlocgovabapccbibframeTaskGroupsURI-TaskGrouphtml

Library of Congress 2018 "PCC Linked Data Advisory Committee Linked Data Advisory Committee Charge" PCC Task Groups 2018 Task Groups Revised 24 July 2018 [Word doc 28KB] httpswwwlocgovabapcctaskgrouptask-groupshtml

14 Smith-Yoshimura Karen 2015 "Shift to Linked Data for Production" OCLC Research Hanging Together Blog 13 May 2015 httpshangingtogetherorgp=5195

15 OCLC Research 2020 "Linked Data" Linked Data Overview httpswwwoclcorgresearchareasdata-sciencelinkeddatalinked-data-overviewhtml [All figures CC BY 4.0]

16 Smith-Yoshimura Karen 2019 "'Future Proofing' of Cataloging" OCLC Research Hanging Together Blog 10 November 2019 httpshangingtogetherorgp=7526

17 ORCID Connecting Research and Researchers "What is ORCID" Our Vision Accessed 19 September 2020 httpsorcidorgaboutwhat-is-orcidmission

18 See for example the list of signatories of journal publishers requiring ORCID IDs for authors ORCID "ORCID Open Letter - Publishers" Accessed 19 September 2020 httpsorcidorgcontentrequiring-orcid-publication-workflows-open-letter

19 ISNI "What is ISNI" Accessed 19 September 2020 httpsisniorgpagewhat-is-isni

20 HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items See HathiTrust Digital Library "Welcome to HathiTrust" Accessed 19 September 2020 httpswwwhathitrustorgabout

21 GeoNames "Browse the Names" Accessed 19 September 2020 httpswwwgeonamesorg

22 Bryant Rebecca Annette Dortmund and Constance Malpas 2017 Convenience and Compliance Case Studies on Persistent Identifiers in European Research Information Dublin OH OCLC Research httpsdoiorg1025333C32K7M

23 ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI See ISNI "Key Statistics" Accessed 5 May 2020 httpsisniorg

24 Library of Congress 2020 "NACO – Name Authority Cooperative Program" Documents and Updates Programs for Cataloging and Acquisitions (PCC) Accessed 19 September 2020 httpwwwlocgovabapccnacoindexhtml

25 Smith-Yoshimura Karen 2015 "Getting Identifiers Created for Legacy Names" Hanging Together The OCLC Research Blog 30 October 2015 httpshangingtogetherorgp=5463

26 Smith-Yoshimura Karen 2013 "Irreconcilable Differences Name Authority Control & Humanities Scholarship" Hanging Together The OCLC Research Blog 27 March 2013 httpshangingtogetherorgp=2621

27 Smith-Yoshimura Karen 2017 "Use Cases for Local Identifiers" Hanging Together The OCLC Research Blog 5 May 2017 httpshangingtogetherorgp=5938

28 OCLC Research 2020 "Registering Researchers in Authority Files" httpswwwoclcorgresearchthemesresearch-collectionsregistering-researchershtml

29 Smith-Yoshimura Karen Janifer Gatenby Grace Agnew Christopher Brown Kate Byrne Matt Carruthers Peter Fletcher Stephen Hearn Xiaoli Li Marina Muilwijk Chew Chiat Naun John Riemer Roderick Sadler Jing Wang Glen Wiley and Kayla Willey 2016 Addressing the Challenges with Organizational Identifiers and ISNI Dublin Ohio OCLC Research httpsdoiorg1025333C3FC9Q

30 Research Organization Registry (ROR) "About" httpsrororgabout

31 V M Abazov B Abbott B S Acharya M Adams T Adams J P Agnew G D Alexeev et al (2014) 2020 "Precision Measurement of the Top-Quark Mass in Lepton+jets Final States" (Archived 24 February 2020) ArXivorg 150107912 httpsarxivorgpdf14051756

32 Smith-Yoshimura Karen 2017 "How Much Metadata Is Practical" Hanging Together The OCLC Research Blog 14 November 2017 httpshangingtogetherorgp=6328

33 University of Minnesota 2020 "Experts@Minnesota" Find Profiles httpsexpertsumneduenpersons or University of Illinois at Urbana-Champaign 2020 "Illinois Experts" Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34 The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training fellowship education or career development awards in FY20 See NIH NIAID 2019 "ORCID iD Required for Some Encouraged for All" NIAID Funding News Last reviewed 7 August 2019 httpswwwniaidnihgovgrants-contractsorcid-id-required-some-encouraged-all

See also Lyrasis 2020 "SciENcv and ORCID to Streamline NIH and NSF Grant Applications" LyrasisNow (blog) 8 April 2020 httpslyrasisnoworgsciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35 Smith-Yoshimura Karen 2016 "Metadata Reconciliation" Hanging Together The OCLC Research Blog 28 September 2016 httpshangingtogetherorgp=5710

36 Carruthers Matt (2014) 2020 mcarruthersLCNAF-Named-Entity-Reconciliation GitHub Repository httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37 Deliot Corine Steven Folsom Myung-Ja Han Nancy Lorimer Terry Reese and Adam Schiff 2019 Formulating and Obtaining URIs A Guide to Commonly used Vocabularies and Reference Sources Library of Congress PCC Task Group on URIs in MARC httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38 Smith-Yoshimura Karen 2019 "New Ways of Using and Enhancing Cataloging and Authority Records" Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=5710

39 Smith-Yoshimura Karen 2015 "Persistent Identifiers for Local Collections" Hanging Together The OCLC Research Blog 27 October 2015 httpshangingtogetherorgp=5445

40 See DOI examples in detail from DOI 2020 "DOI System Examples" Accessed 20 September 2020 httpswwwdoiorgdemoshtml and See ARK examples in detail from Department Dallas (Tex ) Police 1963 "[Photographs of Identification Cards]" Collection University of North Texas The Portal to Texas History digital repository httpstexashistoryunteduark67531metapth346793

41 DataCite "Assign DOIs" httpsdataciteorgdoishtml

Wilkinson Laura J 2020 "Constructing your DOIs" Crossref The Crossref Curriculum Last updated 8 April 2020 httpswwwcrossreforgeducationmember-setupconstructing-your-dois

42 "Identity management" here reflects its usage among metadata specialists (See for example Library of Congress 2018 "Charge for PCC Task Group on Identity Management in NACO" 5 American Bar Association Program for Cooperative Cataloging Revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience for example identity access management as described in Wikiwand "Identity Management" httpswwwwikiwandcomenIdentity_management

43 Smith-Yoshimura Karen 2018 "The Coverage of Identity Management Work" Hanging Together The OCLC Research Blog 8 October 2018 httpshangingtogetherorgp=6805

44 Smith-Yoshimura Karen 2017 "Beyond the Authorized Access Point" Hanging Together The OCLC Research Blog 10 October 2017 httpshangingtogetherorgp=6271

45 Smith-Yoshimura "Coverage of Identity Management" (See note 43)

46 Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez 2018 "Works in Progress Webinar Introduction to Wikidata for Librarians Structuring Wikipedia and Beyond" Produced by OCLC Research 12 June 2018 MP4 video presentation 1151 httpswwwoclcorgresearchevents201806-12html

47 Smith-Yoshimura Karen 2020 "Experimentations with Wikidata/Wikibase" Hanging Together The OCLC Research Blog 18 June 2020 httpshangingtogetherorgp=8002

48 Wikimedia "WikiCite" Home httpsmetawikimediaorgwikiWikiCite

49 Smith-Yoshimura Karen 2016 "Impact of Identifiers on Authority Workflows" Hanging Together The OCLC Research Blog 22 March 2016 httpshangingtogetherorgp=5603

50 Smith-Yoshimura Karen 2019 "Strategies for Alternate Subject Headings and Maintaining Subject Headings" Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

51 OCLC 2020 "FAST (Faceted Application of Subject Terminology)" httpswwwoclcorgenfasthtml

52 Smith-Yoshimura Karen 2016 "Faceted Vocabularies" Hanging Together The OCLC Research Blog 31 October 2016 httpshangingtogetherorgp=5739

53 OCLC 2020 "FAST" (See note 51)

54 OCLC 2020 "FAST (Faceted Application of Subject Terminology)" Heading 3 FAST Policy and Outreach (FPOC) Committee httpswwwoclcorgenfasthtml

55 Smith-Yoshimura Karen 2017 "Vocabulary Control Data in Discovery Environments" Hanging Together The OCLC Research Blog 5 October 2017 httpshangingtogetherorgp=6264

56 National Library New Zealand Government "Ngā Upoko Tukutuku Māori Subject Headings" httpmshupokonatlibgovtnzmshupoko

AIATSIS Pathways Gateway to the AIATSIS Thesauri "Pathways" httpwww1aiatsisgovau

57 Deutsche Nationalbibliothek 2019 "MACS - Multilingual Access to Subjects" (Archived 13 Jan 2019) httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58 Smith-Yoshimura Karen 2019 "Knowledge Organization Systems" Hanging Together The OCLC Research Blog 17 March 2019 httpshangingtogetherorgp=7135

59 Synaptica "Ontology Management – Graphite" httpswwwsynapticacomgraphite

60 Smith-Yoshimura Karen 2018 "Are Distributed Models for Vocabulary Maintenance Viable" Hanging Together The OCLC Research Blog 12 April 2018 httpshangingtogetherorgp=6672

61 OCLC Research 2020 "Equity Diversity and Inclusion in the OCLC Research Library Partnership Survey" Overview Accessed 20 September 2020 httpswwwoclcorgresearchareascommunity-catalystsrlp-edihtml

62 Smith-Yoshimura Karen 2018 "Creating Metadata for Equity Diversity and Inclusion" Hanging Together The OCLC Research Blog 7 November 2018 httpshangingtogetherorgp=6833

63 Smith-Yoshimura "Distributed Models" (See note 60)

64 Smith-Yoshimura Karen 2019 "Strategies for Alternate Subject Headings and Maintaining Subject Headings" Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

65 Baxmeyer Jennifer Karen Coyle Joanna Dyla MJ Han Steven Folsom Phil Schreur and Tim Thompson 2017 Linked Data Infrastructure Models Areas of Focus for PCC Strategies Library of Congress PCC Linked Data Advisory Committee httpswwwlocgovabapccdocumentsLinkedDataInfrastructureModelspdf

66 Bone Christine Sharon Farnel Sheila Laroque and Brett Lougheed 2017 "Works in Progress Webinar Decolonizing Descriptions Finding Naming and Changing the Relationship between Indigenous People Libraries and Archives" Produced by OCLC Research 19 October 2017 MP4 video presentation 543500 httpswwwoclcorgresearchevents201710-19html

67 Smith-Yoshimura Karen 2015 "Shift to Linked Data for Production" Hanging Together The OCLC Research Blog 13 May 2015 httpshangingtogetherorgp=5195

68 Smith-Yoshimura Karen 2015 "Working in Shared Files" Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

69 Bruce Washburn and Jeff Mixter 2018 "Works in Progress Webinar Looking Inside the Library Knowledge Vault" Produced by OCLC Research 12 August 2018 MP4 video presentation 574500 httpswwwoclcorgresearchevents201508-12html

70 Smith-Yoshimura Karen 2019 "Systematic Reviews of Our Metadata" Hanging Together The OCLC Research Blog 10 April 2019 httpshangingtogetherorgp=7117

71 Smith-Yoshimura Karen 2015 "Working in Shared Files" Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

72 Jisc Library Services nd "What Is 'Plan M'" Accessed 21 September 2020 httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura Karen 2020 "Knowledge Management and Metadata" Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

To learn more about the current phase of "Plan M" (May–November 2020) see Grindley Neil "Moving Plan M Forwards – We Need Your Help" Library Services (PlanM) (blog) Jisc 6 May 2020 httpslibraryservicesjiscinvolveorgwp202005planm_nextphase

73 Grindley Neil 2019 "Plan M Definition Principles and Direction" Jisc (Word docx) httplibraryservicesjiscinvolveorgwpfiles201912Plan-M-Definition-and-Direction-1docx

74 Dempsey Lorcan 2016 "Library Collections in the Life of the User Two Directions" LIBER Quarterly 26(4) 338–359 httpdoiorg1018352lq10170

75 Smith-Yoshimura Karen 2019 "Presenting Metadata from Different Sources in Discovery Layers" Hanging Together The OCLC Research Blog 16 April 2019 httpshangingtogetherorgp=7880

76 Smith-Yoshimura Karen 2017 "Metadata for Archival Collections" Hanging Together The OCLC Research Blog 30 May 2017 httpshangingtogetherorgp=5903

77 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Knudson Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Craig Thomas and Holly Tomren 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage 49-51 Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

78 The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at httpswwwoclcorgresearchpartnershipworking-groupsarchives-special-collections-linked-data-reviewhtml

79 Smith-Yoshimura Karen 2020 "Metadata Management in Times of Uncertainty" Hanging Together The OCLC Research Blog 15 June 2020 httpshangingtogetherorgp=7998

80 Smith-Yoshimura Karen 2016 "Metadata for Archived Websites" Hanging Together The OCLC Research Blog 14 March 2016 httpshangingtogetherorgp=5591

81 Archive-It 2008 "Human Rights" Columbia University Libraries Collection (Archived May 2008) httpsarchive-itorgcollections1068

Archive-It 2010 "New York City Places and Spaces" Columbia University Libraries Collection (Archived January 2010) httpsarchive-itorgcollections1757

Archive-It 2010 "Burke Library New York City Religions" Columbia University Libraries Collection (Archived May 2010) httpsarchive-itorgcollections1945

82 NLA "Trove" Archived Websites Sub Collections Accessed 20 September 2020 httpstrovenlagovauwebsite

83 Archive-It 2014 "Collaborative Architecture Urbanism and Sustainability Web Archive (CAUSEWAY)" Ivy Plus Libraries Confederation Collection (Archived June 2014) httpsarchive-itorgcollections4638

Archive-It 2013 "Contemporary Composers Web Archive (CCWA)" Ivy Plus Libraries Confederation Collection (Archived October 2013) httpsarchive-itorgcollections4019

NYARC New York Art Resources Consortium "Web Archiving" httpwwwnyarcorgcontentweb-archiving

84 OCLC Research 2020 "Web Archiving Metadata Working Group" The Problem Addressing the Problem Outputs httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 "Metadata for Audio and Videos" Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress "Standards" Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress "Standards" Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 "Assessing Needs of AV in Special Collections" Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 "Scale & Risk Discussing Challenges to Managing AV Collections in the RLP" Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 "Managing Metadata for Image Collections" Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress "Standards" Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94 Library of Congress "Standards" Metadata Authority Description Schema (MADS) Accessed 21 September 2020 httpwwwlocgovstandardsmads

95 Smith-Yoshimura Karen 2016 "Sharing Digital Collections Workflows" Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 "Europeana Innovation Pilots" Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the World's Images "Home" Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 "OCLC ResearchWorks IIIF Explorer" httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 "CONTENTdm Linked Data Pilot" Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

100 Smith-Yoshimura Karen 2015 "Data Management and Curation in 21st Century Archives – Part 1" 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 "Metadata for Research Data Management" Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 "Knowledge Management and Metadata" Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105 Faniel Ixchel M 2019 "Let's Cook Up Some Metadata Consistency" Next (blog) OCLC 21 November 2019 httpwwwoclcorgblogmainlets-cook-up-some-metadata-consistency

106 NCI (National Computational Infrastructure) Australia "Home" Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) "Home" Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network "Home" Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a "collaboration advocating richer connected reusable open metadata for all research outputs" (httpwwwmetadata2020org) The Metadata 2020 Researcher Communications project is outlined here httpwwwmetadata2020orgprojectsresearcher-communications

109 Digital Curation Centre "Disciplinary Metadata" List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory "Metadata Standards Directory Working Group" GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor's specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI "CRediT – Contributor Roles Taxonomy" Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 "Data Services" httpwwwlibumicheduresearch-data-services

113 OCLC Research 2020 "The Realities of Research Data Management" Overview httpswwwoclcorgresearchpublications2017oclcresearch-research-data-managementhtml

114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115 Indiana University 2018 "IU will Lead $2 Million Partnership to Expand Access to Research Data IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries" News at IU (Science and Technology) Indiana University 18 October 2018 httpsnewsiuedustories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft 2020 "Microsoft Academic Graph" Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details watch the August 2019 recording of "Democratizing Access to Large Datasets through Shared Infrastructure" See Wittenberg Jamie and Valentin Pentchev "Works in Progress Webinar Democratizing Access to Large Datasets through Shared Infrastructure" Produced by OCLC Research 8 August 2019 MP4 video presentation 583400 httpswwwoclcorgresearchevents2019080819-democratizing-access-large-datasets-shared-infrastructurehtml

116 NISO's Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See "Taxonomy Definitions and Recognition Badging Scheme Working Group | NISO Website" nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 "Services Built on Usage Metrics" Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 "Stacks Serials Search Engines and Students' Success First-Year Undergraduate Students' Library Use Academic Achievement and Retention" Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc "Library Impact Data Project (LIDP)" Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 "Alternatives to Statistics for Measuring Success and Value of Cataloging" Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 "Metadata Librarian Cornell University Library" DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 "Metadata Librarian" Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-librarymetadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 "Locate Items in the Library with StackMap" httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 "Home" httpswwwyewnocom

125 Smith-Yoshimura Karen 2019 "New Ways of Using and Enhancing Cataloging and Authority Records" Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) "Austlang National Codeathon" Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA "Trove" Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company – Hachette UK "River of Authors" Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 "Knowledge Organization Systems" Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 "Knowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Review" International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) "About SNAC" What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 "Knowledge Management and Metadata" Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives & Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAM's mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM "About" Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 "Alternatives to Statistics for Measuring Success and Value of Cataloging" Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 "New Skill Sets for Metadata Management" Hanging Together The OCLC Research Blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 "MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation" Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646

139 Reese Terry 2018 "MarcEdit 2017 Usage Information" Terry's Worklog (blog) 9 September 2020 httpblogreesetnetarchives2572

140 Reese Terry 2020 "Working with Linked Data In MarcEdit" MarcEdit Development (blog) Accessed 21 September 2020 httpsmarceditreesetnetworking-with-linked-data-in-marcedit

141 Reese Terry 2018 "MarcEdit Playlist" 139 YouTube videos Last updated 26 December 2018 httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura Karen 2017 "New Skill Sets for Metadata Management" Hanging Together The OCLC Research Blog 17 April 2017 httpshangingtogetherorgp=5929

143 "XML and RDF-Based Systems Archives" nd Library Juice Academy (blog) Accessed 22 September 2020 httpslibraryjuiceacademycomcertificatexml-and-rdf-based-systems

Reese Terry 2013 "Tutorials" YouTube (selected) MarcEdit Development (blog) 14 March 2013 httpmarceditreesetnettutorials

"Lynda Online Courses Classes Training Tutorials" nd Lyndacom - from LinkedIn Learning Accessed 22 September 2020 httpswwwlyndacom

"Learn to Code - for Free" nd Codecademy Accessed 22 September 2020 httpswwwcodecademycom

Software Carpentry "Teaching Basic Lab Skills for Research Computing" Upcoming Workshops Accessed 22 September 2020 httpssoftware-carpentryorg

144 "Data on the Web Best Practices" nd Accessed 22 September 2020 httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist (2008) 2020 httpworkingontologistorg

145 Library Workflow Exchange nd "About" Accessed 21 September 2020 httpwwwlibraryworkflowexchangeorgabout

146 OCLC Developer Network 2020 "DevConnect Webinars" httpswwwoclcorgdevelopereventsdevconnect-workshopsenhtml

147 Smith-Yoshimura Karen 2019 "Stewardship of Professional FTEs In Metadata Work and Turnover" Hanging Together The OCLC Research Blog 18 October 2019 httpshangingtogetherorgp=7580

148 OCLC 2020 "WorldCat® OCLC and Linked Data" Shared Entity Management Infrastructure httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395 T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009

leading to incorporation in the aggregation may not be pushed back to the source, so that the same errors must be corrected repeatedly. It is often not clear what data elements have been updated, when, or by whom.

Aggregating images and bringing together different images or versions of the same object was the goal of the 2012-2013 OCLC Research Europeana Innovation Pilots,96 which developed a method for hierarchically structuring cultural objects at different similarity levels to find "semantic clusters," those that include terms with a similar meaning. In 2017, OCLC implemented the International Image Interoperability Framework (IIIF)97 Presentation Manifest protocol in its CONTENTdm digital content management system, an aggregation containing more than 70 million digital records contributed by over 2,500 libraries worldwide. In 2019, OCLC Research developed an IIIF Explorer experimental prototype for testing and evaluation that searches across all the CONTENTdm images using the IIIF Presentation Manifest protocol,98 as shown in figure 7. Aggregating content across IIIF-compliant systems may facilitate discovery across the plethora of platforms containing digital content mentioned above.
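As a rough illustration of what a manifest-based search tool has to read, the sketch below walks a manifest shaped like the IIIF Presentation API 2.x layout (sequences, canvases, images) and collects one image URL per canvas. It is a minimal sketch under stated assumptions, not a description of how the IIIF Explorer works: the `SAMPLE_MANIFEST` content, its `example.org` URLs, and the `list_images` helper are all hypothetical.

```python
# Minimal stand-in for a IIIF Presentation API 2.x manifest.
# Every URL and label here is a placeholder, not a real CONTENTdm record.
SAMPLE_MANIFEST = {
    "@context": "http://iiif.io/api/presentation/2/context.json",
    "@id": "https://example.org/iiif/sample/manifest.json",
    "@type": "sc:Manifest",
    "label": "Sample map of Paris",
    "sequences": [
        {
            "@type": "sc:Sequence",
            "canvases": [
                {
                    "@id": "https://example.org/iiif/sample/canvas/1",
                    "@type": "sc:Canvas",
                    "label": "Sheet 1",
                    "images": [
                        {
                            "@type": "oa:Annotation",
                            "resource": {
                                "@id": "https://example.org/iiif/sample/1/full/full/0/default.jpg",
                                "@type": "dctypes:Image",
                            },
                        }
                    ],
                }
            ],
        }
    ],
}


def list_images(manifest):
    """Return the manifest label and a (canvas label, image URL) pair per image."""
    images = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for annotation in canvas.get("images", []):
                images.append((canvas.get("label", ""), annotation["resource"]["@id"]))
    return manifest.get("label", ""), images


label, images = list_images(SAMPLE_MANIFEST)
print(label)
for canvas_label, url in images:
    print(canvas_label, url)
```

Because every compliant system exposes the same structure, the same few lines could in principle be pointed at manifests from different platforms, which is the interoperability benefit the paragraph above describes.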

In 2020, OCLC Research launched the CONTENTdm Linked Data Pilot,99 focused on developing scalable methods and approaches to produce machine-readable representations of entities and relationships and make visible the connections formerly invisible. Existing record-based metadata is being converted to linked data by replacing strings of characters with identifiers from known authority files and local library-defined vocabularies; the resulting graphs of entities and relationships can retrieve contextual information from sources such as GeoNames and Wikidata. This pilot (to be completed by August 2020) is addressing many of the above challenges identified by the Focus Group.
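
As a rough illustration of the string-to-identifier replacement described above, the sketch below swaps matching strings in a record for identifiers from a lookup table. It is not the pilot's actual method: the `AUTHORITY` table, the field names, the `reconcile` helper, and the example.org URIs are all hypothetical placeholders standing in for a real authority file or local vocabulary.

```python
# Hypothetical lookup table standing in for an authority file or a
# local vocabulary; the URIs are placeholders, not real identifiers.
AUTHORITY = {
    "paris (france)": "http://example.org/place/0001",
    "maps": "http://example.org/topic/0002",
}


def normalize(label):
    """Lower-case and collapse whitespace so trivial variants match."""
    return " ".join(label.lower().split())


def reconcile(record, fields=("subject", "place")):
    """Return a copy of record with matched strings replaced by identifiers.

    Strings with no match are kept unchanged so that nothing is lost.
    """
    linked = dict(record)
    for field in fields:
        values = record.get(field, [])
        linked[field] = [AUTHORITY.get(normalize(v), v) for v in values]
    return linked


# Invented sample record for illustration only.
record = {
    "title": "Plan de Paris",
    "subject": ["Maps", "Fortifications"],
    "place": ["Paris  (France)"],
}
linked = reconcile(record)
```

Unmatched strings such as "Fortifications" are left as they are, so a reviewer could revisit them later rather than lose the original description.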

FIGURE 7 The OCLC ResearchWorks IIIF Explorer retrieves images about "Paris Maps" across CONTENTdm collections



RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel's two-part blog entry, "Data Management and Curation in 21st Century Archives" (Sept. 2015),100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102


All four of the 2017-2018 The Realities of Research Data Management series webinars,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment, beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries' expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post "Let's Cook Up Some Metadata Consistency":

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions' research investments.105


National contexts differ. For example, our Australian colleagues can take advantage of Australia's National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the "shared stewardship of research data."107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed data creators needed to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020's Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas used include Dublin Core, MODS (Metadata Object Description Schema), and DDI (the Data Documentation Initiative's metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance's Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions' discovery layers.
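
One way to act on an agreed element list is a simple completeness check run before deposit. The sketch below is an illustration under stated assumptions: the snake_case keys paraphrase the elements named above, and the `missing_elements` helper and the sample record are hypothetical rather than drawn from any Focus Group member's workflow.

```python
# Element names paraphrase the list agreed by the Focus Group; the
# snake_case keys and the sample record are illustrative assumptions.
REQUIRED = [
    "license",
    "processing_steps",
    "tools",
    "documentation",
    "data_definitions",
    "data_steward",
    "grant_numbers",
]


def missing_elements(record, required=REQUIRED):
    """List required elements that are absent or empty in a dataset record."""
    return [name for name in required if not record.get(name)]


# Invented dataset description used only to exercise the check.
dataset = {
    "title": "Stream temperature readings",
    "license": "CC BY 4.0",
    "processing_steps": "Removed sensor outliers",
    "tools": ["R"],
    "documentation": "readme.txt",
    "data_definitions": "",
    "data_steward": "University library",
}
gaps = missing_elements(dataset)
```

Here `gaps` names the two elements a depositor would still need to supply: the empty data definitions and the absent grant numbers. Geospatial and temporal elements are left out of `REQUIRED` because the text treats them as relevant only to some datasets.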

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
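
Identifiers such as ORCID iDs carry a built-in check character, which gives metadata workflows a cheap way to catch typing errors before a name is linked. The sketch below implements the ISO 7064 MOD 11-2 checksum that ORCID uses for the final character. The function names are illustrative, and the check confirms only that an iD is well formed, not that it is registered or belongs to a given person.

```python
def orcid_check_digit(base_digits):
    """Compute the ISO 7064 MOD 11-2 check character for 15 base digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid):
    """Check the shape and checksum of an iD such as 0000-0002-8757-2962."""
    compact = orcid.replace("-", "")
    base, check = compact[:15], compact[15:]
    if len(compact) != 16 or not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_digit(base) == check.upper()


# The iD printed in this report's front matter passes the check;
# changing its last character makes it fail.
front_matter_ok = is_valid_orcid("0000-0002-8757-2962")
typo_ok = is_valid_orcid("0000-0002-8757-2963")
```

A check like this could run when researchers fill in a deposit template, flagging a mistyped iD while the data creator is still available to correct it.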


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of "metadata governance" across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions' Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the "citation" or "metadata-only" records, with a link to the full text or data set deposited in a disciplinary repository. "Metadata consultation services" may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the "expertise" function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer "research sprints," where researchers partner with a team of expert librarians whose expertise may include metadata creation, management, analysis, and preservation. The "Shared BigData Gateway for Research Libraries," hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty's research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of "Metadata as a Service"

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as "foster discovery and use," "enrich the user experience," and "explore new ways to support the whole life cycle of scholarship," all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers' publications with what the library is not purchasing; and improving relevancy ranking, personalizing search results, offering recommendation services in the discovery layer, and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students' use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata's value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide "metadata as a service": consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for "metadata outreach and consultation": "Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community." Georgia Tech is recruiting a metadata librarian who will "serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects."122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC's integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis of collections and a view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution's bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125
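
The language-code enrichment mentioned above can be sketched as a simple rule over related records. This is a hypothetical illustration, not a documented method: it assumes records have already been grouped by a `work_id` key (an invented field), and it proposes a code only when every coded record in the group agrees, so that translations are not mislabeled.

```python
from collections import Counter, defaultdict


def suggest_language_codes(records):
    """Suggest a language code for records that lack one.

    Records are grouped by a hypothetical 'work_id'. A missing code is
    proposed only when all coded records in the same group agree.
    """
    codes = defaultdict(Counter)
    for rec in records:
        if rec.get("lang"):
            codes[rec["work_id"]][rec["lang"]] += 1

    suggestions = {}
    for rec in records:
        seen = codes[rec["work_id"]]
        if not rec.get("lang") and len(seen) == 1:
            suggestions[rec["id"]] = next(iter(seen))
    return suggestions


# Invented records: w1 is consistently French, w2 mixes two languages.
records = [
    {"id": "r1", "work_id": "w1", "lang": "fre"},
    {"id": "r2", "work_id": "w1", "lang": ""},
    {"id": "r3", "work_id": "w2", "lang": "eng"},
    {"id": "r4", "work_id": "w2", "lang": "ger"},
    {"id": "r5", "work_id": "w2", "lang": None},
]
suggestions = suggest_language_codes(records)
```

Returning suggestions rather than editing records in place keeps a person in the loop, which suits batch enrichment where a wrong code would spread across many records.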


Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom's second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette's publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette's River of Authors.)127


FIGURE 9 UK Hachette's "River of Authors" generated from the British Library's catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-Dimensional Review:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty's representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than aiming for one "global domain," metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that Artificial Intelligence, or at least machine learning, could reduce the amount of current manual effort to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an "international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums."133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla's OCLC Research 2019 position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who "trail-blaze innovations," which are then routinized for nonprofessionals. These discussions reinforce Padilla's recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to "traditional cataloging activities" (such as original and copy cataloging and authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual's productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just "production machines."

Metadata managers, faced with staff reductions while still being expected to maintain production levels, must justify allocating staff time for R&D (or "play time") to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the "easier stuff," while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became "lively" because she brought her colleagues along. It created appreciation for "continuous learning," and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch "reading clubs," where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group "video-viewing brown-bag lunches" for staff on new developments such as linked data so staff can "watch and learn" together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists' collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library's control. Some noted that "cultural differences" exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more "schematic." Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The "holy grail" is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members' experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the "technical services mindset."


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit's developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.141
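
To make the idea of lightweight clustering concrete, the sketch below uses key-collision ("fingerprint") clustering of the kind popularized by OpenRefine: values that reduce to the same normalized key are grouped as likely variants. It illustrates the general technique only and does not describe how MarcEdit implements its clustering feature; the function names and sample values are illustrative assumptions.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value):
    """Build a key that ignores case, accents, punctuation, and word order."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    no_punct = ascii_only.lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return " ".join(sorted(set(no_punct.split())))


def cluster(values):
    """Group values sharing a fingerprint; keep groups with 2+ distinct forms."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(set(members)) > 1]


# Sample headings invented for illustration.
headings = [
    "Smith-Yoshimura, Karen",
    "Université de Montréal",
    "Karen Smith-Yoshimura",
    "Zeng, Marcia",
    "universite de montreal.",
]
clusters = cluster(headings)
```

Each cluster is only a candidate set for review. Fingerprinting is deliberately crude, so two different people with the same name parts would also collide, which is why a tool would normally ask a person to confirm before merging.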

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers' needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA's New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification; descriptive standards used in various academic disciplines; and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, "We bring order out of a vacuum."142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C's Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: "Don't wait, iterate." In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC's DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it This is particularly true when the outgoing incumbent performed a high proportion of ldquotraditionalrdquo work such as original cataloging in MARC The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs147

Most Focus Group members have had to address varying amounts of turnover either from retirements or staff leaving for other positions Half of them needed to reconfigure the positions of outgoing librarians Looking at what other institutions are advertising helps in creating an attractive position description Many cataloging positions do not require an MLS degree so recruiting professionals has focused on adaptability aligning new positions with university priorities and on eagerness to learn and take initiative in areas such as metadata for research output open access digital collections and linked data Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments New staff with programming skills are sought after as they can apply batch techniques to metadata that can compensate for the loss of staff Using technology in the service of library service helps catalogers ldquodo more with lessrdquo

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations, such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.
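The idea of a persistent, language-neutral identifier carrying “statements” contributed by different institutions can be pictured with a small sketch. It is an illustration only, under stated assumptions: the class names, property names, institution names, and example.org identifiers are hypothetical and are not drawn from the Shared Entity Management Infrastructure or any OCLC API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Statement:
    """One assertion about an entity, together with who contributed it."""
    prop: str    # hypothetical property name, e.g. "instance_of"
    value: str   # a literal, or the identifier of another entity
    source: str  # institution that supplied the statement


@dataclass
class Entity:
    """An entity addressed by a persistent, language-neutral identifier."""
    identifier: str
    labels: Dict[str, str] = field(default_factory=dict)  # language code -> label
    statements: List[Statement] = field(default_factory=list)

    def label(self, lang: str, fallback: str = "en") -> str:
        """Return the label for lang, else the fallback language, else the identifier."""
        return self.labels.get(lang) or self.labels.get(fallback) or self.identifier

    def add(self, prop: str, value: str, source: str) -> None:
        """Attach a statement; the identifier itself never changes."""
        self.statements.append(Statement(prop, value, source))

    def values(self, prop: str) -> List[str]:
        """Collect every value asserted for one property, across all sources."""
        return [s.value for s in self.statements if s.prop == prop]


# Hypothetical example: two institutions describe the same place entity.
place = Entity("https://example.org/entity/E1", labels={"en": "Tokyo", "ja": "東京"})
place.add("instance_of", "city", source="Library A")
place.add("country", "https://example.org/entity/E2", source="Archive B")

print(place.label("ja"))            # label in the requested language
print(place.label("fr"))            # no French label, so the English one is used
print(place.values("instance_of"))  # values gathered from all contributors
```

In this sketch the identifier stays fixed while labels and statements accumulate around it, which is the sense in which the link is language-neutral and the institutions' metadata supplies the context.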


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts, summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members, who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements explaining why each topic was important and timely, and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. httpswwwoclcorgresearchareasdata-sciencemetadata-managershtml

2. OCLC Research. “The OCLC Research Library Partnership.” httpswwwoclcorgresearchpartnershiphtml

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. httpshangingtogetherorgp=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. httpswwwblukbibliographicpdfsbritish-library-collection-metadata-strategy-2019-2023pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” httpswwwlocgovabapcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. httpshangingtogetherorgcat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. httpshangingtogetherorgp=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. httpswwwoclcorgresearchthemesdata-sciencelinkeddatalinked-data-surveyhtml

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. httpsdoiorg1025333faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. httpswwwlocgovbibframe


Futornick, Michelle. 2019. “LD4P2 Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. httpswikilyrasisorgdisplayLD4P2LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. httpswwwshare-vdeorgsharevdeclustersl=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. httpshdlhandlenet181356343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee, 12 September 2019. Washington, DC: Library of Congress. httpswwwlocgovabapcctaskgrouplinked-data-best-practices-final-reportpdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO,” 5. Acquisitions and Bibliographic Access, Program for Cooperative Cataloging. Revised 22 May 2018. httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC, Charge. Accessed 19 September 2020. httpswwwlocgovabapccbibframeTaskGroupsURI-TaskGrouphtml

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups, 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. httpswwwlocgovabapcctaskgrouptask-groupshtml

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. httpshangingtogetherorgp=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. httpswwwoclcorgresearchareasdata-sciencelinkeddatalinked-data-overviewhtml [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” Hanging Together: The OCLC Research Blog. 10 November 2019. httpshangingtogetherorgp=7526

17. ORCID: Connecting Research and Researchers. “What is ORCID?” Our Vision. Accessed 19 September 2020. httpsorcidorgaboutwhat-is-orcidmission

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. httpsorcidorgcontentrequiring-orcid-publication-workflows-open-letter

19. ISNI. “What is ISNI?” Accessed 19 September 2020. httpsisniorgpagewhat-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. httpswwwhathitrustorgabout

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. httpswwwgeonamesorg

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. httpsdoiorg1025333C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. httpsisniorg

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. httpwwwlocgovabapccnacoindexhtml

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. httpshangingtogetherorgp=5463

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. httpshangingtogetherorgp=2621

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. httpshangingtogetherorgp=5938

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” httpswwwoclcorgresearchthemesresearch-collectionsregistering-researchershtml

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. httpsdoiorg1025333C3FC9Q

30. Research Organization Registry (ROR). “About.” httpsrororgabout

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org. 1501.07912. httpsarxivorgpdf14051756

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. httpshangingtogetherorgp=6328

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. httpsexpertsumneduenpersons or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. httpsexpertsillinoisedu

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. httpswwwniaidnihgovgrants-contractsorcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. httpslyrasisnoworgsciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. httpshangingtogetherorgp=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. httpshangingtogetherorgp=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. httpshangingtogetherorgp=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. httpswwwdoiorgdemoshtml and

See ARK examples in detail from Dallas (Tex.) Police Department. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History digital repository. httpstexashistoryunteduark67531metapth346793

41. DataCite. “Assign DOIs.” httpsdataciteorgdoishtml

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref, The Crossref Curriculum. Last updated 8 April 2020. httpswwwcrossreforgeducationmember-setupconstructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO,” 5. Acquisitions and Bibliographic Access, Program for Cooperative Cataloging. Revised 22 May 2018. httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience, for example, identity access management, as described in

Wikiwand. “Identity Management.” httpswwwwikiwandcomenIdentity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. httpshangingtogetherorgp=6805


44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. httpshangingtogetherorgp=6271

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly rated webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. httpswwwoclcorgresearchevents201806-12html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. httpshangingtogetherorgp=8002

48. Wikimedia. “WikiCite.” Home. httpsmetawikimediaorgwikiWikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. httpshangingtogetherorgp=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. httpshangingtogetherorgp=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” httpswwwoclcorgenfasthtml

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. httpshangingtogetherorgp=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3, FAST Policy and Outreach (FPOC) Committee. httpswwwoclcorgenfasthtml

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. httpshangingtogetherorgp=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” httpmshupokonatlibgovtnzmshupoko

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” httpwww1aiatsisgovau

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. httpshangingtogetherorgp=7135

59. Synaptica. “Ontology Management – Graphite.” httpswwwsynapticacomgraphite


60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. httpshangingtogetherorgp=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. httpswwwoclcorgresearchareascommunity-catalystsrlp-edihtml

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. httpshangingtogetherorgp=6833

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. httpshangingtogetherorgp=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. httpswwwlocgovabapccdocumentsLinkedDataInfrastructureModelspdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. httpswwwoclcorgresearchevents201710-19html

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. httpshangingtogetherorgp=5195

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. httpshangingtogetherorgp=5091

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 574500. httpswwwoclcorgresearchevents201508-12html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. httpshangingtogetherorgp=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. httpshangingtogetherorgp=5091

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. httpshangingtogetherorgp=7845

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. httpslibraryservicesjiscinvolveorgwp202005planm_nextphase


73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx). httplibraryservicesjiscinvolveorgwpfiles201912Plan-M-Definition-and-Direction-1docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26 (4): 338–359. httpdoiorg1018352lq10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. httpshangingtogetherorgp=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. httpshangingtogetherorgp=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. httpsdoiorg1025333faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at httpswwwoclcorgresearchpartnershipworking-groupsarchives-special-collections-linked-data-reviewhtml

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. httpshangingtogetherorgp=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. httpshangingtogetherorgp=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). httpsarchive-itorgcollections1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). httpsarchive-itorgcollections1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). httpsarchive-itorgcollections1945

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. httpstrovenlagovauwebsite

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). httpsarchive-itorgcollections4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). httpsarchive-itorgcollections4019

NYARC (New York Art Resources Consortium). “Web Archiving.” httpwwwnyarcorgcontentweb-archiving


84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. httpsdoiorg1025333C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. httpshangingtogetherorgp=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. httpsdoiorg1025333C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. httpswwwlocgovead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. httpswwwlocgovstandardspremis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. httpshangingtogetherorgp=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. httpshangingtogetherorgp=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. httpshangingtogetherorgp=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. httpwwwlocgovstandardsmods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. httpwwwlocgovstandardsmads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. httpshangingtogetherorgp=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. httpsiiifio

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml


100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” Hanging Together: The OCLC Research Blog. 21 September 2015. httphangingtogetherorgp=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. httpshangingtogetherorgp=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. httpsdoiorg1025333C39P86

103. See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. httpshangingtogetherorgp=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. httpwwwoclcorgblogmainlets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. httpnciorgau

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. httpswwwadaeduau

107. Portage Network. “Home.” Accessed 21 September 2020. httpsportagenetworkca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (httpwwwmetadata2020org). The Metadata 2020 Researcher Communications project is outlined here: httpwwwmetadata2020orgprojectsresearcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. httpwwwdccacukresourcesmetadata-standardslist

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. httprd-alliancegithubiometadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. httpscasraiorgcredit

112. University of Michigan Library. 2020. “Data Services.” httpwwwlibumicheduresearch-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. httpswwwoclcorgresearchpublications2017oclcresearch-research-data-managementhtml


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. httpsdoiorg1025333C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. httpsnewsiuedustories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 583400. httpswwwoclcorgresearchevents2019080819-democratizing-access-large-datasets-shared-infrastructurehtml

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. httpswwwnisoorgstandards-committeesreproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. httpshangingtogetherorgp=5430

118. Krista M. Soria, Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. httpsdoiorg101016jacalib201312002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. httpwwwactivitydataorgLIDPhtml

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. httpshangingtogetherorgp=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. httpswwwdigliborgmetadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-librarymetadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” httpswwwyewnocom


125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog, 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog, 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog, 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog, 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog, 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog, 26 March 2018. https://hangingtogether.org/?p=6646


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog, 17 April 2017. https://hangingtogether.org/?p=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog), 14 March 2013. http://marcedit.reeset.net/tutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog, 18 October 2019. https://hangingtogether.org/?p=7580

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections, please visit oc.lc/digitizing.

6565 Kilgour Place, Dublin, Ohio 43017-3395. T: 1-800-848-5878, T: +1-614-764-6000, F: +1-614-764-6096. www.oclc.org/research

ISBN 978-1-55653-170-5. DOI 10.25333/rqgd-b343. RM-PR-216787-WWAE-A4 2009


RESEARCH DATA

Research funders expect that the research data resulting from research they support will be archived and made available to others. Institutions have allotted more resources to collecting and curating this scholarly resource for reuse within the scholarly record. OCLC Research Scientist Ixchel Faniel’s two-part blog entry, “Data Management and Curation in 21st Century Archives” (Sept. 2015),100 prompted the discussion among Focus Group members on the metadata needed for research data management.101 To maximize the chances that metadata for research data are shareable (that is, sufficiently comparable) and helpful to those considering reusing the data, our communities would benefit from sharing ideas and discussing plans to meet emerging discovery needs.

Metadata is important for both discovery and reuse of datasets. The 2016 OCLC Research report Building Blocks: Laying the Foundation for a Research Data Management Program noted:

Datasets are useful only when they can be understood. Encourage researchers to provide structured information about their data, providing context and meaning and allowing others to find, use, and properly cite the data. At minimum, advise researchers to clearly tell the story of how they gathered and used the data and for what purpose. This information is best placed in a readme.txt file that includes project information and project-level metadata, as well as metadata about the data itself (e.g., file names, file formats and software used, title, author, date, funder, copyright holder, description, keywords, observation unit, kind of data, type of data, and language).102


All four webinars in the 2017-2018 series The Realities of Research Data Management,103 led by OCLC Senior Program Officer Rebecca Bryant, mention the importance of metadata. Research information infrastructure calls on many of the key strengths of the library profession. Metadata is fundamental to our complex research environment: beginning with the planning our researchers do before and during the creation of data, to managing the data, then to disseminating the knowledge gained, and finally through to understanding the impact, engagement, and the resulting reputation of our home institutions.104

Libraries’ expertise in metadata standards, identifiers, linked data, and data sharing systems, as well as technical systems, can be invaluable to the research life cycle. Faniel highlighted this value in the November 2019 Next blog post “Let’s Cook Up Some Metadata Consistency”:

[C]ataloging for discovery using terms and definitions that are consistent across repositories is critical if we want the data and their associated metadata to be discoverable for reuse in any way imaginable. Librarians and archivists can help create consistencies in metadata that build bridges between researchers and repositories, thus greatly increasing the discovery, reuse, and value of their institutions’ research investments.105


National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed that data creators need to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Research Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas in use include Dublin Core, MODS (Metadata Object Description Schema), and DDI (Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.
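
As a rough illustration of how a deposit template might check for the reuse elements listed above, the following Python sketch flags which ones are still empty in a draft description. The class name, attribute names, and sample values are hypothetical and are not drawn from Dublin Core, MODS, or DDI; treating geospatial and temporal coverage as optional simply follows the “where relevant” qualifier.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetDescription:
    """Hypothetical record holding the reuse elements named by the Focus Group."""
    license: str = ""
    processing_steps: str = ""
    tools: str = ""
    documentation: str = ""
    data_definitions: str = ""
    data_steward: str = ""
    grant_numbers: list = field(default_factory=list)
    geospatial_coverage: str = ""  # only where relevant
    temporal_coverage: str = ""    # only where relevant


# Assumption: coverage elements are optional; everything else is required.
OPTIONAL = {"geospatial_coverage", "temporal_coverage"}


def missing_elements(record):
    """Return the names of required elements that are still empty."""
    return [
        f.name
        for f in fields(record)
        if f.name not in OPTIONAL and not getattr(record, f.name)
    ]


# Made-up sample values, for illustration only.
record = DatasetDescription(
    license="CC BY 4.0",
    data_steward="Example University Library",
    grant_numbers=["EX-0001"],
)
print(missing_elements(record))
```

A short list like this could be shown to the depositor as a prompt, which keeps the form itself simple while still nudging data creators toward the elements that matter most for reuse.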

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
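
Because ORCID iDs end in a check character, a repository can catch many mistyped identifiers before a record is saved. The sketch below assumes the ISO 7064 MOD 11-2 calculation that ORCID documents for its 16-character iDs; the function names are illustrative, and the sample value is the author's ORCID iD given in this report's front matter. A passing check shows only that the iD is well formed, not that it is registered or that it belongs to the named person.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_well_formed_orcid(orcid: str) -> bool:
    """Return True if the iD has 16 characters and a matching check character."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16:
        return False
    base, check = compact[:15], compact[15]
    if not (base.isascii() and base.isdigit()):
        return False
    return orcid_check_character(base) == check


print(is_well_formed_orcid("0000-0002-8757-2962"))  # True
print(is_well_formed_orcid("0000-0002-8757-2963"))  # False (last character altered)
```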


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services such as those offered at the University of Michigan112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize the data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research report series The Realities of Research Data Management classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose expertise may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking, personalizing search results, and offering recommendation services in the discovery layer; and measuring the impact of library usage on research, student success, or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. This metadata consultant role is becoming more visible in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking, and it serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Data mining of catalog data can also be used to enrich the metadata, such as by generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125


Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story. Hachette asked the British Library for every author and book title published by its nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors).127


FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review”:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than pursuing one “global domain,” metadata specialists could provide added value by building bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that artificial intelligence, or at least machine learning, could reduce the manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for, and by libraries, archives, and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed by both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to “traditional cataloging activities” (such as original and copy cataloging and authority work) with more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed, from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff, to learn new workflows for processing multiple formats, and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage metadata specialists to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists use tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills, or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult because they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.¹³⁸ MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.¹³⁹ Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.¹⁴⁰ Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.¹⁴¹
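As a rough illustration of what lightweight clustering can look like in a script, the following Python sketch groups heading variants by a normalized “fingerprint” key, the key-collision approach documented for OpenRefine. It is not MarcEdit’s implementation, which this report does not describe. The function names `fingerprint` and `cluster` and the sample headings are hypothetical, chosen only for illustration.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key: fold diacritics, lowercase,
    replace punctuation with spaces, then sort the unique tokens."""
    decomposed = unicodedata.normalize("NFKD", value)
    folded = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^\w\s]", " ", folded.lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a fingerprint.
    Only groups with more than one distinct variant are returned."""
    groups = defaultdict(list)
    for value in values:
        key = fingerprint(value)
        if value not in groups[key]:
            groups[key].append(value)
    return [variants for variants in groups.values() if len(variants) > 1]


# Hypothetical sample headings (assumption: not taken from any real file).
headings = [
    "Smith-Yoshimura, Karen",
    "Karen Smith-Yoshimura",
    "smith yoshimura karen",
    "Reese, Terry",
    "Dempsey, Lorcan",
]

clusters = cluster(headings)
for variants in clusters:
    print(variants)
```

Key collision is deliberately conservative: it catches differences in word order, case, punctuation, and diacritics, but not misspellings, so a cataloger would still review each cluster before merging anything.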

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification; descriptive standards used in various academic disciplines; describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”¹⁴²

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.¹⁴³ W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.¹⁴⁴ Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,¹⁴⁵ a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars¹⁴⁶ to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.¹⁴⁷

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,¹⁴⁸ launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts, summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe.


Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices: Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO,” 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is Orcid?” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI?” Accessed 19 September 2020. https://isni.org/page/what-is-isni.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org.

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org. 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons; or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu.

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications.

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710.

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation.

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf.

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710.

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445.

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793.

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois.

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO,” 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management.

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805.


44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko.

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan. 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite.


60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Washburn, Bruce, and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.


73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019.

NYARC, New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.


84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io.

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.


100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.


125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search, Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data in MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs in Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395. T: 1-800-848-5878, T: +1-614-764-6000, F: +1-614-764-6096. www.oclc.org/research

ISBN: 978-1-55653-170-5. DOI: 10.25333/rqgd-b343. RM-PR-216787-WWAE-A4 2009



National contexts differ. For example, our Australian colleagues can take advantage of Australia’s National Computational Infrastructure for big data and the Australian Data Archive for the social sciences.106 Canada has launched a national network called Portage for the “shared stewardship of research data.”107

Some institutions have developed templates to capture metadata in a structured form. Some Focus Group members noted the need to keep such forms as simple as possible, as it can be difficult to get researchers to fill them in. All agreed that data creators need to be the main source of metadata. But what will inspire data creators to produce quality metadata? New ways of training and outreach are needed, an area of exploration within Metadata 2020’s Researcher Communications project.108

Focus Group members generally agreed on the data elements required to support reuse: licenses, processing steps, tools, data documentation, data definitions, data steward, grant numbers, and geospatial and temporal data (where relevant). Metadata schemas in use include Dublin Core, MODS (Metadata Object Description Schema), and DDI (the Data Documentation Initiative’s metadata standard). The Digital Curation Centre in the UK provides a linked catalog of metadata standards.109 The Research Data Alliance’s Metadata Standards Directory Working Group has set up a community-maintained directory of metadata standards for different disciplines.110 The disparity of metadata schemas across disciplines represents a hurdle in institutions’ discovery layers.
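As an illustration of how a simple capture template might check for these elements, here is a minimal sketch in Python. The class name, field names, and the choice to treat geospatial and temporal coverage as optional are assumptions made for this example; they do not correspond to Dublin Core, MODS, or DDI element names.

```python
from dataclasses import dataclass, field, fields


@dataclass
class DatasetDescription:
    """Hypothetical template for the reuse-supporting elements listed above."""
    license: str = ""
    processing_steps: list = field(default_factory=list)
    tools: list = field(default_factory=list)
    documentation: str = ""
    data_definitions: str = ""
    data_steward: str = ""
    grant_numbers: list = field(default_factory=list)
    # Geospatial and temporal data apply only "where relevant",
    # so this sketch treats them as optional.
    spatial_coverage: str = ""
    temporal_coverage: str = ""


OPTIONAL_ELEMENTS = {"spatial_coverage", "temporal_coverage"}


def missing_elements(record):
    """Return the names of required elements that are still empty."""
    return [
        f.name
        for f in fields(record)
        if f.name not in OPTIONAL_ELEMENTS and not getattr(record, f.name)
    ]


if __name__ == "__main__":
    draft = DatasetDescription(license="CC BY 4.0", data_steward="Example Library")
    print(missing_elements(draft))
```

A form built this way can stay short for the researcher while still telling a metadata specialist which elements need follow-up.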

The importance of identifiers for both the research data and the data creator(s) has become more widely acknowledged. DOIs, Handles, and ARKs (Archival Resource Keys) have been used to provide persistent access to datasets. Identifiers are available at the full data set level and for component parts, and they can be used to track downloads and potentially help measure impact. Both ORCID and ISNI are in use to identify data creators uniquely, and work is continuing on the Research Organization Registry to address institutional affiliations.
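One practical consequence of using structured identifiers is that they can be checked mechanically before they enter a repository record. The sketch below validates the final check character of an ORCID iD, which is computed with the ISO 7064 MOD 11-2 algorithm (ISNI uses the same 16-character structure). The function names are illustrative; the first example value is the author's ORCID iD as printed in this report's front matter.

```python
def mod11_2_check_char(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for a digit string."""
    total = 0
    for ch in base_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check length, digits, and the check character of a hyphenated ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return mod11_2_check_char(compact[:15]) == compact[15]


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-8757-2962"))  # iD from the report's front matter
    print(is_valid_orcid("0000-0002-8757-2963"))  # last character altered
```

A check like this only confirms that an identifier is well formed; it does not confirm that the identifier is registered or belongs to the named person.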


Some Focus Group members have started to analyze the metadata requirements for the research data life cycle, not just the final product, asking questions like: Who are the collaborators?111 How do various projects use different data files? What kind of analysis tools do they use? What are the relationships of data files across a project, between related projects, and to other scholarly output such as related journal articles? Research support services, such as those offered at the University of Michigan,112 are being developed to assist researchers during all phases of the research data life cycle, often through collaboration with other campus units.


Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only “citation” or “metadata-only” records with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize the data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research report series The Realities of Research Data Management classifies metadata support as part of the “expertise” function and flags some variations in its case studies.113 At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.114

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose support may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.115

Curation of research data as part of the evolving scholarly record requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.116

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics (such as how frequently items have been borrowed, cited, downloaded, or requested) could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking; personalizing search results; offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success or learning analytics.117 The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119
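For readers unfamiliar with how such a relationship is quantified, the following sketch computes a Pearson correlation coefficient between a usage measure and grade point averages. It is not the method or data of the Minnesota or Jisc studies; the numbers are invented toy values used only to show the calculation, and a correlation of this kind does not by itself establish that library use causes higher grades.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return covariance / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented toy values: book loans per student and grade point average.
    book_loans = [0, 2, 5, 9, 14]
    grade_point_average = [2.6, 2.9, 3.1, 3.4, 3.6]
    print(round(pearson_r(book_loans, grade_point_average), 3))
```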

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. This metadata consultant role is becoming more visible in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking, and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125
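The idea of generating missing language codes from related records can be sketched as follows. This is a simplified illustration, not a description of any institution's actual workflow: records are plain dictionaries rather than parsed MARC, the `group_id` key stands in for whatever clustering links "related" records, and a code is filled in only when every related record that has one agrees.

```python
from collections import defaultdict


def fill_missing_language_codes(records):
    """Copy a language code onto records that lack one, when related records agree.

    Each record is a dict with a hypothetical 'group_id' (the cluster of
    related records) and an optional 'language' code such as 'eng'.
    Returns new dicts; inferred values are flagged for later review.
    """
    codes_by_group = defaultdict(set)
    for rec in records:
        if rec.get("language"):
            codes_by_group[rec["group_id"]].add(rec["language"])

    enriched = []
    for rec in records:
        updated = dict(rec)
        known = codes_by_group.get(rec["group_id"], set())
        # Only infer when the related records are unanimous.
        if not rec.get("language") and len(known) == 1:
            updated["language"] = next(iter(known))
            updated["language_inferred"] = True
        enriched.append(updated)
    return enriched


if __name__ == "__main__":
    sample = [
        {"id": "r1", "group_id": "g1", "language": "fre"},
        {"id": "r2", "group_id": "g1", "language": ""},
        {"id": "r3", "group_id": "g2", "language": "eng"},
        {"id": "r4", "group_id": "g2", "language": "ger"},
        {"id": "r5", "group_id": "g2"},
    ]
    for rec in fill_missing_language_codes(sample):
        print(rec)
```

Flagging inferred values separately keeps machine-generated codes distinguishable from those a cataloger assigned, which matters if the inference later proves wrong.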


Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the codeathon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics (statistical methods to analyze books, articles, and other publications). Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story by asking the British Library for every author and book title published by their nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127


FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review”:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that artificial intelligence, or at least machine learning, could reduce the amount of manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
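To make the idea of matching names "based on related metadata" concrete, here is a deliberately simple, rule-based sketch rather than a machine-learning model. It normalizes name strings, compares them with Python's standard `difflib.SequenceMatcher`, and lets a conflicting birth year veto a match. The record layout, the 0.85 threshold, and the sample names are all assumptions for illustration; production identity management would weigh far more evidence.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize_name(name: str) -> str:
    """Lowercase, strip diacritics and punctuation, and sort the name parts."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    tokens = without_marks.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(sorted(tokens))


def likely_same_person(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Guess whether two name records refer to the same person.

    Records are dicts with a 'name' and an optional 'birth_year'
    (hypothetical keys). Conflicting birth years rule out a match.
    """
    year_a, year_b = a.get("birth_year"), b.get("birth_year")
    if year_a and year_b and year_a != year_b:
        return False
    similarity = SequenceMatcher(
        None, normalize_name(a["name"]), normalize_name(b["name"])
    ).ratio()
    return similarity >= threshold


if __name__ == "__main__":
    heading = {"name": "García, María", "birth_year": 1950}
    byline = {"name": "Maria Garcia"}
    namesake = {"name": "Maria Garcia", "birth_year": 1985}
    print(likely_same_person(heading, byline))
    print(likely_same_person(heading, namesake))
```

Even this small example shows why contextual metadata matters: the name strings alone cannot separate the two people named Maria Garcia, but the birth year can.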

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed by both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance between allocating staff to "traditional cataloging activities" (such as original and copy cataloging and authority work) and allocating them to more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual's productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff, to learn new workflows for processing multiple formats, and to view metadata specialists as more than just "production machines."

Metadata managers faced with staff reductions, while still being expected to maintain production levels, must justify allocating staff time for R&D (or "play time") to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.[136]


Indications of this culture shift include institutions outsourcing some metadata work, or training support staff to create metadata for the "easier stuff," while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage metadata specialists to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became "lively" because she brought her colleagues along. It created appreciation for "continuous learning," and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch "reading clubs," where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group "video-viewing brown-bag lunches" for staff on new developments, such as linked data, so staff can "watch and learn" together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists use tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.[137]

NEW TOOLS AND SKILLS

The extent of metadata specialists' collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library's control. Some noted that "cultural differences" exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more "schematic." Which is more likely to be successful: teaching metadata specialists IT skills, or teaching IT staff metadata principles? The "holy grail" is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult because they are in demand for higher-paying jobs in the private sector. Focus Group members' experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the "technical services mindset."


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.[138] MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.[139] Terry Reese, MarcEdit's developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.[140] Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.[141]
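To make the idea of clustering for de-duplication more concrete, here is a minimal Python sketch of "key collision" clustering, a technique popularized by OpenRefine. It is offered only as an illustration under stated assumptions: it does not reproduce MarcEdit's own clustering implementation, and the function names and sample headings are invented for the example. Each value is reduced to a "fingerprint" that ignores case, punctuation, and word order, and values sharing a fingerprint are grouped as likely variants of one another.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Build a key that ignores case, punctuation, and word order."""
    table = str.maketrans("", "", string.punctuation)
    tokens = value.lower().translate(table).split()
    return " ".join(sorted(set(tokens)))


def cluster(values):
    """Group values whose fingerprints collide; keep only groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical heading variants, invented for this sketch.
headings = [
    "Oxford University Press",
    "oxford university press.",
    "Press, Oxford University",
    "Cambridge University Press",
]
clusters = cluster(headings)
for members in clusters:
    print(members)
```

A metadata specialist would still review each cluster before merging, since a shared fingerprint only suggests that two strings may name the same thing.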

Managers want to focus less on specific schemas and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers' needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA's New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, "We bring order out of a vacuum."[142]

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.[143] W3C's Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.[144] Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
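The kind of "simple scripting" that helps reconcile equivalents across multiple registries can be quite short. The sketch below, in Python, is one possible approach and not a workflow reported by the Focus Group: it assumes each source record is a dictionary mapping a registry name to an identifier, and it merges records that share at least one identifier using a union-find structure. The registry names and identifier values are placeholders invented for the example.

```python
def merge_identities(records):
    """Merge records that share at least one (registry, identifier) pair."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for record in records:
        ids = sorted(record.items())
        if not ids:
            continue  # skip records that carry no identifiers
        find(ids[0])  # register records that hold a single identifier
        for other in ids[1:]:
            union(ids[0], other)

    merged = {}
    for key in parent:
        merged.setdefault(find(key), set()).add(key)
    return sorted(sorted(group) for group in merged.values())


# Placeholder registries and identifiers, invented for this sketch.
records = [
    {"local": "n-001", "registry_a": "A-17"},
    {"registry_a": "A-17", "registry_b": "B-42"},
    {"local": "n-002", "registry_b": "B-99"},
]
identities = merge_identities(records)
for group in identities:
    print(group)
```

Here the first two records are merged because they share the identifier from `registry_a`, even though no single record lists all three identifiers; the third record remains a separate identity.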

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,[145] a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: "Don't wait, iterate." In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC's DevConnect Webinars[146] to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks, such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of "traditional" work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.[147]

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers "do more with less."

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations, such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond "traditional" cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities, rather than record-based descriptions of an institution's collections. Focus Group members' linked data activities, including their participation in OCLC Research's Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,[148] launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research's linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for "works."


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as "statements" associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of "metadata as a service," and influencing future staffing requirements.


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts, summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. "The OCLC Research Library Partnership." https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura, Karen. 2017. "Metadata Advocacy." Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library's Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. "Program for Cooperative Cataloging." https://www.loc.gov/aba/pcc/

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category "Metadata." https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. "What Metadata Managers Expect from and Value about the Research Library Partnership." Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. "Linked Data." International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. "CONTENTdm Linked Data pilot." https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. "WorldCat®, OCLC, and Linked Data." Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. "BIBFRAME." Bibliographic Framework Initiative. https://www.loc.gov/bibframe/


Futornick, Michelle. 2019. "LD4P2: Linked Data for Production: Pathway to Implementation." LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). "An Effective Environment for the Use of Linked Data by Libraries." Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. "Charge for PCC Task Group on Identity Management in NACO." 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. "PCC Task Group on URIs in MARC." Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. "PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge." PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. "Shift to Linked Data for Production." Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. "Linked Data." Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. "'Future Proofing' of Cataloging." Hanging Together: The OCLC Research Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. "What is ORCID." Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. "ORCID Open Letter - Publishers." Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. "What is ISNI." Accessed 19 September 2020. https://isni.org/page/what-is-isni/

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. "Welcome to HathiTrust." Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. "Browse the Names." Accessed 19 September 2020. https://www.geonames.org/

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. "Key Statistics." Accessed 5 May 2020. https://isni.org/

24. Library of Congress. 2020. "NACO – Name Authority Cooperative Program." Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. "Getting Identifiers Created for Legacy Names." Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. "Irreconcilable Differences? Name Authority Control & Humanities Scholarship." Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. "Use Cases for Local Identifiers." Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. "Registering Researchers in Authority Files." https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). "About." https://ror.org/about/

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. "Precision Measurement of the Top-Quark Mass in Lepton+jets Final States." (Archived 24 February 2020). ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. "How Much Metadata Is Practical?" Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. "Experts@Minnesota." Find Profiles. https://experts.umn.edu/en/persons/ or

University of Illinois at Urbana-Champaign. 2020. "Illinois Experts." Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu/

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. "ORCID iD Required for Some, Encouraged for All." NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. "SciENcv and ORCID to Streamline NIH and NSF Grant Applications." LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications/

35. Smith-Yoshimura, Karen. 2016. "Metadata Reconciliation." Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. "New Ways of Using and Enhancing Cataloging and Authority Records." Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. "Persistent Identifiers for Local Collections." Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. "DOI System Examples." Accessed 20 September 2020. https://www.doi.org/demos.html and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. "[Photographs of Identification Cards]." Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793/

41. DataCite. "Assign DOIs." https://datacite.org/dois.html

Wilkinson, Laura J. 2020. "Constructing your DOIs." Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois/

42. "Identity management" here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. "Charge for PCC Task Group on Identity Management in NACO." 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf) But the term has other meanings depending on the audience, for example, identity access management, as described in Wikiwand. "Identity Management." https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. "The Coverage of Identity Management Work." Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805


44. Smith-Yoshimura, Karen. 2017. "Beyond the Authorized Access Point." Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, "Coverage of Identity Management." (See note 43.)

46. Watch the highly rated webinar by Andrew Lih and Robert Fernandez. 2018. "Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond." Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. "Experimentations with Wikidata/Wikibase." Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. "WikiCite." Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. "Impact of Identifiers on Authority Workflows." Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. "Strategies for Alternate Subject Headings and Maintaining Subject Headings." Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. "FAST (Faceted Application of Subject Terminology)." https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. "Faceted Vocabularies." Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. "FAST." (See note 51.)

54. OCLC. 2020. "FAST (Faceted Application of Subject Terminology)." Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. "Vocabulary Control Data in Discovery Environments." Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. "Ngā Upoko Tukutuku / Māori Subject Headings." http://mshupoko.natlib.govt.nz/mshupoko/

AIATSIS: Pathways: Gateway to the AIATSIS Thesauri. "Pathways." http://www1.aiatsis.gov.au/

57. Deutsche Nationalbibliothek. 2019. "MACS - Multilingual Access to Subjects." (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. "Knowledge Organization Systems." Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135

59. Synaptica. "Ontology Management – Graphite." https://www.synaptica.com/graphite/


60. Smith-Yoshimura, Karen. 2018. "Are Distributed Models for Vocabulary Maintenance Viable?" Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672

61. OCLC Research. 2020. "Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey." Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62. Smith-Yoshimura, Karen. 2018. "Creating Metadata for Equity, Diversity, and Inclusion." Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833

63. Smith-Yoshimura, "Distributed Models." (See note 60.)

64. Smith-Yoshimura, Karen. 2019. "Strategies for Alternate Subject Headings and Maintaining Subject Headings." Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66 Bone Christine Sharon Farnel Sheila Laroque and Brett Lougheed 2017 ldquoWorks in Progress Webinar Decolonizing Descriptions Finding Naming and Changing the Relationship between Indigenous People Libraries and Archives ldquo Produced by OCLC Research 19 October 2017 MP4 video presentation 543500 httpswwwoclcorg researchevents201710-19html

67 Smith-Yoshimura Karen 2015 ldquoShift to Linked Data for Productionrdquo Hanging Together The OCLC Research Blog 13 May 2015 httpshangingtogetherorgp=5195

68 Smith-Yoshimura Karen 2015 ldquoWorking in Shared Filesrdquo Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

69 Bruce Washburn and Jeff Mixter 2018 ldquoWorks in Progress Webinar Looking Inside the Library Knowledge Vaultrdquo Produced by OCLC Research 12 August 2018 MP4 video presentation 574500 httpswwwoclcorgresearchevents201508-12html

70 Smith-Yoshimura Karen 2019 Systematic Reviews of Our Metadata Hanging Together The OCLC Research Blog 10 April 2019 httpshangingtogetherorgp=7117

71 Smith-Yoshimura Karen 2015 ldquoWorking in Shared File rdquoHanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

72 Jisc Library Services nd ldquoWhat Is lsquoPlan Mrsquordquo Accessed 21 September 2020 httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

To learn more about the current phase of ldquoPlan Mrdquo (MayndashNovember 2020) see Grindley Neil ldquoMoving Plan M Forwards ndash We Need Your Helprdquo Library Services (PlanM) (blog) Jisc 6 May 2020 httpslibraryservicesjiscinvolveorgwp202005planm_nextphase


73 Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx.) http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74 Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75 Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog, 16 April 2019. https://hangingtogether.org/?p=7880

76 Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog, 30 May 2017. https://hangingtogether.org/?p=5903

77 Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78 The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79 Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog, 15 June 2020. https://hangingtogether.org/?p=7998

80 Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog, 14 March 2016. https://hangingtogether.org/?p=5591

81 Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008.) https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010.) https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010.) https://archive-it.org/collections/1945

82 NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83 Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014.) https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013.) https://archive-it.org/collections/4019

NYARC (New York Art Resources Consortium). “Web Archiving.” http://www.nyarc.org/content/web-archiving


84 OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem; Addressing the Problem; Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85 Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C

86 Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog, 29 October 2018. https://hangingtogether.org/?p=6814

87 Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F

88 Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead

89 Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis

90 Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog, 23 July 2019. https://hangingtogether.org/?p=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog, 1 October 2019. https://hangingtogether.org/?p=7479

91 Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog, 9 April 2015. https://hangingtogether.org/?p=5130

92 Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods

93 Ibid.

94 Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95 Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog, 2 November 2016. https://hangingtogether.org/?p=5744

96 OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921

97 IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io

98 OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99 OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html


100 Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375

101 Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog, 18 April 2016. https://hangingtogether.org/?p=5616

102 Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86

103 See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html

104 Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog, 9 April 2020. https://hangingtogether.org/?p=7845

105 Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC, 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106 NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au

107 Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca

108 Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109 Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list

110 RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit

112 University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services

113 OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html


114 Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039

115 Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University, 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116 NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging

117 Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog, 30 September 2015. https://hangingtogether.org/?p=5430

118 Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002

119 See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html

120 Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog, 15 April 2019. https://hangingtogether.org/?p=7122

121 DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog), 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library

122 Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap

124 Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com


125 Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog, 2 April 2019. https://hangingtogether.org/?p=7805

126 National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127 The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128 Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog, 17 April 2019. https://hangingtogether.org/?p=7135

129 Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130 SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about

131 Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog, 9 April 2020. https://hangingtogether.org/?p=7845

132 AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133 AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about

134 Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135 Ibid., 17-19.

136 Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog, 15 April 2019. https://hangingtogether.org/?p=7122

137 Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog, 17 April 2017. https://hangingtogether.org/?p=5929

138 Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog, 26 March 2018. https://hangingtogether.org/?p=6646


139 Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog), 9 September 2020. http://blog.reeset.net/archives/2572

140 Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141 Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog, 17 April 2017. https://hangingtogether.org/?p=5929

143 “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog), 14 March 2013. http://marcedit.reeset.net/tutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144 “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145 Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146 OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147 Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog, 18 October 2019. https://hangingtogether.org/?p=7580

148 OCLC. 2020. “WorldCat®: OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395. T: 1-800-848-5878 T: +1-614-764-6000 F: +1-614-764-6096 www.oclc.org/research

ISBN: 978-1-55653-170-5 DOI: 10.25333/rqgd-b343 RM-PR-216787-WWAE-A4 2009



Among the most critical issues identified by Focus Group members is that metadata specialists need to be more involved in the early stages of the research life cycle. Researchers need to understand the importance of metadata in their data management plans. The lack of “metadata governance” across an institution makes integrating workflows between repositories and discovery layers problematic.

Some libraries have started to provide research data management support in a variety of ways. For example, metadata specialists work with their institutions’ Scholarly Communications and Publishing Division, which also manages the Institutional Repository. These institutional repositories may have only the “citation” or “metadata-only” records, with a link to the full text or data set deposited in a disciplinary repository. “Metadata consultation services” may be provided to advise on the data management plan, which includes appropriate metadata standards and controlled vocabularies, a strategy to effectively organize their data, and an approach that will facilitate reuse of the data years after the research is completed. The OCLC Research The Realities of Research Data Management report series classifies metadata support as part of the “expertise” function and flags some variations in its case studies.¹¹³ At the University of Illinois at Urbana-Champaign, metadata consultants help researchers with metadata regardless of where the research data is deposited. Monash University supports metadata curation only for local deposits.¹¹⁴

Communication is key for researchers to understand the importance of metadata throughout the research life cycle. Some universities offer “research sprints,” where researchers partner with a team of expert librarians whose expertise may include metadata creation, management, analysis, and preservation. The “Shared BigData Gateway for Research Libraries,” hosted by Indiana University and partially funded by the Institute of Museum and Library Services, is developing a cloud-based platform to share data and expertise across institutions, including datasets such as records from the US Patent and Trademark Office and the Microsoft Academic Graph.¹¹⁵

Curation of research data, as part of the evolving scholarly record, requires new skill sets, including deeper domain knowledge and experience with data modeling and ontology development. Libraries are investing more effort in becoming part of their faculty’s research process and are offering services that help ensure that their research data will be accessible, if not also preserved. Good metadata will help guide other researchers to the research data they need for their own projects, and the data creators will have the satisfaction of knowing that their data has benefitted others.¹¹⁶

Evolution of “Metadata as a Service”

Metadata underlies the ability to discover all resources in the inside-out and facilitated collections. Focus Group members anticipate more involvement with metadata creation beyond the traditional library catalog and new services that leverage both legacy and future metadata.

METRICS

Library strategic goals often include key phrases such as “foster discovery and use,” “enrich the user experience,” and “explore new ways to support the whole life cycle of scholarship,” all of which are predicated on quality metadata. Usage metrics, such as how frequently items have been borrowed, cited, downloaded, or requested, could be used to build a wide range of library services and activities. Focus Group members identified some possible services: informing collection management decisions about weeding projects and identifying materials for offsite storage; evaluating subscriptions; comparing citations for researchers’ publications with what the library is not purchasing; improving relevancy ranking, personalizing search results, and offering recommendation services in the discovery layer; and measuring impact of library usage on research or student success or learning analytics.¹¹⁷ The University of Minnesota conducted a study to investigate the relationships between first-year undergraduate students’ use of the academic library, academic achievement, and retention.¹¹⁸ The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.¹¹⁹
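As a rough illustration of how such a relationship between usage metrics and outcomes might be measured, the following sketch computes a Pearson correlation coefficient in Python. The data points are invented for illustration only and do not come from the University of Minnesota or Jisc studies; the function name `pearson` and the choice of database logins as the usage metric are likewise assumptions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented illustration data: (database logins per term, grade point average).
students = [(0, 2.6), (3, 2.9), (8, 3.1), (15, 3.4), (22, 3.6)]
logins = [s[0] for s in students]
gpas = [s[1] for s in students]

r = pearson(logins, gpas)
print(f"r = {r:.2f}")
```

A coefficient near 1 indicates that the two measures rise together. It does not show that library use causes higher grades, which is why the studies cited above speak of correlation.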

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service”: consultancy in the earliest stages of both library and research projects.¹²⁰ An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. More visibility of this metadata consultant role appears in recent library job postings. In one Metadata Librarian job posting at Cornell,¹²¹ one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”¹²²

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).¹²³ Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable a richer analysis and view of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.¹²⁴ MARC metadata is also being used to inform institutional output measures and affiliation tracking, and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Analyzing catalog data by data mining can also be used to enrich the metadata, such as generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.¹²⁵
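One way the enrichment described above, generating language codes missing in related records, might work is sketched below in Python. This is a minimal sketch under stated assumptions, not a description of any library's actual process: the record structure, the field names `work_id`, `lang`, and `lang_inferred`, and the rule that a code is copied only when all coded records for the same work agree are all hypothetical.

```python
from collections import defaultdict


def fill_missing_language_codes(records):
    """Return copies of the records, filling a blank 'lang' value only when
    every coded record for the same work agrees on a single language code."""
    codes_by_work = defaultdict(set)
    for rec in records:
        if rec.get("lang"):
            codes_by_work[rec["work_id"]].add(rec["lang"])

    enriched = []
    for rec in records:
        new_rec = dict(rec)  # leave the input records untouched
        codes = codes_by_work.get(rec["work_id"], set())
        if not rec.get("lang") and len(codes) == 1:
            new_rec["lang"] = next(iter(codes))
            new_rec["lang_inferred"] = True  # flag the value for human review
        enriched.append(new_rec)
    return enriched


# Hypothetical records: work w1 is consistently coded, work w2 is not.
records = [
    {"id": "r1", "work_id": "w1", "lang": "fre"},
    {"id": "r2", "work_id": "w1", "lang": ""},
    {"id": "r3", "work_id": "w2", "lang": "eng"},
    {"id": "r4", "work_id": "w2", "lang": "ger"},
    {"id": "r5", "work_id": "w2", "lang": None},
]
result = fill_missing_language_codes(records)
```

The agreement rule is deliberately conservative: a work that exists in translation (here `w2`) has related records in several languages, so guessing a code for the uncoded record would risk introducing errors. Inferred values are flagged so that a cataloger can confirm them.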


Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.¹²⁶ Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the code-a-thon, and an example of involving the community to enhance bibliographic metadata.


FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story. Hachette asked the British Library for every author and book title published by its nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)¹²⁷



FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).¹²⁸ As Marcia Zeng points out in “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review”:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.¹²⁹

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),¹³⁰ which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.
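To make the idea of an embeddable SPARQL query concrete, the following Python sketch builds a query for the narrower concepts of a given concept and turns it into a request URL. It is an illustration under assumptions, not the Getty Vocabularies' own query interface: it uses only generic SKOS properties (`skos:broader`, `skos:prefLabel`), and the endpoint and concept URIs are placeholders on `example.org`. Passing the query text in a `query` parameter of a GET request follows the W3C SPARQL 1.1 Protocol.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def narrower_concepts_query(concept_uri, lang="en"):
    """Build a SPARQL query listing concepts whose skos:broader is concept_uri,
    with their preferred labels in the given language."""
    return f"""PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT ?concept ?label WHERE {{
  ?concept skos:broader <{concept_uri}> ;
           skos:prefLabel ?label .
  FILTER (lang(?label) = "{lang}")
}}
ORDER BY ?label"""


def sparql_get_url(endpoint, query):
    """Encode the query as the 'query' parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder URIs: substitute a real SKOS vocabulary's endpoint and concept.
query = narrower_concepts_query("https://vocab.example.org/concept/123")
url = sparql_get_url("https://vocab.example.org/sparql", query)
print(url)
```

An external application could fetch this URL and display the returned labels as a browsable list, which is the kind of reuse the paragraph above describes.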

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


28 Transitioning to the Next Generation of Metadata

Focus Group members hope that artificial intelligence, or at least machine learning, could reduce the manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.¹³¹ A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),¹³² an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”¹³³ Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.¹³⁴

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements to prepare for the future. Focus Group members identified new skill sets needed by both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.¹³⁵

THE CULTURE SHIFT

Focus Group members reported a delicate balance between allocating staff to “traditional cataloging activities” (such as original and copy cataloging and authority work) and to more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed, from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats, and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.¹³⁶


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, change their mindsets about how they work, and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists use tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.¹³⁷

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills, or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available, to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.¹³⁸ MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.¹³⁹ Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features, such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.¹⁴⁰ Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.¹⁴¹
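As a rough illustration of the kind of scripted clustering and de-duplication mentioned above, the following Python sketch groups variant name strings that share a normalized key (a “key collision” approach). It is only a minimal sketch under stated assumptions: it does not reproduce the algorithm of MarcEdit, OpenRefine, or any other tool, and the function names and sample headings are hypothetical, invented for this example.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build an order- and punctuation-insensitive key for a name string."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, replace punctuation with spaces, and split into tokens.
    tokens = re.sub(r"[^\w\s]", " ", unaccented.lower()).split()
    # Sort and de-duplicate tokens so "Reese, Terry" and "Terry Reese" collide.
    return " ".join(sorted(set(tokens)))


def cluster(values):
    """Group values sharing a fingerprint; keep only groups with variants."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(set(group)) > 1]


# Hypothetical heading strings, invented for illustration only.
headings = [
    "Zeng, Marcia Lei",
    "Marcia Lei Zeng",
    "zeng marcia lei.",
    "Padilla, Thomas",
    "Reese, Terry",
]
clusters = cluster(headings)
```

Here `clusters` holds a single group containing the three variant forms of the first name; a person would still need to review each group, since a shared key suggests, but does not prove, that the strings refer to the same entity.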

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification; descriptive standards used in various academic disciplines; describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”¹⁴²

SELF-EDUCATION

Metadata is increasingly being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Codecademy, Software Carpentry, and conferences such as Code4Lib and Mashcat.¹⁴³ W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.¹⁴⁴ Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
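To make “simple scripting” for reconciling equivalents across multiple registries more concrete, here is a minimal Python sketch that merges pairwise “same entity” links into clusters using a union-find structure. It is an illustration under stated assumptions, not a description of any institution’s workflow: the `group_equivalents` helper, the link format, and every identifier below are hypothetical and do not correspond to real registry entries.

```python
from collections import defaultdict


def group_equivalents(links):
    """Merge pairwise 'same entity' links into clusters using union-find."""
    parent = {}

    def find(node):
        # Follow parent pointers to the cluster representative,
        # shortening the path along the way.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for left, right in links:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    # Sort members and clusters so the result is deterministic.
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical identifiers, invented for illustration only.
links = [
    ("local:p001", "orcid:0000-0000-0000-0001"),
    ("orcid:0000-0000-0000-0001", "isni:0000000000000001"),
    ("local:p002", "viaf:000002"),
]
equivalents = group_equivalents(links)
```

Because the first two links share an identifier, the local, ORCID, and ISNI identifiers end up in one cluster even though no single link connects the local record to the ISNI directly. In practice each asserted link would need to come from a trusted source or human review before being merged this way.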

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,¹⁴⁵ a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars¹⁴⁶ to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.¹⁴⁷

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations, such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,¹⁴⁸ launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts, summarized here. We also extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category “Metadata.” https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe


Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. “What is ORCID.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. “NACO - Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). “About.” https://ror.org/about

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress: PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas. The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf) But the term has other meanings depending on the audience, for example, identity access management, as described in:

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805


44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271.

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html.

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002.

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite.

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603.

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html.

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739.

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html.

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264.

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku / Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko.

AIATSIS Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au.

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019.) https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html.

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite.

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress: PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Washburn, Bruce, and Jeff Mixter. 2015. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2015. MP4 video presentation, 57:45:00. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.

73. Grindley, Neil. 2019. “Plan M: Definition, Principles and Direction.” Jisc. (Word docx.) http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008.) https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010.) https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010.) https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014.) https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013.) https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem; Addressing the Problem; Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io.

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Krista M. Soria, Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia: 2020 HERE, Bing, Microsoft Corporation.]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs. See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395
T: 1-800-848-5878 T: +1-614-764-6000 F: +1-614-764-6096 www.oclc.org/research

ISBN: 978-1-55653-170-5 DOI: 10.25333/rqgd-b343 RM-PR-216787-WWAE-A4 2009

students’ use of the academic library, academic achievement, and retention.118 The results suggest a strong correlation between using academic library services and resources (particularly database logins, book loans, electronic journal logins, and library workstation logins) and higher grade point averages. In the United Kingdom, the Jisc Library Impact Data Project found a similar correlation.119

CONSULTANCY

Metadata’s value is demonstrated by integrating it into the fabric of both the library and other units across the campus. For example, metadata specialists can provide “metadata as a service,” that is, consultancy in the earliest stages of both library and research projects.120 An emerging trend is for digital humanities departments to request advice from metadata specialists on metadata standards and how to use controlled vocabularies. This metadata consultant role is becoming more visible in recent library job postings. In one Metadata Librarian job posting at Cornell,121 one duty cited was 20% for “metadata outreach and consultation”: “Maintains strong working relationships and communicates regularly with staff across Cornell, fostering collaborative efforts between Metadata Services and the greater Cornell community.” Georgia Tech is recruiting a metadata librarian who will “serve as a metadata consultant to larger library projects/initiatives. Work closely with other Library departments, Emory University Libraries, GALILEO, University System of Georgia Libraries, and other partners involved in joint projects.”122

NEW APPLICATIONS

The shared and consistent use of MARC fields supports new applications. Libraries currently use identifiers in bibliographic records to fetch tables of contents, abstracts, reviews, and cover images, and to generate floor maps of where to locate resources in a specific classification range (such as in OCLC’s integration with StackMap).123 Bibliographic metadata is used to populate Digital Asset Management Systems and Institutional Repositories and, with tools such as Tableau and OpenRefine, can enable richer analysis and views of collections. MARC metadata is connecting scholars with the bibliographic data for their projects and can generate relationships to related resources with applications such as Yewno.124 MARC metadata is also being used to inform institutional output measures and affiliation tracking and serves as a source to build organization histories. The provenance implicit in an institution’s bibliographic metadata has proven helpful in documenting theft cases. Data mining of catalog data can also enrich the metadata, for example by generating language codes missing in related records or identifying the original titles of translated works. MARC data has also supported generating subject maps to discover relationships otherwise not explicit in the cataloging metadata.125
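
As an illustration of the data-mining enrichment mentioned above, the following minimal Python sketch fills in a missing language code from related records. It is not drawn from any system described in this report. It assumes, purely for illustration, that records have already been grouped by a hypothetical `cluster_id` (for example, editions of the same expression) and carry a hypothetical `lang` field; neither name comes from MARC itself.

```python
from collections import Counter, defaultdict


def fill_missing_language_codes(records):
    """Return copies of `records` with absent language codes inferred from related records.

    Hypothetical record shape: {"id": ..., "cluster_id": ..., "lang": "fre" or None}.
    """
    # Tally the language codes already present in each cluster of related records.
    codes_by_cluster = defaultdict(Counter)
    for record in records:
        if record.get("lang"):
            codes_by_cluster[record["cluster_id"]][record["lang"]] += 1

    enriched = []
    for record in records:
        updated = dict(record)  # leave the input records untouched
        if not updated.get("lang"):
            ranked = codes_by_cluster[updated["cluster_id"]].most_common(2)
            # Infer a code only when one language clearly leads; ties are left for human review.
            if ranked and (len(ranked) == 1 or ranked[0][1] > ranked[1][1]):
                updated["lang"] = ranked[0][0]
                updated["lang_inferred"] = True  # flag machine-supplied values
        enriched.append(updated)
    return enriched
```

Flagging inferred values (here with the assumed `lang_inferred` key) keeps machine-supplied codes distinguishable from cataloger-supplied ones, so they can be reviewed or rolled back later.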

Visualizations represent another type of metadata service. A striking example is from the Austlang National Codeathon held in 2019, a collaboration among the National Library of Australia, the Australian Institute of Aboriginal and Torres Strait Islander Studies, Trove, Libraries Australia, and the State and Territory libraries to identify items in Indigenous Australian languages.126 Figure 8 shows the results: a map indicating the 465 Indigenous languages in the Australian National Bibliographic Database tagged as a result of the codeathon. It is also an example of involving the community to enhance bibliographic metadata.

FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics, statistical methods to analyze books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright. UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.

A novel use of cataloging metadata was by Hachette UK, the United Kingdom’s second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette’s publishing houses and weave them into a cohesive story. Hachette asked the British Library for every author and book title published by its nine publishing houses, spanning 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors, featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette’s River of Authors.)127

FIGURE 9 UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review”:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one “global domain,” metadata specialists could provide added value by adding bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.

Focus Group members hope that artificial intelligence, or at least machine learning, could reduce the manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
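
To make the idea of matching names “based on related metadata” more concrete, here is a minimal Python sketch; it is an illustration under stated assumptions, not a method reported by the Focus Group. It combines string similarity of the name (using the standard library’s `difflib.SequenceMatcher`) with the overlap of contextual terms such as subjects or affiliations. The record shape, the equal weighting, and the threshold are all hypothetical.

```python
from difflib import SequenceMatcher


def match_score(name, context, candidate):
    """Score a candidate identity record between 0 and 1.

    Hypothetical candidate shape: {"id": ..., "name": ..., "context": [terms]}.
    """
    # Similarity of the name strings, ignoring case.
    name_similarity = SequenceMatcher(None, name.lower(), candidate["name"].lower()).ratio()
    # Jaccard overlap of contextual terms (subjects, affiliations, and so on).
    ours, theirs = set(context), set(candidate["context"])
    union = ours | theirs
    context_overlap = len(ours & theirs) / len(union) if union else 0.0
    # Equal weights are an arbitrary assumption for this sketch.
    return 0.5 * name_similarity + 0.5 * context_overlap


def best_match(name, context, candidates, threshold=0.6):
    """Return the highest-scoring candidate, or None if nothing reaches the threshold."""
    if not candidates:
        return None
    best = max(candidates, key=lambda candidate: match_score(name, context, candidate))
    return best if match_score(name, context, best) >= threshold else None
```

Returning `None` below the threshold leaves ambiguous names for a metadata specialist to resolve, which keeps the algorithm in an assisting role rather than a deciding one.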

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed for both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance of allocating staff to ldquotraditional cataloging activitiesrdquo (such as original and copy cataloging authority work) with more exploratory RampD projects such as linked data projects exploring new data models and technologies such as Wikidata and learning about emerging standards and identifiers A culture shift is needed from pride in production alone to valuing opportunities to learn explore and try new approaches to metadata work Metadata specialists must understand that improving all metadata is more important than any individualrsquos productivity numbers This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just ldquoproduction machinesrdquo

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D, or "play time," to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.[136]

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the "easier stuff," while mandating that catalogers do only what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, change their mindsets about how they work, and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became "lively" because she brought her colleagues along. It created appreciation for "continuous learning," and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch "reading clubs," where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group "video-viewing brown-bag lunches" for staff on new developments such as linked data, so staff can "watch and learn" together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists use tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.[137]

NEW TOOLS AND SKILLS

The extent of metadata specialists' collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library's control. Some noted that "cultural differences" exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more "schematic." Which is more likely to be successful: teaching metadata specialists IT skills, or teaching IT staff metadata principles? The "holy grail" is to recruit someone with an IT background who is interested in metadata services. Retaining staff with IT skills is difficult because they are in demand for higher-paying jobs in the private sector. Focus Group members' experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the "technical services mindset."

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or who at least know how to use the available external tools to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools, such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros, for metadata reconciliation and batch processing.[138] MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.[139] Terry Reese, MarcEdit's developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.[140] Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.[141]
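As an illustration of the kind of lightweight clustering and de-duplication scripting described above, the following is a minimal Python sketch. It is not MarcEdit's implementation; it assumes a simple "fingerprint" key (lowercased, punctuation removed, tokens sorted) of the sort commonly used for key-collision clustering, and the function names and sample name strings are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key so trivially different strings collide."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, then turn punctuation (anything that is neither a word
    # character nor whitespace) into spaces.
    cleaned = re.sub(r"[^\w\s]", " ", stripped.lower())
    # Sort the unique tokens so that word order does not matter.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values by fingerprint; keep only groups with more than one variant."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(group) for group in groups.values() if len(group) > 1]


# Hypothetical sample data: name strings as they might appear in a batch file.
names = [
    "Smith-Yoshimura, Karen",
    "Karen Smith Yoshimura",
    "smith-yoshimura,  karen.",
    "Reese, Terry",
    "Terry Reese",
    "Padilla, Thomas",
]
clusters = cluster(names)
for group in clusters:
    print(group)
```

Each printed group is a set of candidate variants for a human to review. A key-collision approach like this only catches differences in case, punctuation, and word order; it does not establish that two strings refer to the same person, so it supports rather than replaces identity management work.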

Managers want to focus less on specific schemas and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers' needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, they often lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA's New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, "We bring order out of a vacuum."[142]

SELF-EDUCATION

Metadata is increasingly being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.[143] W3C's Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.[144] Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,[145] a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: "Don't wait; iterate." In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC's DevConnect Webinars[146] to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of "traditional" work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.[147]

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology to support library services helps catalogers "do more with less."

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations, such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond "traditional" cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution's collections. Focus Group members' linked data activities, including their participation in OCLC Research's Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,[148] launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research's linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for "works."

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as "statements" associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of "metadata as a service," and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts, summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements explaining why the topic was important and timely, and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership. Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. "The OCLC Research Library Partnership." https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura, Karen. 2017. "Metadata Advocacy." Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library's Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. "Program for Cooperative Cataloging." https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. "What Metadata Managers Expect from and Value about the Research Library Partnership." Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. "Linked Data." International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. "CONTENTdm Linked Data pilot." https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. "WorldCat® OCLC and Linked Data." Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. "BIBFRAME." Bibliographic Framework Initiative. https://www.loc.gov/bibframe

Futornick, Michelle. 2019. "LD4P2: Linked Data for Production: Pathway to Implementation." LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). "An Effective Environment for the Use of Linked Data by Libraries." Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee, 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. "Charge for PCC Task Group on Identity Management in NACO." 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. "PCC Task Group on URIs in MARC." Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. "PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge." PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc: 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. "Shift to Linked Data for Production." OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. "Linked Data." Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. "'Future Proofing' of Cataloging." OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. "What is Orcid." Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. "ORCID Open Letter - Publishers." Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. "What is ISNI." Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. "Welcome to HathiTrust." Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. "Browse the Names." Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. "Key Statistics." Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. "NACO – Name Authority Cooperative Program." Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. "Getting Identifiers Created for Legacy Names." Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. "Irreconcilable Differences? Name Authority Control & Humanities Scholarship." Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. "Use Cases for Local Identifiers." Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. "Registering Researchers in Authority Files." https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). "About." https://ror.org/about

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. "Precision Measurement of the Top-Quark Mass in Lepton+jets Final States." (Archived 24 February 2020). ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. "How Much Metadata Is Practical?" Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. "Experts@Minnesota." Find Profiles. https://experts.umn.edu/en/persons or

University of Illinois at Urbana-Champaign. 2020. "Illinois Experts." Find U of I Research: View Scholarly Works and Discover New Collaborators. https://experts.illinois.edu

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. "ORCID iD Required for Some, Encouraged for All." NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. "SciENcv and ORCID to Streamline NIH and NSF Grant Applications." LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. "Metadata Reconciliation." Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress: PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. "New Ways of Using and Enhancing Cataloging and Authority Records." Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. "Persistent Identifiers for Local Collections." Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. "DOI System Examples." Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. "[Photographs of Identification Cards]." Collection. University of North Texas. The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. "Assign DOIs." https://datacite.org/dois.html

Wilkinson, Laura J. 2020. "Constructing your DOIs." Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. "Identity management" here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. "Charge for PCC Task Group on Identity Management in NACO." 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in:

Wikiwand. "Identity Management." https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. "The Coverage of Identity Management Work." Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805

44. Smith-Yoshimura, Karen. 2017. "Beyond the Authorized Access Point." Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, "Coverage of Identity Management." (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. "Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond." Produced by OCLC Research. 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. "Experimentations with Wikidata/Wikibase." Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. "WikiCite." Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. "Impact of Identifiers on Authority Workflows." Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. "Strategies for Alternate Subject Headings and Maintaining Subject Headings." Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. "FAST (Faceted Application of Subject Terminology)." https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. "Faceted Vocabularies." Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. "FAST." (See note 51.)

54. OCLC. 2020. "FAST (Faceted Application of Subject Terminology)." Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. "Vocabulary Control Data in Discovery Environments." Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. "Ngā Upoko Tukutuku / Māori Subject Headings." http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS Pathways: Gateway to the AIATSIS Thesauri. "Pathways." http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. "MACS - Multilingual Access to Subjects." (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. "Knowledge Organization Systems." Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135

59. Synaptica. "Ontology Management – Graphite." https://www.synaptica.com/graphite

60 Smith-Yoshimura Karen 2018 ldquoAre Distributed Models for Vocabulary Maintenance Viablerdquo Hanging Together The OCLC Research Blog 12 April 2018 httpshangingtogetherorgp=6672

61 OCLC Research 2020 ldquoEquity Diversity and Inclusion in the OCLC Research Library Partnership Surveyrdquo Overview Accessed 20 September 2020 httpswwwoclcorg researchareascommunity-catalystsrlp-edihtml

62 Smith-Yoshimura Karen 2018 ldquoCreating Metadata for Equity Diversity and Inclusionrdquo Hanging Together The OCLC Research Blog 7 November 2018 httpshangingtogetherorgp=6833

63 Smith-Yoshimura ldquoDistributed Modelsrdquo (See note 60)

64 Smith-Yoshimura Karen 2019 ldquoStrategies for Alternate Subject Headings and Maintaining Subject Headingsrdquo Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

65 Baxmeyer Jennifer Karen Coyle Joanna Dyla MJ Han Steven Folsom Phil Schreur and Tim Thompson 2017 Linked Data Infrastructure Models Areas of Focus for PCC Strategies Library of Congress PCC Linked Data Advisory Committee httpswwwlocgovabapcc documentsLinkedDataInfrastructureModelspdf

66 Bone Christine Sharon Farnel Sheila Laroque and Brett Lougheed 2017 ldquoWorks in Progress Webinar Decolonizing Descriptions Finding Naming and Changing the Relationship between Indigenous People Libraries and Archives ldquo Produced by OCLC Research 19 October 2017 MP4 video presentation 543500 httpswwwoclcorg researchevents201710-19html

67. Smith-Yoshimura, Karen. 2015. "Shift to Linked Data for Production." Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. "Working in Shared Files." Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Washburn, Bruce, and Jeff Mixter. 2018. "Works in Progress Webinar: Looking Inside the Library Knowledge Vault." Produced by OCLC Research, 12 August 2018. MP4 video presentation, 57:45:00. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. "Systematic Reviews of Our Metadata." Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. "Working in Shared Files." Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. "What Is 'Plan M'?" Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. "Knowledge Management and Metadata." Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of "Plan M" (May–November 2020), see Grindley, Neil. "Moving Plan M Forwards – We Need Your Help." Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.


73. Grindley, Neil. 2019. "Plan M: Definition, Principles and Direction." Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. "Library Collections in the Life of the User: Two Directions." LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. "Presenting Metadata from Different Sources in Discovery Layers." Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. "Metadata for Archival Collections." Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. "Metadata Management in Times of Uncertainty." Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. "Metadata for Archived Websites." Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. "Human Rights." Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068.

Archive-It. 2010. "New York City Places and Spaces." Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757.

Archive-It. 2010. "Burke Library New York City Religions." Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945.

82. NLA. "Trove." Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. "Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY)." Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638.

Archive-It. 2013. "Contemporary Composers Web Archive (CCWA)." Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019.

NYARC (New York Art Resources Consortium). "Web Archiving." http://www.nyarc.org/content/web-archiving.


84. OCLC Research. 2020. "Web Archiving Metadata Working Group." The Problem; Addressing the Problem; Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. "Metadata for Audio and Videos." Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. "Standards." Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. "Standards." Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. "Assessing Needs of AV in Special Collections." Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. "Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP." Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. "Managing Metadata for Image Collections." Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. "Standards." Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. "Standards." Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95. Smith-Yoshimura, Karen. 2016. "Sharing Digital Collections Workflows." Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. "Europeana Innovation Pilots." Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World's Images. "Home." Accessed 20 September 2020. https://iiif.io.

98. OCLC Research. 2020. "OCLC ResearchWorks IIIF Explorer." https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. "CONTENTdm Linked Data Pilot." Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.


100. Smith-Yoshimura, Karen. 2015. "Data Management and Curation in 21st Century Archives – Part 1." 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. "Metadata for Research Data Management." Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. "Knowledge Management and Metadata." Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. "Let's Cook Up Some Metadata Consistency." Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. "Home." Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). "Home." Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. "Home." Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a "collaboration advocating richer, connected, reusable, open metadata for all research outputs" (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. "Disciplinary Metadata." List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. "Metadata Standards Directory Working Group." GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor's specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. "CRediT – Contributor Roles Taxonomy." Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. "Data Services." http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. "The Realities of Research Data Management." Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. "IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries." News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. "Microsoft Academic Graph." Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of "Democratizing Access to Large Datasets through Shared Infrastructure." See Wittenberg, Jamie, and Valentin Pentchev. "Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure." Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO's Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See "Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website." n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. "Services Built on Usage Metrics." Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. "Stacks, Serials, Search Engines, and Students' Success: First-Year Undergraduate Students' Library Use, Academic Achievement, and Retention." Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. "Library Impact Data Project (LIDP)." Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. "Alternatives to Statistics for Measuring Success and Value of Cataloging." Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. "Metadata Librarian, Cornell University Library." DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. "Metadata Librarian." Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. "Locate Items in the Library with StackMap." https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. "Home." https://www.yewno.com.


125. Smith-Yoshimura, Karen. 2019. "New Ways of Using and Enhancing Cataloging and Authority Records." Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). "Austlang National Codeathon." Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia: 2020 HERE, Bing, Microsoft Corporation.]

NLA. "Trove." Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. "River of Authors." Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. "Knowledge Organization Systems." Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. "Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review." International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). "About SNAC." What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. "Knowledge Management and Metadata." Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM's mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. "About." Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. "Alternatives to Statistics for Measuring Success and Value of Cataloging." Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. "New Skill Sets for Metadata Management." Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. "MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation." Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.


139. Reese, Terry. 2018. "MarcEdit 2017 Usage Information." Terry's Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. "Working with Linked Data In MarcEdit." MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. "MarcEdit Playlist." 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. "New Skill Sets for Metadata Management." Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. "XML and RDF-Based Systems Archives." n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. "Tutorials." YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

"Lynda: Online Courses, Classes, Training, Tutorials." n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

"Learn to Code - for Free." n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. "Teaching Basic Lab Skills for Research Computing." Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. "Data on the Web Best Practices." n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. "About." Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. "DevConnect Webinars." https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. "Stewardship of Professional FTEs In Metadata Work and Turnover." Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. "WorldCat®: OCLC and Linked Data." Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009




FIGURE 8 Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon

BIBLIOMETRICS

Library metadata is also being used to generate bibliometrics: statistical methods for analyzing books, articles, and other publications. Using library metadata for Digital Humanities research projects has much potential. For example, a Library of Congress researcher used bibliographic metadata to trace the history of publishing and copyright, and UCLA researchers have used cataloging metadata to track the commercialization of inventions such as insulin.


A novel use of cataloging metadata came from Hachette UK, the United Kingdom's second largest bookseller, which commissioned the Graphic History Company to unlock the histories of all nine of Hachette's publishing houses and weave them into a cohesive story. The company asked the British Library for every author and book title published by the nine publishing houses over a span of 250 years. The British Library provided a list of over 55,000 authors, from which 5,000 of the most prominent were selected to create perhaps the most beautiful example of metadata use: a giant mural spanning eight floors and featuring all 5,000 authors in chronological order. (Figure 9 shows one part of the mural; for more images of the mural, see Hachette's River of Authors.)127


FIGURE 9 UK Hachette's "River of Authors," generated from the British Library's catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in "Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review":

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty's representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.
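To make the idea of an embeddable SPARQL query concrete, the following is a minimal sketch of how an external application might assemble one. It is not taken from Getty's documentation: the endpoint URL is an assumption that should be verified, the function names are hypothetical, and the query uses only generic SKOS properties. The sketch builds the request URL without sending it.

```python
from urllib.parse import urlencode

# Assumption: the public SPARQL endpoint for the Getty Vocabularies.
# Confirm the address and any usage policy before relying on it.
GETTY_SPARQL_ENDPOINT = "http://vocab.getty.edu/sparql"


def build_label_query(term: str, limit: int = 10) -> str:
    """Return a SPARQL query for concepts whose preferred label contains `term`."""
    # Escape backslashes and double quotes so the term stays inside the literal.
    safe_term = term.lower().replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT ?concept ?label WHERE {\n"
        "  ?concept skos:prefLabel ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), "{safe_term}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_request_url(term: str, limit: int = 10) -> str:
    """Return a GET URL that carries the query in the standard `query` parameter."""
    return f"{GETTY_SPARQL_ENDPOINT}?{urlencode({'query': build_label_query(term, limit)})}"


if __name__ == "__main__":
    print(build_label_query("mural", limit=5))
    print(build_request_url("mural", limit=5))
```

An unindexed substring filter like this one is easy to read but can be slow on a large vocabulary; a production query would more likely restrict the search to one scheme or use whatever full-text index the endpoint offers.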

Rather than pursuing one "global domain," metadata specialists could provide added value by building bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform that aggregates entities from different sources and links to more details in the various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that artificial intelligence, or at least machine learning, could reduce the manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other available metadata, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an "international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums."133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla's 2019 OCLC Research position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
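As a rough illustration of matching and disambiguating names with the help of related metadata, the sketch below compares normalized name strings and then adjusts the score using a birth year when both records have one. It is a simple heuristic and not a machine-learning method; the record structure, the score adjustments, and the threshold are all assumptions chosen for illustration.

```python
import unicodedata
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class NameRecord:
    """Hypothetical minimal record: a name string plus an optional birth year."""
    name: str
    birth_year: Optional[int] = None


def normalize(name: str) -> str:
    """Strip diacritics and punctuation, lowercase, and sort the name tokens."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_marks.lower())
    return " ".join(sorted(cleaned.split()))


def match_score(a: NameRecord, b: NameRecord) -> float:
    """Return a 0..1 similarity, adjusted by birth year when both are known."""
    score = SequenceMatcher(None, normalize(a.name), normalize(b.name)).ratio()
    if a.birth_year is not None and b.birth_year is not None:
        if a.birth_year == b.birth_year:
            score = min(1.0, score + 0.1)  # agreeing dates add confidence
        else:
            score = max(0.0, score - 0.5)  # conflicting dates suggest two people
    return score


def is_probable_match(a: NameRecord, b: NameRecord, threshold: float = 0.9) -> bool:
    """Flag a pair for human review or automatic linking (threshold is illustrative)."""
    return match_score(a, b) >= threshold


if __name__ == "__main__":
    inverted = NameRecord("Smith, John", 1900)
    direct = NameRecord("John Smith", 1900)
    other = NameRecord("John Smith", 1950)
    print(is_probable_match(inverted, direct))  # same name form, same year
    print(is_probable_match(inverted, other))   # same name form, different year
```

Sorting the tokens makes inverted and direct-order forms compare equal, at the cost of also equating names whose parts genuinely differ in order, which is one reason such output is usually treated as a candidate list for review.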

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed by both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who "trail-blaze innovations," which are then routinized for nonprofessionals. These discussions reinforce Padilla's recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance between allocating staff to "traditional cataloging activities" (such as original and copy cataloging and authority work) and to more exploratory R&D work, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed, from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual's productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff, to learn new workflows for processing multiple formats, and to view metadata specialists as more than just "production machines."

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or "play time") to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the "easier stuff," while mandating that catalogers do only what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, most of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage metadata specialists to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became "lively" because she brought her colleagues along. It created appreciation for "continuous learning," and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch "reading clubs," where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group "video-viewing brown-bag lunches" for staff on new developments such as linked data, so staff can "watch and learn" together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists' collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library's control. Some noted that "cultural differences" exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more "schematic." Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The "holy grail" is to recruit someone with an IT background who is interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members' experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the "technical services mindset."


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the available external tools to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit's developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.141
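The kind of lightweight clustering and de-duplication described above can be sketched in a few lines of Python. This is a generic key-collision approach, similar in spirit to the "fingerprint" method in OpenRefine, and it is not MarcEdit's actual implementation; the function names and sample values are assumptions for illustration.

```python
import string
import unicodedata
from collections import defaultdict
from typing import Iterable


def fingerprint(value: str) -> str:
    """Reduce a heading to a key: no diacritics or punctuation, lowercase,
    with unique tokens sorted alphabetically."""
    decomposed = unicodedata.normalize("NFKD", value)
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    no_punct = no_marks.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(set(no_punct.split())))


def cluster(values: Iterable[str]) -> list[list[str]]:
    """Group values that share a fingerprint; return only groups with variants."""
    groups: dict[str, list[str]] = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]


if __name__ == "__main__":
    # Illustrative publisher strings as they might appear in a batch of records.
    publishers = [
        "Oxford University Press",
        "oxford university press.",
        "Press, Oxford University",
        "Penguin Books",
    ]
    for members in cluster(publishers):
        print(members)
```

Each cluster is a set of candidate variants for a cataloger to review and merge; the method catches differences in case, punctuation, and word order, but not misspellings, which need a distance-based comparison.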

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers' needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, they often lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA's New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, "We bring order out of a vacuum."142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
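As a rough illustration of the kind of simple script meant here, the sketch below merges records from several registries whenever they share an identifier, so that equivalents end up in one entity. The record layout, the `reconcile` function, and every identifier value are hypothetical placeholders, not data from any real registry.

```python
from collections import defaultdict


def reconcile(records):
    """Merge records that share at least one (scheme, identifier) pair."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the group representative, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}
    for index, record in enumerate(records):
        for pair in record["ids"].items():
            if pair in first_seen:
                # Same identifier seen before: join the two groups.
                parent[find(index)] = find(first_seen[pair])
            else:
                first_seen[pair] = index

    merged = defaultdict(lambda: {"names": set(), "ids": {}})
    for index, record in enumerate(records):
        entity = merged[find(index)]
        entity["names"].add(record["name"])
        # If two records disagree on a scheme, the later value wins.
        entity["ids"].update(record["ids"])
    return list(merged.values())


# Made-up records; the identifier values are placeholders only.
records = [
    {"name": "Doe, Jane", "ids": {"orcid": "0000-0000-0000-0001"}},
    {"name": "Jane Doe", "ids": {"orcid": "0000-0000-0000-0001", "isni": "0000000000000001"}},
    {"name": "Doe, J.", "ids": {"isni": "0000000000000001", "local": "n0001"}},
    {"name": "Roe, Richard", "ids": {"local": "n0002"}},
]

entities = reconcile(records)
for entity in entities:
    print(sorted(entity["names"]), entity["ids"])
```

The first and third records share no identifier directly; they are linked only through the second record, which is why the sketch tracks groups rather than comparing records pairwise.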

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait; iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or re-purposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities, as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements explaining why each topic was important and timely, and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe

Futornick, Michelle. 2019. “LD4P2 Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” Hanging Together: The OCLC Research Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. “What is Orcid?” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. “What is ISNI?” Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). “About.” https://ror.org/about

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020). ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu

34. The National Institute of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref. The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research. 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 March 2019. https://hangingtogether.org/?p=7135

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research. 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research. 12 August 2018. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase

73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945

82. NLA. “Trove.” Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

100 Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375.

101 Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102 Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103 See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104 Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105 Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106 NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au.

107 Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca.

108 Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109 Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110 RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111 NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit.

112 University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113 OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.


114 Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115 Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research. 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116 NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117 Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118 Krista M. Soria, Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119 See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120 Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121 DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122 Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123 OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124 Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.


125 Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126 National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia: 2020 HERE, Bing, Microsoft Corporation.]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127 The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128 Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129 Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130 SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131 Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132 AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133 AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134 Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135 Ibid., 17-19.

136 Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137 Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138 Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.


139 Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140 Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141 Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142 Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143 “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144 “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145 Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146 OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147 Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148 OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395
T: 1-800-848-5878  T: +1-614-764-6000  F: +1-614-764-6096  www.oclc.org/research

ISBN: 978-1-55653-170-5  DOI: 10.25333/rqgd-b343  RM-PR-216787-WWAE-A4 2009


FIGURE 9. UK Hachette’s “River of Authors,” generated from the British Library’s catalog metadata

SEMANTIC INDEXING

When controlled vocabularies and thesauri are converted into linked open data and shared publicly, their traditional role of facilitating collection browsing will fade, but they could find a renewed purpose within web-based knowledge organization systems (KOS).128 As Marcia Zeng points out in “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review”:

a KOS vocabulary is more than just the source of values to be used in metadata descriptions: by modeling the underlying semantic structures of domains, KOS act as semantic road maps and make possible a common orientation by indexers and future users, whether human or machine.129

Good examples of such repurposing are the Getty Vocabularies, which allow browsing of Getty’s representation of knowledge and also help users generate their own SPARQL queries that can be embedded in external applications. Another example is Social Networks and Archival Context (SNAC),130 which enables browsing of entities and relationships independently of their collections of origin. In such cases, the discovery tool pivots to being person-centric (or family-centric or topic-centric) rather than (only) collection-centric.

Rather than one “global domain,” metadata specialists could provide added value by building bridges from the metadata in library domain databases to other domains. Wikidata is an example of a platform aggregating entities from different sources and linking to more details in various language Wikipedias. Some institutions have employed Wikimedians in Residence to accelerate this process.


Focus Group members hope that artificial intelligence, or at least machine learning, could reduce the manual effort currently needed to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other metadata available, and analyze datasets to identify possible biases in a collection.131 A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),132 an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”133 Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.134
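The idea of matching names with help from related metadata can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions, not a method described by the Focus Group: the record layout (`id`, `name`, `birth_year`), the function names, and the 0.8 threshold are all hypothetical, and plain string similarity from the standard library stands in for whatever machine-learning approach an institution might actually adopt.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Return a 0..1 similarity ratio between two name strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(query, candidates, threshold=0.8):
    """Pick the candidate whose name is most similar to the query name.

    Related metadata is used to disambiguate: a candidate whose birth year
    conflicts with the query's birth year is never proposed as a match.
    The field names and the threshold are hypothetical choices.
    """
    best, best_score = None, 0.0
    for candidate in candidates:
        q_year = query.get("birth_year")
        c_year = candidate.get("birth_year")
        if q_year and c_year and q_year != c_year:
            continue  # same-looking name, but demonstrably a different person
        score = name_similarity(query["name"], candidate["name"])
        if score > best_score:
            best, best_score = candidate, score
    return best if best_score >= threshold else None


# Hypothetical records: two authority entries share a very similar heading.
authority_file = [
    {"id": "A", "name": "Smith, John", "birth_year": 1900},
    {"id": "B", "name": "Smith, John A.", "birth_year": 1950},
]
incoming = {"name": "Smith, John", "birth_year": 1950}

match = best_match(incoming, authority_file)
print(match["id"] if match else "no confident match; route to a specialist")
```

In this sketch the exact string match (record A) is rejected because its birth year conflicts, which is the kind of decision that currently takes manual review. Anything below the threshold is left for a metadata specialist rather than linked automatically.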

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed by both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.135

THE CULTURE SHIFT

Focus Group members reported a delicate balance between allocating staff to “traditional cataloging activities” (such as original and copy cataloging and authority work) and more exploratory R&D work, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.136


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers do only what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage this culture shift, prompting metadata specialists to change their mindsets about how they work and stimulating interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those who are learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers, solve problems, and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to transition successfully to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists use tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.137

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills, or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background who is interested in metadata services. Retaining staff with IT skills is difficult because they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or who at least know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 usage snapshot.139 Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.141
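As an illustration of the kind of batch script mentioned above, the following Python sketch de-duplicates records within a file and merges duplicates into one record. It is a minimal example under stated assumptions and does not reproduce how MarcEdit or OpenRefine work internally: the records are plain dictionaries such as might be read from a spreadsheet export, and the field names (`title`, `author`, `year`, `local_id`) and the choice of match key are hypothetical.

```python
import unicodedata


def normalize(value):
    """Fold case, accents, and punctuation so near-identical strings compare equal."""
    decomposed = unicodedata.normalize("NFKD", value)
    chars = []
    for ch in decomposed:
        if unicodedata.combining(ch):
            continue  # drop accent marks split off by NFKD
        chars.append(ch.lower() if ch.isalnum() else " ")
    return " ".join("".join(chars).split())


def deduplicate(records, key_fields=("title", "author", "year")):
    """Keep one record per match key, filling its empty fields from later duplicates."""
    merged = {}
    for record in records:
        key = tuple(normalize(str(record.get(field, ""))) for field in key_fields)
        if key not in merged:
            merged[key] = dict(record)
            continue
        for field, value in record.items():
            if value and not merged[key].get(field):
                merged[key][field] = value  # merge: never overwrite existing data
    return list(merged.values())


# Hypothetical sample batch: the first two rows describe the same item.
batch = [
    {"title": "Moby-Dick", "author": "Melville, Herman", "year": "1851", "local_id": ""},
    {"title": "moby dick", "author": "Melville, Herman.", "year": "1851", "local_id": "rec-002"},
    {"title": "Typee", "author": "Melville, Herman", "year": "1846", "local_id": "rec-003"},
]

for row in deduplicate(batch):
    print(row["title"], row["local_id"])
```

The first record seen is kept as the base and only its empty fields are filled from later duplicates, so the script never silently discards data. A production workflow would need a more careful match key than this one, which would for example conflate different editions issued in the same year.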

Managers want to focus less on specific schemas and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, they often lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Round Table. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Codecademy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
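Reconciling equivalents across multiple registries is one place where simple scripting pays off quickly. The sketch below, offered only as an example, assumes that a specialist has already assembled pairs of identifiers asserted to refer to the same entity; it then groups them into clusters with a small union-find structure. The function name is hypothetical and the identifiers are made-up placeholders, not real registry entries.

```python
def cluster_identifiers(equivalences):
    """Group identifiers into clusters given pairs asserted to be equivalent.

    `equivalences` is an iterable of (identifier, identifier) pairs, for
    example drawn from "same as" links recorded in different registries.
    """
    parent = {}

    def find(item):
        # Follow parent links to the cluster representative, shortening the path.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for left, right in equivalences:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left  # union the two clusters

    clusters = {}
    for identifier in parent:
        clusters.setdefault(find(identifier), set()).add(identifier)
    return list(clusters.values())


# Placeholder identifiers for illustration only.
pairs = [
    ("viaf:0001", "isni:0002"),
    ("isni:0002", "wikidata:Q0003"),
    ("orcid:0004", "viaf:0005"),
]

for cluster in cluster_identifiers(pairs):
    print(sorted(cluster))
```

Because equivalence is treated as transitive, the first two pairs end up in a single cluster of three identifiers even though no pair links the first and third directly. That transitivity is an assumption worth checking against local policy, since one wrong link would merge two distinct identities.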

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait, iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage through developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities, as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here Additionally we extend thanks to the dedicated Metadata Managers Planning Group which initiated the topics and provided the context statements and question sets the responses to which served as the basis of our discussions In addition we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document their comments improved this synthesis

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. "The OCLC Research Library Partnership." https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura, Karen. 2017. "Metadata Advocacy." Hanging Together: The OCLC Research Blog, 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library's Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. "Program for Cooperative Cataloging." https://www.loc.gov/aba/pcc/

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search category: Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. "What Metadata Managers Expect from and Value about the Research Library Partnership." Hanging Together: The OCLC Research Blog, 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. "Linked Data." International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. "CONTENTdm Linked Data Pilot." https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. "WorldCat® OCLC and Linked Data." Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. "BIBFRAME." Bibliographic Framework Initiative. https://www.loc.gov/bibframe/

Futornick, Michelle. 2019. "LD4P2 Linked Data for Production: Pathway to Implementation." LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). "An Effective Environment for the Use of Linked Data by Libraries." Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. "Charge for PCC Task Group on Identity Management in NACO," 5. Acquisitions and Bibliographic Access, Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. "PCC Task Group on URIs in MARC." Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. "PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge." PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB] https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. "Shift to Linked Data for Production." Hanging Together: The OCLC Research Blog, 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. "Linked Data." Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. "'Future Proofing' of Cataloging." Hanging Together: The OCLC Research Blog, 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. "What is ORCID?" Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. "ORCID Open Letter - Publishers." Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. "What is ISNI?" Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. "Welcome to HathiTrust." Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. "Browse the Names." Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. "Key Statistics." Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. "NACO – Name Authority Cooperative Program." Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. "Getting Identifiers Created for Legacy Names." Hanging Together: The OCLC Research Blog, 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. "Irreconcilable Differences? Name Authority Control & Humanities Scholarship." Hanging Together: The OCLC Research Blog, 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. "Use Cases for Local Identifiers." Hanging Together: The OCLC Research Blog, 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. "Registering Researchers in Authority Files." https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). "About." https://ror.org/about

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. "Precision Measurement of the Top-Quark Mass in Lepton+jets Final States." (Archived 24 February 2020.) ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. "How Much Metadata Is Practical?" Hanging Together: The OCLC Research Blog, 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. "Experts@Minnesota." Find Profiles. https://experts.umn.edu/en/persons/ or

University of Illinois at Urbana-Champaign. 2020. "Illinois Experts." Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. "ORCID iD Required for Some, Encouraged for All." NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. "SciENcv and ORCID to Streamline NIH and NSF Grant Applications." LyrasisNow (blog), 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. "Metadata Reconciliation." Hanging Together: The OCLC Research Blog, 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. "New Ways of Using and Enhancing Cataloging and Authority Records." Hanging Together: The OCLC Research Blog, 2 April 2019. https://hangingtogether.org/?p=7805

39. Smith-Yoshimura, Karen. 2015. "Persistent Identifiers for Local Collections." Hanging Together: The OCLC Research Blog, 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. "DOI System Examples." Accessed 20 September 2020. https://www.doi.org/demos.html and

See ARK examples in detail from Dallas (Tex.) Police Department. 1963. "[Photographs of Identification Cards]." Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. "Assign DOIs." https://datacite.org/dois.html

Wilkinson, Laura J. 2020. "Constructing Your DOIs." Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. "Identity management" here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. "Charge for PCC Task Group on Identity Management in NACO," 5. Acquisitions and Bibliographic Access, Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.) But the term has other meanings depending on the audience, for example, identity access management as described in Wikiwand. "Identity Management." https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. "The Coverage of Identity Management Work." Hanging Together: The OCLC Research Blog, 8 October 2018. https://hangingtogether.org/?p=6805

44. Smith-Yoshimura, Karen. 2017. "Beyond the Authorized Access Point." Hanging Together: The OCLC Research Blog, 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, "Coverage of Identity Management." (See note 43.)

46. Watch the highly rated webinar by Andrew Lih and Robert Fernandez. 2018. "Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond." Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. "Experimentations with Wikidata/Wikibase." Hanging Together: The OCLC Research Blog, 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. "WikiCite." Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. "Impact of Identifiers on Authority Workflows." Hanging Together: The OCLC Research Blog, 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. "Strategies for Alternate Subject Headings and Maintaining Subject Headings." Hanging Together: The OCLC Research Blog, 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. "FAST (Faceted Application of Subject Terminology)." https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. "Faceted Vocabularies." Hanging Together: The OCLC Research Blog, 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. "FAST." (See note 51.)

54. OCLC. 2020. "FAST (Faceted Application of Subject Terminology)." Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. "Vocabulary Control Data in Discovery Environments." Hanging Together: The OCLC Research Blog, 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. "Ngā Upoko Tukutuku Māori Subject Headings." http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. "Pathways." http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. "MACS - Multilingual Access to Subjects." (Archived 13 January 2019.) https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. "Knowledge Organization Systems." Hanging Together: The OCLC Research Blog, 17 April 2019. https://hangingtogether.org/?p=7135

59. Synaptica. "Ontology Management – Graphite." https://www.synaptica.com/graphite

60. Smith-Yoshimura, Karen. 2018. "Are Distributed Models for Vocabulary Maintenance Viable?" Hanging Together: The OCLC Research Blog, 12 April 2018. https://hangingtogether.org/?p=6672

61. OCLC Research. 2020. "Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey." Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62. Smith-Yoshimura, Karen. 2018. "Creating Metadata for Equity, Diversity, and Inclusion." Hanging Together: The OCLC Research Blog, 7 November 2018. https://hangingtogether.org/?p=6833

63. Smith-Yoshimura, "Distributed Models." (See note 60.)

64. Smith-Yoshimura, Karen. 2019. "Strategies for Alternate Subject Headings and Maintaining Subject Headings." Hanging Together: The OCLC Research Blog, 29 October 2019. https://hangingtogether.org/?p=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. "Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries, and Archives." Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. https://www.oclc.org/research/events/2017/10-19.html

67. Smith-Yoshimura, Karen. 2015. "Shift to Linked Data for Production." Hanging Together: The OCLC Research Blog, 13 May 2015. https://hangingtogether.org/?p=5195

68. Smith-Yoshimura, Karen. 2015. "Working in Shared Files." Hanging Together: The OCLC Research Blog, 7 April 2015. https://hangingtogether.org/?p=5091

69. Bruce Washburn and Jeff Mixter. 2015. "Works in Progress Webinar: Looking Inside the Library Knowledge Vault." Produced by OCLC Research, 12 August 2015. MP4 video presentation, 57:45:00. https://www.oclc.org/research/events/2015/08-12.html

70. Smith-Yoshimura, Karen. 2019. "Systematic Reviews of Our Metadata." Hanging Together: The OCLC Research Blog, 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. "Working in Shared Files." Hanging Together: The OCLC Research Blog, 7 April 2015. https://hangingtogether.org/?p=5091

72. Jisc Library Services. n.d. "What Is 'Plan M'?" Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m

Smith-Yoshimura, Karen. 2020. "Knowledge Management and Metadata." Hanging Together: The OCLC Research Blog, 9 April 2020. https://hangingtogether.org/?p=7845

To learn more about the current phase of "Plan M" (May–November 2020), see Grindley, Neil. "Moving Plan M Forwards – We Need Your Help." Library Services (PlanM) (blog). Jisc, 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase

73. Grindley, Neil. 2019. "Plan M: Definition, Principles and Direction." Jisc. (Word docx.) http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. "Library Collections in the Life of the User: Two Directions." LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. "Presenting Metadata from Different Sources in Discovery Layers." Hanging Together: The OCLC Research Blog, 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. "Metadata for Archival Collections." Hanging Together: The OCLC Research Blog, 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. "Metadata Management in Times of Uncertainty." Hanging Together: The OCLC Research Blog, 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. "Metadata for Archived Websites." Hanging Together: The OCLC Research Blog, 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. "Human Rights." Columbia University Libraries Collection. (Archived May 2008.) https://archive-it.org/collections/1068

Archive-It. 2010. "New York City Places and Spaces." Columbia University Libraries Collection. (Archived January 2010.) https://archive-it.org/collections/1757

Archive-It. 2010. "Burke Library New York City Religions." Columbia University Libraries Collection. (Archived May 2010.) https://archive-it.org/collections/1945

82. NLA. "Trove." Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. "Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY)." Ivy Plus Libraries Confederation Collection. (Archived June 2014.) https://archive-it.org/collections/4638

Archive-It. 2013. "Contemporary Composers Web Archive (CCWA)." Ivy Plus Libraries Confederation Collection. (Archived October 2013.) https://archive-it.org/collections/4019

NYARC (New York Art Resources Consortium). "Web Archiving." http://www.nyarc.org/content/web-archiving

84. OCLC Research. 2020. "Web Archiving Metadata Working Group." The Problem; Addressing the Problem; Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C

86. Smith-Yoshimura, Karen. 2018. "Metadata for Audio and Videos." Hanging Together: The OCLC Research Blog, 29 October 2018. https://hangingtogether.org/?p=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F

88. Library of Congress. "Standards." Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead

89. Library of Congress. "Standards." Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis

90. Weber, Chela Scott. 2019. "Assessing Needs of AV in Special Collections." Hanging Together: The OCLC Research Blog, 23 July 2019. https://hangingtogether.org/?p=7405

Weber, Chela Scott. 2019. "Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP." Hanging Together: The OCLC Research Blog, 1 October 2019. https://hangingtogether.org/?p=7479

91. Smith-Yoshimura, Karen. 2015. "Managing Metadata for Image Collections." Hanging Together: The OCLC Research Blog, 9 April 2015. https://hangingtogether.org/?p=5130

92. Library of Congress. "Standards." Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods

93. Ibid.

94. Library of Congress. "Standards." Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95. Smith-Yoshimura, Karen. 2016. "Sharing Digital Collections Workflows." Hanging Together: The OCLC Research Blog, 2 November 2016. https://hangingtogether.org/?p=5744

96. OCLC Research. 2020. "Europeana Innovation Pilots." Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World's Images. "Home." Accessed 20 September 2020. https://iiif.io

98. OCLC Research. 2020. "OCLC ResearchWorks IIIF Explorer." https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99. OCLC Research. 2020. "CONTENTdm Linked Data Pilot." Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

100. Smith-Yoshimura, Karen. 2015. "Data Management and Curation in 21st Century Archives – Part 1." Hanging Together: The OCLC Research Blog, 21 September 2015. http://hangingtogether.org/?p=5375

101. Smith-Yoshimura, Karen. 2016. "Metadata for Research Data Management." Hanging Together: The OCLC Research Blog, 18 April 2016. https://hangingtogether.org/?p=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html

104. Smith-Yoshimura, Karen. 2020. "Knowledge Management and Metadata." Hanging Together: The OCLC Research Blog, 9 April 2020. https://hangingtogether.org/?p=7845

105. Faniel, Ixchel M. 2019. "Let's Cook Up Some Metadata Consistency." Next (blog). OCLC, 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. "Home." Accessed 21 September 2020. http://nci.org.au

ADA (Australian Data Archive). "Home." Accessed 21 September 2020. https://www.ada.edu.au

107. Portage Network. "Home." Accessed 21 September 2020. https://portagenetwork.ca

108. Metadata 2020 is a "collaboration advocating richer, connected, reusable, open metadata for all research outputs" (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109. Digital Curation Centre. "Disciplinary Metadata." List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list

110. RDA Metadata Directory. "Metadata Standards Directory Working Group." GitHub repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor's specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. "CRediT – Contributor Roles Taxonomy." Accessed 21 September 2020. https://casrai.org/credit

112. University of Michigan Library. 2020. "Data Services." http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. "The Realities of Research Data Management." Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039

115. Indiana University. 2018. "IU Will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries." News at IU (Science and Technology). Indiana University, 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft. 2020. "Microsoft Academic Graph." Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph

For more details, watch the August 2019 recording of "Democratizing Access to Large Datasets through Shared Infrastructure." See Wittenberg, Jamie, and Valentin Pentchev. "Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure." Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116. NISO's Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See "Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website." n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging

117. Smith-Yoshimura, Karen. 2015. "Services Built on Usage Metrics." Hanging Together: The OCLC Research Blog, 30 September 2015. https://hangingtogether.org/?p=5430

118. Krista M. Soria, Jan Fransen, and Shane Nackerud. 2014. "Stacks, Serials, Search Engines, and Students' Success: First-Year Undergraduate Students' Library Use, Academic Achievement, and Retention." Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002

119. See Jisc. "Library Impact Data Project (LIDP)." Accessed 21 September 2020. http://www.activitydata.org/LIDP.html

120. Smith-Yoshimura, Karen. 2019. "Alternatives to Statistics for Measuring Success and Value of Cataloging." Hanging Together: The OCLC Research Blog, 15 April 2019. https://hangingtogether.org/?p=7122

121. DLF (Digital Library Federation). 2015. "Metadata Librarian, Cornell University Library." DLF (blog), 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. "Metadata Librarian." Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. "Locate Items in the Library with StackMap." https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. "Home." https://www.yewno.com

125. Smith-Yoshimura, Karen. 2019. "New Ways of Using and Enhancing Cataloging and Authority Records." Hanging Together: The OCLC Research Blog, 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). "Austlang National Codeathon." Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia 2020 HERE, Bing, Microsoft Corporation]

NLA. "Trove." Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company – Hachette UK. "River of Authors." Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. "Knowledge Organization Systems." Hanging Together: The OCLC Research Blog, 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. "Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review." International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). "About SNAC." What is SNAC? https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. "Knowledge Management and Metadata." Hanging Together: The OCLC Research Blog, 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM's mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. "About." Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. "Alternatives to Statistics for Measuring Success and Value of Cataloging." Hanging Together: The OCLC Research Blog, 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. "New Skill Sets for Metadata Management." Hanging Together: The OCLC Research Blog, 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. "MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation." Hanging Together: The OCLC Research Blog, 26 March 2018. https://hangingtogether.org/?p=6646


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data in MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs in Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009


  • Executive Summary
  • Introduction
  • The Transition to Linked Data and Identifiers
    • Expanding the use of persistent identifiers
    • Moving from “authority control” to “identity management”
    • Addressing the need for multiple vocabularies and equity, diversity, and inclusion
    • Linked data challenges
  • Describing “Inside-Out” and “Facilitated” Collections
    • Archival collections
    • Archived websites
    • Audio and video collections
    • Image collections
    • Research data
  • Evolution of “Metadata as a Service”
    • Metrics
    • Consultancy
    • New applications
    • Bibliometrics
    • Semantic indexing
  • Preparing for Future Staffing Requirements
    • The culture shift
    • Learning opportunities
    • New tools and skills
    • Self-education
    • Addressing staff turnover
  • Impact
  • Acknowledgments
  • Appendix
  • Notes
  • FIGURE 1. “Changing Resource Description Workflows” by OCLC Research
  • FIGURE 2. Some 300 abbreviated author names for a five-page article in Physical Review Letters
  • FIGURE 3. Examples of some DOI (left) and ARK (right) identifiers
  • FIGURE 4. One Wikidata identifier links to other identifiers and labels in different languages
  • FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership
  • FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections
  • FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections
  • FIGURE 8. Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon
  • FIGURE 9. UK Hachette’s “River of Authors” generated from the British Library’s catalog metadata


Focus Group members hope that artificial intelligence, or at least machine learning, could reduce the manual effort currently required to link names and concepts in research data. Perhaps algorithms could be used to match names based on related metadata or sources, relate topics to each other based on context, disambiguate names based on other available metadata, and analyze datasets to identify possible biases in a collection.[131] A few Research Library Partners participate in Artificial Intelligence for Libraries, Archives & Museums (AI4LAM),[132] an “international, participatory community focused on advancing the use of artificial intelligence in, for and by libraries, archives and museums.”[133] Some high-level recommendations on enhancing descriptions at scale and improving discovery are noted in Thomas Padilla’s 2019 OCLC Research position paper, Responsible Operations: Data Science, Machine Learning, and AI in Libraries.[134]

Preparing for Future Staffing Requirements

The anticipated changes from transitioning to the next generation of metadata will also shift staffing requirements. Focus Group members identified new skill sets needed by both professionals entering the field and seasoned catalogers, driven by the changing information technology landscape and increasing staff attrition. Focus Group members characterized professionals as those who “trail-blaze innovations,” which are then routinized for nonprofessionals. These discussions reinforce Padilla’s recommendations on investigating core competencies, committing to internal talent, and expanding evidence-based training.[135]

THE CULTURE SHIFT

Focus Group members reported a delicate balance between allocating staff to “traditional cataloging activities” (such as original and copy cataloging and authority work) and to more exploratory R&D projects, such as linked data projects, exploring new data models and technologies such as Wikidata, and learning about emerging standards and identifiers. A culture shift is needed, from pride in production alone to valuing opportunities to learn, explore, and try new approaches to metadata work. Metadata specialists must understand that improving all metadata is more important than any individual’s productivity numbers. This culture shift requires buy-in from administrators to support training programs for staff to learn new workflows for processing multiple formats and to view metadata specialists as more than just “production machines.”

Metadata managers faced with staff reductions while still being expected to maintain production levels must justify allocating staff time for R&D (or “play time”) to explore such questions as: What can we stop doing? What is the one thing you learned that we all need to do more of? What do you need to move forward? What open source software could help us do the work more efficiently? What new methods could enhance discoverability, access, and use of our facilitated collections? Managers must incorporate goals for success that are not based solely on numbers.[136]


Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff” while mandating that catalogers only do what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. Once these tasks are removed, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, change their mindsets about how they work, and stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, one such staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those who are learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.[137]

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult, as they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”


Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.[138] MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.[139] Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.[140] Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.[141]
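The lightweight clustering mentioned above can be illustrated with a key-collision approach, in which headings that normalize to the same key are grouped as likely variants of one another. The Python sketch below is only an assumption about how such clustering might work in general; it does not reproduce the implementation in MarcEdit or OpenRefine, and the function names `fingerprint` and `cluster` and the sample headings are hypothetical.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Normalize a heading into an order-insensitive, punctuation-free key."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase, replace punctuation with spaces, then sort the unique tokens.
    tokens = re.sub(r"[^\w\s]", " ", unaccented.lower()).split()
    return " ".join(sorted(set(tokens)))


def cluster(values):
    """Group values whose fingerprints collide; keep only groups with variants."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return [sorted(members) for members in groups.values() if len(members) > 1]


# Hypothetical sample headings, as they might appear in a batch of records.
headings = [
    "Reese, Terry",
    "Terry Reese",
    "reese, terry.",
    "Smith-Yoshimura, Karen",
    "Padilla, Thomas",
]
clusters = cluster(headings)
```

A cataloger would still review each cluster before merging, since a key collision only suggests that two headings may refer to the same entity.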

Managers want to focus less on specific schemas and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers’ needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, they often lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA’s New Members Round Table. Emphasizing that metadata encompasses much more than library cataloging (entity identification; descriptive standards used in various academic disciplines; describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, “We bring order out of a vacuum.”[142]

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Codecademy, Software Carpentry, and conferences such as Code4Lib and Mashcat.[143] W3C’s Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.[144] Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
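As an illustration of the kind of simple scripting involved in reconciling equivalents across registries, the following Python sketch merges pairwise “same as” assertions into sets of equivalent identifiers using a union-find structure. It is a minimal sketch under stated assumptions: the function name `group_equivalents` is hypothetical, and the identifiers are made-up placeholders that do not correspond to real registry entries.

```python
from collections import defaultdict


def group_equivalents(assertions):
    """Merge pairwise 'same as' assertions into sets of equivalent identifiers."""
    parent = {}

    def find(item):
        # Follow parent links to the representative, shortening the path as we go.
        parent.setdefault(item, item)
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for left, right in assertions:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    groups = defaultdict(set)
    for item in list(parent):
        groups[find(item)].add(item)
    return [sorted(members) for members in groups.values()]


# Hypothetical identifiers; none of these refer to real people or records.
same_as = [
    ("local:n001", "isni:EXAMPLE-A"),
    ("isni:EXAMPLE-A", "orcid:EXAMPLE-A"),
    ("local:n002", "wikidata:EXAMPLE-B"),
]
equivalents = group_equivalents(same_as)
```

Because equivalence is treated as transitive here, a single wrong assertion can merge two distinct identities, so the input pairs would need to come from sources the institution trusts.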

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,[145] a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: “Don’t wait, iterate.” In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars[146] to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage through developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.[147]

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations, such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,[148] launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities, as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.
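To make the contrast with record-based description concrete, the Python sketch below models an entity as a language-neutral identifier with “statements” attached by different contributors. It is only an illustrative data structure under stated assumptions: the class names, the property names, the `example.org` identifier, and the contributing institutions are all hypothetical and do not describe the data model of the Shared Entity Management Infrastructure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Statement:
    """One assertion about an entity, with the source that contributed it."""
    prop: str
    value: str
    source: str
    language: Optional[str] = None


@dataclass
class Entity:
    """A language-neutral identifier plus the statements attached to it."""
    identifier: str
    statements: List[Statement] = field(default_factory=list)

    def labels(self) -> Dict[str, str]:
        """Return display labels keyed by language code."""
        return {
            s.language: s.value
            for s in self.statements
            if s.prop == "label" and s.language is not None
        }

    def sources(self) -> List[str]:
        """List the distinct contributors, in order of first appearance."""
        return list(dict.fromkeys(s.source for s in self.statements))


# Hypothetical identifier and contributors, for illustration only.
tokyo = Entity(
    identifier="https://example.org/entity/E1",
    statements=[
        Statement("label", "Tokyo", source="Library A", language="en"),
        Statement("label", "東京", source="Library B", language="ja"),
        Statement("type", "place", source="Archive C"),
    ],
)
```

In this sketch the identifier stays the same regardless of display language, while each institution’s contribution remains attributable to its source.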


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership. Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe.


Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO,” 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is ORCID.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21 GeoNames ldquoBrowse the Namesrdquo Accessed 19 September 2020 httpswwwgeonamesorg

22 Bryant Rebecca Annette Dortmund and Constance Malpas 2017 Convenience and Compliance Case Studies on Persistent Identifiers in European Research Information Dublin OH OCLC Research httpsdoiorg1025333C32K7M

23 ISNI currently holds 1102 million identities 1026 million individuals (of which 291 million are researchers) and 933039 organizations Statistics retrieved from ISNI See ISNI ldquoKey Statisticsrdquo Accessed 5 May 2020 httpsisniorg

24 Library of Congress 2020 ldquoNACO ndash Name Authority Cooperative Programrdquo Documents and Updates Programs for Cataloging and Acquisitions (PCC) Accessed 19 September 2020 httpwwwlocgovabapccnacoindexhtml

25 Smith-Yoshimura Karen 2015 ldquoGetting identifiers Created for Legacy Namesrdquo Hanging Together The OCLC Research Blog 30 October 2015 httpshangingtogetherorgp=5463

26 Smith-Yoshimura Karen 2013 ldquoIrreconcilable Differences Name Authority Control amp Humanities Scholarshiprdquo Hanging Together The OCLC Research Blog 27 March 2013 httpshangingtogetherorgp=2621

27 Smith-Yoshimura Karen 2017 ldquoUse Cases for Local Identifiersrdquo Hanging Together The OCLC Research Blog 5 May 2017 httpshangingtogetherorgp=5938

28 OCLC Research 2020 ldquoRegistering Researchers in Authority Filesrdquo httpswwwoclcorgresearchthemesresearch-collectionsregistering-researchershtml

29 Smith-Yoshimura Karen Janifer Gatenby Grace Agnew Christopher Brown Kate Byrne Matt Carruthers Peter Fletcher Stephen Hearn Xiaoli Li Marina Muilwijk Chew Chiat Naun John Riemer Roderick Sadler Jing Wang Glen Wiley and Kayla Willey 2016 Addressing the Challenges with Organizational Identifiers and ISNI Dublin Ohio OCLC Research httpsdoiorg1025333C3FC9Q

30 Research Organization Registry (ROR) ldquoAboutrdquo httpsrororgabout

31 V M Abazov B Abbott B S Acharya M Adams T Adams J P Agnew G D Alexeev et al (2014) 2020 ldquoPrecision Measurement of the Top-Quark Mass in Lepton+jets Final Statesrdquo (Archived 24 February 2020) ArXivorg 150107912 httpsarxivorgpdf14051756

32 Smith-Yoshimura Karen 2017 ldquoHow Much Metadata Is Practicalrdquo Hanging Together The OCLC Research Blog 14 November 2017 httpshangingtogetherorgp=6328

33 University of Minnesota 2020 ldquoExpertsMinnesotardquo Find Profiles httpsexpertsumneduenpersons or

University of Illinois at Urbana-Champaign 2020 ldquoIllinois Expertsrdquo Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications.

35 Smith-Yoshimura Karen 2016 ldquoMetadata Reconciliationrdquo Hanging Together The OCLC Research Blog 28 September 2016 httpshangingtogetherorgp=5710

36 Carruthers Matt (2014) 2020 mcarruthersLCNAF-Named-Entity-Reconciliation GitHub Repository httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37 Deliot Corine Steven Folsom Myung-Ja Han Nancy Lorimer Terry Reese and Adam Schiff 2019 Formulating and Obtaining URIs A Guide to Commonly used Vocabularies and Reference Sources Library of Congress PCC Task Group on URIs in MARC httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

39 Smith-Yoshimura Karen 2015 ldquoPersistent Identifiers for Local Collectionsrdquo Hanging Together The OCLC Research Blog 27 October 2015 httpshangingtogetherorgp=5445

40 See DOI examples in detail from DOI 2020 ldquoDOI System Examplesrdquo Accessed 20 September 2020 httpswwwdoiorgdemoshtml and

See ARK examples in detail from Department Dallas (Tex ) Police 1963 ldquo[Photographs of Identification Cards]rdquo Collection University of North Texas The Portal to Texas History digital repository httpstexashistoryunteduark67531metapth346793

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html.

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois.

42 ldquoIdentity managementrdquo here reflects its usage among metadata specialists (See for example Library of Congress 2018 ldquoCharge for PCC Task Group on Identity Management in NACOrdquo 5 American Bar Association Program for Cooperative Cataloging Revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience for example identity access management as described in

Wikiwand ldquoIdentity Managementrdquo httpswwwwikiwandcomenIdentity_management

43 Smith-Yoshimura Karen 2018 ldquoThe Coverage of Identity Management Workrdquo Hanging Together The OCLC Research Blog 8 October 2018 httpshangingtogetherorgp=6805

44 Smith-Yoshimura Karen 2017 ldquoBeyond the Authorized Access Point Hanging Together The OCLC Research Blog 10 October 2017 httpshangingtogetherorgp=6271

45 Smith-Yoshimura ldquoCoverage of Identity Managementrdquo (See note 43)

46 Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez 2018 ldquoWorks in Progress Webinar Introduction to Wikidata for Librarians Structuring Wikipedia and Beyondrdquo Produced by OCLC Research 12 June 2018 MP4 video presentation 1151 httpswwwoclcorgresearchevents201806-12html

47 Smith-Yoshimura Karen 2020 ldquoExperimentations with WikidataWikibase Hanging Together The OCLC Research Blog 18 June 2020 httpshangingtogetherorgp=8002

48 Wikimedia ldquoWikiCiterdquo Home httpsmetawikimediaorgwikiWikiCite

49 Smith-Yoshimura Karen 2016 ldquoImpact of Identifiers on Authority Workflows Hanging Together The OCLC Research Blog 22 March 2016 httpshangingtogetherorgp=5603

50 Smith-Yoshimura Karen 2019 ldquoStrategies for Alternate Subject Headings and Maintaining Subject Headings Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

51 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo httpswwwoclcorgenfasthtml

52 Smith-Yoshimura Karen 2016 ldquoFaceted Vocabulariesrdquo Hanging Together The OCLC Research Blog 31 October 2016 httpshangingtogetherorgp=5739

53 OCLC 2020 ldquoFASTrdquo (See note 51)

54 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo Heading 3 FAST Policy and Outreach (FPOC) Committee httpswwwoclcorgenfasthtml

55 Smith-Yoshimura Karen 2017 ldquoVocabulary Control Data in Discovery Environmentsrdquo Hanging Together The OCLC Research Blog 5 October 2017 httpshangingtogetherorgp=6264

56 National Library New Zealand Government ldquoNgā Upoko Tukutuku Māori Subject Headingsrdquo httpmshupokonatlibgovtnzmshupoko

AIATSIS Pathways Gateway to the AIATSIS Thesauri ldquoPathwaysrdquo httpwww1aiatsisgovau

57 Deutsche Nationalbibliothek 2019 ldquoMACS - Multilingual Access to Subjectsrdquo (Archived 13 Jan 2019) httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

59 Synaptica ldquoOntology Management ndash Graphiterdquo httpswwwsynapticacomgraphite

60 Smith-Yoshimura Karen 2018 ldquoAre Distributed Models for Vocabulary Maintenance Viablerdquo Hanging Together The OCLC Research Blog 12 April 2018 httpshangingtogetherorgp=6672

61 OCLC Research 2020 ldquoEquity Diversity and Inclusion in the OCLC Research Library Partnership Surveyrdquo Overview Accessed 20 September 2020 httpswwwoclcorg researchareascommunity-catalystsrlp-edihtml

62 Smith-Yoshimura Karen 2018 ldquoCreating Metadata for Equity Diversity and Inclusionrdquo Hanging Together The OCLC Research Blog 7 November 2018 httpshangingtogetherorgp=6833

63 Smith-Yoshimura ldquoDistributed Modelsrdquo (See note 60)

64 Smith-Yoshimura Karen 2019 ldquoStrategies for Alternate Subject Headings and Maintaining Subject Headingsrdquo Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

65 Baxmeyer Jennifer Karen Coyle Joanna Dyla MJ Han Steven Folsom Phil Schreur and Tim Thompson 2017 Linked Data Infrastructure Models Areas of Focus for PCC Strategies Library of Congress PCC Linked Data Advisory Committee httpswwwlocgovabapcc documentsLinkedDataInfrastructureModelspdf

66 Bone Christine Sharon Farnel Sheila Laroque and Brett Lougheed 2017 ldquoWorks in Progress Webinar Decolonizing Descriptions Finding Naming and Changing the Relationship between Indigenous People Libraries and Archives ldquo Produced by OCLC Research 19 October 2017 MP4 video presentation 543500 httpswwwoclcorg researchevents201710-19html

67 Smith-Yoshimura Karen 2015 ldquoShift to Linked Data for Productionrdquo Hanging Together The OCLC Research Blog 13 May 2015 httpshangingtogetherorgp=5195

68 Smith-Yoshimura Karen 2015 ldquoWorking in Shared Filesrdquo Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

69 Bruce Washburn and Jeff Mixter 2018 ldquoWorks in Progress Webinar Looking Inside the Library Knowledge Vaultrdquo Produced by OCLC Research 12 August 2018 MP4 video presentation 574500 httpswwwoclcorgresearchevents201508-12html

70 Smith-Yoshimura Karen 2019 Systematic Reviews of Our Metadata Hanging Together The OCLC Research Blog 10 April 2019 httpshangingtogetherorgp=7117

71 Smith-Yoshimura Karen 2015 ldquoWorking in Shared File rdquoHanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

72 Jisc Library Services nd ldquoWhat Is lsquoPlan Mrsquordquo Accessed 21 September 2020 httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

To learn more about the current phase of ldquoPlan Mrdquo (MayndashNovember 2020) see Grindley Neil ldquoMoving Plan M Forwards ndash We Need Your Helprdquo Library Services (PlanM) (blog) Jisc 6 May 2020 httpslibraryservicesjiscinvolveorgwp202005planm_nextphase

73. Grindley, Neil. 2019. “Plan M: Definition, Principles and Direction.” Jisc (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74 Dempsey Lorcan 2016 ldquoLibrary Collections in the Life of the User Two Directionsrdquo LIBER Quarterly 26(4) 338ndash359 httpdoiorg1018352lq10170

75 Smith-Yoshimura Karen 2019 ldquoPresenting Metadata from Different Sources in Discovery Layers Hanging Together The OCLC Research Blog 16 April 2019 httpshangingtogetherorgp=7880

76 Smith-Yoshimura Karen 2017 ldquoMetadata for Archival Collectionsrdquo Hanging Together The OCLC Research Blog 30 May 2017 httpshangingtogetherorgp=5903

77 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Knudson Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Craig Thomas and Holly Tomren 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage 49-51 Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

78 The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at httpswwwoclcorgresearchpartnershipworking-groups archives-special-collections-linked-data-reviewhtml

79 Smith-Yoshimura Karen 2020 ldquoMetadata Management in Times of Uncertaintyrdquo Hanging Together The OCLC Research Blog 15 June 2020 httpshangingtogetherorgp=7998

80 Smith-Yoshimura Karen 2016 ldquoMetadata for Archived Websitesrdquo Hanging Together The OCLC Research Blog 14 March 2016 httpshangingtogetherorgp=5591

81 Archive-It 2008 ldquoHuman Rightsrdquo Columbia University Libraries Collection (Archived May 2008) httpsarchive-itorgcollections1068

Archive-It 2010 ldquoNew York City Places and Spacesrdquo Columbia University Libraries Collection (Archived January 2010) httpsarchive-itorgcollections1757

Archive-It 2010 ldquoBurke Library New York City Religionsrdquo Columbia University Libraries Collection (Archived May 2010) httpsarchive-itorgcollections1945

82 NLA ldquoTroverdquo Archived Websites Sub Collections Accessed 20 September 2020 httpstrovenlagovauwebsite

83 Archive-It 2014 ldquoCollaborative Architecture Urbanism and Sustainability Web Archive (CAUSEWAY)rdquo Ivy Plus Libraries Confederation Collection (Archived June 2014) httpsarchive-itorgcollections4638

Archive-It 2013 ldquoContemporary Composers Web Archive (CCWA)rdquo Ivy Plus Libraries Confederation Collection (Archived October 2013) httpsarchive-itorgcollections4019

NYARC New York Art Resources Consortium ldquoWeb Archivingrdquo httpwwwnyarcorg contentweb-archiving

84 OCLC Research 2020 ldquoWeb Archiving Metadata Working Grouprdquo The Problem Addressing the Problem Outputs httpswwwoclcorgresearchthemesresearch -collectionswamhtml

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 ldquoMetadata for Audio and Videosrdquo Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress ldquoStandardsrdquo Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress ldquoStandardsrdquo Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 ldquoAssessing Needs of AV in Special Collectionsrdquo Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 ldquoScale amp Risk Discussing Challenges to Managing AV Collections in the RLPrdquo Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 ldquoManaging Metadata for Image Collectionsrdquo Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress ldquoStandardsrdquo Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94 Library of Congress ldquoStandardsrdquo Metadata Authority Description Schema (MADS)rdquo Accessed 21 September 2020 httpwwwlocgovstandardsmads

95 Smith-Yoshimura Karen 2016 ldquoSharing Digital Collections Workflowsrdquo Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 ldquoEuropeana Innovation Pilotsrdquo Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the Worldrsquos Images ldquoHomerdquo Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 ldquoOCLC ResearchWorks IIIF Explorerrdquo httpswwwoclcorgresearch themesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 ldquoCONTENTdm Linked Data Pilotrdquo Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

100 Smith-Yoshimura Karen 2015 ldquoData Management and Curation in 21st Century Archives ndash Part 1rdquo 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 ldquoMetadata for Research Data Managementrdquo Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106 NCI (National Computational Infrastructure) Australia ldquoHomerdquo Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) ldquoHomerdquo Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network ldquoHomerdquo Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a ldquocollaboration advocating richer connected reusable open metadata for all research outputsrdquo (httpwwwmetadata2020org) The Metadata 2020 Researcher Communications project is outlined here httpwwwmetadata2020orgprojects researcher-communications

109 Digital Curation Centre ldquoDisciplinary Metadatardquo List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory ldquoMetadata Standards Directory Working Grouprdquo GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy)mdashwhich identifies 14 roles describing each contributorrsquos specific contribution to the scholarly outputmdasha standard CRediT was developed by CASRAI the Consortia Advancing Standards in Research Administration Information See CASRAI ldquoCRediT ndash Contributor Roles Taxonomyrdquo Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 ldquoData Servicesrdquo httpwwwlibumicheduresearch -data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.

114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research. 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116 NISOrsquos Reproducibility Badging and Definitions now out for public comment may also help researchers extend the benefit of their research to others See ldquoTaxonomy Definitions and Recognition Badging Scheme Working Group | NISO Websiterdquo nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 ldquoServices Built on Usage Metricsrdquo Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 ldquoStacks Serials Search Engines and Studentsrsquo Success First-Year Undergraduate Studentsrsquo Library Use Academic Achievement and Retentionrdquo Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc ldquoLibrary Impact Data Project (LIDP)rdquo Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 ldquoMetadata Librarian Cornell University Libraryrdquo DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 ldquoMetadata Librarianrdquo Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-library metadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 ldquoLocate Items in the Library with StackMaprdquo httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 ldquoHomerdquo httpswwwyewnocom

125 Smith-Yoshimura Karen 2019 ldquoNew Ways of Using and Enhancing Cataloging and Authority Recordsrdquo Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) ldquoAustlang National Codeathonrdquo Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA ldquoTroverdquo Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company ndash Hachette UK ldquoRiver of Authorsrdquo Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 ldquoKnowledge Organization Systemsrdquo Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 ldquoKnowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Reviewrdquo International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) ldquoAbout SNACrdquo What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives amp Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAMrsquos mission is to organize share and elevate knowledge about and use of artificial intelligence by libraries archives and museums It was founded in 2018 inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs

See AI4LAM ldquoAboutrdquo Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395
T: 1-800-848-5878 T: +1-614-764-6000 F: +1-614-764-6096
www.oclc.org/research

ISBN: 978-1-55653-170-5 DOI: 10.25333/rqgd-b343 RM-PR-216787-WWAE-A4 2009

Indications of this culture shift include institutions outsourcing some metadata work or training support staff to create metadata for the “easier stuff,” while mandating that catalogers do only what well-trained humans can do. Metadata managers could scope the materials requiring metadata that support staff or students can handle, providing templates where possible. If you remove these tasks, the majority of what remains requires highly skilled metadata specialists with expertise in languages, physical formats, and disambiguating and describing persons, organizations, and other entities.

LEARNING OPPORTUNITIES

To encourage the culture shift among metadata specialists, to change their mindsets about how they work, and to stimulate interest in learning opportunities, Focus Group members have used several approaches:

• Identify who on your team has the aptitude to acquire new skills. At one institution, the staff member shared what she learned, and the whole unit became “lively” because she brought her colleagues along. It created appreciation for “continuous learning,” and staff presented their activities at national conferences.

• Convene cross-team group discussions to look at problem metadata and come up with solutions, encouraging staff to move forward together. Staff less interested in new skills can pick up some of the production from those learning new skills and producing less.

• Launch “reading clubs,” where staff all read an article and respond to three discussion questions, to inspire metadata specialists to think about broader metadata issues outside of their daily work.

• Hold weekly group “video-viewing brown-bag lunches” for staff on new developments such as linked data, so staff can “watch and learn” together.

• Participate in multi-institutional projects to collaborate with peers to solve problems and cross-pollinate ideas.

• Encourage participation in professional conferences and standards development.

Educating and training catalogers has been at the forefront of many discussions in the metadata community. Both new professionals and seasoned catalogers need new skills to successfully transition to the emerging linked data environment. Catalogers are learning about and experimenting with BIBFRAME while remaining responsible for traditional bibliographic control of collections. Metadata specialists utilize tools for metadata mapping, remediation, and enhancement. They identify and map semantic relationships among assorted taxonomies to make multiple thesauri intelligible to end users. For the more technical aspects of metadata management, competition for talent from other industries has been increasing. This may intensify as metadata becomes more central to various areas of government, nonprofit, and private enterprise.¹³⁷

NEW TOOLS AND SKILLS

The extent of metadata specialists’ collaboration with IT or systems staff varies among institutions. Such collaboration is necessary for many reasons, including managing data that is outside the library’s control. Some noted that “cultural differences” exist between the professions: developers tend to be more dynamic and focus on quick prototyping and iteration, while librarians focus first on documenting what is needed and are more “schematic.” Which is more likely to be successful: teaching metadata specialists IT skills or teaching IT staff metadata principles? The “holy grail” is to recruit someone with an IT background interested in metadata services. Retaining staff with IT skills is difficult because they are in demand for higher-paying jobs in the private sector. Focus Group members’ experiences have shown that it is easier for librarians to learn programming skills than it is to hire IT specialists to learn the “technical services mindset.”

Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.¹³⁸ MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.¹³⁹ Terry Reese, MarcEdit’s developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.¹⁴⁰ Reese has created a series of YouTube tutorials available on his MarcEdit Playlist.¹⁴¹
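To make the idea of lightweight clustering concrete, the following is a minimal Python sketch of one generic technique, key-collision clustering, in which values that normalize to the same key are grouped as likely variants. It is an illustration only and is not MarcEdit's or OpenRefine's actual implementation; the function names `fingerprint` and `cluster` and the sample headings are assumptions introduced here.

```python
import re
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key: fold accents and case, drop punctuation,
    then sort the unique tokens so word order does not matter."""
    decomposed = unicodedata.normalize("NFKD", value)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    tokens = re.sub(r"[^\w\s]", " ", ascii_only.lower()).split()
    return " ".join(sorted(set(tokens)))


def cluster(values):
    """Group values that share a fingerprint; keep only groups that
    contain more than one distinct spelling."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [variants for variants in groups.values() if len(set(variants)) > 1]


if __name__ == "__main__":
    # Hypothetical headings as they might appear in a batch of records.
    headings = [
        "Smith-Yoshimura, Karen",
        "Karen Smith-Yoshimura",
        "smith-yoshimura, karen.",
        "Reese, Terry",
    ]
    for variants in cluster(headings):
        print(variants)
```

A key-collision pass like this is cheap and easy to explain to staff new to scripting, but it will also merge different people who share a name, so any cluster it proposes still needs review by a metadata specialist before records are changed.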

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers' needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA's New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging (entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic Web) can increase its appeal. As one Focus Group member noted, "We bring order out of a vacuum."142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (which acquired Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Code Academy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C's Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
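As one illustration of the kind of simple scripting mentioned above, the sketch below shows how "same entity" assertions gathered from several registries might be merged into equivalence sets using a union-find structure. It is a sketch under stated assumptions, not a description of any Focus Group member's workflow: the class name, the identifier strings, and the pairings are all hypothetical placeholders.

```python
from collections import defaultdict


class IdentifierClusters:
    """Union-find over identifier strings; linked identifiers end up in one set."""

    def __init__(self):
        self.parent = {}

    def find(self, ident):
        """Return the representative identifier for the set containing ident."""
        self.parent.setdefault(ident, ident)
        # Walk up to the root, shortening the path along the way.
        while self.parent[ident] != ident:
            self.parent[ident] = self.parent[self.parent[ident]]
            ident = self.parent[ident]
        return ident

    def link(self, a, b):
        """Record that identifiers a and b refer to the same entity."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def groups(self):
        """Return each equivalence set as a sorted list of identifiers."""
        grouped = defaultdict(set)
        for ident in list(self.parent):
            grouped[self.find(ident)].add(ident)
        return [sorted(members) for members in grouped.values()]


# Hypothetical equivalence assertions; every identifier here is a placeholder.
same_as = [
    ("local:person-17", "isni:EXAMPLE-A"),
    ("isni:EXAMPLE-A", "orcid:EXAMPLE-A"),
    ("local:person-42", "viaf:EXAMPLE-B"),
]

registry = IdentifierClusters()
for left, right in same_as:
    registry.link(left, right)

merged = sorted(registry.groups())
for members in merged:
    print(members)
```

Because equivalence is transitive, a local identifier linked to one registry and that registry's identifier linked to another end up in the same set even though no single source asserted the full chain.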

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: "Don't wait; iterate." In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC's DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of "traditional" work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers "do more with less."

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond "traditional" cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.

I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution's collections. Focus Group members' linked data activities, including their participation in OCLC Research's Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research's linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for "works."

Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as "statements" associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of "metadata as a service," and influencing future staffing requirements.

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. httpswwwoclcorgresearchareasdata-sciencemetadata-managershtml

2. OCLC Research. "The OCLC Research Library Partnership." httpswwwoclcorgresearchpartnershiphtml

3. Smith-Yoshimura, Karen. 2017. "Metadata Advocacy." Hanging Together: The OCLC Research Blog. 17 October 2017. httpshangingtogetherorgp=6282

4. British Library. 2019. Foundations for the Future: The British Library's Collection Metadata Strategy 2019-2023. London: British Library. httpswwwblukbibliographicpdfsbritish-library-collection-metadata-strategy-2019-2023pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. "Program for Cooperative Cataloging." httpswwwlocgovabapcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. httpshangingtogetherorgcat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. "What Metadata Managers Expect from and Value about the Research Library Partnership." Hanging Together: The OCLC Research Blog. 16 April 2018. httpshangingtogetherorgp=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. "Linked Data." International Linked Data Survey. httpswwwoclcorgresearchthemesdata-sciencelinkeddatalinked-data-surveyhtml

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. httpsdoiorg1025333faq3-ax08

OCLC Research. 2020. "CONTENTdm Linked Data pilot." httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

OCLC. 2020. "WorldCat® OCLC and Linked Data." Shared Entity Management Infrastructure. httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

Library of Congress. "BIBFRAME." Bibliographic Framework Initiative. httpswwwlocgovbibframe

Futornick, Michelle. 2019. "LD4P2 Linked Data for Production: Pathway to Implementation." LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. httpswikilyrasisorgdisplayLD4P2LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). "An Effective Environment for the Use of Linked Data by Libraries." Accessed 17 September 2019. httpswwwshare-vdeorgsharevdeclustersl=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. httpshdlhandlenet181356343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee, 12 September 2019. Washington, DC: Library of Congress. httpswwwlocgovabapcctaskgrouplinked-data-best-practices-final-reportpdf

Library of Congress. 2018. "Charge for PCC Task Group on Identity Management in NACO." 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf

Library of Congress. 2020. "PCC Task Group on URIs in MARC." Programs of the PCC: Charge. Accessed 19 September 2020. httpswwwlocgovabapccbibframeTaskGroupsURI-TaskGrouphtml

Library of Congress. 2018. "PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge." PCC Task Groups, 2018 Task Groups. Revised 24 July 2018. [Word doc: 28KB]. httpswwwlocgovabapcctaskgrouptask-groupshtml

14. Smith-Yoshimura, Karen. 2015. "Shift to Linked Data for Production." Hanging Together: The OCLC Research Blog. 13 May 2015. httpshangingtogetherorgp=5195

15. OCLC Research. 2020. "Linked Data." Linked Data Overview. httpswwwoclcorgresearchareasdata-sciencelinkeddatalinked-data-overviewhtml [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. "'Future Proofing' of Cataloging." Hanging Together: The OCLC Research Blog. 10 November 2019. httpshangingtogetherorgp=7526

17. ORCID: Connecting Research and Researchers. "What is Orcid." Our Vision. Accessed 19 September 2020. httpsorcidorgaboutwhat-is-orcidmission

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. "ORCID Open Letter - Publishers." Accessed 19 September 2020. httpsorcidorgcontentrequiring-orcid-publication-workflows-open-letter

19. ISNI. "What is ISNI." Accessed 19 September 2020. httpsisniorgpagewhat-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. "Welcome to HathiTrust." Accessed 19 September 2020. httpswwwhathitrustorgabout

21. GeoNames. "Browse the Names." Accessed 19 September 2020. httpswwwgeonamesorg

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. httpsdoiorg1025333C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. "Key Statistics." Accessed 5 May 2020. httpsisniorg

24. Library of Congress. 2020. "NACO – Name Authority Cooperative Program." Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. httpwwwlocgovabapccnacoindexhtml

25. Smith-Yoshimura, Karen. 2015. "Getting Identifiers Created for Legacy Names." Hanging Together: The OCLC Research Blog. 30 October 2015. httpshangingtogetherorgp=5463

26. Smith-Yoshimura, Karen. 2013. "Irreconcilable Differences? Name Authority Control & Humanities Scholarship." Hanging Together: The OCLC Research Blog. 27 March 2013. httpshangingtogetherorgp=2621

27. Smith-Yoshimura, Karen. 2017. "Use Cases for Local Identifiers." Hanging Together: The OCLC Research Blog. 5 May 2017. httpshangingtogetherorgp=5938

28. OCLC Research. 2020. "Registering Researchers in Authority Files." httpswwwoclcorgresearchthemesresearch-collectionsregistering-researchershtml

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. httpsdoiorg1025333C3FC9Q

30. Research Organization Registry (ROR). "About." httpsrororgabout

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. "Precision Measurement of the Top-Quark Mass in Lepton+jets Final States." (Archived 24 February 2020). ArXiv.org. 150107912. httpsarxivorgpdf14051756

32. Smith-Yoshimura, Karen. 2017. "How Much Metadata Is Practical?" Hanging Together: The OCLC Research Blog. 14 November 2017. httpshangingtogetherorgp=6328

33. University of Minnesota. 2020. "Experts@Minnesota." Find Profiles. httpsexpertsumneduenpersons or

University of Illinois at Urbana-Champaign. 2020. "Illinois Experts." Find U of I Research, View Scholarly Works, and Discover New Collaborators. httpsexpertsillinoisedu

34. The National Institute of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. "ORCID iD Required for Some, Encouraged for All." NIAID Funding News. Last reviewed 7 August 2019. httpswwwniaidnihgovgrants-contractsorcid-id-required-some-encouraged-all

See also Lyrasis. 2020. "SciENcv and ORCID to Streamline NIH and NSF Grant Applications." LyrasisNow (blog). 8 April 2020. httpslyrasisnoworgsciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. "Metadata Reconciliation." Hanging Together: The OCLC Research Blog. 28 September 2016. httpshangingtogetherorgp=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38. Smith-Yoshimura, Karen. 2019. "New Ways of Using and Enhancing Cataloging and Authority Records." Hanging Together: The OCLC Research Blog. 2 April 2019. httpshangingtogetherorgp=5710

39. Smith-Yoshimura, Karen. 2015. "Persistent Identifiers for Local Collections." Hanging Together: The OCLC Research Blog. 27 October 2015. httpshangingtogetherorgp=5445

40. See DOI examples in detail from DOI. 2020. "DOI System Examples." Accessed 20 September 2020. httpswwwdoiorgdemoshtml and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. "[Photographs of Identification Cards]." Collection. University of North Texas, The Portal to Texas History digital repository. httpstexashistoryunteduark67531metapth346793

41. DataCite. "Assign DOIs." httpsdataciteorgdoishtml

Wilkinson, Laura J. 2020. "Constructing your DOIs." Crossref, The Crossref Curriculum. Last updated 8 April 2020. httpswwwcrossreforgeducationmember-setupconstructing-your-dois

42. "Identity management" here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. "Charge for PCC Task Group on Identity Management in NACO." 5. American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience, for example, identity access management as described in Wikiwand. "Identity Management." httpswwwwikiwandcomenIdentity_management

43. Smith-Yoshimura, Karen. 2018. "The Coverage of Identity Management Work." Hanging Together: The OCLC Research Blog. 8 October 2018. httpshangingtogetherorgp=6805

44. Smith-Yoshimura, Karen. 2017. "Beyond the Authorized Access Point." Hanging Together: The OCLC Research Blog. 10 October 2017. httpshangingtogetherorgp=6271

45. Smith-Yoshimura, "Coverage of Identity Management." (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. "Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond." Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. httpswwwoclcorgresearchevents201806-12html

47. Smith-Yoshimura, Karen. 2020. "Experimentations with Wikidata/Wikibase." Hanging Together: The OCLC Research Blog. 18 June 2020. httpshangingtogetherorgp=8002

48. Wikimedia. "WikiCite." Home. httpsmetawikimediaorgwikiWikiCite

49. Smith-Yoshimura, Karen. 2016. "Impact of Identifiers on Authority Workflows." Hanging Together: The OCLC Research Blog. 22 March 2016. httpshangingtogetherorgp=5603

50. Smith-Yoshimura, Karen. 2019. "Strategies for Alternate Subject Headings and Maintaining Subject Headings." Hanging Together: The OCLC Research Blog. 29 October 2019. httpshangingtogetherorgp=7591

51. OCLC. 2020. "FAST (Faceted Application of Subject Terminology)." httpswwwoclcorgenfasthtml

52. Smith-Yoshimura, Karen. 2016. "Faceted Vocabularies." Hanging Together: The OCLC Research Blog. 31 October 2016. httpshangingtogetherorgp=5739

53. OCLC. 2020. "FAST." (See note 51.)

54. OCLC. 2020. "FAST (Faceted Application of Subject Terminology)." Heading 3: FAST Policy and Outreach (FPOC) Committee. httpswwwoclcorgenfasthtml

55. Smith-Yoshimura, Karen. 2017. "Vocabulary Control Data in Discovery Environments." Hanging Together: The OCLC Research Blog. 5 October 2017. httpshangingtogetherorgp=6264

56. National Library, New Zealand Government. "Ngā Upoko Tukutuku Māori Subject Headings." httpmshupokonatlibgovtnzmshupoko

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. "Pathways." httpwww1aiatsisgovau

57. Deutsche Nationalbibliothek. 2019. "MACS - Multilingual Access to Subjects." (Archived 13 Jan 2019). httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58. Smith-Yoshimura, Karen. 2019. "Knowledge Organization Systems." Hanging Together: The OCLC Research Blog. 17 March 2019. httpshangingtogetherorgp=7135

59. Synaptica. "Ontology Management – Graphite." httpswwwsynapticacomgraphite

60. Smith-Yoshimura, Karen. 2018. "Are Distributed Models for Vocabulary Maintenance Viable?" Hanging Together: The OCLC Research Blog. 12 April 2018. httpshangingtogetherorgp=6672

61. OCLC Research. 2020. "Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey." Overview. Accessed 20 September 2020. httpswwwoclcorgresearchareascommunity-catalystsrlp-edihtml

62. Smith-Yoshimura, Karen. 2018. "Creating Metadata for Equity, Diversity, and Inclusion." Hanging Together: The OCLC Research Blog. 7 November 2018. httpshangingtogetherorgp=6833

63. Smith-Yoshimura, "Distributed Models." (See note 60.)

64. Smith-Yoshimura, Karen. 2019. "Strategies for Alternate Subject Headings and Maintaining Subject Headings." Hanging Together: The OCLC Research Blog. 29 October 2019. httpshangingtogetherorgp=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. httpswwwlocgovabapccdocumentsLinkedDataInfrastructureModelspdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. "Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries, and Archives." Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. httpswwwoclcorgresearchevents201710-19html

67. Smith-Yoshimura, Karen. 2015. "Shift to Linked Data for Production." Hanging Together: The OCLC Research Blog. 13 May 2015. httpshangingtogetherorgp=5195

68. Smith-Yoshimura, Karen. 2015. "Working in Shared Files." Hanging Together: The OCLC Research Blog. 7 April 2015. httpshangingtogetherorgp=5091

69. Bruce Washburn and Jeff Mixter. 2018. "Works in Progress Webinar: Looking Inside the Library Knowledge Vault." Produced by OCLC Research, 12 August 2018. MP4 video presentation, 574500. httpswwwoclcorgresearchevents201508-12html

70. Smith-Yoshimura, Karen. 2019. "Systematic Reviews of Our Metadata." Hanging Together: The OCLC Research Blog. 10 April 2019. httpshangingtogetherorgp=7117

71. Smith-Yoshimura, Karen. 2015. "Working in Shared Files." Hanging Together: The OCLC Research Blog. 7 April 2015. httpshangingtogetherorgp=5091

72. Jisc Library Services. n.d. "What Is 'Plan M'." Accessed 21 September 2020. httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura, Karen. 2020. "Knowledge Management and Metadata." Hanging Together: The OCLC Research Blog. 9 April 2020. httpshangingtogetherorgp=7845

To learn more about the current phase of "Plan M" (May–November 2020), see Grindley, Neil. "Moving Plan M Forwards – We Need Your Help." Library Services (PlanM) (blog). Jisc. 6 May 2020. httpslibraryservicesjiscinvolveorgwp202005planm_nextphase

73. Grindley, Neil. 2019. "Plan M: Definition, Principles, and Direction." Jisc. (Word docx). httplibraryservicesjiscinvolveorgwpfiles201912Plan-M-Definition-and-Direction-1docx

74. Dempsey, Lorcan. 2016. "Library Collections in the Life of the User: Two Directions." LIBER Quarterly 26(4): 338–359. httpdoiorg1018352lq10170

75. Smith-Yoshimura, Karen. 2019. "Presenting Metadata from Different Sources in Discovery Layers." Hanging Together: The OCLC Research Blog. 16 April 2019. httpshangingtogetherorgp=7880

76. Smith-Yoshimura, Karen. 2017. "Metadata for Archival Collections." Hanging Together: The OCLC Research Blog. 30 May 2017. httpshangingtogetherorgp=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. httpsdoiorg1025333faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at httpswwwoclcorgresearchpartnershipworking-groupsarchives-special-collections-linked-data-reviewhtml

79. Smith-Yoshimura, Karen. 2020. "Metadata Management in Times of Uncertainty." Hanging Together: The OCLC Research Blog. 15 June 2020. httpshangingtogetherorgp=7998

80. Smith-Yoshimura, Karen. 2016. "Metadata for Archived Websites." Hanging Together: The OCLC Research Blog. 14 March 2016. httpshangingtogetherorgp=5591

81. Archive-It. 2008. "Human Rights." Columbia University Libraries Collection. (Archived May 2008). httpsarchive-itorgcollections1068

Archive-It. 2010. "New York City Places and Spaces." Columbia University Libraries Collection. (Archived January 2010). httpsarchive-itorgcollections1757

Archive-It. 2010. "Burke Library New York City Religions." Columbia University Libraries Collection. (Archived May 2010). httpsarchive-itorgcollections1945

82. NLA. "Trove." Archived Websites, Sub Collections. Accessed 20 September 2020. httpstrovenlagovauwebsite

83. Archive-It. 2014. "Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY)." Ivy Plus Libraries Confederation Collection. (Archived June 2014). httpsarchive-itorgcollections4638

Archive-It. 2013. "Contemporary Composers Web Archive (CCWA)." Ivy Plus Libraries Confederation Collection. (Archived October 2013). httpsarchive-itorgcollections4019

NYARC: New York Art Resources Consortium. "Web Archiving." httpwwwnyarcorgcontentweb-archiving

84. OCLC Research. 2020. "Web Archiving Metadata Working Group." The Problem, Addressing the Problem, Outputs. httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. httpsdoiorg1025333C3005C

86. Smith-Yoshimura, Karen. 2018. "Metadata for Audio and Videos." Hanging Together: The OCLC Research Blog. 29 October 2018. httpshangingtogetherorgp=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. httpsdoiorg1025333C3C34F

88. Library of Congress. "Standards." Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. httpswwwlocgovead

89. Library of Congress. "Standards." Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. httpswwwlocgovstandardspremis

90. Weber, Chela Scott. 2019. "Assessing Needs of AV in Special Collections." Hanging Together: The OCLC Research Blog. 23 July 2019. httpshangingtogetherorgp=7405

Weber, Chela Scott. 2019. "Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP." Hanging Together: The OCLC Research Blog. 1 October 2019. httpshangingtogetherorgp=7479

91. Smith-Yoshimura, Karen. 2015. "Managing Metadata for Image Collections." Hanging Together: The OCLC Research Blog. 9 April 2015. httpshangingtogetherorgp=5130

92. Library of Congress. "Standards." Metadata Object Description Schema (MODS). Accessed 21 September 2020. httpwwwlocgovstandardsmods

93. Ibid.

94. Library of Congress. "Standards." Metadata Authority Description Schema (MADS). Accessed 21 September 2020. httpwwwlocgovstandardsmads

95. Smith-Yoshimura, Karen. 2016. "Sharing Digital Collections Workflows." Hanging Together: The OCLC Research Blog. 2 November 2016. httpshangingtogetherorgp=5744

96. OCLC Research. 2020. "Europeana Innovation Pilots." Accessed 20 September 2020. httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World's Images. "Home." Accessed 20 September 2020. httpsiiifio

98. OCLC Research. 2020. "OCLC ResearchWorks IIIF Explorer." httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99. OCLC Research. 2020. "CONTENTdm Linked Data Pilot." Introduction. httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml


100. Smith-Yoshimura, Karen. 2015. "Data Management and Curation in 21st Century Archives – Part 1." 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. "Metadata for Research Data Management." Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. "Knowledge Management and Metadata." Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. "Let's Cook Up Some Metadata Consistency." Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. "Home." Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). "Home." Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. "Home." Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a "collaboration advocating richer, connected, reusable, open metadata for all research outputs" (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. "Disciplinary Metadata." List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. "Metadata Standards Directory Working Group." GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make CRediT (Contributor Roles Taxonomy) a standard. CRediT identifies 14 roles describing each contributor's specific contribution to the scholarly output. It was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. "CRediT – Contributor Roles Taxonomy." Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. "Data Services." http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. "The Realities of Research Data Management." Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. "IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries." News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. "Microsoft Academic Graph." Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of "Democratizing Access to Large Datasets through Shared Infrastructure." See Wittenberg, Jamie, and Valentin Pentchev. "Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure." Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO's Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See "Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website." n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. "Services Built on Usage Metrics." Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. "Stacks, Serials, Search Engines, and Students' Success: First-Year Undergraduate Students' Library Use, Academic Achievement, and Retention." Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. "Library Impact Data Project (LIDP)." Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. "Alternatives to Statistics for Measuring Success and Value of Cataloging." Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. "Metadata Librarian, Cornell University Library." DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. "Metadata Librarian." Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. "Locate Items in the Library with StackMap." https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. "Home." https://www.yewno.com.


125. Smith-Yoshimura, Karen. 2019. "New Ways of Using and Enhancing Cataloging and Authority Records." Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). "Austlang National Codeathon." Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. "Trove." Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. "River of Authors." Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. "Knowledge Organization Systems." Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. "Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review." International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). "About SNAC." What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. "Knowledge Management and Metadata." Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM's mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs. See AI4LAM. "About." Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. "Alternatives to Statistics for Measuring Success and Value of Cataloging." Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. "New Skill Sets for Metadata Management." Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. "MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation." Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.


139. Reese, Terry. 2018. "MarcEdit 2017 Usage Information." Terry's Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. "Working with Linked Data In MarcEdit." MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. "MarcEdit Playlist." 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. "New Skill Sets for Metadata Management." Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. "XML and RDF-Based Systems Archives." n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. "Tutorials." YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

"Lynda: Online Courses, Classes, Training, Tutorials." n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

"Learn to Code - for Free." n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. "Teaching Basic Lab Skills for Research Computing." Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. "Data on the Web Best Practices." n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. "About." Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. "DevConnect Webinars." https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. "Stewardship of Professional FTEs In Metadata Work and Turnover." Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. "WorldCat®, OCLC, and Linked Data." Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395. T: 1-800-848-5878. T: +1-614-764-6000. F: +1-614-764-6096. www.oclc.org/research

ISBN: 978-1-55653-170-5. DOI: 10.25333/rqgd-b343. RM-PR-216787-WWAE-A4 2009



Ideally, Focus Group members would like a few staff who have the technical skills to take batch actions on data, or at least who know how to use the external tools available to automate as many tasks as possible.

For many years, Focus Group members have been using MarcEdit and/or other tools such as OpenRefine, scripts (e.g., Python, Ruby, or Perl), and macros for metadata reconciliation and batch processing.138 MarcEdit is the most popular tool and has a large, global, and active user community, as indicated in its 2017 Usage Snapshot.139 Terry Reese, MarcEdit's developer, estimates that about one-third of all users work in non-MARC environments and two-thirds of the most active users are OCLC members. Focus Group members reported that they use MarcEdit for data transformation, enhancing vendor records, building MARC records from spreadsheets, linked data reconciliation, de-duplicating records within a file, merging two or more records into one, Z39.50 harvesting, and reconciling metadata before sending records to other systems. The 2017 release of MarcEdit 7 includes new features such as lightweight clustering functionality, providing a powerful way to find relationships between data without introducing a large learning curve. It also has mechanisms that support linked data.140 Reese has created a series of YouTube tutorials, available on his MarcEdit Playlist.141
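To make the idea of lightweight clustering and de-duplication concrete, the following is a minimal Python sketch of key-collision clustering, in the spirit of the "fingerprint" method found in tools such as OpenRefine. It is an illustration only and does not reproduce MarcEdit's own algorithm; the function names `fingerprint` and `cluster` and the sample headings are assumptions made for this example.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a heading to a key that ignores case, accents, punctuation, and word order."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", value)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Lowercase and strip ASCII punctuation (this also removes hyphens).
    cleaned = unaccented.lower().translate(str.maketrans("", "", string.punctuation))
    # Sort the unique tokens so "Surname, Forename" and "Forename Surname" collide.
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values that share a fingerprint; return only groups with more than one member."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [members for members in groups.values() if len(members) > 1]


# Hypothetical sample headings, as they might appear in a batch of records.
headings = [
    "Smith-Yoshimura, Karen",
    "Karen Smith-Yoshimura",
    "Reese, Terry",
    "reese, terry.",
    "Weber, Chela Scott",
]

for members in cluster(headings):
    print(members)
```

A key of this kind is deliberately crude: it surfaces candidate duplicates for a cataloger to review rather than merging records automatically, and it will not catch variants that differ in spelling or fullness of name.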

Managers want to focus less on specific schema and more on metadata principles that can be applied to a range of different formats and environments. Desirable soft skills include problem-solving, effective collaboration, willingness (even eagerness) to try new things, understanding researchers' needs, and advocacy. Although some metadata specialists have always enjoyed experimenting with new approaches, often they lack the time to learn new tools or methodologies while keeping up with their routine work assignments. Libraries should promote metadata as an exciting career option to new professionals in venues such as library schools and ALA's New Members Roundtable. Emphasizing that metadata encompasses much more than library cataloging can increase its appeal: it also covers entity identification, descriptive standards used in various academic disciplines, and describing born-digital, archival, and research data that can interact with the semantic web. As one Focus Group member noted, "We bring order out of a vacuum."142

SELF-EDUCATION

Metadata increasingly is being created outside the library by academics and students with minimal training, leading to a need for more catalogers with record maintenance skills. Focus Group members noted the need for technical skills such as simple scripting, data remediation, and identity management to reconcile equivalents across multiple registries. Frequently mentioned sources of instruction include Library Juice Academy, MarcEdit tutorials, LinkedIn Learning (formerly Lynda.com), Library of Congress Training Webinars, ALCTS Webinars, Codecademy, Software Carpentry, and conferences such as Code4Lib and Mashcat.143 W3C's Data on the Web Best Practices and Semantic Web for the Working Ontologist were recommended reading.144 Crucial to the success of such training is the ability to quickly apply what has been learned. If new skills are not applied, people forget what they have learned. Staff feel frustrated when they have invested the time to learn something that they cannot use in their daily work.
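As an illustration of the kind of simple scripting involved in reconciling equivalents across multiple registries, the sketch below groups identifiers that are linked by pairwise "same as" assertions, using a small union-find structure. It is a minimal sketch under stated assumptions: the function name `merge_equivalents` is hypothetical, the identifiers are invented placeholders rather than real registry values, and a production workflow would also need to handle conflicting or erroneous assertions.

```python
from collections import defaultdict


def merge_equivalents(assertions):
    """Group identifiers connected, directly or indirectly, by 'same as' pairs."""
    parent = {}

    def find(ident):
        # Register unseen identifiers as their own group, then walk to the root.
        parent.setdefault(ident, ident)
        while parent[ident] != ident:
            parent[ident] = parent[parent[ident]]  # path halving keeps trees shallow
            ident = parent[ident]
        return ident

    for left, right in assertions:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            parent[root_right] = root_left

    clusters = defaultdict(set)
    for ident in list(parent):
        clusters[find(ident)].add(ident)
    # Sort for stable, readable output.
    return sorted((sorted(group) for group in clusters.values()), key=lambda g: g[0])


# Invented placeholder identifiers; each pair says "these two denote the same entity".
same_as = [
    ("orcid:EXAMPLE-A", "isni:EXAMPLE-B"),
    ("isni:EXAMPLE-B", "viaf:EXAMPLE-C"),
    ("lccn:EXAMPLE-D", "viaf:EXAMPLE-E"),
]

for group in merge_equivalents(same_as):
    print(group)
```

The first two assertions share an identifier, so all three of their identifiers end up in one group even though no single source links the first to the third; that transitive step is what makes reconciliation across registries more than a table lookup.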

Focus Group members have seen a big shift from relying on Library of Congress instructions to self-education from multiple sources. Some approaches mentioned by participants:

• Emphasize continuity of metadata principles when introducing an expanded scope of work.

• Take advantage of the Library Workflow Exchange,145 a site designed to help librarians share workflows and best practices across institutions, including scripts.

• From the 2017 Electronic Resources and Libraries Conference: "Don't wait; iterate." In other words, rather than waiting until staff have all the required skills, let them do tasks iteratively, learning as they go, so they are ready for new tasks when the time comes.

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC's DevConnect Webinars146 to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of "traditional" work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.147

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology to support library services helps catalogers "do more with less."

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations, such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond "traditional" cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution's collections. Focus Group members' linked data activities, including their participation in OCLC Research's Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon Foundation funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research's linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for "works."


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as "statements" associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of "metadata as a service," and influencing future staffing requirements.
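The shift from records to entities can be pictured with a small data-structure sketch: one language-neutral identifier, a set of labels keyed by language, and "statements" contributed by different institutions. This is an illustrative model only and not the data model or API of the Shared Entity Management Infrastructure; the class names, property names, identifier, and source names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    prop: str    # what is asserted, e.g. "occupation"
    value: str   # the asserted value
    source: str  # which institution contributed the assertion


@dataclass
class Entity:
    identifier: str                                 # persistent, language-neutral link
    labels: dict = field(default_factory=dict)      # language code -> display label
    statements: list = field(default_factory=list)  # context supplied by contributors

    def add_statement(self, prop, value, source):
        """Record an assertion once, keeping track of who made it."""
        statement = Statement(prop, value, source)
        if statement not in self.statements:
            self.statements.append(statement)

    def label(self, lang, fallback="en"):
        """Return the label in the requested language, else the fallback, else the identifier."""
        return self.labels.get(lang, self.labels.get(fallback, self.identifier))

    def values(self, prop):
        """List every value asserted for a property, across all sources."""
        return [s.value for s in self.statements if s.prop == prop]


# Hypothetical entity with placeholder values.
person = Entity(
    identifier="https://example.org/entity/E1",
    labels={"en": "Example Person", "fr": "Personne exemple"},
)
person.add_statement("occupation", "composer", source="Library A")
person.add_statement("occupation", "conductor", source="Archive B")

print(person.label("fr"))
print(person.label("de"))
print(person.values("occupation"))
```

Because the identifier stays fixed while labels and statements accumulate, a display in any language can be generated from the same entity, and each statement remains traceable to the institution that supplied it.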


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts, summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois, Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. "The OCLC Research Library Partnership." https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura, Karen. 2017. "Metadata Advocacy." Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library's Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. "Program for Cooperative Cataloging." https://www.loc.gov/aba/pcc.

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. "What Metadata Managers Expect from and Value about the Research Library Partnership." Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. "Linked Data." International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. "CONTENTdm Linked Data pilot." https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. "WorldCat®, OCLC, and Linked Data." Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. "BIBFRAME." Bibliographic Framework Initiative. https://www.loc.gov/bibframe.


Futornick, Michelle. 2019. "LD4P2: Linked Data for Production: Pathway to Implementation." LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). "An Effective Environment for the Use of Linked Data by Libraries." Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee, 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. "Charge for PCC Task Group on Identity Management in NACO," 5. Acquisitions and Bibliographic Access, Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. "PCC Task Group on URIs in MARC." Programs of the PCC, Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. "PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge." PCC Task Groups, 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. "Shift to Linked Data for Production." OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. "Linked Data." Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. "'Future Proofing' of Cataloging." OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. "What is ORCID." Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. "ORCID Open Letter - Publishers." Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. "What is ISNI?" Accessed 19 September 2020. https://isni.org/page/what-is-isni.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. "Welcome to HathiTrust." Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. "Browse the Names." Accessed 19 September 2020. https://www.geonames.org.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. "Key Statistics." Accessed 5 May 2020. https://isni.org.

24. Library of Congress. 2020. "NACO – Name Authority Cooperative Program." Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. "Getting Identifiers Created for Legacy Names." Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. "Irreconcilable Differences? Name Authority Control & Humanities Scholarship." Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. "Use Cases for Local Identifiers." Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. "Registering Researchers in Authority Files." https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). "About." https://ror.org/about.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. "Precision Measurement of the Top-Quark Mass in Lepton+jets Final States." (Archived 24 February 2020). ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. "How Much Metadata Is Practical?" Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33 University of Minnesota 2020 ldquoExpertsMinnesotardquo Find Profiles httpsexpertsumneduenpersons or

University of Illinois at Urbana-Champaign 2020 ldquoIllinois Expertsrdquo Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID), on 7 April 2020, mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog), 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35 Smith-Yoshimura Karen 2016 ldquoMetadata Reconciliationrdquo Hanging Together The OCLC Research Blog 28 September 2016 httpshangingtogetherorgp=5710

36 Carruthers Matt (2014) 2020 mcarruthersLCNAF-Named-Entity-Reconciliation GitHub Repository httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37 Deliot Corine Steven Folsom Myung-Ja Han Nancy Lorimer Terry Reese and Adam Schiff 2019 Formulating and Obtaining URIs A Guide to Commonly used Vocabularies and Reference Sources Library of Congress PCC Task Group on URIs in MARC httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38 Smith-Yoshimura Karen 2019 ldquoNew Ways of Using and Enhancing Cataloging and Authority Recordsrdquo Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=5710

39 Smith-Yoshimura Karen 2015 ldquoPersistent Identifiers for Local Collectionsrdquo Hanging Together The OCLC Research Blog 27 October 2015 httpshangingtogetherorgp=5445

40 See DOI examples in detail from DOI 2020 ldquoDOI System Examplesrdquo Accessed 20 September 2020 httpswwwdoiorgdemoshtml and

See ARK examples in detail from Department Dallas (Tex ) Police 1963 ldquo[Photographs of Identification Cards]rdquo Collection University of North Texas The Portal to Texas History digital repository httpstexashistoryunteduark67531metapth346793

41 DataCite ldquoAssign DOIsrdquo httpsdataciteorgdoishtml

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref, The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42 ldquoIdentity managementrdquo here reflects its usage among metadata specialists (See for example Library of Congress 2018 ldquoCharge for PCC Task Group on Identity Management in NACOrdquo 5 American Bar Association Program for Cooperative Cataloging Revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience for example identity access management as described in

Wikiwand ldquoIdentity Managementrdquo httpswwwwikiwandcomenIdentity_management

43 Smith-Yoshimura Karen 2018 ldquoThe Coverage of Identity Management Workrdquo Hanging Together The OCLC Research Blog 8 October 2018 httpshangingtogetherorgp=6805


44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog, 10 October 2017. https://hangingtogether.org/?p=6271

45 Smith-Yoshimura ldquoCoverage of Identity Managementrdquo (See note 43)

46 Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez 2018 ldquoWorks in Progress Webinar Introduction to Wikidata for Librarians Structuring Wikipedia and Beyondrdquo Produced by OCLC Research 12 June 2018 MP4 video presentation 1151 httpswwwoclcorgresearchevents201806-12html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog, 18 June 2020. https://hangingtogether.org/?p=8002

48 Wikimedia ldquoWikiCiterdquo Home httpsmetawikimediaorgwikiWikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog, 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog, 29 October 2019. https://hangingtogether.org/?p=7591

51 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo httpswwwoclcorgenfasthtml

52 Smith-Yoshimura Karen 2016 ldquoFaceted Vocabulariesrdquo Hanging Together The OCLC Research Blog 31 October 2016 httpshangingtogetherorgp=5739

53 OCLC 2020 ldquoFASTrdquo (See note 51)

54 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo Heading 3 FAST Policy and Outreach (FPOC) Committee httpswwwoclcorgenfasthtml

55 Smith-Yoshimura Karen 2017 ldquoVocabulary Control Data in Discovery Environmentsrdquo Hanging Together The OCLC Research Blog 5 October 2017 httpshangingtogetherorgp=6264

56 National Library New Zealand Government ldquoNgā Upoko Tukutuku Māori Subject Headingsrdquo httpmshupokonatlibgovtnzmshupoko

AIATSIS Pathways Gateway to the AIATSIS Thesauri ldquoPathwaysrdquo httpwww1aiatsisgovau

57 Deutsche Nationalbibliothek 2019 ldquoMACS - Multilingual Access to Subjectsrdquo (Archived 13 Jan 2019) httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58 Smith-Yoshimura Karen 2019 ldquoKnowledge Organization Systemsrdquo Hanging Together The OCLC Research Blog 17 March 2019 httpshangingtogetherorgp=7135

59 Synaptica ldquoOntology Management ndash Graphiterdquo httpswwwsynapticacomgraphite


60 Smith-Yoshimura Karen 2018 ldquoAre Distributed Models for Vocabulary Maintenance Viablerdquo Hanging Together The OCLC Research Blog 12 April 2018 httpshangingtogetherorgp=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62 Smith-Yoshimura Karen 2018 ldquoCreating Metadata for Equity Diversity and Inclusionrdquo Hanging Together The OCLC Research Blog 7 November 2018 httpshangingtogetherorgp=6833

63 Smith-Yoshimura ldquoDistributed Modelsrdquo (See note 60)

64 Smith-Yoshimura Karen 2019 ldquoStrategies for Alternate Subject Headings and Maintaining Subject Headingsrdquo Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress: PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. https://www.oclc.org/research/events/2017/10-19.html

67 Smith-Yoshimura Karen 2015 ldquoShift to Linked Data for Productionrdquo Hanging Together The OCLC Research Blog 13 May 2015 httpshangingtogetherorgp=5195

68 Smith-Yoshimura Karen 2015 ldquoWorking in Shared Filesrdquo Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

69 Bruce Washburn and Jeff Mixter 2018 ldquoWorks in Progress Webinar Looking Inside the Library Knowledge Vaultrdquo Produced by OCLC Research 12 August 2018 MP4 video presentation 574500 httpswwwoclcorgresearchevents201508-12html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog, 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog, 7 April 2015. https://hangingtogether.org/?p=5091

72 Jisc Library Services nd ldquoWhat Is lsquoPlan Mrsquordquo Accessed 21 September 2020 httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

To learn more about the current phase of ldquoPlan Mrdquo (MayndashNovember 2020) see Grindley Neil ldquoMoving Plan M Forwards ndash We Need Your Helprdquo Library Services (PlanM) (blog) Jisc 6 May 2020 httpslibraryservicesjiscinvolveorgwp202005planm_nextphase

73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74 Dempsey Lorcan 2016 ldquoLibrary Collections in the Life of the User Two Directionsrdquo LIBER Quarterly 26(4) 338ndash359 httpdoiorg1018352lq10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog, 16 April 2019. https://hangingtogether.org/?p=7880

76 Smith-Yoshimura Karen 2017 ldquoMetadata for Archival Collectionsrdquo Hanging Together The OCLC Research Blog 30 May 2017 httpshangingtogetherorgp=5903

77 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Knudson Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Craig Thomas and Holly Tomren 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage 49-51 Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79 Smith-Yoshimura Karen 2020 ldquoMetadata Management in Times of Uncertaintyrdquo Hanging Together The OCLC Research Blog 15 June 2020 httpshangingtogetherorgp=7998

80 Smith-Yoshimura Karen 2016 ldquoMetadata for Archived Websitesrdquo Hanging Together The OCLC Research Blog 14 March 2016 httpshangingtogetherorgp=5591

81 Archive-It 2008 ldquoHuman Rightsrdquo Columbia University Libraries Collection (Archived May 2008) httpsarchive-itorgcollections1068

Archive-It 2010 ldquoNew York City Places and Spacesrdquo Columbia University Libraries Collection (Archived January 2010) httpsarchive-itorgcollections1757

Archive-It 2010 ldquoBurke Library New York City Religionsrdquo Columbia University Libraries Collection (Archived May 2010) httpsarchive-itorgcollections1945

82 NLA ldquoTroverdquo Archived Websites Sub Collections Accessed 20 September 2020 httpstrovenlagovauwebsite

83 Archive-It 2014 ldquoCollaborative Architecture Urbanism and Sustainability Web Archive (CAUSEWAY)rdquo Ivy Plus Libraries Confederation Collection (Archived June 2014) httpsarchive-itorgcollections4638

Archive-It 2013 ldquoContemporary Composers Web Archive (CCWA)rdquo Ivy Plus Libraries Confederation Collection (Archived October 2013) httpsarchive-itorgcollections4019

NYARC (New York Art Resources Consortium). “Web Archiving.” http://www.nyarc.org/content/web-archiving


84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 ldquoMetadata for Audio and Videosrdquo Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress ldquoStandardsrdquo Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress ldquoStandardsrdquo Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 ldquoAssessing Needs of AV in Special Collectionsrdquo Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 ldquoScale amp Risk Discussing Challenges to Managing AV Collections in the RLPrdquo Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 ldquoManaging Metadata for Image Collectionsrdquo Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress ldquoStandardsrdquo Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95 Smith-Yoshimura Karen 2016 ldquoSharing Digital Collections Workflowsrdquo Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 ldquoEuropeana Innovation Pilotsrdquo Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the Worldrsquos Images ldquoHomerdquo Accessed 20 September 2020 httpsiiifio

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99 OCLC Research 2020 ldquoCONTENTdm Linked Data Pilotrdquo Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml


100 Smith-Yoshimura Karen 2015 ldquoData Management and Curation in 21st Century Archives ndash Part 1rdquo 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 ldquoMetadata for Research Data Managementrdquo Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC, 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106 NCI (National Computational Infrastructure) Australia ldquoHomerdquo Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) ldquoHomerdquo Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network ldquoHomerdquo Accessed 21 September 2020 httpsportagenetworkca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109 Digital Curation Centre ldquoDisciplinary Metadatardquo List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory ldquoMetadata Standards Directory Working Grouprdquo GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy)mdashwhich identifies 14 roles describing each contributorrsquos specific contribution to the scholarly outputmdasha standard CRediT was developed by CASRAI the Consortia Advancing Standards in Research Administration Information See CASRAI ldquoCRediT ndash Contributor Roles Taxonomyrdquo Accessed 21 September 2020 httpscasraiorgcredit

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html


114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University, 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft 2020 ldquoMicrosoft Academic Graphrdquo Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116 NISOrsquos Reproducibility Badging and Definitions now out for public comment may also help researchers extend the benefit of their research to others See ldquoTaxonomy Definitions and Recognition Badging Scheme Working Group | NISO Websiterdquo nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 ldquoServices Built on Usage Metricsrdquo Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 ldquoStacks Serials Search Engines and Studentsrsquo Success First-Year Undergraduate Studentsrsquo Library Use Academic Achievement and Retentionrdquo Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc ldquoLibrary Impact Data Project (LIDP)rdquo Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 ldquoMetadata Librarian Cornell University Libraryrdquo DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 ldquoMetadata Librarianrdquo Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-library metadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 ldquoLocate Items in the Library with StackMaprdquo httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 ldquoHomerdquo httpswwwyewnocom


125 Smith-Yoshimura Karen 2019 ldquoNew Ways of Using and Enhancing Cataloging and Authority Recordsrdquo Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) ldquoAustlang National Codeathonrdquo Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA ldquoTroverdquo Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company ndash Hachette UK ldquoRiver of Authorsrdquo Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 ldquoKnowledge Organization Systemsrdquo Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 ldquoKnowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Reviewrdquo International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) ldquoAbout SNACrdquo What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives amp Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAMrsquos mission is to organize share and elevate knowledge about and use of artificial intelligence by libraries archives and museums It was founded in 2018 inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs

See AI4LAM ldquoAboutrdquo Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 ldquoNew Skill Sets for Metadata Managementrdquo Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 ldquoMarcEdit and Other Tools for Batch Processing and Metadata Reconciliationrdquo Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141 Reese Terry 2018 ldquoMarcEdit Playlistrdquo 139 YouTube videos Last updated 26 December 2018 httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura Karen 2017 ldquoNew Skill Sets for Metadata Managementrdquo Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese Terry 2013 ldquoTutorialsrdquo YouTube (selected) MarcEdit Development (blog) 14 March 2013 httpmarceditreesetnettutorials

ldquoLynda Online Courses Classes Training Tutorialsrdquo nd Lyndacom - from LinkedIn Learning Accessed 22 September 2020 httpswwwlyndacom

ldquoLearn to Code - for Freerdquo nd Codecademy Accessed 22 September 2020 httpswwwcodecademycom

Software Carpentry ldquoTeaching Basic Lab Skills for Research Computingrdquo Upcoming Workshops Accessed 22 September 2020 httpssoftware-carpentryorg

144 ldquoData on the Web Best Practicesrdquo nd Accessed 22 September 2020 httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist (2008) 2020 httpworkingontologistorg

145 Library Workflow Exchange nd ldquoAboutrdquo Accessed 21 September 2020 httpwwwlibraryworkflowexchangeorgabout

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147 Smith-Yoshimura Karen 2019 ldquoStewardship of Professional FTEs In Metadata Work and Turnoverrdquo Hanging Together The OCLC Research Blog 18 October 2019 httpshangingtogetherorgp=7580

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009



iteratively learning as they go so they are ready for new tasks when the time comes

• Have small groups of metadata specialists take programming courses together, after which they can continue to meet and discuss ways to apply their new skills to automate routine tasks.

• Encourage staff to participate in events such as OCLC’s DevConnect Webinars¹⁴⁶ to learn from libraries using OCLC APIs to enhance their library operations and services.

• Create reading and study groups that include cross-campus or cross-divisional staff.

• Expand the scope of current work to enable metadata specialists to apply their skills to new domains or terminology, such as using Dublin Core for digital collections. Involve staff in digital projects from the conceptual stage to developing project specifications, quality assurance practices, and tool selection. As a bonus, this fosters collaborative teamwork relationships.

• Hire graduate students in computer science for short-term tasks such as creating scripts.

ADDRESSING STAFF TURNOVER

Turnover in a professional position within a cataloging or metadata unit now comes with the significant risk that it may be impossible to convince administrators to retain the position in the unit and repost it. This is particularly true when the outgoing incumbent performed a high proportion of “traditional” work, such as original cataloging in MARC. The odds of retaining the position are much greater if careful thought goes into how the position could be reconfigured or repurposed to meet emerging needs.¹⁴⁷

Most Focus Group members have had to address varying amounts of turnover, either from retirements or staff leaving for other positions. Half of them needed to reconfigure the positions of outgoing librarians. Looking at what other institutions are advertising helps in creating an attractive position description. Many cataloging positions do not require an MLS degree, so recruiting professionals has focused on adaptability, on aligning new positions with university priorities, and on eagerness to learn and take initiative in areas such as metadata for research output, open access, digital collections, and linked data. Mapping out future strategies and designing ways of making metadata interoperate across systems have been components of recent recruitments. New staff with programming skills are sought after, as they can apply batch techniques to metadata that can compensate for the loss of staff. Using technology in support of library services helps catalogers “do more with less.”

Focus Group members want new staff to be aware of both the shared cataloging community and the overlaps with other cultural heritage organizations, such as archives and museums. The library environment keeps evolving, and librarians have had to reflect on their priorities moving forward. Metadata managers need to rethink the roles of metadata specialists beyond “traditional” cataloging work. Potential candidates with more flexible skill sets have become more attractive than those with a traditional cataloging background who may not adapt well to working in new environments. Many cataloging roles and descriptions may need to be rewritten and retooled. Perhaps the only activities that will perennially remain professional tasks are those like management, scouting new trends, strategizing, participating in new international standards, leading and implementing changes, and thinking about the big picture.


I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,148 launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. This infrastructure has long been desired by Focus Group members, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created and will create will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists will describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.
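The idea of an entity carrying a language-neutral identifier, with institutions attaching “statements” to it, can be illustrated with a small sketch. This is not the Shared Entity Management Infrastructure data model or API, which this report does not specify. The class names, property names, and example.org URIs below are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Statement:
    """One assertion about an entity, plus the institution that supplied it."""
    prop: str
    value: str
    source: str


@dataclass
class Entity:
    """An entity keyed by a persistent identifier rather than a text heading."""
    uri: str  # hypothetical persistent identifier
    labels: Dict[str, str] = field(default_factory=dict)  # language code -> label
    statements: List[Statement] = field(default_factory=list)

    def label(self, lang: str, fallback: str = "en") -> str:
        """Return the label in `lang`, else the fallback language, else the URI."""
        return self.labels.get(lang) or self.labels.get(fallback) or self.uri

    def add_statement(self, prop: str, value: str, source: str) -> None:
        """Attach a statement contributed by one institution."""
        self.statements.append(Statement(prop, value, source))

    def values(self, prop: str) -> List[str]:
        """Collect every value asserted for a property, across all sources."""
        return [s.value for s in self.statements if s.prop == prop]


# Hypothetical "work" entity: the identifier stays the same in every language,
# and two institutions contribute context as statements.
work = Entity(uri="https://example.org/entity/work/0001")
work.labels["en"] = "The Tale of Genji"
work.labels["ja"] = "源氏物語"
work.add_statement("author", "https://example.org/entity/person/0002", "Library A")
work.add_statement("heldBy", "Library A", "Library A")
work.add_statement("heldBy", "Archive B", "Archive B")
```

In this sketch the display label changes with the requested language while the link itself does not, which is the sense in which the identifier is language-neutral. Statements from different sources accumulate around the same entity instead of being copied into separate records.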


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts, summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University


N O T E S

1 OCLC Research Library Partnership Metadata Managers Focus Group httpswwwoclcorgresearchareasdata-sciencemetadata-managershtml

2 OCLC Research ldquoThe OCLC Research Library Partnershiprdquo httpswwwoclcorgresearchpartnershiphtml

3 Smith-Yoshimura 2017 ldquoMetadata Advocacyrdquo Hanging Together the OCLC Research Blog 17 October 2017 httpshangingtogetherorgp=6282

4 British Library 2019 Foundations for the Future The British Library’s Collection Metadata Strategy 2019-2023 London British Library httpswwwblukbibliographicpdfsbritish-library-collection-metadata-strategy-2019-2023pdf

5 Ibid 4

6 Statistics as of 1 June 2020

7 Library of Congress “Program for Cooperative Cataloging” httpswwwlocgovabapcc

8 Except for June 2020 when all discussions were held virtually only because of the COVID-19 pandemic

9 See Hanging Together The OCLC Research Blog search-category Metadata httpshangingtogetherorgcat=81

10 Benefits from affiliating with the RLP are cited in Smith-Yoshimura 2018 ldquoWhat Metadata Managers Expect from and Value about the Research Library Partnershiprdquo Hanging Together The OCLC Research Blog 16 April 2018 httpshangingtogetherorgp=6683

11 Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research 2020 “Linked Data” International Linked Data Survey httpswwwoclcorgresearchthemesdata-sciencelinkeddatalinked-data-surveyhtml

12 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Holly Tomren and Craig Thomas 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

OCLC Research 2020 “CONTENTdm Linked Data pilot” httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

OCLC 2020 “WorldCat®, OCLC, and Linked Data” Shared Entity Management Infrastructure httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

Library of Congress ldquoBIBFRAMErdquo Bibliographic Framework Initiative httpswwwlocgovbibframe


Futornick Michelle 2019 “LD4P2 Linked Data for Production Pathway to Implementation” LD4P2 Project Background and Goals Lyrasis Posted 14 January 2019 httpswikilyrasisorgdisplayLD4P2LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment) “An Effective Environment for the Use of Linked Data by Libraries” Accessed 17 September 2019 httpswwwshare-vdeorgsharevdeclustersl=en

Casalini Michele Chiat Naun Chew Chad Cluff Michelle Durocher Steven Folsom Paul Frank Janifer Gatenby Jean Godby Jason Kovari Nancy Lorimer Clifford Lynch Peter Murray Jeremy Myntti Anna Neatrour Cory Nimer Suzanne Pilsk Daniel Pitti Isabel Quintana Jing Wang and Simeon Warner 2018 National Strategy for Shareable Local Name Authorities National Forum White Paper Ithaca New York Cornell University Library eCommons digital repository httpshdlhandlenet181356343

13 Library of Congress 2019 PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices 2019 PCC Task Group on Linked Data Best Practices Final Report Submitted to PCC Policy Committee 12 September 2019 Washington DC Library of Congress httpswwwlocgovabapcctaskgrouplinked-data-best-practices-final-reportpdf

Library of Congress 2018 “Charge for PCC Task Group on Identity Management in NACO” 5 American Bar Association Program for Cooperative Cataloging revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf

Library of Congress 2020 ldquoPCC Task Group on URIs in MARCrdquo Programs of the PCC Charge Accessed 19 September 2020 httpswwwlocgovabapccbibframeTaskGroupsURI-TaskGrouphtml

Library of Congress 2018 ldquoPCC Linked Data Advisory Committee Linked Data Advisory Committee Chargerdquo PCC Task Groups 2018 Task Groups Revised 24 July 2018 [Word doc 28KB] httpswwwlocgovabapcctaskgrouptask-groupshtml

14 Smith-Yoshimura Karen 2015 ldquoShift to Linked Data for Productionrdquo OCLC Research Hanging Together Blog 13 May 2015 httpshangingtogetherorgp=5195

15 OCLC Research 2020 “Linked Data” Linked Data Overview httpswwwoclcorgresearchareasdata-sciencelinkeddatalinked-data-overviewhtml [All figures CC BY 40]

16 Smith-Yoshimura Karen 2019 ldquolsquoFuture Proofingrsquo of Catalogingrdquo OCLC Research Hanging Together Blog 10 November 2019 httpshangingtogetherorgp=7526

17 ORCID Connecting Research and Researchers ldquoWhat is Orcidrdquo Our Vision Accessed 19 September 2020 httpsorcidorgaboutwhat-is-orcidmission

18 See for example the list of signatories of journal publishers requiring ORCID IDs for authors ORCID ldquoORCID Open Letter - Publishersrdquo Accessed 19 September 2020 httpsorcidorgcontentrequiring-orcid-publication-workflows-open-letter

19 ISNI ldquoWhat is ISNIrdquo Accessed 19 September 2020 httpsisniorgpagewhat-is-isni

20 HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library “Welcome to HathiTrust” Accessed 19 September 2020 httpswwwhathitrustorgabout

21 GeoNames ldquoBrowse the Namesrdquo Accessed 19 September 2020 httpswwwgeonamesorg

22 Bryant Rebecca Annette Dortmund and Constance Malpas 2017 Convenience and Compliance Case Studies on Persistent Identifiers in European Research Information Dublin OH OCLC Research httpsdoiorg1025333C32K7M

23 ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI “Key Statistics” Accessed 5 May 2020 httpsisniorg

24 Library of Congress 2020 ldquoNACO ndash Name Authority Cooperative Programrdquo Documents and Updates Programs for Cataloging and Acquisitions (PCC) Accessed 19 September 2020 httpwwwlocgovabapccnacoindexhtml

25 Smith-Yoshimura Karen 2015 “Getting Identifiers Created for Legacy Names” Hanging Together The OCLC Research Blog 30 October 2015 httpshangingtogetherorgp=5463

26 Smith-Yoshimura Karen 2013 ldquoIrreconcilable Differences Name Authority Control amp Humanities Scholarshiprdquo Hanging Together The OCLC Research Blog 27 March 2013 httpshangingtogetherorgp=2621

27 Smith-Yoshimura Karen 2017 ldquoUse Cases for Local Identifiersrdquo Hanging Together The OCLC Research Blog 5 May 2017 httpshangingtogetherorgp=5938

28 OCLC Research 2020 ldquoRegistering Researchers in Authority Filesrdquo httpswwwoclcorgresearchthemesresearch-collectionsregistering-researchershtml

29 Smith-Yoshimura Karen Janifer Gatenby Grace Agnew Christopher Brown Kate Byrne Matt Carruthers Peter Fletcher Stephen Hearn Xiaoli Li Marina Muilwijk Chew Chiat Naun John Riemer Roderick Sadler Jing Wang Glen Wiley and Kayla Willey 2016 Addressing the Challenges with Organizational Identifiers and ISNI Dublin Ohio OCLC Research httpsdoiorg1025333C3FC9Q

30 Research Organization Registry (ROR) ldquoAboutrdquo httpsrororgabout

31 V M Abazov B Abbott B S Acharya M Adams T Adams J P Agnew G D Alexeev et al (2014) 2020 ldquoPrecision Measurement of the Top-Quark Mass in Lepton+jets Final Statesrdquo (Archived 24 February 2020) ArXivorg 150107912 httpsarxivorgpdf14051756

32 Smith-Yoshimura Karen 2017 ldquoHow Much Metadata Is Practicalrdquo Hanging Together The OCLC Research Blog 14 November 2017 httpshangingtogetherorgp=6328

33 University of Minnesota 2020 “ExpertsMinnesota” Find Profiles httpsexpertsumneduenpersons or University of Illinois at Urbana-Champaign 2020 “Illinois Experts” Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34 The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID 2019 “ORCID iD Required for Some, Encouraged for All” NIAID Funding News Last reviewed 7 August 2019 httpswwwniaidnihgovgrants-contractsorcid-id-required-some-encouraged-all

See also Lyrasis 2020 “SciENcv and ORCID to Streamline NIH and NSF Grant Applications” LyrasisNow (blog) 8 April 2020 httpslyrasisnoworgsciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35 Smith-Yoshimura Karen 2016 ldquoMetadata Reconciliationrdquo Hanging Together The OCLC Research Blog 28 September 2016 httpshangingtogetherorgp=5710

36 Carruthers Matt (2014) 2020 mcarruthersLCNAF-Named-Entity-Reconciliation GitHub Repository httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37 Deliot Corine Steven Folsom Myung-Ja Han Nancy Lorimer Terry Reese and Adam Schiff 2019 Formulating and Obtaining URIs A Guide to Commonly used Vocabularies and Reference Sources Library of Congress PCC Task Group on URIs in MARC httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38 Smith-Yoshimura Karen 2019 ldquoNew Ways of Using and Enhancing Cataloging and Authority Recordsrdquo Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=5710

39 Smith-Yoshimura Karen 2015 ldquoPersistent Identifiers for Local Collectionsrdquo Hanging Together The OCLC Research Blog 27 October 2015 httpshangingtogetherorgp=5445

40 See DOI examples in detail from DOI 2020 “DOI System Examples” Accessed 20 September 2020 httpswwwdoiorgdemoshtml and see ARK examples in detail from Department Dallas (Tex.) Police 1963 “[Photographs of Identification Cards]” Collection University of North Texas The Portal to Texas History digital repository httpstexashistoryunteduark67531metapth346793

41 DataCite ldquoAssign DOIsrdquo httpsdataciteorgdoishtml

Wilkinson Laura J 2020 “Constructing your DOIs” Crossref The Crossref Curriculum Last updated 8 April 2020 httpswwwcrossreforgeducationmember-setupconstructing-your-dois

42 “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress 2018 “Charge for PCC Task Group on Identity Management in NACO” 5 American Bar Association Program for Cooperative Cataloging Revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience, for example identity access management as described in Wikiwand “Identity Management” httpswwwwikiwandcomenIdentity_management

43 Smith-Yoshimura Karen 2018 ldquoThe Coverage of Identity Management Workrdquo Hanging Together The OCLC Research Blog 8 October 2018 httpshangingtogetherorgp=6805


44 Smith-Yoshimura Karen 2017 “Beyond the Authorized Access Point” Hanging Together The OCLC Research Blog 10 October 2017 httpshangingtogetherorgp=6271

45 Smith-Yoshimura ldquoCoverage of Identity Managementrdquo (See note 43)

46 Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez 2018 ldquoWorks in Progress Webinar Introduction to Wikidata for Librarians Structuring Wikipedia and Beyondrdquo Produced by OCLC Research 12 June 2018 MP4 video presentation 1151 httpswwwoclcorgresearchevents201806-12html

47 Smith-Yoshimura Karen 2020 “Experimentations with Wikidata/Wikibase” Hanging Together The OCLC Research Blog 18 June 2020 httpshangingtogetherorgp=8002

48 Wikimedia ldquoWikiCiterdquo Home httpsmetawikimediaorgwikiWikiCite

49 Smith-Yoshimura Karen 2016 “Impact of Identifiers on Authority Workflows” Hanging Together The OCLC Research Blog 22 March 2016 httpshangingtogetherorgp=5603

50 Smith-Yoshimura Karen 2019 “Strategies for Alternate Subject Headings and Maintaining Subject Headings” Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

51 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo httpswwwoclcorgenfasthtml

52 Smith-Yoshimura Karen 2016 ldquoFaceted Vocabulariesrdquo Hanging Together The OCLC Research Blog 31 October 2016 httpshangingtogetherorgp=5739

53 OCLC 2020 ldquoFASTrdquo (See note 51)

54 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo Heading 3 FAST Policy and Outreach (FPOC) Committee httpswwwoclcorgenfasthtml

55 Smith-Yoshimura Karen 2017 ldquoVocabulary Control Data in Discovery Environmentsrdquo Hanging Together The OCLC Research Blog 5 October 2017 httpshangingtogetherorgp=6264

56 National Library New Zealand Government ldquoNgā Upoko Tukutuku Māori Subject Headingsrdquo httpmshupokonatlibgovtnzmshupoko

AIATSIS Pathways Gateway to the AIATSIS Thesauri ldquoPathwaysrdquo httpwww1aiatsisgovau

57 Deutsche Nationalbibliothek 2019 ldquoMACS - Multilingual Access to Subjectsrdquo (Archived 13 Jan 2019) httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58 Smith-Yoshimura Karen 2019 ldquoKnowledge Organization Systemsrdquo Hanging Together The OCLC Research Blog 17 March 2019 httpshangingtogetherorgp=7135

59 Synaptica ldquoOntology Management ndash Graphiterdquo httpswwwsynapticacomgraphite


60 Smith-Yoshimura Karen 2018 ldquoAre Distributed Models for Vocabulary Maintenance Viablerdquo Hanging Together The OCLC Research Blog 12 April 2018 httpshangingtogetherorgp=6672

61 OCLC Research 2020 “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey” Overview Accessed 20 September 2020 httpswwwoclcorgresearchareascommunity-catalystsrlp-edihtml

62 Smith-Yoshimura Karen 2018 ldquoCreating Metadata for Equity Diversity and Inclusionrdquo Hanging Together The OCLC Research Blog 7 November 2018 httpshangingtogetherorgp=6833

63 Smith-Yoshimura ldquoDistributed Modelsrdquo (See note 60)

64 Smith-Yoshimura Karen 2019 ldquoStrategies for Alternate Subject Headings and Maintaining Subject Headingsrdquo Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

65 Baxmeyer Jennifer Karen Coyle Joanna Dyla MJ Han Steven Folsom Phil Schreur and Tim Thompson 2017 Linked Data Infrastructure Models Areas of Focus for PCC Strategies Library of Congress PCC Linked Data Advisory Committee httpswwwlocgovabapccdocumentsLinkedDataInfrastructureModelspdf

66 Bone Christine Sharon Farnel Sheila Laroque and Brett Lougheed 2017 “Works in Progress Webinar Decolonizing Descriptions Finding Naming and Changing the Relationship between Indigenous People Libraries and Archives” Produced by OCLC Research 19 October 2017 MP4 video presentation 543500 httpswwwoclcorgresearchevents201710-19html

67 Smith-Yoshimura Karen 2015 ldquoShift to Linked Data for Productionrdquo Hanging Together The OCLC Research Blog 13 May 2015 httpshangingtogetherorgp=5195

68 Smith-Yoshimura Karen 2015 ldquoWorking in Shared Filesrdquo Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

69 Bruce Washburn and Jeff Mixter 2018 ldquoWorks in Progress Webinar Looking Inside the Library Knowledge Vaultrdquo Produced by OCLC Research 12 August 2018 MP4 video presentation 574500 httpswwwoclcorgresearchevents201508-12html

70 Smith-Yoshimura Karen 2019 “Systematic Reviews of Our Metadata” Hanging Together The OCLC Research Blog 10 April 2019 httpshangingtogetherorgp=7117

71 Smith-Yoshimura Karen 2015 “Working in Shared Files” Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

72 Jisc Library Services nd ldquoWhat Is lsquoPlan Mrsquordquo Accessed 21 September 2020 httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

To learn more about the current phase of ldquoPlan Mrdquo (MayndashNovember 2020) see Grindley Neil ldquoMoving Plan M Forwards ndash We Need Your Helprdquo Library Services (PlanM) (blog) Jisc 6 May 2020 httpslibraryservicesjiscinvolveorgwp202005planm_nextphase


73 Grindley Neil 2019 “Plan M Definition Principles and Direction” Jisc (Word docx) httplibraryservicesjiscinvolveorgwpfiles201912Plan-M-Definition-and-Direction-1docx

74 Dempsey Lorcan 2016 ldquoLibrary Collections in the Life of the User Two Directionsrdquo LIBER Quarterly 26(4) 338ndash359 httpdoiorg1018352lq10170

75 Smith-Yoshimura Karen 2019 “Presenting Metadata from Different Sources in Discovery Layers” Hanging Together The OCLC Research Blog 16 April 2019 httpshangingtogetherorgp=7880

76 Smith-Yoshimura Karen 2017 ldquoMetadata for Archival Collectionsrdquo Hanging Together The OCLC Research Blog 30 May 2017 httpshangingtogetherorgp=5903

77 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Knudson Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Craig Thomas and Holly Tomren 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage 49-51 Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

78 The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at httpswwwoclcorgresearchpartnershipworking-groupsarchives-special-collections-linked-data-reviewhtml

79 Smith-Yoshimura Karen 2020 ldquoMetadata Management in Times of Uncertaintyrdquo Hanging Together The OCLC Research Blog 15 June 2020 httpshangingtogetherorgp=7998

80 Smith-Yoshimura Karen 2016 ldquoMetadata for Archived Websitesrdquo Hanging Together The OCLC Research Blog 14 March 2016 httpshangingtogetherorgp=5591

81 Archive-It 2008 ldquoHuman Rightsrdquo Columbia University Libraries Collection (Archived May 2008) httpsarchive-itorgcollections1068

Archive-It 2010 ldquoNew York City Places and Spacesrdquo Columbia University Libraries Collection (Archived January 2010) httpsarchive-itorgcollections1757

Archive-It 2010 ldquoBurke Library New York City Religionsrdquo Columbia University Libraries Collection (Archived May 2010) httpsarchive-itorgcollections1945

82 NLA ldquoTroverdquo Archived Websites Sub Collections Accessed 20 September 2020 httpstrovenlagovauwebsite

83 Archive-It 2014 ldquoCollaborative Architecture Urbanism and Sustainability Web Archive (CAUSEWAY)rdquo Ivy Plus Libraries Confederation Collection (Archived June 2014) httpsarchive-itorgcollections4638

Archive-It 2013 ldquoContemporary Composers Web Archive (CCWA)rdquo Ivy Plus Libraries Confederation Collection (Archived October 2013) httpsarchive-itorgcollections4019

NYARC New York Art Resources Consortium “Web Archiving” httpwwwnyarcorgcontentweb-archiving


84 OCLC Research 2020 “Web Archiving Metadata Working Group” The Problem Addressing the Problem Outputs httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 ldquoMetadata for Audio and Videosrdquo Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress ldquoStandardsrdquo Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress ldquoStandardsrdquo Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 ldquoAssessing Needs of AV in Special Collectionsrdquo Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 ldquoScale amp Risk Discussing Challenges to Managing AV Collections in the RLPrdquo Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 ldquoManaging Metadata for Image Collectionsrdquo Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress ldquoStandardsrdquo Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94 Library of Congress “Standards” Metadata Authority Description Schema (MADS) Accessed 21 September 2020 httpwwwlocgovstandardsmads

95 Smith-Yoshimura Karen 2016 ldquoSharing Digital Collections Workflowsrdquo Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 ldquoEuropeana Innovation Pilotsrdquo Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the Worldrsquos Images ldquoHomerdquo Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 “OCLC ResearchWorks IIIF Explorer” httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 ldquoCONTENTdm Linked Data Pilotrdquo Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml


100 Smith-Yoshimura Karen 2015 ldquoData Management and Curation in 21st Century Archives ndash Part 1rdquo 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 ldquoMetadata for Research Data Managementrdquo Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105 Faniel Ixchel M 2019 “Let’s Cook Up Some Metadata Consistency” Next (blog) OCLC 21 November 2019 httpwwwoclcorgblogmainlets-cook-up-some-metadata-consistency

106 NCI (National Computational Infrastructure) Australia ldquoHomerdquo Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) ldquoHomerdquo Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network ldquoHomerdquo Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (httpwwwmetadata2020org). The Metadata 2020 Researcher Communications project is outlined here: httpwwwmetadata2020orgprojectsresearcher-communications

109 Digital Curation Centre ldquoDisciplinary Metadatardquo List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory ldquoMetadata Standards Directory Working Grouprdquo GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI “CRediT – Contributor Roles Taxonomy” Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 “Data Services” httpwwwlibumicheduresearch-data-services

113 OCLC Research 2020 “The Realities of Research Data Management” Overview httpswwwoclcorgresearchpublications2017oclcresearch-research-data-managementhtml


114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115 Indiana University 2018 “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries” News at IU (Science and Technology) Indiana University 18 October 2018 httpsnewsiuedustories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft 2020 ldquoMicrosoft Academic Graphrdquo Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg Jamie and Valentin Pentchev “Works in Progress Webinar Democratizing Access to Large Datasets through Shared Infrastructure” Produced by OCLC Research 8 August 2019 MP4 video presentation 583400 httpswwwoclcorgresearchevents2019080819-democratizing-access-large-datasets-shared-infrastructurehtml

116 NISOrsquos Reproducibility Badging and Definitions now out for public comment may also help researchers extend the benefit of their research to others See ldquoTaxonomy Definitions and Recognition Badging Scheme Working Group | NISO Websiterdquo nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 ldquoServices Built on Usage Metricsrdquo Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 ldquoStacks Serials Search Engines and Studentsrsquo Success First-Year Undergraduate Studentsrsquo Library Use Academic Achievement and Retentionrdquo Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc ldquoLibrary Impact Data Project (LIDP)rdquo Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 ldquoMetadata Librarian Cornell University Libraryrdquo DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 “Metadata Librarian” Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-librarymetadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 ldquoLocate Items in the Library with StackMaprdquo httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 ldquoHomerdquo httpswwwyewnocom


125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009



I M P A C T

The next generation of metadata will become even more focused on entities rather than record-based descriptions of an institution’s collections. Focus Group members’ linked data activities, including their participation in OCLC Research’s Project Passage and CONTENTdm Linked Data pilots, contributed to OCLC obtaining Andrew W. Mellon funding for its two-year Shared Entity Management Infrastructure project,¹⁴⁸ launched in January 2020. Eleven of the Shared Entity Management Infrastructure Advisory Group members are also Focus Group members. The project builds on OCLC Research’s linked data work and will provide a production infrastructure with persistent, authoritative identifiers for persons and works. It will be largely API-based, allowing librarians to customize their workflows around linked data infrastructure. Focus Group members have long wanted this infrastructure, as it will address many of the challenges documented above around persistent identifiers, especially identifiers for “works.”


Authoritative, persistent identifiers provided by the Shared Entity Management Infrastructure will supply the needed language-neutral links to trustworthy sources. The metadata that libraries, archives, and other cultural heritage institutions have created, and will create, will provide the context for these entities as “statements” associated with those links. The impact will be global, affecting how librarians and archivists describe the inside-out and facilitated collections, inspiring new offerings of “metadata as a service,” and influencing future staffing requirements.
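
The entity-centered model described above (a persistent, language-neutral identifier, with institutional metadata attached as “statements”) can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `Entity` and `Statement` classes, the example.org identifier, and the sample values are assumptions made for illustration and do not represent the Shared Entity Management Infrastructure’s actual data model or API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Statement:
    """One assertion about an entity, with the institution that supplied it."""
    predicate: str
    value: str
    source: str


@dataclass
class Entity:
    """An entity keyed by a persistent, language-neutral identifier."""
    identifier: str                                  # hypothetical URI
    labels: dict = field(default_factory=dict)       # language code -> display label
    statements: list = field(default_factory=list)   # context contributed by institutions

    def label(self, lang, fallback="en"):
        # The identifier stays the same; only the display label varies by language.
        return self.labels.get(lang) or self.labels.get(fallback) or self.identifier

    def add_statement(self, predicate, value, source):
        self.statements.append(Statement(predicate, value, source))


# Hypothetical "work" entity with labels in two languages.
work = Entity(
    identifier="https://example.org/entity/E1",
    labels={"en": "The Tale of Genji", "ja": "源氏物語"},
)
work.add_statement("form", "novel", source="Example National Library")
work.add_statement("held by", "Example University Library", source="Example University Library")

print(work.label("ja"))       # Japanese label
print(work.label("fr"))       # no French label, so falls back to English
print(len(work.statements))   # statements contributed by two institutions
```

In this sketch the identifier carries no language of its own, so each institution can display the label its users expect while still pointing at the same entity and contributing its own statements.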


A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. We also extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. We particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University
• Suzanne Pilsk, Smithsonian Institution
• Greg Reeve, Brigham Young University
• Alexander Whelan, Columbia University
• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.


A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University
• Sharon Farnel, University of Alberta
• Steven Folsom, Harvard University and Cornell University
• Erin Grant, University of Washington
• Dawn Hale, Johns Hopkins University
• Myung-Ja Han, University of Illinois, Urbana-Champaign
• Kate Harcourt, Columbia University
• Corey Harper, New York University
• Stephen Hearn, University of Minnesota
• Daniel Lovins, Yale University
• Roxanne Missingham, Australian National University
• Chew Chiat Naun, Cornell University and Harvard University
• Suzanne Pilsk, Smithsonian
• John Riemer, University of California, Los Angeles
• Carlen Ruschoff, University of Maryland
• Philip Schreur, Stanford University
• Jackie Shieh, George Washington University
• Melanie Wacker, Columbia University


N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura, Karen. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe


Futornick, Michelle. 2019. “LD4P2 Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups, 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB] https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” Hanging Together: The OCLC Research Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. “What is ORCID.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences: Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). “About.” https://ror.org/about

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020.) ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical.” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu

34. The National Institute of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress, PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection, University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf) But the term has other meanings depending on the audience, for example identity access management, as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805


44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019.) https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite


60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable.” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress, PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. https://www.oclc.org/research/events/2017/10-19.html

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 57:45:00. https://www.oclc.org/research/events/2015/08-12.html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

72. Jisc Library Services. n.d. “What Is ‘Plan M’.” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase


73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx.) http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008.) https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010.) https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010.) https://archive-it.org/collections/1945

82. NLA. “Trove.” Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014.) https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013.) https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving


84 OCLC Research 2020 “Web Archiving Metadata Working Group” The Problem Addressing the Problem Outputs httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 “Metadata for Audio and Videos” Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress “Standards” Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress “Standards” Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 “Assessing Needs of AV in Special Collections” Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 “Scale & Risk Discussing Challenges to Managing AV Collections in the RLP” Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 “Managing Metadata for Image Collections” Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress “Standards” Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94 Library of Congress “Standards” Metadata Authority Description Schema (MADS) Accessed 21 September 2020 httpwwwlocgovstandardsmads

95 Smith-Yoshimura Karen 2016 “Sharing Digital Collections Workflows” Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 “Europeana Innovation Pilots” Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the World’s Images “Home” Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 “OCLC ResearchWorks IIIF Explorer” httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 “CONTENTdm Linked Data Pilot” Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

100 Smith-Yoshimura Karen 2015 “Data Management and Curation in 21st Century Archives – Part 1” 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 “Metadata for Research Data Management” Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 “Knowledge Management and Metadata” Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105 Faniel Ixchel M 2019 “Let’s Cook Up Some Metadata Consistency” Next (blog) OCLC 21 November 2019 httpwwwoclcorgblogmainlets-cook-up-some-metadata-consistency

106 NCI (National Computational Infrastructure) Australia “Home” Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) “Home” Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network “Home” Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a “collaboration advocating richer connected reusable open metadata for all research outputs” (httpwwwmetadata2020org). The Metadata 2020 Researcher Communications project is outlined here: httpwwwmetadata2020orgprojectsresearcher-communications

109 Digital Curation Centre “Disciplinary Metadata” List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory “Metadata Standards Directory Working Group” GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI “CRediT – Contributor Roles Taxonomy” Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 “Data Services” httpwwwlibumicheduresearch-data-services

113 OCLC Research 2020 “The Realities of Research Data Management” Overview httpswwwoclcorgresearchpublications2017oclcresearch-research-data-managementhtml

114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115 Indiana University 2018 “IU will Lead $2 Million Partnership to Expand Access to Research Data IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries” News at IU (Science and Technology) Indiana University 18 October 2018 httpsnewsiuedustories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft 2020 “Microsoft Academic Graph” Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg Jamie and Valentin Pentchev “Works in Progress Webinar Democratizing Access to Large Datasets through Shared Infrastructure” Produced by OCLC Research 8 August 2019 MP4 video presentation 583400 httpswwwoclcorgresearchevents2019080819-democratizing-access-large-datasets-shared-infrastructurehtml

116 NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy Definitions and Recognition Badging Scheme Working Group | NISO Website” nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 “Services Built on Usage Metrics” Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 “Stacks Serials Search Engines and Students’ Success First-Year Undergraduate Students’ Library Use Academic Achievement and Retention” Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc “Library Impact Data Project (LIDP)” Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 “Alternatives to Statistics for Measuring Success and Value of Cataloging” Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 “Metadata Librarian Cornell University Library” DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 “Metadata Librarian” Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-librarymetadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 “Locate Items in the Library with StackMap” httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 “Home” httpswwwyewnocom

125 Smith-Yoshimura Karen 2019 “New Ways of Using and Enhancing Cataloging and Authority Records” Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) “Austlang National Codeathon” Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA “Trove” Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company – Hachette UK “River of Authors” Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 “Knowledge Organization Systems” Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 “Knowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Review” International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) “About SNAC” What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 “Knowledge Management and Metadata” Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives & Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM “About” Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 “Alternatives to Statistics for Measuring Success and Value of Cataloging” Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 “New Skill Sets for Metadata Management” Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation” Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646

139 Reese Terry 2018 “MarcEdit 2017 Usage Information” Terry’s Worklog (blog) 9 September 2020 httpblogreesetnetarchives2572

140 Reese Terry 2020 “Working with Linked Data In MarcEdit” MarcEdit Development (blog) Accessed 21 September 2020 httpsmarceditreesetnetworking-with-linked-data-in-marcedit

141 Reese Terry 2018 “MarcEdit Playlist” 139 YouTube videos Last updated 26 December 2018 httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura Karen 2017 “New Skill Sets for Metadata Management” Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

143 “XML and RDF-Based Systems Archives” nd Library Juice Academy (blog) Accessed 22 September 2020 httpslibraryjuiceacademycomcertificatexml-and-rdf-based-systems

Reese Terry 2013 “Tutorials” YouTube (selected) MarcEdit Development (blog) 14 March 2013 httpmarceditreesetnettutorials

“Lynda Online Courses Classes Training Tutorials” nd Lyndacom - from LinkedIn Learning Accessed 22 September 2020 httpswwwlyndacom

“Learn to Code - for Free” nd Codecademy Accessed 22 September 2020 httpswwwcodecademycom

Software Carpentry “Teaching Basic Lab Skills for Research Computing” Upcoming Workshops Accessed 22 September 2020 httpssoftware-carpentryorg

144 “Data on the Web Best Practices” nd Accessed 22 September 2020 httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist (2008) 2020 httpworkingontologistorg

145 Library Workflow Exchange nd “About” Accessed 21 September 2020 httpwwwlibraryworkflowexchangeorgabout

146 OCLC Developer Network 2020 “DevConnect Webinars” httpswwwoclcorgdevelopereventsdevconnect-workshopsenhtml

147 Smith-Yoshimura Karen 2019 “Stewardship of Professional FTEs In Metadata Work and Turnover” Hanging Together The OCLC Research Blog 18 October 2019 httpshangingtogetherorgp=7580

148 OCLC 2020 “WorldCat® OCLC and Linked Data” Shared Entity Management Infrastructure httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009

A C K N O W L E D G M E N T S

OCLC Research wishes to thank all Research Library Partners Metadata Managers Focus Group members who have shared their experiences and thoughts summarized here. Additionally, we extend thanks to the dedicated Metadata Managers Planning Group, which initiated the topics and provided the context statements and question sets, the responses to which served as the basis of our discussions. In addition, we particularly appreciate the insightful comments from the following Focus Group members who reviewed an earlier version of this document; their comments improved this synthesis:

• Charlene Chou, New York University

• Suzanne Pilsk, Smithsonian Institution

• Greg Reeve, Brigham Young University

• Alexander Whelan, Columbia University

• Helen K. R. Williams, London School of Economics

I also extend thanks to current and former OCLC colleagues Rebecca Bryant, Jody DeRidder, Annette Dortmund, Rachel Frick, Janifer Gatenby, Jean Godby, Shane Huddleston, Andrew Pace, Merrilee Proffitt, Nathan Putnam, Stephan Schindehette, and Chela Weber for their careful review of all or parts of earlier versions of this document. Thank you to Erica Melko for her editing, Jeanette McNicol for the design of this report, and JD Shipengrover for the cover artwork.

On a personal note, I have greatly benefited from my interactions with the OCLC Research Partners Metadata Managers Focus Group and have been delighted to play a small part in this transition to the next generation of metadata.

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements (why the topic was important and timely), and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1 OCLC Research Library Partnership Metadata Managers Focus Group httpswwwoclcorgresearchareasdata-sciencemetadata-managershtml

2 OCLC Research “The OCLC Research Library Partnership” httpswwwoclcorgresearchpartnershiphtml

3 Smith-Yoshimura 2017 “Metadata Advocacy” Hanging Together the OCLC Research Blog 17 October 2017 httpshangingtogetherorgp=6282

4 British Library 2019 Foundations for the Future The British Library’s Collection Metadata Strategy 2019-2023 London British Library httpswwwblukbibliographicpdfsbritish-library-collection-metadata-strategy-2019-2023pdf

5 Ibid 4

6 Statistics as of 1 June 2020

7 Library of Congress “Program for Cooperative Cataloging” httpswwwlocgovabapcc

8 Except for June 2020 when all discussions were held virtually only because of the COVID-19 pandemic

9 See Hanging Together The OCLC Research Blog search-category Metadata httpshangingtogetherorgcat=81

10 Benefits from affiliating with the RLP are cited in Smith-Yoshimura 2018 “What Metadata Managers Expect from and Value about the Research Library Partnership” Hanging Together The OCLC Research Blog 16 April 2018 httpshangingtogetherorgp=6683

11 Analyses of the three International Linked Data Surveys for Implementers 2014-2018 and the spreadsheet of all responses to the surveys are available. See OCLC Research 2020 “Linked Data” International Linked Data Survey httpswwwoclcorgresearchthemesdata-sciencelinkeddatalinked-data-surveyhtml

12 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Holly Tomren and Craig Thomas 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

OCLC Research 2020 “CONTENTdm Linked Data pilot” httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

OCLC 2020 “WorldCat® OCLC and Linked Data” Shared Entity Management Infrastructure httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

Library of Congress “BIBFRAME” Bibliographic Framework Initiative httpswwwlocgovbibframe

Futornick Michelle 2019 “LD4P2 Linked Data for Production Pathway to Implementation” LD4P2 Project Background and Goals Lyrasis Posted 14 January 2019 httpswikilyrasisorgdisplayLD4P2LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment) “An Effective Environment for the Use of Linked Data by Libraries” Accessed 17 September 2019 httpswwwshare-vdeorgsharevdeclustersl=en

Casalini Michele Chiat Naun Chew Chad Cluff Michelle Durocher Steven Folsom Paul Frank Janifer Gatenby Jean Godby Jason Kovari Nancy Lorimer Clifford Lynch Peter Murray Jeremy Myntti Anna Neatrour Cory Nimer Suzanne Pilsk Daniel Pitti Isabel Quintana Jing Wang and Simeon Warner 2018 National Strategy for Shareable Local Name Authorities National Forum White Paper Ithaca New York Cornell University Library eCommons digital repository httpshdlhandlenet181356343

13 Library of Congress 2019 PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices 2019 PCC Task Group on Linked Data Best Practices Final Report Submitted to PCC Policy Committee 12 September 2019 Washington DC Library of Congress httpswwwlocgovabapcctaskgrouplinked-data-best-practices-final-reportpdf

Library of Congress 2018 “Charge for PCC Task Group on Identity Management in NACO” 5 American Bar Association Program for Cooperative Cataloging revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf

Library of Congress 2020 “PCC Task Group on URIs in MARC” Programs of the PCC Charge Accessed 19 September 2020 httpswwwlocgovabapccbibframeTaskGroupsURI-TaskGrouphtml

Library of Congress 2018 “PCC Linked Data Advisory Committee Linked Data Advisory Committee Charge” PCC Task Groups 2018 Task Groups Revised 24 July 2018 [Word doc 28KB] httpswwwlocgovabapcctaskgrouptask-groupshtml

14 Smith-Yoshimura Karen 2015 “Shift to Linked Data for Production” OCLC Research Hanging Together Blog 13 May 2015 httpshangingtogetherorgp=5195

15 OCLC Research 2020 “Linked Data” Linked Data Overview httpswwwoclcorgresearchareasdata-sciencelinkeddatalinked-data-overviewhtml [All figures CC BY 40]

16 Smith-Yoshimura Karen 2019 “‘Future Proofing’ of Cataloging” OCLC Research Hanging Together Blog 10 November 2019 httpshangingtogetherorgp=7526

17 ORCID Connecting Research and Researchers “What is Orcid” Our Vision Accessed 19 September 2020 httpsorcidorgaboutwhat-is-orcidmission

18 See for example the list of signatories of journal publishers requiring ORCID IDs for authors ORCID “ORCID Open Letter - Publishers” Accessed 19 September 2020 httpsorcidorgcontentrequiring-orcid-publication-workflows-open-letter

19 ISNI “What is ISNI” Accessed 19 September 2020 httpsisniorgpagewhat-is-isni

20 HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library “Welcome to HathiTrust” Accessed 19 September 2020 httpswwwhathitrustorgabout

21 GeoNames “Browse the Names” Accessed 19 September 2020 httpswwwgeonamesorg

22 Bryant Rebecca Annette Dortmund and Constance Malpas 2017 Convenience and Compliance Case Studies on Persistent Identifiers in European Research Information Dublin OH OCLC Research httpsdoiorg1025333C32K7M

23 ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI “Key Statistics” Accessed 5 May 2020 httpsisniorg

24 Library of Congress 2020 “NACO – Name Authority Cooperative Program” Documents and Updates Programs for Cataloging and Acquisitions (PCC) Accessed 19 September 2020 httpwwwlocgovabapccnacoindexhtml

25 Smith-Yoshimura Karen 2015 “Getting Identifiers Created for Legacy Names” Hanging Together The OCLC Research Blog 30 October 2015 httpshangingtogetherorgp=5463

26 Smith-Yoshimura Karen 2013 “Irreconcilable Differences Name Authority Control & Humanities Scholarship” Hanging Together The OCLC Research Blog 27 March 2013 httpshangingtogetherorgp=2621

27 Smith-Yoshimura Karen 2017 “Use Cases for Local Identifiers” Hanging Together The OCLC Research Blog 5 May 2017 httpshangingtogetherorgp=5938

28 OCLC Research 2020 “Registering Researchers in Authority Files” httpswwwoclcorgresearchthemesresearch-collectionsregistering-researchershtml

29 Smith-Yoshimura Karen Janifer Gatenby Grace Agnew Christopher Brown Kate Byrne Matt Carruthers Peter Fletcher Stephen Hearn Xiaoli Li Marina Muilwijk Chew Chiat Naun John Riemer Roderick Sadler Jing Wang Glen Wiley and Kayla Willey 2016 Addressing the Challenges with Organizational Identifiers and ISNI Dublin Ohio OCLC Research httpsdoiorg1025333C3FC9Q

30 Research Organization Registry (ROR) “About” httpsrororgabout

31 V M Abazov B Abbott B S Acharya M Adams T Adams J P Agnew G D Alexeev et al (2014) 2020 “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States” (Archived 24 February 2020) ArXivorg 150107912 httpsarxivorgpdf14051756

32 Smith-Yoshimura Karen 2017 “How Much Metadata Is Practical” Hanging Together The OCLC Research Blog 14 November 2017 httpshangingtogetherorgp=6328

33 University of Minnesota 2020 “ExpertsMinnesota” Find Profiles httpsexpertsumneduenpersons or

University of Illinois at Urbana-Champaign 2020 “Illinois Experts” Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34 The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID 2019 “ORCID iD Required for Some Encouraged for All” NIAID Funding News Last reviewed 7 August 2019 httpswwwniaidnihgovgrants-contractsorcid-id-required-some-encouraged-all

See also Lyrasis 2020 “SciENcv and ORCID to Streamline NIH and NSF Grant Applications” LyrasisNow (blog) 8 April 2020 httpslyrasisnoworgsciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35 Smith-Yoshimura Karen 2016 “Metadata Reconciliation” Hanging Together The OCLC Research Blog 28 September 2016 httpshangingtogetherorgp=5710

36 Carruthers Matt (2014) 2020 mcarruthersLCNAF-Named-Entity-Reconciliation GitHub Repository httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37 Deliot Corine Steven Folsom Myung-Ja Han Nancy Lorimer Terry Reese and Adam Schiff 2019 Formulating and Obtaining URIs A Guide to Commonly used Vocabularies and Reference Sources Library of Congress PCC Task Group on URIs in MARC httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38 Smith-Yoshimura Karen 2019 “New Ways of Using and Enhancing Cataloging and Authority Records” Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=5710

39 Smith-Yoshimura Karen 2015 “Persistent Identifiers for Local Collections” Hanging Together The OCLC Research Blog 27 October 2015 httpshangingtogetherorgp=5445

40 See DOI examples in detail from DOI 2020 “DOI System Examples” Accessed 20 September 2020 httpswwwdoiorgdemoshtml and

See ARK examples in detail from Department Dallas (Tex.) Police 1963 “[Photographs of Identification Cards]” Collection University of North Texas The Portal to Texas History digital repository httpstexashistoryunteduark67531metapth346793

41 DataCite “Assign DOIs” httpsdataciteorgdoishtml

Wilkinson Laura J 2020 “Constructing your DOIs” Crossref The Crossref Curriculum Last updated 8 April 2020 httpswwwcrossreforgeducationmember-setupconstructing-your-dois

42 “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress 2018 “Charge for PCC Task Group on Identity Management in NACO” 5 American Bar Association Program for Cooperative Cataloging Revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience, for example, identity access management as described in Wikiwand “Identity Management” httpswwwwikiwandcomenIdentity_management

43 Smith-Yoshimura Karen 2018 “The Coverage of Identity Management Work” Hanging Together The OCLC Research Blog 8 October 2018 httpshangingtogetherorgp=6805

44 Smith-Yoshimura Karen 2017 “Beyond the Authorized Access Point” Hanging Together The OCLC Research Blog 10 October 2017 httpshangingtogetherorgp=6271

45 Smith-Yoshimura “Coverage of Identity Management” (See note 43)

46 Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez 2018 “Works in Progress Webinar Introduction to Wikidata for Librarians Structuring Wikipedia and Beyond” Produced by OCLC Research 12 June 2018 MP4 video presentation 1151 httpswwwoclcorgresearchevents201806-12html

47 Smith-Yoshimura Karen 2020 “Experimentations with WikidataWikibase” Hanging Together The OCLC Research Blog 18 June 2020 httpshangingtogetherorgp=8002

48 Wikimedia “WikiCite” Home httpsmetawikimediaorgwikiWikiCite

49 Smith-Yoshimura Karen 2016 “Impact of Identifiers on Authority Workflows” Hanging Together The OCLC Research Blog 22 March 2016 httpshangingtogetherorgp=5603

50 Smith-Yoshimura Karen 2019 “Strategies for Alternate Subject Headings and Maintaining Subject Headings” Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

51 OCLC 2020 “FAST (Faceted Application of Subject Terminology)” httpswwwoclcorgenfasthtml

52 Smith-Yoshimura Karen 2016 “Faceted Vocabularies” Hanging Together The OCLC Research Blog 31 October 2016 httpshangingtogetherorgp=5739

53 OCLC 2020 “FAST” (See note 51)

54 OCLC 2020 “FAST (Faceted Application of Subject Terminology)” Heading 3 FAST Policy and Outreach (FPOC) Committee httpswwwoclcorgenfasthtml

55 Smith-Yoshimura Karen 2017 “Vocabulary Control Data in Discovery Environments” Hanging Together The OCLC Research Blog 5 October 2017 httpshangingtogetherorgp=6264

56 National Library New Zealand Government “Ngā Upoko Tukutuku Māori Subject Headings” httpmshupokonatlibgovtnzmshupoko

AIATSIS Pathways Gateway to the AIATSIS Thesauri “Pathways” httpwww1aiatsisgovau

57 Deutsche Nationalbibliothek 2019 “MACS - Multilingual Access to Subjects” (Archived 13 Jan 2019) httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58 Smith-Yoshimura Karen 2019 “Knowledge Organization Systems” Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

59 Synaptica “Ontology Management – Graphite” httpswwwsynapticacomgraphite

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress: PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 57:45:00. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc, 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.

73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx.) http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26 (4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008.) https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010.) https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010.) https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014.) https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013.) https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem; Addressing the Problem; Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io.

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC, 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University, 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC. https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395. T: 1-800-848-5878. T: +1-614-764-6000. F: +1-614-764-6096. www.oclc.org/research

ISBN: 978-1-55653-170-5. DOI: 10.25333/rqgd-b343. RM-PR-216787-WWAE-A4 2009

A P P E N D I X

OCLC Research Library Partners Metadata Managers Planning Group

2015-2020

Planning Group members selected the topics for the OCLC Research Library Partners Metadata Managers discussions, wrote up the context statements explaining why each topic was important and timely, and developed the question sets that Focus Group members responded to. The Planning Group initiators for each topic also reviewed draft summaries that were later posted on the OCLC Research Hanging Together blog.

Current Planning Group members are listed in bold; institutional affiliations are given for the time when they served on the Planning Group.

• Jennifer Baxmeyer, Princeton University

• Sharon Farnel, University of Alberta

• Steven Folsom, Harvard University and Cornell University

• Erin Grant, University of Washington

• Dawn Hale, Johns Hopkins University

• Myung-Ja Han, University of Illinois Urbana-Champaign

• Kate Harcourt, Columbia University

• Corey Harper, New York University

• Stephen Hearn, University of Minnesota

• Daniel Lovins, Yale University

• Roxanne Missingham, Australian National University

• Chew Chiat Naun, Cornell University and Harvard University

• Suzanne Pilsk, Smithsonian

• John Riemer, University of California, Los Angeles

• Carlen Ruschoff, University of Maryland

• Philip Schreur, Stanford University

• Jackie Shieh, George Washington University

• Melanie Wacker, Columbia University

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html.

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html.

3. Smith-Yoshimura. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282.

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf.

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc.

8. Except for June 2020, when all discussions were held only virtually because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81.

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683.

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html.

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe.

Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals.

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en.

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343.

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee, 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf.

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO,” 5. American Bar Association: Program for Cooperative Cataloging, revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf.

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC, Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html.

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB] https://www.loc.gov/aba/pcc/taskgroup/task-groups.html.

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html. [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526.

17. ORCID: Connecting Research and Researchers. “What is ORCID.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission.

18. See, for example, the list of signatories of journal publishers requiring ORCID IDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter.

19. ISNI. “What is ISNI.” Accessed 19 September 2020. https://isni.org/page/what-is-isni.

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about.

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org.

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M.

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org.

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html.

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463.

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621.

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938.

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html.

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q.

30. Research Organization Registry (ROR). “About.” https://ror.org/about.

31. V. M. Abazov, B. Abbott, B. S. Acharya, M. Adams, T. Adams, J. P. Agnew, G. D. Alexeev, et al. (2014) 2020. “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States.” (Archived 24 February 2020.) ArXiv.org 1501.07912. https://arxiv.org/pdf/1405.1756.

32. Smith-Yoshimura, Karen. 2017. “How Much Metadata Is Practical?” Hanging Together: The OCLC Research Blog. 14 November 2017. https://hangingtogether.org/?p=6328.

33. University of Minnesota. 2020. “Experts@Minnesota.” Find Profiles. https://experts.umn.edu/en/persons or

University of Illinois at Urbana-Champaign. 2020. “Illinois Experts.” Find U of I Research, View Scholarly Works, and Discover New Collaborators. https://experts.illinois.edu.

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all.

See also Lyrasis 2020 ldquoSciENcv and ORCID to Streamline NIH and NSF Grant Applicationsrdquo LyrasisNow (blog) 8 April 2020 httpslyrasisnoworgsciencv-and-orcid-to -streamline-nih-and-nsf-grant-applications

35 Smith-Yoshimura Karen 2016 ldquoMetadata Reconciliationrdquo Hanging Together The OCLC Research Blog 28 September 2016 httpshangingtogetherorgp=5710

36 Carruthers Matt (2014) 2020 mcarruthersLCNAF-Named-Entity-Reconciliation GitHub Repository httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37 Deliot Corine Steven Folsom Myung-Ja Han Nancy Lorimer Terry Reese and Adam Schiff 2019 Formulating and Obtaining URIs A Guide to Commonly used Vocabularies and Reference Sources Library of Congress PCC Task Group on URIs in MARC httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38 Smith-Yoshimura Karen 2019 ldquoNew Ways of Using and Enhancing Cataloging and Authority Recordsrdquo Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas, The Portal to Texas History digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf) But the term has other meanings depending on the audience, for example, identity access management as described in

Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly rated webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku / Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress: PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming, and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

69. Washburn, Bruce, and Jeff Mixter. 2015. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2015. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase

73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 583400. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430

118. Soria, Krista M., Jan Fransen, and Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data in MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs in Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580

148. OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395. T: 1-800-848-5878. T: +1-614-764-6000. F: +1-614-764-6096. www.oclc.org/research

ISBN: 978-1-55653-170-5. DOI: 10.25333/rqgd-b343. RM-PR-216787-WWAE-A4 2009

N O T E S

1. OCLC Research Library Partnership Metadata Managers Focus Group. https://www.oclc.org/research/areas/data-science/metadata-managers.html

2. OCLC Research. “The OCLC Research Library Partnership.” https://www.oclc.org/research/partnership.html

3. Smith-Yoshimura, Karen. 2017. “Metadata Advocacy.” Hanging Together: The OCLC Research Blog. 17 October 2017. https://hangingtogether.org/?p=6282

4. British Library. 2019. Foundations for the Future: The British Library’s Collection Metadata Strategy 2019-2023. London: British Library. https://www.bl.uk/bibliographic/pdfs/british-library-collection-metadata-strategy-2019-2023.pdf

5. Ibid., 4.

6. Statistics as of 1 June 2020.

7. Library of Congress. “Program for Cooperative Cataloging.” https://www.loc.gov/aba/pcc

8. Except for June 2020, when all discussions were held virtually only because of the COVID-19 pandemic.

9. See Hanging Together: The OCLC Research Blog, search-category Metadata. https://hangingtogether.org/?cat=81

10. Benefits from affiliating with the RLP are cited in Smith-Yoshimura. 2018. “What Metadata Managers Expect from and Value about the Research Library Partnership.” Hanging Together: The OCLC Research Blog. 16 April 2018. https://hangingtogether.org/?p=6683

11. Analyses of the three International Linked Data Surveys for Implementers, 2014-2018, and the spreadsheet of all responses to the surveys are available. See OCLC Research. 2020. “Linked Data.” International Linked Data Survey. https://www.oclc.org/research/themes/data-science/linkeddata/linked-data-survey.html

12. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Holly Tomren, and Craig Thomas. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

OCLC. 2020. “WorldCat®, OCLC, and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

Library of Congress. “BIBFRAME.” Bibliographic Framework Initiative. https://www.loc.gov/bibframe

Futornick, Michelle. 2019. “LD4P2: Linked Data for Production: Pathway to Implementation.” LD4P2 Project Background and Goals. Lyrasis. Posted 14 January 2019. https://wiki.lyrasis.org/display/LD4P2/LD4P2+Project+Background+and+Goals

Share-VDE (Share Virtual Discovery Environment). “An Effective Environment for the Use of Linked Data by Libraries.” Accessed 17 September 2019. https://www.share-vde.org/sharevde/clusters?l=en

Casalini, Michele, Chiat Naun Chew, Chad Cluff, Michelle Durocher, Steven Folsom, Paul Frank, Janifer Gatenby, Jean Godby, Jason Kovari, Nancy Lorimer, Clifford Lynch, Peter Murray, Jeremy Myntti, Anna Neatrour, Cory Nimer, Suzanne Pilsk, Daniel Pitti, Isabel Quintana, Jing Wang, and Simeon Warner. 2018. National Strategy for Shareable Local Name Authorities National Forum: White Paper. Ithaca, New York: Cornell University Library eCommons digital repository. https://hdl.handle.net/1813/56343

13. Library of Congress. 2019. PCC (Program for Cooperative Cataloging) Task Group on Linked Data Best Practices. 2019. PCC Task Group on Linked Data Best Practices Final Report. Submitted to PCC Policy Committee 12 September 2019. Washington, DC: Library of Congress. https://www.loc.gov/aba/pcc/taskgroup/linked-data-best-practices-final-report.pdf

Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO.” 5 American Bar Association Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf

Library of Congress. 2020. “PCC Task Group on URIs in MARC.” Programs of the PCC: Charge. Accessed 19 September 2020. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/URI-TaskGroup.html

Library of Congress. 2018. “PCC Linked Data Advisory Committee: Linked Data Advisory Committee Charge.” PCC Task Groups: 2018 Task Groups. Revised 24 July 2018. [Word doc, 28KB]. https://www.loc.gov/aba/pcc/taskgroup/task-groups.html

14. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” OCLC Research: Hanging Together Blog. 13 May 2015. https://hangingtogether.org/?p=5195

15. OCLC Research. 2020. “Linked Data.” Linked Data Overview. https://www.oclc.org/research/areas/data-science/linkeddata/linked-data-overview.html [All figures CC BY 4.0]

16. Smith-Yoshimura, Karen. 2019. “‘Future Proofing’ of Cataloging.” OCLC Research: Hanging Together Blog. 10 November 2019. https://hangingtogether.org/?p=7526

17. ORCID: Connecting Research and Researchers. “What is ORCID.” Our Vision. Accessed 19 September 2020. https://orcid.org/about/what-is-orcid/mission

18. See, for example, the list of signatories of journal publishers requiring ORCID iDs for authors: ORCID. “ORCID Open Letter - Publishers.” Accessed 19 September 2020. https://orcid.org/content/requiring-orcid-publication-workflows-open-letter

19. ISNI. “What is ISNI?” Accessed 19 September 2020. https://isni.org/page/what-is-isni

20. HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items. See HathiTrust Digital Library. “Welcome to HathiTrust.” Accessed 19 September 2020. https://www.hathitrust.org/about

21. GeoNames. “Browse the Names.” Accessed 19 September 2020. https://www.geonames.org

22. Bryant, Rebecca, Annette Dortmund, and Constance Malpas. 2017. Convenience and Compliance: Case Studies on Persistent Identifiers in European Research Information. Dublin, OH: OCLC Research. https://doi.org/10.25333/C32K7M

23. ISNI currently holds 11.02 million identities: 10.26 million individuals (of which 2.91 million are researchers) and 933,039 organizations. Statistics retrieved from ISNI. See ISNI. “Key Statistics.” Accessed 5 May 2020. https://isni.org

24. Library of Congress. 2020. “NACO – Name Authority Cooperative Program.” Documents and Updates. Programs for Cataloging and Acquisitions (PCC). Accessed 19 September 2020. http://www.loc.gov/aba/pcc/naco/index.html

25. Smith-Yoshimura, Karen. 2015. “Getting Identifiers Created for Legacy Names.” Hanging Together: The OCLC Research Blog. 30 October 2015. https://hangingtogether.org/?p=5463

26. Smith-Yoshimura, Karen. 2013. “Irreconcilable Differences? Name Authority Control & Humanities Scholarship.” Hanging Together: The OCLC Research Blog. 27 March 2013. https://hangingtogether.org/?p=2621

27. Smith-Yoshimura, Karen. 2017. “Use Cases for Local Identifiers.” Hanging Together: The OCLC Research Blog. 5 May 2017. https://hangingtogether.org/?p=5938

28. OCLC Research. 2020. “Registering Researchers in Authority Files.” https://www.oclc.org/research/themes/research-collections/registering-researchers.html

29. Smith-Yoshimura, Karen, Janifer Gatenby, Grace Agnew, Christopher Brown, Kate Byrne, Matt Carruthers, Peter Fletcher, Stephen Hearn, Xiaoli Li, Marina Muilwijk, Chew Chiat Naun, John Riemer, Roderick Sadler, Jing Wang, Glen Wiley, and Kayla Willey. 2016. Addressing the Challenges with Organizational Identifiers and ISNI. Dublin, Ohio: OCLC Research. https://doi.org/10.25333/C3FC9Q

30. Research Organization Registry (ROR). “About.” https://ror.org/about

31 V M Abazov B Abbott B S Acharya M Adams T Adams J P Agnew G D Alexeev et al (2014) 2020 ldquoPrecision Measurement of the Top-Quark Mass in Lepton+jets Final Statesrdquo (Archived 24 February 2020) ArXivorg 150107912 httpsarxivorgpdf14051756

32 Smith-Yoshimura Karen 2017 ldquoHow Much Metadata Is Practicalrdquo Hanging Together The OCLC Research Blog 14 November 2017 httpshangingtogetherorgp=6328

33 University of Minnesota 2020 ldquoExpertsMinnesotardquo Find Profiles httpsexpertsumneduenpersons or

University of Illinois at Urbana-Champaign 2020 ldquoIllinois Expertsrdquo Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34. The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID), on 7 April 2020, mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH: NIAID. 2019. “ORCID iD Required for Some, Encouraged for All.” NIAID Funding News. Last reviewed 7 August 2019. https://www.niaid.nih.gov/grants-contracts/orcid-id-required-some-encouraged-all

See also Lyrasis. 2020. “SciENcv and ORCID to Streamline NIH and NSF Grant Applications.” LyrasisNow (blog). 8 April 2020. https://lyrasisnow.org/sciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35. Smith-Yoshimura, Karen. 2016. “Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 28 September 2016. https://hangingtogether.org/?p=5710

36. Carruthers, Matt. (2014) 2020. mcarruthers/LCNAF-Named-Entity-Reconciliation. GitHub Repository. https://github.com/mcarruthers/LCNAF-Named-Entity-Reconciliation

37. Deliot, Corine, Steven Folsom, Myung-Ja Han, Nancy Lorimer, Terry Reese, and Adam Schiff. 2019. Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. Library of Congress: PCC Task Group on URIs in MARC. https://www.loc.gov/aba/pcc/bibframe/TaskGroups/formulate_obtain_URI_guide.pdf

38. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=5710

39. Smith-Yoshimura, Karen. 2015. “Persistent Identifiers for Local Collections.” Hanging Together: The OCLC Research Blog. 27 October 2015. https://hangingtogether.org/?p=5445

40. See DOI examples in detail from DOI. 2020. “DOI System Examples.” Accessed 20 September 2020. https://www.doi.org/demos.html; and

See ARK examples in detail from Department, Dallas (Tex.) Police. 1963. “[Photographs of Identification Cards].” Collection. University of North Texas: The Portal to Texas History, digital repository. https://texashistory.unt.edu/ark:/67531/metapth346793

41. DataCite. “Assign DOIs.” https://datacite.org/dois.html

Wilkinson, Laura J. 2020. “Constructing your DOIs.” Crossref: The Crossref Curriculum. Last updated 8 April 2020. https://www.crossref.org/education/member-setup/constructing-your-dois

42. “Identity management” here reflects its usage among metadata specialists. (See, for example, Library of Congress. 2018. “Charge for PCC Task Group on Identity Management in NACO,” 5. American Bar Association: Program for Cooperative Cataloging. Revised 22 May 2018. https://www.loc.gov/aba/pcc/taskgroup/PCC-TG-Identity-Management-in-NACO-rev2018-05-22.pdf) But the term has other meanings depending on the audience, for example, identity access management as described in Wikiwand. “Identity Management.” https://www.wikiwand.com/en/Identity_management

43. Smith-Yoshimura, Karen. 2018. “The Coverage of Identity Management Work.” Hanging Together: The OCLC Research Blog. 8 October 2018. https://hangingtogether.org/?p=6805

44. Smith-Yoshimura, Karen. 2017. “Beyond the Authorized Access Point.” Hanging Together: The OCLC Research Blog. 10 October 2017. https://hangingtogether.org/?p=6271

45. Smith-Yoshimura, “Coverage of Identity Management.” (See note 43.)

46. Watch the highly rated webinar by Andrew Lih and Robert Fernandez. 2018. “Works in Progress Webinar: Introduction to Wikidata for Librarians: Structuring Wikipedia and Beyond.” Produced by OCLC Research, 12 June 2018. MP4 video presentation, 1151. https://www.oclc.org/research/events/2018/06-12.html

47. Smith-Yoshimura, Karen. 2020. “Experimentations with Wikidata/Wikibase.” Hanging Together: The OCLC Research Blog. 18 June 2020. https://hangingtogether.org/?p=8002

48. Wikimedia. “WikiCite.” Home. https://meta.wikimedia.org/wiki/WikiCite

49. Smith-Yoshimura, Karen. 2016. “Impact of Identifiers on Authority Workflows.” Hanging Together: The OCLC Research Blog. 22 March 2016. https://hangingtogether.org/?p=5603

50. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

51. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” https://www.oclc.org/en/fast.html

52. Smith-Yoshimura, Karen. 2016. “Faceted Vocabularies.” Hanging Together: The OCLC Research Blog. 31 October 2016. https://hangingtogether.org/?p=5739

53. OCLC. 2020. “FAST.” (See note 51.)

54. OCLC. 2020. “FAST (Faceted Application of Subject Terminology).” Heading 3: FAST Policy and Outreach (FPOC) Committee. https://www.oclc.org/en/fast.html

55. Smith-Yoshimura, Karen. 2017. “Vocabulary Control Data in Discovery Environments.” Hanging Together: The OCLC Research Blog. 5 October 2017. https://hangingtogether.org/?p=6264

56. National Library, New Zealand Government. “Ngā Upoko Tukutuku / Māori Subject Headings.” http://mshupoko.natlib.govt.nz/mshupoko

AIATSIS. Pathways: Gateway to the AIATSIS Thesauri. “Pathways.” http://www1.aiatsis.gov.au

57. Deutsche Nationalbibliothek. 2019. “MACS - Multilingual Access to Subjects.” (Archived 13 Jan 2019). https://web.archive.org/web/20190113003823/https://www.dnb.de/EN/Wir/Kooperation/MACS/macs_node.html

58. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

59. Synaptica. “Ontology Management – Graphite.” https://www.synaptica.com/graphite

60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress: PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 543500. https://www.oclc.org/research/events/2017/10-19.html

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

69. Washburn, Bruce, and Jeff Mixter. 2015. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2015. MP4 video presentation, 574500. https://www.oclc.org/research/events/2015/08-12.html

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091

72. Jisc Library Services. n.d. “What Is ‘Plan M’.” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase

73. Grindley, Neil. 2019. “Plan M: Definition, Principles and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945

82. NLA. “Trove.” Archived Websites: Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD): Official Site. Accessed 21 September 2020. https://www.loc.gov/ead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory

111. NISO is about to standardize CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 583400. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC. https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395
T: 1-800-848-5878
T: +1-614-764-6000
F: +1-614-764-6096
www.oclc.org/research

ISBN: 978-1-55653-170-5
DOI: 10.25333/rqgd-b343
RM-PR-216787-WWAE-A4 2009



17 ORCID Connecting Research and Researchers ldquoWhat is Orcidrdquo Our Vision Accessed 19 September 2020 httpsorcidorgaboutwhat-is-orcidmission

18 See for example the list of signatories of journal publishers requiring ORCID IDs for authors ORCID ldquoORCID Open Letter - Publishersrdquo Accessed 19 September 2020 httpsorcidorgcontentrequiring-orcid-publication-workflows-open-letter

19 ISNI ldquoWhat is ISNIrdquo Accessed 19 September 2020 httpsisniorgpagewhat-is-isni

20 HathiTrust is a not-for-profit collaborative of academic and research libraries preserving more than 17 million digitized items See HathiTrust Digital Library ldquoWelcome to

Transitioning to the Next Generation of Metadata 37

HathtiTrustrdquo Accessed 19 September 2020 httpswwwhathitrustorgabout

21 GeoNames “Browse the Names” Accessed 19 September 2020 httpswwwgeonamesorg

22 Bryant Rebecca Annette Dortmund and Constance Malpas 2017 Convenience and Compliance Case Studies on Persistent Identifiers in European Research Information Dublin OH OCLC Research httpsdoiorg1025333C32K7M

23 ISNI currently holds 1102 million identities 1026 million individuals (of which 291 million are researchers) and 933039 organizations Statistics retrieved from ISNI See ISNI “Key Statistics” Accessed 5 May 2020 httpsisniorg

24 Library of Congress 2020 “NACO – Name Authority Cooperative Program” Documents and Updates Programs for Cataloging and Acquisitions (PCC) Accessed 19 September 2020 httpwwwlocgovabapccnacoindexhtml

25 Smith-Yoshimura Karen 2015 “Getting Identifiers Created for Legacy Names” Hanging Together The OCLC Research Blog 30 October 2015 httpshangingtogetherorgp=5463

26 Smith-Yoshimura Karen 2013 “Irreconcilable Differences Name Authority Control & Humanities Scholarship” Hanging Together The OCLC Research Blog 27 March 2013 httpshangingtogetherorgp=2621

27 Smith-Yoshimura Karen 2017 “Use Cases for Local Identifiers” Hanging Together The OCLC Research Blog 5 May 2017 httpshangingtogetherorgp=5938

28 OCLC Research 2020 “Registering Researchers in Authority Files” httpswwwoclcorgresearchthemesresearch-collectionsregistering-researchershtml

29 Smith-Yoshimura Karen Janifer Gatenby Grace Agnew Christopher Brown Kate Byrne Matt Carruthers Peter Fletcher Stephen Hearn Xiaoli Li Marina Muilwijk Chew Chiat Naun John Riemer Roderick Sadler Jing Wang Glen Wiley and Kayla Willey 2016 Addressing the Challenges with Organizational Identifiers and ISNI Dublin Ohio OCLC Research httpsdoiorg1025333C3FC9Q

30 Research Organization Registry (ROR) “About” httpsrororgabout

31 V M Abazov B Abbott B S Acharya M Adams T Adams J P Agnew G D Alexeev et al (2014) 2020 “Precision Measurement of the Top-Quark Mass in Lepton+jets Final States” (Archived 24 February 2020) ArXivorg 150107912 httpsarxivorgpdf14051756

32 Smith-Yoshimura Karen 2017 “How Much Metadata Is Practical” Hanging Together The OCLC Research Blog 14 November 2017 httpshangingtogetherorgp=6328

33 University of Minnesota 2020 “ExpertsMinnesota” Find Profiles httpsexpertsumneduenpersons or

University of Illinois at Urbana-Champaign 2020 “Illinois Experts” Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34 The National Institutes of Health (NIH) National Institute of Allergy and Infectious Diseases (NIAID) on 7 April 2020 mandates ORCIDs for training, fellowship, education, or career development awards in FY20. See NIH NIAID 2019 “ORCID iD Required for Some Encouraged for All” NIAID Funding News Last reviewed 7 August 2019 httpswwwniaidnihgovgrants-contractsorcid-id-required-some-encouraged-all

See also Lyrasis 2020 “SciENcv and ORCID to Streamline NIH and NSF Grant Applications” LyrasisNow (blog) 8 April 2020 httpslyrasisnoworgsciencv-and-orcid-to-streamline-nih-and-nsf-grant-applications

35 Smith-Yoshimura Karen 2016 “Metadata Reconciliation” Hanging Together The OCLC Research Blog 28 September 2016 httpshangingtogetherorgp=5710

36 Carruthers Matt (2014) 2020 mcarruthersLCNAF-Named-Entity-Reconciliation GitHub Repository httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37 Deliot Corine Steven Folsom Myung-Ja Han Nancy Lorimer Terry Reese and Adam Schiff 2019 Formulating and Obtaining URIs A Guide to Commonly used Vocabularies and Reference Sources Library of Congress PCC Task Group on URIs in MARC httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38 Smith-Yoshimura Karen 2019 “New Ways of Using and Enhancing Cataloging and Authority Records” Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=5710

39 Smith-Yoshimura Karen 2015 “Persistent Identifiers for Local Collections” Hanging Together The OCLC Research Blog 27 October 2015 httpshangingtogetherorgp=5445

40 See DOI examples in detail from DOI 2020 “DOI System Examples” Accessed 20 September 2020 httpswwwdoiorgdemoshtml and

See ARK examples in detail from Department Dallas (Tex ) Police 1963 “[Photographs of Identification Cards]” Collection University of North Texas The Portal to Texas History digital repository httpstexashistoryunteduark67531metapth346793

41 DataCite “Assign DOIs” httpsdataciteorgdoishtml

Wilkinson Laura J 2020 “Constructing your DOIs” Crossref The Crossref Curriculum Last updated 8 April 2020 httpswwwcrossreforgeducationmember-setupconstructing-your-dois

42 “Identity management” here reflects its usage among metadata specialists (See for example Library of Congress 2018 “Charge for PCC Task Group on Identity Management in NACO” 5 American Bar Association Program for Cooperative Cataloging Revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience for example identity access management as described in

Wikiwand “Identity Management” httpswwwwikiwandcomenIdentity_management

43 Smith-Yoshimura Karen 2018 “The Coverage of Identity Management Work” Hanging Together The OCLC Research Blog 8 October 2018 httpshangingtogetherorgp=6805


44 Smith-Yoshimura Karen 2017 “Beyond the Authorized Access Point” Hanging Together The OCLC Research Blog 10 October 2017 httpshangingtogetherorgp=6271

45 Smith-Yoshimura “Coverage of Identity Management” (See note 43)

46 Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez 2018 “Works in Progress Webinar Introduction to Wikidata for Librarians Structuring Wikipedia and Beyond” Produced by OCLC Research 12 June 2018 MP4 video presentation 1151 httpswwwoclcorgresearchevents201806-12html

47 Smith-Yoshimura Karen 2020 “Experimentations with Wikidata/Wikibase” Hanging Together The OCLC Research Blog 18 June 2020 httpshangingtogetherorgp=8002

48 Wikimedia “WikiCite” Home httpsmetawikimediaorgwikiWikiCite

49 Smith-Yoshimura Karen 2016 “Impact of Identifiers on Authority Workflows” Hanging Together The OCLC Research Blog 22 March 2016 httpshangingtogetherorgp=5603

50 Smith-Yoshimura Karen 2019 “Strategies for Alternate Subject Headings and Maintaining Subject Headings” Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

51 OCLC 2020 “FAST (Faceted Application of Subject Terminology)” httpswwwoclcorgenfasthtml

52 Smith-Yoshimura Karen 2016 “Faceted Vocabularies” Hanging Together The OCLC Research Blog 31 October 2016 httpshangingtogetherorgp=5739

53 OCLC 2020 “FAST” (See note 51)

54 OCLC 2020 “FAST (Faceted Application of Subject Terminology)” Heading 3 FAST Policy and Outreach (FPOC) Committee httpswwwoclcorgenfasthtml

55 Smith-Yoshimura Karen 2017 “Vocabulary Control Data in Discovery Environments” Hanging Together The OCLC Research Blog 5 October 2017 httpshangingtogetherorgp=6264

56 National Library New Zealand Government “Ngā Upoko Tukutuku Māori Subject Headings” httpmshupokonatlibgovtnzmshupoko

AIATSIS Pathways Gateway to the AIATSIS Thesauri “Pathways” httpwww1aiatsisgovau

57 Deutsche Nationalbibliothek 2019 “MACS - Multilingual Access to Subjects” (Archived 13 Jan 2019) httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58 Smith-Yoshimura Karen 2019 “Knowledge Organization Systems” Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

59 Synaptica “Ontology Management – Graphite” httpswwwsynapticacomgraphite


60 Smith-Yoshimura Karen 2018 “Are Distributed Models for Vocabulary Maintenance Viable” Hanging Together The OCLC Research Blog 12 April 2018 httpshangingtogetherorgp=6672

61 OCLC Research 2020 “Equity Diversity and Inclusion in the OCLC Research Library Partnership Survey” Overview Accessed 20 September 2020 httpswwwoclcorgresearchareascommunity-catalystsrlp-edihtml

62 Smith-Yoshimura Karen 2018 “Creating Metadata for Equity Diversity and Inclusion” Hanging Together The OCLC Research Blog 7 November 2018 httpshangingtogetherorgp=6833

63 Smith-Yoshimura “Distributed Models” (See note 60)

64 Smith-Yoshimura Karen 2019 “Strategies for Alternate Subject Headings and Maintaining Subject Headings” Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

65 Baxmeyer Jennifer Karen Coyle Joanna Dyla MJ Han Steven Folsom Phil Schreur and Tim Thompson 2017 Linked Data Infrastructure Models Areas of Focus for PCC Strategies Library of Congress PCC Linked Data Advisory Committee httpswwwlocgovabapccdocumentsLinkedDataInfrastructureModelspdf

66 Bone Christine Sharon Farnel Sheila Laroque and Brett Lougheed 2017 “Works in Progress Webinar Decolonizing Descriptions Finding Naming and Changing the Relationship between Indigenous People Libraries and Archives” Produced by OCLC Research 19 October 2017 MP4 video presentation 543500 httpswwwoclcorgresearchevents201710-19html

67 Smith-Yoshimura Karen 2015 “Shift to Linked Data for Production” Hanging Together The OCLC Research Blog 13 May 2015 httpshangingtogetherorgp=5195

68 Smith-Yoshimura Karen 2015 “Working in Shared Files” Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

69 Bruce Washburn and Jeff Mixter 2018 “Works in Progress Webinar Looking Inside the Library Knowledge Vault” Produced by OCLC Research 12 August 2018 MP4 video presentation 574500 httpswwwoclcorgresearchevents201508-12html

70 Smith-Yoshimura Karen 2019 “Systematic Reviews of Our Metadata” Hanging Together The OCLC Research Blog 10 April 2019 httpshangingtogetherorgp=7117

71 Smith-Yoshimura Karen 2015 “Working in Shared Files” Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

72 Jisc Library Services nd “What Is ‘Plan M’” Accessed 21 September 2020 httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura Karen 2020 “Knowledge Management and Metadata” Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

To learn more about the current phase of “Plan M” (May–November 2020) see Grindley Neil “Moving Plan M Forwards – We Need Your Help” Library Services (PlanM) (blog) Jisc 6 May 2020 httpslibraryservicesjiscinvolveorgwp202005planm_nextphase


73 Grindley Neil 2019 “Plan M Definition Principles and Direction” Jisc (Word docx) httplibraryservicesjiscinvolveorgwpfiles201912Plan-M-Definition-and-Direction-1docx

74 Dempsey Lorcan 2016 “Library Collections in the Life of the User Two Directions” LIBER Quarterly 26(4) 338–359 httpdoiorg1018352lq10170

75 Smith-Yoshimura Karen 2019 “Presenting Metadata from Different Sources in Discovery Layers” Hanging Together The OCLC Research Blog 16 April 2019 httpshangingtogetherorgp=7880

76 Smith-Yoshimura Karen 2017 “Metadata for Archival Collections” Hanging Together The OCLC Research Blog 30 May 2017 httpshangingtogetherorgp=5903

77 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Knudson Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Craig Thomas and Holly Tomren 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage 49-51 Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

78 The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at httpswwwoclcorgresearchpartnershipworking-groupsarchives-special-collections-linked-data-reviewhtml

79 Smith-Yoshimura Karen 2020 “Metadata Management in Times of Uncertainty” Hanging Together The OCLC Research Blog 15 June 2020 httpshangingtogetherorgp=7998

80 Smith-Yoshimura Karen 2016 “Metadata for Archived Websites” Hanging Together The OCLC Research Blog 14 March 2016 httpshangingtogetherorgp=5591

81 Archive-It 2008 “Human Rights” Columbia University Libraries Collection (Archived May 2008) httpsarchive-itorgcollections1068

Archive-It 2010 “New York City Places and Spaces” Columbia University Libraries Collection (Archived January 2010) httpsarchive-itorgcollections1757

Archive-It 2010 “Burke Library New York City Religions” Columbia University Libraries Collection (Archived May 2010) httpsarchive-itorgcollections1945

82 NLA “Trove” Archived Websites Sub Collections Accessed 20 September 2020 httpstrovenlagovauwebsite

83 Archive-It 2014 “Collaborative Architecture Urbanism and Sustainability Web Archive (CAUSEWAY)” Ivy Plus Libraries Confederation Collection (Archived June 2014) httpsarchive-itorgcollections4638

Archive-It 2013 “Contemporary Composers Web Archive (CCWA)” Ivy Plus Libraries Confederation Collection (Archived October 2013) httpsarchive-itorgcollections4019

NYARC New York Art Resources Consortium “Web Archiving” httpwwwnyarcorgcontentweb-archiving


84 OCLC Research 2020 “Web Archiving Metadata Working Group” The Problem Addressing the Problem Outputs httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 “Metadata for Audio and Videos” Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress “Standards” Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress “Standards” Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 “Assessing Needs of AV in Special Collections” Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 “Scale & Risk Discussing Challenges to Managing AV Collections in the RLP” Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 “Managing Metadata for Image Collections” Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress “Standards” Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94 Library of Congress “Standards” Metadata Authority Description Schema (MADS) Accessed 21 September 2020 httpwwwlocgovstandardsmads

95 Smith-Yoshimura Karen 2016 “Sharing Digital Collections Workflows” Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 “Europeana Innovation Pilots” Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the World’s Images “Home” Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 “OCLC ResearchWorks IIIF Explorer” httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 “CONTENTdm Linked Data Pilot” Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml


100 Smith-Yoshimura Karen 2015 “Data Management and Curation in 21st Century Archives – Part 1” 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 “Metadata for Research Data Management” Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 “Knowledge Management and Metadata” Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105 Faniel Ixchel M 2019 “Let’s Cook Up Some Metadata Consistency” Next (blog) OCLC 21 November 2019 httpwwwoclcorgblogmainlets-cook-up-some-metadata-consistency

106 NCI (National Computational Infrastructure) Australia “Home” Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) “Home” Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network “Home” Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a “collaboration advocating richer connected reusable open metadata for all research outputs” (httpwwwmetadata2020org). The Metadata 2020 Researcher Communications project is outlined here httpwwwmetadata2020orgprojectsresearcher-communications

109 Digital Curation Centre “Disciplinary Metadata” List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory “Metadata Standards Directory Working Group” GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI “CRediT – Contributor Roles Taxonomy” Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 “Data Services” httpwwwlibumicheduresearch-data-services

113 OCLC Research 2020 “The Realities of Research Data Management” Overview httpswwwoclcorgresearchpublications2017oclcresearch-research-data-managementhtml


114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115 Indiana University 2018 “IU will Lead $2 Million Partnership to Expand Access to Research Data IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries” News at IU (Science and Technology) Indiana University 18 October 2018 httpsnewsiuedustories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft 2020 “Microsoft Academic Graph” Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure” See Wittenberg Jamie and Valentin Pentchev “Works in Progress Webinar Democratizing Access to Large Datasets through Shared Infrastructure” Produced by OCLC Research 8 August 2019 MP4 video presentation 583400 httpswwwoclcorgresearchevents2019080819-democratizing-access-large-datasets-shared-infrastructurehtml

116 NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy Definitions and Recognition Badging Scheme Working Group | NISO Website” nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 “Services Built on Usage Metrics” Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 “Stacks Serials Search Engines and Students’ Success First-Year Undergraduate Students’ Library Use Academic Achievement and Retention” Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc “Library Impact Data Project (LIDP)” Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 “Alternatives to Statistics for Measuring Success and Value of Cataloging” Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 “Metadata Librarian Cornell University Library” DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 “Metadata Librarian” Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-librarymetadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 “Locate Items in the Library with StackMap” httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 “Home” httpswwwyewnocom


125 Smith-Yoshimura Karen 2019 “New Ways of Using and Enhancing Cataloging and Authority Records” Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) “Austlang National Codeathon” Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA “Trove” Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company – Hachette UK “River of Authors” Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 “Knowledge Organization Systems” Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 “Knowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Review” International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) “About SNAC” What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 “Knowledge Management and Metadata” Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives & Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM “About” Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 “Alternatives to Statistics for Measuring Success and Value of Cataloging” Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 “New Skill Sets for Metadata Management” Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation” Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646


139 Reese Terry 2018 “MarcEdit 2017 Usage Information” Terry’s Worklog (blog) 9 September 2020 httpblogreesetnetarchives2572

140 Reese Terry 2020 “Working with Linked Data In MarcEdit” MarcEdit Development (blog) Accessed 21 September 2020 httpsmarceditreesetnetworking-with-linked-data-in-marcedit

141 Reese Terry 2018 “MarcEdit Playlist” 139 YouTube videos Last updated 26 December 2018 httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura Karen 2017 “New Skill Sets for Metadata Management” Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

143 “XML and RDF-Based Systems Archives” nd Library Juice Academy (blog) Accessed 22 September 2020 httpslibraryjuiceacademycomcertificatexml-and-rdf-based-systems

Reese Terry 2013 “Tutorials” YouTube (selected) MarcEdit Development (blog) 14 March 2013 httpmarceditreesetnettutorials

“Lynda Online Courses Classes Training Tutorials” nd Lyndacom - from LinkedIn Learning Accessed 22 September 2020 httpswwwlyndacom

“Learn to Code - for Free” nd Codecademy Accessed 22 September 2020 httpswwwcodecademycom

Software Carpentry “Teaching Basic Lab Skills for Research Computing” Upcoming Workshops Accessed 22 September 2020 httpssoftware-carpentryorg

144 “Data on the Web Best Practices” nd Accessed 22 September 2020 httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist (2008) 2020 httpworkingontologistorg

145 Library Workflow Exchange nd “About” Accessed 21 September 2020 httpwwwlibraryworkflowexchangeorgabout

146 OCLC Developer Network 2020 “DevConnect Webinars” httpswwwoclcorgdevelopereventsdevconnect-workshopsenhtml

147 Smith-Yoshimura Karen 2019 “Stewardship of Professional FTEs In Metadata Work and Turnover” Hanging Together The OCLC Research Blog 18 October 2019 httpshangingtogetherorgp=7580

148 OCLC 2020 “WorldCat® OCLC and Linked Data” Shared Entity Management Infrastructure httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009


Transitioning to the Next Generation of Metadata 37

HathtiTrustrdquo Accessed 19 September 2020 httpswwwhathitrustorgabout

21 GeoNames ldquoBrowse the Namesrdquo Accessed 19 September 2020 httpswwwgeonamesorg

22 Bryant Rebecca Annette Dortmund and Constance Malpas 2017 Convenience and Compliance Case Studies on Persistent Identifiers in European Research Information Dublin OH OCLC Research httpsdoiorg1025333C32K7M

23 ISNI currently holds 1102 million identities 1026 million individuals (of which 291 million are researchers) and 933039 organizations Statistics retrieved from ISNI See ISNI ldquoKey Statisticsrdquo Accessed 5 May 2020 httpsisniorg

24 Library of Congress 2020 ldquoNACO ndash Name Authority Cooperative Programrdquo Documents and Updates Programs for Cataloging and Acquisitions (PCC) Accessed 19 September 2020 httpwwwlocgovabapccnacoindexhtml

25 Smith-Yoshimura Karen 2015 ldquoGetting identifiers Created for Legacy Namesrdquo Hanging Together The OCLC Research Blog 30 October 2015 httpshangingtogetherorgp=5463

26 Smith-Yoshimura Karen 2013 ldquoIrreconcilable Differences Name Authority Control amp Humanities Scholarshiprdquo Hanging Together The OCLC Research Blog 27 March 2013 httpshangingtogetherorgp=2621

27 Smith-Yoshimura Karen 2017 ldquoUse Cases for Local Identifiersrdquo Hanging Together The OCLC Research Blog 5 May 2017 httpshangingtogetherorgp=5938

28 OCLC Research 2020 ldquoRegistering Researchers in Authority Filesrdquo httpswwwoclcorgresearchthemesresearch-collectionsregistering-researchershtml

29 Smith-Yoshimura Karen Janifer Gatenby Grace Agnew Christopher Brown Kate Byrne Matt Carruthers Peter Fletcher Stephen Hearn Xiaoli Li Marina Muilwijk Chew Chiat Naun John Riemer Roderick Sadler Jing Wang Glen Wiley and Kayla Willey 2016 Addressing the Challenges with Organizational Identifiers and ISNI Dublin Ohio OCLC Research httpsdoiorg1025333C3FC9Q

30 Research Organization Registry (ROR) ldquoAboutrdquo httpsrororgabout

31 V M Abazov B Abbott B S Acharya M Adams T Adams J P Agnew G D Alexeev et al (2014) 2020 ldquoPrecision Measurement of the Top-Quark Mass in Lepton+jets Final Statesrdquo (Archived 24 February 2020) ArXivorg 150107912 httpsarxivorgpdf14051756

32 Smith-Yoshimura Karen 2017 ldquoHow Much Metadata Is Practicalrdquo Hanging Together The OCLC Research Blog 14 November 2017 httpshangingtogetherorgp=6328

33 University of Minnesota 2020 ldquoExpertsMinnesotardquo Find Profiles httpsexpertsumneduenpersons or

University of Illinois at Urbana-Champaign 2020 ldquoIllinois Expertsrdquo Find U of I Research View Scholarly Works and Discover New Collaborators httpsexpertsillinoisedu

34 The National Institute of Health (NIH) National Institute of Allergy and Infectious Diseases

38 Transitioning to the Next Generation of Metadata

(NIAID) on 7 April 2020 mandates ORCIDs for training fellowship education or career development awards in FY20 See NIH NIAID 2019 ldquoORCID iD Required for Some Encouraged for Allrdquo NIAID Funding News Last reviewed 7 August 2019 httpswwwniaidnihgovgrants-contractsorcid-id-required-some-encouraged-all

See also Lyrasis 2020 ldquoSciENcv and ORCID to Streamline NIH and NSF Grant Applicationsrdquo LyrasisNow (blog) 8 April 2020 httpslyrasisnoworgsciencv-and-orcid-to -streamline-nih-and-nsf-grant-applications

35 Smith-Yoshimura Karen 2016 ldquoMetadata Reconciliationrdquo Hanging Together The OCLC Research Blog 28 September 2016 httpshangingtogetherorgp=5710

36 Carruthers Matt (2014) 2020 mcarruthersLCNAF-Named-Entity-Reconciliation GitHub Repository httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37 Deliot Corine Steven Folsom Myung-Ja Han Nancy Lorimer Terry Reese and Adam Schiff 2019 Formulating and Obtaining URIs A Guide to Commonly used Vocabularies and Reference Sources Library of Congress PCC Task Group on URIs in MARC httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38 Smith-Yoshimura Karen 2019 “New Ways of Using and Enhancing Cataloging and Authority Records” Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=5710

39 Smith-Yoshimura Karen 2015 “Persistent Identifiers for Local Collections” Hanging Together The OCLC Research Blog 27 October 2015 httpshangingtogetherorgp=5445

40 See DOI examples in detail from DOI 2020 “DOI System Examples” Accessed 20 September 2020 httpswwwdoiorgdemoshtml and

See ARK examples in detail from Dallas (Tex) Police Department 1963 “[Photographs of Identification Cards]” Collection University of North Texas The Portal to Texas History digital repository httpstexashistoryunteduark67531metapth346793

41 DataCite “Assign DOIs” httpsdataciteorgdoishtml

Wilkinson Laura J 2020 “Constructing your DOIs” Crossref The Crossref Curriculum Last updated 8 April 2020 httpswwwcrossreforgeducationmember-setupconstructing-your-dois

42 “Identity management” here reflects its usage among metadata specialists (See for example Library of Congress 2018 “Charge for PCC Task Group on Identity Management in NACO” 5 Acquisitions and Bibliographic Access Program for Cooperative Cataloging Revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience for example identity access management as described in Wikiwand “Identity Management” httpswwwwikiwandcomenIdentity_management

43 Smith-Yoshimura Karen 2018 “The Coverage of Identity Management Work” Hanging Together The OCLC Research Blog 8 October 2018 httpshangingtogetherorgp=6805

44 Smith-Yoshimura Karen 2017 “Beyond the Authorized Access Point” Hanging Together The OCLC Research Blog 10 October 2017 httpshangingtogetherorgp=6271

45 Smith-Yoshimura “Coverage of Identity Management” (See note 43)

46 Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez 2018 “Works in Progress Webinar Introduction to Wikidata for Librarians Structuring Wikipedia and Beyond” Produced by OCLC Research 12 June 2018 MP4 video presentation 1151 httpswwwoclcorgresearchevents201806-12html

47 Smith-Yoshimura Karen 2020 “Experimentations with WikidataWikibase” Hanging Together The OCLC Research Blog 18 June 2020 httpshangingtogetherorgp=8002

48 Wikimedia “WikiCite” Home httpsmetawikimediaorgwikiWikiCite

49 Smith-Yoshimura Karen 2016 “Impact of Identifiers on Authority Workflows” Hanging Together The OCLC Research Blog 22 March 2016 httpshangingtogetherorgp=5603

50 Smith-Yoshimura Karen 2019 “Strategies for Alternate Subject Headings and Maintaining Subject Headings” Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

51 OCLC 2020 “FAST (Faceted Application of Subject Terminology)” httpswwwoclcorgenfasthtml

52 Smith-Yoshimura Karen 2016 “Faceted Vocabularies” Hanging Together The OCLC Research Blog 31 October 2016 httpshangingtogetherorgp=5739

53 OCLC 2020 “FAST” (See note 51)

54 OCLC 2020 “FAST (Faceted Application of Subject Terminology)” Heading 3 FAST Policy and Outreach (FPOC) Committee httpswwwoclcorgenfasthtml

55 Smith-Yoshimura Karen 2017 “Vocabulary Control Data in Discovery Environments” Hanging Together The OCLC Research Blog 5 October 2017 httpshangingtogetherorgp=6264

56 National Library New Zealand Government “Ngā Upoko Tukutuku Māori Subject Headings” httpmshupokonatlibgovtnzmshupoko

AIATSIS Pathways Gateway to the AIATSIS Thesauri “Pathways” httpwww1aiatsisgovau

57 Deutsche Nationalbibliothek 2019 “MACS - Multilingual Access to Subjects” (Archived 13 Jan 2019) httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58 Smith-Yoshimura Karen 2019 “Knowledge Organization Systems” Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

59 Synaptica “Ontology Management – Graphite” httpswwwsynapticacomgraphite

60 Smith-Yoshimura Karen 2018 “Are Distributed Models for Vocabulary Maintenance Viable” Hanging Together The OCLC Research Blog 12 April 2018 httpshangingtogetherorgp=6672

61 OCLC Research 2020 “Equity Diversity and Inclusion in the OCLC Research Library Partnership Survey” Overview Accessed 20 September 2020 httpswwwoclcorgresearchareascommunity-catalystsrlp-edihtml

62 Smith-Yoshimura Karen 2018 “Creating Metadata for Equity Diversity and Inclusion” Hanging Together The OCLC Research Blog 7 November 2018 httpshangingtogetherorgp=6833

63 Smith-Yoshimura “Distributed Models” (See note 60)

64 Smith-Yoshimura Karen 2019 “Strategies for Alternate Subject Headings and Maintaining Subject Headings” Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

65 Baxmeyer Jennifer Karen Coyle Joanna Dyla MJ Han Steven Folsom Phil Schreur and Tim Thompson 2017 Linked Data Infrastructure Models Areas of Focus for PCC Strategies Library of Congress PCC Linked Data Advisory Committee httpswwwlocgovabapccdocumentsLinkedDataInfrastructureModelspdf

66 Bone Christine Sharon Farnel Sheila Laroque and Brett Lougheed 2017 “Works in Progress Webinar Decolonizing Descriptions Finding Naming and Changing the Relationship between Indigenous People Libraries and Archives” Produced by OCLC Research 19 October 2017 MP4 video presentation 543500 httpswwwoclcorgresearchevents201710-19html

67 Smith-Yoshimura Karen 2015 “Shift to Linked Data for Production” Hanging Together The OCLC Research Blog 13 May 2015 httpshangingtogetherorgp=5195

68 Smith-Yoshimura Karen 2015 “Working in Shared Files” Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

69 Bruce Washburn and Jeff Mixter 2018 “Works in Progress Webinar Looking Inside the Library Knowledge Vault” Produced by OCLC Research 12 August 2018 MP4 video presentation 574500 httpswwwoclcorgresearchevents201508-12html

70 Smith-Yoshimura Karen 2019 “Systematic Reviews of Our Metadata” Hanging Together The OCLC Research Blog 10 April 2019 httpshangingtogetherorgp=7117

71 Smith-Yoshimura Karen 2015 “Working in Shared Files” Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

72 Jisc Library Services nd “What Is ‘Plan M’” Accessed 21 September 2020 httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura Karen 2020 “Knowledge Management and Metadata” Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

To learn more about the current phase of “Plan M” (May–November 2020) see Grindley Neil “Moving Plan M Forwards – We Need Your Help” Library Services (PlanM) (blog) Jisc 6 May 2020 httpslibraryservicesjiscinvolveorgwp202005planm_nextphase

73 Grindley Neil 2019 “Plan M Definition Principles and Direction” Jisc (Word docx) httplibraryservicesjiscinvolveorgwpfiles201912Plan-M-Definition-and-Direction-1docx

74 Dempsey Lorcan 2016 “Library Collections in the Life of the User Two Directions” LIBER Quarterly 26(4) 338–359 httpdoiorg1018352lq10170

75 Smith-Yoshimura Karen 2019 “Presenting Metadata from Different Sources in Discovery Layers” Hanging Together The OCLC Research Blog 16 April 2019 httpshangingtogetherorgp=7880

76 Smith-Yoshimura Karen 2017 “Metadata for Archival Collections” Hanging Together The OCLC Research Blog 30 May 2017 httpshangingtogetherorgp=5903

77 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Knudson Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Craig Thomas and Holly Tomren 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage 49-51 Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

78 The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at httpswwwoclcorgresearchpartnershipworking-groupsarchives-special-collections-linked-data-reviewhtml

79 Smith-Yoshimura Karen 2020 “Metadata Management in Times of Uncertainty” Hanging Together The OCLC Research Blog 15 June 2020 httpshangingtogetherorgp=7998

80 Smith-Yoshimura Karen 2016 “Metadata for Archived Websites” Hanging Together The OCLC Research Blog 14 March 2016 httpshangingtogetherorgp=5591

81 Archive-It 2008 “Human Rights” Columbia University Libraries Collection (Archived May 2008) httpsarchive-itorgcollections1068

Archive-It 2010 “New York City Places and Spaces” Columbia University Libraries Collection (Archived January 2010) httpsarchive-itorgcollections1757

Archive-It 2010 “Burke Library New York City Religions” Columbia University Libraries Collection (Archived May 2010) httpsarchive-itorgcollections1945

82 NLA “Trove” Archived Websites Sub Collections Accessed 20 September 2020 httpstrovenlagovauwebsite

83 Archive-It 2014 “Collaborative Architecture Urbanism and Sustainability Web Archive (CAUSEWAY)” Ivy Plus Libraries Confederation Collection (Archived June 2014) httpsarchive-itorgcollections4638

Archive-It 2013 “Contemporary Composers Web Archive (CCWA)” Ivy Plus Libraries Confederation Collection (Archived October 2013) httpsarchive-itorgcollections4019

NYARC New York Art Resources Consortium “Web Archiving” httpwwwnyarcorgcontentweb-archiving

84 OCLC Research 2020 “Web Archiving Metadata Working Group” The Problem Addressing the Problem Outputs httpswwwoclcorgresearchthemesresearch-collectionswamhtml

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 “Metadata for Audio and Videos” Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress “Standards” Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress “Standards” Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 “Assessing Needs of AV in Special Collections” Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 “Scale & Risk Discussing Challenges to Managing AV Collections in the RLP” Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 “Managing Metadata for Image Collections” Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress “Standards” Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94 Library of Congress “Standards” Metadata Authority Description Schema (MADS) Accessed 21 September 2020 httpwwwlocgovstandardsmads

95 Smith-Yoshimura Karen 2016 “Sharing Digital Collections Workflows” Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 “Europeana Innovation Pilots” Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the World’s Images “Home” Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 “OCLC ResearchWorks IIIF Explorer” httpswwwoclcorgresearchthemesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 “CONTENTdm Linked Data Pilot” Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

100 Smith-Yoshimura Karen 2015 “Data Management and Curation in 21st Century Archives – Part 1” 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 “Metadata for Research Data Management” Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 “Knowledge Management and Metadata” Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105 Faniel Ixchel M 2019 “Let’s Cook Up Some Metadata Consistency” Next (blog) OCLC 21 November 2019 httpwwwoclcorgblogmainlets-cook-up-some-metadata-consistency

106 NCI (National Computational Infrastructure) Australia “Home” Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) “Home” Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network “Home” Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a “collaboration advocating richer connected reusable open metadata for all research outputs” (httpwwwmetadata2020org) The Metadata 2020 Researcher Communications project is outlined here httpwwwmetadata2020orgprojectsresearcher-communications

109 Digital Curation Centre “Disciplinary Metadata” List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory “Metadata Standards Directory Working Group” GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy) a standard. CRediT identifies 14 roles describing each contributor’s specific contribution to the scholarly output. It was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI “CRediT – Contributor Roles Taxonomy” Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 “Data Services” httpwwwlibumicheduresearch-data-services

113 OCLC Research 2020 “The Realities of Research Data Management” Overview httpswwwoclcorgresearchpublications2017oclcresearch-research-data-managementhtml

114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115 Indiana University 2018 “IU will Lead $2 Million Partnership to Expand Access to Research Data IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries” News at IU (Science and Technology) Indiana University 18 October 2018 httpsnewsiuedustories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft 2020 “Microsoft Academic Graph” Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure” See Wittenberg Jamie and Valentin Pentchev “Works in Progress Webinar Democratizing Access to Large Datasets through Shared Infrastructure” Produced by OCLC Research 8 August 2019 MP4 video presentation 583400 httpswwwoclcorgresearchevents2019080819-democratizing-access-large-datasets-shared-infrastructurehtml

116 NISO’s Reproducibility Badging and Definitions now out for public comment may also help researchers extend the benefit of their research to others See “Taxonomy Definitions and Recognition Badging Scheme Working Group | NISO Website” nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 “Services Built on Usage Metrics” Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 “Stacks Serials Search Engines and Students’ Success First-Year Undergraduate Students’ Library Use Academic Achievement and Retention” Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc “Library Impact Data Project (LIDP)” Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 “Alternatives to Statistics for Measuring Success and Value of Cataloging” Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 “Metadata Librarian Cornell University Library” DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 “Metadata Librarian” Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-librarymetadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 “Locate Items in the Library with StackMap” httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 “Home” httpswwwyewnocom

125 Smith-Yoshimura Karen 2019 “New Ways of Using and Enhancing Cataloging and Authority Records” Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) “Austlang National Codeathon” Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA “Trove” Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company – Hachette UK “River of Authors” Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 “Knowledge Organization Systems” Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 “Knowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Review” International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) “About SNAC” What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 “Knowledge Management and Metadata” Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives & Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAM’s mission is to organize share and elevate knowledge about and use of artificial intelligence by libraries archives and museums It was founded in 2018 inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs

See AI4LAM “About” Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 “Alternatives to Statistics for Measuring Success and Value of Cataloging” Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 “New Skill Sets for Metadata Management” Hanging Together The OCLC Research Blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation” Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646

139 Reese Terry 2018 “MarcEdit 2017 Usage Information” Terry’s Worklog (blog) 9 September 2020 httpblogreesetnetarchives2572

140 Reese Terry 2020 “Working with Linked Data In MarcEdit” MarcEdit Development (blog) Accessed 21 September 2020 httpsmarceditreesetnetworking-with-linked-data-in-marcedit

141 Reese Terry 2018 “MarcEdit Playlist” 139 YouTube videos Last updated 26 December 2018 httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura Karen 2017 “New Skill Sets for Metadata Management” Hanging Together The OCLC Research Blog 17 April 2017 httpshangingtogetherorgp=5929

143 “XML and RDF-Based Systems Archives” nd Library Juice Academy (blog) Accessed 22 September 2020 httpslibraryjuiceacademycomcertificatexml-and-rdf-based-systems

Reese Terry 2013 “Tutorials” YouTube (selected) MarcEdit Development (blog) 14 March 2013 httpmarceditreesetnettutorials

“Lynda Online Courses Classes Training Tutorials” nd Lyndacom - from LinkedIn Learning Accessed 22 September 2020 httpswwwlyndacom

“Learn to Code - for Free” nd Codecademy Accessed 22 September 2020 httpswwwcodecademycom

Software Carpentry “Teaching Basic Lab Skills for Research Computing” Upcoming Workshops Accessed 22 September 2020 httpssoftware-carpentryorg

144 “Data on the Web Best Practices” nd Accessed 22 September 2020 httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist (2008) 2020 httpworkingontologistorg

145 Library Workflow Exchange nd “About” Accessed 21 September 2020 httpwwwlibraryworkflowexchangeorgabout

146 OCLC Developer Network 2020 “DevConnect Webinars” httpswwwoclcorgdevelopereventsdevconnect-workshopsenhtml

147 Smith-Yoshimura Karen 2019 “Stewardship of Professional FTEs In Metadata Work and Turnover” Hanging Together The OCLC Research Blog 18 October 2019 httpshangingtogetherorgp=7580

148 OCLC 2020 “WorldCat® OCLC and Linked Data” Shared Entity Management Infrastructure httpswwwoclcorgenworldcatlinked-datashared-entity-management-infrastructurehtml

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395 T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009

38 Transitioning to the Next Generation of Metadata

(NIAID) on 7 April 2020 mandates ORCIDs for training fellowship education or career development awards in FY20 See NIH NIAID 2019 ldquoORCID iD Required for Some Encouraged for Allrdquo NIAID Funding News Last reviewed 7 August 2019 httpswwwniaidnihgovgrants-contractsorcid-id-required-some-encouraged-all

See also Lyrasis 2020 ldquoSciENcv and ORCID to Streamline NIH and NSF Grant Applicationsrdquo LyrasisNow (blog) 8 April 2020 httpslyrasisnoworgsciencv-and-orcid-to -streamline-nih-and-nsf-grant-applications

35 Smith-Yoshimura Karen 2016 ldquoMetadata Reconciliationrdquo Hanging Together The OCLC Research Blog 28 September 2016 httpshangingtogetherorgp=5710

36 Carruthers Matt (2014) 2020 mcarruthersLCNAF-Named-Entity-Reconciliation GitHub Repository httpsgithubcommcarruthersLCNAF-Named-Entity-Reconciliation

37 Deliot Corine Steven Folsom Myung-Ja Han Nancy Lorimer Terry Reese and Adam Schiff 2019 Formulating and Obtaining URIs A Guide to Commonly used Vocabularies and Reference Sources Library of Congress PCC Task Group on URIs in MARC httpswwwlocgovabapccbibframeTaskGroupsformulate_obtain_URI_guidepdf

38 Smith-Yoshimura Karen 2019 ldquoNew Ways of Using and Enhancing Cataloging and Authority Recordsrdquo Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=5710

39 Smith-Yoshimura Karen 2015 ldquoPersistent Identifiers for Local Collectionsrdquo Hanging Together The OCLC Research Blog 27 October 2015 httpshangingtogetherorgp=5445

40 See DOI examples in detail from DOI 2020 ldquoDOI System Examplesrdquo Accessed 20 September 2020 httpswwwdoiorgdemoshtml and

See ARK examples in detail from Department Dallas (Tex ) Police 1963 ldquo[Photographs of Identification Cards]rdquo Collection University of North Texas The Portal to Texas History digital repository httpstexashistoryunteduark67531metapth346793

41 DataCite ldquoAssign DOIsrdquo httpsdataciteorgdoishtml

Wilkinson Laura J 2020 ldquoConstructing your DOIsrdquo Crossref The Crossref Curriculum Last updated 8 April 2020 httpswwwcrossreforgeducationmember-setupconstructing

-your-dois

42 ldquoIdentity managementrdquo here reflects its usage among metadata specialists (See for example Library of Congress 2018 ldquoCharge for PCC Task Group on Identity Management in NACOrdquo 5 American Bar Association Program for Cooperative Cataloging Revised 22 May 2018 httpswwwlocgovabapcctaskgroupPCC-TG-Identity-Management-in-NACO-rev2018-05-22pdf) But the term has other meanings depending on the audience for example identity access management as described in

Wikiwand ldquoIdentity Managementrdquo httpswwwwikiwandcomenIdentity_management

43 Smith-Yoshimura Karen 2018 ldquoThe Coverage of Identity Management Workrdquo Hanging Together The OCLC Research Blog 8 October 2018 httpshangingtogetherorgp=6805

Transitioning to the Next Generation of Metadata 39

44 Smith-Yoshimura Karen 2017 ldquoBeyond the Authorized Access Point Hanging Together The OCLC Research Blog 10 October 2017 httpshangingtogetherorgp=6271

45 Smith-Yoshimura ldquoCoverage of Identity Managementrdquo (See note 43)

46 Watch the highly-rated Webinar by Andrew Lih and Robert Fernandez 2018 ldquoWorks in Progress Webinar Introduction to Wikidata for Librarians Structuring Wikipedia and Beyondrdquo Produced by OCLC Research 12 June 2018 MP4 video presentation 1151 httpswwwoclcorgresearchevents201806-12html

47 Smith-Yoshimura Karen 2020 ldquoExperimentations with WikidataWikibase Hanging Together The OCLC Research Blog 18 June 2020 httpshangingtogetherorgp=8002

48 Wikimedia ldquoWikiCiterdquo Home httpsmetawikimediaorgwikiWikiCite

49 Smith-Yoshimura Karen 2016 ldquoImpact of Identifiers on Authority Workflows Hanging Together The OCLC Research Blog 22 March 2016 httpshangingtogetherorgp=5603

50 Smith-Yoshimura Karen 2019 ldquoStrategies for Alternate Subject Headings and Maintaining Subject Headings Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

51 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo httpswwwoclcorgenfasthtml

52 Smith-Yoshimura Karen 2016 ldquoFaceted Vocabulariesrdquo Hanging Together The OCLC Research Blog 31 October 2016 httpshangingtogetherorgp=5739

53 OCLC 2020 ldquoFASTrdquo (See note 51)

54 OCLC 2020 ldquoFAST (Faceted Application of Subject Terminology)rdquo Heading 3 FAST Policy and Outreach (FPOC) Committee httpswwwoclcorgenfasthtml

55 Smith-Yoshimura Karen 2017 ldquoVocabulary Control Data in Discovery Environmentsrdquo Hanging Together The OCLC Research Blog 5 October 2017 httpshangingtogetherorgp=6264

56 National Library New Zealand Government ldquoNgā Upoko Tukutuku Māori Subject Headingsrdquo httpmshupokonatlibgovtnzmshupoko

AIATSIS Pathways Gateway to the AIATSIS Thesauri ldquoPathwaysrdquo httpwww1aiatsisgovau

57 Deutsche Nationalbibliothek 2019 ldquoMACS - Multilingual Access to Subjectsrdquo (Archived 13 Jan 2019) httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58 Smith-Yoshimura Karen 2019 ldquoKnowledge Organization Systemsrdquo Hanging Together The OCLC Research Blog 17 March 2019 httpshangingtogetherorgp=7135

59 Synaptica ldquoOntology Management ndash Graphiterdquo httpswwwsynapticacomgraphite

40 Transitioning to the Next Generation of Metadata

60 Smith-Yoshimura Karen 2018 ldquoAre Distributed Models for Vocabulary Maintenance Viablerdquo Hanging Together The OCLC Research Blog 12 April 2018 httpshangingtogetherorgp=6672

61 OCLC Research 2020 ldquoEquity Diversity and Inclusion in the OCLC Research Library Partnership Surveyrdquo Overview Accessed 20 September 2020 httpswwwoclcorg researchareascommunity-catalystsrlp-edihtml

62 Smith-Yoshimura Karen 2018 ldquoCreating Metadata for Equity Diversity and Inclusionrdquo Hanging Together The OCLC Research Blog 7 November 2018 httpshangingtogetherorgp=6833

63 Smith-Yoshimura ldquoDistributed Modelsrdquo (See note 60)

64 Smith-Yoshimura Karen 2019 ldquoStrategies for Alternate Subject Headings and Maintaining Subject Headingsrdquo Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

65 Baxmeyer Jennifer Karen Coyle Joanna Dyla MJ Han Steven Folsom Phil Schreur and Tim Thompson 2017 Linked Data Infrastructure Models Areas of Focus for PCC Strategies Library of Congress PCC Linked Data Advisory Committee httpswwwlocgovabapcc documentsLinkedDataInfrastructureModelspdf

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35.00. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 57:45.00. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog). Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.

73. Grindley, Neil. 2019. “Plan M: Definition, Principles and Direction.” Jisc (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26 (4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744.

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921.

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io.

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html.

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html.

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdmguide.html.

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology), Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34.00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections, please visit oc.lc/digitizing.

6565 Kilgour Place, Dublin, Ohio 43017-3395 T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 www.oclc.org/research

ISBN 978-1-55653-170-5 DOI 10.25333/rqgd-b343 RM-PR-216787-WWAE-A4 2009

55 Smith-Yoshimura Karen 2017 ldquoVocabulary Control Data in Discovery Environmentsrdquo Hanging Together The OCLC Research Blog 5 October 2017 httpshangingtogetherorgp=6264

56 National Library New Zealand Government ldquoNgā Upoko Tukutuku Māori Subject Headingsrdquo httpmshupokonatlibgovtnzmshupoko

AIATSIS Pathways Gateway to the AIATSIS Thesauri ldquoPathwaysrdquo httpwww1aiatsisgovau

57 Deutsche Nationalbibliothek 2019 ldquoMACS - Multilingual Access to Subjectsrdquo (Archived 13 Jan 2019) httpswebarchiveorgweb20190113003823httpswwwdnbdeENWirKooperationMACSmacs_nodehtml

58 Smith-Yoshimura Karen 2019 ldquoKnowledge Organization Systemsrdquo Hanging Together The OCLC Research Blog 17 March 2019 httpshangingtogetherorgp=7135

59 Synaptica ldquoOntology Management ndash Graphiterdquo httpswwwsynapticacomgraphite

40 Transitioning to the Next Generation of Metadata

60 Smith-Yoshimura Karen 2018 ldquoAre Distributed Models for Vocabulary Maintenance Viablerdquo Hanging Together The OCLC Research Blog 12 April 2018 httpshangingtogetherorgp=6672

61 OCLC Research 2020 ldquoEquity Diversity and Inclusion in the OCLC Research Library Partnership Surveyrdquo Overview Accessed 20 September 2020 httpswwwoclcorg researchareascommunity-catalystsrlp-edihtml

62 Smith-Yoshimura Karen 2018 ldquoCreating Metadata for Equity Diversity and Inclusionrdquo Hanging Together The OCLC Research Blog 7 November 2018 httpshangingtogetherorgp=6833

63 Smith-Yoshimura ldquoDistributed Modelsrdquo (See note 60)

64 Smith-Yoshimura Karen 2019 ldquoStrategies for Alternate Subject Headings and Maintaining Subject Headingsrdquo Hanging Together The OCLC Research Blog 29 October 2019 httpshangingtogetherorgp=7591

65 Baxmeyer Jennifer Karen Coyle Joanna Dyla MJ Han Steven Folsom Phil Schreur and Tim Thompson 2017 Linked Data Infrastructure Models Areas of Focus for PCC Strategies Library of Congress PCC Linked Data Advisory Committee httpswwwlocgovabapcc documentsLinkedDataInfrastructureModelspdf

66 Bone Christine Sharon Farnel Sheila Laroque and Brett Lougheed 2017 ldquoWorks in Progress Webinar Decolonizing Descriptions Finding Naming and Changing the Relationship between Indigenous People Libraries and Archives ldquo Produced by OCLC Research 19 October 2017 MP4 video presentation 543500 httpswwwoclcorg researchevents201710-19html

67 Smith-Yoshimura Karen 2015 ldquoShift to Linked Data for Productionrdquo Hanging Together The OCLC Research Blog 13 May 2015 httpshangingtogetherorgp=5195

68 Smith-Yoshimura Karen 2015 ldquoWorking in Shared Filesrdquo Hanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

69 Bruce Washburn and Jeff Mixter 2018 ldquoWorks in Progress Webinar Looking Inside the Library Knowledge Vaultrdquo Produced by OCLC Research 12 August 2018 MP4 video presentation 574500 httpswwwoclcorgresearchevents201508-12html

70 Smith-Yoshimura Karen 2019 Systematic Reviews of Our Metadata Hanging Together The OCLC Research Blog 10 April 2019 httpshangingtogetherorgp=7117

71 Smith-Yoshimura Karen 2015 ldquoWorking in Shared File rdquoHanging Together The OCLC Research Blog 7 April 2015 httpshangingtogetherorgp=5091

72 Jisc Library Services nd ldquoWhat Is lsquoPlan Mrsquordquo Accessed 21 September 2020 httpslibraryservicesjiscinvolveorgwp201912plan-m

Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

To learn more about the current phase of ldquoPlan Mrdquo (MayndashNovember 2020) see Grindley Neil ldquoMoving Plan M Forwards ndash We Need Your Helprdquo Library Services (PlanM) (blog) Jisc 6 May 2020 httpslibraryservicesjiscinvolveorgwp202005planm_nextphase

Transitioning to the Next Generation of Metadata 41

73 Grindley Neil 2019 ldquoPlan M Definition Principles and Directionrdquo Jisc (Word docx) httplibraryservicesjiscinvolveorgwpfiles201912Plan-M-Definition-and

-Direction-1docx

74 Dempsey Lorcan 2016 ldquoLibrary Collections in the Life of the User Two Directionsrdquo LIBER Quarterly 26(4) 338ndash359 httpdoiorg1018352lq10170

75 Smith-Yoshimura Karen 2019 ldquoPresenting Metadata from Different Sources in Discovery Layers Hanging Together The OCLC Research Blog 16 April 2019 httpshangingtogetherorgp=7880

76 Smith-Yoshimura Karen 2017 ldquoMetadata for Archival Collectionsrdquo Hanging Together The OCLC Research Blog 30 May 2017 httpshangingtogetherorgp=5903

77 Godby Jean Karen Smith-Yoshimura Bruce Washburn Kalan Knudson Davis Karen Detling Christine Fernsebner Eslao Steven Folsom Xiaoli Li Marc McGee Karen Miller Honor Moody Craig Thomas and Holly Tomren 2019 Creating Library Linked Data with Wikibase Lessons Learned from Project Passage 49-51 Dublin OH OCLC Research httpsdoiorg1025333faq3-ax08

78 The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at httpswwwoclcorgresearchpartnershipworking-groups archives-special-collections-linked-data-reviewhtml

79 Smith-Yoshimura Karen 2020 ldquoMetadata Management in Times of Uncertaintyrdquo Hanging Together The OCLC Research Blog 15 June 2020 httpshangingtogetherorgp=7998

80 Smith-Yoshimura Karen 2016 ldquoMetadata for Archived Websitesrdquo Hanging Together The OCLC Research Blog 14 March 2016 httpshangingtogetherorgp=5591

81 Archive-It 2008 ldquoHuman Rightsrdquo Columbia University Libraries Collection (Archived May 2008) httpsarchive-itorgcollections1068

Archive-It 2010 ldquoNew York City Places and Spacesrdquo Columbia University Libraries Collection (Archived January 2010) httpsarchive-itorgcollections1757

Archive-It 2010 ldquoBurke Library New York City Religionsrdquo Columbia University Libraries Collection (Archived May 2010) httpsarchive-itorgcollections1945

82 NLA ldquoTroverdquo Archived Websites Sub Collections Accessed 20 September 2020 httpstrovenlagovauwebsite

83 Archive-It 2014 ldquoCollaborative Architecture Urbanism and Sustainability Web Archive (CAUSEWAY)rdquo Ivy Plus Libraries Confederation Collection (Archived June 2014) httpsarchive-itorgcollections4638

Archive-It 2013 ldquoContemporary Composers Web Archive (CCWA)rdquo Ivy Plus Libraries Confederation Collection (Archived October 2013) httpsarchive-itorgcollections4019

NYARC New York Art Resources Consortium ldquoWeb Archivingrdquo httpwwwnyarcorg contentweb-archiving

42 Transitioning to the Next Generation of Metadata

84 OCLC Research 2020 ldquoWeb Archiving Metadata Working Grouprdquo The Problem Addressing the Problem Outputs httpswwwoclcorgresearchthemesresearch -collectionswamhtml

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 ldquoMetadata for Audio and Videosrdquo Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress ldquoStandardsrdquo Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress ldquoStandardsrdquo Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 ldquoAssessing Needs of AV in Special Collectionsrdquo Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 ldquoScale amp Risk Discussing Challenges to Managing AV Collections in the RLPrdquo Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 ldquoManaging Metadata for Image Collectionsrdquo Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress ldquoStandardsrdquo Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94 Library of Congress ldquoStandardsrdquo Metadata Authority Description Schema (MADS)rdquo Accessed 21 September 2020 httpwwwlocgovstandardsmads

95 Smith-Yoshimura Karen 2016 ldquoSharing Digital Collections Workflowsrdquo Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 ldquoEuropeana Innovation Pilotsrdquo Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the Worldrsquos Images ldquoHomerdquo Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 ldquoOCLC ResearchWorks IIIF Explorerrdquo httpswwwoclcorgresearch themesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 ldquoCONTENTdm Linked Data Pilotrdquo Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

Transitioning to the Next Generation of Metadata 43

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog), OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.


114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology), Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.


125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs. See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646.


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395
T: 1-800-848-5878 T: +1-614-764-6000 F: +1-614-764-6096 www.oclc.org/research

ISBN: 978-1-55653-170-5 DOI: 10.25333/rqgd-b343 RM-PR-216787-WWAE-A4 2009


60. Smith-Yoshimura, Karen. 2018. “Are Distributed Models for Vocabulary Maintenance Viable?” Hanging Together: The OCLC Research Blog. 12 April 2018. https://hangingtogether.org/?p=6672.

61. OCLC Research. 2020. “Equity, Diversity, and Inclusion in the OCLC Research Library Partnership Survey.” Overview. Accessed 20 September 2020. https://www.oclc.org/research/areas/community-catalysts/rlp-edi.html.

62. Smith-Yoshimura, Karen. 2018. “Creating Metadata for Equity, Diversity, and Inclusion.” Hanging Together: The OCLC Research Blog. 7 November 2018. https://hangingtogether.org/?p=6833.

63. Smith-Yoshimura, “Distributed Models.” (See note 60.)

64. Smith-Yoshimura, Karen. 2019. “Strategies for Alternate Subject Headings and Maintaining Subject Headings.” Hanging Together: The OCLC Research Blog. 29 October 2019. https://hangingtogether.org/?p=7591.

65. Baxmeyer, Jennifer, Karen Coyle, Joanna Dyla, MJ Han, Steven Folsom, Phil Schreur, and Tim Thompson. 2017. Linked Data Infrastructure Models: Areas of Focus for PCC Strategies. Library of Congress: PCC Linked Data Advisory Committee. https://www.loc.gov/aba/pcc/documents/LinkedDataInfrastructureModels.pdf.

66. Bone, Christine, Sharon Farnel, Sheila Laroque, and Brett Lougheed. 2017. “Works in Progress Webinar: Decolonizing Descriptions: Finding, Naming and Changing the Relationship between Indigenous People, Libraries, and Archives.” Produced by OCLC Research, 19 October 2017. MP4 video presentation, 54:35:00. https://www.oclc.org/research/events/2017/10-19.html.

67. Smith-Yoshimura, Karen. 2015. “Shift to Linked Data for Production.” Hanging Together: The OCLC Research Blog. 13 May 2015. https://hangingtogether.org/?p=5195.

68. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

69. Bruce Washburn and Jeff Mixter. 2018. “Works in Progress Webinar: Looking Inside the Library Knowledge Vault.” Produced by OCLC Research, 12 August 2018. MP4 video presentation, 57:45:00. https://www.oclc.org/research/events/2015/08-12.html.

70. Smith-Yoshimura, Karen. 2019. “Systematic Reviews of Our Metadata.” Hanging Together: The OCLC Research Blog. 10 April 2019. https://hangingtogether.org/?p=7117.

71. Smith-Yoshimura, Karen. 2015. “Working in Shared Files.” Hanging Together: The OCLC Research Blog. 7 April 2015. https://hangingtogether.org/?p=5091.

72. Jisc Library Services. n.d. “What Is ‘Plan M’?” Accessed 21 September 2020. https://libraryservices.jiscinvolve.org/wp/2019/12/plan-m.

Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

To learn more about the current phase of “Plan M” (May–November 2020), see Grindley, Neil. “Moving Plan M Forwards – We Need Your Help.” Library Services (PlanM) (blog), Jisc. 6 May 2020. https://libraryservices.jiscinvolve.org/wp/2020/05/planm_nextphase.


73. Grindley, Neil. 2019. “Plan M: Definition, Principles, and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx.

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170.

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880.

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903.

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08.

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html.

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998.

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591.

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068.

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757.

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945.

82. NLA. “Trove.” Archived Websites Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website.

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638.

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019.

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving.


84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html.

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C.

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814.

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F.

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead.

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis.

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405.

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479.

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130.

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods.

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads.

95 Smith-Yoshimura Karen 2016 ldquoSharing Digital Collections Workflowsrdquo Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 ldquoEuropeana Innovation Pilotsrdquo Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the Worldrsquos Images ldquoHomerdquo Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 ldquoOCLC ResearchWorks IIIF Explorerrdquo httpswwwoclcorgresearch themesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 ldquoCONTENTdm Linked Data Pilotrdquo Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

Transitioning to the Next Generation of Metadata 43

100 Smith-Yoshimura Karen 2015 ldquoData Management and Curation in 21st Century Archives ndash Part 1rdquo 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 ldquoMetadata for Research Data Managementrdquo Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105 Faniel Ixchel M 2019 ldquoLetrsquos Cook Up Some Metadata Consistencyrdquo Next (blog) OCLC 21 November 2019 httpwwwoclcorgblogmainlets-cook-up-some-metadata

-consistency

106 NCI (National Computational Infrastructure) Australia ldquoHomerdquo Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) ldquoHomerdquo Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network ldquoHomerdquo Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a ldquocollaboration advocating richer connected reusable open metadata for all research outputsrdquo (httpwwwmetadata2020org) The Metadata 2020 Researcher Communications project is outlined here httpwwwmetadata2020orgprojects researcher-communications

109 Digital Curation Centre ldquoDisciplinary Metadatardquo List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory ldquoMetadata Standards Directory Working Grouprdquo GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy)mdashwhich identifies 14 roles describing each contributorrsquos specific contribution to the scholarly outputmdasha standard CRediT was developed by CASRAI the Consortia Advancing Standards in Research Administration Information See CASRAI ldquoCRediT ndash Contributor Roles Taxonomyrdquo Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 ldquoData Servicesrdquo httpwwwlibumicheduresearch -data-services

113 OCLC Research 2020 ldquoThe Realities of Research Data Managementrdquo Overview httpswwwoclcorgresearchpublications2017oclcresearch-research-data

-managementhtml

44 Transitioning to the Next Generation of Metadata

114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115 Indiana University 2018 ldquoIU will Lead $2 Million Partnership to Expand Access to Research Data IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Librariesrdquo News at UI (Science and Technology) Indiana University 18 October 2018 httpsnewsiuedu stories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft 2020 ldquoMicrosoft Academic Graphrdquo Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details watch the August 2019 recording of ldquoDemocratizing Access to Large Datasets through Shared Infrastructurerdquo See Wittenberg Jamie and Valentin Pentchev

ldquoWorks in Progress Webinar Democratizing Access to Large Datasets through Shared Infrastructurerdquo Produced by OCLC Research 8 August 2019 MP4 video presentation 583400 httpswwwoclcorgresearchevents2019080819-democratizing-access-large

-datasets-shared-infrastructurehtml

116 NISOrsquos Reproducibility Badging and Definitions now out for public comment may also help researchers extend the benefit of their research to others See ldquoTaxonomy Definitions and Recognition Badging Scheme Working Group | NISO Websiterdquo nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 ldquoServices Built on Usage Metricsrdquo Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 ldquoStacks Serials Search Engines and Studentsrsquo Success First-Year Undergraduate Studentsrsquo Library Use Academic Achievement and Retentionrdquo Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc ldquoLibrary Impact Data Project (LIDP)rdquo Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 ldquoMetadata Librarian Cornell University Libraryrdquo DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 ldquoMetadata Librarianrdquo Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-library metadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 ldquoLocate Items in the Library with StackMaprdquo httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 ldquoHomerdquo httpswwwyewnocom

Transitioning to the Next Generation of Metadata 45

125 Smith-Yoshimura Karen 2019 ldquoNew Ways of Using and Enhancing Cataloging and Authority Recordsrdquo Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) ldquoAustlang National Codeathonrdquo Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA ldquoTroverdquo Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company ndash Hachette UK ldquoRiver of Authorsrdquo Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 ldquoKnowledge Organization Systemsrdquo Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 ldquoKnowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Reviewrdquo International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) ldquoAbout SNACrdquo What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives amp Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAMrsquos mission is to organize share and elevate knowledge about and use of artificial intelligence by libraries archives and museums It was founded in 2018 inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs

See AI4LAM ldquoAboutrdquo Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 ldquoNew Skill Sets for Metadata Managementrdquo Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 ldquoMarcEdit and Other Tools for Batch Processing and Metadata Reconciliationrdquo Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646

46 Transitioning to the Next Generation of Metadata

139 Reese Terry 2018 ldquoMarcEdit 2017 Usage Informationldquo Terryrsquos Worklog (blog) 9 September 2020 httpblogreesetnetarchives2572

140 Reese Terry 2020 ldquoWorking with Linked Data In MarcEditrdquo MarcEdit Development (blog) Accessed 21 September 2020 httpsmarceditreesetnetworking-with-linked -data-in-marcedit

141 Reese Terry 2018 ldquoMarcEdit Playlistrdquo 139 YouTube videos Last updated 26 December 2018 httpswwwyoutubecomplaylistlist=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142 Smith-Yoshimura Karen 2017 ldquoNew Skill Sets for Metadata Managementrdquo Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

143 ldquoXML and RDF-Based Systems Archivesrdquo nd Library Juice Academy (blog) Accessed 22 September 2020 httpslibraryjuiceacademycomcertificatexml-and-rdf

-based-systems

Reese Terry 2013 ldquoTutorialsrdquo YouTube (selected) MarcEdit Development (blog) 14 March 2013 httpmarceditreesetnettutorials

ldquoLynda Online Courses Classes Training Tutorialsrdquo nd Lyndacom - from LinkedIn Learning Accessed 22 September 2020 httpswwwlyndacom

ldquoLearn to Code - for Freerdquo nd Codecademy Accessed 22 September 2020 httpswwwcodecademycom

Software Carpentry ldquoTeaching Basic Lab Skills for Research Computingrdquo Upcoming Workshops Accessed 22 September 2020 httpssoftware-carpentryorg

144 ldquoData on the Web Best Practicesrdquo nd Accessed 22 September 2020 httpswwww3orgTRdwbp

Semantic Web for the Working Ontologist (2008) 2020 httpworkingontologistorg

145 Library Workflow Exchange nd ldquoAboutrdquo Accessed 21 September 2020 httpwwwlibraryworkflowexchangeorgabout

146 OCLC Developer Network 2020 ldquoDevConnect Webinars httpswwwoclcorgdeveloper eventsdevconnect-workshopsenhtml

147 Smith-Yoshimura Karen 2019 ldquoStewardship of Professional FTEs In Metadata Work and Turnoverrdquo Hanging Together The OCLC Research Blog 18 October 2019 httpshangingtogetherorgp=7580

148 OCLC 2020 ldquoWorldCatreg OCLC and Linked Datardquo Shared Entity Management Infrastructure httpswwwoclcorgenworldcatlinked-datashared-entity-management

-infrastructurehtml

For more information about our work related to digitizing library collections please visit oclcdigitizing

6565 Kilgour Place Dublin Ohio 43017-3395T 1-800-848-5878 T +1-614-764-6000 F +1-614-764-6096 wwwoclcorgresearch

ISBN 978-1-55653-170-5 DOI 1025333rqgd-b343 RM-PR-216787-WWAE-A4 2009

73. Grindley, Neil. 2019. “Plan M: Definition, Principles and Direction.” Jisc. (Word docx). http://libraryservices.jiscinvolve.org/wp/files/2019/12/Plan-M-Definition-and-Direction-1.docx

74. Dempsey, Lorcan. 2016. “Library Collections in the Life of the User: Two Directions.” LIBER Quarterly 26(4): 338–359. http://doi.org/10.18352/lq.10170

75. Smith-Yoshimura, Karen. 2019. “Presenting Metadata from Different Sources in Discovery Layers.” Hanging Together: The OCLC Research Blog. 16 April 2019. https://hangingtogether.org/?p=7880

76. Smith-Yoshimura, Karen. 2017. “Metadata for Archival Collections.” Hanging Together: The OCLC Research Blog. 30 May 2017. https://hangingtogether.org/?p=5903

77. Godby, Jean, Karen Smith-Yoshimura, Bruce Washburn, Kalan Knudson Davis, Karen Detling, Christine Fernsebner Eslao, Steven Folsom, Xiaoli Li, Marc McGee, Karen Miller, Honor Moody, Craig Thomas, and Holly Tomren. 2019. Creating Library Linked Data with Wikibase: Lessons Learned from Project Passage, 49-51. Dublin, OH: OCLC Research. https://doi.org/10.25333/faq3-ax08

78. The OCLC Research Library Partnership Archives and Special Collections Linked Data Review Group is described at https://www.oclc.org/research/partnership/working-groups/archives-special-collections-linked-data-review.html

79. Smith-Yoshimura, Karen. 2020. “Metadata Management in Times of Uncertainty.” Hanging Together: The OCLC Research Blog. 15 June 2020. https://hangingtogether.org/?p=7998

80. Smith-Yoshimura, Karen. 2016. “Metadata for Archived Websites.” Hanging Together: The OCLC Research Blog. 14 March 2016. https://hangingtogether.org/?p=5591

81. Archive-It. 2008. “Human Rights.” Columbia University Libraries Collection. (Archived May 2008). https://archive-it.org/collections/1068

Archive-It. 2010. “New York City Places and Spaces.” Columbia University Libraries Collection. (Archived January 2010). https://archive-it.org/collections/1757

Archive-It. 2010. “Burke Library New York City Religions.” Columbia University Libraries Collection. (Archived May 2010). https://archive-it.org/collections/1945

82. NLA. “Trove.” Archived Websites, Sub Collections. Accessed 20 September 2020. https://trove.nla.gov.au/website

83. Archive-It. 2014. “Collaborative Architecture, Urbanism, and Sustainability Web Archive (CAUSEWAY).” Ivy Plus Libraries Confederation Collection. (Archived June 2014). https://archive-it.org/collections/4638

Archive-It. 2013. “Contemporary Composers Web Archive (CCWA).” Ivy Plus Libraries Confederation Collection. (Archived October 2013). https://archive-it.org/collections/4019

NYARC: New York Art Resources Consortium. “Web Archiving.” http://www.nyarc.org/content/web-archiving

84. OCLC Research. 2020. “Web Archiving Metadata Working Group.” The Problem, Addressing the Problem, Outputs. https://www.oclc.org/research/themes/research-collections/wam.html

85. Dooley, Jackie, and Kate Bowers. 2018. Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3005C

86. Smith-Yoshimura, Karen. 2018. “Metadata for Audio and Videos.” Hanging Together: The OCLC Research Blog. 29 October 2018. https://hangingtogether.org/?p=6814

87. Weber, Chela Scott. 2017. Research and Learning Agenda for Archives, Special, and Distinctive Collections in Research Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3C34F

88. Library of Congress. “Standards.” Encoded Archival Description (EAD) Official Site. Accessed 21 September 2020. https://www.loc.gov/ead

89. Library of Congress. “Standards.” Preservation Metadata Maintenance Activity (PREMIS). Accessed 21 September 2020. https://www.loc.gov/standards/premis

90. Weber, Chela Scott. 2019. “Assessing Needs of AV in Special Collections.” Hanging Together: The OCLC Research Blog. 23 July 2019. https://hangingtogether.org/?p=7405

Weber, Chela Scott. 2019. “Scale & Risk: Discussing Challenges to Managing AV Collections in the RLP.” Hanging Together: The OCLC Research Blog. 1 October 2019. https://hangingtogether.org/?p=7479

91. Smith-Yoshimura, Karen. 2015. “Managing Metadata for Image Collections.” Hanging Together: The OCLC Research Blog. 9 April 2015. https://hangingtogether.org/?p=5130

92. Library of Congress. “Standards.” Metadata Object Description Schema (MODS). Accessed 21 September 2020. http://www.loc.gov/standards/mods

93. Ibid.

94. Library of Congress. “Standards.” Metadata Authority Description Schema (MADS). Accessed 21 September 2020. http://www.loc.gov/standards/mads

95. Smith-Yoshimura, Karen. 2016. “Sharing Digital Collections Workflows.” Hanging Together: The OCLC Research Blog. 2 November 2016. https://hangingtogether.org/?p=5744

96. OCLC Research. 2020. “Europeana Innovation Pilots.” Accessed 20 September 2020. http://www.oclc.org/research/themes/data-science/europeana.html?urlm=168921

97. IIIF (International Image Interoperability Framework): Enabling Richer Access to the World’s Images. “Home.” Accessed 20 September 2020. https://iiif.io

98. OCLC Research. 2020. “OCLC ResearchWorks IIIF Explorer.” https://www.oclc.org/research/themes/data-science/iiif/iiifexplorer.html

99. OCLC Research. 2020. “CONTENTdm Linked Data Pilot.” Introduction. https://www.oclc.org/research/themes/data-science/linkeddata/contentdm-linked-data-pilot.html

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog), OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology), Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research, 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University, 13 November 2019. (Archived 2 September 2020). https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon [Map of Australia: 2020 HERE, Bing, Microsoft Corporation]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs.

See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97

135. Ibid., 17-19.

136. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122

137. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

138. Smith-Yoshimura, Karen. 2018. “MarcEdit and Other Tools for Batch Processing and Metadata Reconciliation.” Hanging Together: The OCLC Research Blog. 26 March 2018. https://hangingtogether.org/?p=6646

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580

148. OCLC. 2020. “WorldCat® OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395 T: 1-800-848-5878 T: +1-614-764-6000 F: +1-614-764-6096 www.oclc.org/research

ISBN: 978-1-55653-170-5 DOI: 10.25333/rqgd-b343 RM-PR-216787-WWAE-A4 2009

42 Transitioning to the Next Generation of Metadata

84 OCLC Research 2020 ldquoWeb Archiving Metadata Working Grouprdquo The Problem Addressing the Problem Outputs httpswwwoclcorgresearchthemesresearch -collectionswamhtml

85 Dooley Jackie and Kate Bowers 2018 Descriptive Metadata for Web Archiving Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group Dublin OH OCLC Research httpsdoiorg1025333C3005C

86 Smith-Yoshimura Karen 2018 ldquoMetadata for Audio and Videosrdquo Hanging Together The OCLC Research Blog 29 October 2018 httpshangingtogetherorgp=6814

87 Weber Chela Scott 2017 Research and Learning Agenda for Archives Special and Distinctive Collections in Research Libraries Dublin OH OCLC Research httpsdoiorg1025333C3C34F

88 Library of Congress ldquoStandardsrdquo Encoded Archival Description (EAD) Official Site Accessed 21 September 2020 httpswwwlocgovead

89 Library of Congress ldquoStandardsrdquo Preservation Metadata Maintenance Activity (PREMIS) Accessed 21 September 2020 httpswwwlocgovstandardspremis

90 Weber Chela Scott 2019 ldquoAssessing Needs of AV in Special Collectionsrdquo Hanging Together The OCLC Research Blog 23 July 2019 httpshangingtogetherorgp=7405

Weber Chela Scott 2019 ldquoScale amp Risk Discussing Challenges to Managing AV Collections in the RLPrdquo Hanging Together The OCLC Research Blog 1 October 2019 httpshangingtogetherorgp=7479

91 Smith-Yoshimura Karen 2015 ldquoManaging Metadata for Image Collectionsrdquo Hanging Together The OCLC Research Blog 9 April 2015 httpshangingtogetherorgp=5130

92 Library of Congress ldquoStandardsrdquo Metadata Object Description Schema (MODS) Accessed 21 September 2020 httpwwwlocgovstandardsmods

93 Ibid

94 Library of Congress ldquoStandardsrdquo Metadata Authority Description Schema (MADS)rdquo Accessed 21 September 2020 httpwwwlocgovstandardsmads

95 Smith-Yoshimura Karen 2016 ldquoSharing Digital Collections Workflowsrdquo Hanging Together The OCLC Research Blog 2 November 2016 httpshangingtogetherorgp=5744

96 OCLC Research 2020 ldquoEuropeana Innovation Pilotsrdquo Accessed 20 September 2020 httpwwwoclcorgresearchthemesdata-scienceeuropeanahtmlurlm=168921

97 IIIF (International Image Interoperability Framework) Enabling Richer Access to the Worldrsquos Images ldquoHomerdquo Accessed 20 September 2020 httpsiiifio

98 OCLC Research 2020 ldquoOCLC ResearchWorks IIIF Explorerrdquo httpswwwoclcorgresearch themesdata-scienceiiifiiifexplorerhtml

99 OCLC Research 2020 ldquoCONTENTdm Linked Data Pilotrdquo Introduction httpswwwoclcorgresearchthemesdata-sciencelinkeddatacontentdm-linked-data-pilothtml

Transitioning to the Next Generation of Metadata 43

100 Smith-Yoshimura Karen 2015 ldquoData Management and Curation in 21st Century Archives ndash Part 1rdquo 21 September 2015 httphangingtogetherorgp=5375

101 Smith-Yoshimura Karen 2016 ldquoMetadata for Research Data Managementrdquo Hanging Together The OCLC Research Blog 18 April 2016 httpshangingtogetherorgp=5616

102 Erway Ricky Laurence Horton Amy Nurnberger Reid Otsuji and Amy Rushing 2015 Building Blocks Laying the Foundation for a Research Data Management Program 8 Dublin OH OCLC Research httpsdoiorg1025333C39P86

103 See the OCLC Research Data Management Planning Guide at httpswwwoclcorgresearchareasresearch-collectionsrdmguidehtml

104 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

105 Faniel Ixchel M 2019 ldquoLetrsquos Cook Up Some Metadata Consistencyrdquo Next (blog) OCLC 21 November 2019 httpwwwoclcorgblogmainlets-cook-up-some-metadata

-consistency

106 NCI (National Computational Infrastructure) Australia ldquoHomerdquo Accessed 21 September 2020 httpnciorgau

ADA (Australian Data Archive) ldquoHomerdquo Accessed 21 September 2020 httpswwwadaeduau

107 Portage Network ldquoHomerdquo Accessed 21 September 2020 httpsportagenetworkca

108 Metadata 2020 is a ldquocollaboration advocating richer connected reusable open metadata for all research outputsrdquo (httpwwwmetadata2020org) The Metadata 2020 Researcher Communications project is outlined here httpwwwmetadata2020orgprojects researcher-communications

109 Digital Curation Centre ldquoDisciplinary Metadatardquo List of Metadata Standards Accessed 21 September 2020 httpwwwdccacukresourcesmetadata-standardslist

110 RDA Metadata Directory ldquoMetadata Standards Directory Working Grouprdquo GitHub Repository Accessed 21 September 2020 httprd-alliancegithubiometadata-directory

111 NISO is about to make CRediT (Contributor Roles Taxonomy)mdashwhich identifies 14 roles describing each contributorrsquos specific contribution to the scholarly outputmdasha standard CRediT was developed by CASRAI the Consortia Advancing Standards in Research Administration Information See CASRAI ldquoCRediT ndash Contributor Roles Taxonomyrdquo Accessed 21 September 2020 httpscasraiorgcredit

112 University of Michigan Library 2020 ldquoData Servicesrdquo httpwwwlibumicheduresearch -data-services

113 OCLC Research 2020 ldquoThe Realities of Research Data Managementrdquo Overview httpswwwoclcorgresearchpublications2017oclcresearch-research-data

-managementhtml

44 Transitioning to the Next Generation of Metadata

114 Bryant Rebecca Brian Lavoie and Constance Malpas 2017 Scoping the University RDM Service Bundle The Realities of Research Data Management Part 2 pp 16 21 Dublin OH OCLC Research httpsdoiorg1025333C3Z039

115 Indiana University 2018 ldquoIU will Lead $2 Million Partnership to Expand Access to Research Data IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Librariesrdquo News at UI (Science and Technology) Indiana University 18 October 2018 httpsnewsiuedu stories201810iureleases18-shared-bigdata-gateway-for-research-networkshtml

Microsoft 2020 ldquoMicrosoft Academic Graphrdquo Established 5 June 2015 httpswwwmicrosoftcomen-usresearchprojectmicrosoft-academic-graph

For more details watch the August 2019 recording of ldquoDemocratizing Access to Large Datasets through Shared Infrastructurerdquo See Wittenberg Jamie and Valentin Pentchev

ldquoWorks in Progress Webinar Democratizing Access to Large Datasets through Shared Infrastructurerdquo Produced by OCLC Research 8 August 2019 MP4 video presentation 583400 httpswwwoclcorgresearchevents2019080819-democratizing-access-large

-datasets-shared-infrastructurehtml

116 NISOrsquos Reproducibility Badging and Definitions now out for public comment may also help researchers extend the benefit of their research to others See ldquoTaxonomy Definitions and Recognition Badging Scheme Working Group | NISO Websiterdquo nd Accessed 22 September 2020 httpswwwnisoorgstandards-committeesreproducibility-badging

117 Smith-Yoshimura Karen 2015 ldquoServices Built on Usage Metricsrdquo Hanging Together The OCLC Research Blog 30 September 2015 httpshangingtogetherorgp=5430

118 Krista M Soria Jan Fransen Shane Nackerud 2014 ldquoStacks Serials Search Engines and Studentsrsquo Success First-Year Undergraduate Studentsrsquo Library Use Academic Achievement and Retentionrdquo Journal of Academic Librarianship 40 84-91 httpsdoiorg101016jacalib201312002

119 See Jisc ldquoLibrary Impact Data Project (LIDP)rdquo Accessed 21 September 2020 httpwwwactivitydataorgLIDPhtml

120 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

121 DLF (Digital Library Federation) 2015 ldquoMetadata Librarian Cornell University Libraryrdquo DLF (blog) 11 June 2015 httpswwwdigliborgmetadata-librarian-cornell-university-library

122 Salarycom (2019) 2020 ldquoMetadata Librarianrdquo Posted by Georgia Tech University 13 November 2019 (Archived 2 September 2020) httpswebarchiveorgweb20200903061830httpswwwsalarycomjobgt-library metadata-librariane5644ece-c847-4cfb-994f-c4c80fa81e3d

123 OCLC 2020 ldquoLocate Items in the Library with StackMaprdquo httpshelpoclcorgDiscovery_and_ReferenceWorldCat_DiscoverySearch_resultsLocate_items_in_the_library_with_StackMap

124 Yewno Transforming Information into Knowledge 2020 ldquoHomerdquo httpswwwyewnocom

Transitioning to the Next Generation of Metadata 45

125 Smith-Yoshimura Karen 2019 ldquoNew Ways of Using and Enhancing Cataloging and Authority Recordsrdquo Hanging Together The OCLC Research Blog 2 April 2019 httpshangingtogetherorgp=7805

126 National Library of Australia (NLA) ldquoAustlang National Codeathonrdquo Accessed 21 September 2020 httpswwwnlagovauour-collectionsprocessing-and-describing-the-collectionsAustlang-national-codeathon [Map of Australia 2020 HERE Bing Microsoft Corporation]

NLA ldquoTroverdquo Search Uncover Australia Accessed 21 September 2020 httpstrovenlagovau

127 The Graphic History Company ndash Hachette UK ldquoRiver of Authorsrdquo Accessed 21 September 2020 httptheghccoprojectphpproject=hachette-uk-a-river-of-authors

128 Smith-Yoshimura Karen 2019 ldquoKnowledge Organization Systemsrdquo Hanging Together The OCLC Research Blog 17 April 2019 httpshangingtogetherorgp=7135

129 Zeng Marcia Lei and Philipp Mayr 2019 ldquoKnowledge Organization Systems (KOS) in the Semantic Web A Multi-dimensional Reviewrdquo International Journal on Digital Libraries 20 209-230 httpsdoiorg101007s00799-018-0241-2

130 SNAC (Social Networks and Archival Context) ldquoAbout SNACrdquo What is SNAC httpsportalsnaccooperativeorgabout

131 Smith-Yoshimura Karen 2020 ldquoKnowledge Management and Metadatardquo Hanging Together The OCLC Research Blog 9 April 2020 httpshangingtogetherorgp=7845

132 AI4LAM (Artificial Intelligence for Libraries Archives amp Museums) Updated 18 May 2020 httpssitesgooglecomviewai4lamhome

133 AI4LAMrsquos mission is to organize share and elevate knowledge about and use of artificial intelligence by libraries archives and museums It was founded in 2018 inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large scale collaboration on interoperable technology to advance LAMs

See AI4LAM ldquoAboutrdquo Our Mission httpssitesgooglecomviewai4lamabout

134 Padilla Thomas 2019 Responsible Operations Data Science Machine Learning and AI in Libraries Dublin OH OCLC Research httpsdoiorg1025333xk7z-9g97

135 Ibid 17-19

136 Smith-Yoshimura Karen 2019 ldquoAlternatives to Statistics for Measuring Success and Value of Catalogingrdquo Hanging Together The OCLC Research Blog 15 April 2019 httpshangingtogetherorgp=7122

137 Smith-Yoshimura Karen 2017 ldquoNew Skill Sets for Metadata Managementrdquo Hanging Together The OCLC Research blog 17 April 2017 httpshangingtogetherorgp=5929

138 Smith-Yoshimura Karen 2018 ldquoMarcEdit and Other Tools for Batch Processing and Metadata Reconciliationrdquo Hanging Together The OCLC Research Blog 26 March 2018 httpshangingtogetherorgp=6646

46 Transitioning to the Next Generation of Metadata

139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

142. Smith-Yoshimura, Karen. 2017. “New Skill Sets for Metadata Management.” Hanging Together: The OCLC Research Blog. 17 April 2017. https://hangingtogether.org/?p=5929.

143. “XML and RDF-Based Systems Archives.” n.d. Library Juice Academy (blog). Accessed 22 September 2020. https://libraryjuiceacademy.com/certificate/xml-and-rdf-based-systems.

Reese, Terry. 2013. “Tutorials.” YouTube (selected). MarcEdit Development (blog). 14 March 2013. http://marcedit.reeset.net/tutorials.

“Lynda: Online Courses, Classes, Training, Tutorials.” n.d. Lynda.com - from LinkedIn Learning. Accessed 22 September 2020. https://www.lynda.com.

“Learn to Code - for Free.” n.d. Codecademy. Accessed 22 September 2020. https://www.codecademy.com.

Software Carpentry. “Teaching Basic Lab Skills for Research Computing.” Upcoming Workshops. Accessed 22 September 2020. https://software-carpentry.org.

144. “Data on the Web Best Practices.” n.d. Accessed 22 September 2020. https://www.w3.org/TR/dwbp.

Semantic Web for the Working Ontologist. (2008) 2020. http://workingontologist.org.

145. Library Workflow Exchange. n.d. “About.” Accessed 21 September 2020. http://www.libraryworkflowexchange.org/about.

146. OCLC Developer Network. 2020. “DevConnect Webinars.” https://www.oclc.org/developer/events/devconnect-workshops.en.html.

147. Smith-Yoshimura, Karen. 2019. “Stewardship of Professional FTEs In Metadata Work and Turnover.” Hanging Together: The OCLC Research Blog. 18 October 2019. https://hangingtogether.org/?p=7580.

148. OCLC. 2020. “WorldCat®: OCLC and Linked Data.” Shared Entity Management Infrastructure. https://www.oclc.org/en/worldcat/linked-data/shared-entity-management-infrastructure.html.

For more information about our work related to digitizing library collections, please visit: oc.lc/digitizing

6565 Kilgour Place, Dublin, Ohio 43017-3395. T: 1-800-848-5878. T: +1-614-764-6000. F: +1-614-764-6096. www.oclc.org/research

ISBN: 978-1-55653-170-5. DOI: 10.25333/rqgd-b343. RM-PR-216787-WWAE-A4 2009

100. Smith-Yoshimura, Karen. 2015. “Data Management and Curation in 21st Century Archives – Part 1.” 21 September 2015. http://hangingtogether.org/?p=5375.

101. Smith-Yoshimura, Karen. 2016. “Metadata for Research Data Management.” Hanging Together: The OCLC Research Blog. 18 April 2016. https://hangingtogether.org/?p=5616.

102. Erway, Ricky, Laurence Horton, Amy Nurnberger, Reid Otsuji, and Amy Rushing. 2015. Building Blocks: Laying the Foundation for a Research Data Management Program, 8. Dublin, OH: OCLC Research. https://doi.org/10.25333/C39P86.

103. See the OCLC Research Data Management Planning Guide at https://www.oclc.org/research/areas/research-collections/rdm/guide.html.

104. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

105. Faniel, Ixchel M. 2019. “Let’s Cook Up Some Metadata Consistency.” Next (blog). OCLC. 21 November 2019. http://www.oclc.org/blog/main/lets-cook-up-some-metadata-consistency.

106. NCI (National Computational Infrastructure) Australia. “Home.” Accessed 21 September 2020. http://nci.org.au.

ADA (Australian Data Archive). “Home.” Accessed 21 September 2020. https://www.ada.edu.au.

107. Portage Network. “Home.” Accessed 21 September 2020. https://portagenetwork.ca.

108. Metadata 2020 is a “collaboration advocating richer, connected, reusable, open metadata for all research outputs” (http://www.metadata2020.org). The Metadata 2020 Researcher Communications project is outlined here: http://www.metadata2020.org/projects/researcher-communications.

109. Digital Curation Centre. “Disciplinary Metadata.” List of Metadata Standards. Accessed 21 September 2020. http://www.dcc.ac.uk/resources/metadata-standards/list.

110. RDA Metadata Directory. “Metadata Standards Directory Working Group.” GitHub Repository. Accessed 21 September 2020. http://rd-alliance.github.io/metadata-directory.

111. NISO is about to make CRediT (Contributor Roles Taxonomy), which identifies 14 roles describing each contributor’s specific contribution to the scholarly output, a standard. CRediT was developed by CASRAI, the Consortia Advancing Standards in Research Administration Information. See CASRAI. “CRediT – Contributor Roles Taxonomy.” Accessed 21 September 2020. https://casrai.org/credit.

112. University of Michigan Library. 2020. “Data Services.” http://www.lib.umich.edu/research-data-services.

113. OCLC Research. 2020. “The Realities of Research Data Management.” Overview. https://www.oclc.org/research/publications/2017/oclcresearch-research-data-management.html.

114. Bryant, Rebecca, Brian Lavoie, and Constance Malpas. 2017. Scoping the University RDM Service Bundle. The Realities of Research Data Management, Part 2, pp. 16, 21. Dublin, OH: OCLC Research. https://doi.org/10.25333/C3Z039.

115. Indiana University. 2018. “IU will Lead $2 Million Partnership to Expand Access to Research Data: IU Libraries and IU Network Science Institute Are Leading a Public-Private Partnership to Create the Shared BigData Gateway for Research Libraries.” News at IU (Science and Technology). Indiana University. 18 October 2018. https://news.iu.edu/stories/2018/10/iu/releases/18-shared-bigdata-gateway-for-research-networks.html.

Microsoft. 2020. “Microsoft Academic Graph.” Established 5 June 2015. https://www.microsoft.com/en-us/research/project/microsoft-academic-graph.

For more details, watch the August 2019 recording of “Democratizing Access to Large Datasets through Shared Infrastructure.” See Wittenberg, Jamie, and Valentin Pentchev. “Works in Progress Webinar: Democratizing Access to Large Datasets through Shared Infrastructure.” Produced by OCLC Research. 8 August 2019. MP4 video presentation, 58:34:00. https://www.oclc.org/research/events/2019/080819-democratizing-access-large-datasets-shared-infrastructure.html.

116. NISO’s Reproducibility Badging and Definitions, now out for public comment, may also help researchers extend the benefit of their research to others. See “Taxonomy, Definitions, and Recognition Badging Scheme Working Group | NISO Website.” n.d. Accessed 22 September 2020. https://www.niso.org/standards-committees/reproducibility-badging.

117. Smith-Yoshimura, Karen. 2015. “Services Built on Usage Metrics.” Hanging Together: The OCLC Research Blog. 30 September 2015. https://hangingtogether.org/?p=5430.

118. Krista M. Soria, Jan Fransen, Shane Nackerud. 2014. “Stacks, Serials, Search Engines, and Students’ Success: First-Year Undergraduate Students’ Library Use, Academic Achievement, and Retention.” Journal of Academic Librarianship 40: 84-91. https://doi.org/10.1016/j.acalib.2013.12.002.

119. See Jisc. “Library Impact Data Project (LIDP).” Accessed 21 September 2020. http://www.activitydata.org/LIDP.html.

120. Smith-Yoshimura, Karen. 2019. “Alternatives to Statistics for Measuring Success and Value of Cataloging.” Hanging Together: The OCLC Research Blog. 15 April 2019. https://hangingtogether.org/?p=7122.

121. DLF (Digital Library Federation). 2015. “Metadata Librarian, Cornell University Library.” DLF (blog). 11 June 2015. https://www.diglib.org/metadata-librarian-cornell-university-library.

122. Salary.com. (2019) 2020. “Metadata Librarian.” Posted by Georgia Tech University. 13 November 2019. (Archived 2 September 2020.) https://web.archive.org/web/20200903061830/https://www.salary.com/job/gt-library/metadata-librarian/e5644ece-c847-4cfb-994f-c4c80fa81e3d.

123. OCLC. 2020. “Locate Items in the Library with StackMap.” https://help.oclc.org/Discovery_and_Reference/WorldCat_Discovery/Search_results/Locate_items_in_the_library_with_StackMap.

124. Yewno: Transforming Information into Knowledge. 2020. “Home.” https://www.yewno.com.

125. Smith-Yoshimura, Karen. 2019. “New Ways of Using and Enhancing Cataloging and Authority Records.” Hanging Together: The OCLC Research Blog. 2 April 2019. https://hangingtogether.org/?p=7805.

126. National Library of Australia (NLA). “Austlang National Codeathon.” Accessed 21 September 2020. https://www.nla.gov.au/our-collections/processing-and-describing-the-collections/Austlang-national-codeathon. [Map of Australia: 2020 HERE, Bing, Microsoft Corporation.]

NLA. “Trove.” Search: Uncover Australia. Accessed 21 September 2020. https://trove.nla.gov.au.

127. The Graphic History Company – Hachette UK. “River of Authors.” Accessed 21 September 2020. http://theghc.co/project.php?project=hachette-uk-a-river-of-authors.

128. Smith-Yoshimura, Karen. 2019. “Knowledge Organization Systems.” Hanging Together: The OCLC Research Blog. 17 April 2019. https://hangingtogether.org/?p=7135.

129. Zeng, Marcia Lei, and Philipp Mayr. 2019. “Knowledge Organization Systems (KOS) in the Semantic Web: A Multi-dimensional Review.” International Journal on Digital Libraries 20: 209-230. https://doi.org/10.1007/s00799-018-0241-2.

130. SNAC (Social Networks and Archival Context). “About SNAC.” What is SNAC? https://portal.snaccooperative.org/about.

131. Smith-Yoshimura, Karen. 2020. “Knowledge Management and Metadata.” Hanging Together: The OCLC Research Blog. 9 April 2020. https://hangingtogether.org/?p=7845.

132. AI4LAM (Artificial Intelligence for Libraries, Archives & Museums). Updated 18 May 2020. https://sites.google.com/view/ai4lam/home.

133. AI4LAM’s mission is to organize, share, and elevate knowledge about and use of artificial intelligence by libraries, archives, and museums. It was founded in 2018, inspired by the success of the International Image Interoperability Framework (IIIF) in coordinating large-scale collaboration on interoperable technology to advance LAMs. See AI4LAM. “About.” Our Mission. https://sites.google.com/view/ai4lam/about.

134. Padilla, Thomas. 2019. Responsible Operations: Data Science, Machine Learning, and AI in Libraries. Dublin, OH: OCLC Research. https://doi.org/10.25333/xk7z-9g97.

135. Ibid., 17-19.

O C L C R E S E A R C H R E P O R T

  • Executive Summary
  • Introduction
  • The Transition to Linked Data and Identifiers
    • Expanding the use of persistent identifiers
    • Moving from “authority control” to “identity management”
    • Addressing the need for multiple vocabularies and equity, diversity, and inclusion
    • Linked data challenges
  • Describing “Inside-Out” and “Facilitated” Collections
    • Archival collections
    • Archived websites
    • Audio and video collections
    • Image collections
    • Research data
  • Evolution of “Metadata as a Service”
    • Metrics
    • Consultancy
    • New applications
    • Bibliometrics
    • Semantic indexing
  • Preparing for Future Staffing Requirements
    • The culture shift
    • Learning opportunities
    • New tools and skills
    • Self-education
    • Addressing staff turnover
  • Impact
  • Acknowledgments
  • Appendix
  • Notes
  • FIGURE 1. “Changing Resource Description Workflows” by OCLC Research
  • FIGURE 2. Some 300 abbreviated author names for a five-page article in Physical Review Letters
  • FIGURE 3. Examples of some DOI (left) and ARK (right) identifiers
  • FIGURE 4. One Wikidata identifier links to other identifiers and labels in different languages
  • FIGURE 5. Excerpt from the survey results from the 2017 EDI survey of the Research Library Partnership
  • FIGURE 6. Responses to 2019 survey on challenges related to managing AV collections
  • FIGURE 7. The OCLC ResearchWorks IIIF Explorer retrieves images about “Paris Maps” across CONTENTdm collections
  • FIGURE 8. Distribution of 465 Indigenous language codes in the Australian National Bibliographic Database from the Austlang National Codeathon
  • FIGURE 9. UK Hatchette’s “River of Authors” generated from the British Library’s catalog metadata


139. Reese, Terry. 2018. “MarcEdit 2017 Usage Information.” Terry’s Worklog (blog). 9 September 2020. http://blog.reeset.net/archives/2572.

140. Reese, Terry. 2020. “Working with Linked Data In MarcEdit.” MarcEdit Development (blog). Accessed 21 September 2020. https://marcedit.reeset.net/working-with-linked-data-in-marcedit.

141. Reese, Terry. 2018. “MarcEdit Playlist.” 139 YouTube videos. Last updated 26 December 2018. https://www.youtube.com/playlist?list=PLrHRsJ91nVFScJLS91SWR5awtFfpewMWg.

