An Organizational Approach to the Analysis of Metadata Harvesting

12
An Organizational Approach to the Analysis of Metadata Harvesting Michael Khoo 1 , Jae-wook Ahn 1 , Doug Tudhope 2 , Ceri Binding 2 , Xia Lin 1 , Eileen Abels 1 , Diana Massam 3 , Hilary Jones 3 1 The iSchool, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA {khoo, xlin, jaewook.ahn, xlin, ega26}@drexel.edu 2 University of Glamorgan, Pontypridd, CF37 1DL, Wakes, UK {dstudhop, cbinding}@glam.ac.uk 3 The University of Manchester, Oxford Road, M13 9PL {Diana.Massam, Hilary.Jones}@manchester.ac.uk Abstract. This paper introduces an organizational approach for analyzing issues with metadata harvesting. The approach considers metadata work as an organizational activity that shapes the way metadata is created and used in individual organizational contexts. Over time, organizations can adopt and adapt metadata formats in different ways, which can lead to harvesting issues. This is illustrated with a comparative analysis of the issues encountered with harvests of Dublin Core from three different digital libraries. The analysis shows how the different organizational histories of each library led to different metadata implementations, and thus to issues with the harvest. We suggest that Dublin Core metadata harvesting can be a two-stage process, requiring manual alignment and normalization in addition to automated harvesting, and conclude that metadata harvesting is supported by a clear understanding of data provider’s organizational history. Some initial requirements for tools and processes to support this understanding are discussed. Keywords: dublin core, harvesting, metadata, OAI-PMH, organizations, standards 1 Introduction Metadata harvesting supports the exchange and aggregation of metadata between digital libraries. Dublin Core metadata harvesting is supported by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The implementation of these standards is in theory a relatively straightforward if technical exercise. In practice, various barriers to metadata harvesting have been reported [13, 24, 32, 33]. While these barriers can appear ad hoc in nature, it is useful to ask whether they might also be rooted in some wider underlying characteristics of digital libraries. If this is the case, then understanding these characteristics can support the development of more systematic harvesting approaches.

Transcript of An Organizational Approach to the Analysis of Metadata Harvesting

An Organizational Approach to the Analysis of Metadata Harvesting

Michael Khoo1, Jae-wook Ahn1, Doug Tudhope2, Ceri Binding2, Xia Lin1, Eileen Abels1, Diana Massam3, Hilary Jones3

1 The iSchool, Drexel University, 3141 Chestnut Street, Philadelphia, PA 19104, USA {khoo, xlin, jaewook.ahn, xlin, ega26}@drexel.edu

2 University of Glamorgan, Pontypridd, CF37 1DL, Wakes, UK {dstudhop, cbinding}@glam.ac.uk

3 The University of Manchester, Oxford Road, M13 9PL {Diana.Massam, Hilary.Jones}@manchester.ac.uk

Abstract. This paper introduces an organizational approach for analyzing issues with metadata harvesting. The approach considers metadata work as an organizational activity that shapes the way metadata is created and used in individual organizational contexts. Over time, organizations can adopt and adapt metadata formats in different ways, which can lead to harvesting issues. This is illustrated with a comparative analysis of the issues encountered with harvests of Dublin Core from three different digital libraries. The analysis shows how the different organizational histories of each library led to different metadata implementations, and thus to issues with the harvest. We suggest that Dublin Core metadata harvesting can be a two-stage process, requiring manual alignment and normalization in addition to automated harvesting, and conclude that metadata harvesting is supported by a clear understanding of data provider’s organizational history. Some initial requirements for tools and processes to support this understanding are discussed.

Keywords: dublin core, harvesting, metadata, OAI-PMH, organizations, standards

1 Introduction

Metadata harvesting supports the exchange and aggregation of metadata between digital libraries. Dublin Core metadata harvesting is supported by the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The implementation of these standards is in theory a relatively straightforward if technical exercise. In practice, various barriers to metadata harvesting have been reported [13, 24, 32, 33]. While these barriers can appear ad hoc in nature, it is useful to ask whether they might also be rooted in some wider underlying characteristics of digital libraries. If this is the case, then understanding these characteristics can support the development of more systematic harvesting approaches.

To this end, this paper proposes an organizational approach to analyzing metadata harvesting. The approach describes digital libraries as organizations whose members are engaged in various kinds of metadata work. In this approach, metadata work is conceptualized as an organizational as well a technical activity that, over time, shapes how metadata is created and used in individual organizational contexts. One result of this shaping is that, even when individual organizations adopt the same metadata schema, different uses of the same schema can emerge which can then affect metadata harvesting. In this paper, this organizational approach is elaborated through a comparative analysis of harvesting issues with three different libraries. This harvesting work was part of a wider project that is seeking to generate Dewey Decimal class numbers automatically from metadata records. The harvesting issues were sufficiently interesting to warrant further investigation in themselves. While similar issues have been reported in the past, there has been little systematic analysis of such issues from an organizational perspective.

The paper is structured as follows. Section 2 provides brief background to metadata harvesting, Dublin Core and OAI-PMH. Section 3 describes three related cases of metadata harvesting from three digital libraries. It identifies the harvesting issues encountered in each case, including metadata unavailable through OAI-PMH, and duplicate metadata records describing the same resource in different terms. It then supplies an organizational analysis that attempts to account for all of these issues in a coherent fashion. Section 4 draws together the case study results and proposes an organizational approach to inform future metadata harvesting efforts.

2 Metadata Harvesting

Metadata harvesting is a technique for aggregating metadata from individual digital libraries, for instance to build a central metadata repository. The harvesting discussed in this paper used the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) to harvest Dublin Core metadata. The development of OAI-PMH was based on earlier work on E-print interoperability at ArXiv and other repositories, that sought to make E-prints more widely available. In contrast to formats such as Z39.50, which convey richer and more granular detail but which also require more effort to adopt, early E-print metadata harvesting was therefore intended to be a low barrier technique to resource discovery, which could then support third party development of repository tools and services [26, 27]. As the wider possibilities of metadata harvesting became apparent, OAI-PMH was subsequently developed, in which libraries (‘data providers’) could expose their metadata on servers, where it could be harvested by third parties (‘service providers’) and then re-used in various ways. At the time, it was noted that “experimentation will prove whether such low-barrier interoperability is realistic and functional” [26].

In terms of functionality, the technical characteristics of metadata (such as standardized elements, attributes, definitions, and relationships [e.g. 2, 7, 12, 28]) provided suitable benchmarks for evaluating harvesting. From this perspective, the early challenges for

metadata harvesting were anticipated to be technical in nature [26, 27, 31]. In addition, however, a number of practical challenges were also identified. While OAI-PMH supports the harvesting of unqualified Dublin Core, the existence of various flavours of qualified Dublin Core in different repositories was seen as potential barrier to harvesting [32, 33]. [13] describes a comparative study of OAI-PMH implementations in the Mellon Metadata Harvesting Initiative; among the issues encountered across projects were lack of resources hindering the work of data providers; varying and/or conflicting applications of Dublin Core by different data providers, which required metadata normalization after harvesting; duplicate metadata records; and a general lack of awareness of the full metadata lifecycle associated with harvested metadata. [24] reports how the use of OAI-PMH to harvest metadata from NSDL Pathways to a central NSDL repository was demanding of Pathways’ organizational resources (time, staff, expertise, etc.), and in cases where those resources were inadequate, the quality of harvesting was affected adversely. The harvesting required ongoing human intervention by the harvesting team, in order to ensure success. [9, 32] describe metadata harvesting to build a Web portal for cultural heritage resource discovery, and a number of issues arising from the heterogeneous nature of the collections developed by different communities. Some of the 39 harvest partners were themselves aggregators, and the team had to manage metadata from a total of 580 institutions. Inevitably, there was variation in the quality of the retrieved metadata, both in individual cases, and between the three main communities involved in the harvest (museums and archives, academic libraries, and digital libraries). There were differences in descriptions of the same resource by different institutions, and differences in the use of item-level and collection-level metadata, all of which required normalization work. The team found that managing OAI-PMH required ongoing intervention.

Taken on a case-by-case basis, these reported issues may appear ad hoc. At the same time, they appear consistent across a number of years and a number of projects. In terms of the comment by [26] on the experimental nature of OAI-PMH implementation, it is useful to ask whether the reported issues (as well as a number of similar examples collected anecdotally by the authors) in fact constitute a series of replicated outcomes of such experiments (c.f. [3]). If so, it is then also useful to see to what extent these replicated outcomes might be explained in terms of underlying characteristics of metadata and/or digital libraries.

The rest of this paper approaches this question by considering digital libraries as organizations. This approach allows for the introduction of models of organizational knowledge and management [8, 22, 23, 25]. The specific organizational approach adopted in this paper treats digital libraries components of sociotechnical systems of technologies, institutions, practices and other phenomena [5]. From this perspective, digital libraries, metadata, and metadata work, all mutually shape and re-shape each other in response to internal and external organizational factors (similar observations have been with regard to the mutual co-evolution of data curation practices and cyberinfrastructure [25]). Over time, these mutual interactions can produce divergences in metadata format and usage between organizations, which can impede metadata harvesting.

In other words, while metadata harvesting requires standardization, longitudinal organizational processes can work against standardization, and produce different implementations of the same ‘standard’ in different contexts. This can lead in turn to issues with metadata harvesting. To investigate this tension further, Section 3 compares issues that arose in three instances of metadata harvesting in the same digital library project. Section 4 then analyzes these cases, and outlines some preliminary recommendations regarding the organizational work needed to address these issues.

3 Three Harvesting Case Studies

Fig. 1. Original metadata harvesting workflow for the MASH database (from project proposal).

This section compares three cases of Dublin Core metadata harvesting (two of which used OAI-PMH) within a single project. The project is the Digging Into Metadata project (the ‘Digging project’), which is exploring a method for increasing discovery across digital libraries without the use of metadata crosswalks [10]. The method involves generating Dewey Decimal classes from each record, which can then be used for cross-library browsing. The three digital libraries involved in the project are: the Internet Public Library (http://www.ipl.org/); Intute (http://www.intute.ac.uk); and the National Science Digital Library (http://nsdl.org/). The project began with the assumption that catalog records contain human-generated metadata in fields such as title, description and subject, and that this content can be machine analyzed in order to produced ‘aboutness’ terms which can then be used to generate Dewey Decimal classes (Figure 1). The focus of this paper is on the first step, which involves aggregating the required metadata – including the title, description and subject fields – from each of the partner libraries (IPL, Intute, and NSDL). The database used for this purpose was designated MASH (Metadata Aggregation, Storage and Handling). Once in MASH, the metadata from each record could be subjected content analysis. A representation of this process is shown in Figure 1.

The work to populate MASH included developing a harvesting workflow for each digital library, with the aim of developing a general workflow that could then be scaled up to include additional digital libraries. A number of issues were encountered which slowed

the pilot harvests own. First, there was metadata useful for the Digging project, that was not available through normal harvesting queries; and second, the harvest produced a number of duplicate records, both from different libraries and also within the same library, which described the same resource in different terms. The resolution of these issues required discussions with each digital library before the metadata itself could be harvested in a form useful for the Digging project.

3.1 Inaccessible Metadata

The first important issue was that the digital libraries project team sometimes encountered metadata not immediately retrievable through OAI-PMH queries. This ‘unavailable’ metadata was often discovered by accident, and required further work in order to locate, understand, and then (if possible) harvest.

For instance, at the time of the pilot harvest, IPL metadata was stored in a Fedora database, in three separate datastreams: a DC stream, which held the fifteen unqualified DC metadata fields; an IPL1.0 datastream, which held additional metadata; and the RELS-EXT datastream, which held statements regarding item-collection relationships. While some of the metadata required for MASH (e.g. dc:title, dc:description, and dc:subject) was stored in the DC datastream, further potentially useful elements (such as ipl:subject) were stored in the IPL datastream, and were not originally planned to be part of the harvest (and indeed would have been unknown to anyone outside of IPL). In another example, some metadata from an earlier version of the IPL was stored in a FileMaker Pro database, and as this had not been crosswalked to Dublin Core, it was not accessible for harvesting. Understanding the presence of this extra metadata required familiarity with IPL history before harvesting decisions could be made.

In the case of Intute, a preliminary analysis of the pilot harvest showed that the harvested metadata sometimes differed from the metadata that could be viewed on the Intute site for the same resource. After inquiries, this turned out to be because the Intute metadata was stored in several databases, not all of which were accessible to OAI-PMH, and that the Intute web interface drew on one or more of these databases as appropriate when displaying catalog records. There was again metadata potentially useful for the Digging project, but which was not retrievable via OAI-PMH; and again, this required communication with Intute to locate and understand this metadata, and (in the end) to exchange files of database dumps.

Finally, similar issues were encountered in the metadata harvest with NSDL. The NSDL is a federated multi-disciplinary STEM library, with a central metadata repository at nsdl.org, which was provided as test-bed repository by the Digging Into Data program funders. NSDL served as an early implementation of OAI-PMH; and early issues arose from the fact that many NSDL projects lacked the organizational resources (skills, expertise, time, etc.) necessary for successful harvesting [24]. In the current work, the initial batch of NSDL records obtained through OAI-PMH was found to contain different metadata to that displayed on the corresponding catalog record page in the NSDL Web

site. Email communication with the NSDL’s metadata staff clarified this situation, which arose from the fact that the metadata aggregated for the NSDL central repository sometimes included records for the same resource that had been provided by multiple NSDL partners; in this case the Web site showed a condensed and normalized version of these multiple records, while the original OAI-PMH query retrieved a concatenated version of the same multiple records. This clarification allowed for the adjustment of the harvesting process, through a reconfigured set of OAI-PMH queries.

3.2 Metadata Quality

A total of 263,550 records were harvested. Issues then arose with metadata quality, including duplicate records. While there is no indicator to precisely calculate the duplication of resource, we can examine indirect cues to estimate it, such as URL and title. Titles are good sources that represent the contents of digital library records but it must be kept in mind that different resources can have identical titles, especially when they are shorter. URLs have less chance to be duplicate across different resources but one still needs to be careful with incorrect or insufficient information within URL strings (e.g., typos or when provided only with root URLs). There were 25,318 duplicate titles (9.6%), and 19,475 duplicate URLs (7.4%).

One example of a duplicate from the IPL is the catalog record for the official Web site of the Chateau de Versailles, which includes records from both the IPL, and the Librarians’ Internet Index (which merged with the IPL in LII). A comparison of the subject and description fields is given below:

IPL LII description This museum is located near

Paris and includes many masterpieces. This website describes the history of the chateau through the buildings, gardens and famous royalty that have lived there. Take a tour with the interactive map.

This site contains an introduction to the palace at Versailles, France. Find history of its construction, images, and brief biographies of some of the historic figures in French history. Visiting information and events are provided. Available in English, French, and Japanese.

subject terms

Chateau Louis XIV Marie-Antoinette Marie-Antoinette's estate Palace french court Grand Trianon hall of mirrors formal gardens

Architecture Dragons Dreams Daring Deeds Castles Palaces Palaces

In another example, a number of NSDL Partners cataloged the Web site for the National Science Teachers’ Association (NSTA: http://www.nsta.org), in different ways, generating metadata appropriate for their audience; one record NSDL included five subject terms (biology, physics, education, life science, and chemistry) while another

included twenty-three subject terms (including educational theory and practice, environmental science, policy issues, space science, ecology forestry and agriculture, history/policy/law, and technology). In this case, and in contrast to the IPL case, the resource descriptions were approximately similar, having been taken from About section of the NSTA Web site itself.

Part of the issue here is that these are non-identical duplicates. Identical duplicates are easy to identify and remove. Non-identical duplicate records, such as different descriptions of the same resource, can however affect post-harvesting processing, skew the results of the content analysis in MASH, and lead to the generation of multiple Dewey classes for the same resource. At the time of writing Digging project team is still working on methods to reconcile duplicates.

4 Organizational Histories and Metadata Archaeology

This section focuses on organizational analyses of the harvesting issues just described and seeks to account for their occurrence across the different cases. As introduced above, the overall approach of this paper looks to organizational and sociotechnical histories leading to divergent implementations as explanations for harvesting issues. How well, therefore, does this explanation fit the studies just presented?

Each of the libraries in the study has a complex organizational history. The IPL was founded in 1995 as an online reference service, and developed collections, metadata, and databases to support these activities [17]. Some of this early metadata was that stored in Filemaker Pro, and due to turnover of IPL staff since then, no real explanation of this metadata is now available [22]. Further metadata was acquired in 2008, when the IPL merged with the Librarians’ Internet Index (LII), a California-based digital library that lost funding, and both IPL and LII metadata were crosswalked to Dublin Core. A number of fields that could not be mapped to the Dublin Core datastream were placed (‘archived’) in the IPL datastream, from where they could be searched if necessary [22]. This auxiliary metadata would not however be apparent in a harvest of the DC datastream.

Intute evolved from a number of previous repositories [18, 35]. Its immediate predecessor, the Resource Description Network (RDN), included partners from eight different subject areas [15, 16]. Each Intute resource had a DC record, and sometimes additional subject classification metadata was inherited from these partners, stored in separate databases, and drawn on as needed. This arrangement met the needs of Intute, but also produced metadata that was opaque to OAI queries. The RDN was built in turn on a previous federation, the e-Lib Subject Gateways, and there was not always a simple mapping from e-Lib to the RDN. The provenance and structure of Intute metadata therefore stretched back through at least two previous organizations.

Finally, the NSDL was established by a National Science Foundation program that funded a number of subject libraries, now known as Network Partners [4, 37]. A total of nineteen Partners are active [29]. The funding process was competitive and encouraged

diversity and novelty in approaches. The organizational structure that evolved was a ‘hub and spoke’ model, with a hub of ‘core integration’ projects that developed program-wide infrastructure, linked to Network Partners that built collections for specific audiences. Each Network Partner created its own metadata, within guidelines provided by NSDL core integration, and contributed this to the central NSDL metadata repository via OAI-PMH. (Metadata harvested from NSDL has thus been through at least two OAI-PMH pipelines – from Network Partner to NSDL, and from NSDL to the Digging project). Given their relative autonomy, Network Partners could catalog the same resource in different ways, and after harvesting, the central NSDL metadata repository could hold multiple records for a resource. In these cases the record displayed to users at nsdl.org was often a normalized version that concatenated all records from different Network Partners, editing these for redundancy (such as multiple occurrences of the same subject term).

Fig. 2. Summary organizational history for the Digging project metadata harvest. The dotted lines indicate manual harvesting of ‘hidden’ metadata. Note the metadata archaeologist at the top.

For each of the libraries involved in the harvest, a series of historical and organizational factors had produced metadata that was difficult to the harvest. An outline of these

organizational histories is represented in Figure 2. The figure represents time in terms of depth, with the current Digging project at the top above the horizontal line, and the harvest represented by the top row of databases. As can be seen, the history of the libraries in the Digging project is extensive and complex (c.f. [9, 13, 22, 32, 33]), and this history gets murkier, the more we proceed downwards and back in time. Conversely, going up the figure, and forward in time, we can see how it is possible for metadata to evolve over time in different organizational contexts. This mirrors in some ways well-known issues associated with the dynamic nature of the World Wide Web, and the inherent difficulties faced by libraries that attempt to catalog the Web. In these efforts, metadata is often a snapshot of a changing website at a particular point in time. (Note that figure 2 could be expanded further; in the case of Intute, for instance, the evolution of SOSIG e-Lib subject gateway to RDN member involved multiple external partners. NSDL Network Partners also aggregate metadata from multiple partners; the Bioscience Education Network (BEN) for instance has 32 partners, a number of which also have their own digital libraries and contribute metadata to BEN.)

It is instructive to compare Figure 1 with Figure 2. The former, adapted from the original project proposal, shows a simple configuration of three digital libraries; the latter, in contrast, shows a minimum of forty-two different organizations and databases in the history of these three libraries. Figure 2 can be thought of as representing a form of ‘metadata archaeology,’ in which metadata harvesters dig down through layers of organizational history, in order to understand better how surface structures came to be shaped by past events, in order to then configure metadata harvests.

Some important implications follow from the Figure. In terms of theory, it provides a useful way systematically to understand how organizational factors affecting metadata harvesting. It also points to an explanation of harvesting issues in terms of organizational history which is transferable between the three cases presented in this paper, and retroactively applicable to a number of prior reported cases as well, implying in turn the possibility of transferability to future cases as well. In terms of practice, one implication is that responses to harvesting issues need to be proactive rather than reactive, with organizational issues becoming something to be anticipated in advance, rather than something to be dealt with once they have arisen. Developing organizational histories of data providers can help to identify potential issues before they occur is one way to proceed with such a proactive organizational approach. This suggests in turn that metadata harvesting is a two-stage process, involving initial manual and organizational configuration, followed by the implementation of automatic queries. Again, this claim echoes and some previously reported findings [c.f. 22, 24, 32].

A final set of implications, more tentative than the first two, involves requirements for the design of organizational tools to support harvesting. There are possibilities for formalizing the approach represented in figure 2 in the form of rubrics, descriptions of best practices, and tools and services (in the case of the Digging project, the visual representation of each library’s history helped support discussion regarding solutions to identified metadata harvesting issues). What would these tools look like? Some idea of requirements for such tools can be gleaned from several previous studies that have looked

at data and metadata mobility over time. [22]’s study of metadata crosswalking in the IPL, which drew on theories of networks of practice [6], emphasizes that rich communication, including the use of boundary objects [34], is important to support metadata work amongst organizational members and groups. [14]’s study of data mobility and re-use in training for mammography screening suggests that to make data relevant in new contexts requires rich links to and reinterpretation of data from prior contexts of use. Overall, these analyses suggest that a ‘workbench’ approach to organizational tools that focus on articulating often implicit organizational structures and histories in the context of harvesting workflows, rather than just automated technical solution, might ultimately preferable. The design of such a tool would need to take into account how to represent the complex richness of organizational metadata practices over time; and this is a question that needs to be addressed in future research.

5 Conclusion

This paper has introduced an organizational approach to the analysis of digital library metadata harvesting. The approach treats digital libraries and metadata as mutually constitutive phenomena that co-evolve over time. One consequence of this co-evolution is that even where a metadata standard (such as Dublin Core) is adopted, individual organizational factors can produce divergent implementations of that standard in different digital libraries. The approach was applied to the analysis a three case studies of metadata harvesting in the Digging Into Metadata project. The harvesting was ultimately successful, but a number of issues delayed the work. Upon analysis, the organizational dimensions of the harvesting became clear, with each of harvest encountering issues that could be traced partly to the organizational histories of each library concerned. That these findings occurred in a multi-site comparison suggests that the observations may also be transferrable to other organizational settings; the analysis therefore provides a coherent way to understand a number of the practical problems that have been identified previously with metadata harvesting. Finally, several implications were drawn from the study, based around the need for metadata harvesting to take into account in a systematic fashion the organizational histories of data providers.

References

1. Arms, W., Dushay, N., Fulker, D., Lagoze, C. (2003). A case study in metadata harvesting: The NSDL. Library Hi Tech 21(2), 228-237.

2. Baca, M. (Ed.) (2008). Introduction to Metadata. Los Angeles: Getty Research Institute. http://www.getty.edu/research/publications/electronic_publications/intrometadata/

3. Barley, Stephen. (1986), Technology as an Occasion for Structuring: Evidence from Observations of CT Scanners and the Social Order of Radiology Departments. Administrative Science Quarterly 31(1), 78-108.

4. Bikson, T., Kalra, N., Galway, L., Agnew, G. (2011). Steps Toward a Formative Evaluation of NSDL. RAND Technical Report. http://www.rand.org/content/dam/rand/pubs/technical_reports/2011/RAND_TR998.pdf.

5. Bishop, A. P., Van House, N. A., Buttenfield, B. P. (Eds.). (2003). Digital Library Use. Social Practice in Design and Evaluation. Cambridge, MA: The MIT Press.

6. Brown, J. S., Duguid, P. (2001). Perspective. Knowledge and Organization: A Social-Practice Perspective. Organization Science 12 (2), 198-213.

7. Candela, L., Castelli, D., Ferro, N., Ioannidis, Y., Koutrika, G., Meghini, C., et al. (2010). The Digital Library Reference Model. Foundations for Digital Libraries: DELOS Network of Excellence on Digital Libraries Project.

8. Chowdhury, G., McMenemy, D., Poulter, A. (2006). Large-Scale Impact of Digital Library Services: Findings from a Major Evaluation of SCRAN. In: J. Gonzalo et al. (Eds): ECDL 2006, LNCS 4172, pp. 256-266.

9. Cole, T., Shreeves, S. (2004). Search and discovery across collections: the IMLS digital collections and content project. Library Hi Tech, 22(3), 307-322.

10. Digging Into Metadata. (2013). Digging Into Metadata. 11. Electronic Libraries Program (n.d.). e-Lib: The Electronic Libraries Programme 1995-2001.

http://www.ukoln.ac.uk/elib/ 12. Gonçalves, M. A., Fox, E. A., Watson, L. T., Kipp, N. A. (2004). Streams, Structures, Spaces,

Scenarios, Societies (5S): A Formal Model for Digital Libraries. ACM Transactions on Information Systems, 22(2), 270-312.

13. Halbert, M., Kaczmarek, J., Hagedorn, K. (2003). Findings from the Mellon Metadata Harvesting Initiative. 7th European Conference, ECDL 2003 Trondheim, Norway, August 17-22, 2003. Pp. 58-69. DOI: 10.1007/978-3-540-45175-4_7

14. Hartswood, M., Procter, R., Taylor, P., Blot, L., Anderson, S., Rouncefield, M., Slack, M. (2012). Problems of Data Mobility and Reuse in the Provision of Computer-based Training for Screening Mammography. CHI ’12, Austin, TX, USA, 909-918.

15. Hiom, D. (2006a). Retrospective on the RDN. Ariadne Issue 47, http://www.ariadne.ac.uk/issue47/hiom/

16. Hiom, D. (2006b). RDN Timeline. Ariadne Issue 47, http://www.ariadne.ac.uk/issue47/hiom 17. Janes, J. (1998). The Internet Public Library: An Intellectual History. Library Hi Tech, 16(2),

55-68. 18. Joyce, A., Wickham, J., Cross, P., Stephens, C. (2008). Intute integration. Ariadne 55.

http://www.ariadne.ac.uk/issue55/joyce-et-al 19. Joyce, A., Kerr, L., Machin, T., Meehan, P., Williams, C. (2010). Intute - Reflections at the

End of an Era. Ariadne Issue 64, http://www.ariadne.ac.uk/issue64/joyce-et-al/ 20. Khoo, M. (2006). A Sociotechnical Framework for Evaluating a Large-Scale Distributed

Educational Digital Library. J. Gonzalo et al. (Eds): ECDL 2006, LNCS 4172, pp. 449-452. 21. Khoo, M., Hall, C. (2010). Merging metadata: A sociotechnical study of crosswalking and

interoperability. 10th ACM/IEEE Joint Conference on Digital Libraries (JCDL), Gold Coast, Queensland, Australia, June 21-25, 2010, pp. 361-364.

22. Khoo M., Hall, C. (2013). Managing metadata: Networks of practice, technological frames, and technical work in a digital library. Information and Organization 23, 81–106.

23. Khoo, M., Macdonald, C. 2011. An organizational model for digital library evaluation. First International Conference on The Theory and Practice of Digital Libraries (TPDL), Berlin, Germany, September 26-28, 2011. Springer: Lecture Notes in Computer Science (LNCS).

24. Lagoze, C., Krafft, D. B., Cornwell, T., Dushay, N., Eckstrom, D., Saylor, J. 2006. Metadata aggregation and ‘automated digital libraries’: a retrospective on the NSDL experience. 6th ACM-IEEE Joint Conference on Digital Libraries (JCDL), June 11–15, 2006, Chapel Hill, North Carolina, USA, pp. 230-239.

25. Lagoze, C., Patzke, K. (2011). A research agenda for data curation in cyberinfrastructure. Paper presented at the 11th ACM-IEEE Joint Conference on Digital Libraries (JCDL), June 13-17, 2011, Ottawa, Canada.

26. Lagoze, C., Van de Sompel, H. (2001). The Open Archives Initiative: Building a Low-Barrier Interoperability Framework. First ACM/IEEE-CS Joint Conference on Digital Libraries (JCDL'01), 54–62.

27. Lynch, C. (2001). Metadata Harvesting and the Open Archives Initiative. ARL, A Bimonthly Report, no. 217. http://www.arl.org/resources/pubs/br/br217/br217mhp.shtml

28. NISO (National Information Standards Organization) (2004). Understanding Metadata. Bethesda, MD: NISO. Retrieved from: http://www.niso.org/publications/press/UnderstandingMetadata.pdf

29. NSDL. (n.d.). NSDL Network Partners. http://nsdl.org/partners 30. Rusbridge, C. (1998). Towards the Hybrid Library. D-Lib Magazine, July/August 1998.

http://www.dlib.org/dlib/july98/rusbridge/07rusbridge.html 31. Shreeves, S., Habing, T., Hagedorn, K., Young, J. (2005). Current Developments and Future

Trends for the OAI Protocol for Metadata Harvesting. Library Trends 53(4), 576-589. 32. Shreeves, S., Kaczmarek, J., Cole, T. (2003). Harvesting cultural heritage metadata using the

OAI protocol. Library Hi Tech 21(2), 159-169. 33. Shreeves, S. , Kirkham, C. (2004). Experiences of Educators Using a Portal of Aggregated

Metadata. Journal of Digital Information, 5(3). http://journals.tdl.org/jodi/index.php/jodi/article/viewArticle/144/142

34. Star, S., Griesemer, J. (1989). Institutional ecology, 'translations' and boundary objects: Amateurs and professionals in Berkeley's Musuem of Vertebrate Zoology, 1907-39. Social Studies of Science, 19 (3), 387-420.

35. Williams, C. (2006). Intute: The New Best of the Web. Ariadne Issue 48, http://www.ariadne.ac.uk/issue48/williams/

36. Woodley, M. S. (2008). Crosswalks, metadata harvesting, federated searching, metasearching: Using metadata to connect users and information. In M. Baca (Ed.), Introduction to Metadata, 3rd ed. (pp. 38-62). Los Angeles, CA: Getty Research Institute.

37. Zia, L. (2005). The NSF National Science, Technology, Engineering, and Mathematics Education Digital Library (NSDL) Program. D-Lib Magazine 11(3). http://www.dlib.org/dlib/march05/zia/03zia.html.