
CITY UNIVERSITY LONDON

A Case Study of a Digital Image Collection Belonging to a Charity

Isobel Ramsden

January 2016

Submitted in partial fulfillment of the requirements for the degree of MA in Library Science

Supervisor: Professor David Bawden


Abstract

This case study aims to increase knowledge of working with digital image collections, including issues related to information organisation, information behaviour, digital asset management and user experience. By exploring these issues, the researcher hopes to get both a broad picture of the case and some in-depth insights into specific themes. This research aims to relate a practical case to theories explored in the academic literature. Analysis of the results of the case study will be used to prompt reflection about how aspects of the case could be developed or improved. It is hoped that these findings will be applicable to other, similar cases.

The research takes an interpretivist approach in that it aims to describe and reflect upon the case and let the emergence of new themes from the data dictate the research design to some extent. The researcher divides the case study into two phases, as recommended by Pickard (2007, 87-91): an 'orientation and overview' phase where a broad range of potentially relevant issues is explored and a 'focused exploration' phase where a specific theme is investigated. The research begins by reviewing the academic and professional literature, which in turn informs the way in which particular issues are explored. In the initial exploratory phase, the researcher carries out interviews with some of the main users of the collection and analyses logs generated by the Digital Asset Management System (DAM). In the 'focused exploration' phase, the researcher investigates indexing policy and management of the collection through analysis of metadata, interviews and an indexing task completed by participants.

A few key findings are made. Firstly, the collection is important for promoting and keeping a record of the charity's work. Secondly, the rapid growth of the collection makes metadata increasingly important for the discoverability of files. Thirdly, DAM software can support information organisation, information retrieval, information seeking and digital asset management in many ways. Fourthly, the case shows the importance of training for helping staff to use the system and manage the metadata schema and folders. Finally, although time and staffing for organising and managing the collection are limited, the benefits that good quality metadata and well-organised folders can bring are, in the opinion of the researcher, worth the effort. The research results also include specific recommendations for managing the collection and indexing it.

The case explores the distinctive nature of visual information as opposed to other types of information. It also gives an insight into working with digital image collections in a corporate environment and describes in detail the use of specialist software for storing, organising, retrieving and managing digital files. The charity is kept anonymous, as agreed with the charity, and this perhaps helped to encourage active staff participation.


Contents

Introduction (p.5)
1. Aims and objectives (p.7)
2. Literature Review (p.8)
    i. Digital Images (p.8)
    ii. Metadata (p.9)
        a. Purposes of metadata (p.9)
        b. Metadata modelling (p.9)
        c. Metadata standards (p.10)
        d. Metadata quality (p.10)
        e. Subject indexing (p.10)
        f. Vocabulary control (p.12)
    iii. Information Retrieval (p.13)
    iv. Information Behaviour (p.14)
        a. Information Needs (p.15)
        b. Information Seeking (p.15)
        c. Information Use (p.16)
        d. Related Theories (p.16)
    v. User Experience (p.17)
        a. Information Architecture (p.17)
        b. Interface and Graphics (p.18)
        c. Performance (p.18)
    vi. Digital Asset Management (p.18)
        a. Preservation (p.19)
        b. Digital Asset Management systems (p.20)
        c. Training (p.20)
3. Methods (p.20)
4. Orientation and Overview (p.21)
    i. Methods (p.21)
    ii. Results (p.22)
    iii. Discussion (p.26)
5. Focused Exploration (p.28)
    i. Methods (p.28)
    ii. Results (p.29)
    iii. Discussion (p.35)
    iv. Recommendations (p.37)
6. Conclusion (p.39)
7. References (p.40)
Appendix A: Account of the researcher's internship at the charity (p.44)
Appendix B: Metadata fields (p.48)
Appendix C: Controlled vocabularies (p.49)
Appendix D: Folder system (p.54)
Appendix E: Orientation and overview phase: interview questions (p.55)
Appendix F: Orientation and overview phase: log analysis methods and results (p.57)
Appendix G: Focused exploration phase: group interview questions (p.60)
Appendix H: Focused exploration phase: P7 interview questions (p.61)
Appendix I: Focused exploration phase: image-tagging exercise forms and analysis (p.62)
Appendix J: Consent forms (p.66)
Appendix K: Research proposal (p.68)
Appendix L: Reflection (p.78)


Acknowledgements

I am grateful to all the staff at the charity who participated in my research and to the Head of Digital for giving me the chance to do an internship there. Thanks are also due to staff at Third Light and at the Historic England Archive for their help with my queries. Finally, I am grateful to all the tutors on the MA in Library Science for enabling me to undertake this dissertation, particularly my supervisor, David Bawden.


Introduction

The purpose of this research is to explore a digital image collection belonging to a charity. Through conducting a case study, the researcher aims to uncover a broad range of issues relevant to the case and explore in more detail those issues that seem most interesting or in need of further research. The research design is informed by a review of the academic and professional literature. It is not fixed from the outset but rather based on analysis of themes emerging from the data. The research aims to create a picture of the case and prompt reflection about it, including how the case relates to theories explored in the literature review and how aspects of the case could be developed or improved. It is hoped that the findings of the case will be applicable to other, similar cases.

This research is being undertaken as a result of an internship that the researcher did at the charity from May to November 2014. The purpose of the internship was to improve the organisation of the files in the charity's Digital Asset Management System (DAM). An account of the work is included in Appendix A (p.44). In short, the internship that the researcher applied for required someone to improve the following aspects of the charity's use of the DAM:

• quality of the metadata
• ease of filling in the metadata forms
• clarity of the folder structure
• discoverability of resources

Before the internship, it was not compulsory for staff to add metadata when uploading files, though some did. Controlled vocabularies were not being used. There was confusion as to the purpose of metadata fields and which folders to put resources in. During her internship, the researcher created a new set of metadata fields and, in some cases, controlled vocabularies for tagging and searching for resources. She made some metadata fields compulsory and reapplied metadata to some of the existing photos. She also improved the folder structure. She then trained staff in the new metadata system and folder structure and how to make full use of the search capabilities of the DAM. At the time of writing, there are 8269 files in the charity's Digital Asset Management System. They include the following file types:

• Images - 92.79%
• Audio files - 0.01%
• Video files - 0.6%
• Document (e.g. PDF, Microsoft Office) - 6.42%
• Other - 0.17%
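
As a rough illustration of what these percentages mean in absolute terms, the sketch below converts them into approximate file counts using the total of 8269 files quoted above. The variable names are illustrative only, and the counts are estimates reconstructed from rounded percentages rather than figures reported by the DAM.

```python
# Approximate file counts reconstructed from the rounded percentages above.
TOTAL_FILES = 8269

share_percent = {
    "Images": 92.79,
    "Audio files": 0.01,
    "Video files": 0.6,
    "Document": 6.42,
    "Other": 0.17,
}

# Round each share to the nearest whole file.
approx_counts = {
    kind: round(TOTAL_FILES * pct / 100) for kind, pct in share_percent.items()
}

for kind, count in approx_counts.items():
    print(f"{kind}: about {count} files")

# The published percentages are rounded, so they need not sum to exactly 100.
print(f"Sum of percentages: {sum(share_percent.values()):.2f}%")
```

Because the source percentages are rounded to two decimal places, any count derived this way should be read as an estimate.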

The images are mainly photos of the charity's centres, their programme of support and their events. The other main type of file is architectural (designs and plans of the centres). The DAM is also used for storing a small amount of material used by the Marketing and PR departments, such as logos and presentations about the charity's work.

The purpose of the collection is to communicate the work of the charity and also partly to preserve the memory of its work. As such the files it contains can be compared to records - documents 'created or acquired by an organization, as part of a business process, so that their existence provides evidence of the fact that the process took place' (Bawden and Robinson, 2012, 256). They are also a collection in so far as they are 'an organized set of information-bearing items chosen for a particular purpose in a particular context or environment, and usually unique to that situation' (Bawden and Robinson, 2012, 78).

The charity supports people living with cancer. In their own words:

[The charity] provides free practical, emotional and social support for people with cancer and their families and friends. Built in the grounds of NHS hospitals, [the charity's] Centres are designed by leading architects to be warm, welcoming and full of light and open space. Qualified staff offer a programme of support developed to complement medical treatment, including clinical psychology, nutrition, benefits advice and exercise. The first Centre opened in Edinburgh in 1996. There are now 18 Centres across the UK, online and abroad, with more planned for the future.

The fact that many of the assets in the DAM are about the centres, including photographs by professional architectural photographers, shows the importance of architectural, landscaping and interior design to the charity's work. The charity's founders were landscape architects and part of their vision for the charity was to provide environments that would 'make the people who visit and work in our Centres feel safe, valued and comfortable in an atmosphere that stimulates their imagination and lifts their spirits'. The architects, landscape architects and interior designers who have created the centres include Frank Gehry, Arabella Lennox-Boyd and Paul Smith.

The collection also includes a small number of photos of works of art belonging to or loaned to the charity. Works of art are also used to help create an environment conducive to healing. For example, Antony Gormley's 'Another Time X' sculpture looks over one of the charity's centres. As this message from a staff e-newsletter shows, images are particularly adept at communicating the importance of art and design to the charity's work: 'We know how difficult it can be to articulate the uniqueness of [the charity] to someone who hasn't visited a Centre and we hope these additional photos will help you in telling our story and how we support people living with cancer, and their family and friends'.

Since February 2012, the charity has been using DAM software designed by Third Light Ltd. Prior to this, digital images were stored on CDs.
Third Light's DAM software is used by a wide variety of other organisations, including universities, the BFI and Transport for London. It is worth pointing out that Third Light refers to the software as 'Intelligent Media Server' rather than 'Digital Asset Management System'. For, as they point out (Third Light, 2015 (a)), there are many different names for what is essentially the same kind of software. The charity subscribes to the Premium edition of the software, which is middle-of-the-range in terms of expense and functionality (Third Light, 2015 (b)). The DAM is hosted by Third Light and accessed via the Internet. The storage capacity of the DAM is currently 250 GB and the charity has used 41.46% of this at the time of writing. According to Third Light, it was 75 GB in December 2012, increasing to 125 GB in December 2013 and 250 GB in December 2015 at no additional cost to the charity. Third Light explain that 'we are able to provide this free extra capacity because of the investment we continue to make for clients in new equipment and infrastructure services'. However, the donating of extra storage capacity is done in an 'ad-hoc manner' and is at the discretion of Third Light. So if the charity wished to add extra storage capacity in the future they might have to pay for it (Third Light Support, 2015). The software aims to move users away from organizing assets via a folder structure to organizing them using metadata.1 As such, it provides sophisticated functionality for adding metadata to files. Specifically, it allows users to create controlled vocabularies for both indexing and searching for documents. It also allows users to specify compulsory metadata fields and to require metadata to be approved by an 'admin user' (user with more control over how the software is used) before the file can be uploaded. There is also a folder system, which includes five different types of folder for organising and sharing files. 
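
The compulsory fields and controlled vocabularies described above can be pictured with a small sketch. It is not Third Light's API or data model: the field names, the vocabulary terms and the `validate_record` function are all hypothetical, chosen only to show how a record might be checked before an upload is accepted.

```python
# Hypothetical schema: field names and terms are illustrative, not the charity's.
COMPULSORY_FIELDS = {"title", "centre", "copyright"}

CONTROLLED_VOCABULARIES = {
    "centre": {"Centre A", "Centre B", "Centre C"},
    "subject": {"architecture", "event", "garden"},
}


def validate_record(record: dict) -> list[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []

    # Every compulsory field must be present and non-empty.
    for field in sorted(COMPULSORY_FIELDS):
        if not record.get(field):
            problems.append(f"missing compulsory field: {field}")

    # Controlled fields may only use terms from their vocabulary.
    for field, allowed in CONTROLLED_VOCABULARIES.items():
        value = record.get(field)
        if not value:
            continue
        terms = value if isinstance(value, list) else [value]
        for term in terms:
            if term not in allowed:
                problems.append(f"'{term}' is not a permitted term for {field}")

    return problems


good = {"title": "Garden in spring", "centre": "Centre B",
        "copyright": "Example Photographer", "subject": ["garden"]}
bad = {"title": "Untitled", "centre": "Centre Z", "subject": ["gardens"]}

print(validate_record(good))
print(validate_record(bad))
```

In a workflow with admin approval, a record with an empty problem list might still wait for an admin user to sign it off; that step is left out of the sketch.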
More details about the metadata fields and folder system can be found in Appendices B and D respectively. The main users of the DAM are the Marketing and Communications and the Fundraising teams at the charity's head office and eighteen centres. The Events and PR teams at the head office also use it. The DAM is used for storing, organising, managing, finding and sharing resources. To put the case in a wider context, it can be seen as part of an increasing trend for creating, storing and using digital information. In 2011, the 6 billionth photo was added to the photo-sharing website, Flickr

1 As they write in their 'Help' document: 'Metadata also provides a much more dynamic and powerful way to index content than simple folder structures, better suited to modern information-led businesses with rapidly expanding collections of content to index.' (Third Light, 2015 (c))

(Kremerskothen, 2011). The Bodleian and the British Library have also recently opened access to thousands of digital images from their collections online. This proliferation of digital content is largely due to the fact that technology for creating and storing information is becoming cheaper and more efficient all the time. For example, a Kodak digital camera that could take photos of up to 1152 x 864 pixels cost $449 in 1999 (digicamhistory, 2015), whereas a Kodak digital camera that can take photos of up to 4920 x 3264 pixels costs £79.00 today (Tesco, 2015). As for storage, the cost of hard drive space fell from $700,000 per gigabyte in September 1981 to $0.03 per gigabyte in March 2014 (Komorowski, 2014). Cloud services, which use shared IT infrastructure to drive down costs, also make storage cheaper. Dropbox, for example, offers free storage of up to 2GB.

The case can also be set against the increasing tendency of end-users to organise and manage large amounts of digital information themselves. For example, Flickr allows users to organise their photos into albums, tag them with keywords and assign copyright or Creative Commons licences to them. Personal computers usually come with free image management software such as iPhoto. Again, this software usually allows one to organise images into albums, tag them with keywords, add captions and share them via social media. As a study of a large-scale digital image collection stored in the cloud that largely depends on the end-user to organise and manage it, this research is a way of both capturing these developments and looking at their implications.

1. Aims and objectives

This case study aims to explore a digital image collection belonging to a charity. Using the academic and professional literature as a guide, it seeks to improve understanding of issues related to information organisation, information behaviour, digital asset management and user experience.
The scope of the case study is limited to files stored in the charity's Digital Asset Management System. As these files are almost entirely (92.79%) made up of still images, these will be the focus of the case study. It is hoped that the findings of the case study could be transferable to similar cases and inform best practice in organising, managing and using digital image collections. To begin with, the researcher aims to get a broad understanding of all aspects potentially relevant to the case. Based on prior knowledge of the case, she has identified information behaviour, information organisation, digital asset management and user experience as themes likely to be relevant. The researcher will explore these themes during the initial phase of her research using the following research questions as a guide:

1. Since the introduction of the new metadata schema and folder structure,

• is metadata consistently, accurately and fully applied?
• does metadata provide the necessary information for staff?
• are users able to find the resources they need?
• is the folder structure clearer?

2. What information behaviour do staff display in relation to the digital image collection?
3. How usable is the Digital Asset Management System?
4. How is the collection managed?

Having explored these questions, the researcher then aims to decide on a theme or set of themes that she would like to research in more detail. At this stage, a further set of research questions will be devised that will guide the exploration of these themes. This second phase of research aims to produce more detailed analysis and potentially a set of recommendations that could be used to guide practice. The review of the professional and academic literature aims to shape and inform the objectives of the research. It covers all the themes that the researcher has identified as potentially relevant to the case. The literature review also aims to allow the researcher to relate theory to practice both during the literature review and during the presentation of results from the case study.


2. Literature Review

2.i Digital Images

The word 'image' comes from the Latin noun, 'imago', meaning 'image', 'likeness', 'idea' or 'appearance' (Oxford University Press, 1994). The Oxford English Dictionary (2015) defines 'image' as 'an artificial...representation of something, esp. of a person', 'a visual representation or counterpart of an object or scene', 'an exact likeness; a counterpart, copy' or a 'mental representation of something...created not by direct perception but by memory or imagination'. Except in the last of these senses, images are perceived by the eye but processed or 'understood' by the brain, with 50% of brain activity devoted to vision according to some estimates (Terras, 2008, 3).

Terras (2008, 6) defines a digital image as 'a representation of an image stored in numerical form, for potential display, manipulation or dissemination via computer technologies'. Digital images are represented by 1s and 0s: binary digits more commonly known as 'bits'. There are two types of digital image - bitmap and vector. Bitmap images map strings of bits to colours in the image. They are made up of basic units called 'pixels' and can convey a range of colours and shades, with the number of pixels per inch determining how clear the image is (its 'resolution'). Vector images on the other hand use ASCII text to create instructions to a computer about how colours and shapes relate to each other in an image. They cannot show the same complex range of colours and shades as bitmap images but they take up less memory and display a clear image at any size. They are commonly used in fields such as architecture and product design. The charity's digital image collection contains both bitmap images in the form of digital photos and vector graphics in the form of architectural design files.

Digital images have many advantages over analogue images such as drawings on paper or photographic prints.
As Terras writes (2008, 6), 'strings of bits can be easily replicated, transmitted, accessed and processed... mathematically sorted through to show hidden relationships, new arrangements, different views and expanded, contracted or concatenated knowledge'. High-resolution images can capture complex and tiny data and display them sharply on a screen (Terras, 2008, 9). However, digital images do also have some disadvantages. Colour and high-resolution bitmap images can require lots of memory to process and display (although compression - reducing the amount of data needed to represent an image - can be used to counter this); in addition, enlarging bitmap images can sometimes cause pixelation, i.e. individual pixels revealing themselves (Terras, 2008, 9). The way in which digital images are made visible depends on their file format. Different file formats have different ways of describing the data in a digital image to allow programs to process and display it. According to Terras (2008, 61), there are now over one hundred image formats and more are being created all the time. Some image file formats are standards endorsed by the International Organisation for Standardisation (ISO), the American National Standards Institute (ANSI) and other official organisations. Others are standards developed by industry that have become de facto standards because of their popularity. Some are proprietary - developed for particular systems - and others are openly documented so they can be easily adapted to other systems. The proliferation of file formats is partly due to different requirements for transporting and storing data and partly the need to keep up with technological developments. The image files in the charity's DAM are mainly JPEGs (84.79%) and TIFFs (14.67%). There are also AIs (0.22%), PSDs (0.16%), PNGs (0.12%) and GIFs (0.04%). 
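
To make the point about memory concrete, the uncompressed size of a bitmap can be estimated as width x height x bytes per pixel. The sketch below assumes 24-bit colour (3 bytes per pixel) and ignores file headers and compression, both of which are simplifying assumptions; the function name is illustrative. The two resolutions are the camera resolutions quoted in the Introduction.

```python
def uncompressed_size_bytes(width: int, height: int, bytes_per_pixel: int = 3) -> int:
    """Estimate raw bitmap size, assuming a fixed number of bytes per pixel."""
    return width * height * bytes_per_pixel


# Resolutions of the 1999 and 2015 cameras mentioned in the Introduction.
for width, height in [(1152, 864), (4920, 3264)]:
    size = uncompressed_size_bytes(width, height)
    print(f"{width} x {height}: {size / 1_000_000:.1f} MB uncompressed")
```

On these assumptions the larger image needs roughly sixteen times the memory of the smaller one, which helps explain why compressed formats are attractive for sharing.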
JPEG (Joint Photographic Experts Group) is a popular file format for sharing images as it uses compression to reduce the file size, which means that it takes up less storage space and less memory is needed to process the image. It has become the leading image file format for digital photographic images due to its effective compression method. However, given that this compression method is 'lossy' (i.e. data is irretrievably lost during compression) it is not the most suitable for archival purposes. TIFF (Tagged Image File Format) is the preferred file format for archival purposes and is now a common output format from professional-level digital cameras. TIFF files retain as much image data as possible and can therefore be used to create high quality archival 'master' files. TIFF also allows basic metadata to be written to the file itself. A disadvantage of TIFF is that the files are large and can therefore be expensive to store and difficult to share online. AI (Adobe Illustrator) and PSD (Photoshop Document) files are proprietary formats developed by Adobe for their image editing software. A small percentage of the architectural plans in the DAM are saved as AI and PSD files (the rest are PDF documents). Finally, GIF (Graphics Interchange Format) and PNG (Portable Network Graphics) files are specifically designed for sharing and displaying images via the Internet. They are often used for logos.

2.ii Metadata

2.ii.a Purposes of Metadata

Metadata can be described as 'structured data about data' (Jisc, 2015). It is made up of defined elements (e.g. 'title' or 'copyright notice') and defined values (e.g. text or dates). These elements and values together form a schema - a structured set of data representing a resource, also known as a 'surrogate' for that resource. Having a verbal surrogate is vital for image collections as computers are limited in their ability to automatically search the content of images.

The purposes of metadata have been defined in various ways. Hider (2012, 18-19) describes them as 'finding, identifying, selecting, obtaining and navigating' resources, with the last purpose expressing how metadata can be used to collocate similar resources to better understand, or 'navigate', a collection. Haynes's 'five-point model of metadata' (2004, 15-17) describes the purposes of metadata as 'resource description, information retrieval, resource management, ownership and authenticity, and interoperability'. Resource description encompasses information such as the title and creator ('descriptive metadata'), technical data such as the image resolution or file size ('structural metadata'), copyright information ('administrative metadata'), preservation instructions ('preservation metadata') and the content of the resource ('subject metadata').
Metadata can support resource management by, for example, indicating who is responsible for a document and when it should be reviewed or archived. It can support interoperability by conforming to agreed standards for what elements to include, whether or not to use controlled vocabularies and which encoding scheme to use (for example, XML or RDF triples).

2.ii.b Metadata Modelling

The process of deciding which elements and values to use in a metadata schema is sometimes referred to as 'metadata modelling' (Keathley, 2014, 84). Metadata modelling begins by researching user needs. The more that is known about user needs, the more likely that the metadata will be effective.

Metadata modelling also needs to take account of the tools and systems available. How sophisticated is the software available for creating metadata? At a basic level, for example, Apple's file management system, Finder, allows files to be tagged using keywords or colours. At a much more complex level, Third Light's DAM allows one to create metadata fields, define how they should be populated (for example, with controlled or uncontrolled terms), whether or not to make them compulsory to fill in and whether or not to display them. (The metadata schema used to describe the charity's digital image collection is attached in Appendix B, p.48.)

Finally, metadata modelling also needs to take into account the costs of creating and maintaining metadata. Creating good quality metadata can be time consuming, especially if it needs to be double-checked by someone with responsibility for the quality of metadata. One way in which the case can be made for a technical service is by working out its benefit-cost ratio. For example, Hider (2008) worked out the monetary value of technical services provided by public libraries using the Standard Preference technique (a method used to estimate how much consumers would pay for a good or service).
In Hider's study, people were asked how much they would pay per month for their local public library service if the alternative were for it to shut down. First, they were asked how much they would pay for the service as it currently existed. Then, they were asked how much they would pay for a self-service library. Finally, they were asked how much they would pay for a self-service library without an online catalogue. Hider worked out that the benefit-cost ratio specifically for the library's technical services was 2.4:1. This compared favourably with the overall benefit-cost ratio for all of the library's services (1.33:1).
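The arithmetic behind a benefit-cost ratio of this kind can be sketched as follows. This is only an illustration: it assumes that the benefit of technical services is estimated as the drop in willingness to pay when the online catalogue is removed, and all of the figures, along with the function name `benefit_cost_ratio`, are hypothetical rather than taken from Hider's study.

```python
# Hypothetical monthly figures per user; NOT values from Hider (2008).
WTP_SELF_SERVICE = 10.00               # willingness to pay for a self-service library
WTP_SELF_SERVICE_NO_CATALOGUE = 7.00   # same library without an online catalogue
COST_OF_TECHNICAL_SERVICES = 1.50      # assumed monthly cost of technical services


def benefit_cost_ratio(benefit: float, cost: float) -> float:
    """Return the benefit divided by the cost of providing the service."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return benefit / cost


# Assumption: the value placed on technical services is the difference
# between the two willingness-to-pay answers.
benefit = WTP_SELF_SERVICE - WTP_SELF_SERVICE_NO_CATALOGUE
ratio = benefit_cost_ratio(benefit, COST_OF_TECHNICAL_SERVICES)
print(f"Benefit-cost ratio: {ratio:.1f}:1")
```

A ratio above 1:1 would suggest that users value the service more than it costs to provide.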


The Standard Preference technique could also perhaps be used in evaluating metadata services and working out their benefit-cost ratio.

2.ii.c Metadata Standards

Various standards exist for image metadata. Metadata standards are schemas created by information professionals that are usually openly available. Some standards get taken up because they have been promoted within a particular field, others because they have been endorsed by official organisations such as the International Organization for Standardization (ISO). They allow institutions to create high quality metadata that will be interoperable with that of other institutions that use the same schemas. Some of the main schemas for image metadata are:

• Dublin Core Metadata Element Set (DCMES) - an ISO standard that is made up of fifteen 'core' elements that can be applied to a wide range of resources across different subject domains. It does not allow for detailed descriptions of the subject of resources.

• Exif - a schema used by digital camera manufacturers to capture and store basic technical data. (Third Light automatically imports the Exif data of digital photographs.)

• PREservation Metadata: Implementation Strategies (PREMIS) - a standard for capturing preservation metadata.

• Categories for the Description of Works of Art (CDWA) - provides classes for describing the subject of works of art and/or images of works of art.

• Visual Resources Association (VRA) Core - based on CDWA, this schema is particularly useful for managing slides and digital images of art and architecture. However, it does not capture technical metadata comprehensively and so is often used alongside DCMES.
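As a rough illustration of how a schema pairs defined elements with values, the sketch below represents a single image record using the fifteen DCMES element names and checks that no other elements have been used. The record values and the helper `unknown_elements` are hypothetical; they are not taken from the charity's schema or from Third Light.

```python
# The fifteen element names of the Dublin Core Metadata Element Set.
DCMES_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unknown_elements(record: dict[str, list[str]]) -> set[str]:
    """Return any element names in the record that are not part of DCMES."""
    return set(record) - DCMES_ELEMENTS


# Hypothetical record for one photograph; every DCMES element is
# optional and repeatable, so each value is held as a list.
record = {
    "title": ["Centre garden in spring"],
    "creator": ["Staff photographer"],
    "subject": ["garden", "visitors"],
    "date": ["2015-04-12"],
    "format": ["image/jpeg"],
    "rights": ["Copyright held by the charity"],
}

print(unknown_elements(record))                          # no unknown elements
print(unknown_elements({"keywords": ["garden"]}))        # 'keywords' is not a DCMES element
```

A check of this kind is one simple way to keep in-house records interoperable with other collections that use the same standard.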

These standards can be used to help model a metadata schema for a digital image collection. Third Light also maps imported metadata that conforms to the XMP (Extensible Metadata Platform) and IPTC (International Press Telecommunications Council) standards to corresponding fields in its own system. The mappings can be viewed in their support documentation. XMP is an ISO standard developed by Adobe, which allows content creators to embed metadata into their resources. IPTC is used specifically for describing digital objects within the newspaper and press industry. 2.ii.d Metadata quality The quality of metadata can be judged by various criteria. The overarching measure is how far it serves user needs. More specifically it can be judged on its comprehensiveness, accuracy, clarity and consistency (Hider, 2012, 77-82). Good quality metadata will include all the types of metadata needed by users (administrative, structural, preservation etc.) and subject metadata will be comprehensive enough to ensure that all relevant topics are indexed without this being too costly a process. Metadata will be accurate (i.e. free from spelling mistakes and misinterpretation of the subject of the resource) and up-to-date (i.e. any changes in how the resource should be described will have been made). And the same elements and values will be consistently used across resources to ensure good recall in searches. Metadata values should also be consistent with the kind of language users will use to search. As Lancaster (1992, 62) explains, consistency can be defined as how consistent an indexer is in applying metadata to the same document at different times (intra-indexer consistency) or how consistent the indexing of the same document is between different indexers (inter-indexer consistency). As the concepts represented by images have to be translated into words, there is likely to be a low level of inter-indexer consistency in applying subject metadata. 
However, as Lancaster (1992, 67-68) points out, controlled vocabularies can improve consistency in subject indexing if indexers are 'knowledgeable in the subject matter and fully familiar with the terms'.

2.ii.e Subject indexing


Lancaster (1998, 8) identifies two steps in subject indexing: conceptual analysis and translation. The first step, conceptual analysis, is deciding what the item is about. The second step, translation, is converting the conceptual analysis into a set of index terms. Conceptual analysis will usually be guided by an indexing policy on exhaustivity (how many concepts are indexed) and specificity (how specifically concepts are described). The indexing policy will have an impact on information retrieval, specifically how many relevant items are retrieved out of all relevant items in a database ('recall') and how many out of the retrieved items are relevant ('precision'). Exhaustive indexing leads to high recall but lower precision. Selective indexing leads to low recall but higher precision. As exhaustive indexing is expensive it is usually not a feasible option. However, given that images can mean such different things to different people it is perhaps better to err on the side of exhaustivity when indexing them. As regards specificity, Lancaster writes that 'the single most important principle of subject indexing ... is that a topic should be indexed under the most specific term that entirely covers it' (1998, 28). Subject metadata is essential for the discoverability of most digital image collections. And yet it presents particular difficulty to those indexing them. Images are so rich in meaning that it can be hard to know where to start when choosing descriptors. Hjørland and Nissen Pedersen (2005, 584) distinguish between a positivist and a pragmatic approach when classifying the subject of information resources. The positivist approach assumes that information has specific properties which can be objectively analysed. The pragmatic approach believes that users' goals, purposes, interests and values should influence classification. 
Whilst the digital images in the charity's collection undoubtedly have specific properties that can be objectively analysed, a pragmatic approach is also advisable, given how many different meanings an image can have and that the charity has limited time for indexing. Studies have shown that images can be broken down into different levels of meaning, which could help with describing them. The art historian, Erwin Panofsky, has been influential in these studies. He identified three levels of meaning in works of Renaissance art (Panofsky, 1962, as cited by Shatford, 1986, 43):

• pre-iconography: generic description of objects and actions, such as 'animal' or 'sleeping', or the mood of a work, for example 'peaceful'

• iconography: specific description of what is represented, often requiring familiarity with a specific culture, such as identifying a picture of a man, woman and child as a representation of the Holy Family

• iconology: the abstract meaning of a work of art drawing on pre-iconographical and iconographical information as well as knowledge about the artistic, social and cultural setting to which the work belongs, for example 'a Nativity scene'

Shatford (1986, 49) created a framework for analysing the subject of images based on Panofsky's research. This divides the subjects of images into three classes - 'generic of', 'specific of' and 'about'. 'Generic of' corresponds to Panofsky's pre-iconography level and is used to describe generic things such as 'man' or 'river'. 'Specific of' corresponds to Panofsky's iconography level and is used to describe specific things such as 'Julius Caesar' or 'Rubicon'. 'About' corresponds to the 'mood' part of Panofsky's pre-iconography level and his iconology level. It is used to describe the moods, emotions and abstract meanings of an image such as 'defiance', 'transgression' or 'fall of the Roman Republic'. Shatford's framework also includes the facets 'who', 'what', 'where' and 'when'. The 3x4 matrix that her framework presents is sometimes referred to as the 'Panofsky-Shatford' matrix (Hollinck et al., 2004, 603). Shatford's study also distinguishes between a work (e.g. a painting by Gainsborough) and a 'represented work' (e.g. a photo of that painting). This is particularly relevant to the case study as the charity has several photos of artwork in its collection.

Jörgensen et al. (2001) were influenced by the Panofsky-Shatford matrix in developing their own model for the classification of image descriptors. Their model classifies not just the subject of images (the 'semantic' levels) but also elements such as the colour and composition of the image (the 'syntactic' levels). The model is shaped like a pyramid with 'semantic' levels at the base of the pyramid and 'syntactic' levels at the top. The 'semantic' levels are thus wider, which is supposed to represent the greater degree of knowledge that is needed to describe semantic content. The 'syntactic' levels include (Jörgensen et al., 2001, 940):


• type/technique: general type of image (e.g. black and white/colour)

• global distribution: spectral sensitivity (colour) and/or frequency sensitivity (texture)

• local structure: elements such as dot, line, tone, colour and texture of individual components of the image

• global composition: specific arrangement or spatial layout of elements in the image

The 'semantic' levels include (from top to bottom of the pyramid): 'generic object', 'generic scene', 'specific object', 'specific scene', 'abstract object' and 'abstract scene'.

Hollinck et al. (2004) developed a framework for classifying image descriptors that includes non-subject metadata too. This framework divides image descriptors into three classes: 'nonvisual', 'perceptual' and 'conceptual'. 'Nonvisual' elements (taken from the VRA element set) include, for example, the date and creator of the image. 'Perceptual' elements are the basic, 'low-level' elements of an image such as its colour and shapes. And the 'conceptual' elements describe the 'high-level' concepts represented by an image. These can be further classified as 'generic', 'specific' or 'abstract' concepts and as either 'conceptual objects' (people or things) or 'conceptual scenes'. They can also be characterised as related to an 'event', a 'place' or a 'time' and/or as related to each other.

2.ii.f Vocabulary control

Controlled vocabularies are predefined terms for indexing and searching for resources. They can either conform to a standard (e.g. Getty's Art and Architecture Thesaurus) or be created in-house. Their main function is to improve search results. Control of synonyms (e.g. making sure everyone tags a photo of a visit by the Duchess of Cornwall with 'HRH Duchess of Cornwall' rather than 'Camilla Parker Bowles'), singular/plural nouns and spelling mistakes improves recall. Control of ambiguous terms (e.g. 'walking' (support activity organised by centres) as opposed to 'walking' (fundraising event)) improves precision. Controlled vocabularies also eliminate redundant language, such as articles and conjunctions, to improve the precision of search results. However, controlled vocabularies are not just useful for searching. They can also help with indexing, as they show the indexer the full range of concepts available to index.
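The effect of synonym control on recall can be sketched in a few lines. The example below is a minimal illustration under stated assumptions: the mapping `PREFERRED_TERMS`, the photo identifiers and both search functions are hypothetical, and the code does not represent how Third Light or any other DAM implements controlled vocabularies.

```python
# Hypothetical mapping from variant entry terms to one preferred term.
PREFERRED_TERMS = {
    "camilla parker bowles": "HRH Duchess of Cornwall",
    "duchess of cornwall": "HRH Duchess of Cornwall",
    "hrh duchess of cornwall": "HRH Duchess of Cornwall",
}

# Hypothetical photos tagged inconsistently by different indexers.
PHOTO_TAGS = {
    "IMG_001": ["Camilla Parker Bowles"],
    "IMG_002": ["HRH Duchess of Cornwall"],
    "IMG_003": ["duchess of Cornwall "],
    "IMG_004": ["Garden"],
}


def normalise(tag: str) -> str:
    """Map a tag to its preferred term, ignoring case and extra spaces."""
    cleaned = " ".join(tag.split())
    return PREFERRED_TERMS.get(cleaned.lower(), cleaned)


def exact_search(photo_tags: dict[str, list[str]], term: str) -> list[str]:
    """Uncontrolled search: match only the literal tag as typed."""
    return sorted(p for p, tags in photo_tags.items() if term in tags)


def controlled_search(photo_tags: dict[str, list[str]], term: str) -> list[str]:
    """Controlled search: compare preferred terms on both sides."""
    wanted = normalise(term)
    return sorted(
        p for p, tags in photo_tags.items()
        if wanted in {normalise(t) for t in tags}
    )


print(exact_search(PHOTO_TAGS, "HRH Duchess of Cornwall"))
print(controlled_search(PHOTO_TAGS, "Camilla Parker Bowles"))
```

In this toy collection the literal search finds only one of the three relevant photos, whereas the search that resolves variants to the preferred term finds all three.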
There are different types of controlled vocabulary. Alphabetic vocabularies, for example keyword lists and thesauri, arrange terms in alphabetical order. Systematic vocabularies, such as classification schemes and taxonomies, bring terms related in meaning together. Systematic vocabularies usually have a hierarchical ('tree') structure, with the terms that are broadest in meaning at the top. The controlled vocabulary used to populate the Centre field in the metadata schema for the charity's digital image collection is an alphabetic vocabulary. For it consists simply of a list of the charity's centres in alphabetical order. The controlled vocabularies used to populate the Keywords, Event and Resource Type fields in the metadata schema are all taxonomies. They are informal hierarchies with the terms broadest in meaning at the top. The controlled vocabularies for the Keywords and Resource Type fields are attached in Appendix C (p.49). Systematic vocabularies vary in the rules they use to classify concepts. Sparck Jones (cited by Hjørland and Nissen Pedersen, 2005, 583) categorises classification systems as follows:

• monothetic - all members of each class share one or more properties

• polythetic - members of a class don't necessarily share one or more common properties

• overlapping - objects may appear in more than one class

• exclusive - objects can only appear in one class

• ordered - some systematic relationships between classes

• unordered - no systematic relationships between classes

More formal classification systems, such as the Dewey Decimal Classification, are usually monothetic, ordered and exclusive. By contrast, taxonomies can be polythetic and overlapping, although they usually show systematic relationships between classes. Taxonomies are thus usually more suited to corporate settings as they are more flexible.

The opposite of controlled vocabulary is uncontrolled vocabulary (user-created terms). User-created keywords are also referred to as folksonomies, social classification and ethnoclassification. The advantage of folksonomies is that they do not require updating like a controlled vocabulary. As Matusiak (2006, 289) argues, the 'most important strength of social tagging ... is its close connection with users and their language'. Folksonomies are also suited to large, heterogeneous digital image collections such as photo-sharing websites like Flickr. In this environment, the sheer variety of images and the constantly changing nature of user language make folksonomies more suitable than controlled vocabularies.

2.iii Information Retrieval

Information retrieval is the means by which information is retrieved to satisfy an information need. There are two main information retrieval systems relevant to digital images. The first involves a user making a query and the system searching a database of image metadata to find a match. The second is the system automatically searching aspects of the 'pixel domain' of images such as colour, texture and geometry (Enser, 2008, 536). The latter method is known as Content Based Image Retrieval (CBIR) and has been a growing field of research within computer science. The problem with retrieval by metadata is that it is difficult to represent the meaning of images in words. CBIR has various problems too though. For example, 'a colour-based CBIR algorithm will match busy city scenes containing beige brick backgrounds with scenes of desert sand, and a shape-based one might return images of the Statue of Liberty in response to queries seeking images of starfish - the so-called 'rhyming image' phenomenon' (Enser, 2008, 537).
There is also the problem that users tend to prefer to search for 'high-level' concepts, such as people or objects, which cannot be analysed automatically by computers. Third Light's duplicate detection system might use CBIR as it 'examines files for patterns in image content, to infer when images are either identical or similar' (Third Light, 2015 (d)). However, it does not allow users to search using CBIR.

There are various models for information retrieval by metadata. The 'exact match' model involves specifying certain conditions that a search must fulfil. For example, Third Light's Advanced Search allows one to specify a range of conditions, including:

Condition | Example | Explanation
IS | Keywords IS 'Centre > People' | The retrieved resource must be tagged with this exact keyword in the Keywords field
IS NOT | Centre IS NOT 'Newcastle' | The retrieved resource must not be tagged 'Newcastle' in the Centre field
IS including children | Event IS including children 'National event > Run' | The retrieved resource must be tagged with this exact keyword and any child terms (i.e. narrower terms), e.g. 'National event > Run > London Marathon'
INTELLIGENT MATCHES | Caption INTELLIGENT MATCHES 'lymphoma support' | The retrieved resource must have a caption with the words 'lymphoma support' or words resembling them
IS MISSING | Resource Type IS MISSING | The retrieved resource does not have any value in the Resource Type field

Third Light allows one to create more than one condition and specify that either 'ANY' or 'ALL' conditions must be satisfied. This is an example of Boolean logic, which can also be expressed by the operators 'AND', 'OR' and 'NOT'. Boolean logic allows one to increase the recall (proportion of relevant documents retrieved) or precision (proportion of retrieved documents that are relevant) of search results. For example, searching for 'centre visitors AND tea' narrows the search and so tends to increase precision, whereas searching for 'tea OR coffee' broadens it and so tends to increase recall. There is generally an inverse relationship between recall and precision, which means that increasing one tends to decrease the other. Nevertheless, being aware of this logic can help users have more control over their search results.
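The trade-off between the two measures can be seen in a small worked example. The sketch below is illustrative only: the five tagged records, the set of 'relevant' records and the `search` function with its 'ALL'/'ANY' modes are hypothetical, and the code is not a description of Third Light's search engine. Precision is calculated as the number of relevant records retrieved divided by the number retrieved, and recall as the number of relevant records retrieved divided by the number of relevant records in the collection.

```python
# Hypothetical collection: record id -> set of keywords.
RECORDS = {
    1: {"centre visitors", "tea"},
    2: {"centre visitors", "coffee"},
    3: {"tea"},
    4: {"coffee", "staff"},
    5: {"centre visitors", "tea", "coffee"},
}

# Assumed information need: photos of centre visitors having a hot drink.
RELEVANT = {1, 2, 5}


def search(records: dict[int, set[str]], terms: list[str], mode: str) -> set[int]:
    """Return ids of records matching ALL (Boolean AND) or ANY (Boolean OR) terms."""
    if mode not in ("ALL", "ANY"):
        raise ValueError("mode must be 'ALL' or 'ANY'")
    combine = all if mode == "ALL" else any
    return {
        rid for rid, tags in records.items()
        if combine(term in tags for term in terms)
    }


def precision(retrieved: set[int], relevant: set[int]) -> float:
    """Proportion of retrieved records that are relevant."""
    return len(retrieved & relevant) / len(retrieved) if retrieved else 0.0


def recall(retrieved: set[int], relevant: set[int]) -> float:
    """Proportion of relevant records that were retrieved."""
    return len(retrieved & relevant) / len(relevant) if relevant else 0.0


narrow = search(RECORDS, ["centre visitors", "tea"], "ALL")   # AND query
broad = search(RECORDS, ["tea", "coffee"], "ANY")             # OR query

for label, result in (("AND", narrow), ("OR", broad)):
    print(label, sorted(result),
          f"precision={precision(result, RELEVANT):.2f}",
          f"recall={recall(result, RELEVANT):.2f}")
```

In this toy collection the AND query retrieves only relevant records but misses one of them, while the OR query retrieves every relevant record along with two that are not relevant.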


Another retrieval model is the 'best match' model. This involves the user entering a word or phrase into a search box and pressing 'Enter'. The system then uses an algorithm to judge the probable relevance of documents and rank them accordingly. The nature of the algorithm depends on the system. Google, for example, explains that its search algorithm considers things such as 'terms on websites, the freshness of content, your region and PageRank [how many links there are to a website]' (Google, 2015). Third Light does not explain what its criteria for relevance are. However, the 'best match' model traditionally judges relevance on how frequently terms in the search query occur in metadata and other searchable content (Belkin et al., 1982, 63). Third Light does give users options for how results should be ranked, for example by relevance and upload date in descending alphabetical order (default option) or by relevance and file reference number in descending order. Another approach to search is to browse folders and 'eyeball' resources to find what one needs. Third Light supports this by providing 'thumbnail' images so it is easier to scan a set of resources quickly. Metadata can also be consulted by rolling one's mouse over a thumbnail image, which displays a pop-up window with the main metadata, or by clicking on a thumbnail to view all the metadata attached to it. Another feature of information retrieval systems is the ability to modify or refine one's search. In Third Light, the user can refine their search by using the 'Refine Search' menu on the right-hand side of the search results page. At the top of the menu is a search box where users can enter a query that will be used to search within the set of retrieved resources. Beneath the search box is a link called 'Advanced Options', which enables the use of Advanced Search to search within the set of retrieved resources. 
The 'Refine Search' menu also presents metadata in the retrieved files as hyperlinked keywords, which can be clicked to retrieve other resources within the initial set tagged with the same keyword. This means of allowing the system to know which resources out of an initial set of retrieved resources are most relevant is known as 'relevance feedback'. There are various ways in which information retrieval systems can be evaluated. Firstly, users can be asked about their satisfaction with the system. However, whereas some studies show that there is a correlation between user satisfaction and system effectiveness (Al-Maskari and Sanderson, 2010), other studies show no such correlation (Sandore, 1990, as cited by Al-Maskari and Sanderson, 2010, 861). And some studies question user satisfaction as a measure of system effectiveness as users can sometimes be satisfied with results that do not match their initial query and 'because users tend to discount the contribution of the computer system when things go well and to blame the system when things go poorly' (Hufnagel, 1990, as cited by Al-Maskari and Sanderson, 2010, 862). As Al-Maskari and Sanderson (2010, 859) also point out, other factors can indirectly influence user satisfaction, such as their familiarity with the system, their experience of search and their knowledge of the subject domain. Another way of evaluating the effectiveness of an information retrieval system is to measure its precision and recall. As mentioned above, the formula for precision is the number of relevant documents retrieved divided by the total number of documents retrieved. And the formula for recall is the number of relevant documents retrieved divided by the total number of relevant documents in the database. As relevance is a subjective judgement, it can be quite time-consuming to work out recall and precision. Therefore experiments tend to use binary judgements - i.e. 
that documents are either relevant or not with no middle ground (MacFarlane, 2013).

2.iv Information Behaviour

Information retrieval is about more than just systems though. Information seeking is an aspect of 'information behaviour' - how humans interact with information. 'Information behaviour' encapsulates information needs, information seeking and information use amongst other things. The study of information behaviour has led to many theories and models (analytical descriptions - sometimes flow charts or diagrams - of the entities and activities involved in information behaviour and the relationships between them). These theories and models shape the design of retrieval systems, metadata schemas and indexing policies and can be used to predict and interpret users' interaction with information systems and services, which in turn can help with managing them.

2.iv.a Information Needs

Information needs can be hard for people to distinguish or articulate. Thus, Belkin et al. (1982, 63) argue that people should be asked instead to describe the problem that they are seeking to resolve, since people find it easier to describe the problem they are working on than the information they think they need to resolve it. A problem can also be defined as a gap or 'anomaly' in someone's knowledge. Thus, Belkin et al. refer to 'anomalous states of knowledge' (ASKs) as the basis for information seeking (1982, 61). Belkin et al. (1982, 65) also suggest that 'the anomaly, and the user's perception of the problem, will probably change with each instance of communication between user and mechanism'. Thus, each set of search results might cause a change in someone's perception of the initial problem that they set out to resolve. Belkin et al. (1982, 65) argue that this shows the need for information retrieval systems to be able to adapt to changes in user ASKs: 'This dynamism implies that information systems ought to be highly iterative, and interactive.' They acknowledge that relevance feedback (users instructing the system how to refine their initial search) allows users to change their search criteria as they go along (1982, 65).

As previously discussed, deciding which metadata elements and values to use in a metadata schema ('metadata modelling') should take into account users' information needs. Likewise, indexing policy (deciding how specifically and exhaustively to index concepts) should be informed by information needs. Lippell (2015, 64) suggests conducting 'informal research exercises' to establish user needs in the building of a corporate taxonomy.
For example, asking users to keep task diaries - accounts of the kinds of tasks they carry out - can indicate what information they need. Within Third Light, analysis of search logs can show what kinds of queries users are making, which in turn can inform metadata modelling and indexing policies. Analysis of task diaries and search logs could be used to create 'personas' and 'scenarios' - descriptions of typical users and their needs. And sorting cards into categories can give an idea of how users structure concepts and what language they use (Lippell, 2015, 64). 2.iv.b Information Seeking Several models specifically describe information seeking in the workplace. For example, Hansen's model (2005) describes information-seeking tasks and/or information-retrieval tasks as 'embedded within the work task itself' and the work task as part of wider organisational and social contexts. Leckie, Pettigrew and Sylvain (1996, 183) also observe that the individual's context - 'such as age, career stage, area of specialization, and geographic location' - can influence information seeking. Byström (2005) relates task complexity to information types sought and information channels/sources used. She observes that as perceived complexity of the task increases, 'people tend to acquire more types of information; and ... they are less certain to predict what types of information are necessary to acquire', and 'people in the role of experts are relied to an increasing extent for acquisition of all types of information'. Fidel and Pejtersen's (2005) 'cognitive work analysis' model identifies several dimensions that can be used to frame analysis of cognitive work: 'work environment', 'work domain', organisational structure/values, work tasks and 'actor's resources and values'. This framework and the examples of questions to ask in analysis which they provide can be useful in designing research into information behaviour in the workplace. 
Other models of information seeking describe in more detail the kinds of search that users can undertake. For example, Bates's 'berrypicking' model describes the kind of search where users' queries change as they go along, using the analogy of picking berries to describe how the search is carried out. As she puts it, 'the query is satisfied ... by a series of selections of individual references and bits of information at each stage of the ever-modifying search' (Bates, 1989, 410). And Morville and Rosenfeld's (2006, 37) 'pearl-growing' model describes the kind of search where users start with one or a few good documents that are exactly what they need, then try to find more of the same.


Browsing is another common type of search. As Case (2012, 100) points out, browsing can refer to a wide range of information behaviours, 'ranging from aimless scanning to goal-directed searching'. It can be used for 'getting an overview or sample of the information in a collection ... finding one's bearing in a subject of which one knows little ... selecting the 'right' information from a large collection of 'relevant' material ... [and] looking for inspiration, new ideas, or just something interesting; i.e. allowing for serendipity' (Bawden and Robinson, 2012, 150). In a work environment, information seeking is also often a collaborative activity. As Shah (2014, 218-219) points out, the generic term 'collaboration' can be broken down into different activities: communication, contribution, coordination, cooperation and collaboration. Essential to all collaboration is communication, which can be facilitated by email, for example. Third Light supports this by allowing users to email files to each other from the DAM. Contribution is similar to communication but is specific to an environment specially designed for information sharing, such as an online forum. Coordination is the connecting of 'different agents in a harmonious action', such as a conference call. And when agents in a coordinated activity also follow some rules of interaction, such as Wikipedia editors following certain rules about what can be written in an article, the activity is 'cooperation'. Finally, 'collaboration' is the highest-level activity in that it involves elements of all the other activities. Third Light supports collaboration in information seeking through Lightboxes - hidden folders that can be shared with selected users. These allow users to share and review files for the completion of a common goal, such as the design of a publication. 
Users can communicate by leaving messages for each other in the Lightbox and draw each other's attention to files by flagging them with a coloured flag symbol. The fact that Lightbox users do not have to collaborate in real time also facilitates successful collaboration. As Shah (2014, 219) points out, a supportive environment is one where 'participants should be able to evaluate the discovered information without always consulting others in the group'.

2.iv.c Information Use

Information use has not been studied as much as information needs and seeking. Nevertheless, cognitive processes related to information use are described in some theories of information behaviour. Kuhlthau's Information Search Process (Kuhlthau, 2005) describes the cognitive processes of exploration, choice of themes and selection of relevant information as well as the feelings that are experienced during these processes. And Dervin's Sense-Making methodology, with its emphasis on information seeking as a process of actively creating meaning, implies a symbiotic relationship between information seeking and use (Savolainen, 2009, 194). Information processing is also the subject of study in consumer research (how consumers choose which products to buy); these studies highlight the Need For Closure, which can be defined as 'an individual's desire for a firm answer to a question and an aversion towards ambiguity' (Savolainen, 2009, 198). Cognitive processes generally found in information use include comparing, decision-making, thinking, interpreting, gaining insights and synthesising (Savolainen, 2009, 203).

2.iv.d Related Theories

Other theories relevant to information behaviour more generally can be cited. For example, the Principle of Least Effort developed by the philologist George Zipf maintains that people will expend the least amount of effort necessary to achieve something.
Evidence for this theory can be seen, for example, in libraries and office filing systems, 'in which people tend to use, borrow, or cite the same documents again and again' (Case, 2012, 175). This tendency is sometimes referred to as the '80-20' rule because 20% of documents account for 80% of the use. Additional evidence for the Principle of Least Effort is in people's preference for finding out information from colleagues and peers rather than more formal sources, which might be harder to access or use.

Information overload is a potential result of working with large digital collections. Information overload can be defined as 'the state of an individual or system in which excessive communication inputs cannot be processed, leading to breakdown' (Rogers, as cited by Case, 2012, 115). Strategies used to cope with information overload include filtering it, being less discriminating in one's selection of and/or response to it and failing to process some of it (Miller, as cited by Case, 2012, 116). However, these strategies can lead to lower quality work, particularly if errors are made or relevant information is avoided altogether (Case, 2012, 117).

Whilst one can be overstimulated by information, it is still a stimulus that one naturally seeks. Information is necessary to pursue most human activities and, far from being stressful, can aid recreation and relaxation. The state of being profoundly absorbed in an activity is characterised by 'flow' - a pleasant state of mind in which one is so involved in doing something that one does not notice the passage of time. This has been linked to information use by, for example, Chen et al. (2000), who study users' flow experiences while surfing the Web. Information can also be linked to recreation (Case, 2012, 120-127) and creativity. Bawden and Robinson (2012, 275) discuss the ways in which information can support creativity, for example 'emphasis on browsing facilities' and 'representations of information to bring out analogies, patterns, exceptions, etc.'

2.v User Experience

User experience (UX) can be defined as the experience of using a system and/or attitudes towards its usability. Elements of user experience design include 'visually pleasing and interactive design, an information architecture that presents information in an organized fashion ... accessibility, HCI [human-computer interaction], ergonomics, utility and performance' (TechTarget, 2015). As Schopflin (2015, 6) observes, the arrival of the Web in workplaces during the 1990s and early 2000s gave end-users more direct access to information. This perhaps led to a greater focus on user experience design in computer systems. Third Light DAM incorporates many aspects of UX design, as will be discussed below.
2.v.a Information Architecture

Information architecture is 'the art and science of shaping information products and experiences to support usability and findability' (Morville and Rosenfeld, 2006, 4). In other words, it is the way information is organised, labelled and indexed to make it discoverable and easy to use. Good information architecture depends on an understanding of information-seeking behaviour. Thus, this section will refer back to some of the concepts introduced in the previous section.

To support the 'berrypicking' (i.e. non-linear, iterative) model of search, Morville and Rosenfeld (2006, 37) recommend that a system should facilitate moving from search to browse and back again. Third Light does this by presenting an initial set of results as thumbnail images for the user to browse and by allowing the user to search within an initial set of results using the search engine. Furthermore, if a search is refined it provides a 'breadcrumb' trail of the terms that have been used in the search so far with the option to click on any of the breadcrumbs to delete a term from the query. It is also easy for the user to just start a new search by clicking on 'Search' in the left-hand menu.

To support the 'pearl-growing' model of search (where users start with a good document and try to find more of the same), the system should allow users to find related documents. For example, Google supports this kind of search by providing a 'Similar Pages' command next to each search result (Morville and Rosenfeld, 2006, 37). And Third Light allows users to click through from a 'good' document to documents indexed with the same keyword via hyperlinked keywords. Third Light also allows users to manually link related documents, which can then be accessed by the 'Related' tab on each document's record.
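The keyword-based route from a 'good' document to related documents can be illustrated with a short sketch. The Python below is not Third Light's implementation: the asset identifiers, keywords and function names are hypothetical and chosen purely for illustration. It simply ranks other assets by the number of keywords they share with a starting asset, which is one plausible reading of 'pearl-growing' over a keyword index.

```python
from collections import defaultdict


def build_keyword_index(records):
    """Map each keyword to the set of asset ids indexed with it."""
    index = defaultdict(set)
    for asset_id, keywords in records.items():
        for keyword in keywords:
            index[keyword].add(asset_id)
    return index


def related_assets(seed_id, records, index):
    """Rank other assets by how many keywords they share with the seed."""
    shared = defaultdict(int)
    for keyword in records[seed_id]:
        for asset_id in index[keyword]:
            if asset_id != seed_id:
                shared[asset_id] += 1
    # Most shared keywords first; ties broken alphabetically by asset id.
    return sorted(shared, key=lambda asset_id: (-shared[asset_id], asset_id))


# Hypothetical records: asset id -> keywords applied by an indexer.
records = {
    "img-001": {"garden", "exterior", "summer"},
    "img-002": {"garden", "exterior"},
    "img-003": {"kitchen", "interior"},
    "img-004": {"garden", "people"},
}

index = build_keyword_index(records)
ranked = related_assets("img-001", records, index)
print(ranked)
```

A sketch of this kind also shows why consistent indexing matters: an asset that lacks keywords, or uses a synonym rather than the controlled term, can never be reached from the starting document.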
According to Russell-Rose and Tate (2013, 34), the 'sensemaking' model can also be supported by memory aids such as tools that allow one to gather potentially relevant documents into a single collection. 'Sensemaking' (a model for information-seeking developed by Dervin (2005)) involves using subjective thoughts, ideas, beliefs, values, feelings and memories to interpret the world. As an example of Russell-Rose and Tate's theory, Third Light's Lightbox tool could be seen as supporting 'sensemaking' by allowing users to gather potentially relevant documents into a single folder and review them.


Morville and Rosenfeld (2006, 34) categorise information needs as follows: known-item, exploratory and exhaustive. Known-item searches are when the user knows what they are looking for, what to call it and where to find it. This type of search can be supported by keyword search and/or browsing. Exploratory search is when the user is not exactly sure what they are looking for or how to articulate it and search is typically open-ended. This type of search can be supported by folders for browsing, thumbnails for scanning, options for refining one's search and suggested query terms via controlled vocabularies. Finally, exhaustive searches are when the user is looking for everything on a particular topic. This type of search can be supported by good recall, which in turn is supported by consistent, comprehensive indexing.

2.v.b Interface and Graphics

The interface and graphics of information systems can affect their usability in various ways. Firstly, consistency is important in the design of an interface, for, as Levinson and Schlatter (2013, xiv) write, 'like spoken language, visual language needs to define conventions and use them consistently to be understandable'. Secondly, ensuring the right elements are given prominence can help users find the information they need (Levinson and Schlatter, 2013, xv). Thirdly, layout, colour, type and imagery ('visual usability tools') can be used to, for example, 'create contrast, draw attention, and provide valuable information without overwhelming the user' (Levinson and Schlatter, 2013, xvii). Finally, the controls available to users (e.g. buttons/sliders) and how obvious their properties are to users (e.g. their ability to be clicked/dragged) also affect usability (Levinson and Schlatter, 2013, xviii). The ability of users to change the interface to suit themselves ('personalisation') is also increasingly common.
For example, in Third Light, users can choose how many results are displayed per page or in what order metadata fields are arranged.

2.v.c Performance

The performance of a system is also important to user experience. What is the 'server-response' time (Dubie, 2006) - i.e. how fast does the system respond to user commands? Has the system ever gone down? What technical problems have users experienced? How often do software updates need to be applied? Service Level Agreements (i.e. the contracts between service providers and end users that define the expected levels of service) can be used to measure whether a system is performing as it should be. And the field of Applications Performance Management (APM) measures the performance of systems but is more focused on server-response times than problems specific to a piece of software, as the APM Model developed by the technology research firm, Gartner, shows (Dragich, 2012).

2.vi Digital Asset Management

Digital asset management is the management of digital resources to ensure their effective use and secure storage and disposal. The nature of the assets will determine what kind of management they require. The lifecycle of documents can provide a framework for how to manage them. Documents are said to have a lifecycle because they go through phases, from creation through to archiving or disposal. The lifecycle of the digital images belonging to the charity could be described as follows:

• acquired
• selected
• assigned folder
• indexed
• accessed
• used
• evaluated
• destroyed or archived


Keeping any controlled vocabularies up-to-date and relevant to user needs is an important part of digital asset management. Writing about corporate taxonomies, Lippell (2015, 73) argues that 'the absolute minimum that is necessary is to ensure that the taxonomy has an owner from an early point, who is a named point of contact for queries and information.' This 'owner' would also 'have ultimate authority to accept or decline change requests'. She also points out that control of the taxonomy can be federated among different groups, which can 'have the advantage of giving responsibility to users who are experts in their area'. However, having more than one 'owner' of the controlled vocabularies could also lead to confusion if people add different terms to describe the same concepts or if terms are put in the wrong place. This suggests that having one overall 'owner' of the controlled vocabularies would be a good idea to oversee edits and answer queries. According to Broomfield (2009, 119), a key requirement of the DAMS at Museum Victoria in Australia is 'controlled access to images based on user privileges'. Many DAM systems, including Third Light, will enable this. Images could be archived by putting them into 'inactive' or 'dark' storage (Keathley, 2014, 31) in the DAM. Or they could be put on a hard drive. However, in the interests of making the archive more widely available, it could also be deposited with a cultural heritage institution, such as the archive of the Royal Institute of British Architects or the Historic England Archive. However, these archives might have specific requirements as to what is deposited. Firstly, they might not be able to acquire more than a certain number of assets. This would mean that they would ask whoever took the photos - and, if possible, the architects and certain staff at the charity - to choose a representative set of images. 
Secondly, the images would need to be in the TIFF file format as this is the most acceptable format for archiving (Dickinson, 2015). Thirdly, they might require at least some agreed rights and/or for rights to transfer to them after a certain period. To help them document the collection, a minimum amount of metadata is also important - date taken, caption, copyright and name of photographer are all essential (Dickinson, 2015). And in the interests of preservation, prints might be required as well as digital copies. This is because prints are currently predicted to last longer than digital files, which will require 'costly future manipulation to come up to archival standards' (Leith, 2015).

2.vi.a Preservation

Preservation is a key aspect of digital asset management. This includes the preservation of both the assets and any metadata describing them. Digital preservation presents many challenges, including technological obsolescence (i.e. technology no longer being available to process/display data) and loss of data integrity (i.e. the data being manipulated or substituted or simply deteriorating over time) (British Library, 2013, 9). Using file formats that will be viable for longer periods and keeping track of developments in file formats is one method of combatting technological obsolescence. Ultimately files can also be transferred ('migrated') to new formats so that they can be used with new hardware or software. And, as Leith (2015) points out, printing out digital files is another way of preserving them, although this would probably not be feasible for all the files in the charity's DAM. To preserve data integrity, any changes to files (e.g. cropping or changes to metadata) should be noted, as Third Light does with its 'Revision' and 'Audit' logs displayed with each item. Loss or damage to the data is another risk that needs to be managed.
The charity's DAM is hosted by Third Light, which means that, to a certain extent, the risk of lost or damaged data is out of the charity's control. Nevertheless, the charity should check that the servers used to host the DAM are kept in well-ventilated rooms on cooled racks with fire suppression systems (Keathley, 2014, 29). The charity's DAM is accessed via the Internet (the 'cloud'). A risk associated with cloud technology is cyber attacks. Although this risk is probably low in the case of the charity's DAM, preventative measures can be taken. Third Light protects its data using HTTPS (a protocol for communicating securely over a network) and encryption (Third Light, 2015 (e)). In case of data loss, Third Light backs up its data on a separate server in a different location 'for geographical redundancy' (Third Light, 2015 (e)). Back-up copies could also be kept on hard drives.


However, hard drives will deteriorate eventually so should not be the only way in which data is backed up (Keathley, 2014, 32). Ensuring that data can be transferred to a new system if necessary is also important. Most systems will have procedures for how to migrate files. However, the transfer of metadata usually has to be done manually by exporting it to Excel and then importing it into the new system. This would be the case for Primary metadata (i.e. the metadata applied by users) in the charity's DAM.

2.vi.b Digital asset management systems

Digital asset management depends not just on the assets themselves but also on the software available to manage them. There are many types of digital asset management system, including commercial, open source and in-house systems (Keathley, 2014, 17). Third Light is a commercial media asset management system (MAM) or Digital Asset Management System (DAM). There are many advantages to using a commercial system. As 'off-the-shelf' products they are ready to use immediately and can be customised to suit individual preferences. Support is provided over the telephone and by email. Cloud providers can also offer to host the software, which eases the burden on in-house IT teams and can be cheaper due to economies of scale. The disadvantage of commercial systems is that in a rapidly changing marketplace, vendors are not guaranteed to stay in business. At best, this risks wasting time having to change vendor and at worst it poses a risk to the security of clients' data if a vendor goes bankrupt. By contrast, an open-source alternative is free and supported by a developer community. However, it would take time and a dedicated IT team to set up. Furthermore, few open source alternatives currently specialise in image management (Sarwan, 2014). In-house systems are similarly expensive to develop and implement.

2.vi.c Training

Training is another key aspect of digital asset management.
Systems like Third Light are designed to be user friendly and users can always consult the manual or email the company for support. However, targeted training can allow users to get more out of the system. For example, users could be trained in how to use Advanced Search to have more control over their searches. Knowledge about the metadata schema also needs to be transferred to whoever is responsible for creating metadata and maintaining the controlled vocabularies. Sarkanen and Stoddard (2015) recommend identifying 'a knowledge gap - and a desire to learn' before starting training, adapting it to the information literacy of the audience, choosing convenient times (for example, lunch or before work in a busy office) and marketing the training well. They also recommend that training be part of new staff's induction, which is already the case at the charity.

3. Methods

This is a case study of a digital image collection belonging to a charity. The aim of the study is to gain a greater understanding of the case and potentially ideas for ways in which it can be developed or improved. It is hoped that these findings may be applicable to other, similar cases. It can be described as an intrinsic case study as it looks at all phenomena relevant to the case as opposed to one in particular (Pickard, 2007, 86). These phenomena include information organisation, information behaviour, user experience and digital asset management. Specific themes emerging as particularly significant are also explored in more detail. The case is grounded in a review of the academic and professional literature and relates its findings back to relevant theories explored in the literature review. This case study falls into the interpretivist research paradigm. Broadly speaking, interpretivist research aims to describe and reflect upon the world whereas objectivist research aims to explain and predict it.
Interpretivism stems from the idea that 'realities are multiple, constructed and holistic' (Pickard, 2007, 12) and that understanding is achieved by studying the contexts that give rise to these multiple realities. The results of interpretivist research are detailed descriptions of data in context that can be used to prompt reflection and understanding of entities within similar contexts. By contrast, objectivist (or positivist)


research tests hypotheses about what is assumed to be a single, stable reality to draw conclusions of general applicability. The interpretivist approach was chosen as it was felt that a holistic picture of the case with some in-depth analysis would be more useful in guiding practice. Interpretivist research also allows the emergence of new themes from the data to prompt changes to the research design. As the researcher was not sure at the outset what would be the most significant aspect of the case for research, emergent research design was preferred to a more linear, experimental approach. Pickard (2007, 87-91) identifies two stages in case study research: 'orientation and overview' and 'focused exploration'. The former uncovers the main issues relevant to the case study and the latter researches these main issues. Throughout, the researcher is open to revisiting and/or abandoning themes. The account of methods and results has been divided into the following sections: 'orientation and overview' and 'focused exploration'. This is because the methods at the 'focused exploration' stage depend on the analysis of results from the 'orientation and overview' phase. Also, throughout the 'focused exploration' phase, the discovery of new themes might mean the methods change. So it is clearer to give an account of the methods used and results obtained at each stage rather than separating the methods and results into separate sections. The methods used in the case study are mainly qualitative. This is because these methods are better suited to capturing complex descriptions. Where possible, multiple data collection techniques and multiple sources of evidence ('triangulation') are used to give credibility. Participants have checked both the interview transcripts and the researcher's interpretation of them to confirm their representativeness of what was said. 
To get a sample of users of the DAM, the researcher asked one of the staff members at the charity's head office to circulate an invitation to all users of the DAM to participate in the case study. Five members of the Marketing and Communications team at the head office volunteered. In the 'focused exploration' phase, the researcher also invited a user who had not initially volunteered to participate in the case study. This user works in one of the regional offices and is also a member of the Marketing and Communications team. The researcher contacted her because she knew that she was a frequent user of the DAM and because she wanted her sample to include a user at a regional office as well as the head office. The researcher included herself in the sample as an expert user of the system. She considers herself an expert user because she designed the metadata schema and used the DAM intensively during her internship. According to the charity, the main users of the DAM are the Marketing and Communications and the Fundraising teams at the charity's head office and eighteen centres. The Events and PR teams at the head office also use it. The sample would therefore have been more representative if it had included members of the Fundraising, Events and PR teams. Although the data is presented as objectively as possible, there might be some subjective bias given that the researcher created the metadata schema and controlled vocabularies that are being studied.

4. Orientation and Overview Phase

4.i Methods

The orientation and overview phase of the case study had two parts: interviews and log analysis. The researcher used this phase to get an overview of the case and to inform how she would proceed with the rest of the case study. The interviews were with five members of the Marketing and Communications department. The researcher added her own insights by filling out answers to the interview questions in the interview transcript.
The codes for the participants (used throughout the dissertation) are as follows:

• P1: Marketing Coordinator
• P2: Publications Manager
• P3: Website and Social Media Editor
• P4: Digital Production Coordinator
• P5: Marketing Coordinator
• P6: Researcher

The interviews were designed to explore those themes that the researcher had predicted would be most relevant: information needs, seeking and use, metadata, user experience and digital asset management. They were semi-structured, which allowed the researcher to use the insights from the literature review and prior knowledge of the case to get information more likely to be significant but also left room for participants to provide other information they thought would be useful. The interviews were conducted in person at the charity on 29 July 2015 and lasted about 30 minutes each. The interview questions are attached in Appendix E, p.55. Logs allow one to see how users interact with the DAM. Third Light generates three types of log - 'Audit', 'Download' and 'Search'. Audit logs show a variety of ways in which the user has interacted with the DAM, from putting files into a folder to editing the controlled vocabularies to applying metadata to a file to uploading files to creating folders to logging in and out. Search logs show what search terms have been used, who has performed searches, when and how (i.e. whether they used Advanced Search or General Search). And download logs show which users have downloaded which assets and when. The logs can be exported to Excel to analyse the data. An account of how the logs were analysed and the full results of this analysis are attached in Appendix F, p.57. The participants in the log analysis were the same as the participants in the interviews. However, logs of the researcher's activity were not analysed. The logs only go back as far as a year and in the last year the researcher has only used the DAM for research so it was not felt that her use of the system was representative of the actual use of the system. Participants were informed about the log analysis, which might have influenced their behaviour. 
However, any distortion this gave to the true picture of their use of the system was a necessary price to pay for being open with them about research methods used. To improve the reliability of the search log analysis, more data could have been analysed.

4.ii Results

Background

The participants have been using the DAM for between 3 months and 2.5 years. The Website and Social Media Editor and Digital Production Coordinator have used similar software before. Everyone except the Publications Manager had taken part in the training in how to use the new metadata schema and controlled vocabularies that the researcher ran in November 2014.

Information behaviour

Participants' information needs arise primarily from the tasks they have to complete. Images are used to promote the charity on the website, in newsletters, on posters in hospitals, in the charity's quarterly magazine, social media posts, email campaigns and at centre events and fundraising activities. Participants also occasionally need to find images relating to the past work of the charity, showing the importance of the images for preserving the memory of the charity's work. And images can help staff to get a better knowledge of the charity and are interesting to them - most participants have searched the collection out of interest or would if they had more time. The interviews also reveal that information needs are liable to change during the search process. The interviews suggest that participants still find folders useful as a way of organising the collection and making it searchable. Five out of six participants search folders to find what they need and P5 said she knows the collection so well she hardly ever uses the search engine. And yet P2 finds searching the folders frustrating as a search method because he is not sure whether all relevant information is in one folder. And


three out of six participants said they sometimes felt frustrated by the number of images they had to sift through to find what they need. So participants are aware of the limitations of searching folders as a search method. Furthermore, as the collection grows, this search method is going to become more difficult, especially if the user does not have a good knowledge of the collection. Nevertheless, the popularity of Collections shows how folders can still be useful for organising assets around a particular theme, for example, 'best exterior images'. Most participants do not have much time to find what they need so the precision of search results is important. The fact that most of them find what they need on the first page of search results suggests adequate precision of results. However, 'refine search' tools are used by four out of six participants, which shows that users sometimes have to refine their searches to get more precise results. Search log analysis shows that Advanced Search is used a lot less than General Search. And yet Advanced Search would give users more control over their searches because of the ability to search using controlled vocabularies and to specify various conditions that results must fulfill. Smart Folders, which are currently not used at all, would also save users time if they are often doing the same searches. All participants except P6 have found information by chance, which suggests that the system supports serendipity. And the fact that, according to P1, users tend to use the same images a lot (for example, the same photos of the charity's patron, the Duchess of Cornwall) supports Zipf's 'principle of least effort'. The search logs show that participants mainly search for conceptual information (i.e. the subject of images). The people concept is the most popular for searches, showing the importance of indexing who is in photos.
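As an illustration of the kind of tallying that search log analysis involves, the following minimal sketch counts searches by type and by query term. It assumes the log has been exported as comma-separated text with columns named user, search_type and query; these column names and the sample rows are invented for illustration and do not reproduce the charity's logs or Third Light's export format.

```python
import csv
import io
from collections import Counter

# Invented sample rows standing in for an exported search log.
SAMPLE_EXPORT = """user,search_type,query
P1,General,garden
P1,Advanced,centre visitor
P2,General,logo
P3,General,Garden
"""


def summarise_searches(csv_text):
    """Count searches by type and by (case-normalised) query term."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_type = Counter(row["search_type"] for row in rows)
    by_query = Counter(row["query"].strip().lower() for row in rows)
    return by_type, by_query


by_type, by_query = summarise_searches(SAMPLE_EXPORT)
print(by_type.most_common())
print(by_query.most_common(3))
```

Normalising the case of queries before counting is a design choice made here so that 'garden' and 'Garden' are treated as the same search; whether that is appropriate would depend on how the search engine itself treats case.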
The log of the most opened files and folders also shows that images related to centres and people are most popular. As regards non-visual information, the search logs show that file name, file reference number, resource type and orientation are the most searched for types of information. Versions of the charity's logo are also in the top 20 most opened files, which shows the popularity of this resource type alongside photos. The search logs do not show any searches for visual information (i.e. the colours or composition of images). This is perhaps because users do not think that colours and composition are indexed and so do not try and search for them using the search engine. However, colours are in fact described in the metadata of various resources and are included in the controlled vocabulary for the Keywords field. Despite the lack of searches for visual information in the search log, the interviews reveal that five out of six participants consider visual information important to their searches. Given how difficult visual information is to describe in words - particularly the composition of photos - perhaps this information is best sought by users looking at images directly. However, Content-Based Image Retrieval might also provide a way of automatically searching for photos of a certain composition or colour range in the future. Use of information is hard to ascertain given that what can be measured (number of times the file has been downloaded or emailed) does not necessarily indicate that the information has been useful. Log analysis shows that on average participants downloaded forty-one assets each over a three-month period. One participant downloaded more than 160 files and another participant did not download any files. In the interviews, four out of six participants said they use Lightboxes (hidden folders that can be shared with other users) and two participants usually email resources that they have found to other users. 
Images have also been shared with external users through Events (folders displayed on the login page of the DAM) and publishing via a URL. This shows the importance of collaborative information use. In terms of cognitive processes, P3's feelings of confusion when not sure which asset to use show that participants sometimes find the decision-making process in information-seeking hard.

Metadata

Analysis of the audit logs shows that new files are added every couple of months. Files are sometimes uploaded in large batches (i.e. many assets at a time). Further research is needed to assess the quality of the metadata applied to these assets.


Participants think that the metadata schema allows them to describe assets adequately. However, the purpose of some of the metadata fields is unclear. Participants were asked to look at a list of the metadata fields displayed on the 'File Console' for each resource (see Appendix B, p.48). The File Console is the page which displays the image, its metadata and any changes to the file or its metadata. Metadata fields are divided into Primary and Secondary Metadata and File Info. Primary metadata fields are displayed more prominently. File Info is mainly technical metadata that is populated automatically. Participants were asked to indicate if they were not sure of the purpose of any of the metadata fields. Two participants were unclear about two of the Primary Metadata fields - Resource type and Special Instructions. The other metadata fields whose purpose was unclear were File Info fields for technical metadata or information about file usage (i.e. number of times a file has been downloaded or emailed). Uncertainty about the purpose of fields might undermine their use for identification and selection purposes and uncertainty about the Primary Metadata fields might mean that they are not filled in correctly.

Another obstacle to users being able to describe assets adequately using the metadata schema is that sometimes the controlled vocabularies lack terms. Analysis of the search log showed that 14% of queries sampled were not in controlled vocabularies when they could have been. Some uncertainty was expressed in the interviews as to how to index resources to best meet user needs. For example, participants were unsure how specifically to describe the subject of resources. Also, the search log shows that P1 tried to find a photo representing hope - 'just in terms of being joyful etc' - but in her interview P5 said she did not think that the mood of images should be indexed. Research could therefore be done into how to index resources to best meet user needs, looking at questions such as whether to index the mood of resources and how specifically and exhaustively to index the subject of resources. The interviews also revealed the impact of the changing nature of the charity's organisational language on the metadata schema. For example, P5 explained that 'centre visitor' is now preferred to 'centre user' to describe visitors to the charity's centres and that she edits any use of the latter term to describe assets on the DAM. P5 also changed the controlled terms for describing images of the charity's programme of support to conform more to the organisational language. She did this by classifying support activities as 'practical', 'emotional' or 'social'. Similarly, she added a new term ('Centre > Detail > abstract') to one of the controlled vocabularies because of a new branding policy, which aims to use more abstract images and illustrations. The problem with changing the metadata schema to reflect changes in organisational language and policy is that assets tagged with the old metadata then need to be updated. This can be done fairly quickly in Excel but if decisions about how things should be referred to change often then it could become more time-consuming.
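The bulk update described above is carried out in Excel at the charity. Purely as an illustration, the same step could be sketched in Python as follows. The asset identifiers and keyword lists are invented, and the function name is hypothetical; only the replacement of 'centre user' by 'centre visitor' comes from the interviews.

```python
def remap_terms(keywords, replacements):
    """Replace deprecated terms, keeping order and dropping duplicates."""
    updated = []
    for term in keywords:
        new_term = replacements.get(term, term)
        if new_term not in updated:
            updated.append(new_term)
    return updated


# Deprecated term -> preferred term in the organisational language.
REPLACEMENTS = {"centre user": "centre visitor"}

# Invented example metadata: asset id -> keywords currently applied.
assets = {
    "img-101": ["centre user", "garden"],
    "img-102": ["centre visitor", "centre user"],
    "img-103": ["kitchen"],
}

updated_assets = {
    asset_id: remap_terms(keywords, REPLACEMENTS)
    for asset_id, keywords in assets.items()
}

# List the assets whose metadata would need to be re-imported.
changed = [
    asset_id for asset_id in assets
    if assets[asset_id] != updated_assets[asset_id]
]
print(changed)
```

Keeping the replacements in a single mapping means that each further change in organisational language adds one entry rather than a new manual pass, although every change still requires affected assets to be updated in the DAM.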

[Figure: Metadata fields whose purpose is unclear (axis: number of participants who are unclear, 0-6)]


Two participants said in their interviews that they do not look at metadata at all when searching for resources. This undermines the idea that metadata is useful for identifying and selecting resources in the case of these participants. However, two other participants said that they use metadata to identify people in photos and one participant that he uses it to find the file size of resources, which shows that it does serve identification and selection purposes some of the time. The problem of images lacking subject metadata was also discussed. Participants think it would be worth adding it but it is not clear how this could be achieved. Many images added before the new metadata schema was introduced also have subject metadata that needs to be converted into controlled terms. However, participants are uncertain whether this would be worth the time as the older photos are used less. Further investigation is therefore needed to decide what to do about resources lacking subject metadata and old subject metadata that needs to be converted into controlled terms, although the latter is less of an immediate priority to participants. Despite participants' lack of use of controlled vocabularies for search, all those who answered the question about whether controlled vocabularies are worth the time it takes to maintain them replied that they thought they were. This might have been because they did not want to upset the researcher who had created the controlled vocabularies. However, it shows at least some commitment to the idea of having them. And P5 said she finds the controlled vocabularies help her apply metadata. P6 said controlled vocabularies do require some investment of time and effort. However, they can improve search results by controlling synonyms, homonyms etc. and by showing broader/narrower terms. They can also remove some of the cognitive burden from users applying metadata because they provide ready-made indexing terms. 
And they make it easier for the charity to embed organisational language in the metadata applied to assets. So, on balance, she thinks it is worth keeping controlled vocabularies for all four metadata fields where they are currently in use. Furthermore, all participants saw potential in controlled vocabularies to be used as information-giving tools in a broader context but only if they are kept up to date. Whether or not controlled vocabularies are being kept up to date is a subject for further investigation.

User experience

Participants generally find the DAM user-friendly and consider the support offered by Third Light to be good. P1, P2, P3 and P4 all find the interface, graphics, design and navigation 'fine' or 'good'. P5 finds that the interface (for example, when working on the back-end aspects) and navigation can be a bit 'clunky'. P4 also finds that 'going back to the search results once you've clicked on a photo is hard but it's only a minor thing'. P6 finds it difficult to browse the controlled vocabularies when choosing controlled terms for searching. This is because the controlled vocabularies for the Keywords and Events fields are long and the interface only displays them in a small pop-up window.

The importance of applying regular updates to the software was revealed by P6's description of a technical problem she experienced when editing the controlled vocabularies. The problem was caused by a bug, which made the top-level terms in one of the controlled vocabularies switch round. This could have been prevented by applying the latest software update. P5 has also experienced problems with sending emails from the DAM. For example, she said that she emailed herself four files recently and they did not come through.

Participants did not describe their experience of using the system in much detail. Perhaps this is because it is difficult for users to assess their experience in the abstract. Usability testing might therefore give more useful results.
Digital asset management

It seems that different departments (Marketing, Events, PR) acquire assets and select which ones to put on the DAM. According to P5, the selection process involves deciding which assets are 'best' or 'will be widely used and ... represent [the charity] as we want it to be represented'. According to the audit log, P5 uploaded files until she left in summer 2015. Since P5 left, out of the participants in the interview, only P3 has uploaded files. According to P5, the PR team also upload photos and sometimes struggle with applying metadata, which suggests that better training is needed and possibly that metadata should be checked by an 'expert' user (the DAM allows uploads and their metadata to be made subject to approval by an 'admin user').

The audit logs show that P5 has been particularly active in folder management, including moving files to different Folders and adding files to Collections. She has also edited the properties of Folders, Collections, Events and Lightboxes, deleted Folders, Lightboxes and a Collection, and recycled a Collection. Two other participants have moved files to different Folders and one other participant has deleted a Folder. The moving of files to different Folders suggests that either the files were not assigned the correct Folders in the first place or that the Folder structure has been changed. The extent to which users find the Folder structure clear and the reason why Smart Folders are not used at all could be looked into further.

None of the case study participants have specific responsibility for the metadata schema (i.e. keeping the metadata fields and controlled vocabularies up-to-date and relevant to user needs). P5 said that she updates the controlled vocabularies when it comes to uploading photos, if necessary.

One of the participants said that she has found that staff in regional centres do not always know about the DAM or have an account. Thus access to the DAM is not as wide as it could be.
Promotion of the DAM could therefore be increased to ensure it is being used as widely as possible.

The audit logs show that two of the participants delete files. It is not clear though whether there is an evaluation process in place to decide which files need to be deleted, except that two users said they delete duplicates. The decision about what to do with files once they are no longer actively used is also unclear.

As regards preservation, further research is needed as to whether the file formats in the DAM are viable in the long term and what measures are in place to monitor this. The risk of data loss or theft is managed by Third Light who host the data on their own servers. However, the charity can check whether they think this risk is being managed properly. According to P6, if assets ever had to be migrated to a new system, it would be possible to export the metadata automatically via Excel and then tie it up with assets in the new system. However, the controlled vocabularies would have to be manually transferred to the new system (assuming the new system had a controlled vocabulary feature) and the structure of the controlled terms would have to be checked to make sure they matched the structure of controlled terms in the imported metadata. So having controlled vocabularies could make it harder to transfer assets to a new DAM.

As regards training, one participant said she has trained staff in the Scottish centres in how to use the DAM. In November 2014, the researcher also ran training for staff in the head office in how to use the new metadata schema, which was attended by all participants except P2. It is not clear whether there is a system in place for training and if so what it involves.

4.iii Discussion

Overall, the interviews and log analysis have allowed the researcher to get an overview of the themes she thought would be most significant - information behaviour, metadata, user experience and digital asset management.
This section analyses the findings of the 'orientation and overview' phase, using the questions outlined in the 'Aims and Objectives' section as a framework. It also identifies those areas that will be explored in more detail in the 'focused exploration' phase. Since the introduction of the new metadata schema and folder structure,

• is metadata consistently, accurately and fully applied?
• does metadata provide the necessary information for staff?
• are users able to find the resources they need?
• is the folder structure clearer?

The first question to address is the impact of the new metadata schema and folder structure. Participants said that the new metadata schema allows them to describe files adequately. However, further research is needed to see whether metadata is being consistently, accurately and fully applied, especially as files are sometimes uploaded in large batches. And the purpose of some of the metadata fields (both in the new schema and existing technical metadata fields) is unclear to some users, which could be an obstacle in applying metadata and/or using it for search. The costs and benefits of the controlled vocabularies were weighed up, with participants in favour of keeping them. And the interviews revealed that metadata is useful for identification and selection purposes some of the time.

It seems that users are able to find what they need using the search engine, generally on the first page of search results. They sometimes have to refine their searches though. Five out of six participants also use folders to find what they need. Indeed, one participant hardly ever uses the search engine, preferring to search the folders instead. This shows how frequent users of the DAM can build up a very good knowledge of the collection, which allows them to quickly find what they need without using the search engine. The more the collection grows, the harder it will be to accumulate this kind of knowledge though, especially for new users. Participants are also aware that it is harder to search folders exhaustively (i.e. find all relevant results). The interviews did not ascertain whether participants find the folder structure clearer, so this could be investigated further in the 'focused exploration' phase.

What information behaviour do staff display in relation to the digital image collection?

Information behaviour was explored through questions on participants' information needs, seeking and use.
These revealed that participants mainly need to find images that will promote the charity's work. However, the collection is also useful as a record of the charity's activities. And information needs are liable to change as users carry out their searches, supporting the theories developed by Belkin et al. (1982) and Bates (2005). The subject of images is important in searches, especially people, but participants will also search for specific images or resource types or an image with a particular orientation (landscape or portrait). Participants sometimes feel frustrated by the number of files they have to sift through to find what they need. However, they prefer using 'General Search' (the 'best match' information retrieval model) to 'Advanced Search' (the 'exact match' information retrieval model). Using the latter could perhaps help combat information overload by retrieving a more refined set of results. The popularity of Lightboxes, Events and emailing files suggests that information-seeking is often a collaborative activity. And Zipf's 'principle of least effort' is shown by the fact that participants will recycle certain images rather than looking for new ones.

How usable is the Digital Asset Management System?

Participants generally find the software user-friendly. However, P5 and P6 have both experienced technical errors, one of which was solved by implementing a software update. Also, P6 observed that the display of the controlled vocabulary for search could be improved. More specific insights into user experience could be gained by usability testing.

How is the collection managed?

Various aspects of digital asset management were discussed, including the lifecycle of resources, preservation issues and training and promotion of the DAM. The lifecycle of resources could be investigated in more detail, particularly whether there are any processes in place for evaluating, archiving and disposing of files.
As for managing the folder system and controlled vocabularies, out of the participants in the case study, P5 seems to have done most of this work but on an ad hoc basis. She left the charity in August 2015 so the question of who is now doing this work could be investigated. The fact that 14% of the search queries analysed were not in the controlled vocabularies when they could have been shows the need to regularly monitor the search logs to keep the controlled vocabularies up-to-date. Similarly, changes to organisational language and policy should be reflected in the controlled vocabularies. Uncertainty about how to index the subject of resources to best meet user needs and the question of whether to index the mood of images could be explored further. And the problem of images lacking subject metadata also requires further research. Finally, very little was gleaned about training or promotion of the DAM, despite the fact that some users are apparently struggling to use the metadata schema and some staff do not know about the DAM or have access to it.

Overall, digital asset management is the theme that stands out as needing more focused exploration, particularly as it has significant implications for all the other aspects of the case. In the 'focused exploration' phase of the case study, the researcher would therefore like to look in more detail at how the collection is managed, including how files are managed at every stage of their lifecycle and how the folder system and controlled vocabularies are managed. Training and promotion of the DAM could also be investigated further. The question of whether metadata is being consistently, accurately and fully applied has not been fully addressed yet and the clarity of the Folder structure needs to be ascertained. Finally, the researcher would also like to explore with participants how to best index assets to meet user needs, how to index large numbers of files efficiently and how to add subject metadata to assets lacking it. These aims can be summed up by the following research questions:

• What are the phases in the lifecycle of files and how are they managed?
• Is the Folder structure clearer?
• How is the folder system managed?
• Is metadata consistently, accurately and fully applied?
• How are the controlled vocabularies kept up-to-date and relevant to user needs?
• What measures are in place for training and promotion of the DAM?
• How can users best index assets to meet user needs?
• How can large numbers of files be indexed efficiently?
• How can subject metadata be added to assets lacking it?

The 'focused exploration' phase will therefore have two broad aims: to explore digital asset management and indexing policy. The aim will also be to consider best practice based on the literature review and how this might be applied in the context of the case study. The outcome of this research could be a set of recommendations used to guide practice.

5. Focused Exploration Phase

5.i Methods

Firstly, to better understand how images are currently indexed, a sample of metadata applied by P3 and P7 was analysed. These are the only participants in the case study to have applied metadata to files, except for P5, who left before the researcher was able to ask permission to analyse the metadata she had applied to files. The audit logs show that there are other users who upload and therefore apply metadata to files. However, it was not possible to analyse this metadata as they had not agreed to take part in the case study. The metadata applied by P3 and P7 was analysed in batches (i.e. groups of files uploaded at the same time). According to the audit log, P3 has only uploaded one batch of files and P7 has uploaded four batches. All these batches were analysed.

A group interview was chosen to research digital asset management and indexing policy further. This is because the researcher wanted the participants to be able to discuss ideas with each other and collaborate on finding solutions to some of the problems presented to them. Participants were sent the results from the 'orientation and overview' phase interviews and log analysis to help them understand the context for the interview. The questions (attached in Appendix G, p.60) were designed to cover indexing policy and digital asset management but participants were encouraged to ask questions and raise points themselves.

All staff who participated in the 'orientation and overview' phase were invited to take part, except one of the Marketing Coordinators (P5) who had left the charity in early August. Other staff at the head office - where the interview would be held - were also invited to take part but no one volunteered. The researcher had a dual role of leading the interview and participating. The participants in the group interview were as follows (the codes are the same as for the 'orientation and overview' phase):

• P1: Marketing Coordinator
• P2: Publications Manager
• P3: Website and Social Media Editor
• P4: Digital Production Coordinator
• P6: Researcher

All participants in the group interview regularly use the DAM for their work. Thus it was expected that they would have some knowledge of how assets are managed and an opinion on best practice in this area. Participants in the interview are all 'admin users' of the DAM (i.e. able to edit the metadata and metadata schema), except P2. As well as the group interview, the researcher interviewed an 'admin user' of the DAM by telephone. The participant's job title is Communications Manager, Scotland, and she is referred to as P7. This user was interviewed because she is based at one of the charity's regional offices and is one of the few users to upload and apply metadata to files. As previously noted, the interviews during the 'orientation and overview' phase were only with users at the charity's head office. The aim of this interview was therefore partly to give a more representative picture of the case study by giving the perspective of a user at one of the regional offices. It was also to gain an insight into the user's experience of uploading and applying metadata to files, as only one participant (P5) in the 'orientation and overview' phase was able to talk about this. Finally, the aim was to explore indexing policy and digital asset management as these are the areas that had been selected for the 'focused exploration' phase of the case study. Time constraints and the fact that this participant had not taken part in the 'orientation and overview' phase interviews meant that not all the same questions as were asked in the group interview were asked in this interview (the questions are attached in Appendix H, p.61). Finally, as part of the research into subject-indexing policy, participants were invited to take part in an image-tagging exercise. In the 'orientation and overview' phase interviews, P5 said she was not sure how specifically to describe the subject of images. 
And, when asked if she knew what kind of subject keywords to add to match user needs, P7 replied in her interview, 'sometimes I'm unsure of this - sometimes it's guesswork and possibly quite hit-and-miss'. As Lancaster wrote, subject-indexing policy should include how exhaustive and specific the indexing should be. The image-tagging exercise shows how exhaustive and specific participants think the indexing should be for seven different images in the collection. As part of this exercise, P1, P3 and P7 applied subject metadata to seven images chosen by the researcher. Their instructions were to try and think what users would search for and what metadata would be helpful to them when selecting images. The forms they filled in are in Appendix I, p.62. The researcher used a matrix combining elements of the Panofsky-Shatford matrix and Hollink et al.'s (2004) classification of user image descriptions to classify participants' descriptions. This helped her to analyse whether their descriptions were of generic, specific or abstract concepts and the exhaustivity of indexing. The matrices are attached in Appendix I, pp.64-5.

5.ii Results

Metadata analysis

The researcher analysed five batches of files uploaded by P3 and P7. The sample included:

1. Batch One (B1). 437 photos, uploaded by P3, of one of the charity's largest annual fundraising events - a night-walk across London that includes visits to cultural sites.
2. Batch Two (B2). 43 photos, uploaded by P7, of guests at an exhibition launch.
3. Batch Three (B3). 9 photos, uploaded by P7, of the artwork at the exhibition that is the subject of Batch Two.
4. Batch Four (B4). 3 photos, uploaded by P7, of VIPs at a fundraising event.
5. Batch Five (B5). 1 Adobe Photoshop file uploaded by P7 - an illustration for a fundraising event.

The first field in the metadata form is the Caption field, which is optional and is populated with free text. It is supposed to be used for giving a short description of the nature and/or subject of the resource and for providing information that cannot be put in the other fields. For example, in B3, the Caption field contains detailed descriptions of artworks, such as their dimensions. This kind of information would not be suitable for the Keywords field as it is too specific. B1, B2, B4 and B5 all have exactly the same caption in each batch, which suggests this metadata was applied in bulk, probably because there was not time to do separate captions for each file. All the captions repeat information in other fields. For example, the caption for B1 repeats the value in the Event field and the caption for B2 repeats the value in the Copyright Notice field. This repetition is not necessarily a problem but raises the question of why users feel they have to repeat information from other fields in the Caption field.

The next field in the metadata form is the Keywords field, which is compulsory and is populated with a controlled vocabulary. It is for providing subject metadata. The number of keywords applied in this field ranges from two to five terms across all of the batches. Despite each batch representing a range of subjects, three of them (B1, B2 and B3) all had the same terms in the Keywords field, which suggests that the metadata was applied in bulk. This in turn suggests that users do not have time to apply subject metadata to such large numbers of assets (B1 consists of 437 files). Further research is needed to investigate why batches lack detailed subject metadata and whether this is always necessarily a problem. Furthermore, the keywords added by P7 are all top-level terms in the controlled vocabulary. This is perhaps because only the most general terms apply to all files in a batch.
Also, the photos of artwork (B3) and the illustration (B5) do not lend themselves to tagging with subject keywords. However, the photos of guests at the exhibition launch and VIPs at the fundraising event venue could have been described in more detail. This would have depended on the relevant terms being added to the controlled vocabulary though. And, in the case of four out of the five batches, the necessary terms were not added to the controlled vocabularies. Further research is needed into how controlled vocabularies can be kept up to date.

The Centre field is compulsory and is populated with a controlled vocabulary. It is for indicating which centre the file relates to (if any). It is correctly filled in for all of the files.

The Event field is compulsory and is populated with a controlled vocabulary. It is for indicating the name of the event associated with files. This is correctly filled in for three of the batches (B1, B2 and B3). However, B2 and B3 are tagged with a top-level term when they could have been tagged with a more specific term - the name of the exhibition. This term should have been added to the controlled vocabulary but was not. Again, further research is needed as to how the controlled vocabularies can be kept up to date. B4 and B5 (files relating to a fundraising event) were not assigned the correct term, possibly because the classification of events in the Events field controlled vocabulary is not clear.

The Resource Type field is compulsory and is populated with a controlled vocabulary. It is for indicating what type of resource the file is, for example whether it is a photo or an architectural resource or a logo. All files were given the correct metadata in this field. However, the photos in B1 were described as 'artwork' when they are in fact photos of artwork. Therefore the keyword, 'photo', should also have been applied.
Perhaps the researcher should have made it clearer in the training she gave that when describing photos of artwork, the two keywords 'photo' and 'artwork' need to be applied in the Resource Type field.

The Copyright Notice field is optional and is populated using free text. It is for indicating who owns the rights to the file and how the file can be used. B2, B3 and B4 contain copyright information, although in B2 and B4 it is also in the Special Instructions field. B5 does not contain any copyright information. In B1, the Copyright Notice field has been filled in for 121 of the assets. However, it is likely that this was automatically pulled from the corresponding Exif metadata field as the values match exactly. In the case of B3, it is not clear who the copyright belongs to - the artist or the photographer. And the specific terms of the licences are not clear from any of the files. Perhaps users do not see copyright information as essential because the field is not compulsory to fill in. Further investigation is needed as to why there is a lack of copyright information.

The Special Instructions field is optional and is populated using free text. It is for adding instructions (other than copyright instructions) for how the file can be used. For example, photos of HRH The Duchess of Cornwall sometimes require the user to ask her permission before using them. Only B2 and B4 had metadata in the Special Instructions field. This was copyright information, which should have gone in the Copyright Notice field. The name of the field, 'Special Instructions', is perhaps misleading.

The Date Created field is filled out automatically. It is not clear where the DAM gets this metadata from - possibly from the Exif metadata that is already attached to files when they are uploaded. In one case there is a discrepancy between the date that the photos were taken and the date in the Date Created field. B2 consists of photos of an exhibition launch, which took place on 31 July 2015. However, according to the Date Created field, the photos were taken on 7 August 2015. The Exif metadata shows that the photo was taken on 31 July 2015 and modified on 7 August 2015. So the DAM is showing the date when the edited version was created in the Date Created field. This shows how misleading 'Date Created' is as a title for the field. Furthermore, P7 put the wrong date of the event in the Caption field (30 July 2015), which shows how important it is to have accurate metadata in the Date Created field.
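The Date Created discrepancy can be illustrated with a minimal Python sketch. It assumes the Exif metadata has already been read into a dictionary keyed by the standard Exif tag names DateTimeOriginal (when the photo was taken) and DateTime (when the file was last modified). The function names, and the times of day in the usage example, are hypothetical; the sketch does not describe how the DAM itself populates the field, which remains unclear.

```python
from datetime import date, datetime
from typing import Optional

# Exif stores date/time values as "YYYY:MM:DD HH:MM:SS".
EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"


def capture_date(exif: dict) -> Optional[datetime]:
    """Return the capture time, preferring DateTimeOriginal over DateTime.

    DateTime records the last modification, so it is only a fallback
    when no original capture time is present.
    """
    for tag in ("DateTimeOriginal", "DateTime"):
        value = exif.get(tag)
        if value:
            return datetime.strptime(value, EXIF_FORMAT)
    return None


def date_mismatch(exif: dict, dam_date_created: date) -> bool:
    """Flag a file whose DAM 'Date Created' differs from its capture date."""
    taken = capture_date(exif)
    return taken is not None and taken.date() != dam_date_created


# Hypothetical values modelled on the B2 case: taken 31 July 2015,
# edited 7 August 2015 (the times of day are invented for illustration).
b2_exif = {
    "DateTimeOriginal": "2015:07:31 18:30:00",
    "DateTime": "2015:08:07 10:15:00",
}
print(date_mismatch(b2_exif, date(2015, 8, 7)))
```

A check of this kind, run over an exported metadata spreadsheet, would list the files where the displayed date reflects editing rather than capture.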
It is unfortunate that the researcher did not have permission to analyse the metadata applied by P5 because P5 was in charge of uploading files to the DAM while she was working at the charity. P3 has less time than P5 would have had to upload and apply metadata to files. Nevertheless, uploading and applying metadata to files has always been part of P7's job so the sample at least includes one user who has had dedicated time to upload and apply metadata to files.

Overall, it is clear that metadata is not being applied as fully and accurately as it could be. One possible reason for this is the size of the batches (ranging between 1 and 437 files) and participants lacking time to apply metadata separately to each file within a batch. In addition, P7 did not receive training in how to use the metadata schema, which might be why she has filled in some fields incorrectly or only superficially. Another reason for the superficial metadata might be the lack of relevant terms in the controlled vocabularies, which raises the question of how these can be kept up to date. Finally, the metadata form might be unclear, for example the names of the fields might be misleading or the terms in the controlled vocabularies unclear. This is backed up by the results of the 'orientation and overview' phase, which found that the purpose of two of the Primary Metadata fields - Resource Type and Special Instructions - was unclear to some participants.

Interviews

Photos are acquired in different ways. Sometimes the charity is sent pictures, sometimes they take pictures with their own cameras, but they mainly commission them. The Marketing department commissions photos of the charity's centres and programme of support. Together with the PR and Events departments, they also commission photos of events. In the member-checking phase, P7 added that the Marketing team also organise press calls and media photo opportunities.
The charity draws up quite specific briefs for what they want in photos, including whether they should be portrait or landscape. A member of staff sometimes accompanies the photographer to guide the shoot. The fact that different departments acquire photos suggests different people are likely to know different parts of the collection better. This highlights the importance of the DAM for bringing together the assets and making them discoverable to all users.

Participants said that in some cases copyright belongs to the photographer. They also said that some photographers have to be credited whereas others do not. The different copyright agreements show the need to include the name of the copyright holder and any terms and conditions of licences in the metadata. There should also arguably be a separate field for supplying the name of the photographer as, according to P7, even if the charity owns the copyright, the photographer should be credited. And if the images were ever transferred to an external archive, the name of the photographer would be essential (Dickinson, 2015). It would also perhaps be advisable to make this field and the field for supplying copyright information compulsory as currently not all files that are uploaded are given copyright metadata, which suggests some users uploading files do not see it as essential.

Participants described the selection policy for photos commissioned by the Marketing department. Firstly, the photographer uploads low-resolution files to his/her website. There are usually hundreds to choose from. Then the Marketing and Design teams make a shortlist. After the selection has been made, the photographer sends edited, high-resolution versions. The Marketing department usually spends more time selecting their photos than the Events department spends selecting theirs because photos of events are used less. P7 said that she does not upload every photo, just the 'cream of the crop' and the ones that could be useful later. The fact that participants have to select from hundreds of photographs helps explain why the batches they upload are sometimes very large (e.g. 437 photos of a fundraising event). This in turn helps to explain why the subject indexing is sometimes superficial, with the same keywords applied to all photos in a batch.

The researcher asked P7 why some events are represented more than others. For example, there are not many photos of 'Centre events' (apart from centre openings) or of 'National events' other than one particular fundraising event.
P7 was unsure why some events are represented more than others but in the member-checking phase, she suggested that the reason why there is a lack of 'Centre event' photos might be because it is believed to be a Centre Fundraiser responsibility to upload 'Centre event' photography. The lack of 'Centre event' photos suggests they might not be aware that they have this responsibility or that they are withholding photos from the DAM for other reasons. Alternatively, there might just be fewer photos of certain events than others. The reason why there are more images of some events than others could be investigated further as part of research into the selection policy for the DAM.

There was some discussion of how selective the charity should be. P6 suggested that the photographer could be asked to help select which photos would be best. However, P4 commented that having a larger selection can be useful for design choices. For example, having both a landscape and portrait version of photographs and different versions of a similar image can help when trying to find images for different locations (Twitter, Facebook, the website etc.). This shows there are some advantages in not being too selective about which files to upload, especially as the selection process is itself time-consuming. Indeed, the charity has not yet exceeded its storage limit due to Third Light periodically adding free extra storage capacity so the question of how to accommodate new files is not yet a problem. However, the adding of free extra storage capacity is at Third Light's discretion so cannot be counted on to accommodate the charity's growing collection. And time has to be spent indexing new files, which could make it worthwhile to whittle down the selection of files for upload.
All in all it is perhaps worth researching the possibility of a selection policy in case of future lack of storage space and/or excessive time being spent indexing files, even though this is not urgent given the current availability of storage space.

Participants said that they find the Folder system, which is based on a classification of the subjects of images, clear. This classification is supposed to reflect how the charity already classifies concepts. For example, the Events team at the charity helped the researcher come up with the following classification for Folders of events photos: 'Centre events', 'National events', 'Special events' and 'Corporate events'. By reflecting the existing classification, the Folder structure is supposed to be as intuitive as possible. However, files are still sometimes assigned to the wrong Folders. When analysing the metadata applied by P3 and P7, the researcher noticed that three out of the five batches had been assigned to the wrong Folders. One batch (photos of artwork) was put in a 'Special Events' Folder when it should have been put in the 'Artwork' folder. And two batches (files relating to a fundraising event) were put in a 'Centre events' Folder when they should have been put in a 'National events' Folder. Ultimately files can still be retrieved by metadata so it is not necessarily a problem if they are put in the wrong Folders. However, it does undermine the logic of the Folder structure if the classification of files is not consistent. The fact that files are being put in the wrong Folders reinforces Jisc's (2015) point that folder systems often do not make sense to those who did not create them. It also shows why metadata can be a better way than folders of organising large digital collections.

P3 and P7 said how useful they find Collections. However, although participants use Folders and Collections, they do not seem to fully understand their purpose. (A description of the different types of folder is included in Appendix D, p.54.) Confusion over the purpose of Collections is evident from the fact that users have created Folders that should be Collections. For example, a Folder entitled 'Best Lanarkshire' has been created within the 'Centres > Lanarkshire' Folder. This contains a curated selection of the best photos of the Lanarkshire centre. It is unclear whether it contains copies of files in the 'Centres > Lanarkshire' Folder but for the avoidance of doubt it should have been a Collection. This is because Collections allow one to create collections of files that already exist in Folders without having to create copies of files or move them from their original locations in Folders, thereby saving on storage space. When asked in the group interview whether they were aware of Collections and how they are used, only P4 said that she knew what Collections are for, explaining that Collections do not remove the images from where they 'live'. None of the participants were sure what Smart Folders are for. Overall, the confusion over the folder system shows that training needs to do more to explain the different purposes of Folders, Collections and Smart Folders.

At present, any user can add, edit or remove the following types of folder: Folders, Collections, Smart Folders and Lightboxes. However, 'admin users' can create both top-level and sub-level folders whereas 'normal users' can only create sub-level folders. The fact that every user can manage folders emphasises the importance of training.
The researcher can see that some Folders are being added to the wrong places in the Folder hierarchy. For example, a Folder containing photos of the chief executive has been added to the top level of the Folder structure when the files should just have been added to a sub-folder entitled 'Staff/founders/patrons'. According to the group interview, no one is in charge of overseeing the creation of new folders. However, P7 said she thinks there needs to be someone overseeing the folder system. This person could look out for new Folders that should be Collections or Folders that have been put in the wrong place in the Folder hierarchy or files that have been put in the wrong Folders. The idea that browsing folders is an inefficient way of searching was challenged by P4, who pointed out that browsing folders serves a purpose when you don't know exactly what you are looking for. This shows that folders should not be totally superseded by metadata as a way of organising assets and making them retrievable. The indexing of files is currently done at the upload stage, although the metadata of existing files can also be edited. The logs show that, out of all the users of the DAM, P5 has uploaded the most files, followed by P3 and P7. They also show that only a few people do most of the uploading. In the group interview, participants confirmed that since P5 left there has been a lack of 'resourcing' for uploading files to the DAM, i.e. a lack of staff with dedicated time for doing this. They then discussed the idea of users who acquire photos also uploading them, as opposed to having a few users who upload on behalf of others. As P1 pointed out, this would make sense given that the person who acquires the photos probably knows more about how to describe them. However, it would mean that indexing is the responsibility of many different people as opposed to a few. This would make it harder to guarantee the quality of indexing. 
A solution to this could be to have an 'expert' user checking the accuracy and comprehensiveness of metadata (the DAM allows all new uploads, including their metadata, to have to be approved by an 'admin user'). Participants thought this would be a good idea but that it would depend on resourcing. So it seems that either way, applying metadata has to have some resourcing behind it - if centralised, then it requires one person whose job it is to upload and index files, if decentralised, then it requires one person whose job it is to check the metadata applied by other users. Similarly, keeping the controlled vocabularies up-to-date and reflecting user language also requires resourcing. In the interviews, the researcher suggested having an expert user, or 'steward' of the controlled vocabularies, whose job it would be to keep the controlled vocabularies up-to-date and reflecting user language. Participants agreed with this idea and P4 suggested that responsibility for the controlled vocabularies should be written into someone's job description. One problem with the idea of a steward

though is that this person might not have the necessary information to be able to update the controlled vocabularies before the files are uploaded. In this case, it would also be necessary for users to be able to add new terms to the controlled vocabularies themselves. And this would depend on users feeling confident enough to edit the controlled vocabularies. Perhaps the best solution would be to have a steward who updates the controlled vocabularies as soon as new developments require the controlled vocabularies to be updated. Then, if for some reason the controlled vocabularies still lacked terms, users could add any missing terms themselves and the steward would check them to make sure they had been properly added. The researcher asked whether the mood of photos should be indexed. Two of the participants (P1 and P2) said that they had searched for photos with a particular mood. P2 had searched for images conveying 'joy' (as part of a campaign called 'joy of living'). P1 had searched for images conveying 'hope'. After some initial uncertainty, P3 thought that tagging photos as 'uplifting' and 'quiet' could also be useful as these are moods that they are sometimes asked for photos to convey. As regards indexing visual information, participants did not think it necessary to describe the colour and shapes of images. However, P3 suggested that some of the abstract images could include colour keywords. P1 also pointed out that some of the centres are linked to specific colours. P3 replied that she already knew this was the case so wouldn't need keywords to help her find them. However, this does not take into account the fact that new users would not have this knowledge. So, whilst there was general agreement not to index the shapes of images, the indexing of colours was not ruled out and could in fact help highlight centre colour schemes to new users. The problem of indexing large numbers of files was discussed. 
P3 said that it is not realistic for the charity to index hundreds of photos at the moment. P6 suggested that some photos could be given priority when it came to indexing. However, P3 pointed out that photos that are used less, such as the events photos, can still contain useful information. So even if hundreds are uploaded it could still be worth sifting through them to find the most useful things to index. P7 suggested indexing files in batches. For example, you could apply metadata in bulk to 50 images and then apply it in bulk to another 50 images etc. She also said that she only uploads images sporadically. For example, after a centre opening there will be lots of things to upload but then there could be a couple of months with nothing to upload. This means that in theory there is at least time to index one upload before another one is added.

When asked about the problem of old photos lacking subject metadata, P1 suggested that users who know the old photos well enough not to need to use the search engine could look for them on behalf of other users. No other suggestions were made for how to deal with this problem. P1's suggestion is interesting in that it flags the role of 'mediators', i.e. those who use the DAM on behalf of other users. P1 and P7 both said in their interviews that they sometimes search on behalf of other users. The problem with using this as a solution to the problem of old photos lacking subject metadata is that when those mediators leave the charity, the old photos will not be retrievable any more. This is because the mediators would take with them the knowledge that allowed them to find old photos without using the search engine. In the 'orientation and overview' phase, participants said they thought it would be worth adding subject metadata to old photos.
However, they do not seem to know how to bring this about, suggesting that either it is not realistic for the charity to take on this work or that participants do not consider it a priority for the moment. Participants considered different options for archiving or disposing of assets that have reached the end of their active use. Participants in the group interview said they would buy more storage space to accommodate the collection 'if it came to it'. However, P7 said she would be more inclined to 'thin out what's on there', especially the images that are very similar. P4 seemed wary of the idea of weeding the collection (i.e. disposing of parts of it), pointing out that it is useful to have images as a record. The researcher suggested depositing parts of the collection in an external archive such as the RIBA archive, which would mean they were still accessible to the charity but not taking up room in the DAM. However, participants in the group interview seemed uncertain about this idea. P7 said that as long as the charity would still have access to them, she thinks depositing the images in an external archive is a good idea. But when the researcher pointed out that the copyright might eventually have to be assigned to the external archive, P7 pointed out

that architectural photographers can be precious about copyright. Overall, it seems that buying more storage space to accommodate the collection is the least controversial option for dealing with the increasing number of files. However, the fact that the storage capacity has not yet been exceeded (partly due to Third Light adding free extra storage, which cannot be guaranteed in the future) and is currently (as of 27/12/15) only 41.54% full means this is not yet an urgent problem. P1 said she gives training in how to use the DAM over the phone and points users to the training materials created by the researcher (a set of PowerPoint presentations), which are stored in a top-level Folder of the DAM. However, P7 said that in an ideal world there would be face-to-face training sessions. In the group interview, the researcher suggested the idea of a designated 'expert' who is happy to answer questions about the DAM and run training sessions. Participants thought this would be a good idea. They also thought it would be a good idea for the 'expert' user to receive training from Third Light, which could then be filtered down to other users where necessary. The need for training is highlighted by the confusion over the purpose of different types of folder and by the fact that P7, who did not take part in the training organised by the researcher in November 2014, said she initially found the metadata form 'very confusing'. The DAM is promoted to staff during their inductions. As part of their inductions, P1 sets up an account for new staff and gives them training in how to use it. There is also a link to the DAM on the Intranet. P6 said that the DAM needs someone who can champion it. And P7 suggested that internal communications could be used to promote the DAM, such as staff bulletins, email newsletters and pot-luck meetings (where staff meet randomly with other staff members to talk about what they are working on). 
Image-tagging exercise

The results of the image-tagging exercise (attached in Appendix I, p.62) show how specifically and exhaustively participants recommend describing the subject of images in the Caption and Keywords fields. The researcher analysed the terms they applied in the Keywords fields using a matrix for classifying descriptors according to specificity and exhaustivity. This analysis found that participants recommend being as specific as possible in describing people (except for centre visitors) and things such as names of centres, artwork and support activities. They also recommend giving the specific names of fundraising events, as well as the specific time and place of events. The specific names of artists, architects and landscape designers whose work is represented in the images should also be indexed. Objects were described fairly exhaustively, including cups of tea, flowers, trees, a stilt supporting a treehouse-inspired centre, a balcony and grass. P3 and P7 both described whether certain images represented day or night and P3 also described what seasons were shown in some of the images. Almost all participants recommended describing Images 1 to 4 (images of centres or centre visitors) as either 'interior' or 'exterior'. And P3 and P7 recommended including the charity's classification of support activities ('practical support', 'emotional support' and 'social support') in their descriptions of Images 2 and 4.

5.iii Discussion

The 'focused exploration' phase has allowed the researcher to explore the themes of digital asset management and indexing policy. It is now possible to answer the questions outlined at the end of the 'orientation and overview' phase (p.28) and to consider any other issues that have come to light. A set of recommendations based on the results of the research is included on page 37.

What are the phases in the lifecycle of files and how are they managed?
At the beginning of their lifecycle, files go through the following phases:

• Acquisition - Various departments commission digital photos by professional photographers. This involves creating a specific brief and sometimes guiding the photographers on their shoot. Acquisition can also involve taking receipt of photos sent to the charity or taken by members of staff, photos taken by the media, architectural design files and a small number of graphic design files, videos and other documents used by the charity in their work. Agreements are made about copyright and any terms and conditions of the files' use.

• Selection - P7 said she tries just to pick 'the cream of the crop' and what will be useful later. When selecting photos they have commissioned, the Marketing and Design teams choose from files on the photographer's website, which can sometimes be in the hundreds. There does not seem to be a clear policy as to what is added to the DAM and what is not as P7 was unsure why some of the charity's events are represented more than others. The researcher did not ascertain how staff select other files to add to the library, for example architectural design files.

• Uploading to the DAM (including adding metadata and assigning files to Folders) - When uploading files to the DAM, users apply metadata to them and assign them to Folders. Uploading, indexing and assigning new files to Folders are still the responsibility of just a few users. And since P5 left in summer 2015 there has been a lack of staff with dedicated time to do this, which is reflected in the quality of the metadata that the researcher analysed.

Once files are uploaded to the DAM, files are:

• Accessed - Currently, all users can access all folders but 'admin users' can limit certain users' access to certain folders if necessary.

• Used - 'Admin users' can grant or withhold permission to, for example, download files, share them via Lightboxes or email them from the DAM.

At the end of their lifecycle, files are:

• Evaluated - Users evaluate whether files are worth keeping or not and what to do in the case of the latter. It is not clear what criteria are used to decide whether files are worth keeping on the DAM, except that in the 'orientation and overview' phase two users said they delete duplicates and in the 'focused exploration' phase P7 said she would 'thin out' images that are very similar. Files that have reached the end of their active use are kept as they are useful as a record of the charity's activities. Currently files that are not worth keeping are disposed of by deleting them.

• Disposed of - The 'orientation and overview' phase established that certain users dispose of files by sending them to the Recycle Bin (to be temporarily stored before being automatically deleted) or deleting them.

Is the Folder structure clearer?

Participants said they find the Folder structure clear but the researcher noticed that some of the files whose metadata she analysed were assigned to the wrong Folders. This shows how important it is that files can be retrieved by metadata too.

How is the folder system managed?

At present, any user can add, edit or remove the following types of folder: Folders, Collections, Smart Folders and Lightboxes. However, 'admin users' can create both top-level and sub-level folders whereas 'normal users' can only create sub-level folders. The 'orientation and overview' phase established that P5 was the main participant to manage folders before she left the charity in August 2015. No one is currently overseeing the folder system. As a result, some Folders are being created that should be Collections and some new Folders are being put in the wrong place in the Folder hierarchy.

Is metadata consistently, accurately and fully applied?

According to the researcher's analysis of five batches of files uploaded by P3 and P7, metadata is not always fully applied, especially in the Keywords and Copyright Notice fields. This might be because participants lack time to apply subject metadata, especially when the batch is large, or because they lack training or because the controlled vocabularies do not contain the necessary terms. The metadata applied is mainly accurate but sometimes the wrong term is chosen in the Event field, perhaps because the classification of events is unclear. The researcher did not have time to analyse the consistency of metadata.

How are the controlled vocabularies kept up-to-date and relevant to user needs?

According to the 'orientation and overview' phase interviews, P5 updated the controlled vocabularies when uploading if necessary. The researcher got the impression in the 'focused exploration' phase group interview that no one has taken on this work since she left, as participants talked about a 'lack of resourcing for the photo library' and the fact that the person who replaces P5 will take on some of her old duties.

What measures are in place for training and promotion of the DAM?

Training in how to use the DAM is given over the phone. P1 also points users to a set of PowerPoint presentations that the researcher created to train users in November 2014, which are stored in the DAM. The DAM is promoted to staff during their inductions. There is also a link to the DAM on the Intranet.

How can users best index assets to meet user needs?

Analysis of the search log during the 'orientation and overview' phase gave an idea of what concepts are most searched for (specific people, event concepts, support activities, areas and objects), which can guide subject indexing to some extent. The results of the image-tagging exercise show how specifically and exhaustively participants recommend indexing the subject of images. In addition, the interviews revealed that describing colours associated with centres could be useful, as could describing the mood of pictures, for example 'hope', 'joy', 'uplifting' and 'quiet'.

How can large numbers of files be indexed efficiently?

Although it is not realistic for staff to individually index all new files uploaded, certain images that would be particularly useful should still be individually indexed. When uploading large batches, users could bulk-index files in groups of about 50 images at a time (P7's suggestion) but also sift through each batch to find any assets that it would be useful to index individually (P3's suggestion).

How can subject metadata be added to assets lacking it?

Participants were unsure how to add subject metadata to assets lacking it. This is perhaps because if there is a lack of resourcing for uploading files, the same is probably true for editing the metadata of existing assets.

Other significant issues

P4's point that browsing folders serves a purpose when you don't know exactly what you are looking for shows that metadata is not always preferable as a way of organising assets and making them retrievable. When asked if controlled vocabularies help her apply metadata, P7 said that controlled vocabularies are 'a good idea - it makes much more sense to have terms that you select'. The interviews flagged the role of 'mediators' - users who search on behalf of other users.

5.iv Recommendations

Managing files

• Acquisition. Currently, the majority of the images acquired are high-resolution Jpegs. As Jpegs are the common format for sharing and using images on the Web it makes sense to use this file format. However, the choice of file format should also take into account long-term compatibility issues (Jisc, 2015) and the fact that higher quality Tiffs are preferable for archival purposes.

• Selection. P4 pointed out that having a larger selection of images to choose from can be useful for design choices. Her comments suggest that users should not discard similar images at the selection phase. It might be worth researching the possibility of a selection policy in case of future lack of storage space and/or excessive time being spent indexing files, but this is not urgent given the current availability of storage space.

• Selection. One aspect of selection policy that could be investigated further is why some events are represented more than others.

• Uploading, indexing and assigning files to Folders. The charity could try distributing the responsibility for uploading files to the staff that acquire and select images to put on the DAM. This would share the workload and might result in better metadata as the staff that acquire images probably have a better knowledge of how to index them. To ensure the quality of metadata though, the charity should require all new files and their metadata to be approved by an 'admin user' before they can be uploaded (this can be done by changing the user permissions on the DAM).

• Evaluation. When deciding whether to dispose of duplicate images, users should bear in mind the point made by P4 that having different versions of similar images can be useful for design choices.

• Evaluation. There is currently enough space to accommodate the charity's growing collection but this cannot be guaranteed in the future. If the charity ever ran out of storage space, it might be worth considering archiving files that are no longer being actively used.

Managing folders

• There should be someone responsible for making sure files are assigned the correct Folder and that any new folders are created correctly (i.e. the correct type of folder is chosen and put in the correct place in the folder structure).

Managing the controlled vocabularies

• The charity should appoint a 'steward' who updates the controlled vocabularies as soon as new developments require the controlled vocabularies to be updated. If for some reason the controlled vocabularies still lacked terms, users could add any missing terms themselves and the steward would check them to make sure they had been properly added. The settings for the 'normal user' group would have to be changed to enable them to edit the controlled vocabularies.

Metadata

• The Copyright Notice field should be compulsory to fill in as it is essential for users to know how they can use files but currently not all files that are uploaded are given copyright metadata, which suggests some users do not see it as essential.

• A new field for supplying the name of the photographer should be added to the Primary Metadata fields as, even if the charity owns the copyright, the photographer should be credited. P7 and the Historic England Archive have both recommended this.

• The results of the image-tagging exercise should be used to guide users in how to index the subject of resources. The colour descriptors in the Keywords field controlled vocabulary should be used to describe the colours of centres. Mood descriptors such as 'joy', 'hope', 'uplifting' and 'quiet' should be added to the controlled vocabulary for the Keywords field and used to describe images.

• When uploading large batches, users could bulk-index files in groups of about 50 images at a time (P7's suggestion) but also sift through each batch to find any assets that it would be useful to index individually (P3's suggestion).

• If more resourcing becomes available, it might be worth adding subject metadata to assets lacking it.

Training

• A staff member who knows the DAM well should be designated as responsible for training other users in how to use it and answering queries about it. Ideally, training sessions should be given face-to-face or at least over the phone. The person who trains other staff should also be able to receive training from Third Light.

• All new users should be trained in how to search the DAM and the purpose of different types of folder, particularly the difference between Collections and Folders. Any users who have responsibility for uploading files should be trained in how to apply metadata and how to classify files within the Folder structure. And, if the charity decides to let the 'normal user' group edit the controlled vocabularies, they should all receive training in how to do this.

Promotion of the collection

• Internal communications (e.g. staff bulletins, email newsletters and pot-luck meetings) should be used to promote the collection to all staff.
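Two of the metadata recommendations above can be illustrated with a short sketch: flagging files whose Keywords or Copyright Notice fields are empty, and splitting a large upload into groups of about 50 for bulk indexing. The Python below is purely hypothetical. It assumes the metadata has been exported from the DAM as a list of records keyed by field name, and the function names are invented for illustration rather than taken from Third Light's software.

```python
from typing import Dict, Iterator, List

Record = Dict[str, str]

# Assumed field names, taken from the fields discussed in the text.
REQUIRED_FIELDS = ("Keywords", "Copyright Notice")


def missing_fields(record: Record) -> List[str]:
    """Return the required fields that are absent or blank in one record."""
    return [f for f in REQUIRED_FIELDS if not record.get(f, "").strip()]


def batches(records: List[Record], size: int = 50) -> Iterator[List[Record]]:
    """Yield consecutive groups of at most `size` records for bulk indexing."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def bulk_index(batch: List[Record], shared: Record) -> None:
    """Fill blank fields in every record of a batch with shared metadata.

    Values that an indexer has already entered are left untouched.
    """
    for record in batch:
        for field, value in shared.items():
            if not record.get(field, "").strip():
                record[field] = value


if __name__ == "__main__":
    # A made-up upload of 120 files with no metadata yet.
    upload = [{"Filename": f"img_{i:03d}.jpg"} for i in range(120)]
    for group in batches(upload):
        bulk_index(group, {"Copyright Notice": "(c) the charity"})
    still_incomplete = [r["Filename"] for r in upload if missing_fields(r)]
    print(len(still_incomplete), "files still lack required metadata")
```

In this sketch the files that remain flagged after bulk indexing are the ones an indexer would then sift through individually, which mirrors the combination of P7's and P3's suggestions.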

6. Conclusion

This case study explores how a digital image collection belonging to a charity is organised, managed and used. As well as describing the case, it also reflects on how it relates to the academic and professional literature and how aspects of it might be developed or improved. The case provides insights into information organisation, information behaviour, user experience and digital asset management. And a focused exploration is made of the themes of indexing policy and digital asset management. This focused exploration phase has resulted in a series of recommendations that can be used to guide practice.

Overall, there are a few key findings. Firstly, the collection plays a key role in promoting the charity and recording its work. As the charity points out, images are particularly effective at conveying the role of design in its work. Secondly, the collection is still relatively small (about 8,000 files) but growing rapidly. As a result, the search engine, supported by good quality metadata, will be increasingly important for the discoverability of files. Thirdly, the case shows how DAM software can support information organisation, information retrieval, information seeking and digital asset management. Fourthly, the case shows the importance of training in how to organise, manage and use the collection, particularly how to fill out the metadata form and use folders. And finally, the problems encountered by the charity often come down to a lack of time and staffing. The charity needs to be able to organise, manage, search and use ever-larger amounts of information but the resources available for doing this are limited. This shows the importance of tools for organising information and making it searchable. In the charity's case, the new metadata schema has made it easier to apply metadata and search assets, particularly thanks to the controlled vocabularies. And a clearer folder structure has made it easier to browse the collection. These tools still need to be maintained but, in the opinion of the researcher, the benefits they bring should be well worth the effort that goes into maintaining them.

7. References

Al-Maskari, A. & Sanderson, M. 2010, A review of factors influencing user satisfaction in information retrieval, Journal of the American Society for Information Science and Technology, 61(5), 859-868.

Bates, M.J. 1989, The design of browsing and berrypicking techniques for the online search interface, Online Review, 13(5), 407-424.

Bawden, D. & Robinson, L. 2012, Introduction to Information Science, London: Facet.

Belkin, N.J., Oddy, R.N., Brooks, H.M. 1982, ASK for Information Retrieval: Part 1. Background and Theory, Journal of Documentation, 38(2), 61-71.

British Library, 2013. Digital Preservation Strategy 2013-2016 [online]. Available: http://www.bl.uk/aboutus/stratpolprog/collectioncare/digitalpreservation/strategy/BL_DigitalPreservationStrategy_2013-16-external.pdf [4 Dec 2014]

Broomfield, J. 2009, Digital asset management case study - Museum Victoria, Journal of Digital Asset Management, 5(3), 116-125.

Byström, K. 2005, Information Activities in Work Tasks, in Fisher, K.E., Erdelez, S. and McKechnie, L.E.F. (eds.), Theories of information behavior, Medford NJ: Information Today, 174-178.

Case, D.O. 2012, Looking for information: a survey of research on information seeking, needs, and behavior, 3rd edn. Bingley: Emerald.

Chen, H., Wigand, R.T., Nilan, M. 2000, Exploring Web users' optimal flow experiences, Information Technology & People, 13(4), 263-281.

Dervin, B. 2005, What Methodology Does To Theory: Sense-Making Methodology as Exemplar, in Fisher, K.E., Erdelez, S. and McKechnie, L.E.F. (eds.), Theories of information behavior, Medford NJ: Information Today, 25-30.

Dickinson, R., [Historic England Archive] Personal communication, 13/11/2015

Digicamhistory, 2015, 1999+ [online] Digicamhistory. Available: http://www.digicamhistory.com/1999+.html [30/10/2015]

Dragich, L. 2012, Prioritizing Gartner's APM Model [online] APM Digest. Available: http://apmdigest.com/prioritizing-gartners-apm-model [25/10/2015]

Dubie, D. 2006, Performance Management from the Client's Point-of-View [online] Network World. Available: http://www.networkworld.com/article/2300639/data-center/performance-management-from-the-client-s-point-of-view.html [25/10/2015]

Enser, P. 2008, The evolution of visual information retrieval, Journal of Information Science, 34(4), 531-546.

Fidel, R. and Pejtersen, A.M. 2005, Cognitive Work Analysis, in Fisher, K.E., Erdelez, S. and McKechnie, L.E.F. (eds.), Theories of information behavior, Medford NJ: Information Today, 88-93.

Google, 2015. Algorithms [online]. Available: https://www.google.com/intl/en_us/insidesearch/howsearchworks/algorithms.html [25/8/2015]

Hansen, P. 2005, Work Task Information-Seeking and Retrieval Processes, in Fisher, K.E., Erdelez, S. and McKechnie, L.E.F. (eds.), Theories of information behavior, Medford NJ: Information Today, 392-396.

Haynes, D. 2004, Metadata for information management and retrieval, London: Facet.

Hider, P. 2008, How Much are Technical Services Worth?: Using the Contingent Valuation Method to Estimate the Added Value of Collection Management and Access, Library Resources & Technical Services, 52(4), 254-262.

Hider, P. 2012, Information resource description: creating and managing metadata, London: Facet.

Hjørland, B. & Nissen Pedersen, K. 2005, A substantive theory of classification for information retrieval, Journal of Documentation, 61(5), 582-597.

Hollink, L., Schreiber, A.T., Wielinga, B.J. & Worring, M. 2004, Classification of user image descriptions, International Journal of Human - Computer Studies, 61(5), 601-626.

Jisc, 2015. Jisc Digital Media: Managing a Digital Media Collection [online] Jisc. Available: http://www.jiscdigitalmedia.ac.uk/guide/managing-a-digital-media-collection [10/9/2015]

Jörgensen, C., Jaimes, A., Benitez, A.B. & Chang, S. 2001, A conceptual framework and empirical research for classifying visual descriptors, Journal of the American Society for Information Science and Technology, 52(11), 938-947.

Keathley, E.F. 2014, Digital Asset Management [ebook], Apress, Berkeley: CA. Available through: City University London Library website, http://www.city.ac.uk/library [30/8/15]

Komorowski, M. 2014, A history of storage cost (update) [online] mkomo. Available: http://www.mkomo.com/cost-per-gigabyte-update [20/9/2015]

Kremerskothen, K. 2011, 6,000,000,000 [online] Flickr blog. Available: http://blog.flickr.net/en/2011/08/04/6000000000/ [20/9/2015]

Kuhlthau, C.C. 2005, Kuhlthau’s information search process, in Fisher, K.E., Erdelez, S. and McKechnie, L.E.F. (eds.), Theories of information behavior, Medford NJ: Information Today, 230-34.

Lancaster, F.W. 1998, Indexing and abstracting in theory and practice, Champaign: University of Illinois.

Leckie, G.J., Pettigrew, K.E. & Sylvain, C. 1996, Modeling the Information Seeking of Professionals: A General Model Derived from Research on Engineers, Health Care Professionals, and Lawyers, The Library Quarterly: Information, Community, Policy, 66(2), 161-193. Available: http://www.jstor.org/stable/4309109 [23/12/15]

Leith, I., [Historic England Archive] Personal communication, 04/09/2015

Levinson, D.A. & Schlatter, T., 2013, Visual usability: principles and practices for designing digital applications [ebook], Amsterdam: Morgan Kaufmann. Available through: City University London Library website, http://www.city.ac.uk/library [30/8/15]

Lippell, H. 2015. Building a Corporate Taxonomy, in: Schopflin, K., ed. 2015, A handbook for corporate information professionals, London: Facet, 57-76.


MacFarlane, 2013. Lecture 04: Information Retrieval, INM348 Digital Information Technologies and Architectures [online via internal VLE], City University London. Available at: http://moodle.city.ac.uk [1/11/2015]

Matusiak, K.K. 2006, Towards user-centered indexing in digital image collections, OCLC Systems & Services: International digital library perspectives, 22(4), 283-298.

Morville, P. and Rosenfeld, L. 2006, Information Architecture for the World Wide Web, 3rd ed. [e-book] Farnham/Beijing: O'Reilly. Available through: City University London Library website http://www.city.ac.uk/library [30/8/15]

Oxford University Press, 1994. Pocket Oxford Latin Dictionary, 2nd edition. Oxford: Oxford University Press.

Oxford University Press, 2015. Oxford English Dictionary [online] Oxford University Press. Available: http://www.oed.com [3/1/2016]

Pickard, A.J. 2007. Research Methods in Information. London: Facet.

Russell-Rose, T. & Tate, T. 2013, Designing the search experience: the information architecture of discovery, Amsterdam: Morgan Kaufmann.

Sarwan, N. 2014, Review of available open source DAM software [online] Open Source Digital Asset Management. Available: http://www.opensourcedigitalassetmanagement.org [12/9/15]

Savolainen, R. 2009, Information use and information processing, Journal of Documentation, 65(2), 187-207.

Sarkanen, A. and Stoddard, K. 2015, Training end-users in the workplace, in: Schopflin, K., ed. 2015, A handbook for corporate information professionals, London: Facet, 159-178.

Schopflin, K. 2015, The history and profile of the corporate information service, in: Schopflin, K., ed. 2015, A handbook for corporate information professionals, London: Facet, 1-12.

Shah, S. 2014, Collaborative Information Seeking, Journal of the Association for Information Science and Technology, 65(2), pp. 215-236.

Shatford, S. 1986, Analyzing the subject of a picture: a theoretical approach. Cataloguing and Classification Quarterly, 6(3), pp. 39-62.

TechTarget, 2015. UX (user experience) definition [online]. TechTarget. Available: http://searchcio.techtarget.com/definition/UX-user-experience [2/9/2015]

Terras, M., 2008, Digital Images for the Information Professional, Aldershot: Ashgate.

Tesco, 2015, Kodak Pix Pro FZ151 Digital Camera, Red, 16MP, 15x Optical Zoom 3" LCD Screen [online] Tesco. Available: http://www.tesco.com/direct/kodak-pix-pro-fz151-digital-camera-red-16mp-15x-optical-zoom-3-lcd-screen/487-5990.prd?skuId=487-5990&pageLevel=sku&sc_cmp=ppc_sh-_-sh-_-tesco-_-487-5990&gclid=CNHJ8d-86sgCFaoEwwodT90Hzw&gclsrc=aw.ds [30/10/2015]

Third Light, 2015 (a), MAM, DAM, Image or Media Library? What should you look for? [online]. Third Light. Available: https://www.thirdlight.com/articles/mam-dam-image/?gclid=CMGi2Orho8kCFU-6Gwodb04JmA [22/11/2015]


Third Light, 2015 (b). Products [online]. Third Light. Available: https://www.thirdlight.com/products/ [22/12/15]

Third Light, 2015 (c), Metadata [online]. Third Light. Available: https://www.thirdlight.com/docs/display/ims6/Metadata [15/5/15]

Third Light, 2015 (d). Duplicate Detection Criteria [online]. Third Light. Available (to clients): https://www.thirdlight.com/docs/x/NoEO [26/8/2015]

Third Light, 2015 (e). Digital Asset Management cloud hosting: Hosting Your Third Light Intelligent Media Server [online]. Third Light. Available: https://www.thirdlight.com/products/standard-edition (click on the hyperlink 'hosted for you') [13/9/2015]

Third Light Support, 2015. IMS version. [email] Message to I. Ramsden. Date sent: Tuesday 22 December 2015: 14.33.


APPENDIX A

Account of researcher's internship at the charity

In February 2014, an internship was advertised on the job board of the Library and Information Sciences Scheme section in Moodle.

Internship advert 21/2/2014:

The charity I used to work for... is looking for someone who can help them build a digital library for all of their image, video and design files. They currently use ThirdLight as a photo library, and they have decided to expand its use to include all of the publication and video files. The project will require experience with a digital library system (ThirdLight or other), and the work will be focused on revising the current metadata structure, organising the filing system and improving the search capabilities of the site. Working days and times are flexible, between one and three days a week until the project is complete. As they are a charity they are unable to provide monetary compensation for work but can expense food and travel costs. A great opportunity for digital library experience!

The researcher applied for the internship and met with the Head of Digital and two other staff in March to discuss the project. She was offered the post and went into the office once or twice a week between May and November 2014 until the main phase of the project was completed.

Purpose of the internship

Before the internship, there was confusion as to the purpose of metadata fields and how to fill them in. For example, information would be repeated unnecessarily, as in the following metadata applied to a photo of two members of staff at a fundraising event:

Caption: 'London, 21.11.2012. A night held at The Ritz for [shortened name of charity] ([full name of charity]).'

Keywords: '2012' '[full name of charity]' '[shortened name of charity]' 'november' 'Ritz' 'the ritz'

Users were also unsure which Folders to put resources in. For example, there were photos of the Aberdeen centre in both a top-level Folder called 'Aberdeen Images' and in a sub-folder of the top-level 'Centres' Folder called 'Aberdeen'. Some photos had lots of metadata and others did not. And none of the metadata fields were compulsory to fill in. Finally, staff were finding it difficult to find what they needed and were also concerned that they were not getting the most out of Third Light for helping them to organize the assets. To address these issues, the researcher needed to improve the following things:

• quality of the metadata
• ease of filling in the metadata forms
• clarity of the folder structure
• discoverability of resources

The project - planning, delivery and outcomes

The researcher was given two documents that had been prepared to help plan the process for reorganizing the photo library (as it was referred to, despite the fact that it also contains video and text files). The first document, entitled ‘Third Light Process’, contained a draft procedure for improving the library. The process it outlined can be summarised as follows:

• set up new metadata fields based on staff suggestions
• survey staff
• compile a ‘list of useful keywords’ based on survey
• apply the new metadata to the old photos
• reorganise the Folder structure and make use of Smart Folder functionality²
• train staff

The other document, entitled 'Third Light Reorganisation', detailed aspects of the existing metadata scheme. It mapped the Folder structure and drafted a potential new Folder structure. It also suggested new metadata fields and showed a list of all the user-generated keywords that had been applied in the 'Keywords' field for describing the subject of photos. The researcher started by reading through these documents and looking through the DAM.

The metadata fields that were used before the researcher arrived were:

Name of field | Value to be entered | Purpose
Caption | Text | A short description of the nature/subject of the resource and any information that could not be expressed in other fields
Keyword | Text | Describing the subject of the photos
Date created | Date | Showing date created (automatically populated)
Copyright Notice | Text | Showing rights holder and any copyright restrictions

The suggested new metadata fields were:

• Caption (abstract)
• File name (title)
• Keywords (descriptives)
• Audience
• Date created
• Credit (rightsholder/creator)
• Related files
• Format/media type
• Location (geographic)
• Event

Some of these suggestions did not need to be metadata fields as the information was already provided elsewhere. 'File name' existed as a piece of information provided automatically along with the file size and type. However, as part of the internship, the researcher was asked to make sure that files were given recognisable names so that when they were downloaded they could be easily identified. There was also already a 'Related files' function that allows users to attach files to each other and then access related files using a tab on the file's record. The researcher did not create an 'Audience' field because she thought it would be difficult when tagging photos to know for certain which audience they would be useful for. Also, the researcher thought users would be more likely to want to search using the other metadata than by photos tagged as specifically of interest to them.

² Smart Folders are saved searches. For example, you could have a Smart Folder for all photos tagged with 'Centre>Detail>Abstract' that updates every time a new photo with this tag is added.

The researcher used the other suggested fields in the end but adapted the names slightly. The fields she created are as follows (in the order they are displayed):

Field | Value to be entered | Purpose
Caption | Text | A brief description of the nature/subject of the resource and/or any information that cannot be provided in the other fields
Keywords | Controlled term | Description of the subject of the resource
Centre | Controlled term | Centre associated with the resource
Event | Controlled term | Event associated with the resource
Resource type | Controlled term | Type of resource - photo/video/architectural file etc.
Date created | Date and time (automatically populated) | When the photo was taken
Copyright notice | Text | Name of the copyright owner and any copyright instructions
Special instructions | Text | Special instructions for how the resource should be used

These fields are known in Third Light as 'Primary' metadata. That is, they are the most important fields and therefore displayed more prominently. The researcher also included the following fields as 'Secondary' metadata:

Field | Value to be entered | Purpose
Last Changed | Date and time (automatically populated) | When the file was last edited
Upload Date | Date and time (automatically populated) | When the file was uploaded

The researcher decided to use controlled vocabularies for four of the fields. This was to help with tagging and searching for the resources. The controlled vocabularies are displayed in Appendix C, p.49. The 'Centre' controlled vocabulary is displayed as a dropdown list. Terms are in alphabetical order and only one term can be chosen to populate the field. The 'Resource type', 'Event' and 'Keywords' controlled vocabularies are displayed as 'trees' (to use Third Light's terminology). When displayed as 'trees', the terms are hierarchically structured and more than one term can be chosen to populate the field. All terms from a controlled vocabulary are hyperlinked so that they can be clicked to show all other documents tagged with those terms. The Advanced Search function also allows users to search using controlled terms.

Although the researcher had been advised to survey staff to get a list of the keywords they found useful for searching, in the end she looked at the keywords that had already been applied to photos to get an idea of what features users found useful or interesting to describe. The system also keeps logs of the search terms users have entered. The researcher did not realise this function existed at the time; with hindsight, these logs would have been useful to consult when creating the controlled vocabularies.

The 'Keywords' controlled vocabulary took the longest to create because of the range of subjects of the resources. Some of the resources the researcher had looked at had been described in considerable detail, including keywords like 'table', 'red', 'cushion' and names of people. Thus the researcher wanted the controlled vocabulary to allow for tagging and searching at a detailed level, which meant spending a lot of time compiling terms that would cover the subject of all the resources.

The 'Events' controlled vocabulary also took time to create because of the range of events that the charity organises. The Events team advised the researcher on how to classify them. When it came to the events organised by the centres, each centre had its own list of events, although many organised the same core set of events. As a result, the researcher emailed all the centres to ask them for a list of the events that they had organised or were planning to organise, so that she could compile the keywords accordingly.

After the researcher had created the new fields and set up the vocabularies, she went through the library to apply missing metadata and edit existing metadata ('legacy metadata') to conform to the new scheme. (The researcher had kept all the legacy metadata but renamed fields where necessary.) The researcher imported metadata from Excel to the 'Resource Type', 'Centre' and 'Event' fields to make it easier to apply in bulk. All assets now have values in these fields. However, the researcher did not have time to update all legacy metadata in the 'Keywords' field of resources, given that each file has to be analysed individually to apply subject keywords. The researcher made the 'Resource Type', 'Centre', 'Event' and 'Keywords' fields compulsory for users to fill in. All metadata fields are indexed for retrieval and displayed next to the file to aid selection.

The researcher then reorganised the folder structure to make it as simple and intuitive as possible. She did not create Smart Folders (as had been suggested initially) because of lack of time.

Finally, the researcher trained staff in how to use the new system. This involved spending a morning with the relevant staff members taking them through five PowerPoint presentations:

• Introduction - an introduction to the photo library and explaining the need for a new metadata schema, including the need for controlled vocabularies
• Applying metadata
• Configuring metadata (i.e. editing metadata fields and controlled vocabularies)
• Search
• The new folder structure

The presentations also included exercises for them to have a go at applying and configuring metadata and searching for resources. They did not have time to do all the exercises. Third Light provides documentation and email support for users to learn about the system. The researcher used these heavily to learn about the system and create the metadata scheme.


APPENDIX B

Metadata fields

Users can view and edit all of the metadata attached to a file by going to a page called the File Console (by clicking on a thumbnail or name of the file). The File Console displays a small version of the image and all of its metadata. The metadata in the File Console is divided into three sections: Primary Metadata, Secondary Metadata and File Info. Primary metadata are the most important fields and therefore displayed more prominently. The tables below reflect the order in which the fields are displayed. The fields shaded in red are compulsory to fill in. Users can also view metadata by hovering their cursor over the thumbnail of an image, which displays some of its metadata in a pop-up box. All metadata fields are indexed for retrieval.

Primary metadata

Field | Value to be entered | Purpose
Caption | Text | A brief description of the nature/subject of the resource and/or any information that cannot be provided in the other fields
Keywords | Controlled term | Description of the subject of the resource
Centre | Controlled term | Centre associated with the resource
Event | Controlled term | Event associated with the resource
Resource type | Controlled term | Type of resource - photo/video/architectural file etc.
Date created | Date and time (automatically populated) | When the photo was taken
Copyright notice | Text | Name of the copyright owner and any copyright instructions
Special instructions | Text | Special instructions for how the resource should be used

Secondary metadata

Field | Value to be entered | Purpose
Last Changed | Date and time (automatically populated) | When the file was last edited
Upload Date | Date and time (automatically populated) | When the file was uploaded

File Info

Value to be entered (all File Info fields): Various (automatically populated)

Field | Purpose
Filename | Name of the file
Reference | Unique reference number
Location | Folder where the file is stored
Owner | Person who uploaded the file
File size | File size in MB and Megapixels
Est. TIFF size | Estimated TIFF size in MB
Quality level | Quality rating out of 8
Views | Number of times file has been viewed
Downloads | Number of times file has been downloaded
Emails | Number of times file has been emailed from the DAM
EXIF data | This field contains all of the Exif (Exchangeable image file format) metadata attached to a file. Exif data provides technical data about photos, e.g. resolution and exposure time.
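The compulsory-field rule can be illustrated with a short sketch. This is an illustration only and does not represent Third Light's software or API. The four compulsory field names are those listed in Appendix A; the function name, the record layout and the sample caption are hypothetical.

```python
# Illustrative sketch only: not Third Light's API.
# The four compulsory fields are those named in Appendix A; everything else
# (function name, record layout, sample values) is a hypothetical assumption.
COMPULSORY_FIELDS = ("Keywords", "Centre", "Event", "Resource type")


def missing_compulsory_fields(record):
    """Return the compulsory fields that are absent or empty in a record."""
    return [name for name in COMPULSORY_FIELDS if not record.get(name)]


# A hypothetical upload with an empty 'Event' field.
record = {
    "Caption": "Garden at dusk",              # invented sample caption
    "Keywords": ["Centre>Detail>Abstract"],
    "Centre": "Aberdeen",
    "Event": [],                              # left empty by the uploader
    "Resource type": ["photo"],
}

print(missing_compulsory_fields(record))  # ['Event']
```

A check of this kind would reject the upload until every compulsory field has at least one value, which is the effect that making the fields compulsory is intended to have.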


APPENDIX C

Controlled vocabularies

Below are the controlled vocabularies used to populate the Keywords and Resource Type fields. Some keywords (in square brackets) have been changed to maintain the anonymity of the charity. The researcher has not included the controlled vocabularies for the Event and Centre fields because they would have identified the charity.

Resource Type field controlled vocabulary

architectural

computer-aided design (CAD)

computer-generated image (CGI)

hand drawing

model

plan

artwork

audio

creative

landscaping

logo

magazine layout

map

photo

poster

presentation

text

video

Keywords field controlled vocabulary

Art

Artist

[name of artist]

Artwork

drawing

painting

photograph

print

sculpture

sketch

tapestry

Collection

[name of collection]


Title

[name of artwork]

Centre

Architect

[name of architect]

Award

[name of architectural award]

Campaign

Construction

Detail

Abstract

Colour

black

blue

brown

green

grey

orange

purple

red

yellow

white

Effect

shadow

reflection

Material

concrete

metal

wood

Time of day

day

night

sunset

View

garden

sea

sky

Exterior

Area

courtyard

garden

hospital

parking

roof

staircase

terrace

Feature

bamboo

flower

flower bed

grass

labyrinth

road

sign

solar panel

stilt

stones


stream

tree

woodwork

Firm of architects

[name of architectural firm]

Groundbreaking

Interior

Area

corridor

counselling

fireplace

information

kitchen

lavatory

library

meeting

office

sitting

staircase

welcome

Feature

ceiling

door

light

skylight

window

Furniture

armchair

bench

bookshelf

chair

coatstand

lamp

rocking chair

sofa

table

Object

basket

book

cake

coaster

computer

fruit

leaflet

mug

paperweight

scarf

teapot

tin

toy

visitors book

Ornament

cushion

flower

ornamental bowl

photo

rug

vase


Interior designer

[name of interior designer]

Landscape architect

[name of landscape architect]

Event

Activity

cycling

drawing

drinking

eating

running

skydive

speech

talking

walking

Location

[location of event]

[Name of event partner]

Other

balloon

bus

flag

motorbike

Refreshments

afternoon tea

cake

coffee

dinner

drinks reception

lunch

People

Animal

dog

[Name of co-founder]

Generic

boy

centre visitor

children

film crew

fundraiser participant

girl

guest

health professional

man

musician

photographer

reporter

volunteer

waiter

woman

[Name of co-founder]

[Name of charity] staff

Centre head

[name of Centre]

[name of Centre Head]

Chairman

[name of Chairman]


Director

[name of Director]

Executive

[name of Executive]

Major donor

[name of major donor]

Patron

[name of Patron]

President

[name of President]

VIP guest

[name of VIP guest]

VIP supporter

[name of VIP supporter]

Support

Emotional support

Creative writing

Expressive art

Managing stress

Mindfulness

One-to-one

Relaxation

Other support

Reading

Practical support

Benefits advice

Exercise

Nordic walking

Qi gong

Tai chi

Walking

Yoga

Information

Look good feel better

Nutrition

Talking heads

Weight management

Social support

Friends and family

Gardening

Kid's day

Kitchen table

Cake

Conversation

Tea

Singing

Support groups

Men's support group

Women's support group

Young women's support group
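The hierarchical ('tree') structure of the Keywords vocabulary can be sketched as a nested mapping in which each term is addressed by its path from the top level, in the 'Centre>Detail>Abstract' style used elsewhere in this dissertation. The sketch below is illustrative only. Only the path 'Centre>Detail>Abstract' is confirmed by the text; the nesting of the other sample terms, and all function names, are assumptions.

```python
# Illustrative sketch only: a small, partly assumed excerpt of the Keywords tree.
# 'Centre>Detail>Abstract' is confirmed by the dissertation; the nesting of the
# other terms shown here is an assumption made for the sake of the example.
VOCAB = {
    "Centre": {
        "Detail": {
            "Abstract": {},
            "Colour": {"red": {}, "blue": {}},
        },
    },
    "Support": {
        "Emotional support": {"Mindfulness": {}},
    },
}


def term_paths(tree, prefix=()):
    """Yield every term in the tree as a 'Parent>Child' path string."""
    for term, children in tree.items():
        path = prefix + (term,)
        yield ">".join(path)
        yield from term_paths(children, path)


def is_valid(path, tree=VOCAB):
    """Check that a 'Parent>Child' path exists in the controlled vocabulary."""
    node = tree
    for part in path.split(">"):
        if part not in node:
            return False
        node = node[part]
    return True


for path in term_paths(VOCAB):
    print(path)
```

Addressing terms by their full path distinguishes terms that appear in more than one branch, such as 'garden', 'staircase', 'cake' and 'flower' in the vocabulary above.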


APPENDIX D

Folder system

Third Light has five different types of folder with different functions. The names of the folders are capitalised throughout the dissertation.

Name of folder | Purpose
Folder | Folders are the primary way of organising files and giving them a physical location in the DAM.
Collection | Collections are for collections of files based, for example, around a particular theme or project.
Smart Folder | Smart Folders are 'saved searches'. They automatically compile collections of resources based on their metadata, pulling in new files with the relevant metadata each time they are opened. The metadata criteria are specified when Smart Folders are created and stored in their settings.
Lightbox | Lightboxes enable adding files to a 'clipboard-style' working area. They provide a convenient way to gather together a diverse selection of files and work on them as a group. They are only visible to their creators and anyone invited to share them. Ultimately, a Lightbox can be saved as a Collection.
Event | Events are folders that are accessible via the login page of the DAM. They enable sharing resources with people who are not users of the DAM as no login to the DAM is required to view them.
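The 'saved search' behaviour of a Smart Folder can be sketched as follows. This is an illustration of the general idea only and is not based on Third Light's implementation. The class, its attributes, the filenames and the 'Event>Activity>running' path are hypothetical; the keyword path 'Centre>Detail>Abstract' is the example used elsewhere in this dissertation.

```python
from dataclasses import dataclass


@dataclass
class SmartFolder:
    """A hypothetical saved search: it stores criteria, not copies of files."""
    name: str
    required_keyword: str

    def contents(self, library):
        """Re-run the saved search each time the folder is opened."""
        return [
            item["filename"]
            for item in library
            if self.required_keyword in item["keywords"]
        ]


# A hypothetical library; filenames and the second keyword path are invented.
library = [
    {"filename": "a.jpg", "keywords": ["Centre>Detail>Abstract"]},
    {"filename": "b.jpg", "keywords": ["Event>Activity>running"]},
]

folder = SmartFolder("Abstract details", "Centre>Detail>Abstract")
print(folder.contents(library))  # ['a.jpg']

# A newly uploaded file with the matching tag appears the next time the folder is opened.
library.append({"filename": "c.jpg", "keywords": ["Centre>Detail>Abstract"]})
print(folder.contents(library))  # ['a.jpg', 'c.jpg']
```

Because only the search criteria are stored, the files themselves are never copied into the Smart Folder.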

Folders, Collections and Smart Folders are displayed on the homepage of the DAM. Folders are displayed first, followed by Collections and then Smart Folders. A file can only physically exist in a Folder. Once it has been added to a Folder, it can also be added to a Collection, Smart Folder, Lightbox or Event, but it is not physically copied to these locations. This helps save on storage space.

This is the basic Folder structure that the researcher created during her internship. For the sake of anonymity, it does not show all the sub-folders and the name of one of the Folders (in square brackets) has been changed. The structure is an informal hierarchy, based on a classification of the subject or purpose of files, which is supposed to help users choose which Folder to put a new file in or which Folder to search for a file in. New Folders have been added since the researcher created the Folder structure. These have been omitted to make the original design of the Folder structure clearer.

Art and design

Architectural plans, models and landscaping

[name of publication]

Artwork

Centres

[name of centre]

Campaign³

Groundbreaking/Construction/Build⁴

Events

Centre events

Corporate events

National events

Special events

Organisational

Logos/branding

Presentations

Videos

³ 'Campaign' refers to the charity's campaign to build a new centre.

⁴ 'Groundbreaking/Construction/Build' refers to the construction of a new centre.


APPENDIX E

Orientation and Overview Phase Interview Questions

BACKGROUND

1 How long have you been working at Maggie's?
2 How long have you been using the resource library?
3 Have you ever used this software or similar software before?
4 Did you take part in the resource library training session in November last year?

INFORMATION NEEDS, SEEKING AND USE

5 What do you use the resource library for?
6 Which of the following search methods do you use?
  - general search
  - advanced search
  - browsing the folders
7 Which of the following are important to your search?
  - non-visual information such as the photographer, date created or image resolution
  - visual information such as the colour or composition of the photo
  - conceptual information such as who or what is in the photo
8 Which of the following searches do you usually perform?
  - searching for a specific file
  - searching for a subject, e.g. a person or activity
  - browsing
9 Does searching the resource library allow you to complete your tasks/find the information you need?
10 Do you usually find the resource(s) you need on the first page of search results or do you have to browse the other pages and/or refine your search?
11 Do you make use of the following tools to refine your search?
  - search box
  - facets
12 Do you make use of the Collection functionality? Collections are a type of folder which you can save files to without having to create copies of the files or move them from the folders in which they are stored.
13 Do you make use of the Lightbox functionality? Lightboxes are like Collections but they aren't visible to other users unless you invite them to view/edit them. They are useful when you're at the draft stage of a project, e.g. planning a publication.
14 Do you make use of the Smartfolder functionality? Smartfolders are saved searches. For example, you could have a Smartfolder for all photos tagged with 'Centre>Detail>Abstract' that updates every time a new photo with this tag is added.
15 Do you find your information needs change as you search?
16 Do you ever find information by chance?
17 What are your feelings as you go through the search process?
18 Do you ever search the library just out of interest?
19 How much time do you usually have to search for resources in order to complete your tasks?
20 When you find a resource you need, what do you usually do?
  - download it
  - email it to yourself/a colleague
  - put it in a Collection or Lightbox

METADATA

21 Please have a look at the list of metadata fields and put a cross next to any whose purpose you're not sure of. [list contained Primary, Secondary and File Info fields]
22 When you're looking for photos, do you ever look at the metadata? If so, how helpful is the metadata for helping you identify what's in the photo and judging whether it might be useful?
23 Please have a look at the controlled vocabularies used to populate four of the metadata fields. As a reminder, controlled vocabularies are terms used to describe resources. Last November I advised against using the controlled vocabularies to search because of having to apply the controlled terms to the old photos first. It is now possible to search the Centre, Event and Resource Type fields using controlled vocabularies. Do you think you will make use of these controlled vocabularies to improve your searches?
24 Many of the old photographs lack metadata in the Keywords field. Do you think that it'd be worth adding keywords to these photos so they are more easily retrievable?
25 It takes a long time to convert the old user-created tags in the Keywords fields into controlled terms. I have only managed to do this for a small percentage of the resource library. Do you think that it'd be worth continuing to convert these tags into controlled terms so the photos are more easily retrievable?
26 Do you think these controlled vocabularies could be useful as an information-giving tool in the wider context of the charity?
27 Do you upload photos?
28 How often do you upload photos?
29 Do you find it easy to apply metadata when uploading photos?
30 Do you find the controlled vocabularies help you apply metadata?
31 Do you think the metadata fields allow you to provide all the relevant information about photographs?
32 Do you think it's helpful that it's compulsory to fill in the Resource Type, Centre, Event and Keywords fields?
33 Do you know what kind of keywords to add to match user needs? i.e. Do you know what they're likely to search for and how specific their search terms are likely to be?
34 Have you edited the metadata of any existing resources?
35 What changes have you made and why?
36 Have you edited the metadata schema? i.e. made changes to the number of fields or the controlled vocabularies? If so, what changes did you make and why?
37 The 'Support' branch of the controlled vocabulary for the Keywords field has been changed. Could you talk me through how and why this was changed?
38 I see you added the term 'Abstract' to the 'Centre > Detail' branch of the Keywords controlled vocabulary. Could you talk me through how and why this term was added?

USER EXPERIENCE

39 How user-friendly do you find the library?
40 What device(s) do you use to access the resource library?
41 What is your experience of the following aspects of using the resource library?
  - interface
  - graphics
  - design
  - navigation of the software
  - overall performance of the system
42 Do you think Third Light offers adequate support in how to use the software?

DIGITAL ASSET MANAGEMENT

43 Please comment on the following aspects of managing the resource library if they are relevant to you:
  - keeping the metadata fields and controlled vocabularies up-to-date and relevant to user needs
  - technical issues, e.g. implementing system updates, dealing with technical errors and monitoring storage capacity
  - inductions for new users
44 Do you think the time invested in maintaining controlled vocabularies is worth it? Or do you think it would be better to go back to using user-created terms in the fields where controlled vocabularies are now used?
45 How do you select which resources to store in the resource library?
46 Do you go through and delete duplicates? Are you aware that you can enable the interception of duplicates at the upload stage?
47 Do you envisage no longer needing to store some of the resources in Third Light in the future?
48 Do you have any questions for me?


APPENDIX F
Orientation and overview phase: log analysis methods and results

Audit logs

To generate the audit logs, the researcher specified that for each participant the system should return an unlimited number of records for all types of activity with no time restriction. In fact, only a year's worth of activity can be logged so the number of records returned is not unlimited. For two of the users (P1 and P4), the logs are incomplete because they changed to being a different type of user (i.e. with different permissions) and the logs cannot show their activity before they changed. In these two cases I could only see the logs of their activity since they changed to being 'admin users' in the summer of 2015.

The log of the Marketing Coordinator's (P1) activity was created on 11/10/15 and shows activity between 10/8/15 and 9/10/15. The Marketing Coordinator used the system before 10/8/15 but as a 'normal' user so her previous use of the IMS is not represented in this log. The log shows that she logged in 46 times during these two months. It shows that she moved 104 files, emailed a file, edited a user, edited the metadata of 21 files, deleted a file and sent a file to the recycle bin to be temporarily stored before being automatically deleted. P1 also created, edited and deleted a number of 'Events'. An 'Event' is a folder that makes selected parts of the library publicly accessible from the login page of the IMS.

The log of the Publications Manager's (P2) activity was created on 11/10/15 and shows activity between 22/4/15 and 9/10/15. The Publications Manager started his job at the charity in April 2015, which is why the log is so short. The log shows that he logged in 54 times during this period and that he published one file, i.e. made it publicly accessible online via a URL.

The log of the Website and Social Media Editor's (P3) activity was created on 20/10/15 and shows activity between 20/10/14 and 14/10/15. It shows that she logged in 79 times, moved 39 files, uploaded and applied metadata to 437 files, deleted and then restored 12 Folders and permanently deleted 1 Folder.

The log of the Digital Production Coordinator's (P4) activity was created on 11/10/15 and shows activity between 18/6/15 and 9/10/15. The Digital Production Coordinator used the system before 18/6/15 but as a 'power' user so her previous use of the IMS is not represented in this log. The log shows that she logged in 24 times.

The log of the Marketing Coordinator's (P5) activity was created on 20/10/15 and shows activity between 20/10/14 and 7/8/15. This user left the charity in August 2015, which is why she has not used the IMS since then. The log shows that she logged in 164 times and used the IMS in many different ways. Her use of the IMS can be divided into the following broad categories: user management, file management, folder management, metadata management and sharing and publishing files. User management involved deleting and editing the permissions of users. File management involved uploading files, sending them to the recycle bin and deleting them. Folder management involved moving files to different Folders and adding files to Collections. It also involved editing the properties of Folders, Collections, Events and Lightboxes, and deleting Folders, Lightboxes and a Collection and recycling a Collection. Metadata management involved applying metadata to uploads and editing the metadata of existing assets. It also involved editing the Custom Metadata fields (fields such as 'City' or 'Embargo Date' that are suggested by the IMS but not in use) and editing the controlled vocabularies. Sharing and publishing files involved creating an Event and emailing files.

Search logs

A sample of 113 searches made between 1/5/15 and 31/7/15 was analysed. This sample was chosen because the period from May to July 2015 was when all five participants in the case study were working at the charity (i.e. after P2 arrived and before P5 left). The number of searches performed by each participant was as follows:

• Marketing Coordinator (P1) - 36
• Publications Manager (P2) - 20
• Website and Social Media Editor (P3) - 13
• Digital Production Coordinator (P4) - 14
• Marketing Coordinator (P5) - 30
• Total - 113

To analyse the types of information searched for, the researcher classified the keywords used in the searches as follows:

• art concept
• centre concept
• event concept
• people concept
• support concept
• file reference number
• orientation (portrait or landscape)
• resource type
• unclear

To help classify the queries, the researcher based the categories on the taxonomy she created for the controlled vocabularies. Thus, the 'concept' categories are all based on the controlled vocabularies used to populate the Centre, Event and Keywords fields. And the 'resource type' category is based on the controlled vocabulary used to populate the Resource Type field. If a query matched a term in any of these vocabularies the researcher would classify it under the corresponding category. So, for example, a search for an artist would be classified as 'art concept' rather than 'people concept' as artists are classified under 'Art' rather than 'People' in the controlled vocabulary for the Keywords field. In some cases it was not clear what users were searching for. For example, a search for 'eating' could be classified either as an event concept or a support concept, depending on whether it was for people eating during an event or during a support activity. I asked the research participants to clarify as many of these queries as possible. I classified the rest as 'unclear'. Some searches included more than one type of concept or, if using Advanced Search, two different conditions that the search should fulfil. For example, one search was for a support concept and a people concept ('HSBC volunteer gardening'), and another search required a keyword match but also that the orientation could be either portrait or landscape. So even though 113 searches were performed, the total number of things searched for is greater. In one case, a query could not be classified using any of the above categories. This was the query 'hope'. The searcher clarified that 'I was looking for an image which illustrated 'hope' - just in terms of being joyful etc'. The researcher classified this query as a support concept, even though it could be thought of as a people or an event concept too. There were no searches for visual information (i.e. colours or shapes). 
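The matching step described above (a query is assigned to every category whose controlled vocabulary contains one of its terms) can be illustrated with a minimal Python sketch. This is not the procedure's actual implementation, which was carried out by hand: the vocabulary contents below are hypothetical and heavily abbreviated, the function name classify_query is an assumption, and the 'file reference number' and 'orientation' categories are not modelled. In the study, unmatched queries were first clarified with participants; here they simply fall through to 'unclear'.

```python
import re
from collections import Counter

# Hypothetical, abbreviated vocabularies; the charity's real term lists are not reproduced here.
VOCABULARIES = {
    "art concept": {"sculpture", "illustration"},
    "centre concept": {"kitchen table", "garden"},
    "event concept": {"autumn party"},
    "people concept": {"centre visitors", "volunteer"},
    "support concept": {"gardening", "nordic walking"},
    "resource type": {"photo", "logo"},
}

def classify_query(query):
    """Return the sorted categories whose vocabulary terms occur as whole words in the query."""
    text = query.lower()
    matched = set()
    for category, terms in VOCABULARIES.items():
        for term in terms:
            # Word boundaries stop 'garden' from matching inside 'gardening'.
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                matched.add(category)
    # Simplification: anything unmatched is treated as 'unclear'.
    return sorted(matched) if matched else ["unclear"]

# Tally categories over a small sample; one query can count towards several categories.
queries = ["HSBC volunteer gardening", "eating", "logo"]
tally = Counter(category for q in queries for category in classify_query(q))
print(tally)
```

Because a single query may match more than one vocabulary, the tally can exceed the number of queries, which mirrors the observation that the number of things searched for is greater than the 113 searches performed.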
Four of the keywords were typos so are not included in the chart below. The results are as follows:


The conceptual searches are mainly for specific people, events, support activities etc. However, to analyse the specificity of these search terms is beyond the scope of this research. Out of 113 searches, 3 were performed using Advanced Search (i.e. the exact match information retrieval model) and the rest using General Search (i.e. the best match information retrieval model). Sixteen of the terms were not in the controlled vocabularies and eighty-seven were.

The IMS also reports the 20 most opened folders and 20 most opened files. This could also give an indication of what users are searching for. On 1/10/15, the 20 most opened files were photos of centres (13), different versions of the charity's logo (5) and photos of centre visitors (2). The 20 most opened folders were folders of centre photos (13), folders of events photos (4), a folder of staff/founder/patron photos, a folder of artwork and a folder of logos.

Download logs

A sample of 206 downloads was analysed. The downloads were made between 1/5/15 and 31/7/15. Again, this sample was chosen because the period from May to July 2015 was when all five participants in the case study were working at the charity. The results are as follows:

[Figure: bar chart 'What do participants search for?'; y-axis: Number of queries]

[Figure: bar chart 'Number of downloads per participant'; y-axis: Number of downloads; categories: Marketing Coordinator (P1), Publications Manager (P2), Website and Social Media Editor (P3), Digital Production Coordinator (P4), Marketing Coordinator (P5)]


APPENDIX G
Focused exploration phase: group interview

INTRODUCTION
1 Is there anything you would like to discuss before we begin?

ACQUISITION
2 Who acquires photos? Do they give any instructions to photographers about what or how many photos to take?
3 Who owns the rights to the photos? How could you find this out? What about the architectural files? Do the rights belong to the architect?

SELECTION
4 Who selects which files to add? The photographer? The architect? The charity?

FOLDER SYSTEM
5 How clear is the folder system?
6 When a new Folder needs to be created, who is in charge of this?
7 Are users aware of Collections and how they are used?
8 Are users aware of Smart Folders and how they are used?

INDEXING POLICY
9 Should the mood of photos be indexed?
10 How could indexing policy support searching for colours/shapes?
11 How could the charity solve the problem of having lots of files to index?
12 How could the charity solve the problem of old photos lacking subject metadata?
13 What do you think about having an 'expert' user be assigned to check the accuracy and comprehensiveness of metadata applied to assets?

CONTROLLED VOCABULARY GOVERNANCE/STEWARDSHIP
14 Should there be an 'owner' of the controlled vocabularies who is responsible for keeping them up to date?

DISPOSAL/ARCHIVING
15 Comment on the following ideas:
- buying more storage space to accommodate collection
- depositing the collection in an external archive
- systematically weeding the collection

TRAINING
16 Users still browse folders, which is not the most efficient search method. Some members of staff struggle with the metadata schema.
17 What is the most efficient way of training users in how to use the photo library? Comment on the following ideas.
- let them work it out for themselves using manual, training materials that the researcher created and colleagues
- have a designated 'expert' who is happy to answer questions about the photo library and run training sessions
- ask Third Light to give training sessions

PROMOTION
18 How could the charity promote the photo library to all staff?


APPENDIX H
Focused exploration phase: P7 interview questions

BACKGROUND
1 How long have you been working at Maggie's?
2 How long have you been using the photo library?
3 Have you ever used this software or similar software before?
4 Did you take part in the photo library training session in November last year? If not, have you had a chance to look at some of the training materials uploaded to the photo library?
5 What do you use the photo library for?
6 How do you find using the photo library?

ACQUISITIONS
7 Do you commission photos? If so what type of photos?
8 Do you know the copyright agreements?

SELECTION
9 Do you select which files to add to the photo library?
10 Do you know why there are more photos of some events than others?

FOLDER SYSTEM
11 How clear is the folder system?
12 Are you aware of Collections, Lightboxes and Smartfolders and how they are used?
13 How should the folder system be managed?

INDEXING
14 How often do you upload photos?
15 What do you think of the new metadata form? You might like to comment on the fields, the controlled vocabularies and the fact that some fields are compulsory to fill in.
16 Do you find the controlled vocabularies help you apply metadata?
17 Do you know what kind of keywords to add to match user needs? i.e. Do you know what they're likely to search for and how specific their search terms are likely to be?
18 How could large numbers of files be indexed effectively and efficiently?
19 What do you think about having an 'expert' user assigned to check the accuracy and comprehensiveness of metadata applied to assets? The photo library could require that all new uploads are checked by a specific admin user.

CONTROLLED VOCABULARY GOVERNANCE/STEWARDSHIP
20 Should there be an 'owner' of the controlled vocabularies who is responsible for keeping them up to date?

ACCESS AND USE
21 Do you search on behalf of other users?

DISPOSAL/ARCHIVING
22 In just under 4 years, the charity has used up 82.76% of storage space [the storage limit at the time of the interview was still 125 GB]. Comment on the following ideas:
- buying more storage space to accommodate collection
- depositing the collection in an external archive
- systematically weeding the collection

TRAINING
23 What is the most efficient and effective way of training users in how to use the photo library?

PROMOTION
24 How could the charity promote the photo library to all staff?


APPENDIX I
Focused exploration phase: image-tagging exercise forms and analysis

Instructions: Please write how you would describe these seven images. Imagine you are filling out the Caption and Keywords fields in the metadata form. The Caption field is optional and you can write whatever you want in it. The Keywords field is compulsory (so please assign at least one keyword to each image) and is filled using the controlled vocabulary for the Keywords field. To browse the controlled vocabulary for the Keywords field, log in to the Photo Library, go to Search > Advanced Search and choose 'Keywords' from the first menu. The controlled vocabulary is then displayed in the third menu. If you can't find the keyword you're looking for, make up your own. And if you don't know the specific name for something, write 'specific x'. Your indexing should be guided by user needs - try and think what users would search for and what metadata would be helpful to them when selecting images.

P1 Marketing Coordinator

Image 1
Caption: Statue at [name of Centre]
Keywords: Statue, [name of Centre], Garden, Exterior

Image 2
Caption: (none given)
Keywords: Kitchen table, Tea, People, Centre visitors, Flowers

Image 3
Caption: [name of Centre] exterior
Keywords: [name of Centre], Treehouse, Exterior, Tree, Balcony, Garden

Image 4
Caption: [name of charity] walking group
Keywords: Walking group, Outside, Garden, Centre visitors, Exercise

Image 5
Caption: Steven Holl’s illustration of [name of Centre]
Keywords: Steven Holl, [name of Centre], Campaign, London, Art, Illustration, Architect, Architecture

Image 6
Caption: Trinny Woodall and Susannah Constantine at The Autumn Party
Keywords: Trinny Woodall, Susannah Constantine, Event, London, Patrons, Autumn party

Image 7
Caption: Light installation at Chelsea Physic Garden
Keywords: [name of fundraising event], 2015, Light, Installation, Cranes, Paper, Chelsea Physic Garden, Walkers, Garden, Outside

P3 Website and Social Media Editor

Image 1
Caption: Exterior view of [name of Centre] by architect Frank Gehry with the sculpture, Another Time X, by artist Anthony Gormley in the foreground.
Keywords: Sculpture, Frank Gehry, Exterior, Anthony Gormley, Arabella Lenox-Boyd, [name of Centre], Another Time X, metal, day.

Image 2
Caption: Centre visitors in conversation around the kitchen table at [name of Centre].
Keywords: People, Centre visitor, man, woman, kitchen table, tea, conversation, social support, flower, interior, spring

Image 3
Caption: Exterior view of [name of Centre] by architect Wilkinson Eyre.
Keywords: [name of Centre], Wilkinson Eyre, exterior, tree, treehouse, stilt, garden, day, green, summer.

Image 4
Caption: Group of Centre visitors taking part in the Nordic Walking group at [name of Centre].
Keywords: Centre visitors, people, practical support, social support, Nordic walking, exercise, woman, day, exterior, grass, winter, [name of Centre].

Image 5
Caption: Architects’ drawing of the planned interior for [name of Centre].
Keywords: [name of Centre], London, Steven Holl, drawing, interior, sketch.

Image 6
Caption: Trinny Woodall and Susannah Constantine at the [name of charity] autumn party special event in 20xx(?)
Keywords: Trinny Woodall, Susannah Constantine, Autumn party, special events, celebrity

Image 7
Caption: [name of fundraising event] walkers looking at the origami art installation by BPD Lighting at Chelsea Physic Garden at [name of fundraising event] London 2015.
Keywords: National event, [name of fundraising event], London 2015, Chelsea Physic Garden, [name of fundraising event], walkers, night, origami, installation, BDP Lighting

P7 Communications Manager, Scotland

Image 1
Caption: Exterior shot of [name of Centre] taken from behind the Anthony Gormley sculpture.
Keywords: [name of Centre], exterior, sculpture, Anthony Gormley, roof, Frank Gehry, Another Time X, day, sky

Image 2
Caption: Centre visitors enjoying a cup of tea and a chat at [name of Centre]
Keywords: [name of Centre], Richard Murphy, interior, sitting, counselling, armchair, mug, tea, centre visitor, conversation, support

Image 3
Caption: Exterior shot of [name of Centre]
Keywords: [name of Centre], exterior, specific architect, garden, tree, path, centre visitor

Image 4
Caption: Centre visitors from (Specific centre) take part in a Nordic walking group as part of the programme of support.
Keywords: Specific Centre, centre visitors, Nordic Walking, exercise, practical support

Image 5
Caption: Architect’s drawing of [name of Centre] by Steven Holl
Keywords: [name of centre], Steven Holl, staircase, welcome, centre visitors, window, table

Image 6
Caption: TV personalities Trinny & Susannah at The Autumn Party (year?)
Keywords: The Autumn Party, drinks reception, guest

Image 7
Caption: Participants at [name of fundraising event] London (year) enjoy art installation (details).
Keywords: [name of fundraising event] London, walking, fundraiser participant, art installation

Analysis of image descriptors

The descriptors applied by participants during the image-tagging exercise have been classified to help the researcher analyse what they describe and how specific they are. N.B. Only the descriptors applied in the Keywords fields have been analysed. The classes, 'Generic of', 'Specific of' and 'About', are taken from the Panofsky-Shatford matrix (Shatford, 1986, 49). The 'Generic of' class is for descriptions of generic concepts, the 'Specific of' class is for descriptions of specific concepts and the 'About' class is for descriptions of abstract concepts such as ideas or emotions. The 'About' class is also for terms such as 'Campaign'5, 'support', 'practical support', 'emotional support' or 'social support'. The 'Object' class is for people or things. And the 'Activity' class is for actions or events. N.B. The researcher did not classify the term, 'green' (used to describe Image 3), because this is not strictly a description of the subject of the image, although a valid descriptor in other respects.

5 'Campaign' refers to the planning phase for a new Centre.


                              Image 1                                         Image 2
                              P1          P3          P7          Totals      P1          P3          P7          Totals
Generic of
  Object                      2           2           3                       4           6           2
  Time                        -           1           1                       -           1           -
  Place                       1           1           1                       -           1           1
  Activity                    -           -           -                       -           1           1
Specific of
  Object                      1           5           4                       -           -           4
  Time                        -           -           -                       -           -           -
  Place                       -           -           -                       1           1           -
  Activity                    -           -           -                       -           -           2
About
  Object                      -           -           -                       -           -           -
  Time                        -           -           -                       -           -           -
  Place                       -           -           -                       -           -           -
  Activity                    -           -           -                       -           1           1
Total Generic of              3 (75%)     4 (44.4%)   5 (55.6%)   12 (54.5%)  4 (80%)     9 (82%)     4 (36.36%)  17 (63%)
Total Specific of             1 (25%)     5 (55.6%)   4 (44.4%)   10 (45.5%)  1 (20%)     1 (9%)      6 (54.54%)  8 (30%)
Total About                   0           0           0           0           0           1 (9%)      1 (9.1%)    2 (7%)
Overall no. of terms applied  4           9           9           22          5           11          11          27

                              Image 3                                         Image 4
                              P1          P3          P7          Totals      P1          P3          P7          Totals
Generic of
  Object                      2           2           4                       3           4           1
  Time                        -           2           -                       -           2           -
  Place                       1           1           1                       1           1           -
  Activity                    -           -           -                       1           1           1
Specific of
  Object                      3           4           2                       -           1           1
  Time                        -           -           -                       -           -           -
  Place                       -           -           -                       -           -           -
  Activity                    -           -           -                       -           1           1
About
  Object                      -           -           -                       -           -           -
  Time                        -           -           -                       -           -           -
  Place                       -           -           -                       -           -           -
  Activity                    -           -           -                       -           2           1
Total Generic of              3 (50%)     5 (55.6%)   5 (71.4%)   13 (59%)    5 (100%)    8 (66.7%)   2 (40%)     15 (68%)
Total Specific of             3 (50%)     4 (44.4%)   2 (28.6%)   9 (41%)     0           2 (16.65%)  2 (40%)     4 (18%)
Total About                   0           0           0           0           0           2 (16.65%)  1 (20%)     3 (14%)
Overall no. of terms applied  6           9           7           22          5           12          5           22


                              Image 5                                         Image 6
                              P1          P3          P7          Totals      P1          P3          P7          Totals
Generic of
  Object                      3           2           4                       1           1           1
  Time                        -           -           -                       -           -           -
  Place                       -           1           -                       -           -           -
  Activity                    -           -           -                       1           1           1
Specific of
  Object                      2           2           2                       2           2           -
  Time                        -           -           -                       -           -           -
  Place                       1           1           -                       1           -           -
  Activity                    -           -           -                       1           1           1
About
  Object                      -           -           -                       -           -           -
  Time                        -           -           -                       -           -           -
  Place                       -           -           -                       -           -           -
  Activity                    2           -           1                       -           -           -
Total Generic of              3 (37.5%)   3 (50%)     4 (57%)     10 (47.6%)  2 (33.3%)   2 (40%)     2 (66.7%)   6 (43%)
Total Specific of             3 (37.5%)   3 (50%)     2 (29%)     8 (38.1%)   4 (66.7%)   3 (60%)     1 (33.3%)   8 (57%)
Total About                   2 (25%)     0           1 (14%)     3 (14.3%)   0           0           0           0
Overall no. of terms applied  8           6           7           21          6           5           3           14

                              Image 7
                              P1          P3          P7          Totals
Generic of
  Object                      5           2           2
  Time                        -           1           -
  Place                       1           -           -
  Activity                    -           1           1
Specific of
  Object                      1           2           -
  Time                        1           1           -
  Place                       1           2           1
  Activity                    1           1           1
About
  Object                      -           -           -
  Time                        -           -           -
  Place                       -           -           -
  Activity                    -           -           -
Total Generic of              6 (60%)     4 (40%)     3 (60%)     13 (52%)
Total Specific of             4 (40%)     6 (60%)     2 (40%)     12 (48%)
Total About                   0           0           0           0
Overall no. of terms applied  10          10          5           25
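Each percentage in the 'Total' rows of these tables is a class's share of all the keywords that a participant (or all three participants together) applied to an image. As a minimal sketch of that arithmetic, the Python snippet below recomputes two sets of figures from the tables; the function name class_shares and the rounding to one decimal place are assumptions made for illustration, not part of the original analysis.

```python
def class_shares(counts):
    """Return each class's share of all descriptors as a percentage (one decimal place)."""
    total = sum(counts.values())
    if total == 0:
        # No keywords applied: report zero shares rather than dividing by zero.
        return {name: 0.0 for name in counts}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}

# Counts from the Image 1 table: P1 applied 3 'Generic of' keywords and 1 'Specific of' keyword.
image1_p1 = {"Generic of": 3, "Specific of": 1, "About": 0}
print(class_shares(image1_p1))

# Counts from the Image 7 totals column: 13 'Generic of' and 12 'Specific of' out of 25 terms.
image7_all = {"Generic of": 13, "Specific of": 12, "About": 0}
print(class_shares(image7_all))
```

The first call reproduces the 75% / 25% split reported for P1 on Image 1, and the second the 52% / 48% split reported for Image 7 overall.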


APPENDIX J Consent forms

CONSENT FORM [signed by P1, P2, P3, P4 and P5]

Title of Study: 'A Case Study of Digital Information Resources belonging to a charity.'

Please initial box

1. I agree to take part in the above City University London research project. I have had the project explained to me, and I have read the participant information sheet, which I may keep for my records.

I understand this will involve:
• being interviewed by the researcher
• analysis of my use of the digital asset management system using computer logs

I understand it might also involve:
• participating in a further interview, focus group or questionnaire asking me about my use of digital resources and/or the digital asset management system
• using a computer to test the usability of a system
• letting the researcher analyse the metadata I have applied to files [this was included later and signed only by P3]

2. This information will be held and processed for the following purpose(s):
• completion of the case study

I understand that any personal data I provide (e.g. data collected from interviews, questionnaires etc.) will be anonymised to prevent my identity from being made public.
AND
I understand that I will be given a transcript of data concerning me for my approval before it is included in the write-up of the research.

3. I understand that my participation is voluntary, that I can choose not to participate in part or all of the project, and that I can withdraw at any stage of the project without being penalized or disadvantaged in any way.

4. I agree to City University London recording and processing this information about me. I understand that this information will be used only for the purpose(s) set out in this statement and my consent is conditional on the University complying with its duties and obligations under the Data Protection Act 1998.

5. I agree to take part in the above study.

____________________ ____________________________ _____________
Name of Participant  Signature  Date

____________________ ____________________________ _____________
Name of Researcher  Signature  Date


CONSENT FORM [signed by P7]

Title of Study: A Case Study of Digital Information Resources belonging to a Charity.

Please initial box

1. I agree to take part in the above City University London research project. I have had the project explained to me, and I have read the participant information sheet, which I may keep for my records.

I understand this will involve:
• being interviewed by the researcher or filling in a questionnaire
• allowing the researcher to analyse metadata that I have applied to photos

2. This information will be held and processed for the following purpose(s):
• completion of the case study

I understand that any personal data I provide (e.g. data collected from interviews, questionnaires etc.) will be anonymised to prevent my identity from being made public.
AND
I understand that I will be given a transcript of data concerning me for my approval before it is included in the write-up of the research.

3. I understand that my participation is voluntary, that I can choose not to participate in part or all of the project, and that I can withdraw at any stage of the project without being penalized or disadvantaged in any way.

4. I agree to City University London recording and processing this information about me. I understand that this information will be used only for the purpose(s) set out in this statement and my consent is conditional on the University complying with its duties and obligations under the Data Protection Act 1998.

5. I agree to take part in the above study.

____________________ ____________________________ _____________
Name of Participant  Signature  Date

____________________ ____________________________ _____________
Name of Researcher  Signature  Date


APPENDIX K

Proposal for INM363 LIS Dissertation Project

Working title

'A Case Study of Digital Information Resources belonging to a charity.'

The case study will look at digital information resources belonging to a charity. It will be limited to the digital objects stored in the charity's digital asset management system (DAMS), Third Light.

Introduction

This project examines the digital information resources of a charity. The aim is to gain a greater understanding of how the resources are stored, organised, managed and used. A potential outcome of this is to see whether a metadata scheme that I created for them using Third Light software has helped them organise their information resources.

The charity provides free practical, emotional and social support for people with cancer and their families and friends. The charity's centres are built in the grounds of NHS hospitals and offer support including clinical psychology, nutrition advice, benefits advice and exercise.

The charity uses Third Light Intelligent Media Server (IMS) to store and organise their digital assets. The software provides storage space (up to 125 GB in the case of the charity) and allows you to organise your files using folders and/or metadata. The search function includes an advanced search option and the ability to refine your results using metadata. The software aims to move users away from organising assets via a folder structure to organising them using metadata.6 As such, it provides sophisticated functionality for adding metadata to files. Specifically, it allows users to create controlled vocabularies for both indexing and searching for documents. It also allows users to specify compulsory metadata fields and to require metadata to be approved by an admin user before the file can be uploaded. 'Smart Folder' functionality can be used to automatically compile collections of resources based on their metadata. And 'Lightbox' functionality allows users to organise files on 'clipboards' without duplicating them or moving them from their original location.

The charity started using Third Light in February 2012. Prior to this, their digital assets were stored on CDs. There are currently (as of 15/05/15) 7705 files on the system. These assets consist mainly of photos of centres and fundraising events. There are also architectural design files, graphic design files, videos and text. The Events, Marketing, Digital and Communication teams in Head Office and at centres are the main users of the IMS. They use it for storing, organising, managing and finding resources for marketing material and other types of company literature.

Before I started the project, although staff added metadata when uploading files, this was not compulsory. Controlled vocabularies weren’t being used. Some resources contained lots of metadata, others contained very little. There was also confusion as to the purpose of the 'Keywords' field, with some people filling in subject information in the 'Caption' field instead and/or writing in full sentences in the 'Keywords' field when single words or phrases were called for. The folder structure was also confusing, especially the Events folder.

The internship, which I applied for, required someone to improve the following aspects of their use of the IMS:
- quality of the metadata
- ease of filling in the metadata forms
- clarity of the folder structure
- discoverability of resources

I worked part-time on the project for six months. I created a new set of metadata fields and, in some cases, controlled vocabularies for tagging and searching for resources. I made some metadata fields compulsory to fill in and reapplied metadata to the existing photos. I also improved the folder structure. I then trained staff in the new metadata system and how to make full use of the search capabilities of the IMS.

6 As they write in their ‘Help’ document: 'Metadata also provides a much more dynamic and powerful way to index content than simple folder structures, better suited to modern information-led businesses with rapidly expanding collections of content to index.' (Third Light, 2015)

Aims and objectives

My aim in this case study is to gain a greater understanding of the charity's digital resources and how they are stored, managed and used using Third Light digital asset management software. In so doing I hope to be able to evaluate the system and provide insights into ways of improving it in practice, although this is not the ultimate goal of the case study. My objectives will be shaped and informed by my review of the professional and academic literature. They will also be determined by what issues are identified during the case study as particularly significant. Based on my prior knowledge of the case, I predict that the following questions will be prevalent:

Since the introduction of the new metadata scheme and folder structure,

• is metadata consistently, accurately and fully applied?
• does metadata provide the necessary information for staff?
• does metadata help users find the resources they need?
• is the folder structure clearer?

What information behaviour do staff display in relation to digital information resources? How do staff interact with the DAMS and what are their attitudes towards it?

Scope and definition

The scope of this dissertation is limited to the use of the system by myself and members of staff at the charity's head office. It will be limited to current use of the system, although I might compare the current system to the old system or reflect on the future of the system. I will look at all digital assets stored in the DAMS.

Research context/literature review

There have not been many case studies of digital resources in corporate environments. Kho (2007) studies the use of Getty Images by the news media. Stokes and Seers (2005) evaluate the use of a DAM by an advertising agency. And Broomfield (2009) and McGovern (2013) report on projects to set up DAMs in cultural institutions. Whilst these studies can help with assessing the need for or planning the implementation of a DAM, they are not grounded in reviews of the academic and practitioner literature. Thus they are less useful for gaining an understanding of how particular cases relate to research on organising digital information.

Research relevant to digital image collections in corporate environments is wide-ranging. Relevant topics can be categorised as follows:

• Metadata
• Information retrieval
• Information behaviour
• User experience
• Digital asset management systems

Monographs by Hider (2012) and Haynes (2004) cover the topic of metadata generally, including its purposes, creation and maintenance, quality and standards. Svenonius (2000), Lancaster (2003) and Broughton (2005) have written monographs specifically on classification. An article by Hjorland and Nissen Pedersen (2005) looks at the theory underlying classification and information retrieval and an article by Slavic (2011) includes an assessment of classification in the web environment. Lambe (2007) and Lippell (2015) consider the creation and use of corporate taxonomies. And Hider (2008) looks at how much technical services add value to collection management and access.

Specific to image collections, Hollink, Schreiber, Wielinga and Worring (2004) investigate what users look for in images. Similarly, Jorgensen, Jaimes, Benitez and Chang (2001) look at how to categorise concepts used to describe visual content. Shatford (1986) explores the theory behind analysing the subject of a picture. And Baca (2003) evaluates the practical issues in applying metadata schemas and controlled vocabularies to cultural heritage information. Roberts (2001) investigates art indexing in electronic databases. There has also been much research on user-generated keywords in digital image collections. For example, Angus, Thelwall and Stuart (2008) look at tagging practices on Flickr and Matusiak (2006) looks at user-generated keywords in digital image collections.

Greater insight into the systems and techniques of information retrieval (IR) in the digital age is given by Chu (2010). Her monograph explains the major components of an IR system and how to describe information for effective retrieval. She also explains retrieval techniques, relevance feedback and ranking algorithms. Al-Maskari and Sanderson (2010) provide more insight into user satisfaction in IR and Brodkin (2007) into the cost of ineffective search. Specific to image retrieval, Enser (2008) looks at the evolution of visual information retrieval and Jansen (2008) explores searching for digital images on the web.

Various theories and models have been developed to help understand information needs, seeking and use. For example, Belkin's Anomalous State of Knowledge theory (1982) can help understand information need. Kuhlthau's 'information search process' model (2005) focuses on the cognitive processes in information seeking. And Dervin's Sense-Making methodology suggests that information seeking can be about formulating subjective perceptions of the world as much as about locating external, 'objective' information (Dervin, 2005). Leckie's model of the information seeking of professionals (2005) is perhaps of particular interest to this study. Information use has been less subject to research but Savolainen (2009) identifies conceptualisations of information use as 'interpreting, relating and comparing qualities of things'.

User experience is an 'end user's interaction with and attitude towards a given IT system or services, including the interface, graphics and design' (TechTarget, 2015). A key text is Morville and Rosenfeld (2006) on information architecture for navigation and design. Russell-Rose and Tyler (2013) focus on information architecture for discovery. Their monograph focuses on types of user, information seeking, designing for different search modes and displaying and manipulating search results.

'Primers' on digital asset management by Keathley (2014) and Hedges (2015) provide information on the type of work that goes into setting up a DAM system and how implementing such a system can benefit an organisation. Topics include user research, creating and maintaining metadata, choosing a system, outreach and training and preservation. Various informal sources of information also exist, including the DAM Foundation's website (DAM Foundation, 2015).
Methodology

I will carry out a case study of the charity's digital assets in the DAM Third Light. This project seeks to gain greater understanding of the collection and, potentially, to generate ideas for ways in which it can be developed or improved. It can be described as an intrinsic case study as it looks at all phenomena relevant to the case as


opposed to one in particular (Pickard, 2013). As a case study, it is not trying to generalise from the particular or to test a hypothesis. However, one of the expected outcomes is greater understanding of the effectiveness of the system. And it is likely that the findings from the case study could be transferable to other similar cases.

Key informants for my case study will be members of the Marketing, Digital, Events and Communications teams at the charity, i.e. the main users of the digital resources. I will use purposive sampling to get the broadest picture of the information users in my case study. I will include myself in the sample as an expert user of the system. The design of my study might change as I gain an insight into the main issues.

I will start by doing individual interviews with key informants. The design and scope of these interviews will be based on the literature review and will aim to uncover the main issues relevant to my case study. As such they will be part of the 'orientation and overview' phase identified by Pickard (2013) as the initial phase of case study research.

I will also use audit logs to get a feel for how the system is being used during the orientation phase. These logs allow admin users of Third Light to see how users are interacting with resources. Activities that can be monitored using logs include the application and configuration of metadata and searching and downloading of resources. I will have informed participants about this observation, which might influence their behaviour. However, any distortion this gives to the 'true' picture of their use of the system is a necessary price to pay for being open with participants about research methods used.

The interviews I carry out during the orientation and overview phase will be semi-structured.
This will allow me to use the insights from my literature review and prior knowledge of the case to get information more likely to be significant but also leave room for participants to provide other information they think is significant. I will conduct the interviews in person at the charity if possible so that I can observe facial expressions and body language. Otherwise I will conduct interviews over the phone. I will take notes from the interviews and record these in a case database. Analysis of the interviews should make it clear what the salient issues in my case are. These will be investigated further in the next phase of my case study - the 'focused exploration' phase (Pickard, 2013).

The data collection techniques I use for this phase of research will depend on the type of phenomena I am investigating and the requirements of participants. A mixed methods approach will probably be used as different issues will require different data collection techniques. Focus groups and/or interviews will be used if the number of desired participants is small and the research is still exploratory. Questionnaires will be used if large numbers of responses are needed quickly and/or data is needed for quantitative analysis. The critical incident technique could be used to elicit in-depth examples. Log analysis will be used if I need to get an idea of the type of searches performed or to monitor the applying and/or configuring of metadata. And usability testing could be used to gain greater insight into participants' interaction with and attitudes towards the DAMS.

Throughout the 'focused exploration' phase I will allow the discovery of new themes and 'leads' to guide my research design. I will be open to revisiting and/or abandoning themes. I will not let prior assumptions limit my analysis.

The final phase of my case study will involve presenting my case study report (i.e. the data collected and my analysis of it) to participants so that they can check it.
This will allow participants to correct anything they feel is unclear or erroneous. It will also allow them to provide any further information that should occur to them when looking at the data afresh. I will finish my research when no new information is being gathered and/or time constraints or the availability of participants brings it to an end.

There are various problems that could occur with the methodology I have outlined. The limited number of participants (the sample is restricted to the charity's head office) might give a limited picture of the case. The


fact that I designed the system might lead to unintentional bias in my analysis. And the open-ended nature of a case study requires considerable flexibility on the part of participants.

Work plan

18 - 22 May: sending off participant information and consent forms to the charity
18 May - 3 July: literature review
6 - 10 July: design of interviews to identify main issues
13 - 24 July: interviews
27 July - 7 August: analysis of interviews and design of 'focused exploration' phase
10 August - 3 September: 'focused exploration', iterative analysis
7 September - 2 October: data analysis and conclusions
5 - 16 October: member checking
19 October - 4 December: writing
7 December - 4 January: final checking and proofreading
8 January: submission

Resources

Travel costs: return trips to the charity's head office during the period of the dissertation using public transport within London

Ethics

Research Ethics Checklist
School of Informatics BSc MSc/MA Projects

If the answer to any of the following questions (1-3) is NO, your project needs to be modified.

1 Does your project pose only minimal and predictable risk to you (the student)? Yes

2 Does your project pose only minimal and predictable risk to other people affected by or participating in the project? Yes

3 Is your project supervised by a member of academic staff of the School of Informatics or another individual approved by the module leaders? Yes

If the answer to either of the following questions (4-5) is YES, you MUST apply to the University Research Ethics Committee for approval.

4 Does your project involve animals? No

5 Does your project involve pregnant women or women in labour? No

If the answer to the following question (6) is YES, you MUST complete the remainder of this form (7-19). If the answer is NO, you are finished.

6 Does your project involve human participants? For example, as interviewees, respondents to a questionnaire or participants in evaluation or testing? Yes

If the answer to any of the following questions (7-13) is YES, you MUST apply to the Informatics Research Ethics Panel for approval and your application may be referred to the University Research Ethics Committee.

7 Could your project uncover illegal activities? No

8 Could your project cause stress or anxiety in the participants? No

9 Will you be asking questions of a sensitive nature? No

10 Does your project rely on covert observation of the participants? No

11 Does your project involve participants who are under the age of 18? No

12 Does your project involve adults who are vulnerable because of their social, psychological or medical circumstances (vulnerable adults)? No

13 Does your project involve participants who have learning difficulties? No

The following questions (14-16) must be answered YES, i.e. you MUST COMMIT to satisfy these conditions and have an appropriate plan to ensure they are satisfied.


14 Will you ensure that participants taking part in your project are fully informed about the purpose of the research?

Yes

15 Will you ensure that participants taking part in your project are fully informed about the procedures affecting them or affecting any information collected about them, including information about how the data will be used, to whom it will be disclosed, and how long it will be kept?

Yes

16 When people agree to participate in your project, will it be made clear to them that they may withdraw (i.e. not participate) at any time without any penalty?

Yes

The following questions (17-19) must be answered and the requested information provided.

17 Will consent be obtained from the participants in your project?

Consent from participants will be necessary if you plan to gather personal, medical or other sensitive data about them. 'Personal data' means data relating to an identifiable living person, e.g. data you collect using questionnaires, observations, interviews, computer logs. The person might be identifiable if you record their name, username, student ID, DNA, fingerprint, etc. If YES, provide the consent request form that you will use and indicate who will obtain the consent, how are you intending to arrange for a copy of the signed consent form for the participants, when will they receive it and how long the participants will have between receiving information about the study and giving consent, and when the filled consent request forms will be available for inspection (NOTE: subsequent failure to provide the filled consent request forms will automatically result in withdrawal of any earlier ethical approval of your project).

Yes

18 Have you made arrangements to ensure that material and/or private information obtained from or about the participating individuals will remain confidential? Provide details: I will only store data on password-protected devices. I will anonymise participants by using random codes instead of their names to identify them.

Yes

19 Will the research be conducted in the participant's home or other non-University location? If YES, provide details of how your safety will be preserved: I will conduct research at the charity's offices in London. I will stay safe by adhering to the health and safety regulations of the office.

Yes

I will obtain consent from participants in the project using the following form:

City University London

CONSENT FORM

Title of Study: 'A Case Study of Digital Information Resources belonging to a charity.'

Please initial box


1. I agree to take part in the above City University London research project. I have had the project explained to me, and I have read the participant information sheet, which I may keep for my records. I understand this will involve:

• being interviewed by the researcher
• analysis of my use of the digital asset management system using computer logs

I understand it might also involve:

• completing a questionnaire asking me about my use of digital resources and/or the digital asset management system
• making myself available for a further interview or focus group
• using a computer to test the usability of a system

2. This information will be held and processed for the following purpose(s):

• completion of the case study

I understand that any personal data I provide (e.g. data collected from interviews, questionnaires etc.) will be anonymised to prevent my identity from being made public.

AND

I understand that I will be given a transcript of data concerning me for my approval before it is included in the write-up of the research.

3. I understand that my participation is voluntary, that I can choose not to participate in part or all of the project, and that I can withdraw at any stage of the project without being penalized or disadvantaged in any way.

4. I agree to City University London recording and processing this information about me. I understand that this information will be used only for the purpose(s) set out in this statement and my consent is conditional on the University complying with its duties and obligations under the Data Protection Act 1998.

5. I agree to take part in the above study.

____________________    ____________________________    _____________
Name of Participant     Signature                       Date

____________________    ____________________________    _____________
Name of Researcher      Signature                       Date

When completed, 1 copy for participant; 1 copy for researcher file.


I will send the form to participants on 18 May and give them two weeks to return it to me by email. I will sign the forms and send copies back to all the participants by post. I will ask for permission before publishing any screenshots of copyright material in my dissertation.

Confidentiality

My case study will anonymise both the charity and any participants in my research. However, all other details of my project will be made public. To keep the identities of the charity and participants confidential I will use codes instead of names to identify them. I will also blur any images that identify the charity if they are included in screenshots in my dissertation. Despite these measures, participants' identities will probably be obvious to other members of staff. Similarly, disguising the identity of the charity will probably be difficult as its activities are quite distinctive. Nevertheless, anonymity will prevent unequivocal identification by most readers of the dissertation.

References

Al-Maskari, A. & Sanderson, M. 2010. A review of factors influencing user satisfaction in information retrieval. Journal of the American Society for Information Science and Technology, vol. 61, no. 5, pp. 859.

Angus, E., Thelwall, M. & Stuart, D. 2008. General patterns of tag usage among university groups in Flickr. Online Information Review, vol. 32, no. 1, pp. 89-101.

Baca, M. 2003. Practical Issues in Applying Metadata Schemas and Controlled Vocabularies to Cultural Heritage Information. Cataloging & Classification Quarterly, vol. 36, no. 3, pp. 47-55.

Belkin, N.J., Oddy, R.N. and Brooks, H.M. 1982. ASK for information retrieval: Part 1. Background and theory. Journal of Documentation, 38(2), 61-71.

Brodkin, J. 2007. You are wasting time. Find out why; The cost of ineffective search. Network World, Inc.

Broughton, V. 2005. Essential classification. London, Facet.

Broomfield, J. 2009. Digital asset management case study - Museum Victoria. Journal of Digital Asset Management, vol. 5, no. 3, pp. 116-125.

Chu, H. 2010. Information representation and retrieval in the digital age. Medford, N.J., Information Today.

Digital Asset Management Foundation, 2015. Digital Asset Management [online]. DAM Foundation. Available: http://damfoundation.org (accessed 15/5/15)

Dervin, B. 2005. What methodology does to theory: Sense-Making methodology as exemplar, in Fisher, K. E., Erdelez, S. and McKechnie, L. E. F. (eds) 2005, Theories of information behavior, Medford NJ, Information Today, pp. 25-30.

Enser, P. 2008. The evolution of visual information retrieval. Journal of Information Science, vol. 34, no. 4, pp. 531-546.

Haynes, D. 2004. Metadata for Information Management and Retrieval. London, Facet.

Hedges, M. 2015. Digital Asset Management in Theory and Practice. London, Facet.

Hider, P. 2008. How Much are Technical Services Worth?: Using the Contingent Valuation Method to Estimate the Added Value of Collection Management and Access. Library Resources & Technical Services, vol. 52, no. 4, pp. 254.

Hider, P. 2012. Information Resource Description: Creating and Managing Metadata. London, Facet.


Hjørland, B. & Nissen Pedersen, K. 2005. A substantive theory of classification for information retrieval. Journal of Documentation, vol. 61, no. 5, pp. 582-597.

Hollink, L., Schreiber, A.T., Wielinga, B.J. & Worring, M. 2004. Classification of user image descriptions. International Journal of Human - Computer Studies, vol. 61, no. 5, pp. 601-626.

Jansen, B.J. 2008. Searching for digital images on the web. Journal of Documentation, vol. 64, no. 1, pp. 81-101.

Jorgensen, C., Jaimes, A., Benitez, A.B. & Chang, S. 2001. A conceptual framework and empirical research for classifying visual descriptors. Journal of the American Society for Information Science and Technology, vol. 52, no. 11, pp. 938-947.

Keathley, E.F. 2014. Digital Asset Management [ebook]. Berkeley, CA, Apress.

Kho, N.D. 2007. Case study: A Case of Simplifying Digital Asset Management. Streaming Media Magazine, 78. Medford, Information Today.

Kuhlthau, C. C., 2005. Kuhlthau’s information search process, in Fisher, K. E., Erdelez, S. and McKechnie, L. E. F. (eds) 2005, Theories of information behavior, Medford NJ, Information Today, pp. 230-34.

Lambe, P. 2007. Organising knowledge: taxonomies, knowledge and organisational effectiveness. Oxford, Chandos.

Lancaster, F.W. 2003. Indexing and Abstracting in Theory and Practice, 3rd edn. London, Facet.

Leckie, G. J., 2005. General Model of the Information Seeking of Professionals, in Fisher, K. E., Erdelez, S. and McKechnie, L. E. F. (eds) 2005, Theories of information behavior, Medford NJ, Information Today, pp. 158-64.

Lippell, H. 2015. Building a Corporate Taxonomy, in Schopflin, K. (ed.) 2015, A Handbook for Corporate Information Professionals. London, Facet, pp. 57-76.

Matusiak, K.K. 2006. Towards user-centered indexing in digital image collections. OCLC Systems & Services: International digital library perspectives, vol. 22, no. 4, pp. 283-298.

McGovern, M. 2013. Digital Asset Management: Where to Start. Curator: The Museum Journal, vol. 56, no. 2, pp. 237-254.

Morville, P. & Rosenfeld, L. 2006. Information architecture for the World Wide Web [ebook]. Farnham, O'Reilly.

Pickard, A.J. 2013. Research Methods in Information. London, Facet.

Roberts, H.E. 2001. A picture is worth a thousand words: Art indexing in electronic databases. Journal of the American Society for Information Science and Technology, vol. 52, no. 11, pp. 911-916.

Russell-Rose, T. & Tate, T. 2013. Designing the search experience: the information architecture of discovery. Amsterdam, Morgan Kaufmann.

Savolainen, R., 2009. Information use and information processing: comparison of conceptualisations. Journal of Documentation, 65(2), pp. 187-207.

Shatford, S. 1986. Analysing the subject of a picture: a theoretical approach. Cataloguing and Classification Quarterly, 6(3), pp. 39-62.

Slavic, A. 2011. Classification revisited: a web of knowledge, in Foster, A. and Rafferty, P. (eds.), 2011, Innovations in information retrieval. London, Facet, pp. 23-48.

Stokes, D. & Seers, I. 2005. Case study: How digital asset management supports account management, creative services and workflows at Ogilvy. Journal of Digital Asset Management, vol. 1, no. 5, pp. 328-334.

Svenonius, E. 2000. The Intellectual Foundation of Information Organization. Cambridge, MA, MIT Press.


TechTarget, 2015. UX (User Experience) Definition [online]. TechTarget. Available: http://searchcio.techtarget.com/definition/UX-user-experience (accessed 15/5/15)

Third Light, 2015. Metadata [online]. Third Light Ltd. Available: https://www.thirdlight.com/docs/display/ims6/Metadata (accessed 15/5/15)

Resources

Bawden, D. 2015. Research methods: Surveys. INM356 Research, Evaluation and Communication Skills. City University London.

Hart, C. 2005. Doing your Masters Dissertation. London, Sage Publications.


APPENDIX L Reflection

The dissertation project has been a challenging but rewarding part of my Masters course. I have learnt a lot about working with large digital image collections. I have also developed my knowledge of Third Light digital asset management software. And I have been able to assess the value of the metadata schema and controlled vocabularies that I developed for the charity during my internship and come up with further recommendations for how the charity can organise, manage and use their digital image collection. I have also learnt how to plan and organise a large-scale research project and gather and analyse qualitative data.

A few changes were made to the project outlined in my proposal. Firstly, the scope of the case study widened to include all staff, not just staff at the head office. Secondly, my work plan changed - the literature review and data analysis phases took longer than I had originally planned. Thirdly, in the 'focused exploration' phase, I did research (metadata analysis and an image-tagging exercise) that I had not envisaged doing in my proposal. Nevertheless, writing a detailed project proposal was very helpful for planning and organising my research. The recommendations given by Pickard (2007, 85-94) for how to carry out case study research were particularly useful.

The literature review was time-consuming because I had to cover a broad range of topics. There were several areas that I wished to research in the 'orientation and overview' phase of my case study and so I covered all these areas in my literature review. Whilst this gave me a good grounding in the theoretical and practical issues underpinning these topics, it did require a lot of research into areas that in some cases did not turn out to be that significant for my research.
With hindsight, I wondered whether I should have started the 'orientation and overview' phase research first and then done the literature review only on the salient issues that I wanted to investigate further. This approach is often taken in grounded theory. However, it would have made it harder to design the 'orientation and overview' phase research. And it would also have prevented me from relating the results of the 'orientation and overview' phase research to theories from the literature. So I think I was probably right to do the literature review first.

Whilst digital asset management effectively incorporates existing areas of library and information science research, I found it strange that one DAM journal ('Journal of Digital Asset Management') ceased publication in December 2010. Nevertheless I was able to find research in this area in other publications.

With hindsight I would also have done more to encourage participants in the case study to influence the design of the case study. As Pickard (2007, 93) observes, in case studies, 'the role of the researcher is very much that of being a "research instrument", interacting with the research community and allowing that community some degree of "ownership" of the research.' For the initial interviews, I should have told participants that the purpose of the interviews was to get an idea of the salient issues of the case, which would have encouraged them to elaborate more on their answers and on what they felt were important or interesting issues about the collection. I also think I could have asked more open questions during the initial interviews.

I found the log analysis challenging to design and carry out as I had never read about this kind of research. Nevertheless, I found that the data could be analysed effectively using Excel.
As I got used to Advanced Search, I found I could search for very specific sets of resources, which helped a lot in retrieving files for interpreting the results of the audit log and, later, for the metadata analysis.

The group interview, which I did for the 'focused exploration' phase, made me realise the importance of rehearsing questions beforehand, as I felt that I wasted time by not being more articulate. I also found I was nervous about discussing the management of the collection as I did not want to come across as prying or judgmental. However, with hindsight I should not have worried about this as participants were aware that I was conducting research and did not seem uncomfortable or embarrassed.