Folksonomies as Subject Access-- A Survey of Tagging in Library Online Catalogs and Discovery Layers



Yan Yi Lee & Sharon Q. Yang

Abstract

This paper (see Note 1) discusses a survey of how system vendors and libraries handle tagging in

OPACs and discovery layers. Tags, otherwise called folksonomies, are user-added subject metadata.

The survey also studied user behavior when users were given the choice to tag. The findings indicate

that most legacy/classic systems have no tagging capability. About 47% of the discovery tools

provide a tagging function. About 49% of the libraries that have a system with tagging capability

have turned the tagging function on in their OPACs and discovery tools. However, only 40% of

the libraries that turned tagging on actually utilized user-added subject metadata as access points

to items in the OPAC. Academic library users are less active in tagging than public library users.

1. Introduction

Folksonomy is “a term created by Thomas Vander Wal by combining taxonomy with folk”

(Steele, 2009). Simply put, folksonomy is a classification of resources created by the general

public. Users add keywords, called tags, to describe a resource on the Web. The action of adding

tags is called tagging. Tag cloud refers to the display of the accumulated tags as a way to access

resources. Gene Smith describes a tag cloud as "a method of presenting tags where the more

frequently used tags are emphasized, usually in size or color. Tag clouds tell you at a glance

which tags are more popular. Each tag is a link" (Smith, 2008). In other words, a tag cloud is a

visual subject classification scheme in which the font size and color of each term indicate how

frequently it has been used. See Figure 1 for an example of a tag cloud in the test OPAC of Wagner College

Library.

Figure 1. Tag Cloud in test OPAC of Wagner College Library
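
As a rough illustration of how a tag cloud can emphasize frequently used tags, the following Python sketch maps tag counts to font sizes. The linear scaling rule, the function name tag_font_sizes, the point-size range, and the sample tags are assumptions made for illustration; they are not taken from any of the systems surveyed.

```python
from collections import Counter


def tag_font_sizes(tags, min_pt=10, max_pt=28):
    """Map each distinct tag to a font size that grows with its frequency."""
    counts = Counter(tags)
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    sizes = {}
    for tag, n in counts.items():
        if hi == lo:
            # All tags are equally frequent: use the smallest size for all.
            sizes[tag] = min_pt
        else:
            # Linear interpolation between min_pt and max_pt (assumed rule).
            sizes[tag] = round(min_pt + (n - lo) * (max_pt - min_pt) / (hi - lo))
    return sizes


# Hypothetical tags added by users to a small catalog.
sample = ["nature", "nature", "nature", "religion", "web 2.0", "religion"]
sizes = tag_font_sizes(sample)
```

In this sketch the most frequently used tag receives the largest font and the least used tag the smallest; real systems may use logarithmic scaling or color instead.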

Both librarians and computer scientists became interested in tagging and tag clouds from their

inception. In the last five years, there were many studies comparing user-created tags with

controlled vocabularies, especially the Library of Congress Subject Headings (LCSH). In 2007,

Tiffany Smith compared the LC subject headings in five books to tags for the same books found

in LibraryThing. Even though she had set out to measure the efficacy of tagging as subject

access, she did not reach any concrete conclusions. Since then, there have been many large scale

studies to compare LCSH with tags in LibraryThing. Those include the studies by Heymann and

Garcia-Molina (2009), Lawson (2009), Peterson (2009), Rolla (2009), Wetterstrom (2008),

Thomas et al. (2009), and Lu, Park, and Hu (2010), to name a few. The methodology of most research in

this area involved extracting titles, ISBNs, and LC Subject Headings from the MARC 650 fields of

OCLC or LC bibliographic records and searching for the same books in LibraryThing by ISBN. The

LCSH and tags for the same books were compared for duplication, quality, coverage, and

effectiveness.
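
The comparison step of this methodology can be sketched in Python as follows. This is only an illustration under stated assumptions: terms are matched exactly after lowercasing and whitespace cleanup, the function names normalize and compare_terms are hypothetical, and the sample headings and tags are invented rather than drawn from any of the cited studies. No MARC parsing or LibraryThing lookup is shown.

```python
def normalize(term):
    """Lowercase a term and collapse whitespace so variants compare equal."""
    return " ".join(term.lower().split())


def compare_terms(lcsh_terms, user_tags):
    """Compare the subject headings and the user tags assigned to one book."""
    lcsh = {normalize(t) for t in lcsh_terms}
    tags = {normalize(t) for t in user_tags}
    shared = lcsh & tags
    # Share of distinct tags that duplicate a subject heading.
    overlap = len(shared) / len(tags) if tags else 0.0
    return {
        "shared": shared,
        "lcsh_only": lcsh - tags,
        "tags_only": tags - lcsh,
        "tag_overlap": overlap,
    }


# Invented sample data for a single book (matched by ISBN in a real study).
lcsh_terms = ["Cooking", "Vegetarian cooking"]
user_tags = ["cooking", "vegetarian", "recipes", "Cooking "]
result = compare_terms(lcsh_terms, user_tags)
```

The "tags_only" set corresponds to the additional subject access that tags may offer beyond the controlled vocabulary; the published studies used more refined matching than the exact comparison assumed here.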

Thus far, all the research, either by librarians or computer scientists, has been positive about

tagging. Even though there sometimes may be up to 60% overlap, the findings indicate that

folksonomies often use different terms from LCSH and can provide additional subject access to

library collections (Kwan and Chan, 2009). User generated tags may cover more aspects of a

book’s subject (Rolla, 2009). Linking folksonomies to LCSH has been deemed by Kwan and

Chan (2009) as a helpful aid. Even the Library of Congress Working Group on the Future of

Bibliographic Control “has suggested that libraries should open up their catalogs to allow users

to add descriptive tags to the bibliographic data in catalog records” (Rolla, 2009).

It has been almost a year since the last study which yielded positive findings about tagging as

a viable subject access point to complement LCSH and other controlled vocabularies in online

catalogs. What actions have vendors taken to incorporate tagging into library systems? What

have libraries done to add folksonomies in addition to LCSH in bibliographic records? How do

users respond to tagging capability in library catalogs and discovery tools? This paper aims to

answer those questions by conducting a survey of library systems, libraries, and tagging

activities by users.

2. Library Systems and Folksonomies

Have system vendors taken folksonomies into consideration when designing catalogs and

discovery tools? In order to find out how the current library systems handle folksonomies, the

authors used Marshall Breeding’s Technology Guide (Library Technology Guides-Discovery

Layer Interfaces, 2012) to obtain a comprehensive list of major library systems with an OPAC,

including 37 Integrated Library Systems (ILS) and 15 discovery tools (also referred to as next

generation catalogs). An extensive study of all the 37 major ILS systems revealed that only two

ILS OPACs, Koha and Genesis G3, allow users to add tags and only Koha uses tags to enhance

subject access. See Table 1 for a list of the ILS.

Table 1. ILS OPACs

No.    Library Automation System  Allow Users to Add Tags  Tag Cloud  Tag List  Tag to Start a New Search  Tag to Refine a Search
1      Agent VERSO                -                        -          -         -                          -
2      Aleph 500                  -                        -          -         -                          -
3      Alexandria                 -                        -          -         -                          -
4      Amlib                      -                        -          -         -                          -
5      Apollo                     -                        -          -         -                          -
6      Athena                     -                        -          -         -                          -
7      Atriuum                    -                        -          -         -                          -
8      Carl.X                     -                        -          -         -                          -
9      Circulation Plus           -                        -          -         -                          -
10     Concourse                  -                        -          -         -                          -
11     DB/TextWorks               -                        -          -         -                          -
12     Destiny                    -                        -          -         -                          -
13     Dynix                      -                        -          -         -                          -
14     EOS Web                    -                        -          -         -                          -
15     Evergreen                  -                        -          -         -                          -
16     Evolve                     -                        -          -         -                          -
17     Genesis G3                 √                        -          -         -                          -
18     GLAS                       -                        -          -         -                          -
19     Horizon                    -                        -          -         -                          -
20     InfoCentre                 -                        -          -         -                          -
21     Innopac                    -                        -          -         -                          -
22     Koha                       √                        √          -         √                          -
23     Liberty3                   -                        -          -         -                          -
24     Library Solution           -                        -          -         -                          -
25     LibraryWorld               -                        -          -         -                          -
26     Mandarin M3                -                        -          -         -                          -
27     Millennium                 -                        -          -         -                          -
28     OPALS                      -                        -          -         -                          -
29     Polaris                    -                        -          -         -                          -
30     Portfolio                  -                        -          -         -                          -
31     ResourceMate               -                        -          -         -                          -
32     Spydus                     -                        -          -         -                          -
33     Unicorn (Symphony)         -                        -          -         -                          -
34     Virtua                     -                        -          -         -                          -
35     Voyager                    -                        -          -         -                          -
36     Vubis Smart                -                        -          -         -                          -
37     Winnebago Spectrum         -                        -          -         -                          -
Total                             5.41%                    2.70%      0.00%     2.70%                      0.00%

Only about 5% of the major ILS OPACs (2 of 37) allow tagging. It comes as no surprise that most

legacy or classic ILSs do not embed folksonomies, as they were developed in the 1990s when

folksonomy was not yet popular. User-contributed tags and tag lists or clouds more often exist in

the newer ILS such as Koha, which was created in 1999 (History-Official Website of Koha

Library Software, 2012).

However, within the last five years, discovery tools have come into play. A discovery tool is a

stand-alone catalog with the advanced features of a next generation catalog (NGC) that is

developed independently from any ILS. Libraries can use a discovery tool to replace their OPAC or

use it side by side with the OPAC. The following is a list of 15 major discovery tools that have

been deployed worldwide. For each discovery layer, the authors randomly chose ten implementations

from the user lists compiled by Marshall Breeding (Library Technology Guides-Discovery Layer

Interfaces, 2012) and examined the presence or absence of folksonomies and how they were being

used as subject access in the chosen implementations. Observing a system in action sheds

light on its design. The documentation for a system was consulted for clarification when necessary.

The findings are summarized in Table 2.

Table 2. Folksonomies within Discovery Tools

No.    System                     Allow Users to Add Tags  Tag Cloud  Tag List  Tag to Start a New Search  Tag to Refine a Search
1      AquaBrowser                √                        √          √         √                          √
2      AXIELL ARENA               -                        -          -         -                          -
3      Blacklight                 -                        -          -         -                          -
4      Biblio Commons             √                        -          √         √                          √
5      EBSCO Discover Service     -                        -          -         -                          -
6      Encore                     √                        √          √         √                          √
7      Endeca                     -                        -          -         -                          -
8      Enterprise                 -                        -          -         -                          -
9      Primo                      √                        √          √         √                          -
10     Scriblio                   -                        -          -         -                          -
11     Summon                     -                        -          -         -                          -
12     SOPAC                      √                        √          -         √                          -
13     Visualizer                 -                        -          -         -                          -
14     VuFind                     √                        -          √         √                          -
15     WorldCat Local             √                        √          √         √                          -
Total                             47%                      33%        40%       47%                        20%

Only 7 out of 15 discovery tools (about 47%) provide a tagging function. Only one third of the 15

discovery tools (5) can display tags as a cloud, in which font size and other visual cues

reflect frequency of use and popularity. About 40% can display a tag list similar to the

clickable Library of Congress subject headings. Moreover, about 47% of the discovery tools use

tags to execute a new search, while only 20% use tags to refine or narrow a search. A system that

allows users to add tags does not necessarily provide a tag cloud or list as subject access.
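
The totals quoted above can be recomputed from the individual rows of Table 2. The Python sketch below transcribes the table as 0/1 flags (1 meaning the feature was observed) and derives each percentage; the names FEATURES, TABLE_2, and feature_shares are illustrative only.

```python
FEATURES = ["add_tags", "tag_cloud", "tag_list", "new_search", "refine_search"]

# One row per discovery tool, transcribed from Table 2 (1 = feature present).
TABLE_2 = {
    "AquaBrowser": (1, 1, 1, 1, 1),
    "AXIELL ARENA": (0, 0, 0, 0, 0),
    "Blacklight": (0, 0, 0, 0, 0),
    "Biblio Commons": (1, 0, 1, 1, 1),
    "EBSCO Discover Service": (0, 0, 0, 0, 0),
    "Encore": (1, 1, 1, 1, 1),
    "Endeca": (0, 0, 0, 0, 0),
    "Enterprise": (0, 0, 0, 0, 0),
    "Primo": (1, 1, 1, 1, 0),
    "Scriblio": (0, 0, 0, 0, 0),
    "Summon": (0, 0, 0, 0, 0),
    "SOPAC": (1, 1, 0, 1, 0),
    "Visualizer": (0, 0, 0, 0, 0),
    "VuFind": (1, 0, 1, 1, 0),
    "WorldCat Local": (1, 1, 1, 1, 0),
}


def feature_shares(table):
    """Return, per feature, (number of tools offering it, rounded percentage)."""
    total = len(table)
    shares = {}
    for i, name in enumerate(FEATURES):
        count = sum(row[i] for row in table.values())
        shares[name] = (count, round(100 * count / total))
    return shares


shares = feature_shares(TABLE_2)
```

Every percentage is taken over all 15 tools, which is why the share of tools with a tag cloud (5 of 15) is lower than the share that accept tags at all (7 of 15).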

Among discovery tools studied, Encore, Biblio Commons, and AquaBrowser have the most

tagging features. Encore can display either a tag cloud or a tag list to refine or narrow a

search. Its tag cloud or list is a mixture of user-added tags and keywords from bibliographic

records. Such keywords are not tags, and presenting them as tags is almost an act of cheating, but

doing so ensures that the tag cloud is always populated with an abundance of "tags". User-added

tags are also displayed separately under “Community Tags” in a record view; selecting one

executes a new search that retrieves all the items carrying that tag. AquaBrowser displays “a word cloud”

for variant spellings, related words/associations, and synonyms; this cloud is system generated

rather than based on user-added tags. In AquaBrowser, the real tag cloud contributed by a user

remains private within a user’s login account. Biblio Commons displays tags at the start page for

searching and later as a facet to refine or narrow a search. All three of these discovery tools excel

at utilizing folksonomies as additional subject access to collections.

3. Libraries and Folksonomies

When an ILS OPAC or discovery tool had the tagging capability, did libraries take advantage of

this function? The authors chose the Koha OPAC as an example and did a survey of tagging

activities in the OPACs of 307 Koha implementers.

Koha is an open source ILS that is widely used in libraries all over the world. Tagging is one

of the important features in its system design. After adding tags, users can choose to keep these

tags private and hidden in their account, or publish them in the OPAC as a “Cloud”.

Subsequently, librarians can decide to turn a “Tag Cloud” on or off. Additionally, librarians can

also decide whether the tags created by users can be published in the OPAC directly or must be

approved by librarians before publishing. An external dictionary can be installed in the Koha

system which serves as a “whitelist” of pre-allowed tags and helps librarians to verify terms

added by users.
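
The moderation options described above can be pictured with a small decision function. The Python sketch below is hypothetical: the function name moderate_tag, its parameters, and the way the whitelist and the approval setting interact are assumptions for illustration, not Koha's actual code or configuration names.

```python
def moderate_tag(tag, whitelist, require_approval=True):
    """Decide what happens to a newly submitted tag.

    Returns "published" or "pending".
    """
    normalized = tag.strip().lower()
    if normalized in whitelist:
        # Terms on the pre-allowed list can go straight to the OPAC.
        return "published"
    if require_approval:
        # Unknown terms wait for a librarian's decision.
        return "pending"
    # Moderation is switched off: user tags are published directly.
    return "published"


# Hypothetical whitelist built from an external dictionary.
whitelist = {"nature", "religion"}
status_known = moderate_tag(" Nature ", whitelist)
status_new = moderate_tag("toddler time", whitelist)
status_open = moderate_tag("toddler time", whitelist, require_approval=False)
```

Under these assumptions a whitelisted term is published immediately, while an unfamiliar tag is held for review only when librarians have chosen to require approval.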

The 307 libraries surveyed, including 218 public, 62 academic, and 27 school libraries, were taken from

Library Technology Guides (Library Technology Guides-Discovery Layer Interfaces, 2012).

This is the most comprehensive and complete list of Koha users published so far. The

first step in the survey was to check 307 OPACs to determine how many have enabled tagging.

Table 3 is a breakdown of the 307 Koha users by library type based on presence or absence of

tagging function in the OPAC. Figure 2 displays the same statistics by bar chart.

Tags are enabled in 107 public libraries, almost half of the total 218. At 58%, the percentage

with tags enabled is higher for academic libraries, whereas only 22% of school libraries allow users

to create tags. Overall, 49% of the libraries allow their users to create tags, that is, to add their own

subject terms for library materials, while 51% have turned off the tagging function.

Table 3. Tagging in 307 Koha OPACs by Type of Libraries

Library Type   Total Libraries  Libraries (Tags Enabled)  Percentage (Tags Enabled)  Libraries (Tags Disabled)  Percentage (Tags Disabled)
Public         218              107                       49.08%                     111                        50.92%
Academic       62               36                        58.06%                     26                         41.94%
School         27               6                         22.22%                     21                         77.78%
All Libraries  307              149                       48.53%                     158                        51.47%

Figure 2. Tagging in 307 Koha OPACs by Library Type
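
As a worked check of the figures in Table 3, the Python sketch below recomputes the share of libraries with tagging enabled for each library type and for all libraries together. The counts come from Table 3; the variable and function names are illustrative.

```python
# (total libraries, libraries with tagging enabled), taken from Table 3.
TABLE_3 = {
    "Public": (218, 107),
    "Academic": (62, 36),
    "School": (27, 6),
}


def enabled_share(total, enabled):
    """Percentage of libraries with tagging enabled, to two decimals."""
    return round(100 * enabled / total, 2)


by_type = {name: enabled_share(t, e) for name, (t, e) in TABLE_3.items()}

# Pooled figure: all enabled libraries divided by all libraries.
all_total = sum(t for t, _ in TABLE_3.values())
all_enabled = sum(e for _, e in TABLE_3.values())
overall = enabled_share(all_total, all_enabled)

# Unweighted mean of the three per-type percentages, shown for contrast.
unweighted_mean = round(sum(by_type.values()) / len(by_type), 2)
```

The figure of about 49% for all libraries is the pooled share (149 of 307). It differs from the unweighted mean of the three per-type percentages because public libraries make up most of the sample.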

4. Users and folksonomies

How often did users take advantage of the opportunity for tagging? Around 50% of the sample

libraries allowed users to add their tags to the Koha online catalog. In some libraries, users added

their tags to catalogs actively and created “large clouds”. But in other libraries, users added only

a few tags.

Tag clouds in the 307 libraries are grouped into four categories: large cloud, small cloud, empty

cloud, and no cloud. A “large cloud” includes over 50 tags, and a “small cloud” includes fewer

than 50 tags. An “empty cloud” has no tags in it, which indicates that users did not add any tags

even with the tagging function enabled. The last category, “no cloud”, means that librarians did

not turn on tagging in the system.
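
The four categories can be expressed as a simple classification rule. The Python sketch below is illustrative only: the function name classify_cloud and the sample data are hypothetical, and because the definitions above do not say how a cloud with exactly 50 tags is treated, the sketch assumes it counts as small.

```python
from collections import Counter


def classify_cloud(tagging_enabled, tag_count):
    """Assign a library's OPAC to one of the four survey categories."""
    if not tagging_enabled:
        return "no cloud"
    if tag_count == 0:
        return "empty cloud"
    if tag_count > 50:
        return "large cloud"
    # Assumption: exactly 50 tags falls into the small category.
    return "small cloud"


# Hypothetical observations: (tagging enabled?, number of tags in the cloud).
sample = [(True, 120), (True, 3), (True, 0), (False, 0), (True, 50)]
tally = Counter(classify_cloud(enabled, n) for enabled, n in sample)
```

Tallying the categories over all surveyed OPACs in this way yields counts of the kind reported in the tables that follow.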

Table 4 is a summary of the 149 libraries that turned on tagging (see Table 3 and Figure 2). About

40% have large clouds while 46% have small clouds and 14% have empty clouds. Large clouds

can be used as subject access to collections while small and empty clouds are generally useless.

The authors found that users in 40% of these libraries are interested in tagging and are trying to

describe library resources in their own language. They are trying to build their own access points

in library catalogs. In the other 60%, users did not pay attention to tagging. They

may not even be aware of the existence of tagging capability in a catalog.

Table 4. Tag clouds by size in 149 libraries that have enabled tagging

Cloud Size   Number of Libraries  Percentage
Large Cloud  60                   40%
Small Cloud  68                   46%
Empty Cloud  21                   14%
Total        149                  100%

Table 5 is a summary of the tag clouds of all 307 libraries. About 51% of the libraries turned off

tagging and therefore have no tag clouds. Another 22% have small, insignificant clouds,

which are almost useless as subject access. Only 20% have large clouds that are

relatively effective in retrieving materials. Thus, we may conclude that around 20% of the

libraries are using tags as subject search keys. Some tag clouds reside in user accounts behind

a login and some are publicly available in OPACs. Even though almost half of the 307 Koha

libraries encouraged users to participate in tagging by turning the function on, 7% of the libraries

did not receive any tags from users. Figure 3 is a graphical summary of the same data as in Table 5.

Table 5. Tag Clouds in 307 Koha Libraries

Library Type  Total Libraries  Tag Cloud > 50 Tags  Tag Cloud < 50 Tags  Tag Cloud Zero Tags  Tag Cloud Turned Off
Public        218              58 (26.61%)          38 (17.43%)          11 (5.05%)           111 (50.92%)
Academic      62               2 (3.23%)            27 (43.55%)          7 (11.29%)           26 (41.94%)
School        27               0 (0.00%)            3 (11.11%)           3 (11.11%)           21 (77.78%)
All           307              60 (19.54%)          68 (22.15%)          21 (6.84%)           158 (51.47%)

Figure 3. Tag Cloud in 307 Koha Libraries

Figure 4 is a comparison by library type. Even though more academic libraries allow their users

to add tags (58% vs. 49% for public libraries), our study found very few large clouds in

academic OPACs. Academic library users are not very active in adding tags to catalogs. In contrast,

users in public libraries are more active in adding and using tags, which led to more large clouds.


Figure 4. Tag Cloud in Koha Libraries – A Comparison

4.1 Tag Cloud in Public Libraries

Public library users are more active than academic or school library users in adding tags to the online

catalog. Figure 5 describes tag clouds in the 218 public libraries. Around 27% of them

have large tag clouds that patrons can use to search the entire catalog. Seventeen percent

have small tag clouds, though some of these include only one or two tags and are not

useful. Five percent of the public libraries do not have any user-added tags, and the rest (about

51%) did not enable tagging in their system.

Figure 5. Tag Clouds in 218 Public Libraries

Some tags are very close to subjects or keywords, such as “nature”, “religion”, or “web 2.0”. But

most tags created by public library users describe resources for the use of certain communities,

or only themselves, such as “Summer Reading Club”, “Toddler Time”, or “Great Movies”.

4.2 Tag Cloud in Academic Libraries

Most academic library users are not creating large tag clouds. Figure 6 illustrates tag clouds in the 62

academic libraries. Forty-four percent of them have small tag clouds, and 11% have an empty tag cloud.

Forty-two percent did not turn on tagging. Only a small portion of academic

libraries, 3%, have large tag clouds, which could be used for searching in the catalog.


Figure 6. Tag Cloud in 62 Academic Libraries

4.3 Tag Cloud in School Libraries

Figure 7 shows tag clouds in school libraries. The majority of school libraries did not turn on

tagging. Only 11% of school libraries have small tag clouds. Each “cloud” includes one or two

tags at most. Essentially, school libraries do not use tagging at all. More research is needed to

look into the reason behind this phenomenon.

Figure 7. Tag Cloud in 27 School Libraries

5. Conclusion


Previous research has provided evidence in support of folksonomies as a viable alternative method of subject

access to resources. Very few legacy or classic ILSs are capable of this function, though the open

source ILS Koha is an exception. Only about half of the newly developed discovery tools (47%) allow

tagging. To get a glimpse of how much libraries are taking advantage of a system with tagging

capability, the authors examined libraries with Koha and found that only about half (49%) enabled

tagging. Among the Koha libraries that enabled tagging, only 40% have large tag clouds with

over 50 tags that appear useful, while 46% do not have adequately meaningful clouds (small

clouds with fewer than 50 tags) and 14% have given users the opportunity to add tags, but users

did not show any interest, thus resulting in empty clouds.

Based on the above findings, the authors recommend that more vendors add tagging

capability in future releases of their systems to give users better access to collections. Libraries

should find ways to promote tagging activities more aggressively. Research should be done to

investigate why half of the libraries do not allow users to add tags and also why academic library

users are less interested in tagging than public library users. Tagging is a Web 2.0 phenomenon

where user participation is anticipated, and users should even be encouraged to share their

wisdom in cataloguing.

Notes

1. An abridged version of this paper was published as: Yang, Sharon Q., “Tagging for Subject

Access”, Computers in Libraries, v. 32, no. 9, Nov. 2012

References

Heymann, Paul and Hector Garcia-Molina. (2009). “Contrasting Controlled Vocabulary and

Tagging: Do Experts Choose the Right Names to Label the Wrong Things?”

Paper presented at the Second ACM International Conference on Web Search and Web

Data Mining (WSDM ’09), Barcelona, Spain, February 9-12, 2009. Accessed March 20,

2012. http://ilpubs.stanford.edu:8090/955/1/cvuv-lbrp.pdf.

History-Official Website of Koha Library Software, maintained by Koha Library Software

Community. Accessed April 28, 2012. http://koha-community.org/about/history/

Kwan, Yi and Lois Mai Chan. (2009). “Linking Folksonomy to Library of Congress Subject

Headings: An Exploratory Study.” Journal of Documentation, 65(6), 872-900.

Lawson, Karen G. (2009). “Mining Social Tagging Data for Enhanced Subject Access for

Readers and Researchers.” Journal of Academic Librarianship, 35(6), 574-582.

Library of Congress Working Group on the Future of Bibliographic Control (2008). On the

Record: Report of the Library of Congress Working Group on the Future of

Bibliographic Control. Accessed Feb. 23, 2012.

http://www.loc.gov/bibliographic-future/news/lcwg-ontherecord-jan08-final.pdf

Library Technology Guides-Discovery Layer Interfaces, maintained by Marshall Breeding.

Accessed March 6, 2012. http://www.librarytechnology.org/web/Breeding/guides/

Lu, Caimei, Jung-ran Park, and Xiaohua Hu. (2010). “User Tags Versus Expert-assigned

Subject Terms: A Comparison of LibraryThing Tags and Library of Congress Subject

Headings.” Journal of Information Science, 36(6), 763-779.

doi:10.1177/0165551510386173

Peterson, Elaine. (2009). “Patron Preferences for Folksonomy Tags: Research Findings When

Both Hierarchical Subject Headings and Folksonomy Tags Are Used.” Evidence Based

Library & Information Practice, 4(1), 53-56.

Rolla, Peter J. (2009). “User Tags versus Subject Headings: Can User-Supplied Data Improve

Subject Access to Library Collections?” Library Resources & Technical Services, 53(3),

174-184.

Smith, Gene. (2008). Tagging: People-powered Metadata for the Social Web. Berkeley, CA:

New Riders.

Smith, Tiffany. (2007). “Cataloging and You: Measuring the Efficacy of A Folksonomy for

Subject Analysis.” Paper presented at the 18th Workshop of the American Society for

Information Science and Technology Special Interest Group in Classification Research,

Milwaukee, Wisconsin. Accessed March 20, 2012.

http://arizona.openrepository.com/arizona/handle/10150/106434

Steele, Tom. (2009). “The New Cooperative Cataloging.” Library Hi Tech, 27(1), 68-77.

Thomas, Marliese, Dana M. Caudle, and Cecilia M. Schmitz. (2009). “To Tag Or Not To Tag?”

Library Hi Tech, 27(3), 411-434.

Wetterstrom, Mikael. (2008). “The Complementarity of Tags and LCSH — A Tagging

Experiment and Investigation into Added Value in a New Zealand Library Context.” New

Zealand Library & Information Management Journal, 50(4), 292-306.

Author Biographies

Yan Yi Lee

Yan Yi Lee works as Systems/Cataloging Librarian in Wagner College Library, New York, USA.

She is the supervisor of the Technical Services Department. She received her MLS from the Pratt

Institute in New York and her MS in Computer Engineering from New Jersey Institute of

Technology. She also served as the chair of the WALDO Technical Services Committee between

2010 and 2012. Her research interests include library automation, open source, the Semantic

Web, and linked data. Please contact her at [email protected].

Sharon Q. Yang

Dr. Sharon Q. Yang works as Associate Professor and Systems Librarian in Rider University

Moore Library, New Jersey, USA. She received her MS in 1988, Certificate for Advanced

Librarianship in 1989, and DLS in 1997, all from Columbia University, New York City. Her

research interests include next generation catalog, the Semantic Web, library systems, and

assessment of information literacy skills. Please contact her at [email protected].