Folksonomies as Subject Access - A Survey of Tagging in Library
Online Catalogs and Discovery Layers
Yan Yi Lee & Sharon Q. Yang
Abstract
This paper (see Note 1) discusses a survey on how system vendors and libraries handled tagging in OPACs
and discovery layers. Tags, otherwise called folksonomies, are user added subject metadata. This
survey also studied user behavior when the user faced the choice to tag. The findings indicate
that most legacy/classic systems have no tagging capability. About 47% of the discovery tools
provide tagging function. About 49% of the libraries that have a system with tagging capability
have turned the tagging function on in their OPACs and discovery tools. However, only 40% of
the libraries that turned tagging on actually utilized user added subject metadata as access points
to items in the OPAC. Academic library users are less active in tagging than public library users.
1. Introduction
Folksonomy is “a term created by Thomas Vander Wal by combining taxonomy with folk”
(Steele, 2009). Simply put, folksonomy is a classification of resources created by the general
public. Users add keywords, called tags, to describe a resource on the Web. The action of adding
tags is called tagging. Tag cloud refers to the display of the accumulated tags as a way to access
resources. Gene Smith describes a tag cloud as "a method of presenting tags where the more
frequently used tags are emphasized, usually in size or color. Tag clouds tell you at a glance
which tags are more popular. Each tag is a link" (Smith, 2008). A tag cloud is a visual subject
classification scheme that shows which terms are more or less popular through the font size and
color of each term. See Figure 1 for an example of a tag cloud in the test OPAC of Wagner College
Library.
Figure 1. Tag Cloud in test OPAC of Wagner College Library
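The frequency-based emphasis that Smith describes can be sketched in a few lines of Python. This is an illustration only: the function name font_sizes, the linear scaling, the 10 to 32 point range, and the sample counts are all assumptions, not taken from any OPAC discussed in this paper.

```python
def font_sizes(tag_counts, min_pt=10, max_pt=32):
    """Map each tag's usage count to a font size (linear scaling is an assumption)."""
    if not tag_counts:
        return {}
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = hi - lo
    sizes = {}
    for tag, count in tag_counts.items():
        # When every tag is equally frequent, all tags get the minimum size.
        weight = (count - lo) / span if span else 0.0
        sizes[tag] = round(min_pt + weight * (max_pt - min_pt))
    return sizes


# Hypothetical usage counts, for illustration only.
example = {"nature": 12, "religion": 3, "web 2.0": 1}
print(font_sizes(example))
```

A real tag cloud might use logarithmic scaling so that one very popular tag does not flatten all the others; the linear version is simply the easiest to read.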
Both librarians and computer scientists became interested in tagging and tag clouds from their
inception. In the last five years, there were many studies comparing user-created tags with
controlled vocabularies, especially the Library of Congress Subject Headings (LCSH). In 2007,
Tiffany Smith compared the LC subject headings in five books to tags for the same books found
in LibraryThing. Even though she had set out to measure the efficacy of tagging as subject
access, she did not reach any concrete conclusions. Since then, there have been many large scale
studies to compare LCSH with tags in LibraryThing. Those include the studies by Heymann and
Garcia-Molina (2009), Lawson (2009), Peterson (2009), Rolla (2009), Wetterstrom (2008),
Thomas (2009), Lu, Park, and Hu (2010), just to name a few. The methodology of most research in
this area involved extracting titles, ISBNs, and LC Subject Headings from the MARC 650 fields of
OCLC or LC bibliographic records and searching for the same books in LibraryThing by ISBN. The
LCSH and tags for the same books were then compared for duplication, quality, coverage, and
effectiveness.
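The comparison step in such studies amounts to set arithmetic over normalized terms. The following Python sketch assumes a simple lowercase and whitespace normalization and uses made-up terms for a single book; the cited studies applied their own matching rules, which are not reproduced here, and the function names are hypothetical.

```python
def normalize(term):
    """Lowercase and collapse whitespace; real studies used richer matching."""
    return " ".join(term.lower().split())


def overlap_report(lcsh_terms, user_tags):
    """Compare two vocabularies for one book and report shared and unique terms."""
    lcsh = {normalize(t) for t in lcsh_terms}
    tags = {normalize(t) for t in user_tags}
    shared = lcsh & tags
    return {
        "shared": sorted(shared),
        "lcsh_only": sorted(lcsh - tags),
        "tags_only": sorted(tags - lcsh),
        # Share of distinct user tags that also appear as subject headings.
        "tag_overlap_pct": round(100 * len(shared) / len(tags), 1) if tags else 0.0,
    }


# Hypothetical terms for a single ISBN, for illustration only.
report = overlap_report(
    ["Whales", "Sea stories", "Ship captains"],
    ["whales", "classic", "sea  stories", "to read"],
)
print(report)
```

Terms that appear only on the tag side, such as "to read" in this made-up example, are the kind of personal or community vocabulary that a controlled vocabulary would not supply.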
Thus far, all the research, either by librarians or computer scientists, has been positive about
tagging. Even though there sometimes may be up to 60% overlap, the findings indicate that
folksonomies often use different terms from LCSH and can provide additional subject access to
library collections (Kwan and Chan, 2009). User generated tags may cover more aspects of a
book’s subject (Rolla, 2009). Linking folksonomies to LCSH has been deemed by Kwan and
Chan (2009) as a helpful aid. Even the Library of Congress Working Group on the Future of
Bibliographic Control “has suggested that libraries should open up their catalogs to allow users
to add descriptive tags to the bibliographic data in catalog records” (Rolla, 2009).
It has been almost a year since the last study that yielded positive findings about tagging as
a viable subject access point to complement LCSH and other controlled vocabularies in online
catalogs. What actions have vendors taken to incorporate tagging into library systems? What
have libraries done to add folksonomies in addition to LCSH in bibliographic records? How do
users respond to tagging capability in library catalogs and discovery tools? This paper aims to
answer those questions by conducting a survey of library systems, libraries, and tagging
activities by users.
2. Library Systems and Folksonomies
Have system vendors taken folksonomies into consideration when designing catalogs and
discovery tools? In order to find out how the current library systems handle folksonomies, the
authors used Marshall Breeding’s Technology Guide (Library Technology Guides-Discovery
Layer Interfaces, 2012) to obtain a comprehensive list of major library systems with an OPAC,
including 37 Integrated Library Systems (ILS) and 15 discovery tools (also referred to as next
generation catalogs). An extensive study of all 37 major ILSs revealed that only two
ILS OPACs, Koha and Genesis G3, allow users to add tags, and only Koha uses tags to enhance
subject access. See Table 1 for a list of the ILSs.
Table 1. ILS OPACs
No.  Library Automation System  Allow Users to Add Tags  Tag Cloud  Tag List  Tag to Start a New Search  Tag to Refine a Search
1    Agent VERSO                -                        -          -         -                          -
2    Aleph 500                  -                        -          -         -                          -
3    Alexandria                 -                        -          -         -                          -
4    Amlib                      -                        -          -         -                          -
5    Apollo                     -                        -          -         -                          -
6    Athena                     -                        -          -         -                          -
7    Atriuum                    -                        -          -         -                          -
8    Carl.X                     -                        -          -         -                          -
9    Circulation Plus           -                        -          -         -                          -
10   Concourse                  -                        -          -         -                          -
11   DB/TextWorks               -                        -          -         -                          -
12   Destiny                    -                        -          -         -                          -
13   Dynix                      -                        -          -         -                          -
14   EOS Web                    -                        -          -         -                          -
15   Evergreen                  -                        -          -         -                          -
16   Evolve                     -                        -          -         -                          -
17   Genesis G3                 √                        -          -         -                          -
18   GLAS                       -                        -          -         -                          -
19   Horizon                    -                        -          -         -                          -
20   InfoCentre                 -                        -          -         -                          -
21   Innopac                    -                        -          -         -                          -
22   Koha                       √                        √          -         √                          -
23   Liberty3                   -                        -          -         -                          -
24   Library Solution           -                        -          -         -                          -
25   LibraryWorld               -                        -          -         -                          -
26   Mandarin M3                -                        -          -         -                          -
27   Millennium                 -                        -          -         -                          -
28   OPALS                      -                        -          -         -                          -
29   Polaris                    -                        -          -         -                          -
30   Portfolio                  -                        -          -         -                          -
31   ResourceMate               -                        -          -         -                          -
32   Spydus                     -                        -          -         -                          -
33   Unicorn (Symphony)         -                        -          -         -                          -
34   Virtua                     -                        -          -         -                          -
35   Voyager                    -                        -          -         -                          -
36   Vubis Smart                -                        -          -         -                          -
37   Winnebago Spectrum         -                        -          -         -                          -
     Total                      5.41%                    2.70%      0.00%     2.70%                      0.00%
Only about 5% of the major library systems allow tagging. It comes as no surprise that most
legacy or classic ILS do not embed folksonomies as they were developed in the 1990s when
folksonomy was not yet popular. User contributed tags and tag lists or clouds more often exist in
the newer ILS such as Koha, which was created in 1999 (History-Official Website of Koha
Library Software, 2012).
However, within the last five years, discovery tools have come into play. A discovery tool is a
stand-alone catalog with the advanced features of a next generation catalog (NGC) that is
developed independently from any ILS. Libraries can use a discovery tool to replace their OPAC or
use it side by side with the OPAC. The following is a list of 15 major discovery tools that have
been deployed worldwide. For each discovery layer, the authors randomly chose ten implementations
from the user lists compiled by Marshall Breeding (Library Technology Guides-Discovery Layer
Interfaces, 2012) and examined the presence or absence of folksonomies and how they were being
used as subject access in the chosen implementations. Observing a system in action sheds
light on its design. The documentation for a system was consulted for clarification when necessary.
The findings are summarized in Table 2.
Table 2. Folksonomies within Discovery Tools
No.  System                  Allow Users to Add Tags  Tag Cloud  Tag List  Tag to Start a New Search  Tag to Refine a Search
1    AquaBrowser             √                        √          √         √                          √
2    AXIELL ARENA            -                        -          -         -                          -
3    Blacklight              -                        -          -         -                          -
4    Biblio Commons          √                        -          √         √                          √
5    EBSCO Discover Service  -                        -          -         -                          -
6    Encore                  √                        √          √         √                          √
7    Endeca                  -                        -          -         -                          -
8    Enterprise              -                        -          -         -                          -
9    Primo                   √                        √          √         √                          -
10   Scriblio                -                        -          -         -                          -
11   Summon                  -                        -          -         -                          -
12   SOPAC                   √                        √          -         √                          -
13   Visualizer              -                        -          -         -                          -
14   VuFind                  √                        -          √         √                          -
15   WorldCat Local          √                        √          √         √                          -
     Total                   47%                      33%        40%       47%                        20%
Only 7 out of 15 discovery tools (about 47%) provide a tagging function. Only one third of the 15
discovery tools are capable of displaying tags as a cloud, with font and size reflecting
frequency of use and popularity. About 40% can display a tag list similar to the
clickable Library of Congress subject headings. Moreover, about 47% of the discovery tools use
tags to execute a new search, while only 20% use tags to refine or narrow a search. A system that
allows users to add tags does not necessarily provide a tag cloud or list as subject access.
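The Total row of Table 2 can be recomputed from the check marks. The Python sketch below transcribes the table into a dictionary (1 for a check mark, 0 otherwise) and derives each column's percentage; the variable and function names are our own, chosen for illustration.

```python
# Feature columns, in the order used by Table 2.
FEATURES = ["add_tags", "tag_cloud", "tag_list", "new_search", "refine_search"]

# 1 = feature present, 0 = absent, transcribed from Table 2.
TABLE_2 = {
    "AquaBrowser":            (1, 1, 1, 1, 1),
    "AXIELL ARENA":           (0, 0, 0, 0, 0),
    "Blacklight":             (0, 0, 0, 0, 0),
    "Biblio Commons":         (1, 0, 1, 1, 1),
    "EBSCO Discover Service": (0, 0, 0, 0, 0),
    "Encore":                 (1, 1, 1, 1, 1),
    "Endeca":                 (0, 0, 0, 0, 0),
    "Enterprise":             (0, 0, 0, 0, 0),
    "Primo":                  (1, 1, 1, 1, 0),
    "Scriblio":               (0, 0, 0, 0, 0),
    "Summon":                 (0, 0, 0, 0, 0),
    "SOPAC":                  (1, 1, 0, 1, 0),
    "Visualizer":             (0, 0, 0, 0, 0),
    "VuFind":                 (1, 0, 1, 1, 0),
    "WorldCat Local":         (1, 1, 1, 1, 0),
}


def feature_percentages(table):
    """Return the share of systems offering each feature, rounded to whole percent."""
    total = len(table)
    return {
        name: round(100 * sum(row[i] for row in table.values()) / total)
        for i, name in enumerate(FEATURES)
    }


print(feature_percentages(TABLE_2))
```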
Among discovery tools studied, Encore, Biblio Commons, and AquaBrowser have the most
tagging features. Encore can display either a tag cloud or a tag list to refine or narrow a
search. Its tag cloud or list is a mixture of user added tags and keywords from bibliographic
records. Such keywords are not tags, and presenting keywords as tags is almost an act of cheating, but
doing so ensures that the tag cloud is always present with an abundance of "tags". The user
added tags are also displayed separately under “Community Tags” in a record view and will
execute a new search to retrieve all the items being tagged. AquaBrowser displays “a word cloud”
for variant spellings, related words/associations, and synonyms, and its cloud is not based on
user added tags but is system generated. In AquaBrowser, the real tag cloud contributed by a user
remains private within a user’s login account. Biblio Commons displays tags at the start page for
searching and later as a facet to refine or narrow a search. All three of these discovery tools excel
at utilizing folksonomies as additional subject access to collections.
3. Libraries and Folksonomies
When an ILS OPAC or discovery tool had the tagging capability, did libraries take advantage of
this function? The authors chose the Koha OPAC as an example and did a survey of tagging
activities in the OPACs of 307 Koha implementers.
Koha is an open source ILS that is widely used in libraries all over the world. Tagging is one
of the important features in its system design. After adding tags, users can choose to keep these
tags private and hidden in their account, or publish them in the OPAC as a “Cloud”.
Subsequently, librarians can decide to turn a “Tag Cloud” on or off. Additionally, librarians can
also decide whether the tags created by users can be published in the OPAC directly or must be
approved by librarians before publishing. An external dictionary can be installed in the Koha
system which serves as a “whitelist” of pre-allowed tags and helps librarians to verify terms
added by users.
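The moderation options described above can be modeled as a small decision function. This is a conceptual sketch in Python, not Koha's implementation or API: the function name, parameters, and return values are hypothetical, and treating a whitelisted term as published without review is our reading of "pre-allowed".

```python
def review_tag(tag, require_approval, whitelist=frozenset()):
    """Decide what happens to a tag a user has chosen to make public.

    Returns "published" or "pending" (awaiting librarian approval).
    """
    term = tag.strip().lower()
    if not require_approval:
        # The library lets user tags appear in the OPAC directly.
        return "published"
    if term in whitelist:
        # Assumption: terms found in the external dictionary skip review.
        return "published"
    return "pending"


# Hypothetical whitelist and tags, for illustration only.
allowed = frozenset({"nature", "religion"})
print(review_tag("Nature", require_approval=True, whitelist=allowed))
print(review_tag("toddler time", require_approval=True, whitelist=allowed))
```

Tags that users keep private never reach this step, since they stay inside the user's account.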
The 307 libraries, including 218 public, 62 academic, and 27 school libraries, were taken from
Library Technology Guides (Library Technology Guides-Discovery Layer Interfaces, 2012) for
the survey. It is the most comprehensive and complete list of Koha users published so far. The
first step in the survey was to check the 307 OPACs to determine how many had enabled tagging.
Table 3 is a breakdown of the 307 Koha users by library type based on presence or absence of
tagging function in the OPAC. Figure 2 displays the same statistics by bar chart.
Tags are enabled in 107 public libraries, almost half of the total 218. At 58%, the percentage
of libraries with tags enabled is higher for academic libraries. But fewer school libraries, only 22%, allow users
to create tags. Overall, 49% of libraries allow their users to create tags, or add their own
subject terms for library materials, while 51% of libraries turned off the tagging function.
Table 3. Tagging in 307 Koha OPACs by Type of Libraries
Library Type   Total Libraries  Tags Enabled  Percentage Enabled  Tags Disabled  Percentage Disabled
Public         218              107           49.08%              111            50.92%
Academic       62               36            58.06%              26             41.94%
School         27               6             22.22%              21             77.78%
All Libraries  307              149           48.53%              158            51.47%
Figure 2. Tagging in 307 Koha OPACs by Library Type
4. Users and folksonomies
How often did users take advantage of the opportunity for tagging? Around 50% of the sample
libraries allowed users to add their tags to the Koha online catalog. In some libraries, users added
their tags to catalogs actively and created “large clouds”. But in other libraries, users added only
a few tags.
Tag clouds by the 307 libraries are grouped into 4 categories: large cloud, small cloud, empty
cloud, and no cloud. A “large cloud” includes over 50 tags, and a “small cloud” includes fewer
than 50 tags. An “empty cloud” has no tags in it, which indicates that users did not add any tags
even with the tagging function enabled. The last category, “no cloud”, means that librarians did
not turn on tagging in the system.
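The four categories can be written as a simple classification rule. The Python sketch below is illustrative: the function name is our own, and because the definitions do not say where a cloud with exactly 50 tags belongs, treating it as a small cloud is an assumption.

```python
def classify_cloud(tagging_enabled, tag_count, threshold=50):
    """Assign a library's tag cloud to one of the four survey categories."""
    if not tagging_enabled:
        return "no cloud"
    if tag_count == 0:
        return "empty cloud"
    # A cloud with exactly `threshold` tags is not covered by the definitions;
    # counting it as small is an assumption made for this sketch.
    return "large cloud" if tag_count > threshold else "small cloud"


# Hypothetical libraries, for illustration only.
for enabled, count in [(True, 120), (True, 2), (True, 0), (False, 0)]:
    print(enabled, count, classify_cloud(enabled, count))
```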
Table 4 is a summary of the 149 libraries that turned on tagging in Table 3 and Figure 2. About
40% have large clouds while 46% have small clouds and 14% have empty clouds. Large clouds
can be used as subject access to collections while small and empty clouds are generally useless.
The authors found that users in 40% of the sample libraries are interested in tagging and trying to
describe library resources in their own language. They are trying to build their own access points
in library catalogs. In 60% of the sample libraries, users did not pay attention to tagging. They
may not even be aware of the existence of tagging capability in a catalog.
Table 4. Tag clouds by size in 149 libraries that have enabled tagging
Cloud Size   Number of Libraries  Percentage
Large Cloud  60                   40%
Small Cloud  68                   46%
Empty Cloud  21                   14%
Total        149                  100%
Table 5 is a summary of tag clouds of all the 307 libraries. About 51% of libraries turned off
tagging, and therefore, have no tag clouds. 22% of libraries have small insignificant clouds,
which are almost useless as subject access. Only 20% of them have large clouds that are
relatively effective in retrieving materials. Thus, we may safely conclude that around 20% of
libraries are using tags as subject search keys. Some tag clouds reside in user accounts behind
login and some are publicly available in OPACs. Even though almost half of the 307 Koha
libraries encouraged users to participate in tagging by turning the function on, 7% of the libraries
did not receive any tags from users. Figure 3 is a graphical summary of the same data as in Table
5.
Table 5. Tag Clouds in 307 Koha Libraries
Library Type  Total      Tag Cloud > 50 Tags  Tag Cloud < 50 Tags  Tag Cloud Zero Tags  Tag Cloud Turned Off
              Libraries  No.    %             No.    %             No.    %             No.    %
Public        218        58     26.61%        38     17.43%        11     5.05%         111    50.92%
Academic      62         2      3.23%         27     43.55%        7      11.29%        26     41.94%
School        27         0      0.00%         3      11.11%        3      11.11%        21     77.78%
All           307        60     19.54%        68     22.15%        21     6.84%         158    51.47%
Figure 3. Tag Cloud in 307 Koha Libraries
Figure 4 is a comparison by library type. Even though more academic libraries allow their users
to add tags (58% vs. 49% for public libraries), our study found very few large clouds in
academic OPACs. Academic library users are not so active in adding tags to catalogs. In contrast,
users in public libraries are more active in adding and using tags, which led to more large clouds.
Figure 4. Tag Cloud in Koha Libraries – A Comparison
4.1 Tag Cloud in Public Libraries
Public library users are more active than academic or school library users in adding tags to the online
catalog. Figure 5 describes tag clouds in 218 public libraries. Around 27% of the 218 public libraries
have large tag clouds so that patrons can use them to search the entire catalog. Seventeen percent
have small tag clouds, though some small tag clouds include only one or two tags, which are not
useful. Five percent of the public libraries do not have any user added tags and the rest (about
51%) did not enable tagging in their system.
Figure 5. Tag Clouds in 218 Public Libraries
Some tags are very close to subjects or keywords, such as “nature”, “religion”, or “web 2.0”. But
most tags created by public library users describe resources for the use of certain communities,
or only themselves, such as “Summer Reading Club”, “Toddler Time”, or “Great Movies”.
4.2 Tag Cloud in Academic Libraries
Most academic library users are not creating large tag clouds. Figure 6 illustrates tag clouds in 62
academic libraries. 44% of them have small tag clouds, and 11% have an empty tag cloud. Forty-two
percent of the 62 academic libraries did not turn on tagging. Only a small portion of academic
libraries, 3%, have large tag clouds, which could be used for searching the catalog.
Figure 6. Tag Cloud in 62 Academic Libraries
4.3 Tag Cloud in School Libraries
Figure 7 shows tag clouds in school libraries. The majority of school libraries did not turn on
tagging. Only 11% of school libraries have small tag clouds. Each “cloud” includes one or two
tags at most. Essentially, school libraries do not use tagging at all. More research is needed to
look into the reason behind this phenomenon.
Figure 7. Tag Cloud in 27 School Libraries
5. Conclusion
Research provided evidence in support of folksonomies as a viable alternate method of subject
access to resources. Very few legacy or classic ILSs are capable of this function, though the open
source ILS, Koha, is an exception. Fewer than half of the newly developed discovery tools (47%) allow
tagging. To get a glimpse of how much libraries are taking advantage of a system with tagging
capability, the authors examined libraries with Koha and found that only about half (49%) enabled
tagging. Among the Koha libraries that enabled tagging, only 40% have large tag clouds with
over 50 tags that appear useful, while 46% do not have adequately meaningful clouds (small
clouds with fewer than 50 tags) and 14% have given users the opportunity to add tags, but users
did not show any interest, thus resulting in empty clouds.
Based on the above findings, the authors recommend that more vendors add tagging
capability in future releases of their systems to give users better access to collections. Libraries
should find ways to more aggressively promote tagging activities. Research should be done to
investigate why half of the libraries do not allow users to add tags and also why academic library
users are less interested in tagging than public library users. Tagging is a Web 2.0 phenomenon
where user participation is anticipated, and users should even be encouraged to share their
wisdom in cataloguing.
Notes
1. An abridged version of this paper was published as: Yang, Sharon Q., “Tagging for Subject
Access”, Computers in Libraries, v. 32, no. 9, Nov. 2012
References
Heymann, Paul and Hector Garcia-Molina. (2009). “Contrasting Controlled Vocabulary and
Tagging: Do Experts Choose the Right Names to Label the Wrong Things?”
Paper presented at the Second ACM International Conference on Web Search and Web
Data Mining (WSDM ’09), Barcelona, Spain, February 9-12, 2009. Accessed March 20,
2012. http://ilpubs.stanford.edu:8090/955/1/cvuv-lbrp.pdf.
History-Official Website of Koha Library Software, maintained by Koha Library Software
Community. Accessed April 28, 2012. http://koha-community.org/about/history/
Kwan, Yi and Lois Mai Chan. (2009). “Linking Folksonomy to Library of Congress Subject
Headings: An Exploratory Study.” Journal of Documentation, 65(6), 872-900.
Lawson, Karen G. (2009). “Mining Social Tagging Data for Enhanced Subject Access for
Readers and Researchers.” Journal of Academic Librarianship, 35(6), 574-582.
Library of Congress Working Group on the Future of Bibliographic Control (2008). On the
Record: Report of the Library of Congress Working Group on the Future of
Bibliographic Control. Accessed Feb. 23, 2012.
http://www.loc.gov/bibliographic-future/news/lcwg-ontherecord-jan08-final.pdf
Library Technology Guides-Discovery Layer Interfaces, maintained by Marshall Breeding.
Accessed March 6, 2012. http://www.librarytechnology.org/web/Breeding/guides/
Lu, Caimei, Jung-ran Park, and Xiaohua Hu. (2010). “User Tags Versus Expert-assigned
Subject Terms: A Comparison of LibraryThing Tags and Library of Congress Subject
Headings.” Journal of Information Science, 36(6), 763-779.
doi:10.1177/0165551510386173
Peterson, Elaine. (2009). “Patron Preferences for Folksonomy Tags: Research Findings When
Both Hierarchical Subject Headings and Folksonomy Tags Are Used.” Evidence Based
Library & Information Practice, 4(1), 53-56.
Rolla, Peter J. (2009). “User Tags versus Subject Headings: Can User-Supplied Data Improve
Subject Access to Library Collections?” Library Resources & Technical Services, 53(3),
174-184.
Smith, Gene. (2008). Tagging: People-powered Metadata for the Social Web. Berkeley, CA:
New Riders.
Smith, Tiffany. (2007). “Cataloging and You: Measuring the Efficacy of A Folksonomy for
Subject Analysis.” Paper presented at the 18th Workshop of the American Society for
Information Science and Technology Special Interest Group in Classification Research,
Milwaukee, Wisconsin. Accessed March 20, 2012.
http://arizona.openrepository.com/arizona/handle/10150/106434
Steele, Tom. (2009). “The New Cooperative Cataloging.” Library Hi Tech, 27(1), 68-77.
Thomas, Marliese, Dana M. Caudle, and Cecilia M. Schmitz. (2009). “To Tag Or Not To Tag?”
Library Hi Tech, 27(3), 411-434.
Wetterstrom, Mikael. (2008). “The Complementarity of Tags and LCSH — A Tagging
Experiment and Investigation into Added Value in a New Zealand Library Context.” New
Zealand Library & Information Management Journal, 50(4), 292-306.
Author Biographies
Yan Yi Lee
Yan Yi Lee works as Systems/Cataloging Librarian in Wagner College Library, New York, USA.
She is the supervisor of the Technical Services Department. She received her MLS from the Pratt
Institute in New York and her MS in Computer Engineering from New Jersey Institute of
Technology. She also served as the chair of the WALDO Technical Services Committee between
2010 and 2012. Her research interests include library automation, open source, the Semantic
Web, and linked data. Please contact her at [email protected].
Sharon Q. Yang
Dr. Sharon Q. Yang works as Associate Professor and Systems Librarian in Rider University
Moore Library, New Jersey, USA. She received her MS in 1988, Certificate for Advanced
Librarianship in 1989, and DLS in 1997, all from Columbia University, New York City. Her
research interests include next generation catalog, the Semantic Web, library systems, and
assessment of information literacy skills. Please contact her at [email protected].