Towards Intelligent and Adaptive Digital Library Services
TOWARDS INTELLIGENT AND ADAPTIVE DIGITAL
LIBRARY SERVICES
By
Yenruedee Chanwirawong
SIU THE: SOT-MSIT-2007-01
A Thesis Presented
By
Yenruedee Chanwirawong
Master of Science in Information Technology
School of Technology
Shinawatra University
June 2007
Copyright of Shinawatra University
Title: Towards Intelligent and Adaptive Digital Library
By: Yenruedee Chanwirawong
Program: Master of Science in Information Technology
Advisor: Dr. Md Maruf Hasan
Academic Year: 2006
The Thesis is Accepted by the School of Technology, Shinawatra University
in Partial Fulfillment of the Requirements for the Degree of Master of Science in
Information Technology.
Acting Dean, School of Technology: Asst. Prof. Dr. Prinya Tantaswadi
Committee:
Advisor: Dr. Md Maruf Hasan
Committee: Assoc. Prof. Dr. Ekawit Nantajeewarawat
Committee: Asst. Prof. Dr. Chutiporn Anutariya
External Examiner: Dr. Kazuhiro Takeuchi
June 2007
Acknowledgments
I would like to express my sincere gratitude to Dr. Maruf Hasan for the
support he has provided me during the entire course of my graduate studies at
Shinawatra University. I have benefited a lot from his guidance, on scholarly as well
as spiritual levels, and my work would never have been a success without his
encouragement.
I am thankful to Assoc. Prof. Dr. Ekawit Nantajeewarawat for helping me get
started with this project, and correcting my course as necessary. I am also thankful to
Asst. Prof. Dr. Chutiporn Anutariya for her valuable suggestions towards improving
my thesis, and for helping me defend it.
I thank all members of MSIT. The interactions that I have had with many of
you over the last two years have been wonderful learning opportunities for me. I
thank Kritsada Klaimak for helping me with several aspects of this work. I wish all of
them very successful careers ahead.
Last but not least, I am grateful to my parents, whose love and
encouragement have made this work possible. I would not be who I am today without
their blessings.
Abstract
A digital library is a collection of documents organized in electronic form and accessible via search and browsing interfaces. Depending on the specific library, a user may be able to access magazine articles, books, papers, images, sound files, and videos. The availability of content, user profiles, and usage patterns in machine-understandable formats paves the way for processing this information further with state-of-the-art technologies and for introducing intelligent services in digital libraries. Analyzing, annotating, and organizing content based on a domain ontology (Middleton, Roure, & Shadbolt, 2001) makes it possible to draw topical inferences from content relationships. A user’s rudimentary or incomplete profile can also be augmented with the help of a profile ontology and usage patterns. In addition, mathematical models can process usage-pattern data to make content recommendations using collaborative filtering (Sarwar, Karypis, Konstan, & Riedl, 2001) and to model interest drift in users. Serving digital library users with the information that best reflects their query, profile, usage history, and content relationships is possible only when these issues are considered in a unified manner.
In this research, we use an open-source digital library system. We develop and integrate add-on ontologies and the necessary algorithms to demonstrate intelligent services in the digital library. Our prototype demonstrates that the same query issued by a particular user at different times may yield different result sets (rankings), since the query, profile, and contents are dynamically enhanced using intelligent and adaptive algorithms. We also demonstrate that the same query from different users may yield different result sets, as justified by the differences in their profiles and usage patterns. Our prototype can also recommend digital library items to users through collaborative-filtering-based usage-pattern analysis. We test our approach on a partially synthetic dataset and analyze the results through human judgments.
Keywords: Digital library, Recommender system, Collaborative filtering, Usage
pattern, Ontology
Table of Contents
Acknowledgments
Abstract
Table of Contents
List of Figures
Chapter 1 Introduction
1.1 Motivation
1.2 Thesis outline
Chapter 2 Material and Methodology
2.1 Digital Library Services
2.2 Recommendation System
2.2.1 Recommendation Techniques
2.3 Ontologies
2.3.1 Domain ontology
2.3.2 User profile ontology
2.4 Single Exponential Smoothing
2.4.1 The basic idea of the Exponential smoothing
2.4.2 Assigning interest value
2.5 Usage-Pattern Analysis using Collaborative Filtering
2.6 Temporal Changes in User Interest (User Interest-drift)
Chapter 3 Implementation Detail
3.1 System overview
3.2 The components of our digital library
3.2.1 Access log
3.2.2 ACM Computing Classification Ontology
3.2.3 User Profiles
3.2.4 User Machine with Web Browser
Chapter 4 Experimental Result and Evaluation
4.1 Experimental Setup and Result Analysis
4.1.1 User Similarity
4.1.2 Single Exponential Smoothing
Chapter 5 Conclusions and Future Work
References
Biography
List of Figures
Figure 2.1: Example of sources content in digital library
Figure 2.2: Examples of Greenstone Digital library interface
Figure 2.3: The Collaborative Filtering process
Figure 2.4: Cosine based example
Figure 2.5: TechLens interface
Figure 2.6: Example of ACM Computing Classification System
Figure 2.7: Schema for ACM-CCS Ontology based on CCS Taxonomy
Figure 2.8: Example of inference using ACM-CCS ontology
Figure 2.9: User Profile Ontology
Figure 2.10: The i x j Matrix used for recommendation
Figure 2.11: Overview of Adaptive and Intelligent Service features in Digital Libraries. (a) Horizontal Axis illustrates that search results for a particular user using the same keyword at different times yield different retrieval results. (b) Vertical Axis illustrates that search results for different users based on the same keyword may yield different retrieval results.
Figure 3.1: An overview of the software infrastructure used in the prototype
Figure 3.2: Example of access log
Figure 3.3: Example of ACM Computing Classification System Ontology
Figure 3.4: Example of user similarity in the database
Figure 3.5: Screen shot of a sample user profile
Figure 3.6: Log in page
Figure 3.7: Digital library prototype homepage
Figure 3.8: Example of search result from our digital library
Figure 4.1: Examples of user profiles
Figure 4.2: Examples of user profiles
Figure 4.3: Example of JAVA code for calculating user similarity
Figure 4.4: Graphical representation of the relation between weight and α
List of Tables
Table 2.1: Sample of Parameter fi relate with parameter t
Table 4.1: Display a sample weight calculation for a user
Figure 4.2: Weight of topic for a user
Chapter 1
Introduction
1.1 Motivation
A digital library is a collection of documents organized in electronic form and accessible via search and browsing interfaces. The availability of content, user profiles, and usage patterns in machine-understandable formats paves the way for processing this information further with state-of-the-art technologies and for introducing intelligent services in digital libraries. Analyzing, annotating, and organizing content based on a domain ontology (Middleton et al., 2001) makes it possible to draw topical inferences from content relationships. A user’s rudimentary or incomplete profile can also be augmented with the help of a profile ontology and usage patterns. In addition, mathematical models can process usage-pattern data to make content recommendations using collaborative filtering (Sarwar et al., 2001) and to model interest drift in users. Serving digital library users with the information that best reflects their query, profile, usage history, and content relationships is possible only when these issues are considered in a unified manner. Digital libraries have been the subject of intensive research and development over the last decades. However, much of this research has not addressed intelligent services for digital libraries in a unified manner (Liao, Liao, Kao, & Harn, 2006). In this thesis, we present a unified approach towards developing adaptive and intelligent digital library services.
The majority of digital library researchers have focused on providing access to diverse digital information resources through searching and browsing interfaces (G.G. Chowdhury & S. Chowdhury, 2003). Searching interfaces range from basic keyword search to field-specific advanced search. Browsing interfaces include categorical navigation based on a taxonomy and metadata, such as browsing by author or by category. We argue that what makes a digital library unique is the availability of content in electronic form (which can be processed automatically and from which inferences can be made) and the availability of user profiles and usage patterns. Unlike on the WWW, a Google-like keyword search or a Yahoo-like directory is not adequate for harnessing information in the context of a digital library. Therefore, we make use of DL content, user profiles, and usage patterns, and develop the necessary algorithms to facilitate intelligent and adaptive services for digital libraries in a unique fashion.
Using our approach, we demonstrate that the same query issued by a particular user at different points in time may yield different result sets (rankings), since the query, profile, and contents are dynamically enhanced using intelligent and adaptive algorithms. We also demonstrate that the same query from different users may yield different result sets, as justified by the differences in their profiles and usage patterns. Our prototype can also recommend digital library items to users through collaborative-filtering-based usage-pattern analysis.
1.2 Thesis outline
Chapter 2 introduces digital libraries, with examples of their use and the limitations of current systems. It also discusses recommendation techniques and the integration of digital libraries with recommender systems, explores the domain ontology and user ontology used in our system, and describes Single Exponential Smoothing, which is used to decrease weights in the digital library. Chapter 3 specifies the prototype system architecture and its components. Chapter 4 presents the evaluations performed to measure the effectiveness of our approach. Finally, Chapter 5 presents the conclusions of the thesis and identifies areas in need of further research.
Chapter 2
Material and Methodology
2.1 Digital Library Services
Digital libraries (DL) have become a major part of the mainstream library landscape. Content for digital libraries usually requires manipulation by several tools, each with its own benefits and drawbacks. This section discusses current digital library systems, tools for digital libraries, and the advantages and limitations of current digital libraries.
As G.G. Chowdhury & S. Chowdhury state, a digital library is an assemblage of digital computing, storage, and communications machinery together with the content and software needed to reproduce, emulate, and extend the services provided by conventional libraries based on paper and other material means of collecting, cataloging, finding, and disseminating information. A full-service digital library must accomplish all essential services of traditional libraries and also exploit the well-known advantages of digital storage, searching, and communication (G.G. Chowdhury & S. Chowdhury, 2003).
The contents of a digital library can be databases of text, numbers, graphics, sound, video, etc. In general, DL contents are organized to make information accessible in particular, well-defined ways, and good libraries include a description of how the information is organized (G.G. Chowdhury & S. Chowdhury, 2003).
Figure 2.1: Example of sources contents in digital library
Several digital library software packages are available. For example, the New Zealand Digital Library project, presently called the Greenstone Digital Library (New Zealand Digital Library Project, 2000), is a research program at the University of Waikato. Greenstone is a software suite for building, maintaining, and distributing digital library collections. It offers full-text and fielded search, flexible browsing facilities that are metadata-based (Dublin Core) and collection-specific, hierarchical phrase browsing, and automatic creation of all access structures (see Figure 2.2 for an example of the Greenstone interface).
Figure 2. 2: Examples of Greenstone Digital library interface
Compared with their traditional counterparts, digital libraries offer many advantages. Users need not go to the library physically; people all over the world can access the same information as long as they are connected to the network, and they can do so at any time. Unlike physical media, digital library contents can be used by many users at the same time. Lastly, DL users can use advanced electronic facilities such as keyword search to explore information digitally and efficiently (G.G. Chowdhury & S. Chowdhury, 2003; Liao et al., 2006).
Given these advantages, we investigate whether a DL can be made more intelligent and personalized based on user profiles. From our investigations and observations, most digital libraries provide basic search and navigation functions, which may be adequate for Internet content but are barely sufficient in the context of a digital library. In the DL context, we have continuously failed to take advantage of electronic contents using advanced content-processing techniques, and to make use of user profiles and usage history.
Automatic analysis and organization of contents based on user perspective and profile is desirable, since machine-readable electronic contents can be further processed and annotated in order to make better sense of them.
Generally, keyword search and navigation facilitate only entry-level access to DL content. Efficient access is possible only when we succeed in making use of heterogeneous information and facets under a unified framework. For example, (1) keyword-based search often produces an enormous number of irrelevant hits, and different users may target different items when searching with the same keyword. (2) When DL users sign up, they may inadvertently provide incomplete or inaccurate profile information about themselves. Moreover, (3) users’ profiles, information-seeking behaviour, and information needs change over time.
In this research, we augment user profiles adaptively using usage history and the DL user profile ontology. We also analyze DL contents and apply ontology-driven inference over content associations (e.g., topic relationships) to make topical inferences.
2.2 Recommendation System
Recommendation systems are widely used online (e.g., at Amazon.com) to suggest to users items they may like or find useful, and they have been popular since the mid-1990s. This section describes current recommender systems and some recommendation techniques.
A recommendation system designs, manages, and delivers content based on known, observed, and predictive information. Recommendation techniques match an individual’s personal preferences and habits, as recorded in a user profile, to make individual recommendations.
Several filtering techniques can be used in a recommendation system. Collaborative filtering and content-based recommending are two fundamental techniques that have been proposed for performing recommendation. Both techniques have their own advantages; however, there are many situations in which they do not perform well. To improve performance, various hybrid techniques have been considered.
2.2.1 Recommendation Techniques
Collaborative filtering technique
Collaborative-filtering-based recommendation systems have been designed and implemented since the early 1990s, and the technique has been shown to provide satisfying recommendations to users.
Collaborative filtering (CF) is one of the key techniques for implementing a recommender system: it recommends to a user a set of candidate items that may be preferable or useful to that user. We use a CF algorithm for our recommendations (Sarwar et al., 2001).
Collaborative filtering in general works as follows (see Figure 2.3). Take a set of users and a set of documents. As users interact with the system, they rate items (obtrusively or unobtrusively). These ratings are collected in a two-dimensional matrix (the first dimension being the set of users and the second the set of documents). A document that a user has not yet encountered is rated by computing a neighborhood of the rating being predicted and combining the ratings in that neighborhood.
Figure 2.3: The Collaborative Filtering process
There are two approaches within collaborative filtering for combining the ratings of different users into predicted ratings for unseen items: user-based collaborative filtering and item-based collaborative filtering.
a. User-based collaborative filtering first computes a neighborhood for the active user (i.e., the user for whom the prediction is made). This neighborhood consists of users who are similar to the active user and have rated the active item. The ratings of the users in this neighborhood for the active item are then combined into a predicted rating.
b. Item-based collaborative filtering (Sarwar et al., 2001) is conceptually very similar: we just switch the dimensions. Instead of computing a neighborhood of users, we compute a neighborhood of items similar to the active item (i.e., the item about which the prediction is made). Then we combine the ratings of the active user for the items in this neighborhood into a predicted rating. Note that the collaboration, that is, the use of other users’ profiles, is located in the similarity computation.
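The two variants can be contrasted in a short Python sketch. Everything here is an illustrative assumption rather than the thesis implementation: the ratings, the function names, the use of cosine similarity as the neighborhood weight, and the similarity-weighted average as the way of combining ratings.

```python
import math

# Hypothetical ratings: rows are users, columns are documents; 0 means "not rated".
RATINGS = {
    "u1": {"d1": 5, "d2": 3, "d3": 0},
    "u2": {"d1": 4, "d2": 3, "d3": 4},
    "u3": {"d1": 1, "d2": 5, "d3": 2},
}


def cosine(a, b):
    """Cosine of the angle between two equal-length rating vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def weighted_average(pairs):
    """Combine (similarity, rating) pairs into one predicted rating."""
    total = sum(abs(sim) for sim, _ in pairs)
    return sum(sim * r for sim, r in pairs) / total if total else 0.0


def predict_user_based(ratings, user, item):
    """Neighborhood = other users who rated `item`, weighted by user similarity."""
    other_items = [i for i in ratings[user] if i != item]
    mine = [ratings[user][i] for i in other_items]
    pairs = [
        (cosine(mine, [row[i] for i in other_items]), row[item])
        for name, row in ratings.items()
        if name != user and row[item] > 0
    ]
    return weighted_average(pairs)


def predict_item_based(ratings, user, item):
    """Neighborhood = other items the user rated, weighted by item similarity."""
    other_users = [u for u in ratings if u != user]
    target = [ratings[u][item] for u in other_users]
    pairs = [
        (cosine(target, [ratings[u][i] for u in other_users]), r)
        for i, r in ratings[user].items()
        if i != item and r > 0
    ]
    return weighted_average(pairs)


print(round(predict_user_based(RATINGS, "u1", "d3"), 2))
print(round(predict_item_based(RATINGS, "u1", "d3"), 2))
```

Both functions share the same combination step; only the dimension along which the neighborhood is formed differs, which mirrors the "switch the dimensions" remark in item b.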
GroupLens is an example of a collaborative system that uses the item-based technique by computing item-item similarity. GroupLens is a system for collaborative filtering of netnews that helps people find articles they will like in the huge stream of available articles. The item-based technique first analyzes the relationships between different items in a user-item matrix and then computes the recommendation for the user (Konstan et al., 1997).
TechLens is another example of an item-based collaborative filtering recommendation system. TechLens developed a generic DL recommendation model using the collaborative filtering approach by analyzing relationships between citations. The system uses the opinions of a community to recommend items to individuals: it recommends to a user items that the user’s neighbours may have voted on (Konstan et al., 1997).
Figure 2. 4: TechLens interface
Item Similarity computation
Cosine similarity (Sarwar et al., 2001)
Here the similarity is measured as the cosine of the angle between the two vectors. Formally, in the m x n ratings matrix, the similarity between users/items i and j is given by the following (see Figure 2.4 for an example of a cosine-based calculation):
$$\mathrm{sim}(i, j) = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \cdot \vec{j}}{|\vec{i}| \, |\vec{j}|}$$
             HP   LR
George        1    4
Henriette     4    1
sim(HP, LR) = 0.47
Figure 2. 5: Cosine-based example
User Similarity computation
To measure the similarity between two profiles, the user profiles are treated as two vectors in the m-dimensional user space. The similarity between them is measured by computing the cosine of the angle between these two vectors (Sarwar et al., 2001). Denoting the two profiles by prof_x and prof_y, the similarity is computed as:
$$\mathrm{Sim}(prof_x, prof_y) = \frac{prof_x \cdot prof_y}{|prof_x| \, |prof_y|} \qquad (1)$$
where "·" denotes the dot product of the two vectors.
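The cosine measure of equation (1) can be sketched in a few lines of Python. The function name is an illustrative assumption; the two vectors are the HP and LR columns of the cosine-based example (the ratings given by George and Henriette).

```python
import math


def cosine_similarity(x, y):
    """Return x . y / (|x| |y|), or 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


# Columns of the example: ratings by (George, Henriette) for HP and LR.
hp = [1, 4]
lr = [4, 1]
print(round(cosine_similarity(hp, lr), 2))  # 0.47
```

The dot product is 1·4 + 4·1 = 8 and both vectors have length √17, so the similarity is 8/17 ≈ 0.47, matching the value in the example.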
The CF engine computes the similarity and the recommendation score to make a recommendation. When making a recommendation, the CF engine uses the bookshelf in the form of an i x j matrix, as shown in Figure 2.9, where Bj denotes the j-th book and Ui denotes the i-th user. If user i adds book j to the bookshelf, the value of Ui,j is set to 1; if user i has not added book j to the bookshelf, the value of Ui,j is set to 0 (NIST/SEMATECH, 2007).
Usage-Pattern Analysis using Collaborative Filtering
Registered users of the digital library can create a personal bookshelf by adding bookmarks to it. We assume that different groups of people have different interests, while people in the same group tend to share interests. We can therefore classify users into groups by computing the similarity between users. Within a group, a book that has been read by user 1 may also interest user 2. Under this assumption, we can recommend items to people in the same group. We can also predict the rating of new items for a target user based on the user’s bookshelf and the ratings of similar users.
Figure 2. 6: The i x j Matrix used for recommendation
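A minimal sketch of how a binary bookshelf matrix could drive recommendations is given below. The users, books, and scoring rule (summing the cosine similarity of every other user who holds a book the target lacks) are assumptions made for illustration; the thesis does not spell out its exact scoring formula here.

```python
import math

# Hypothetical bookshelf matrix: BOOKSHELF[user][j] is 1 if the user
# added book BOOKS[j] to the personal bookshelf, else 0.
BOOKS = ["B1", "B2", "B3", "B4"]
BOOKSHELF = {
    "U1": [1, 1, 0, 0],
    "U2": [1, 1, 1, 0],
    "U3": [0, 0, 0, 1],
}


def cosine(x, y):
    """Cosine similarity between two bookshelf rows."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def recommend(bookshelf, target):
    """Score each book the target lacks by the similarity of users who have it."""
    scores = {}
    for user, row in bookshelf.items():
        if user == target:
            continue
        sim = cosine(bookshelf[target], row)
        for j, has_book in enumerate(row):
            if has_book and not bookshelf[target][j]:
                scores[BOOKS[j]] = scores.get(BOOKS[j], 0.0) + sim
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    # Books held only by users with zero similarity are not recommended.
    return [(book, score) for book, score in ranked if score > 0]


print(recommend(BOOKSHELF, "U1"))
```

In this toy data, U2 shares two books with U1, so U2's extra book B3 is recommended, while B4 is held only by the dissimilar user U3 and is filtered out.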
Content-Based filtering technique
In content-based techniques, the user model includes information about the content of items of interest, whether these are web pages, movies, music, or anything else. Using these items as a basis, the technique identifies similar items, which are returned as recommendations. These techniques might prove highly suitable for users who have specific interests and who are looking for related recommendations (Paulson & Tzanavari, 2003).
Several researchers use content-based filtering in recommendation systems. For example, Robin van Meteren and Maarten van Someren proposed PRES (Meteren & Someren, 2000), which uses content-based filtering techniques to suggest documents that are relevant to the user profile. The user profile is created from user feedback.
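As a rough illustration of content-based matching, the sketch below ranks documents by the cosine similarity between a term-weight profile and each document's term counts. This is a generic bag-of-words sketch, not the PRES algorithm; the profile weights, documents, and function names are hypothetical.

```python
import math
from collections import Counter

# Hypothetical profile: term weights learned from items the user liked.
PROFILE = {"ontology": 2.0, "library": 1.0, "digital": 1.0}

# Hypothetical documents, reduced to plain text for the sketch.
DOCUMENTS = {
    "doc_a": "ontology based services for a digital library",
    "doc_b": "volume rendering on unstructured grids",
}


def cosine(p, q):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * q.get(term, 0.0) for term, w in p.items())
    norm = math.sqrt(sum(w * w for w in p.values())) * math.sqrt(
        sum(w * w for w in q.values())
    )
    return dot / norm if norm else 0.0


def rank_documents(profile, documents):
    """Return (doc_id, score) pairs, best match to the profile first."""
    scored = [
        (doc_id, cosine(profile, Counter(text.lower().split())))
        for doc_id, text in documents.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


print(rank_documents(PROFILE, DOCUMENTS))
```

Unlike collaborative filtering, nothing here depends on other users: a document that shares no terms with the profile simply scores zero.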
Hybrid techniques
Hybrid techniques promise to combine the positive features of both content-based and social filtering methods, diminish their shortcomings, and thus produce a more robust system. The philosophy here is that the content of items is taken into consideration when identifying similar users for collaborative recommendation.
Zan Huang, Wingyan Chung et al. (Huang, Chung, Ong, & Chen, 2002) proposed a graph-based recommendation system for digital libraries that combines the content-based and collaborative approaches. The similarities between items, between items and users, and between users are computed from the features of items and users and assigned weights. The system then recommends the items with higher weights to the user.
2.3 Ontologies
Ontologies are the basis for rich, semantic descriptions of the content in the digital library. Our proposed system has two main ontology modules: the domain ontology and the user profile ontology.
2.3.1 Domain ontology
A domain ontology describes aspects that are specific to a particular domain and is used as a conceptual backbone for structuring the domain information provided in the information spaces. Such a domain ontology typically comprises conceptual relations, such as a topic hierarchy, but also richer taxonomic and non-taxonomic relations (Haase, Volker, & Sure, 2005).
Owing to the Semantic Web initiative, ontologies have received renewed interest and have found use in many practical applications. As explained earlier, a digital library is a focused collection of digital resources in a specific domain; we can use a domain ontology as a reference to make sense of contents (instances) in that domain by means of attributes and class relationships. For our experiments, we focused on the computing domain and developed our domain ontology based on the ACM Computing Classification System (hereafter, ACM-CCS). Our approach is similar to that used in (MIT Libraries, 2002). We have enhanced the original ACM-CCS taxonomy (Figure 2.6) with extra attributes such as hasKeywords (Figure 2.7). There are several ways to extract keywords automatically; we used the Keyphrase Extraction Tool (KEA) from the University of Waikato, New Zealand, which applies a machine-learning technique, to extract keywords for the ACM-CCS sub-topics.
Figure 2. 7: Example of ACM Computing Classification System
Figure 2.8: Schema for ACM-CCS Ontology based on CCS Taxonomy
Figure 2.9: Example of inference using ACM-CCS ontology (relation labels shown in the figure: inComposedBy, hasKeyword)
From the ACM-CCS ontology illustrated in Figure 2.6B, when a user searches for items in the digital library, we can use the hasKeyword attribute to infer the category of the item the user is searching for (most likely related to the user’s present context). Figure 2.7 gives an intuitive example of how topic inference is performed in our experiment. When a user searches with multiple keywords, or the user profile and usage pattern include one or more topics such as “Volume rendering” and “Unstructured grids”, our ontology-driven inference can identify the user’s context more precisely. The keyword “Volume rendering” appears in both the Graphic System and Hardware Architecture categories, whereas the keyword “Unstructured grids” is found only in the Hardware Architecture category. Therefore, the results should be biased towards the Hardware Architecture category for this particular user. The efficiency of searching in a digital library can be improved by using the domain ontology as well as the profile ontology, as explained below.
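The inference in this example amounts to counting, for each category, how many of the user's keywords it contains. The Python sketch below assumes a simple in-memory keyword-to-category index standing in for the hasKeyword attribute; the dictionary and function names are hypothetical, and only the two keywords and two categories named above are included.

```python
from collections import Counter

# Illustrative fragment of a keyword -> categories index, as could be derived
# from hasKeyword; the keywords and categories follow the example in the text.
KEYWORD_CATEGORIES = {
    "volume rendering": {"Graphic System", "Hardware Architecture"},
    "unstructured grids": {"Hardware Architecture"},
}


def infer_categories(keywords):
    """Count, per category, how many of the given keywords it contains."""
    votes = Counter()
    for keyword in keywords:
        for category in KEYWORD_CATEGORIES.get(keyword.lower(), ()):
            votes[category] += 1
    return votes.most_common()


print(infer_categories(["Volume rendering", "Unstructured grids"]))
```

With both keywords present, Hardware Architecture receives two votes and Graphic System one, which is the bias towards Hardware Architecture described in the text.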
2.3.2 User profile ontology
The classes defined in the system’s ACM Computing Classification System ontology are used for user profiles. The user profiles hold information about the interests of the digital library’s users; that is, a profile represents which categories a user is interested in. This profile knowledge drives the digital library system. Furthermore, our user profile ontology allows inferences via the is-a relationships defined in the domain ontology.
Typically, building user profiles requires questionnaires and interviews to acquire information about users’ requirements. A user profile consists of facts about a user and the user’s interests. A weight is calculated for each topic the user is interested in and kept in the profile ontology; this knowledge about the user improves the accuracy of searching in the digital library (see Figure 2.8 for an example of a user profile ontology) (Liao et al., 2006).
Figure 2. 10: User Profile Ontology
The profiles hold the topics of interest and their weights. We can infer topics of interest from the user ontology, so a user’s interest in a topic can be inferred even if the user has not browsed that topic explicitly. User profiles are recomputed every day. An interest value is assigned to an interest topic in the user profile when the user browses items, and 50% of that interest value is also assigned to the super-class of the topic (Liao et al., 2006; Middleton et al., 2001).
Since user interests change over time, using the user’s profile alone is not enough.
Analysis of user behavior has been used to improve recommender systems, and many
researchers have developed approaches that recommend items to users by learning user
behavior. Middleton et al. (Middleton et al., 2001) presented an approach that acquires
user profiles by unobtrusive monitoring of browsing behavior and the application of
supervised machine-learning techniques, coupled with an ontological representation, to
extract user preferences. Similarly, David M Nichols et al. (Pennock, Horvitz,
Lawrence, & Giles, 2000) suggested using usage data for recommendation in
digital libraries. Since a person normally goes through a series of stages in finding out
about an item, the ‘Discovery model’ illustrates user activity and response to an item.
Joana Trajkova et al. (Trajkova & Gauch, 2003) focused on identifying users’ interests
without user interaction (‘Ontology-Based user profile without the user
interaction’). They built a user profile by collecting user data (URL, date, time spent
on the page) via a proxy server and then matching the documents to the user profile in a
pre-existing ontology.
However, the accuracy of profiling relies on a sufficient history of user behavior. This
approach is ineffective if users have little interaction with the system. Knowing
the user’s domains of interest can reduce this gap. T.Jonathan Lau and Autin J.Wang
(Lau & Wang, 2006) presented a technique that infers a user’s interests
from demographic information such as profession, religion,
ethnicity, and age. They used the Open Mind Common Sense database to generate user
profiles and applied them in content recommendation.
User profiling is used not only to match items to users but also to improve
searching and browsing in digital libraries. Kruk (Kruk, 2005) presented JeromeDL,
which focuses especially on exploiting personal profile information based on the
Semantic Web for searching in digital libraries. The Jerome ontology, modified from
DublinCore metadata, permits structured values as well as
additional definitions of keywords and catalog classifications (domains of interest) of
resources. Each keyword concept is connected to other concepts with the properties
hyponyms, synonyms, homonyms, semantic fields, and categorization. Each concept
has a list of often-used lexical variants of the word, which helps to obtain the stem for
the concept when the user provides a different variant of it. The search algorithm
implemented in JeromeDL was designed to support such queries: a query should
return items whose descriptions do not directly contain the required values, and the
meaning of the values provided in the query should be resolved in the context of the
user’s interests.
2.4 Single Exponential Smoothing
Time is also an important factor for the interest value because, in general, the items
that users like and dislike change over time. For example, a 15-year-old girl may be
interested in items such as teenage fashion, cakes, and pets. When she becomes
20 years old, she may instead be interested in cosmetics, adult fashion, cooking, and
so on. The user profile should be aware of these changes. Therefore, the interest value
of items that the user browsed or rated recently should be higher than the interest
value of items that the user browsed or rated a long time ago. Then, when users search
for items in the digital library, the results will better match their current
interests (Yang & Li, 2005).
As noted above, changes in users' interests are related to time. Since recent
observations are given relatively more weight in forecasting than older
observations, we use the Single Exponential Smoothing method, which assigns
exponentially decreasing weights, to forecast future values for each topic for the
users in our system (NIST/SEMATECH, 2007).
2.4.1 The basic idea of the Exponential smoothing
The basic idea of exponential smoothing is that values from more recent time periods
have more impact on the forecast; the weighting factors in our system therefore decrease
exponentially with age. We calculate the smoothed value using the following formula
(NIST/SEMATECH, 2007).
$$S_t = \alpha\, y_{t-1} + (1 - \alpha)\, S_{t-1}, \qquad 0 < \alpha \le 1, \quad t \ge 3 \tag{1}$$
where $t$ = any time period
$S_t$ = the smoothed value
$\alpha$ = smoothing constant
$y_{t-1}$ = the current observation
When the smoothing constant (α) is close to 1, the weights decrease quickly; when the
smoothing constant (α) is close to 0, the weights decrease slowly. We then define
a method to find a suitable smoothing constant (α) for each user depending on
usage history. (The parameter values in this experiment were chosen heuristically.)
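The recurrence above can be sketched in a few lines of Python. This is a minimal illustration rather than the prototype's code: the function name is hypothetical, and the initial value S_2 = y_1 is an assumption (one common initialisation), since the text does not state how the first smoothed value is chosen.

```python
def single_exponential_smoothing(observations, alpha):
    """Return the smoothed series S_2, S_3, ..., S_{n+1} for y_1, ..., y_n.

    Assumption: the series is initialised with S_2 = y_1. Every later value
    follows S_t = alpha * y_{t-1} + (1 - alpha) * S_{t-1}.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must satisfy 0 < alpha <= 1")
    if not observations:
        return []
    smoothed = [observations[0]]  # S_2 = y_1 (assumed initialisation)
    for y in observations[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed


# One interaction followed by three periods without any interaction.
history = [1.0, 0.0, 0.0, 0.0]
half = single_exponential_smoothing(history, alpha=0.5)   # [1.0, 0.5, 0.25, 0.125]
fast = single_exponential_smoothing(history, alpha=0.9)
slow = single_exponential_smoothing(history, alpha=0.1)
print(half, fast[-1], slow[-1])
```

In this sketch the smoothed value of a topic that is no longer observed falls much faster with α = 0.9 than with α = 0.1, which matches the behaviour of the smoothing constant described above.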
First, we define the parameter $f_i$ as:
$$f_i = \frac{N_i}{\mathit{MaxN}} \tag{2}$$
where $f_i$ = frequency of system usage for user $i$
$N_i$ = number of transactions for user $i$
$\mathit{MaxN}$ = the highest number of transactions among all users
Table 2.1: Sample of parameter fi in relation to parameter t
From the parameter fi, we can then find an appropriate value of the smoothing constant
for each user.
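As a small worked example of equation (2), the usage frequency can be computed directly from per-user transaction counts. The user names and counts below are invented for illustration, and the mapping from fi to a smoothing constant is deliberately left out because the text does not specify it.

```python
def usage_frequency(transactions_per_user):
    """Return f_i = N_i / MaxN for every user in the mapping."""
    max_n = max(transactions_per_user.values())
    if max_n == 0:
        raise ValueError("at least one user must have a transaction")
    return {user: n / max_n for user, n in transactions_per_user.items()}


# Hypothetical transaction counts N_i for three users.
frequencies = usage_frequency({"user1": 10, "user2": 5, "user3": 20})
print(frequencies)
```

The most active user always receives fi = 1, and every other user receives a value between 0 and 1.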
2.4.2 Assigning interest value
An interest value can be assigned to each interest topic by many events. For example,
when a user browses a book, the interest value is 1. If a user rates the interest
topic, the interest value is 0.9. If a user follows an item that the system
recommends, the interest value of the topic is 1. If a user adds an item to their
bookshelf, the interest value of the topic is 1. Every time an interest
value is assigned to a topic, the system also assigns 50% of that value to its
super-class, and this value is likewise adjusted by the time function (Middleton
et al., 2001).
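The event values and the 50% share for the super-class can be sketched as follows. The event names, the function name, and the one-entry is-a table are hypothetical and serve only to illustrate the rule; the numeric values are the ones given in the text, and the time function is not applied here.

```python
# Interest value per event, as listed in the text (event names are hypothetical).
EVENT_VALUES = {
    "browse": 1.0,
    "rate": 0.9,
    "follow_recommendation": 1.0,
    "add_to_bookshelf": 1.0,
}
SUPER_CLASS_SHARE = 0.5  # 50% of the value goes to the super-class

# Hypothetical fragment of the is-a hierarchy: topic -> super-class.
SUPER_CLASS = {"Database Management": "Information Systems"}


def interest_values(event, topic, super_class=SUPER_CLASS):
    """Return the interest values produced by one event on one topic."""
    value = EVENT_VALUES[event]
    assigned = {topic: value}
    parent = super_class.get(topic)
    if parent is not None:
        assigned[parent] = SUPER_CLASS_SHARE * value
    return assigned


print(interest_values("rate", "Database Management"))
```

A topic without a known super-class simply receives its own value and nothing is propagated.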
Chapter 3
Implementation Detail
3.1 Temporal Changes in User Interest (User Interest-drift)
In this section, we discuss the model of our proposed system. Figure 3.1
depicts the system overview, which is composed of the user profile, usage pattern, and
digital library ontology. When users register with our system, they can rate the
interest topics, and the weights are then calculated. At the end of each day, user
similarity and the value of each interest topic are also calculated. The weight of each
topic is reduced by 50% of its current weight after every period of time (different
users have different periods, depending on how often they use
the system). An item that has been in the system longer therefore has less importance
than a newer one. As a result, when user_j searches with the same keyword at different
times, t_0 and t_1, the results differ. Likewise, when different users
(user_i and user_j) search with the same keyword at time t_0, they also get different
results.
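The periodic 50% reduction can be written as a halving per elapsed period. The sketch below is an assumption about how such a rule might be applied: the function name and the period lengths are hypothetical, and the text does not say how a user's period is derived from their usage.

```python
def decayed_weight(weight, days_elapsed, period_days):
    """Halve `weight` once for every full period that has elapsed."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    full_periods = days_elapsed // period_days
    return weight * 0.5 ** full_periods


# Hypothetical periods: 7 days for one user, 30 days for another.
print(decayed_weight(1.0, days_elapsed=14, period_days=7))   # two periods -> 0.25
print(decayed_weight(1.0, days_elapsed=14, period_days=30))  # no full period -> 1.0
```

Under this rule the same topic weight, observed after the same number of days, differs between users whose periods differ, which is one way the same keyword can lead to different rankings.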
Figure 3.1: Overview of Adaptive and Intelligent Service features in Digital
Libraries. (a) Horizontal Axis illustrates that search results for a particular user using
the same keyword at different times yield different retrieval results. (b) Vertical Axis
illustrates that search results for different users based on the same keyword may yield
different retrieval results.
3.2 System overview
In order to evaluate the effectiveness of our approach, we implemented a personalized
digital library system prototype. The overall architecture is based on dynamically
generated web pages which display the results from a "back end" computational
subsystem. Data for these computational layers is stored in a database.
There are many factors involved in determining the architecture of a digital library. In
making this decision, one must determine how the system will be used and who the
users will be. We want the system to be robust and scalable, but we also have to face
the reality of a limited budget. We also want the system to be available to users
throughout the network. Based on an analysis of the use of a system such as ours, we
decided that a web application would be the ideal architecture for our digital library.
We are using the Tomcat server to host our Web application. Tomcat is an application
server from the Apache Software Foundation that executes Java Servlets and renders
Web pages that include Java Server Page coding. Described as a "reference
implementation" of the Java Servlet and the Java Server Page specifications, Tomcat
is the result of an open collaboration of developers and is available from the Apache
Web site in both binary and source versions. Tomcat can be used as either a
standalone product with its own internal Web server or together with other Web
servers, including Apache, Netscape Enterprise Server, Microsoft Internet
Information Server (IIS), and Microsoft Personal Web Server. Tomcat requires a Java
Runtime Enterprise Environment that conforms to JRE 1.1 or later. Tomcat is one of
several open source collaborations that are collectively known as Jakarta.
Any recommendation system also requires a database in order to store information
about items, users, and the ratings of users on items. For the database, MySQL was
chosen, since it is freely available, fast, and well documented.
This structure is described below and illustrated by Figure 3.2.
Figure 3.2: an overview of the software infrastructure used in the prototype.
3.3 System architecture
3.3.1. Access log
Access log files are files that contain a list of actions that have occurred in a system.
Analysis of these files can not only tell who, what, when and where but how
information in the system was sought and used. Access log file analysis is a
quantitative method that is used to monitor usage of systems and gain an
understanding of users in many domains. By tracking users’ interactions with the
application in a log file, it is possible to collect useful information that can be used to
assess what the main interests of the users are. In this way, we are able to obtain
implicit feedback and to extract needs for changes to the ontology to improve the
interaction with the application.
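To make the idea of implicit feedback concrete, the sketch below counts how often each user touches each topic in an access log. The log layout (tab-separated timestamp, user, action, topic), the sample entries, and the function name are all assumptions made for illustration; the prototype's actual log format is not reproduced here.

```python
from collections import Counter, defaultdict

# Hypothetical log lines: timestamp, user, action, topic (tab-separated).
SAMPLE_LOG = """\
2007-03-01 10:15:02\tuser1\tbrowse\tHardware Architecture
2007-03-01 10:17:40\tuser1\tsearch\tGraphic System
2007-03-01 11:02:13\tuser2\tbrowse\tGraphic System
2007-03-02 09:30:55\tuser1\tbrowse\tHardware Architecture
"""


def topic_counts(log_text):
    """Count, per user, how many logged actions refer to each topic."""
    counts = defaultdict(Counter)
    for line in log_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        _timestamp, user, _action, topic = line.split("\t")
        counts[user][topic] += 1
    return counts


counts = topic_counts(SAMPLE_LOG)
print(counts["user1"].most_common(1))
```

The most frequently accessed topic per user is a simple first indicator of the user's main interest.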
Figure 3.1: Example of access log
3.3.2. ACM Computing Classification Ontology
We use the ACM Computing Classification System ontology to represent the
document classification in our digital library. We have enhanced the original ACM-
CCS taxonomy with extra attributes such as hasKeywords. We learn user interests
from the keywords that users query and from the keyword paths in the ontology.
Figure 3.2: Example of ACM Computing Classification System Ontology
3.3.3. User Profiles
Gathering the information that is thought to be of interest to the user is one of the
most difficult tasks in digital library system development. A user profile consists
of the user's account details and areas of interest. Information about a user’s interests
describes the user’s information needs in terms of information types, types of
contents, and short-term and long-term interests.
Figure 3.3: Example of user similarity in the database
After the initial profile has been built in the digital library, we update the profiles
without user interaction by automatically monitoring the user’s browsing habits. Interest
profiles are computed daily by correlating previously browsed books with their
classification. User profiles thus hold a set of topics and the interest values for those
topics for each day of the trial. Ontological relationships between topics of interest are
used to infer other topics of interest, which might not have been browsed explicitly.
Figure 3.4: Screen shot of a sample user profile
3.3.4. User Machine with Web Browser
A Web user can access our digital library through a Java-enabled Web browser. Most
modern Web browsers, such as Microsoft Internet Explorer, Mozilla, and Netscape,
support Java.
Chapter 4
Experimental Result and Evaluation
Our prototype is a recommendation system that allows users to explore the book
collection. When a user searches for a book, the system looks for books in the user's
interest topics and, at the same time, looks for books browsed by similar users
who searched for the same interest topic, and it recommends books that the user has not
tried before. Since our system learns user interests, which can change over time, the
items that it recommends match the user's current interests.
4.1 Experimental Setup and Result Analysis
In order to demonstrate the effectiveness and the efficiency of our proposed approach,
a few experiments are conducted.
4.1.1 User Similarity
We generated random profiles for 10 users and used this information to find the
similarity between each pair of users. Figure 4.1 shows the user profiles. The value ‘1’
means that the user is interested in that category; the value ‘0’ means that the user is
not interested in that category. After calculating the similarity between two users
using (3), we can find the nearest neighbor of user_i. Figure 4.2 displays the result of
the user similarity calculation.
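A nearest-neighbor search over binary profiles can be sketched as below. Cosine similarity is used here purely as an assumption, since equation (3) is not reproduced in this section and the thesis may use a different measure; the profiles and function names are likewise hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length binary interest vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no interests is similar to nobody
    return dot / (norm_a * norm_b)


def nearest_neighbour(user, profiles):
    """Return the other user whose profile is most similar to `user`."""
    others = (name for name in profiles if name != user)
    return max(others, key=lambda name: cosine_similarity(profiles[user], profiles[name]))


# Hypothetical profiles: 1 = interested in the category, 0 = not interested.
PROFILES = {
    "user1": [1, 0, 1, 1],
    "user2": [1, 0, 1, 0],
    "user3": [0, 1, 0, 0],
}
print(nearest_neighbour("user1", PROFILES))
```

Here user1 shares two categories with user2 and none with user3, so user2 is returned as the nearest neighbor.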
Figure 4.1: Examples of user profiles
Figure 4.2: Examples of user profiles
Figure 4.3: Example of JAVA code for calculating user similarity
As described in the previous section, recommendations from the collaborative method
depend on user similarity. This method therefore tends to produce unexpected
findings, owing to the information shared between similar users.
4.1.2 Single Exponential Smoothing
We generated a range of α values to find an appropriate α for each pattern of user
behaviour. From our experiments, we found that when α is small, the weight reduces
faster than when α is large. We can therefore apply this to each user: if a user uses
the system often, the system should assign a small α to them; if a user does not use the
system often, the system should assign a large α to them.
[Plot: weight (vertical axis, 0 to 1) against days (horizontal axis), with one curve each
for alpha = 0.5, 0.4, 0.2, 0.1, and 0.05]
Figure 4.4: Graphical representation of the relation between weight and α
The weight of a topic for user_i is calculated using (1). From fig. 9 we can see that
the weight decreases or increases depending on the user's actions.
Table 4.1: A sample weight calculation for a user.
In Figure 4.5, we can see that the weight of each user interest can decrease or increase
every day depending on the history.
[Plot: weight (vertical axis, 0 to 1.2) against days 1 to 10 (horizontal axis) for a
sample user]
Figure 4.2: Weight of a topic for a user
Our prototype demonstrates that the same query issued by a particular user at
different times may yield different result sets (rankings), since
the query, profile, and contents are dynamically enhanced using intelligent algorithms.
We also demonstrate that the same query from different users may yield different
result sets, as justified by the differences in their profiles and content relationships.
Our prototype can also recommend digital library items to users by using
collaborative filtering-based usage pattern analysis. We tested our approach using a
partially synthetic dataset and analyzed the results through human judgments.
Chapter 5
Conclusions and Future Work
In this thesis, we presented a novel approach to associating contents with user
profiles and usage patterns and to modelling interest-drift in digital libraries. We
validated our approach with synthetic data. However, we are currently setting up the
digital library with real contents and users to put our system into practical use. An
experimental evaluation showed that our approach provides better-personalized services
for a digital library. Two different users searching with the same query obtain different
results. When the same user searches with the same query at different times, they also
get different results.
For future work, we will carry out a systematic evaluation based on a real digital
library.
References
Chowdhury, G. G., & Chowdhury, S. (2003). Introduction to digital libraries (359
ed.). London: Facet Publishing.
Haase, P., Volker, J., & Sure, Y. (2005). Management of dynamic knowledge.
Journal of Knowledge Management, 9(5), p. 97-107.
Huang, Z., Chung, W., Ong, T. H., & Chen, H. (2002, June 14-18, 2002). A Graph-
based Recommender System for Digital Library. Paper presented at the
ACM/IEEE Joint Conference on Digital Libraries, JCDL 2002, Portland,
Oregon, USA. Retrieved 2006/04/15, from
Konstan, J., Kapoor, N., McNee, S., & Butler, J. (2005). TechLens: Exploring the Use
of Recommenders to Support Users of Digital Libraries. In Fall 2005 Task
Force Meeting. Phoenix, AZ:(Coalition for Networked Information).
Konstan, J., Miller, B., Maltz, D., Herlocker, J., Gordon, L., & Riedl, J. (1997).
GroupLens: Applying Collaborative Filtering to Usenet News.
Communications of the ACM, 40(3), p. 77-87.
Kruk, S. 2005, personal communication, 10/1/2007,2005.
Liao, I., Liao, S., Kao, K., & Harn, I. (2006, November 27-30, 2006). A Personal
Ontology Model for Library Recommendation System. Paper presented at the
9th International Conference on Asian Digital Libraries,ICADL 2006, Kyoto,
Japan.
Lau, T. J., & Wang, A. (2006). AAA: a Profiling and Recommendation System.
Cambridge, MA 02142 USA:(Laboratory of Computer Science Cambridge).
Meteren, R. V., & Someren, M. V. (2000, May, 2000). Using Content-Based Filtering
for Recommendation Paper presented at the MLnet / ECML2000 Workshop,
Barcelona, Spain.
Middleton, S. E., Roure, D. C., & Shadbolt, N. R. (2001, 15 April 2002). Capturing
knowledge of user preferences: ontologies in recommender systems. Paper
presented at the Proceedings of the International Conference on Knowledge
Capture (KCAP'01), Victoria, British Columbia, Canada.
MIT Libraries (2002, 05/25/2006). DSpace. Retrieved 2/2/2007. 2007, from
http://www.dspace.org/
New Zealand Digital Library Project, U. o. W. (2000). Greenstone Library Software.
Retrieved 05/25/2006. 2006, from http://www.greenstone.org/
NIST/SEMATECH (2007, 7/18/2006). NIST/SEMATECH e-Handbook of Statistical
Methods. Retrieved 10/02/2007. 2007, from
http://www.itl.nist.gov/div898/handbook
Paulson, P., & Tzanavari, A. (2003). Combining collaborative and content-based
filtering using conceptual graphs. In Lecture Notes In Artificial Intelligence
Series.
Pennock, D., Horvitz, E., Lawrence, S., & Giles, C. (2000). Collaborative Filtering by
Personality Diagnosis: A Hybrid Memory- and Model-Based Ap. Paper
presented at the Sixteenth Conference on Uncertainty in Artificial
Intelligence (UAI-2000), Stanford, CA.
Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001). Item-based Collaborative
Filtering Recommendation Algorithms. In The Tenth International World
Wide Web Conference (WWW10) (pp. 285-295). Hong Kong.
Trajkova, J., & Gauch, S. (2003). Improving Ontology-Based User Profiles. Thesis.
University of Kansas. http://citeseer.ist.psu.edu/trajkova03improving.html.
Yang, Y., & Li, J. Z. (2005). Interest-based Recommendation in Digital Library.
Journal of Computer Science, 1, 40-46.
Biography
Name: Yenruedee Chanwirawong
Date of Birth: November 25, 1979
Place of Birth: Bangkok, Thailand
Institutions Attended:
May 1998 – April 2002 Bachelor of Information Science with First Class
Honours in Management Information System
Walailak University
Nakhonsithammarat, Thailand
June 2004 –March 2007 Master of Science in Technology (MSIT)
Shinawatra University
Bangkok, Thailand
Position and Office: Mainframe Developer, Cetelem (Thailand) Company
Limited.
Home Address: 160/260 Changarkat Uthit Road, Sikan, Donmueng,
Bangkok Thailand
E-mail: [email protected]