Towards Intelligent and Adaptive Digital Library Services


TOWARDS INTELLIGENT AND ADAPTIVE DIGITAL LIBRARY SERVICES

By

Yenruedee Chanwirawong

SIU THE: SOT-MSIT-2007-01


A Thesis Presented

By

Yenruedee Chanwirawong

Master of Science in Information Technology

School of Technology

Shinawatra University

June 2007

Copyright of Shinawatra University

Title: Towards Intelligent and Adaptive Digital Library Services

By: Yenruedee Chanwirawong

Program: Master of Science in Information Technology

Advisor: Dr. Md Maruf Hasan

Academic Year: 2006


The Thesis is Accepted by the School of Technology, Shinawatra University in Partial Fulfillment of the Requirements for the Degree of Master of Science in Information Technology.

................................................ Acting Dean, School of Technology

(Asst. Prof. Dr. Prinya Tantaswadi)

Committee:

................................................ Advisor

(Dr. Md Maruf Hasan)

................................................ Committee

(Assoc. Prof. Dr. Ekawit Nantajeewarawat)

................................................ Committee

(Asst. Prof. Dr. Chutiporn Anutariya)

................................................ External Examiner

(Dr. Kazuhiro Takeuchi)

June 2007


Acknowledgments

I would like to express my sincere gratitude to Dr. Maruf Hasan for the

support he has provided me during the entire course of my graduate studies at

Shinawatra University. I have benefited a lot from his guidance, on scholarly as well

as spiritual levels, and my work would never have been a success without his

encouragement.

I am thankful to Assoc. Prof. Dr. Ekawit Nantajeewarawat for helping me get

started with this project, and correcting my course as necessary. I am also thankful to

Asst. Prof. Dr. Chutiporn Anutariya for her valuable suggestions towards improving

my thesis, and for helping me defend it.

I thank all members of MSIT. The interactions I have had with many of you over the last two years have been wonderful learning opportunities for me. I thank Kritsada Klaimak for helping me with several aspects of this work. I wish you all very successful careers ahead.

Last but not least, I am grateful to my parents, whose love and encouragement have made this work possible. I would not be who I am today without their blessings.


Abstract

A digital library is a collection of documents in organized electronic form, accessible via search and browsing interfaces. Depending on the specific library, a user may be able to access magazine articles, books, papers, images, sound files, and videos. The availability of content, user profiles and usage patterns in a digital library in machine-understandable formats paves the way for processing this information further with state-of-the-art technologies to introduce intelligent services in the digital library. Analyzing, annotating and organizing content based on a domain ontology (Middleton, Roure, & Shadbolt, 2001) gives us the ability to make topical inferences through content relationships. A user's rudimentary or incomplete profile can also be augmented with the help of a profile ontology and usage patterns. Furthermore, we can use sophisticated mathematical models to process usage-pattern data, both for making content recommendations using collaborative filtering (Sarwar, Karypis, Konstan, & Riedl, 2001) and for modeling interest drift in users. Serving digital library users with the right information, which best reflects their query, profile, usage history and content relationships, is only possible when we consider the above issues in a unified manner.

In this research, we use an open-source digital library system. We develop and integrate add-on ontologies and the necessary algorithms to demonstrate intelligent services in the digital library. Our prototype demonstrates that the same query issued by a particular user at different times may yield different result sets (rankings), since the query, profile and contents are dynamically enhanced using intelligent and adaptive algorithms. We also demonstrate that the same query from different users may yield different result sets, as justified by the differences in their profiles and usage patterns. Our prototype can also recommend digital library items to users through collaborative-filtering-based usage-pattern analysis. We test our approach using a partially synthetic dataset and analyze the results through human judgments.

Keywords: Digital library, Recommender system, Collaborative filtering, Usage pattern, Ontology


Table of Contents

Acknowledgments

Abstract

Table of Contents

List of Figures

Chapter 1 Introduction

1.1 Motivation

1.2 Thesis outline

Chapter 2 Material and Methodology

2.1 Digital Library Services

2.2 Recommendation System

2.2.1 Recommendation Techniques

2.3 Ontologies

2.3.1 Domain ontology

2.3.2 User profile ontology

2.4 Single Exponential Smoothing

2.4.1 The basic idea of the Exponential smoothing

2.4.2 Assigning interest value

2.5 Usage-Pattern Analysis using Collaborative Filtering

2.6 Temporal Changes in User Interest (User Interest-drift)

Chapter 3 Implementation Detail

3.1 System overview

3.2 The components of our digital library

3.2.1. Access log

3.2.2. ACM Computing Classification Ontology

3.2.3. User Profiles

3.2.4. User Machine with Web Browser

Chapter 4 Experimental Result and Evaluation

4.1 Experimental Setup and Result Analysis

4.1.1 User Similarity

4.1.2 Single Exponential Smoothing

Chapter 5 Conclusions and Future Work

References

Biography

List of Figures

Figure 2.1: Example of source contents in a digital library

Figure 2.2: Examples of the Greenstone Digital Library interface

Figure 2.3: The Collaborative Filtering process

Figure 2.4: Cosine-based example

Figure 2.5: TechLens interface

Figure 2.6: Example of ACM Computing Classification System

Figure 2.7: Schema for ACM-CCS Ontology based on CCS Taxonomy

Figure 2.8: Example of inference using ACM-CCS ontology

Figure 2.9: User Profile Ontology

Figure 2.10: The i × j matrix used for recommendation

Figure 2.11: Overview of Adaptive and Intelligent Service features in Digital Libraries. (a) The horizontal axis illustrates that searches by a particular user with the same keyword at different times yield different retrieval results. (b) The vertical axis illustrates that searches by different users with the same keyword may yield different retrieval results.

Figure 3.1: An overview of the software infrastructure used in the prototype

Figure 3.2: Example of access log

Figure 3.3: Example of ACM Computing Classification System Ontology

Figure 3.4: Example of user similarity in the database

Figure 3.5: Screen shot of a sample user profile

Figure 3.6: Log-in page

Figure 3.7: Digital library prototype homepage

Figure 3.8: Example of search result from our digital library

Figure 4.1: Examples of user profiles

Figure 4.2: Examples of user profiles

Figure 4.3: Example of Java code for calculating user similarity

Figure 4.4: Graphical representation of the relation between weight and α

List of Tables

Table 2.1: Sample values of parameter fi in relation to parameter t

Table 4.1: A sample weight calculation for a user

Table 4.2: Weight of topic for a user


Chapter 1

Introduction

1.1 Motivation

A digital library is a collection of documents in organized electronic form, accessible via search and browsing interfaces. The availability of content, user profiles and usage patterns in a digital library in machine-understandable formats paves the way for processing this information further with state-of-the-art technologies to introduce intelligent services in the digital library. Analyzing, annotating and organizing content based on a domain ontology (Middleton et al., 2001) gives us the ability to make topical inferences through content relationships. A user's rudimentary or incomplete profile can also be augmented with the help of a profile ontology and usage patterns. Furthermore, we can use sophisticated mathematical models to process usage-pattern data, both for making content recommendations using collaborative filtering (Sarwar et al., 2001) and for modeling interest drift in users. Serving digital library users with the right information, which best reflects their query, profile, usage history and content relationships, is only possible when we consider the above issues in a unified manner. There has been intensive research and development on digital libraries over the last decades. However, much of this research has not concentrated on intelligent digital library services in a unified manner (Liao, Liao, Kao, & Harn, 2006). In this thesis, we present a unified approach towards developing adaptive and intelligent digital library services.

The majority of digital library researchers have focused on providing access to diverse digital information resources through so-called searching and browsing interfaces (G.G. Chowdhury & S. Chowdhury, 2003). Searching interfaces range from basic keyword search to field-specific advanced search. Browsing interfaces include categorical navigation based on a taxonomy and on metadata, such as browsing by author or by category. We argue that what makes a digital library unique is the availability of content in electronic form (which can be processed automatically, so that inferences can be made), together with the availability of user profiles and usage patterns. Google-like keyword search or a Yahoo-like directory, while workable for the WWW, is not adequate for harnessing information in the context of a digital library. Therefore, we make use of DL content, user profiles and usage patterns, and develop the necessary algorithms to facilitate intelligent and adaptive digital library services in a unique fashion.

Using our approach, we successfully demonstrate that the same query issued by a particular user at different points in time may yield different result sets (rankings), since the query, profile and contents are dynamically enhanced using intelligent and adaptive algorithms. We also demonstrate that the same query from different users may yield different result sets, as justified by the differences in their profiles and usage patterns. Our prototype can also recommend digital library items to users through collaborative-filtering-based usage-pattern analysis.

1.2 Thesis outline

Chapter 2 introduces digital libraries, gives examples of their use, and discusses the limitations of current digital libraries. It also discusses recommendation techniques and the integration of digital libraries with recommender systems, and explores the domain ontology and user-profile ontology used in our system. Single Exponential Smoothing, which we use to decrease interest weights over time in the digital library, is discussed in this chapter as well. Chapter 3 specifies the architecture and components of the prototype system. Chapter 4 presents the evaluations performed to measure the effectiveness of our approach. Finally, Chapter 5 presents the conclusions of the thesis and identifies areas in need of further research.


Chapter 2

Material and Methodology

2.1 Digital Library Services

Digital libraries (DL) have become a major part of the mainstream library landscape. Content for digital libraries usually requires manipulation by several tools, each with its own benefits and drawbacks. In this section, current digital library systems, tools for digital libraries, and the advantages and limitations of current digital libraries are discussed.

As G.G. Chowdhury & S. Chowdhury state, a digital library is an assemblage of digital computing, storage, and communications machinery together with the content and software needed to reproduce, emulate, and extend the services provided by conventional libraries based on paper and other material means of collecting, cataloging, finding, and disseminating information. A full-service digital library must accomplish all essential services of traditional libraries and also exploit the well-known advantages of digital storage, searching, and communication (G.G. Chowdhury & S. Chowdhury, 2003).

The contents of a digital library can be databases of text, numbers, graphics, sound, video, etc. In general, DL contents are organized to make information accessible in particular, well-defined ways, and good digital libraries include a description of how the information is organized (G.G. Chowdhury & S. Chowdhury, 2003).

Figure 2.1: Example of source contents in a digital library

Several digital library software packages are available. For example, the New Zealand Digital Library project, presently called the Greenstone Digital Library (New Zealand Digital Library Project, 2000), is a research program at the University of Waikato. Greenstone is a software suite for building, maintaining and distributing digital library collections. It provides full-text and fielded search; flexible browsing facilities; metadata-based (Dublin Core), collection-specific and hierarchical phrase browsing; and automatic creation of all access structures (see Figure 2.2 below for an example of the Greenstone interface).

Figure 2.2: Examples of the Greenstone Digital Library interface

Compared with their traditional counterparts, digital libraries offer many advantages. Users of a digital library need not go to the library physically; people from all over the world can access the same information as long as they are connected to the network, and they can do so at any time. Unlike physical media, digital library content can be used by many users at the same time. Lastly, DL users can use advanced electronic facilities such as keyword search to explore information digitally and efficiently (G.G. Chowdhury & S. Chowdhury, 2003; Liao et al., 2006).

Given these advantages, we investigate whether digital libraries can be made more intelligent and personalized based on user profiles. From our investigations and observations, most digital libraries provide basic search and navigation functions, which may be adequate for Internet content but are hardly sufficient in the context of a digital library. In the DL context, we continue to fail to take advantage of electronic content through advanced content-processing techniques, and we also fail to make use of user profiles, usage history, etc.

However, automatic analysis and organization of content from the user's perspective and profile is desirable, since machine-readable electronic content can be further processed and annotated in order to make better sense of it.

Generally, keyword search and navigation facilitate only entry-level access to DL content. Efficient access to DL content is only possible when we succeed in making use of heterogeneous information and facets under a unified framework. For example: (1) keyword-based search often produces an enormous number of irrelevant hits, and different users may target different items when searching with the same keyword; (2) when DL users sign up, they may inadvertently provide incomplete or inaccurate profile information about themselves; and (3) users' profiles, information-seeking behaviour, information needs, etc. change over time.

In this research, we augment user profiles adaptively using usage history and a DL user-profile ontology. We also analyze DL content and apply ontology-driven inference over content associations (e.g., topic relationships) to make topical inferences.

2.2 Recommendation System

Recommendation systems are widely used online (e.g., at Amazon.com) to suggest items that users may like or find useful. They have become popular since the mid-1990s. In this section, current recommender systems and some recommendation techniques are described.

A recommendation system designs, manages and delivers content based on known, observed and predicted information. Recommendation techniques match an individual's personal preferences and habits, as recorded in a user profile, to make individual recommendations.

Several filtering techniques can be used in a recommendation system. Collaborative filtering and content-based recommending are the two fundamental techniques that have been proposed for performing recommendation. Both techniques have their own advantages; however, each performs poorly in many situations. To improve performance, various hybrid techniques have been considered.

2.2.1 Recommendation Techniques

Collaborative filtering technique

Several collaborative-filtering-based recommendation systems have been designed and implemented since the early 1990s, and collaborative filtering techniques have been shown to provide satisfactory recommendations to users.

Collaborative filtering (CF) is one of the key techniques for implementing a recommender system that recommends to a user a set of candidate items that may be preferable or useful to the user. We use a CF algorithm (Sarwar et al., 2001) for our recommendations.

Collaborative filtering in general works as follows (see Figure 2.3). Take a set of users and a set of documents. As users interact with the system, they rate items, either obtrusively (explicitly) or unobtrusively (implicitly). These ratings are collected in a two-dimensional matrix, the first dimension being the set of users and the second the set of documents. Ratings for documents a user has not yet encountered are predicted by computing a neighborhood of the point for the rating being predicted and combining the ratings in that neighborhood, typically as a similarity-weighted average.

Figure 2.3: The Collaborative Filtering process
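The neighborhood-based prediction just described can be made concrete with a short Java sketch (Java is chosen because the prototype's own similarity code, shown in Chapter 4, is also Java). It is a minimal user-based variant under our own assumptions, not the prototype's code: ratings are stored in a dense matrix with users as rows and documents as columns, 0 marks an unrated document, user similarity is the cosine of two rating rows, and the prediction is a similarity-weighted average over the k most similar users who rated the document. The class and method names (`UserBasedCf`, `predict`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Illustrative user-based collaborative filtering (hypothetical names). */
public class UserBasedCf {

    /** Cosine of the angle between two rating rows; unrated entries are 0. */
    public static double cosine(double[] a, double[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int k = 0; k < a.length; k++) {
            dot += a[k] * b[k];
            na += a[k] * a[k];
            nb += b[k] * b[k];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /**
     * Predicts the active user's rating for the active item as a similarity-weighted
     * average over the k most similar users who rated that item. Returns 0 when no
     * neighbor with positive similarity exists.
     */
    public static double predict(double[][] ratings, int user, int item, int k) {
        List<Integer> raters = new ArrayList<>();
        for (int v = 0; v < ratings.length; v++) {
            if (v != user && ratings[v][item] > 0) {
                raters.add(v);
            }
        }
        // Most similar users first.
        raters.sort(Comparator.comparingDouble(
                (Integer v) -> cosine(ratings[user], ratings[v])).reversed());
        double num = 0, den = 0;
        for (int n = 0; n < Math.min(k, raters.size()); n++) {
            int v = raters.get(n);
            double sim = cosine(ratings[user], ratings[v]);
            if (sim <= 0) {
                break; // the remaining users are no more similar than this one
            }
            num += sim * ratings[v][item];
            den += sim;
        }
        return den == 0 ? 0 : num / den;
    }

    public static void main(String[] args) {
        // Rows are users, columns are documents, 0 = not yet rated.
        double[][] ratings = {
            {5, 4, 0, 1},
            {4, 5, 3, 1},
            {1, 1, 5, 5},
        };
        System.out.printf(Locale.ROOT, "Predicted rating: %.2f%n", predict(ratings, 0, 2, 2));
    }
}
```

With k = 2 the prediction for user 0 on document 2 falls between the two neighbors' ratings (3 and 5) and leans towards 3, because the user who gave that rating has the more similar rating row.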


There are two approaches within collaborative filtering for combining the ratings of different users into predicted ratings for unseen items: user-based collaborative filtering and item-based collaborative filtering.

a. User-based collaborative filtering first computes a neighborhood for the active user (i.e., the user for whom the prediction is made). This neighborhood consists of users who are similar to the active user and have rated the active item. The ratings of the users in this neighborhood for the active item are then combined into a predicted rating.

b. Item-based collaborative filtering (Sarwar et al., 2001) is conceptually very similar: we simply switch the dimensions. Instead of computing a neighborhood of users, we compute a neighborhood of items similar to the active item (i.e., the item about which the prediction is made). We then combine the active user's ratings for the items in this neighborhood into a predicted rating. Note that the collaboration, that is, the use of other users' profiles, lies in the similarity computation.
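Switching the dimensions as described gives an item-based variant. The Java sketch below is again illustrative and rests on assumed conventions (dense matrix, 0 = unrated, plain cosine over item columns, similarity-weighted average over the active user's k most similar rated items); Sarwar et al. (2001) also consider correlation-based and adjusted-cosine similarity, which are not shown. `ItemBasedCf` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

/** Illustrative item-based collaborative filtering (hypothetical names). */
public class ItemBasedCf {

    /** Cosine similarity between item columns i and j of the user-item matrix. */
    public static double itemCosine(double[][] r, int i, int j) {
        double dot = 0, ni = 0, nj = 0;
        for (double[] row : r) { // every user's ratings contribute here
            dot += row[i] * row[j];
            ni += row[i] * row[i];
            nj += row[j] * row[j];
        }
        return (ni == 0 || nj == 0) ? 0 : dot / (Math.sqrt(ni) * Math.sqrt(nj));
    }

    /**
     * Predicts the active user's rating for the active item from that user's own
     * ratings of the k items most similar to it (similarity-weighted average).
     */
    public static double predict(double[][] r, int user, int item, int k) {
        List<Integer> rated = new ArrayList<>();
        for (int j = 0; j < r[user].length; j++) {
            if (j != item && r[user][j] > 0) {
                rated.add(j);
            }
        }
        // Items most similar to the active item first.
        rated.sort(Comparator.comparingDouble(
                (Integer j) -> itemCosine(r, item, j)).reversed());
        double num = 0, den = 0;
        for (int n = 0; n < Math.min(k, rated.size()); n++) {
            int j = rated.get(n);
            double sim = itemCosine(r, item, j);
            if (sim <= 0) {
                break;
            }
            num += sim * r[user][j];
            den += sim;
        }
        return den == 0 ? 0 : num / den;
    }

    public static void main(String[] args) {
        double[][] ratings = {
            {5, 4, 0, 1},
            {4, 5, 3, 1},
            {1, 1, 5, 5},
        };
        System.out.printf(Locale.ROOT, "Predicted rating: %.2f%n", predict(ratings, 0, 2, 2));
    }
}
```

Other users enter only through `itemCosine`, which matches the remark that the collaboration lies in the similarity computation. On this toy matrix the item-based and user-based predictions for the same cell differ noticeably; the two variants need not agree.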

GroupLens is an example of a collaborative system that uses the item-based technique by computing item-item similarity. GroupLens is a system for collaborative filtering of netnews that helps people find articles they will like in the huge stream of available articles. The item-based technique first analyzes the relationships between different items in a user-item matrix and then computes recommendations for the user (Konstan et al., 1997).

TechLens is another example of an item-based collaborative filtering recommendation system. TechLens developed a generic DL recommendation model using the collaborative filtering approach by analyzing relationships between citations. The system uses the opinions of a community to recommend items to individuals: it recommends to a user items that his or her neighbours may have voted on (Konstan et al., 1997).

Figure 2.4: TechLens interface

Item Similarity computation

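Cosine similarity (Sarwar et al., 2001)

Here the similarity is measured as the cosine of the angle between two vectors. Formally, in the m × n ratings matrix, the similarity between items (or users) i and j, represented by their rating vectors $\vec{i}$ and $\vec{j}$, is given by:

$$\mathrm{sim}(i, j) = \cos(\vec{i}, \vec{j}) = \frac{\vec{i} \cdot \vec{j}}{\lVert \vec{i} \rVert \, \lVert \vec{j} \rVert}$$

(See Figure 2.4 for an example of the cosine-based calculation.)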

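Figure 2.5: Cosine-based example

In this example, George rated HP 1 and LR 4, and Henriette rated HP 4 and LR 1. Treating HP and LR as the vectors (1, 4) and (4, 1) over these two users, sim(HP, LR) = 8 / (√17 · √17) = 8/17 ≈ 0.47.

User Similarity computation

To measure the similarity between two profiles, user profiles are treated as two vectors in the m-dimensional user space, and the similarity between them is measured by computing the cosine of the angle between the two profiles (Sarwar et al., 2001). For two profiles prof_x and prof_y, the cosine of the angle between them is computed as:

$$\mathrm{sim}(\mathit{prof}_x, \mathit{prof}_y) = \frac{\mathit{prof}_x \cdot \mathit{prof}_y}{\lVert \mathit{prof}_x \rVert \, \lVert \mathit{prof}_y \rVert} \qquad (1)$$

where "·" denotes the dot product of the two vectors and ‖·‖ denotes the Euclidean length of a vector.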
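Equation (1) translates directly into Java. The sketch below (class and method names are hypothetical) computes the cosine similarity of two equal-length vectors and reproduces the HP/LR example; returning 0 when either vector is all zeros is our own convention, since the cosine is undefined there.

```java
import java.util.Locale;

public class CosineSimilarity {

    /** sim(x, y) = (x . y) / (|x| |y|), following equation (1); 0 if either vector is all zeros. */
    public static double cosine(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0, nx = 0, ny = 0;
        for (int k = 0; k < x.length; k++) {
            dot += x[k] * y[k];
            nx += x[k] * x[k];
            ny += y[k] * y[k];
        }
        if (nx == 0 || ny == 0) {
            return 0; // undefined for a zero vector; 0 is our convention
        }
        return dot / (Math.sqrt(nx) * Math.sqrt(ny));
    }

    public static void main(String[] args) {
        // Each item is a vector of ratings by (George, Henriette).
        double[] hp = {1, 4};
        double[] lr = {4, 1};
        System.out.printf(Locale.ROOT, "sim(HP, LR) = %.2f%n", cosine(hp, lr)); // sim(HP, LR) = 0.47
    }
}
```

The same method serves for user profiles: pass two profile vectors of equal length instead of two item vectors.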

The CF engine computes the similarity and the recommendation score to make a recommendation. When making a recommendation, the CF engine uses the bookshelf in the form of an i × j matrix, as shown in figure 2.9, where B_j denotes the j-th book and U_i denotes the i-th user. If user i adds book j to his or her bookshelf, the value of U_{i,j} is set to 1; otherwise, U_{i,j} is set to 0 (NIST/SEMATECH, 2007).
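The binary bookshelf matrix can be derived from bookmark records as in the following Java sketch. The input format, a list of (user index, book index) pairs, and the names `BookshelfMatrix` and `build` are assumptions for illustration, not the prototype's actual data model.

```java
import java.util.Arrays;
import java.util.List;

public class BookshelfMatrix {

    /** U[i][j] = 1 if user i has added book j to the bookshelf, otherwise 0. */
    public static int[][] build(int numUsers, int numBooks, List<int[]> bookmarks) {
        int[][] u = new int[numUsers][numBooks]; // Java initializes every entry to 0
        for (int[] b : bookmarks) {
            int user = b[0], book = b[1]; // each record is {userIndex, bookIndex}
            u[user][book] = 1;
        }
        return u;
    }

    public static void main(String[] args) {
        List<int[]> bookmarks = List.of(new int[] {0, 0}, new int[] {0, 2}, new int[] {1, 2});
        System.out.println(Arrays.deepToString(build(2, 3, bookmarks))); // [[1, 0, 1], [0, 0, 1]]
    }
}
```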

Usage-Pattern Analysis using Collaborative Filtering

Registered users of the digital library can create a personal bookshelf by adding bookmarks to it. Different groups of people have different interests, while people in the same group are likely to share interests. We can therefore classify users into groups by computing the similarity between users. Within a group, books read by one user may also interest another user. Under this assumption, we can recommend items to people in the same group. We can also predict the ratings of new items for a target user based on that user's bookshelf and the ratings of similar users.

Figure 2.6: The i × j matrix used for recommendation
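One simple way to act on this grouping assumption with the binary bookshelf matrix is sketched below in Java: compute the cosine similarity between the target user's row and every other user's row, score each book the target has not shelved by the summed similarity of the users who did shelve it, and return the top-scoring books. This scoring rule is our illustrative assumption rather than the prototype's documented formula, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class BookshelfRecommender {

    /** Cosine similarity of two binary bookshelf rows (0 if either row is empty). */
    public static double cosine(int[] a, int[] b) {
        double dot = 0, na = 0, nb = 0;
        for (int j = 0; j < a.length; j++) {
            dot += a[j] * b[j];
            na += a[j] * a[j];
            nb += b[j] * b[j];
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    /** Score of each unshelved book = sum of similarities of the other users who shelved it. */
    public static double[] scores(int[][] u, int target) {
        double[] score = new double[u[target].length];
        for (int v = 0; v < u.length; v++) {
            if (v == target) {
                continue;
            }
            double sim = cosine(u[target], u[v]);
            if (sim <= 0) {
                continue; // users with no shared books do not contribute
            }
            for (int j = 0; j < score.length; j++) {
                if (u[target][j] == 0 && u[v][j] == 1) {
                    score[j] += sim;
                }
            }
        }
        return score;
    }

    /** Indices of up to n books with a positive score, best first. */
    public static List<Integer> recommend(int[][] u, int target, int n) {
        double[] s = scores(u, target);
        List<Integer> books = new ArrayList<>();
        for (int j = 0; j < s.length; j++) {
            if (s[j] > 0) {
                books.add(j);
            }
        }
        books.sort((x, y) -> Double.compare(s[y], s[x]));
        return books.subList(0, Math.min(n, books.size()));
    }

    public static void main(String[] args) {
        int[][] u = {
            {1, 1, 0, 0, 0}, // target user
            {1, 1, 1, 0, 0}, // shares two books with the target
            {1, 0, 0, 1, 0}, // shares one book
            {0, 0, 0, 0, 1}, // shares none
        };
        System.out.println(recommend(u, 0, 2)); // [2, 3]
    }
}
```

Book 4, shelved only by a user with no overlap, receives no score, so this rule cannot recommend outside the target user's neighborhood.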

Content-based filtering technique

In content-based techniques, the user model includes information about the content of items of interest, whether these are web pages, movies, music, or anything else. Using these items as a basis, the technique identifies similar items, which are returned as recommendations. These techniques can be highly suitable for users who have specific interests and are looking for related recommendations (Paulson & Tzanavari, 2003).

Several researchers use content-based filtering in recommendation systems. For example, Robin van Meteren and Maarten van Someren proposed PRES (Meteren & Someren, 2000), which uses content-based filtering techniques to suggest documents relevant to a user profile. The user profile is created from user feedback.
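As a minimal content-based sketch in Java, items can be described by keyword sets, a user model can be formed from the keywords of items the user liked, and unseen items can be ranked by their keyword overlap with that model. We use Jaccard overlap purely for illustration; this is not the method of PRES or of any other system cited here, and the names and toy keywords are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ContentBasedSketch {

    /** Jaccard overlap: size of intersection / size of union (0 if both sets are empty). */
    public static double jaccard(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0;
        }
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        return (double) inter.size() / union.size();
    }

    /** Builds a keyword profile from liked items, then ranks the other items by overlap with it. */
    public static List<String> recommend(Map<String, Set<String>> itemKeywords,
                                         Set<String> liked, int n) {
        Set<String> profile = new HashSet<>();
        for (String id : liked) {
            profile.addAll(itemKeywords.getOrDefault(id, Set.of()));
        }
        List<String> candidates = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : itemKeywords.entrySet()) {
            if (!liked.contains(e.getKey()) && jaccard(e.getValue(), profile) > 0) {
                candidates.add(e.getKey());
            }
        }
        // Highest overlap with the profile first.
        candidates.sort(Comparator.comparingDouble(
                (String id) -> jaccard(itemKeywords.get(id), profile)).reversed());
        return candidates.subList(0, Math.min(n, candidates.size()));
    }

    public static void main(String[] args) {
        Map<String, Set<String>> items = Map.of(
                "d1", Set.of("volume rendering", "gpu"),
                "d2", Set.of("volume rendering", "unstructured grids"),
                "d3", Set.of("databases", "sql"),
                "d4", Set.of("gpu", "shaders", "cuda"));
        System.out.println(recommend(items, Set.of("d1"), 3)); // [d2, d4]
    }
}
```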

Hybrid techniques

Hybrid techniques promise to combine the positive features of content-based and social (collaborative) filtering methods, diminish their shortcomings, and thus produce a more robust system. The philosophy here is that the content of items is taken into consideration when identifying similar users for collaborative recommendation.

Zan Huang, Wingyan Chung et al. (Huang, Chung, Ong, & Chen, 2002) proposed a graph-based recommendation system for digital libraries that combines the content-based and collaborative approaches. Similarities between items, between items and users, and between users are computed from the features of items and users and assigned as weights; the items with higher weights are then recommended to the user.
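Huang et al.'s graph-based method does not fit in a short example. To illustrate the general hybrid idea only, the Java sketch below implements a simpler, different technique: a weighted hybrid that linearly blends a collaborative score and a content-based score for each item with a weight λ (`lambda`). The two input score maps and the value of λ are assumptions; λ would typically be tuned on held-out data.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class WeightedHybrid {

    /**
     * hybrid(item) = lambda * cf(item) + (1 - lambda) * content(item).
     * An item missing from one score map contributes 0 from that source.
     */
    public static Map<String, Double> blend(Map<String, Double> cf,
                                            Map<String, Double> content,
                                            double lambda) {
        if (lambda < 0 || lambda > 1) {
            throw new IllegalArgumentException("lambda must lie in [0, 1]");
        }
        Set<String> items = new TreeSet<>(cf.keySet());
        items.addAll(content.keySet());
        Map<String, Double> hybrid = new TreeMap<>();
        for (String id : items) {
            hybrid.put(id, lambda * cf.getOrDefault(id, 0.0)
                    + (1 - lambda) * content.getOrDefault(id, 0.0));
        }
        return hybrid;
    }

    public static void main(String[] args) {
        Map<String, Double> cf = Map.of("d2", 0.8, "d4", 0.2);      // e.g. neighbor-based scores
        Map<String, Double> content = Map.of("d2", 0.3, "d5", 0.9); // e.g. keyword-overlap scores
        System.out.println(blend(cf, content, 0.5));
    }
}
```

Items known to only one source (d4, d5 above) still receive a score, which is one way a hybrid can soften the coverage gaps of each individual method.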

2.3 Ontologies

Ontologies are the basis for rich, semantic descriptions of the content in a digital library. In this research, we identify two main modules in our proposed system: the domain ontology and the user profile ontology.

2.3.1 Domain ontology

A domain ontology describes aspects that are specific to a particular domain and is used as a conceptual backbone for structuring the domain information provided in information spaces. Such a domain ontology typically comprises conceptual relations such as a topic hierarchy, but also richer taxonomic and non-taxonomic relations (Haase, Volker, & Sure, 2005).


Owing to the Semantic Web initiative, ontologies have received renewed interest and found use in many practical applications. As explained earlier, a digital library is a focused collection of digital resources in a specific domain; we can therefore use a domain ontology as a reference for making sense of the contents (instances) in that domain by means of attributes and class relationships. For our experiments, we focused on the computing domain and developed our domain ontology based on the ACM Computing Classification System (hereafter, ACM-CCS). Our approach is similar to the one used in (MIT Libraries, 2002). We have enhanced the original ACM-CCS taxonomy (Figure 2.6) with extra attributes such as hasKeyword (Figure 2.7). There are several ways to extract keywords automatically; we used the Keyphrase Extraction Algorithm (KEA) from the University of Waikato, New Zealand, which applies a machine-learning technique, to automatically extract keywords for ACM-CCS sub-topics.

Figure 2.7: Example of ACM Computing Classification System

Figure 2.8: Schema for ACM-CCS Ontology based on CCS Taxonomy

Figure 2.9: Example of inference using ACM-CCS ontology (relations shown: inComposedBy, hasKeyword)

Using the ACM-CCS ontology as illustrated in Figure 2.6B, when a user searches for items in the digital library, we can make inferences through the hasKeyword attribute and thereby determine the category of the items the user is searching for (most likely related to the user’s present context). In Figure 2.7, we give an intuitive example of how topic inference is performed in our experiment. When a user searches with multiple keywords, or the user profile and usage pattern include one or more topics such as “Volume rendering” and “Unstructured grids”, our ontology-driven inference can identify the user’s context more precisely. The keyword “Volume rendering” appears in both the Graphics Systems and Hardware Architecture categories, whereas the keyword “Unstructured grids” is found only in the Hardware Architecture category. Therefore, the results should be biased towards the Hardware Architecture category for this particular user. The efficiency of searching in the digital library can be improved by using the domain ontology as well as the profile ontology, as explained below.
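The keyword-based topic inference in this example can be sketched as a simple vote: each category that lists a query keyword through hasKeyword receives one vote, and the category with the most votes is taken as the user's likely context. The Java sketch below assumes the ontology has been loaded into a map from category name to lower-cased keywords; the toy map covers only the two categories and two keywords from the example, and the class and method names are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.TreeMap;

public class TopicInference {

    /** For each category, counts how many query keywords it lists via hasKeyword. */
    public static Map<String, Integer> votes(Map<String, Set<String>> hasKeyword,
                                             List<String> queryKeywords) {
        Map<String, Integer> count = new TreeMap<>();
        for (String kw : queryKeywords) {
            String k = kw.toLowerCase(Locale.ROOT);
            for (Map.Entry<String, Set<String>> e : hasKeyword.entrySet()) {
                if (e.getValue().contains(k)) {
                    count.merge(e.getKey(), 1, Integer::sum);
                }
            }
        }
        return count;
    }

    /** The category matched by the most query keywords, if any keyword matched. */
    public static Optional<String> mostLikely(Map<String, Set<String>> hasKeyword,
                                              List<String> queryKeywords) {
        return votes(hasKeyword, queryKeywords).entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // Toy excerpt mirroring the "Volume rendering" / "Unstructured grids" example.
        Map<String, Set<String>> hasKeyword = Map.of(
                "Graphics Systems", Set.of("volume rendering"),
                "Hardware Architecture", Set.of("volume rendering", "unstructured grids"));
        List<String> query = List.of("Volume rendering", "Unstructured grids");
        System.out.println(votes(hasKeyword, query)); // {Graphics Systems=1, Hardware Architecture=2}
        System.out.println(mostLikely(hasKeyword, query).orElse("none")); // Hardware Architecture
    }
}
```

Rather than discarding Graphics Systems, a ranker could use the vote counts as soft weights, which matches the text's wording that results should be biased towards, not restricted to, Hardware Architecture.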

2.3.2 User profile ontology

The classes defined in the system’s ACM Computing Classification System ontology are used for user profiles. The user profiles hold information about the interests of the users of the digital library system; that is, a user profile records which categories a user is interested in. This profile knowledge drives the digital library system. Furthermore, our user profile ontology allows inferences via the is-a relationships defined in the domain ontology.


Typically, building user profiles requires questionnaires and interviews with users to acquire information about their requirements. A user profile consists of facts about a user and his or her interests. A weight for each topic the user is interested in is calculated and kept in the profile ontology, and this knowledge about the user improves the accuracy of searching in the digital library (see Figure 2.8 for an example of the user profile ontology; Liao et al., 2006).

Figure 2.10: User Profile Ontology

The profiles hold the interest topics and their weights. We can infer interest topics from the user ontology. Using the user profile, we can infer a user's interest in topics that the user has not browsed explicitly. User profiles are recomputed every day. The interest value of an interest topic is assigned when the user browses items. In addition, 50% of that interest value is assigned to the superclass of that topic (Liao et al., 2006; Middleton et al., 2001).
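The superclass rule above can be sketched as follows. Each time a topic receives an interest value, its direct superclass in the is-a hierarchy receives half of that value. As a result, a topic the user never browsed can still acquire interest. This is a minimal Java sketch, not the prototype's code. The class name, the child-to-parent map and the ACM-CCS codes in the example are illustrative assumptions. Following the text, propagation stops at the direct superclass and does not continue further up the hierarchy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the rule "assign 50% of a topic's interest value to its superclass".
class InterestPropagation {

    static final double SUPERCLASS_SHARE = 0.5; // share passed to the direct superclass

    private final Map<String, String> superclassOf; // topic -> direct superclass (is-a)
    private final Map<String, Double> interest = new HashMap<>();

    InterestPropagation(Map<String, String> superclassOf) {
        this.superclassOf = superclassOf;
    }

    // Adds the value to the topic and half of it to the topic's direct superclass, if any.
    void addInterest(String topic, double value) {
        interest.merge(topic, value, Double::sum);
        String parent = superclassOf.get(topic);
        if (parent != null) {
            interest.merge(parent, value * SUPERCLASS_SHARE, Double::sum);
        }
    }

    // Current interest in a topic; 0.0 if the topic has never received any.
    double interestIn(String topic) {
        return interest.getOrDefault(topic, 0.0);
    }
}
```

For example, with I.3.3 (Picture/Image Generation) declared as a subclass of I.3 (Computer Graphics), browsing an I.3.3 item with value 1.0 also gives I.3 an inferred interest of 0.5.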

Since user interests change over time, relying on the user profile alone is not enough. Analyzing user behavior has been used to improve recommender systems, and many researchers have developed approaches that recommend items to users by learning from their behavior. Middleton et al. (Middleton et al., 2001) presented an approach that acquires user profiles by unobtrusively monitoring browsing behavior. It then applies supervised machine-learning techniques, coupled with an ontological representation, to extract user preferences. Similarly, David M. Nichols et al. (Pennock, Horvitz, Lawrence, & Giles, 2000) suggested using usage data for recommendation in digital libraries. A person normally goes through a series of stages in finding out about an item, so their "Discovery model" describes user activities and responses to an item. Trajkova and Gauch (Trajkova & Gauch, 2003) focused on identifying user interests without user interaction, a technique they call "Ontology-Based user profile without the user interaction". They built user profiles by collecting user data (URL, date, and time spent on the page) via a proxy server. They then matched documents to the user profile within a pre-existing ontology.

However, the accuracy of such profiling relies on a sufficient history of user behavior, so this approach is ineffective for users who interact little with the system. Knowing the user's domains of interest can reduce this gap. T. Jonathan Lau and Austin J. Wang (Lau & Wang, 2006) presented a technique that infers a user's interests from demographic information such as profession, religion, ethnicity, and age. They used the Open Mind Common Sense database to generate user profiles and applied them to content recommendation.

User profiling is used not only to match items to users but also to improve searching and browsing in digital libraries. Kruk (Kruk, 2005) presented JeromeDL, which focuses on exploiting personal profile information based on the Semantic Web for searching in digital libraries. The JeromeDL ontology is modified from Dublin Core metadata. It permits structured values and adds definitions of keywords and catalog classifications (domains of interest) of resources. Each keyword concept is connected to other concepts through the properties hyponyms, synonyms, homonyms, semantic fields, and categorization. Each concept also has a list of frequently used lexical variants of the word. This helps to find the stem of the concept when the user provides a different variant of it. The search algorithm implemented in JeromeDL was designed to support queries that return items whose descriptions do not directly contain the required values. The meaning of the values provided in the query is resolved in the context of the user's interests.

2.4 Single Exponential Smoothing

Time is also an important factor for the interest value, because users' likes and dislikes change over time. For example, a 15-year-old girl may be interested in items such as teen fashion, cakes, and pets. When she turns 20, she may be interested in other things, such as cosmetics, adult fashion, and cooking. A user profile should account for such changes. Therefore, items that the user browsed or rated recently should receive higher interest values than items browsed or rated long ago. As a result, when users search for items in the digital library, the results match their current interests more closely (Yang & Li, 2005).

As noted above, changes in users' interests are related to time. Recent observations should be given relatively more weight in forecasting than older observations. We therefore use the Single Exponential Smoothing method, which assigns exponentially decreasing weights to past observations, to forecast the future interest value of each topic for each user in our system (NIST/SEMATECH, 2007).

2.4.1 The basic idea of exponential smoothing

The basic idea of exponential smoothing is that values from more recent time periods have more impact on the forecast. The weighting factors in our system therefore decrease exponentially with the age of an observation. We calculate the smoothed value using the following formula (NIST/SEMATECH, 2007):

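$$S_t = \alpha\, y_{t-1} + (1 - \alpha)\, S_{t-1}, \qquad 0 < \alpha \le 1,\quad t \ge 3 \qquad (1)$$

where $t$ = any time period
$S_t$ = the smoothed value
$\alpha$ = the smoothing constant
$y_{t-1}$ = the current observation

The recursion is initialized with $S_2 = y_1$, which is why it applies for $t \ge 3$.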

When the smoothing constant α is close to 1, the weights of older observations decrease quickly. When α is close to 0, they decrease slowly. We therefore define a method to choose an appropriate α for each user based on that user's usage history. (The parameter values in this experiment were chosen heuristically.)
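Formula (1) can be sketched in a few lines of Java. This minimal sketch is not the prototype's code, and the class and method names are hypothetical. It starts from $S_2 = y_1$, applies the recursion to the remaining observations, and returns the forecast for the next period.

```java
// Minimal sketch of single exponential smoothing:
// S_t = alpha * y_{t-1} + (1 - alpha) * S_{t-1}, initialised with S_2 = y_1.
class SingleExponentialSmoothing {

    static void checkAlpha(double alpha) {
        if (alpha <= 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must satisfy 0 < alpha <= 1");
        }
    }

    // One smoothing step: combine the latest observation with the previous smoothed value.
    static double update(double previousSmoothed, double observation, double alpha) {
        checkAlpha(alpha);
        return alpha * observation + (1 - alpha) * previousSmoothed;
    }

    // Smooths observations y_1..y_n (stored in y[0..n-1]) and returns S_{n+1},
    // i.e. the forecast for the next period.
    static double forecast(double[] y, double alpha) {
        checkAlpha(alpha);
        if (y.length == 0) {
            throw new IllegalArgumentException("at least one observation is required");
        }
        double s = y[0]; // S_2 = y_1
        for (int i = 1; i < y.length; i++) {
            s = update(s, y[i], alpha); // S_{i+2} = alpha * y_{i+1} + (1 - alpha) * S_{i+1}
        }
        return s;
    }
}
```

Take a topic that was browsed only on the first day (y = {1, 0, 0}). With α = 0.5 the forecast is 0.25, while with α = 0.1 it is 0.81. A larger α makes an old observation fade faster.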

First, we define the parameter $f_i$ as:

$$f_i = \frac{N_i}{\mathit{MaxN}} \qquad (2)$$

where $f_i$ = the frequency with which user i uses the system
$N_i$ = the number of transactions of user i
$\mathit{MaxN}$ = the highest number of transactions among all users

Table 2.1: Sample of parameter fi related to parameter t

From the parameter fi, we can then find an appropriate smoothing constant for each user.
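Equation (2) normalizes each user's transaction count by that of the most active user. A minimal Java sketch follows; the class and method names are hypothetical. The heuristic mapping from $f_i$ to a smoothing constant is not specified in this section, so the sketch stops at $f_i$.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of equation (2): f_i = N_i / MaxN.
class UsageFrequency {

    // transactionsPerUser: user id -> N_i. Returns user id -> f_i in [0, 1].
    static Map<String, Double> frequencies(Map<String, Integer> transactionsPerUser) {
        int maxN = 0; // MaxN: highest number of transactions among all users
        for (int n : transactionsPerUser.values()) {
            maxN = Math.max(maxN, n);
        }
        Map<String, Double> f = new HashMap<>();
        for (Map.Entry<String, Integer> e : transactionsPerUser.entrySet()) {
            // Guard against division by zero when no user has any transaction.
            f.put(e.getKey(), maxN == 0 ? 0.0 : (double) e.getValue() / maxN);
        }
        return f;
    }
}
```

For instance, with 40, 20 and 10 transactions for three users, their frequencies are 1.0, 0.5 and 0.25.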

2.4.2 Assigning interest value

An interest value can be assigned to an interest topic by several kinds of events. For example, when a user browses a book, the interest value is 1. If the user rates the interest topic, the interest value is 0.9. If the user follows an item that the system recommends, the value of the interest topic is 1. If the user adds an item to his or her bookshelf, the value of the interest topic is 1. Each time an interest value is assigned to a topic, the system also assigns 50% of that value to the topic's superclass. This value is then processed by the time function as well (Middleton et al., 2001).
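The event values listed above can be kept in a small lookup type. This minimal Java sketch is not the prototype's code, and the enum and constant names are hypothetical. It returns the value given to a topic for each event and the 50% share given to the topic's superclass. The subsequent time-based smoothing is not shown here.

```java
// Hypothetical sketch: interest values assigned per user event (Section 2.4.2).
enum InterestEvent {
    BROWSE(1.0),
    RATE_TOPIC(0.9),
    FOLLOW_RECOMMENDATION(1.0),
    ADD_TO_BOOKSHELF(1.0);

    private final double value;

    InterestEvent(double value) {
        this.value = value;
    }

    // Interest value assigned to the topic itself.
    double topicValue() {
        return value;
    }

    // Interest value assigned to the topic's superclass (50% of the topic value).
    double superclassValue() {
        return value * 0.5;
    }
}
```

For example, rating a topic gives it 0.9 and gives its superclass 0.45.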


Chapter 3

Implementation Detail

3.1 Temporal Changes in User Interest (User Interest-drift)

In this section, we discuss the model of our proposed system. Figure 3.1 depicts the system overview, which is composed of the user profile, the usage pattern, and the digital library ontology. When users register with our system, they can rate their interest topics, and the weights are then calculated. At the end of each day, the user similarity and the values of the interest topics are also calculated. The weight of each topic is reduced by 50% of its current weight once every period of time. The length of this period differs from user to user, depending on how often each user uses the system. As a result, the longer an item has existed in the system, the less important it is compared with new ones. Consequently, when user j searches with the same keyword at two different times, t0 and t1, the results may differ. Likewise, when different users (user i and user j) search with the same keyword at time t0, they may also get different results.
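The periodic halving rule above can be sketched as a function of the elapsed time and a per-user period. This minimal Java sketch is not the prototype's code. Its names are hypothetical, and it assumes that elapsed time is measured in whole days and that only completed periods count.

```java
// Hypothetical sketch of the Section 3.1 rule: a topic weight loses 50% of its current
// value once per user-specific period, so users with longer periods keep interests longer.
class InterestDecay {

    // Weight after elapsedDays, halved once for every completed period of periodDays.
    static double decayedWeight(double weight, int elapsedDays, int periodDays) {
        if (periodDays <= 0) {
            throw new IllegalArgumentException("periodDays must be positive");
        }
        if (elapsedDays < 0) {
            throw new IllegalArgumentException("elapsedDays must not be negative");
        }
        int halvings = elapsedDays / periodDays; // completed periods only
        return weight * Math.pow(0.5, halvings);
    }
}
```

With a 3-day period, a weight of 1.0 is still 1.0 after 2 days and drops to 0.25 after 7 days. A user with a 7-day period would still have 0.5 after 7 days.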

Figure 3.1: Overview of Adaptive and Intelligent Service features in Digital Libraries. (a) The horizontal axis illustrates that searches by a particular user with the same keyword at different times yield different retrieval results. (b) The vertical axis illustrates that searches by different users with the same keyword may yield different retrieval results.


3.2 System overview

In order to evaluate the effectiveness of our approach, we implemented a personalized

digital library system prototype. The overall architecture is based on dynamically

generated web pages which display the results from a "back end" computational

subsystem. Data for these computational layers is stored in a database.

There are many factors involved in determining the architecture of a digital library. In

making this decision, one must determine how the system will be used and who the

users will be. We want the system to be robust and scalable, but we also have to face

the reality of a limited budget. We also want the system to be available to users

throughout the network. Based on an analysis of the use of a system such as ours, we

decided that a web application would be the ideal architecture for our digital library.

We use the Tomcat server to host our Web application. Tomcat is an application server from the Apache Software Foundation that executes Java Servlets and renders Web pages that include JavaServer Pages (JSP) code. Tomcat is described as a "reference implementation" of the Java Servlet and JavaServer Pages specifications. It is the result of an open collaboration of developers and is available from the Apache Web site in both binary and source versions. Tomcat can be used either as a standalone product with its own internal Web server or together with other Web servers. These include Apache, Netscape Enterprise Server, Microsoft Internet Information Server (IIS), and Microsoft Personal Web Server. Tomcat requires a Java Runtime Environment (JRE), and the minimum Java version depends on the Tomcat release. Tomcat originated as one of several open source projects collectively known as Jakarta.

Any recommendation system also requires a database to store information about items, users, and the users' ratings of items. MySQL was chosen as the database because it is freely available, fast, and well documented.

This structure is described below and illustrated in Figure 3.2.


Figure 3.2: an overview of the software infrastructure used in the prototype.

3.3 System architecture

3.3.1. Access log

Access log files contain a list of the actions that have occurred in a system. Analyzing these files can reveal not only who, what, when, and where, but also how information in the system was sought and used. Access log analysis is a quantitative method used to monitor the usage of systems and to gain an understanding of users in many domains. By tracking users' interactions with the application in a log file, it is possible to collect useful information for assessing the users' main interests. In this way, we obtain implicit feedback and can identify changes needed in the ontology to improve interaction with the application.
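As a sketch of how such a log can be turned into usage statistics, the Java code below counts transactions per user. These counts could serve as $N_i$ in equation (2). The log format is an assumption, because the prototype's actual format (Figure 3.1) is not reproduced in the text. The sketch assumes comma-separated lines of the form timestamp,userId,action,itemId, and the class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: count transactions per user from access log lines.
// Assumed line format (not the prototype's documented format): timestamp,userId,action,itemId
class AccessLogCounter {

    static Map<String, Integer> transactionsPerUser(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length < 4) {
                continue; // skip malformed or empty lines
            }
            counts.merge(fields[1].trim(), 1, Integer::sum);
        }
        return counts;
    }
}
```

A line such as `2007-03-01 10:15:02,user7,BROWSE,book12` adds one transaction for user7. Lines with fewer than four fields are ignored.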

Figure 3.1: Example of access log


3.3.2. ACM Computing Classification Ontology

We use the ACM Computing Classification System ontology to represent the document classification in our digital library. We have enhanced the original ACM-CCS taxonomy with extra attributes such as hasKeywords. We learn user interests from the keywords that users query and from the keyword paths in the ontology.

Figure 3.2: Example of ACM Computing Classification System Ontology

3.3.3. User Profiles

Gathering the information that is likely to interest the user is one of the most difficult tasks in digital library system development. A user profile consists of the user's account details and areas of interest. Information about a user's interests describes the user's information needs in terms of information types, content types, and short-term and long-term interests.

Figure 3.3: Example of user similarity in the database


After the initial profile has been built, we update the profiles without user interaction by automatically monitoring the user's browsing habits. Interest profiles are computed daily by correlating previously browsed books with their classification. User profiles thus hold a set of topics and the interest values for these topics for each day of the trial. Ontological relationships between topics of interest are used to infer other topics of interest that the user might not have browsed explicitly.

Figure 3.4: Screen shot of a sample user profile

3.3.4. User Machine with Web Browser

A Web user can access our digital library through a Java-enabled Web browser. Most modern Web browsers, such as Microsoft Internet Explorer, Mozilla, and Netscape, support Java.


Figure 3.5: Login page

Figure 3.6: Digital library prototype homepage


Figure 3.7: Example of search result from our digital library


Chapter 4

Experimental Result and Evaluation

Our prototype is a recommendation system that allows users to explore the book collection. When a user searches for a book, the system looks for books in the user's interest topics. At the same time, it looks for books used by similar users who searched the same interest topic, and it recommends books that the user has not tried before. Because the system learns user interests, which change over time, the items it recommends match the user's current interests.

4.1 Experimental Setup and Result Analysis

To demonstrate the effectiveness and efficiency of our proposed approach, we conducted several experiments.

4.1.1 User Similarity

We generated random profiles for 10 users and used this information to compute the similarity between each pair of users. Figure 4.1 shows the user profiles. The value '1' means that the user is interested in that category, and the value '0' means that the user is not interested in it. After calculating the similarity between two users using (3), we can find the nearest neighbor of user i. Figure 4.2 displays the result of the user similarity calculation.
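Equation (3) is not reproduced in this section. The Java sketch below therefore assumes cosine similarity, a common choice for binary interest vectors such as those in Figure 4.1, and may differ from the measure actually used in Figure 4.3. The class and method names are hypothetical.

```java
// Hypothetical sketch: cosine similarity between binary interest vectors (1 = interested)
// and a nearest-neighbour lookup. Cosine is an assumed stand-in for equation (3).
class UserSimilarity {

    static double cosine(int[] a, int[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("profiles must have equal length");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int k = 0; k < a.length; k++) {
            dot += a[k] * b[k];
            normA += a[k] * a[k];
            normB += b[k] * b[k];
        }
        if (normA == 0 || normB == 0) {
            return 0.0; // a user with no interests is treated as dissimilar to everyone
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Index of the user most similar to user i (excluding i), or -1 if there is no other user.
    static int nearestNeighbour(int[][] profiles, int i) {
        int best = -1;
        double bestSim = -1;
        for (int j = 0; j < profiles.length; j++) {
            if (j == i) {
                continue;
            }
            double sim = cosine(profiles[i], profiles[j]);
            if (sim > bestSim) {
                bestSim = sim;
                best = j;
            }
        }
        return best;
    }
}
```

For the profiles {1,1,0,0}, {0,0,1,1} and {1,1,1,0}, the first user shares no category with the second (similarity 0) and two categories with the third (about 0.82). Its nearest neighbour is therefore the third user.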

Figure 4.1: Examples of user profiles


Figure 4.2: Result of user similarity calculation

Figure 4.3: Example of Java code for calculating user similarity

As described in the previous section, recommendation with the collaborative method depends on user similarity. Because information is shared among similar users, this method tends to surface unexpected findings.
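The collaborative step described in this chapter recommends books that similar users have used but the target user has not tried. It can be sketched as follows. This minimal Java sketch is not the prototype's code, and the class and method names are hypothetical. It assumes that similarities to the target user are already computed, for example by the measure in (3). Each unseen item is scored by summing the similarities of the users who have it, and ties are broken arbitrarily.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: similarity-weighted recommendation of items the target user has not tried.
class CollaborativeRecommender {

    // itemsByUser: items each user has used; similarity: similarity of other users to the target.
    static List<String> recommend(String target, Map<String, Set<String>> itemsByUser,
                                  Map<String, Double> similarity, int topN) {
        Set<String> seen = itemsByUser.getOrDefault(target, Set.of());
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : itemsByUser.entrySet()) {
            String other = e.getKey();
            double sim = similarity.getOrDefault(other, 0.0);
            if (other.equals(target) || sim <= 0) {
                continue; // ignore the target and dissimilar users
            }
            for (String item : e.getValue()) {
                if (!seen.contains(item)) {
                    scores.merge(item, sim, Double::sum);
                }
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((x, y) -> Double.compare(scores.get(y), scores.get(x))); // highest score first
        return ranked.subList(0, Math.min(topN, ranked.size()));
    }
}
```

Suppose user u1 has read b1, a neighbour with similarity 0.9 has read b1, b2 and b3, and a neighbour with similarity 0.5 has read b3 and b4. The top two recommendations for u1 are then b3 (score 1.4) and b2 (score 0.9).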

4.1.2 Single Exponential Smoothing

We generated several α values to find an appropriate α for each type of user behaviour. In our experiments, we found that if α is small, the weight decreases faster than if α is large. We can therefore apply this to each user. If a user uses the system often, the system should assign a small α. If a user rarely uses the system, the system should assign a large α.

[Plot: weight (y-axis, 0 to 1) versus days (x-axis) for alpha = 0.5, 0.4, 0.2, 0.1, and 0.05]

Figure 4.4: Graphical representation of the relationship between weight and α

The weight of a topic for user i is calculated using (1). Fig. 9 shows that the weight decreases or increases depending on the user's actions.

Table 4.1: Sample weight calculation for a user

Figure 4.5 shows that the weight of each user interest can decrease or increase from day to day, depending on the usage history.


[Plot: weight (y-axis, 0 to 1.2) versus days 1 to 10 (x-axis) for a sample user]

Figure 4.5: Weight of a topic for a user

Our prototype demonstrates that the same query issued by a particular user at different times may yield different result sets (rankings). This is because the query, profile, and contents are dynamically enhanced using intelligent algorithms. We also demonstrate that the same query from different users may yield different result sets, as justified by the differences in their profiles and content relationships. Our prototype can also recommend digital library items to users using collaborative filtering-based usage pattern analysis. We tested our approach using partially synthetic datasets and analyzed the results through human judgments.


Chapter 5

Conclusions and Future Work

In this thesis, we presented a novel approach that associates contents with user profiles and usage patterns and models interest drift in digital libraries. We validated our approach with synthetic data. We are currently setting up the digital library with real contents and users to put the system into practical use. An experimental evaluation showed that our approach provides better-personalized services for digital libraries. Two different users searching with the same query obtain different results, and the same user searching with the same query at different times also obtains different results.

As future work, we will carry out a systematic evaluation on a real digital library.


References

Chowdhury, G. G., & Chowdhury, S. (2003). Introduction to digital libraries. London: Facet Publishing.

Haase, P., Volker, J., & Sure, Y. (2005). Management of dynamic knowledge. Journal of Knowledge Management, 9(5), 97-107.

Huang, Z., Chung, W., Ong, T. H., & Chen, H. (2002, June 14-18, 2002). A Graph-

based Recommender System for Digital Library. Paper presented at the

ACM/IEEE Joint Conference on Digital Libraries, JCDL 2002, Portland,

Oregon, USA. Retrieved 2006/04/15, from

Konstan, J., Kapoor, N., McNee, S., & Butler, J. (2005). TechLens: Exploring the Use of Recommenders to Support Users of Digital Libraries. In Fall 2005 Task Force Meeting. Phoenix, AZ: Coalition for Networked Information.

Konstan, J., Miller, B., Maltz, D., Herlocker, J., Gordon, L., & Riedl, J. (1997). GroupLens: Applying Collaborative Filtering to Usenet News. Communications of the ACM, 40(3), 77-87.

Kruk, S. (2005). Personal communication, 10/1/2007.

Liao, I., Liao, S., Kao, K., & Harn, I. (2006, November 27-30). A Personal Ontology Model for Library Recommendation System. Paper presented at the 9th International Conference on Asian Digital Libraries, ICADL 2006, Kyoto, Japan.

Lau, T. J., & Wang, A. (2006). AAA: a Profiling and Recommendation System. Cambridge, MA 02142, USA: Laboratory of Computer Science.

Meteren, R. V., & Someren, M. V. (2000, May). Using Content-Based Filtering for Recommendation. Paper presented at the MLnet / ECML2000 Workshop, Barcelona, Spain.

Middleton, S. E., De Roure, D. C., & Shadbolt, N. R. (2001). Capturing knowledge of user preferences: ontologies in recommender systems. Paper presented at the International Conference on Knowledge Capture (KCAP'01), Victoria, British Columbia, Canada.

MIT Libraries (2002, 05/25/2006). DSpace. Retrieved 2/2/2007, from http://www.dspace.org/


New Zealand Digital Library Project, University of Waikato (2000). Greenstone Library Software. Retrieved 05/25/2006, from http://www.greenstone.org/

NIST/SEMATECH (2007, 7/18/2006). NIST/SEMATECH e-Handbook of Statistical Methods. Retrieved 10/02/2007, from http://www.itl.nist.gov/div898/handbook

Paulson, P., & Tzanavari, A. (2003). Combining collaborative and content-based filtering using conceptual graphs. In Lecture Notes in Artificial Intelligence Series.

Pennock, D., Horvitz, E., Lawrence, S., & Giles, C. (2000). Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and Model-Based Approach. Paper presented at the Sixteenth Conference on Uncertainty in Artificial Intelligence (UAI-2000), Stanford, CA.

Sarwar, B., Karypis, G., Konstan, J., & Riedl, J. (2001). Item-based Collaborative Filtering Recommendation Algorithms. In The Tenth International World Wide Web Conference (WWW10) (pp. 285-295). Hong Kong.

Trajkova, J., & Gauch, S. (2003). Improving Ontology-Based User Profiles. Thesis.

University of Kansas. http://citeseer.ist.psu.edu/trajkova03improving.html.

Yang, Y., & Li, J. Z. (2005). Interest-based Recommendation in Digital Library.

Journal of Computer Science, 1, 40-46.


Biography

Name: Yenruedee Chanwirawong

Date of Birth: November 25, 1979

Place of Birth: Bangkok, Thailand

Institutions Attended:

May 1998 – April 2002 Bachelor of Information Science with First Class

Honours in Management Information System

Walailak University

Nakhonsithammarat, Thailand

June 2004 – March 2007 Master of Science in Information Technology (MSIT)

Shinawatra University

Bangkok, Thailand

Position and Office: Mainframe Developer, Cetelem (Thailand) Company

Limited.

Home Address: 160/260 Changarkat Uthit Road, Sikan, Donmueng,

Bangkok Thailand

E-mail: [email protected]
