PDLib: Personal Digital Libraries with Universal Access ∗

Francisco Alvarez-Cavazos
ITESM, Campus Monterrey
Eugenio Garza Sada 2501
Monterrey, Mexico
[email protected]

David A. Garza-Salazar
ITESM, Campus Monterrey
Eugenio Garza Sada 2501
Monterrey, Mexico
[email protected]

Juan C. Lavariega-Jarquin
ITESM, Campus Monterrey
Eugenio Garza Sada 2501
Monterrey, Mexico
[email protected]

ABSTRACT
In this paper we present the PDLib concept: personal digital libraries with universal access. A universally accessible personal digital library system provides each user with a customizable, general purpose document repository (i.e. a personal digital library) and the means to access it from most computing devices connected to the Internet, including mobile phones, PDAs and laptops, therefore granting access from anyplace at anytime. Personal digital libraries provide traditional digital library services such as document submission, full-text and metadata indexing and document search and retrieval, augmented with innovative services for the moment-to-moment information management needs of the individual user, and they adapt those services to the device being used to interact with the library. These innovations include flexible collection and metadata management, interaction with other digital libraries (whether personal digital libraries of other users or traditional digital repositories) and sharing of digital content with other users. These requirements are addressed by a personal digital library system architecture that considers both traditional digital library challenges and mobile computing challenges. The main components of the architecture are the Client-Side Applications, the Data Server and the Mobile Connection Middleware (MCM). The current architecture aims to support access from multiple devices, including mobile devices, which requires adapting traditional services to cope with the restrictions of those devices. A proof-of-concept prototype implementation of the PDLib system that exemplifies many of the main concepts is available, and an overview of this prototype is presented in this paper.

∗ This work has been funded by the Mexican National Science and Technology Council (CONACyT), the Mexican University Corporation for Internet Development Association (CUDI2) and ITESM, Campus Monterrey.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Joint Conference on Digital Libraries 2005, Denver, Colorado, USA
Copyright 2005 ACM X-XXXXX-XX-30/05/01 ...$5.00.

Categories and Subject Descriptors
D.2 [Software Engineering]: Software Architectures; D.2.11 [Software Architectures]: Domain-specific architectures: personal digital library system; H.3 [Information Storage and Retrieval]: Digital Libraries; H.3.7 [Digital Libraries]: Systems issues: personal libraries, universal access, mobile environments

General Terms
mobile personal digital library system architecture

Keywords
personal libraries, universal access, mobile environment

1. INTRODUCTION
We propose a universally available personal digital library system. It is “personal” in the sense that each user is provided with a general purpose document repository (i.e. a personal digital library). It is “universally available” in the sense that it allows the user to access her/his personal digital library from most computing devices connected to the Internet, including mobile phones, PDAs and laptops, therefore granting access “from anyplace at anytime.”

Personal digital libraries provide traditional digital library services such as document submission, full-text and metadata indexing and document search and retrieval, augmented with innovative services for the moment-to-moment information management needs of the individual user. These innovations include provisions to customize the classification of documents, interact with other digital libraries (whether personal or collective) and support user-to-user exchange of generic digital content.

Another distinction between our personal library system and traditional digital library systems is that it strives to provide its users with universal access, the capability of accessing the personal digital libraries “from anyplace at anytime” [19]. Consequently, we have designed our system for mobile environments.

Since our system extends traditional library services for the mobile environment in order to realize the abstraction of personal libraries, we must cope with the technological challenges imposed by the implementation of digital library services [4, 1, 7], the mobile environment [3, 13, 20] and the specific requirements of personal digital libraries.

The effectiveness of the document retrieval process from personal digital libraries must be assured with the provision of an intuitive classification schema and the indexing of documents. A personal digital library system should allow users to define classification schemas according to their needs.

The creation of the personal digital library implies the submission of digital documents and their placement in the personal digital library under user-defined classification schemas.

The documents of a personal digital library must be accessible via a mechanism capable of providing meaningful answers to users’ queries. In a personal digital library system, search and retrieval mechanisms must adapt to the personalized classification schema defined by the users of the system.

The documents of the personal digital library must be readily available in response to user demand. This is quite a challenge in mobile environments, since connection variability is likely to interrupt document transfers. Additionally, the great diversity of mobile devices imposes a multi-platform approach to the design of mobile digital library systems.

A personal digital library must provide each user with mechanisms to restrict unauthorized access to their personal digital library and with administrative tools to manage digital library content.

Despite the apparent contradiction between the personalization goal of a personal digital library system and the communication standardization efforts of traditional digital library systems, interoperability is a challenge worth facing for the added value it provides to personal digital library services.

In this paper, we present a concept architecture and the current status of our Personal Digital Library System (PDLib for short), which addresses these challenges by integrating information storage and retrieval, database and mobile client technologies to provide universally available personal digital libraries.

The rest of this paper is organized as follows. In Section 2 we describe the concept of universally available personal digital libraries. Based on this discussion, Section 3 describes the architecture and implementation issues of the PDLib system. In Section 4, we compare PDLib with other work in the field. Finally, we conclude in Section 5 with directions for future work.

2. UNIVERSALLY AVAILABLE PERSONAL DIGITAL LIBRARIES

Traditional digital library systems grant a group of users access to a digital library. Personal digital library systems provide a digital library to each user (i.e. a personal digital library). PDLib is a personal library system that allows the user to shape and access her/his digital library from anyplace at anytime using nearly any computing device (i.e. universal access). To realize universally available digital libraries, the following requirements were defined for the PDLib system:

• Flexible collection and metadata management. Collections must be provided as a mechanism for document classification. Users should be provided with the ability to define the metadata set that will be used to describe the contents of each collection. These interactions will allow the user to customize her/his personal library as desired.

• Digital document submission. The user should be able to add any digital document to a personal digital library. Submission from several device types should be supported. The personal digital library must be able to accommodate several document formats for the same document. In addition, since library content is personalized through user-defined collections with customizable metadata sets, while submitting a document the user must be able to: (a) select or create the collection that will contain the document and (b) provide metadata information for the document according to the collection’s metadata set.

• Search and retrieval. Search and retrieval mechanisms must adapt to the personalized classification schema defined by the users of the system.

• Universal access. In order to provide universal access to the documents and services of the personal digital library, several client application types suitable for mobile and fixed hosts of mobile environments [3, 13] must be considered. Software clients can be classified according to their client-side architecture into (a) thin clients and (b) thick clients, and according to their mobility into (a) fixed clients and (b) mobile clients (Figure 1). In thin clients, the application is delivered on a browser or microbrowser, while on thick clients both code and data reside on the device. Thick and thin clients can coexist in a single device. Mobile clients are those with wireless connections to the system’s network, while fixed clients have a reliable (wired) connection with the network.

• Administration and access control. The owner of a personal digital library always has unrestricted access to her/his personal digital library content. In addition, the owner should be provided with the capability to grant other users access to her/his personal digital library content.

• Interoperability. The system should interoperate with other personal library users and with other digital library systems using well-known interoperability protocols.

Figure 1: Mobile Environment Clients

Personal digital libraries are composed of collections. Collections contain, in turn, other collections and/or documents. Users can interact with personal digital libraries by creating and deleting collections and submitting, moving, copying or downloading documents. In addition, users can define the metadata set that will be used to describe the contents of each collection. These interactions allow the user to customize her/his personal library as desired.

To illustrate the concept of universally available personal digital libraries, we provide two hypothetical scenarios below. We have deliberately chosen scenarios feasible in a university context because we believe the concept of a personal digital library to be an interesting mobile application. These scenarios exemplify the two key ideas of our universally available personal digital library system: personal libraries and universal access.

Scenario 1: Personal Libraries
Shaping the Personal Library. Sarah, a visual arts undergraduate student, has been working on a two-student writing assignment for her Comparative Mythology class. Sarah and Aidan, her team partner, have chosen the comparison of the Norse and Greek mythological systems as their assignment topic. Sarah is using her Personal Digital Library (PDLib) from her desktop PC to store and classify documents (e.g. text, audio, video and/or image files) regarding the Norse myths. She has decided to arrange her documents under a collection named “Norse Myths” created for this purpose. Within the “Norse Myths” collection, she has created other collections to group documents under certain themes (e.g. “Theogenic Myths”, “Creation of the Universe”) or has placed the documents directly under “Norse Myths”. She has also created a “Vikings Online” collection to store useful Internet references. In order to simplify the classification of the “Vikings Online” collection and ease its population, Sarah has customized the metadata set definition of the “Vikings Online” collection so that it contains just “Title” and “URL” fields. Finally, Sarah has set the permission rights of her “Norse Myths” collection so that Aidan has access to her documents. The files stored by Sarah will stay in her PDLib for further use even after she graduates.

Scenario 1 focuses on the composition of our personal libraries and the services they provide. While storing references for her writing assignment, Sarah interacted with objects of a personal library. The objects explicit in the scenario are: (a) documents, (b) collections, (c) metadata sets and (d) permissions. In addition, implicit in the scenario we encounter the following objects: (a) libraries, (b) document formats and (c) document metadata. Therefore, in addition to digital documents, the content of a personal digital library also includes user-defined collections, metadata (both user-defined field definitions, i.e. metadata sets, and document information, i.e. document metadata) and permission rights. In Section 3 we present the model that supports this.
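The objects identified in Scenario 1 can be pictured with a small data-structure sketch. This is only an illustration under our own assumptions, not the PDLib model presented in Section 3: the class names, the `submit` method, the `readers` set and the example URL are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str
    # Document metadata: values for the fields of the owning collection's set.
    metadata: dict[str, str] = field(default_factory=dict)
    # Document formats: alternative renditions of the same document.
    formats: list[str] = field(default_factory=list)


@dataclass
class Collection:
    name: str
    # Metadata set: user-defined field names for documents in this collection.
    metadata_set: list[str] = field(default_factory=lambda: ["Title"])
    documents: list[Document] = field(default_factory=list)
    subcollections: list["Collection"] = field(default_factory=list)
    # Permissions: users (besides the owner) allowed to read this collection.
    readers: set[str] = field(default_factory=set)

    def submit(self, doc: Document) -> None:
        """Add a document, rejecting fields outside the metadata set."""
        unknown = set(doc.metadata) - set(self.metadata_set)
        if unknown:
            raise ValueError(f"fields not in metadata set: {sorted(unknown)}")
        self.documents.append(doc)


# Rebuilding part of Sarah's library from Scenario 1 (the URL is made up).
norse = Collection("Norse Myths")
vikings = Collection("Vikings Online", metadata_set=["Title", "URL"])
norse.subcollections.append(vikings)
vikings.submit(
    Document("Saga index", {"Title": "Saga index", "URL": "http://example.org/sagas"})
)
norse.readers.add("aidan")
```

Validating metadata at submission time is one possible design choice; it keeps every document in a collection describable by the same user-defined fields.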

Library services are built around personal digital library objects. The services provided by a personal digital library are shown in Table 1.

Table 1: Personal Digital Library Services

Personal Digital Library (PDLib)
Objects              Services
Library              (a) Creation, (b) Retrieval, (c) Update and (d) Deletion.
Collection           (a) Creation, (b) Retrieval, (c) Update, (d) Deletion, (e) Copy and (f) Move.
Document             (a) Submission, (b) Retrieval, (c) Update, (d) Deletion, (e) Send, (f) Search, (g) Email, (h) Copy and (i) Move.
Document Format      (a) Submission, (b) Retrieval, (c) Update, (d) Deletion and (e) Conversion.
Metadata Set         (a) Creation, (b) Retrieval, (c) Update, (d) Deletion and (e) Get/Set Collection Metadata Set.
Document Metadata    (a) Update Document Metadata.
Permission           (a) Get/Set Collection Permission and (b) Get/Set Document Permission.

The services provided by a personal digital library are:

• CRUD Operations. Creation, Retrieval, Update and Deletion (CRUD for short) operations to manage library, collection, document, document format and metadata set objects. In the case of document objects, the creation operation is the document’s submission to the personal digital library.

• Copy/Move Documents or Collections. Copy and Move operations are provided to duplicate or relocate document and collection objects within the hierarchical structure of the personal digital library.

• Document Search. Boolean search and ranked search of documents according to their full-text content and/or associated metadata. The following variants of the search operation have been defined:

– Basic Search. Returns documents according to a search expression.

– Advanced Search. Returns documents according to several metadata-specific search expressions.

– Full-Text Search. Returns documents according to their textual contents.

– Recursive Search. The search is performed across the entire collection hierarchy.

– Multi-Library Search. The search scope includes libraries from other users or other digital library systems.

• Send Document. Sends a document to the personal digital library of another user. Document metadata and, optionally, document formats are sent as well.

• Email Document. Sends a document and its metadata via email to any user with an email account.

• Document Format Conversion. Performs document format conversion (e.g. a plain text document from a PDF document, a PNG image from a JPEG image).

• Get/Set Collection Metadata Set. Retrieves the metadata set of a given collection or assigns a metadata set to a collection.

• Update Document Metadata. Updates the document metadata first provided at document submission (i.e. creation of the document).

• Get/Set Document or Collection Permissions. Allows the user to grant or revoke other users’ permission rights to her/his personal documents and collections.
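To make the difference between a basic and a recursive search concrete, the following sketch walks a nested collection structure. It is a naive substring match written for illustration only, not the Boolean or ranked search of PDLib; the dictionary layout and the `search` function are assumptions of this example.

```python
def search(collection, term, recursive=False):
    """Return titles of documents whose text or metadata contains `term`."""
    term = term.lower()
    hits = []
    for doc in collection["documents"]:
        # Full-text content and metadata values are searched together.
        haystack = [doc["text"], *doc["metadata"].values()]
        if any(term in value.lower() for value in haystack):
            hits.append(doc["title"])
    if recursive:
        # Recursive search: descend into every nested collection.
        for sub in collection["collections"]:
            hits.extend(search(sub, term, recursive=True))
    return hits


# Hypothetical library: one top-level document and one nested collection.
library = {
    "documents": [
        {"title": "Essay draft", "text": "Odin and Zeus compared",
         "metadata": {"Author": "Sarah"}},
    ],
    "collections": [
        {
            "documents": [
                {"title": "Ymir", "text": "The giant Ymir",
                 "metadata": {"Theme": "Creation of the Universe"}},
            ],
            "collections": [],
        },
    ],
}

top_level_only = search(library, "creation")
whole_hierarchy = search(library, "creation", recursive=True)
```

Here `top_level_only` is empty while `whole_hierarchy` contains the nested document, which is the behavior the Recursive Search variant describes.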

Scenario 2: Universal Access
Always there for Me. Aidan, currently working on campus at the Visual Arts Center and off campus as a graphic designer, is constantly moving from one location to another. He also changes his working computer frequently since he has been assigned a PC laptop at the Visual Arts Center and a desktop Macintosh at the design agency. Luckily, his PDLib is there for him. During his precious free time at both jobs, Aidan is able to browse the references found by his teammate Sarah for their shared homework using his PDLib to access hers. As a complement to Sarah’s research, Aidan has to find useful information regarding Greek mythology. Aidan uses his unoccupied time at work to gather references relevant to his topic. Aidan has been storing his findings in his PDLib under a collection called “Greek Myths”. On one occasion, he ran into his friend Matthew on the campus walkways. After briefing each other on their latest doings, they found out that Matthew, who is working on an essay for his Classical Mythology class, could use one of Aidan’s references. Since Matthew does not have a PDLib, Aidan uses the email document feature of his PDLib from his mobile phone to send Matthew a copy of the reference. A few days before the due date of his written assignment with Sarah, Aidan bought himself a PDA. Having written the joint essay and stored a copy of both the finished essay and its related presentation in their PDLibs, Sarah and Aidan decide to retrieve the presentation using PDLib in preparation for their in-class lecture. Later, at the lecture’s Q&A section, both Sarah and Aidan were able to access their PDLibs to search, retrieve and display some of the documents quoted in their work from Aidan’s PDA. After being praised for the quality of their presentation, Sarah and Aidan were able to send the essay via PDLib to their professor and classmates.

Scenario 2 portrays the convenience of universal access. Despite moving from one place to another or changing computing devices, Aidan was able to reach his PDLib wherever and whenever he needed to. This was possible since PDLib’s architecture can provide digital library services to most computing devices with an Internet connection. The architecture contemplates several different client application types to address the great diversity of hosts in a mobile environment. Client applications are responsible for adapting PDLib’s services according to the interface and computing resources of client devices. Mobile thick client applications also adapt to the limitations of the mobile environment. Specialized connection middleware is used to support the adaptation measures undertaken by mobile thick clients.

Our personal digital library concept considers the interaction with other digital library systems. This means that our client applications can be used to browse the contents of other digital library systems while providing the user with a subset of the personal digital library services to interact with the content of other library systems. Interoperability has been made possible by the inclusion of OAI-PMH [15] compliant modules in our personal digital library system architecture.
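As a rough illustration of what interacting with another repository through the OAI harvesting protocol [15] involves, the sketch below builds a `ListRecords` request URL. The verb and the `metadataPrefix` and `set` arguments are part of the protocol; the base URL, the set name and the helper function are hypothetical, and nothing here is taken from PDLib’s own modules.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request for a repository's base URL."""
    # "oai_dc" (unqualified Dublin Core) is the metadata format that every
    # OAI-PMH repository is required to support.
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optional selective harvesting, restricted to one set.
        params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


# Hypothetical repository endpoint and set name.
url = list_records_url("http://example.org/oai")
url_for_set = list_records_url("http://example.org/oai", set_spec="myths")
```

The response to such a request is an XML document of records, which a client would then parse and present alongside the user’s own collections.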

3. SYSTEM ARCHITECTURE
Our goal for PDLib is to realize the concept of universally available personal digital libraries. As presented in Section 2, a universally available personal digital library system seeks to provide the abstraction of individual digital libraries in a mobile environment [17]. An overview of the PDLib system is shown in Figure 2 and its architecture is provided in Figure 3.

Figure 2: PDLib System Overview

The main component of the system is the PDLib Data Server. The data server stores the objects of the personal digital libraries (see Table 1). The data server also provides the client applications with a communication module that enables remote access to the personal digital library content.

In order to support the requirements introduced in Section 2, we have developed an architecture and a prototype that provide digital library services to several types of devices according to their computing capacity (e.g. desktop, laptop, PDA and mobile phone) with multiple operating systems (e.g. Microsoft Windows, Linux, Mac OS, Palm OS and Microsoft Windows CE). The driving idea behind this architecture is to provide each device with the best way to interact with the services offered by the data server.

The architecture of PDLib also supports the interaction with other digital library systems via the OAI-PMH. This means that a PDLib user can navigate other digital library systems compliant with the OAI-PMH using PDLib’s infrastructure. Figure 2 shows a general overview of the PDLib system. The system is composed of three tiers:

1. Client Tier. Includes the variety of devices with which a user can interact with PDLib.

2. Server Tier. Comprises the server system infrastructure that provides services to clients: (a) Data Server, (b) Mobile Connection Middleware (MCM) and (c) Web Front-end.

3. Interoperability Tier. Includes other (PDLib) data servers and OAI-PMH compliant digital library systems.

The devices of the client tier communicate with the server tier to access PDLib digital library services. The type of access from the client tier to the server tier varies according to the client device’s capabilities. The architecture of PDLib reflects the following access types:

1. Middleware Access. Supports mobile devices, especially those with limited computing resources (e.g. HTTP-enabled mobile phones, PDAs).

2. Web Access. Provides HTTP access to any device that includes a Web browser (e.g. WML/HTML microbrowser-enabled mobile phones).

3. Direct Access. Applications with very particular requirements can access the data server directly. In Section 3 we provide an example of such an application.

Figure 3: PDLib System Architecture

The following sections describe the components of the PDLib system and their interactions.

3.1 Clients
As mentioned before, the software clients of a mobile environment can be classified according to their client-side architecture and mobility. We elaborate on this classification with support of Figure 1. Figure 1 plots client-side architecture on the vertical axis and client mobility on the horizontal axis. The quadrants contain stereotypical protocols and development platforms for each client category. Mobile thick clients are especially important to PDLib’s goals because thick clients make offline operation possible. Mobile applications should therefore center on mobile thick clients, a key difference from Web applications, which typically focus on fixed thin clients. The classification also suggests a technological migration trend from Web applications to mobile applications. This trend reflects recent software application development tendencies and implies that the technologies available in an earlier stage are present in subsequent stages as well.

It is possible to map the mobile client application types to the devices supported by PDLib as follows (refer to Figure 2 and Figure 1):

• According to client-side architecture. Thin client applications connect to the Data Server through the Web Front-end while thick client applications communicate with the Data Server via the Mobile Connection Middleware.

• According to mobility. Mobile applications connect to the server tier via unreliable wireless connections while fixed applications connect to the server tier via reliable wired connections. Figure 2 shows wireless connections as dashed lines and wired connections as solid lines.

With the objective of providing an access medium appropriate to the devices shown in Figure 2, the following client application types were defined:

• Mobile clients (mobile thick clients). Mobile clients were designed to cope with the limitations of the mobile environment. This means that mobile clients redefine the functionality provided by PDLib’s fixed clients to provide the abstraction of a personal digital library on a mobile device. A middleware is required to help mobile clients achieve their mobile connection adaptation goals.

• Web clients (fixed and mobile thin clients). This category includes those devices with a browser or microbrowser capable of displaying HTML (Figure 6) or WML pages. Despite the fact that Web clients do not have all the features available in thick clients, they provide basic interaction support with PDLib. Web clients connect with PDLib’s Web Front-end.

• Application clients (fixed thick clients). Application clients are intended to run on desktop and laptop devices with few computing resource constraints. Application clients are desirable because they provide a richer interface than Web clients. In addition, application clients can communicate directly with the data server, which leads to more efficient communication for fixed clients.
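The mapping from client category to access type described above can be summarized in a few lines. This is a sketch of our reading of the text (thin clients use the Web Front-end, mobile thick clients use the MCM, fixed thick clients may go directly to the data server); the enum and function names are hypothetical.

```python
from enum import Enum


class Architecture(Enum):
    THIN = "thin"
    THICK = "thick"


class Mobility(Enum):
    FIXED = "fixed"
    MOBILE = "mobile"


def access_path(architecture, mobility):
    """Return the server-tier component a client of this category talks to."""
    if architecture is Architecture.THIN:
        # Web clients, fixed or mobile, are served HTML or WML pages.
        return "Web Front-end"
    if mobility is Mobility.MOBILE:
        # Mobile thick clients rely on the middleware for adaptation.
        return "Mobile Connection Middleware"
    # Application clients (fixed thick clients) can use direct access.
    return "Data Server (direct)"


paths = {
    (a.value, m.value): access_path(a, m)
    for a in Architecture
    for m in Mobility
}
```

The resulting `paths` dictionary has one entry per quadrant of Figure 1.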

The previous client types were defined because each type has very particular characteristics. Application clients are intended to run on resource-unconstrained devices such as desktops or laptops. A mobile user, on the other hand, can interact with PDLib using a mobile client. Mobile clients allow interaction with the personal library by offering a subset of the functionality provided by fixed thick clients. Web clients were designed to give users access to their library without requiring a client application to be installed on a computing device. One of the main challenges lies in mobile clients, since they have to deal with the limitations of the mobile environment. To accomplish their task, mobile clients perform the following functions:

• Local Storage Mechanism. Stores documents on the mobile device for offline viewing.

• Connection Adaptation Mechanism. This mechanism provides a constant response time despite the variability of wireless connections. A constant response time can be achieved by calculating the size of the data-transfer window and predicting network state [6].

• User Interaction Support. Graphical interface that allows the user to manage the personal digital library content from her/his mobile device. Mobile clients read results received from the MCM and display them on screen.
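One way to picture the connection adaptation idea is to size each transfer so that it should complete in a fixed target time at the predicted throughput. The sketch below uses an exponentially weighted moving average as the predictor; this is a stand-in chosen for illustration, not the method of [6], and the class, its parameters and the default values are all assumptions.

```python
class TransferWindow:
    """Pick a transfer size that aims at a constant response time."""

    def __init__(self, target_seconds=1.0, alpha=0.5, initial_bytes_per_s=10_000.0):
        self.target_seconds = target_seconds  # desired time per transfer
        self.alpha = alpha                    # weight of the newest sample
        self.predicted_rate = initial_bytes_per_s

    def record(self, bytes_sent, seconds):
        """Update the throughput prediction with one observed transfer."""
        if seconds <= 0:
            raise ValueError("seconds must be positive")
        observed = bytes_sent / seconds
        self.predicted_rate = (
            self.alpha * observed + (1 - self.alpha) * self.predicted_rate
        )

    def next_size(self, minimum=512):
        """Bytes to request next so the transfer takes about target_seconds."""
        return max(minimum, int(self.predicted_rate * self.target_seconds))


window = TransferWindow()
first = window.next_size()   # based on the initial guess only
window.record(40_000, 2.0)   # a good stretch: 20 000 bytes per second
after_fast = window.next_size()
window.record(100, 1.0)      # the link degrades sharply
after_slow = window.next_size()
```

A larger `alpha` reacts faster to changes in the link but makes the window size jumpier; the right trade-off depends on how bursty the wireless connection is.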

3.2 Web Front-end

The Web Front-end transforms personal digital library services into a web application. In order to support fixed and mobile thin clients, the Web Front-end is capable of delivering WML or HTML according to the requesting device (i.e. browser or microbrowser). The Web Front-end provides the following functionality:

• Session Handling. Session handling is used to maintain the interaction of a thin client with the Web Front-end beyond a single HTTP request.

• Browser Markup Language Support. Verifies the markup languages supported by the requesting browser and serves HTML or WML accordingly.
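The markup selection step can be sketched as follows. This is an illustrative assumption about how capability detection might work, based on the HTTP Accept header; the prototype's actual detection logic is not described here, and the function names are hypothetical. The MIME type `text/vnd.wap.wml` is the registered type for WML.

```python
WML_TYPE = "text/vnd.wap.wml"
HTML_TYPES = {"text/html", "application/xhtml+xml"}


def accepted_types(accept_header):
    """Return the set of media types named in an Accept header, ignoring q-values."""
    types = set()
    for part in accept_header.split(","):
        media = part.split(";")[0].strip().lower()
        if media:
            types.add(media)
    return types


def choose_markup(accept_header):
    """Serve WML only when the browser accepts WML and does not accept HTML."""
    types = accepted_types(accept_header or "")
    if types & HTML_TYPES:
        return "html"
    if WML_TYPE in types:
        return "wml"
    return "html"  # assumed default when the header gives no usable hint
```

Preferring HTML whenever it is accepted mirrors the behavior described for the prototype, which falls back to WML only for browsers that cannot display HTML.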

3.3 Mobile Connection Middleware

One of the main problems to solve in order to provide the data server services is the fact that the data server has been designed to be used by a wide range of devices and not just mobile devices. However, there is a clear difference in computing resources between mobile (e.g. PDA) and fixed (e.g. desktop) devices. This computing resource disparity makes it difficult to adapt the data server to the capabilities of mobile devices, and signals the need for a middleware component to mediate the interaction between the mobile device and the data server. This arrangement also avoids the need to change the previously established services. We call this middleware the Mobile Connection Middleware (MCM). The MCM is designed to provide the following functionality:

• Connection Support. Required by mobile clients in order to adapt to the high bandwidth variability of the mobile environment and to cope with the frequent disconnections of mobile devices.

• Process Delegation. Executes functions that would demand an excessive (and quite possibly unavailable) amount of computing resources from the mobile device.

• Mobility Support. Operations such as prefetching can be performed to speed up the retrieval of documents from the data server and store them in cache servers that are closer to the user. When the user changes location, it would be necessary to support migration of the information between different instances of the MCM.

• Device Interaction Support. Adapts content according to the characteristics of the device on which the information is to be shown.
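The prefetching idea under Mobility Support can be sketched with a small cache placed between the client and the data server. This is only an illustration: the least-recently-used eviction policy, the capacity and the class name are assumptions, and `fetch` stands for any hypothetical function that retrieves a document from the data server.

```python
from collections import OrderedDict


class PrefetchCache:
    """Hypothetical MCM-side cache with explicit prefetching and LRU eviction."""

    def __init__(self, fetch, capacity=8):
        self.fetch = fetch            # callable that retrieves a document by key
        self.capacity = capacity
        self.entries = OrderedDict()  # least recently used entry comes first
        self.hits = 0
        self.misses = 0

    def _store(self, key, value):
        self.entries[key] = value
        self.entries.move_to_end(key)
        while len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used

    def prefetch(self, keys):
        """Load documents ahead of time, e.g. those in the collection being browsed."""
        for key in keys:
            if key not in self.entries:
                self._store(key, self.fetch(key))

    def get(self, key):
        """Serve from the cache when possible; otherwise go to the data server."""
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)
            return self.entries[key]
        self.misses += 1
        value = self.fetch(key)
        self._store(key, value)
        return value
```

Migrating such a cache between MCM instances when the user changes location would amount to transferring `entries`, which is outside the scope of this sketch.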

3.4 Data Server

The central part of PDLib is the data server. The data server provides the services of a personal digital library, stores personal digital library data and supports interoperability via the OAI-PMH. The data server provides the following functionality:

• Personal Library Services. The data server offers creation, retrieval, update and deletion operations over the library objects stored in the personal data storage (collections, documents and metadata). The data server also provides services to copy and move documents or collections, search for documents in a personal library, send documents to the personal library of other users, email documents to any user with an email account, perform document format conversion, change collection metadata sets, edit document metadata and grant or revoke permission rights over personal library content.

• Personal Data Storage. Stores the data of the personal digital libraries. A text search engine is used to index the content of the personal digital libraries. A database is used to store the objects of the personal digital libraries since we require a structured model to represent personal library objects and the relations among them. The use of a text search engine and a database in the data server permits the combined use of text-based queries and SQL datatype-based queries to support the hierarchical classification of personal library content. The hierarchical classification can greatly improve query performance since it allows queries to be specified against documents contained in individual collections, with or without the inclusion of their nested collections in the scope of the query.

• OAI-PMH Support. The data server exposes the metadata of the personal digital library documents via the OAI-PMH. In addition, the data server harvests metadata of other OAI-PMH compliant library systems to provide users with a subset of the personal digital library services to interact with other (OAI-PMH compliant) library systems.
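The scoping of a query to one collection, with or without its nested collections, can be sketched as follows. This is an illustration under stated assumptions rather than the PDLib schema: it uses an in-memory SQLite database instead of MySQL, a `LIKE` match stands in for the text search engine, and every table, column and function name is hypothetical.

```python
import sqlite3


def build_library():
    """Create a tiny hypothetical library: Library > Sagas > Eddas, Library > Notes."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE collections (id INTEGER PRIMARY KEY, parent_id INTEGER, name TEXT);
        CREATE TABLE documents (id INTEGER PRIMARY KEY, collection_id INTEGER,
                                title TEXT, body TEXT);
        INSERT INTO collections VALUES
            (1, NULL, 'Library'), (2, 1, 'Sagas'), (3, 2, 'Eddas'), (4, 1, 'Notes');
        INSERT INTO documents VALUES
            (1, 2, 'Egils saga', 'poetry and feud'),
            (2, 3, 'Prose Edda', 'poetry handbook'),
            (3, 4, 'Field notes', 'notes on poetry');
    """)
    return db


def search(db, collection_id, keyword, include_nested=True):
    """Return titles matching keyword, scoped to a collection and optionally its subtree."""
    pattern = f"%{keyword}%"
    if include_nested:
        # A recursive query collects the collection and all of its descendants.
        sql = """
            WITH RECURSIVE scope(id) AS (
                SELECT ?
                UNION ALL
                SELECT c.id FROM collections c JOIN scope s ON c.parent_id = s.id
            )
            SELECT d.title FROM documents d
            WHERE d.collection_id IN (SELECT id FROM scope) AND d.body LIKE ?
            ORDER BY d.id
        """
    else:
        sql = """
            SELECT d.title FROM documents d
            WHERE d.collection_id = ? AND d.body LIKE ?
            ORDER BY d.id
        """
    return [row[0] for row in db.execute(sql, (collection_id, pattern))]
```

Restricting the scope first means the text match only has to consider documents inside the chosen subtree, which is the source of the performance benefit described above.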

Our personal digital library data model, shown in Figure 4, establishes that a library contains one or more collections. A collection contains documents or more collections and is associated with a metadata set. The metadata set is composed of one or more metadata definitions.


Figure 4: Personal Digital Library Data Model

A metadata set is the metadata of the document metadata (e.g. name=“Author”, type=“Text” and name=“Title”, type=“Text”). The document metadata is the metadata of a particular document (e.g. name=“Author”, value=“S. Sturluson” and name=“Title”, value=“Prose Edda”). The fields that compose document metadata are determined by the metadata set of the collection. The metadata set of a collection defaults to the metadata set of the parent collection and can be redefined by the user on a per-collection basis. Both collections and documents have an associated permission. Permissions are particularized in the following access rights: (a) Personal. Access is restricted to the library owner; (b) Incoming. Other users are allowed to create documents and collections. This access right applies only to collections; (c) Outgoing. Other users are allowed to browse collections and retrieve documents; (d) Update. Other users are allowed to update documents and their associated document metadata and document formats. However, other users are not allowed to create documents or delete them; and (e) Shared. Other users are granted full access to documents and collections. The permission access right associated with a document contained in a collection defaults to the permission access right associated with its parent collection. A document can have one or more document formats.
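The defaulting rules of the data model can be sketched with two small classes. This is a minimal illustration, not the PDLib implementation: the class and method names are hypothetical, and treating "Personal" as the permission of a collection that sets none is an assumption, since the text only states how documents inherit from their parent collection.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Collection:
    name: str
    parent: Optional["Collection"] = None
    metadata_set: Optional[Dict[str, str]] = None  # field name -> type; None = inherit
    permission: str = "Personal"                   # assumed default, see lead-in

    def effective_metadata_set(self) -> Dict[str, str]:
        """Walk up the hierarchy until a collection defines its own metadata set."""
        node = self
        while node is not None:
            if node.metadata_set is not None:
                return node.metadata_set
            node = node.parent
        return {}


@dataclass
class Document:
    title: str
    collection: Collection
    permission: Optional[str] = None               # None = use the parent collection's
    metadata: Dict[str, str] = field(default_factory=dict)

    def effective_permission(self) -> str:
        return self.permission or self.collection.permission

    def set_metadata(self, name: str, value: str) -> None:
        """Only fields defined by the collection's metadata set may be filled in."""
        if name not in self.collection.effective_metadata_set():
            raise KeyError(f"{name!r} is not in the collection's metadata set")
        self.metadata[name] = value
```

For example, a document placed in a nested collection that defines no metadata set of its own accepts the fields of the nearest ancestor that does, until the user redefines the set for that collection.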

3.5 System Status

As of the writing of this paper, the following prototype implementations of the PDLib components have been developed:

• A mobile thick client application: PDLib Mobile Client. The Mobile Client application allows users to: 1. Glance through the contents of their personal digital library (Figure 5 (a)); 2. Retrieve and store a local copy of selected digital library documents on the mobile device; 3. Manage their personal digital library by: i. Moving/copying documents or collections and ii. Setting access rights to documents or collections (Figure 5 (b)); 4. Share their digital library content by: i. Sending documents via email and ii. Sending a document to another PDLib user (Figure 5 (b)); 5. Search for documents in their personal digital library (Figure 5 (c)); 6. View and edit document metadata (Figure 5 (d)); and 7. Browse the contents of other users’ personal digital libraries. Prototype implementations of PDLib’s Mobile Client have been developed using CLDC/MIDP J2ME [5] (Figure 5). The prototype applications communicate with the MCM via the XML-RPC protocol libraries of the Java/XML Enhydra Application Server [11].

Figure 5: PDLib Mobile Client (J2ME CLDC/MIDP)

• A fixed client application: The PDLib BeUp! Bin. The BeUp! Bin is a prototype client application that uploads files to a special collection where all unclassified documents are stored. The current implementation of this application supports the addition of files via “drag-and-drop” operations, the queueing of upload requests and direct file selection. The BeUp! Bin is an example of an application client with direct access to the Data Server (as explained earlier in this section). The application communicates directly with the PDLib Data Server using the Apache XML-RPC client library [2].

• Data Server. PDLib’s Data Server provides personal digital library content storage and retrieval services. The prototype implementation of the Data Server supports the creation, retrieval, update and deletion operations over the library objects stored in the personal data storage (collections, documents and metadata). The data server also provides services to copy and move documents or collections, search for documents in a personal library, send documents to the personal library of other users, email documents to any user with an email account, change collection metadata sets and edit document metadata. The data server implementation exposes the metadata of the personal digital library documents via the OAI-PMH and harvests metadata of other OAI-PMH compliant library systems in order to allow PDLib clients to browse their contents. A prototype implementation of the data server has been created using J2SE [8]. MySQL is used to store all personal digital library data [14]. We are currently working to incorporate the Lucene full-text search engine [12] to improve performance and reduce index storage requirements.

Figure 6: PDLib Web Client

• MCM (Mobile Connection Middleware). The MCM interacts with the Mobile Client to provide the following connection adaptation services: (a) Session handling over the stateless XML-RPC protocol; (b) A constant collection navigation response time despite the connection variability of wireless connections; (c) Partitioned document transfers with support for disconnections while transferring documents. In addition to providing connection adaptation, the MCM also stores query responses received from the data server in order to accelerate the responses it serves to the Mobile Client. In the current implementation, the MCM services are accessed via XML-RPC. The Apache XML-RPC implementation is used to provide the services [2].

• Web Front-end (a.k.a. PDLib Web Client). The Web Front-end was developed using Java Servlets, Cascading Style Sheets (CSS), HTML and WML. The Web Front-end verifies the display capabilities of the requesting browser and answers accordingly (e.g. serving a WML page instead of HTML if the requesting browser is incapable of displaying HTML) (Figure 6). The Web Front-end runs on the Tomcat servlet container [23].
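The partitioned document transfers listed for the MCM prototype can be illustrated with a short sketch. It is an assumption about how such a transfer might be organized, not a description of the prototype: the names are hypothetical, the link state is passed in as a flag instead of being detected, and no XML-RPC calls are made.

```python
def split_document(data, chunk_size):
    """Partition a document into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class ResumableTransfer:
    """Hypothetical transfer that remembers its position across disconnections."""

    def __init__(self, data, chunk_size):
        self.chunks = split_document(data, chunk_size)
        self.next_index = 0          # first chunk the client has not received yet
        self.received = bytearray()  # what the client has assembled so far

    @property
    def done(self):
        return self.next_index >= len(self.chunks)

    def send_next(self, link_up):
        """Deliver one chunk if the link is up; return True when a chunk was sent."""
        if self.done or not link_up:
            return False             # nothing is lost, the same chunk is retried later
        self.received += self.chunks[self.next_index]
        self.next_index += 1
        return True
```

Because progress is tracked per chunk, a disconnection costs at most one chunk of retransmission instead of restarting the whole document.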

4. RELATED WORK

This section compares PDLib with other digital library and mobile database projects with goals related to the universally available personal digital libraries concept. This comparison distinguishes PDLib from the projects presented in the remainder of this section.

DSpace is a Web-based institutional digital library system that captures, stores, indexes, preserves and redistributes the intellectual output of a university’s research faculty in digital formats [22, 21]. DSpace differs from PDLib in several aspects such as: (a) Purpose. DSpace focuses on institutional (i.e. collective) digital libraries while PDLib is concerned with personal digital libraries. (b) Client Application Type. DSpace is a Web-based application and targets fixed-thin clients while PDLib addresses thick and thin clients on mobile and fixed devices. (c) Structure. DSpace follows a Web-based three-layer architecture divided into storage, business, and application layers. PDLib follows a multilayered peer-to-peer architecture to support Web-based applications and mobile applications. (d) Data Distribution. DSpace centralizes digital library data in a single data store while the PDLib system architecture considers: (i) Integrating the data of distributed data stores; and (ii) Communicating digital library data across several data server subsystems.

Greenstone is a software system for the creation of digital library collections. The Greenstone system seeks to provide a unified framework for searching and browsing library contents. The Greenstone system differs from the PDLib system in the following: (a) Purpose. Greenstone realizes collective digital libraries while PDLib is concerned with personal digital libraries; (b) Client Application Type. Greenstone has a Web-based interface and facilities to create standalone CD-ROM working versions of the digital library. On the other hand, PDLib addresses thick and thin clients on mobile and fixed devices. (c) Build Process. The Greenstone building process can be cumbersome for large collections, taking a day or two to complete [24]. PDLib does not require a building process, since its search engine component supports the addition and removal of documents to existing indexes (i.e. user actions effectively “build” the digital library interactively). (d) Data Distribution. The Greenstone system centralizes the digital library data in a single data store while the PDLib system architecture considers integrating the data of distributed data stores and communicating digital library data across several data server subsystems.

The Owl Intranet Engine (Owl) is a multi-user Web-based document repository system for publishing documents of a corporation, small business, group of people, or an individual [16]. Owl presents the following differences with the PDLib system: (a) Individual Libraries. Owl is a multi-user environment where an individual digital library is defined with permission policies. PDLib provides personal digital libraries with access control policies that allow the user to share her/his library data with other users; (b) Information Storage and Retrieval. Both Owl and PDLib support keyword-based full-text search on document content and document metadata. However, the metadata definition of Owl is fixed while PDLib allows its users to specify custom metadata definitions. In addition, PDLib extends the keyword-based full-text search with standard database queries. (c) Client Application Type. Owl is a Web-based application and targets fixed-thin clients while PDLib addresses thick and thin clients on mobile and fixed devices. (d) Data Distribution. Owl centralizes the digital library data in a single data store while the PDLib system architecture considers integrating the data of distributed data stores and communicating digital library data across several data server subsystems.

The Phronesis system is a practical and efficient tool for the creation, administration and maintenance of distributed digital libraries on the Internet [7, 18]. Phronesis digital libraries are grouped in collections. A Phronesis collection consists of documents relevant to a knowledge area or relevant to a group of people. The Phronesis system, like Greenstone, builds on the Managing Gigabytes program [25]. Phronesis includes the following distinctive features [18]: (a) Full-text (text, PostScript, HTML, PDF and RTF documents) and metadata indexing and searching based on the MG (Managing Gigabytes) system [25]. (b) Metadata based on the international standard Dublin Core. (c) Searching and retrieval of English and Spanish documents. (d) Compression of digital library content. (e) Parallel search in remote Phronesis collections. (f) Spanish and English Web-based interface. (g) Access control policies. Some differences between Phronesis and PDLib follow: (a) Purpose. Phronesis focuses on collective digital libraries while PDLib is concerned with personal digital libraries. (b) Client Application Type. Phronesis is a Web-based application and targets fixed-thin clients while PDLib addresses thick and thin clients on mobile and fixed devices. (c) Build Process. As in the case of Greenstone, the Phronesis building process can be cumbersome for large collections. PDLib does not require a building process, since its search engine component supports the addition and removal of documents to existing indexes. (d) Data Distribution. Phronesis centralizes the digital library data in a single data store while the PDLib system architecture considers integrating the data of distributed data stores and communicating digital library data across several data server subsystems. However, while Phronesis data stores are all centralized, Phronesis provides support for parallel searches in remote collections.

UpLib is a Web-based full-text indexed repository accessed through an active agent. The UpLib system has been specifically designed for secure use by a single individual [10]. Collaborative operation of multiple UpLib repositories is possible with extensions that facilitate community building around individual document collections [9]. UpLib is “universal” in the sense that documents are represented as projections into the text and image domains. The UpLib interface is predominantly based on page images. Therefore, UpLib can handle any document format which can be rendered as pages. UpLib includes mechanisms to provide alternative representations of documents besides their textual and image representations.

While both UpLib and PDLib define themselves as being universal personal digital library systems, there are fundamental differences between the motivating factors behind each system. These differences follow: (a) Universality Notion. UpLib provides “universality” by representing documents as text and images. UpLib is a universal digital library system since it provides a uniform alternative to digital library systems with specific language extensions (like those provided by Phronesis) or domain-specific descriptions (such as a well-defined metadata set for documents in a particular knowledge area). Conversely, PDLib attempts to provide “universal access”, i.e. from anyplace at anytime [19], to personal digital libraries. (b) Personal Libraries. UpLib is a personal digital library system since it was specifically designed for secure use by a single individual and to accommodate user-defined extensions [10]. In contrast, PDLib declares itself a personal digital library system because it seeks to provide the abstraction of individual digital libraries in a mobile environment. Both UpLib and PDLib are digital library systems that store individual knowledge. However, UpLib provides a universal interface for library content while PDLib endeavors to allow personalization and customization of library content. PDLib supports individualization of library content with a data model designed to provide the notion of user-defined collections (i.e. possibly nested, document containment entities) with customizable metadata sets.

Besides their conceptual differences, UpLib and PDLib have the following operational differences: (a) Client Application Type. UpLib is a Web-based application and targets fixed-thin clients while PDLib addresses thick and thin clients on mobile and fixed devices. (b) Variety of Formats. UpLib can handle any document format which can be rendered as pages and includes mechanisms to provide alternative representations of documents besides their textual and image representations. Alternatively, PDLib’s data server subsystem has been designed for the storage and retrieval of general purpose content and is not constrained by the visual representation (or lack of it) of document formats. Nonetheless, the data server’s keyword-based retrieval schema does require the text projection of documents. (c) Data Distribution. UpLib centralizes the digital library data in a single data store while the PDLib system architecture considers integrating the data of distributed data stores and communicating digital library data across several data server subsystems. However, collaborative operation of multiple UpLib repositories is possible with extensions that facilitate community building around individual document collections [9].

5. CONCLUSIONS AND FUTURE DIRECTIONS

We have presented the concept of personal digital libraries with universal access. A universally accessible personal digital library system provides each user with a customizable, general purpose document repository (i.e. a personal digital library) and the means to access it from anyplace at anytime from nearly any computing device. In particular, we have described the requirements, architecture and implementation of a personal digital library system with universal access: the PDLib system. To realize universally available personal digital libraries, the following requirements were characterized: (a) Flexible collection and metadata management; (b) Digital document submission; (c) Search and retrieval; (d) Universal access; (e) Administration and user control; and (f) Interoperability. These requirements were addressed by our personal digital library system architecture. The main components of the architecture are the Client-Side Applications, the Data Server and the Mobile Communication Middleware (MCM). The Client-Side Applications target mobile and fixed devices, and digital library services are adapted depending on the device. The Data Server supports the services for data storage, indexing and retrieval of information from users. The MCM is an intermediary between mobile devices and the Data Server and provides functionality for mobility support such as connection adaptation. Currently a proof-of-concept prototype implementation exists that exemplifies many of the main concepts presented in this paper.

The PDLib system is an ongoing research and development effort where interesting challenges related to digital libraries and mobile computing are presented. We are currently working on the following research topics: (a) Scalability. A personal digital library system must cope with an increasing number of mobile users that will demand a large amount of distributed storage resources to hold an even greater quantity of personal data objects (e.g. documents); (b) Universal access and availability. In addition to the diversity of the client applications, we have expanded the universal access criterion to include measured performance and availability metrics; support of digital library services during offline operation as well as ad-hoc interactions are being defined. Caching and prefetching techniques that can improve performance of mobile devices accessing digital library objects are also topics of interest for our work; (c) Content adaptation. Content summarization and visualization methods for digital library information on devices with limited screen size are being explored and could provide an added value to mobile clients of our personal digital library system; and (d) Interoperability. In addition to the current functionality, there is the opportunity to explore further interoperability with OAI-PMH and other systems such as Z39.50 and web search engines.

The traditional concept of a library was a site where documents were stored, and patrons had to attend the library site in order to access the documents. Information technology opened new opportunities by allowing access to traditional and new library services from remote locations. Personal digital libraries with universal access provide services that support the abstraction that a single user has his own library that he can carry with him anywhere he goes. He can manage and access documents anywhere and can also interact with personal digital libraries from other users or with traditional digital library repositories. University students can have their own Personal Digital Library where they store important papers during the time that they attend the university; after they graduate, they keep their Personal Digital Library and continue expanding their own collections according to their personal and professional interests. Faculty members and researchers can have their own Personal Digital Libraries; during a conference, they can add/retrieve documents to/from their personal library as they interact with colleagues. These are just some sample scenarios of the potential applications for this concept.

6. ACKNOWLEDGMENTS

We would like to thank Roberto Garcia-Sanchez, Adan Salinas-Flores, Luis Hurtado-Alvarado, Miguel Escoffie-Puerto, Eugenio Flores-Centeno, Marcos Guevara-Camacho and Eduardo Alvarez-Cavazos for their support in the elaboration of this paper. To all of them our most sincere gratitude.

7. ADDITIONAL AUTHORS

Lorena G. Gomez-Martinez (ITESM, Campus Monterrey, email: [email protected]) and Martha Sordia-Salinas (ITESM, Campus Monterrey, email: [email protected]).

8. REFERENCES

[1] N. R. Adam, R. Holowczak, M. Halem, N. Lal, and Y. Yesha. Digital library task force. IEEE Computer, 29(8), August 1996.
[2] Apache XML-RPC. URL ws.apache.org/xmlrpc/.
[3] D. Barbara. Mobile computing and databases - a survey. Knowledge and Data Engineering, 11(1):108–117, 1999.
[4] B. K. Bhargava and M. Annamalai. Communication costs in digital library databases. In Database and Expert Systems Applications, pages 1–13, 1995.
[5] CLDC/MIDP. URL java.sun.com/products/midp/.
[6] R. Garcia-Sanchez. Connection adaptation techniques for mobile clients that access digital library services. Master’s thesis, ITESM, Campus Monterrey, December 2004. Available in Spanish.
[7] D. A. Garza-Salazar, J. C. Lavariega Jarquín, and M. Sordia-Salinas. Information retrieval and administration of distributed documents in Internet. The Phronesis digital library project. In W. Abramowicz, editor, Knowledge Based Information Retrieval and Filtering from the Web, chapter 3, pages 53–73. Kluwer Academic Publishers, Boston, MA, November 2003.
[8] J2SE. URL java.sun.com/j2se/.
[9] W. C. Janssen. Collaborative extensions for the UpLib system. In Proceedings of the 4th ACM/IEEE-CS joint conference on Digital libraries, pages 239–240. ACM Press, 2004.
[10] W. C. Janssen and K. Popat. UpLib: a universal personal digital library system. In Proceedings of the 2003 ACM symposium on Document engineering, pages 234–242. ACM Press, 2003.
[11] kXML-RPC. URL kxmlrpc.objectweb.org/index.html.
[12] Lucene. URL jakarta.apache.org/lucene/.
[13] S. K. Madria, M. Mohania, S. S. Bhowmick, and B. Bhargava. Mobile data and transaction management. Information Sciences - Informatics and Computer Science: An International Journal, 141(3–4):279–309, 2002.
[14] MySQL. URL www.mysql.com/.
[15] OAI. URL www.openarchives.org/.
[16] Owl. URL owl.sourceforge.net/.
[17] PDLib. The personal digital library project. URL copernico.mty.itesm.mx/pdlib/.
[18] Phronesis. The Phronesis digital library project. URL copernico.mty.itesm.mx/phronesis/project/.
[19] E. Pitoura and G. Samaras. Data Management for Mobile Computing, volume 10. Kluwer Academic Publishers, 1998.
[20] M. Satyanarayanan. Pervasive computing: Vision and challenges. IEEE Personal Communications, pages 10–17, August 2001.
[21] M. Smith. DSpace for e-print archives. High Energy Physics Libraries Webzine, 1(9), March 2004.
[22] M. Smith, M. Barton, M. Bass, M. Branschofsky, G. McClellan, D. Stuve, R. Tansley, and J. H. Walker. DSpace: An open source dynamic digital repository. D-Lib Magazine, 9(1), January 2003. ISSN 1082-9873.
[23] Tomcat. URL jakarta.apache.org/tomcat/.
[24] I. H. Witten, S. J. Boddie, D. Bainbridge, and R. J. McNab. Greenstone: a comprehensive open-source digital library software system. In Proceedings of the fifth ACM conference on Digital libraries, pages 113–121. ACM Press, 2000.
[25] I. H. Witten, A. Moffat, and T. C. Bell. Managing Gigabytes: Compressing and Indexing Documents and Images. Morgan Kaufmann, second edition, May 1999.
