Web Services and Information Delivery for Diverse Environments


Juliana Freire, Bharat Kumar

Bell Laboratories, 600 Mountain Ave., Murray Hill, NJ 07974, USA
{juliana,bharat}@research.bell-labs.com

Abstract

There is a growing need for techniques that provide alternative means to access Web content and services, be it the ability to browse the Web through a voice interface like the PhoneBrowser, or through a wireless PDA or smart phone. The Web was designed for, and works well on, desktop computers with large screens and good network connections. However, using the Web through a phone or a small wireless device poses a number of challenges. In this paper, we discuss the issues involved in making existing Web content and services available for diverse environments, and describe PersonalClipper, a system that allows casual users to easily create customized (and simplified) views of Web sites that are well-suited for different types of terminals.

1 Introduction

The ability to take information, entertainment and e-commerce on the go has a lot of promise. The wireless data market is expected to grow enormously in the next few years. In the US alone, Dataquest expects that the number of wireless data subscribers will explode from 3 million in 1998 to 36 million in 2003. Thus, very soon, millions of people will be able to access the Web and order services and goods from wireless Internet devices. However, the existing Web infrastructure and content were designed for desktop computers and are not well-suited for devices that have less processing power and memory, small screens, and limited input devices. In addition, wireless data networks provide less bandwidth, have high latency and are not as stable as traditional (wired) networks.

Consider for example accessing the Web from a personal digital assistant (PDA) such as the Palm Pilot. Current wireless data services such as Omnisky [14] run over CDPD (see footnote 1), whose throughput rates vary from 5-6 kbps up to 12-13 kbps. With a screen size of 160x160 pixels on a 6x6 cm surface, it can be very hard to browse through large pages with rich graphics. In addition, input facilities are limited: even with Palm's Graffiti text input system, entering text can be very time consuming.

In order to address these limitations of bandwidth, screen real estate, and input facilities, there are three different approaches/models currently in use:

• Re-engineering existing Web sites: content providers create different versions of their Web sites that provide content formatted for specific devices. For example: The New York Times has a palm-friendly section at [URL illegible in source]; [site name illegible in source] provides a specialized interface for Web-enabled phones, as well as for the Palm VII [2]; and various other Web sites now have mobile phone-friendly versions (see [22] for a list of such sites).

• Creating specialized wrappers that export a different view of a Web page or service: third-party services such as [site name illegible in source] and [site name illegible in source] provide wrappers which export wireless-friendly clippings of a set of Web pages and services, such as stock quotes, traffic and weather information. These wrappers require no modifications to the underlying Web sites.

• Using proxies that filter and reformat Web content: proxies can be programmed to transform content according to the client's display size and capabilities. For example, ProxiWeb [18] transforms HTML pages and embedded figures into a format that can be displayed on a Palm Pilot.

Footnote 1: CDPD [7] is a wireless IP network that overlays on the existing AMPS (analog) cellular infrastructure.

But these approaches have drawbacks (Table 1 summarizes the features of these approaches). From a content provider's perspective, having to create and maintain multiple versions of a Web site to support different devices is labor intensive and can be very expensive. The same is true of specialized Web clippings, as programs (or wrappers) have to be created for each Web site and these need to be updated every time the underlying site changes. From a user's point of view, both solutions are restrictive: not all Web sites support all kinds of devices, and wrapper-based solutions do not offer clippings for all the content or services a user may need.

Proxy transcoders, on the other hand, perform on-the-fly content translation and thus are a good general solution for allowing users to browse virtually any Web site. The kinds of translation done by these proxies include reduction of image resolution, modification of HTML constructs that cannot be effectively viewed on smaller screens (e.g., ProxiWeb rewrites pages that contain frames so that they display the links corresponding to the frames), and translation from HTML to other languages such as the Wireless Markup Language (WML) [25]. But since Web pages must be presented as faithfully as possible, these general-purpose proxies do not perform any personalization: Web pages are always displayed in their entirety. This is clearly not the ideal solution for somebody accessing the Web through a cellular phone with a 3-line display. Besides, some features are hard and sometimes impossible to translate. For example, existing browsers for the Palm Pilot do not support JavaScript and thus, it is not possible to guarantee that pages with JavaScript will behave correctly in these browsers. It is often the case that proxies are not able to transcode complex pages.

In this paper, we describe the PersonalClipper system. PersonalClipper provides a platform that allows end-users to easily create and maintain personalized clippings of Web sites. These Web clippings are shortcuts to content and services a user (or a group of users) is interested in, such as the CNN health headlines, weather information for a specific city, flight information from Travelocity, or one's bank balance. By allowing users to create their own Web clippings, a service can be offered that is personalized and not restricted to a set of supported Web sites, and users can easily customize such clippings for specific devices.

                   multi-version sites   wrapped services   transcoding proxies   PersonalClipper
creation cost      high                  high               n/a                   low
maintenance        high                  high               n/a                   low
personalization    limited               limited            none                  high
coverage           low                   low                medium                high

Table 1: Summary of delivery techniques

The process of creating clippings is quite simple: it requires no programming expertise, and can be done by casual Web users. Furthermore, PersonalClipper generates clippings that are robust to certain changes to Web sites, and thus the need for maintenance is reduced. Unlike other systems for creating personal portals (e.g., Portal-to-Go [15], ezlogin [9]), the PersonalClipper system offers privacy: clipping creation (and retrieval) can be done from the user's machine, without the intervention of a third-party server.

The structure of the paper is as follows. We start in Section 2 with a motivating example. In Section 3 we describe the PersonalClipper system, its methodology and architecture. Section 4 describes how the PersonalClipper can be used to create views of pages and services that are well-suited to different types of devices. Related work is discussed in Section 5. We conclude in Section 6 with some future directions.

2 Motivating Example

Consider the following scenario. Juliana plans to attend the VLDB conference and she is looking for flights from JFK to Cairo that leave from JFK on September 9th, and return from Cairo on September 16th. She must take the following steps:

• Go to http://www.travelocity.com
• Choose the Find/Book a Flight option,
• Enter the login information,
• Choose the 9 Best Itineraries option,
• Specify details of itinerary.

This series of steps (depicted in Figure 1) produces a page with a list of alternative flights. Now, if she wants to do this from her Palm Pilot through a wireless modem, there are some problems (see footnote 2):

• many interactions are needed to access the flights page: Given the high latency and low bandwidth of wireless data services, performing all these steps through a wireless modem or on a cellular phone can be hard (especially if a significant amount of information needs to be input), very time consuming, and sometimes impossible (e.g., certain pages require JavaScript, which is not supported by micro browsers such as ProxiWeb [18] and AvantGo [5]).

• irrelevant information is downloaded: Most times, one is only interested in a subset of the information presented in a Web page. In this example, not only must the user download a series of intermediate pages, but she must also download the whole Flights page even though she might only need, say, the first three itineraries. Being able to access only the desired information is especially important in the wireless environment, where bandwidth is scarce and expensive, and screen space is limited.

Footnote 2: In fact, we were not able to access the Travelocity site using either the ProxiWeb [18] browser, which was not able to retrieve even the initial page, or AvantGo [5], which does not support secure connections.

Figure 1: Sequence of steps to retrieve flight itineraries from travelocity.com

[Figure 2 shows the request URL http://webclipserver.com/jfreire?clipping=travelcairo&mode=pull]

Figure 2: Retrieving flight itineraries from travelocity.com using the PersonalClipper server

If we consider the example above, the ideal would be to create a shortcut that gives one-click access to the first three itineraries (as shown in Figure 2). In general, it would be useful if one could easily create not only simple shortcuts, but also different views of Web sites that are better suited to be accessed from different terminals. In the Palm Pilot scenario, it would be useful to reduce the number of required interactions, and the amount of data input and transferred. For example, one could create a clipping template for Travelocity that would automatically log in, always fill in the departing city and preferred airline with default values, and require from the user just the travel dates and destination.

3 The PersonalClipper

In this section we describe the PersonalClipper system and its architecture, and discuss the main issues involved in creating and accessing clippings.

There are two steps involved in creating Web clippings: retrieving a Web page, and extracting elements from a retrieved page. Given the growing trend of interactive Web sites that publish data on demand, retrieving information from the Web is becoming increasingly complicated. Many sites, from online classified ads to banks, require users to fill a sequence of forms and/or follow a sequence of links to access a page they need, and often these hard-to-reach pages cannot be bookmarked using the bookmark facilities implemented in popular browsers. In order to create clippings of these pages, the process to access them must be automated. Also, as described in the example of Section 2, once the desired page is retrieved, a user may want to specify individual elements of the page she is interested in, so that irrelevant information is filtered out. A Web clipping thus must encapsulate the actions required to retrieve a particular page, and the specification of which elements should be extracted from that page.

It is possible to automate the retrieval of pages by writing programs in Java or in more specialized languages such as WebL [11]. One can also write Perl scripts to extract individual fragments of Web pages. However, this approach is not feasible for casual Web users who are not programmers. In addition, given the dynamic nature of the Web, maintaining these programs and scripts can be very costly, as they might require modifications every time Web sites change.

The PersonalClipper addresses these problems by providing a VCR-style interface similar to the WebVCR [3] to transparently record browsing steps, and a point-and-click interface to let users select page fragments. Furthermore, the system uses techniques that enhance the robustness of clippings, so that they work even if certain changes occur in the underlying Web sites.

After a clipping is created, it can be accessed through a PersonalClipper server, which may be located at a user's machine, at a service provider, or inside an Intranet. As Figure 3 shows, the PersonalClipper is a Web service that accepts requests from HTTP clients. A request to the PersonalClipper server contains an identifier for a particular clipping (see footnote 3), which, when executed, accesses a particular Web page, clips it, and returns the resulting (clipped) page to the requesting client.

[Figure 3 diagram; labels: Voice gateway, Palm proxy (ProxiWeb), WAP proxy, PersonalClipper Server, Web; connections labeled voice, wap, and http]

Figure 3: Accessing Web Clippings

The architecture of the PersonalClipper server is shown in Figure 4. The PersonalClipper server consists of the following modules: 1) the clipping DB, which stores clipping specifications; 2) the user profile manager, which performs user authentication for sensitive clippings stored on the server (e.g., a clipping that retrieves a user's 401(k) balance); 3) the clipping scheduler, which periodically executes clippings (if so specified by the clipping creator); 4) the cache manager, which stores cached clippings; and 5) the clipping execution engine, which interacts with an HTML parser, a JavaScript interpreter, and an HTML content extractor to execute specified clippings.

In what follows we give a more detailed description of clipping creation and execution. For ease of presentation, we restrict our discussion to the scenario where the PersonalClipper server is hosted as a Web-based service that a user can access using a Java-enabled Web browser.

3.1 Creating Web Clippings

Web clippings have two components: retrieval and extraction. As depicted in Figure 4, the PersonalClipper provides applets for both tasks: the recording applet and the extraction applet.

In order to create a clipping, a user must first specify the page to be clipped. If the page requires multiple steps to be retrieved and does not have a well-defined URL, the user can use the recording applet to create the script to access the page. The recording applet is a variant of the WebVCR [3]. It has a VCR-style interface to record browsing actions. When the user clicks the record button on the applet, she is prompted to input the URL for the starting page, which is then loaded into a browser window. From this point on, the applet monitors all user actions in that browser window (see footnote 4). This monitoring is transparent to the user, who can simply navigate her way to the final page as usual. When the user reaches the final page, the sequence of recorded actions (i.e., links traversed, forms filled along with the user inputs, and any other interactions with active content on the page; see footnote 5) is saved.

Footnote 3: As will be described in Section 3.2, requests may also include other parameters such as input values for the clipping.

[Figure 4 diagram; labels: Browser, Recording Applet, Extraction Applet, Personal Clipper Server (Clipping Execution Engine, Cache Manager, User Profile Manager, Clipping Scheduler, Clipping DB), HTML Parser, JavaScript Interpreter, Content Extractor; the browser connects to the server over http]

Figure 4: PersonalClipper Server Architecture

During the recording process, if the user is required to fill out forms, she can optionally specify which field values are to be stored in the clipping specification itself, and which ones are to be requested from the user every time the clipping is executed. This allows the user to create parameterized clippings. For example, a clipping to retrieve stock quote information from [site name illegible in source] can have the ticker symbol as a parameter, so that the user does not need to create a separate clipping for each stock. Also, for security reasons, a user may choose not to save certain kinds of information, such as passwords, inside a clipping, or to save them encrypted.
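The split between stored and prompted field values can be pictured with a minimal Python sketch. The `FormStep` structure, the `resolve_inputs` helper, and the field names (`view`, `ticker`) are hypothetical illustrations and are not taken from the PersonalClipper implementation.

```python
from dataclasses import dataclass, field


@dataclass
class FormStep:
    """One recorded form submission (hypothetical structure)."""
    form_name: str
    stored: dict = field(default_factory=dict)    # values saved in the clipping
    prompted: list = field(default_factory=list)  # names requested at each run


def resolve_inputs(step, supplied):
    """Merge the stored values with the values supplied for this execution."""
    missing = [name for name in step.prompted if name not in supplied]
    if missing:
        raise ValueError("missing clipping parameters: " + ", ".join(missing))
    values = dict(step.stored)
    values.update({name: supplied[name] for name in step.prompted})
    return values


# A stock-quote clipping whose ticker symbol is a parameter.
quote_step = FormStep("quote", stored={"view": "basic"}, prompted=["ticker"])
print(resolve_inputs(quote_step, {"ticker": "LU"}))
```

Under this assumption, a password field would simply be listed in `prompted` rather than `stored`, so it never appears in the saved specification.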

In contrast with the current practice of writing wrapper programs (e.g., using languages such as WebL [11] or Java), the PersonalClipper system offers an alternative for quickly and easily creating access wrappers/scripts that requires no programming: creating and updating these wrappers is a simple process involving only the usual browsing actions.

Once the desired page is retrieved, the extraction applet can be used to specify the fragments of the page that should be extracted. An interesting problem is how to identify these fragments. In general, any extraction specification chosen needs to provide the ability to 1) address individual or groups of arbitrary elements in a page, and 2) specify rules (that use the above addressing scheme) to extract the relevant content from the page. We wanted a solution that was standard, powerful, portable and efficient.

Footnote 4: The applet adds JavaScript event handlers to all active elements on the page, and when an event fires, it records the corresponding action. For more details on the monitoring process, the reader is referred to [3].

Footnote 5: Note however that this is currently restricted to handling JavaScript, and not arbitrary active content such as applets and plugins on a page.

Our first choice was to use the DOM API [8] to specify extraction expressions. However, this API is rather limited, e.g., it does not allow the retrieval of tables from an HTML document. We found XPath [26] to be a better, more flexible addressing scheme than the DOM API. XPath views XML documents as a tree and provides a mechanism for addressing any node in this tree. One drawback of using XPath is that it requires pages to be well-formed. Since browsers are very forgiving in this respect, many Web sites generate pages that are ill-formed (e.g., have overlapping tags, missing end tags, etc.). Consequently, the PersonalClipper system must first clean up HTML pages (e.g., using tools such as HTML Tidy [10]) before users input XPath expressions over the document to specify the desired content.

The XPath expressions below extract the first three itineraries (each itinerary is represented by an HTML table) from the flight selection page of the Travelocity example of Section 2.

    //html/body/center[2]/table[2] |
    //html/body/center[2]/table[3] |
    //html/body/center[2]/table[4]                                    (1)

    //html/body/center[2]/table[position() < 5 and position() > 1]    (2)

    //table[contains(string(), "price") and position() < 5]           (3)

These expressions can be rather complicated, and writing them can be an involved task. In addition, there are multiple ways to specify a particular page element, and some may be preferable to others (as explained in Section 3.3). To address these problems, we are currently designing a point-and-click interface that lets users select portions of Web pages (as they see them in the Web browser) and automatically generates extraction expressions. The point-and-click interface will provide users with different levels of abstraction that correspond to a breadth-first search in the portion of the document tree that is visible in the browser. For example, if a user is interested in particular cells of a table, she must first select the table and then zoom into the table to select the desired cells.
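To make the two addressing styles concrete, here is a minimal Python sketch that contrasts index-based addressing, in the spirit of expression (1), with a content-based filter, in the spirit of expression (3). The sample page is invented for illustration, and the standard-library ElementTree module is used as a stand-in: it supports only a subset of XPath (positional predicates such as `table[2]`, but not `position() < n` or `contains()`), so the content filter is written in plain Python rather than as an XPath expression.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already well-formed page: one summary table, then itineraries.
PAGE = """<html><body><center>
<table><tr><td>Search summary</td></tr></table>
<table><tr><td>Itinerary A price 810</td></tr></table>
<table><tr><td>Itinerary B price 845</td></tr></table>
<table><tr><td>Itinerary C price 910</td></tr></table>
<table><tr><td>Itinerary D price 990</td></tr></table>
</center></body></html>"""


def text_of(node):
    """Concatenate all text below a node."""
    return "".join(node.itertext()).strip()


def tables_by_index(root, indices):
    """Index-based addressing: fetch tables by their position under <center>."""
    found = []
    for i in indices:
        found.extend(root.findall("./body/center[1]/table[%d]" % i))
    return found


def tables_by_content(root, keyword, limit):
    """Content-based filter: tables before position `limit` containing `keyword`.

    Positions are counted in document order, which matches sibling positions
    here only because all tables in the sample share one parent.
    """
    tables = root.findall(".//table")
    return [t for pos, t in enumerate(tables, start=1)
            if pos < limit and keyword in text_of(t)]


root = ET.fromstring(PAGE)
print([text_of(t) for t in tables_by_index(root, [2, 3, 4])])
print([text_of(t) for t in tables_by_content(root, "price", 5)])
```

Both calls select the same three tables on this sample page; they differ in how they react when the page layout shifts, which is the trade-off a point-and-click generator would have to weigh.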

Figure 5 illustrates a Web clipping specification (simplified for exposition purposes) for the Travelocity example described in Section 2. The first part of a clipping specification corresponds to a sequence of browsing steps (i.e., the [tag names illegible in source] elements). The EXTRACT elements contain the extraction specifications. Note that multiple fragments can be specified, and users may choose to specify these fragments according to the terminal where they will be displayed. For example, if the clipping is to be displayed in a Palm Pilot, the user could choose to extract the first 3 itineraries (the extraction tag whose fragment name is "first 3 itineraries"), whereas if the clipping is to be displayed in a Web-enabled cellular phone with a 3-line display, a single itinerary may be preferable (e.g., the extraction tag whose fragment name is "first itinerary").

Given the unpredictable behavior of the Web (network delays, unreachable sites, etc.), caching plays an important role in a PersonalClipper server. Users can specify, for each clipping, if and how often it should be executed and cached (e.g., weather information for a user's hometown should be refreshed every 6 hours).
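A per-clipping refresh interval can be sketched as a small time-based cache. This is only an illustration under assumed names (`ClippingCache`, `run_weather`); the paper does not describe how the cache manager is implemented. A fake clock is injected so the example runs instantly.

```python
import time


class ClippingCache:
    """Keeps the last result of each clipping for a per-clipping interval."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}  # clipping id -> (time stored, clipped content)

    def get(self, clipping_id, refresh_seconds, execute):
        """Return cached content if still fresh, otherwise re-execute."""
        entry = self._entries.get(clipping_id)
        now = self._clock()
        if entry is not None and now - entry[0] < refresh_seconds:
            return entry[1]
        content = execute()
        self._entries[clipping_id] = (now, content)
        return content


fake_now = [0.0]  # mutable stand-in for the wall clock
cache = ClippingCache(clock=lambda: fake_now[0])
calls = []


def run_weather():
    """Stand-in for executing the weather clipping against the Web."""
    calls.append(1)
    return "forecast #%d" % len(calls)


SIX_HOURS = 6 * 3600
print(cache.get("weather", SIX_HOURS, run_weather))  # executes the clipping
fake_now[0] = 1 * 3600
print(cache.get("weather", SIX_HOURS, run_weather))  # served from the cache
fake_now[0] = 7 * 3600
print(cache.get("weather", SIX_HOURS, run_weather))  # stale, executes again
```

A scheduler-driven design would refresh entries in the background instead of on request; the on-demand variant is shown only because it is shorter.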

3.2 Executing Clippings

After a clipping is specified, it can be saved and uploaded to a PersonalClipper server. Users may then access clippings via URLs that uniquely identify them. Users may further specify additional parameters such as input values for the clipping (e.g., the password to access a bank account); the mode of operation (pull or push); whether the clipping should be cached; and how often it should be refreshed.

In the pull mode, the URL invokes a CGI script at the server, which in turn executes the clipping specification and immediately returns the clipped content to the requesting client. In the push mode, the execution and delivery of the clipping are asynchronous, i.e., the clipping can be returned to the client later, possibly through protocols other than HTTP (e.g., clippings could be emailed to users). The push mode is preferable when back-end Web sites are slow or temporarily unreachable, or when the end user cannot or does not want to keep a session open for too long (see footnote 6).
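As a rough illustration of how such a request might be decoded, the Python sketch below parses the request URL shown in Figure 2. The query parameter names `clipping` and `mode` come from that URL; treating the path as a user name, defaulting to pull mode, and passing every other parameter through as a clipping input are assumptions made for this example.

```python
from urllib.parse import urlsplit, parse_qs


def parse_clipping_request(url):
    """Split a clipping URL into user, clipping id, mode, and input values."""
    parts = urlsplit(url)
    query = parse_qs(parts.query)
    if "clipping" not in query:
        raise ValueError("request does not name a clipping")
    mode = query.get("mode", ["pull"])[0]  # assumed default
    if mode not in ("pull", "push"):
        raise ValueError("unknown mode: " + mode)
    reserved = {"clipping", "mode"}
    inputs = {k: v[0] for k, v in query.items() if k not in reserved}
    return {
        "user": parts.path.strip("/"),  # assumed meaning of the path
        "clipping": query["clipping"][0],
        "mode": mode,
        "inputs": inputs,
    }


request = parse_clipping_request(
    "http://webclipserver.com/jfreire?clipping=travelcairo&mode=pull")
print(request)
```

A pull request would then be executed immediately, while a push request would be queued for later delivery.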

The clipping execution is as follows. The page corresponding to the starting URL is fetched and parsed. The user actions are then executed in sequence, some of which might cause new Web pages to be fetched. For example, link traversals are executed by fetching the corresponding URL; form submissions are executed by first filling the form fields with the recorded user inputs, and then submitting the form; and if there are any JavaScript event handlers on elements of the page the user has interacted with, such actions are filtered through the JavaScript interpreter to ensure that the same handlers fire during replay.

After the final page has been retrieved (and cleaned), the extraction expressions are evaluated to extract the desired content. This can be done with an XSLT [27] interpreter such as XT (see footnote 7). The extracted content is then returned to the client.
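The replay loop described above can be sketched in Python as follows. Everything here is a simplified assumption: pages are dictionaries served from an in-memory `SITE` table instead of the network, actions are plain dictionaries, and JavaScript event handling is left out entirely.

```python
# Hypothetical in-memory "Web": URL -> page with named links and forms.
SITE = {
    "http://example.test/": {
        "links": {"Find a Flight": "http://example.test/search"}},
    "http://example.test/search": {
        "forms": {"itinerary": "http://example.test/results"}},
    "http://example.test/results": {"body": "flight list"},
}


def fetch(url, form_values=None):
    """Stand-in for an HTTP fetch; records any submitted form values."""
    page = dict(SITE[url])
    page["url"] = url
    if form_values:
        page["submitted"] = dict(form_values)
    return page


def replay(start_url, actions, fetch_page=fetch):
    """Fetch the starting page, then apply each recorded action in order."""
    page = fetch_page(start_url)
    for action in actions:
        if action["kind"] == "link":
            # Link traversal: fetch the URL behind the recorded link.
            page = fetch_page(page["links"][action["text"]])
        elif action["kind"] == "form":
            # Form submission: fill in recorded values, then submit.
            target = page["forms"][action["name"]]
            page = fetch_page(target, action["values"])
        else:
            raise ValueError("unsupported action: " + action["kind"])
    return page


recorded = [
    {"kind": "link", "text": "Find a Flight"},
    {"kind": "form", "name": "itinerary",
     "values": {"from": "JFK", "to": "Cairo"}},
]
final_page = replay("http://example.test/", recorded)
print(final_page["url"], final_page["submitted"])
```

The extraction expressions would then be evaluated over `final_page`; that step is omitted here.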

Note that all processing (retrieval and extraction) is done at the PersonalClipper server. Only select portions of Web pages are returned to the requesting client, effectively giving users one-click access to desired content, and considerably reducing the communication between the client and the PersonalClipper server. This feature is especially useful in wireless environments where users have to access the Web through high-latency and low-bandwidth connections.

Since Web pages may change between record and replay, the PersonalClipper uses techniques to ensure that replaying a sequence of recorded actions will lead to the intended page, and that the correct fragments are extracted, even when the underlying pages are modified.

Footnote 6: Some wireless services, such as Sprint PCS, charge for usage time.

Footnote 7: XT is available at http://www.jclark.com/xml/xt.html.

[Figure 5 listing garbled in source: an XML clipping specification containing a sequence of browsing steps followed by two EXTRACT elements with XPath expressions]

Figure 5: Web Clipping for retrieving the itineraries from http://www.travelocity.com

3.3 Robustness Issues

Usually, changes to Web pages do not pose problems to a user browsing the Web, but they do present a challenge to a system that performs automatic navigation. In a sequence of recorded browsing actions, some links may contain embedded session ids, and forms may contain hidden elements that change from one interaction to the next. Thus, for each user action during replay, the PersonalClipper system must locate the correct object (link, form or button) to be operated on, and this can be challenging in the presence of changes to Web pages (e.g., addition or removal of banner ads).

To ensure that clippings execute properly and retrieve the intended page, enough information must be saved for each action. For example, for a link traversal the PersonalClipper saves the DOM address of the link, its text and its URL. During replay, if an exact match for the link cannot be found in a page, heuristics are used that try to find the closest match for it. For a more detailed discussion of the heuristics used, see [3]. Note however that if the page structure changes radically, these heuristics may fail, in which case the clipping will need to be re-recorded.
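One way to picture closest-match lookup is the Python sketch below. It is not the heuristic from [3], whose details are not given here; it is an assumed stand-in that first tries an exact match on the three saved attributes and then falls back to string similarity (via the standard-library `difflib`) on link text and URL, with an arbitrary threshold of 0.6.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio between two strings, from 0.0 to 1.0."""
    return SequenceMatcher(None, a, b).ratio()


def find_link(recorded, candidates, threshold=0.6):
    """Return the candidate that best matches the recorded link, or None."""
    # Exact match on DOM address, text and URL.
    for link in candidates:
        if all(link[k] == recorded[k] for k in ("address", "text", "url")):
            return link
    # Fallback: average similarity of text and URL (assumed scoring rule).
    best, best_score = None, 0.0
    for link in candidates:
        score = (similarity(recorded["text"], link["text"])
                 + similarity(recorded["url"], link["url"])) / 2
        if score > best_score:
            best, best_score = link, score
    return best if best_score >= threshold else None


recorded = {"address": "body/a[3]", "text": "Find/Book a Flight",
            "url": "/flights?sid=111"}
# The page changed: a banner link was inserted and the session id differs.
page_links = [
    {"address": "body/a[3]", "text": "Special offers", "url": "/ads/offer"},
    {"address": "body/a[4]", "text": "Find/Book a Flight",
     "url": "/flights?sid=987"},
]
match = find_link(recorded, page_links)
print(match["address"])
```

When no candidate clears the threshold the function returns `None`, which corresponds to the case where the clipping has to be re-recorded.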

Extraction expressions can also be made robust to changes to Web pages. For example, in the XPath expression (1) above, if a new [tag name illegible in source] tag is added to the document, the expression will no longer retrieve the correct tables. Besides the index of the particular node to be extracted, the specification may include extra information that helps the system identify certain elements if the indices happen to change. For instance, the XPath expression (3) specifies tables with an index less than 5 and that contain the "price" string; this expression would still retrieve the correct itineraries even if new [tag name illegible in source] tags are added. Robustness can also be improved by adding redundancy in the specification, for example, the path from the root of the document to the element, and some contextual information (such as surrounding text) [16].

4 Delivering Clippings to Diverse Terminals

The PersonalClipper functions as a Web service, and as Figure 3 shows, the destination for the clipped content can be any user-agent that understands HTTP (e.g., a browser on a user's desktop). The PersonalClipper platform can thus be used to create personal portals like [site name illegible in source] that put together Web clippings with information from various Web sites, and that users can access from their Web browsers with a single click. The PersonalClipper can also be used in conjunction with other gateways and transcoding proxies to provide content to devices that do not handle HTTP/HTML; for example, it can be used together with a WAP gateway.

There are many benefits to using the PersonalClipper for delivering information to diverse terminals. By offloading all processing and most network communication to a server, it fits the thin-client architecture used for wireless devices well. In addition, by customizing and filtering content, it can significantly simplify Web pages, making the job of transcoding proxies a lot easier. In this section, we examine in more detail some of the issues involved in using the PersonalClipper in conjunction with transcoding proxies. For simplicity, we focus on WAP proxies.

The Wireless Application Protocol (WAP) is based on a 3-tier architecture where the central component, the gateway, is responsible for encoding and decoding requests from wireless devices to Web servers and vice-versa. As Figure 3 illustrates, as a user browses the Web through a Web-enabled cellular phone, requests are sent to a WAP gateway. The WAP gateway decodes and executes the requests (e.g., a URL fetch). When the requested document is retrieved from the Web, it is translated into WML (Wireless Markup Language), appropriately encoded, and returned to the phone. Since WAP gateways talk HTTP and HTML, it is straightforward to use any existing WAP gateway together with a PersonalClipper server.

WAP provides a push framework [23] that can be used in conjunction with the PersonalClipper push mode to provide batch/asynchronous content retrieval. The usage scenario is as follows. An end user requests a clipping from a PersonalClipper server by specifying its URL and optionally a set of parameters (e.g., the frequency of push). The PersonalClipper server would then act as a push initiator, periodically retrieving and filtering the specified content, and pushing it to the user's device via a push proxy gateway.

Notification services could also be built using the push mechanism. For example, rules could be added to the clipping specification that dictate under what conditions the clipping should be pushed to the device. For devices that do not support a push framework, different mechanisms may be used: specialized servers/gateways could be layered on top of the PersonalClipper to send information to pagers or email addresses, or to convert content to speech and send it to a voice mailbox.

To enable secure e-commerce services, and to allow end users to access sensitive information (e.g., a 401(k) balance), there needs to be a mechanism to provide security between the device and the back-end Web sites. Since the PersonalClipper server executes requests on the user's behalf, it is not possible to establish an end-to-end secure connection. The next best scenario is to have two secure connections: one between the device and the PersonalClipper server, and another between the PersonalClipper and the back-end Web site. For WAP devices, this requires WTLS (Wireless Transport Level Security) [24] to provide application-level security, rather than secure connections between the user-agent and the WAP gateway only. In this scenario, for devices that cannot handle HTML, the task of transcoding the request/response must be performed at the PersonalClipper server, since a separate WAP gateway would not be able to access the encrypted data flowing from the PersonalClipper server to the WAP device.

Note that tighter coupling between the PersonalClipper and transcoding proxies is possible. In this
scenario, the PersonalClipper could be used as a universal server that accepts requests from various
devices and returns content formatted according to the type of the device: multi-device clippings
could be created that specify how the clipping should be displayed on different devices. A device-specific
tag could be added to clipping specifications (see Figure 5), for example:

[example markup illegible in source]

The tag may also contain information about general capabilities and characteristics of the
device, as suggested in the W3C CC/PP note [6].
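
One way to picture a multi-device clipping is as a set of per-device display rules applied to the same extracted data. The sketch below is purely illustrative: the device names, the rule fields (`fields`, `max_chars`), and the function `render_for_device` are assumptions and do not reproduce the clipping specification syntax of Figure 5.

```python
# Hypothetical per-device display rules for a single stock-quote clipping.
DEVICE_RULES = {
    "desktop": {"fields": ["symbol", "price", "change", "volume"], "max_chars": None},
    "pda": {"fields": ["symbol", "price", "change"], "max_chars": 40},
    "wap-phone": {"fields": ["symbol", "price"], "max_chars": 16},
}


def render_for_device(record, device_type, rules=DEVICE_RULES, fallback="desktop"):
    """Format one extracted record according to the rule for the device type.

    Unknown device types fall back to the rule named by `fallback`.
    """
    rule = rules.get(device_type, rules[fallback])
    line = " ".join(str(record[name]) for name in rule["fields"] if name in record)
    limit = rule["max_chars"]
    return line if limit is None else line[:limit]
```

The same extracted record then yields a short line for a phone and a fuller one for a desktop browser, without the clipping having to be defined twice.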

5 Related Work

The area of information delivery to heterogeneous devices has attracted a lot of attention recently. In
the domain of wireless devices such as PDAs and cellular phones, the Wireless Application Protocol
(WAP) initiative [25] is working on standard solutions to give wireless users access to secure, reliable,
stateful transaction services via resource-constrained portable terminals [12]. The main objectives of
the WAP Forum are: to bring Internet content and services to digital cellular phones and other wireless
terminals; to create a protocol that will work across differing wireless network technologies; and to
enable the creation of content and applications that scale across a very wide range of bearer networks
and device types. WAP thus has the potential to enable transport-independent client/server
communications sessions from portable devices over wireless links. However, WAP also faces important
challenges. WAP is based on a 3-tier architecture where the central component, the gateway, is
responsible for encoding and decoding requests from wireless devices to Web servers and vice-versa.
Given the growing complexity of Web sites (e.g., the presence of scripting languages, dynamic content,
malformed content), transcoding can be very hard, and in practice, many pages and services are just not
amenable to transcoding and cannot be accessed through WAP. By allowing users to easily customize
services and filter out irrelevant content and complex features, the PersonalClipper system greatly
simplifies the transcoding process, increasing the Web coverage for WAP devices.

The simplification of transcoding applies to domains other than WAP. The PhoneBrowser [17]
provides a programmable platform that gives the general population of Web page authors the means for
building Interactive Voice Response (IVR) systems without having to own any IVR equipment. Web
IVR applications are currently built using transcoders such as Spyglass' Prism [20]: content providers
write Prism scripts to transcode existing pages into versions that are more amenable to being read out.
The PersonalClipper system can be used as an alternative to automatically generate these scripts without
requiring users to write programs.

In the area of information integration, many systems and techniques have been proposed to wrap Web
sites. Most of the work in this area, though, focuses on extracting structure from semi-structured data
(e.g., [4, 1]). The extractor component of the PersonalClipper system is not concerned with
understanding the structure or discovering the schema of the underlying data, but with providing robust
mechanisms for identifying high-level HTML or XML syntactic components (e.g., the first table after a
specific string).

The first version of PersonalClipper uses XPath to address specific components to be extracted from
Web pages. Other languages could also be used for this purpose, for example WebL [11] or the scheme
used by W4F [19]. These languages provide good mechanisms to extract fragments from documents; in
some cases, they are easier to use than XPath. However, XPath is a widely accepted standard and there
are freely available tools to process XPath expressions.
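
As an illustration of addressing a high-level component such as "the first table after a specific string", the sketch below walks a well-formed (XHTML-like) page in document order. It is an assumption-laden example rather than the PersonalClipper extractor: a full XPath 1.0 engine could express roughly the same selection as `//text()[contains(., 'Stock Quotes')]/following::table[1]`, but Python's standard `xml.etree.ElementTree` supports only a subset of XPath without the `following` axis, so the walk is written by hand. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def _document_order(elem):
    """Yield ('start', element) and ('text', string) events in document order."""
    yield "start", elem
    if elem.text:
        yield "text", elem.text
    for child in elem:
        yield from _document_order(child)
        if child.tail:  # text that follows the child's closing tag
            yield "text", child.tail


def first_table_after(root, marker):
    """Return the first <table> element that starts after text containing marker."""
    seen_marker = False
    for kind, value in _document_order(root):
        if kind == "text" and marker in value:
            seen_marker = True
        elif kind == "start" and seen_marker and value.tag == "table":
            return value
    return None  # marker not found, or no table follows it
```

Anchoring on a nearby string rather than on an absolute position (such as "the third table on the page") is what lets a selection of this kind survive layout changes elsewhere on the page.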

Recently, there has been a proliferation of personalization systems which offer services that range
from notifications about changes to certain Web pages (e.g., Mindit [13]) to the creation of personal
portals (e.g., Portal-to-Go [15], ezlogin [9], Yodlee [28]). These systems have some drawbacks, most
notably:

• Limited coverage: services offer clippings for a limited number of sites. For example, Yodlee2Go
[28] allows users to check flight info on Expedia, but it does not allow users to access Expedia's
rental car services.⁸

• Lack of privacy: in order to use these services, users are forced to go through third-party servers
that can observe all user interaction (passwords input as well as content retrieved).

The PersonalClipper system addresses both of these problems: it lets users create clippings for virtually
any Web site/page; and by placing a PersonalClipper server on a user's machine, it offers complete privacy,
since clippings can be created and accessed without the need to go through a third-party server. It is
worth pointing out that even though the main motivation for PersonalClipper is to provide personalized
Web clippings to end-users, the system can also be used by portal services to simplify the creation and
maintenance of specialized wrappers.

6 Discussion

The PersonalClipper provides a platform that lets end-users as well as content providers easily create and
maintain customized (and simplified) views of Web pages and services. These customized clippings are
easy to create (clipping creation requires no programming expertise); require low maintenance (they are
robust to certain changes in the underlying Web sites); are highly customizable; and may
be created for virtually any Web site.

When used as a Web service, the PersonalClipper server performs all processing (page retrieval and
content extraction) required to construct a clipping, and returns to the requesting client just the final
clipping. By reducing the number of interactions required to retrieve hard-to-reach pages, and allowing
users to customize clippings so that the need for data input is diminished and only the desired content is
retrieved, the PersonalClipper can be an integral part of a thin-client architecture for content delivery. It
can be especially useful in environments where users have to access the Web through high-latency and
low-bandwidth connections.

Furthermore, in conjunction with transcoding gateways, customized clippings can be used to provide
content to devices that do not handle HTTP/HTML; for example, the PersonalClipper can be used together
with a WAP gateway. An important advantage of using the PersonalClipper in this scenario comes from the
customization and content filtering, which can significantly simplify Web pages, making the job of
transcoding proxies a lot easier.

Finally, it is worth pointing out that Web clippings contain useful information about the capabilities
of Web services, such as the attributes needed to retrieve a certain Web page. It would be
interesting to investigate if and how this information can be used to facilitate not only the discovery and
selection of specific services, but also the process of combining different services.

⁸ Note that even though ezlogin.com [9] offers the option for users to create their own Web clippings, they are not able to
create clippings of certain hard-to-reach pages, for example those that involve JavaScript actions.


Acknowledgements: The authors thank Jayant Haritsa for useful comments on the first draft of this
paper.

References

[1] B. Adelberg. NoDoSe - a tool for semi-automatically extracting structured and semi-structured data
from text documents. In Proc. SIGMOD, pages 283–294, 1998.

[2] Amazon anywhere. http://www.amazon.com/anywhere.

[3] V. Anupam, J. Freire, B. Kumar, and D. Lieuwen. Automating Web navigation with the WebVCR.
In Proc. of WWW, pages 503–517, 2000.

[4] N. Ashish and C.A. Knoblock. Wrapper generation for semi-structured internet sources. SIGMOD
Record, 26(4):8–15, 1997.

[5] Avantgo. [URL garbled in source]

[6] W3C CC/PP note. [URL garbled in source]

[7] CDPD. http://www.wirelessdata.org/develop/cdpdspec.

[8] DOM. [URL garbled in source]

[9] ezlogin. [URL garbled in source]

[10] HTML Tidy. [URL garbled in source]

[11] T. Kistler and H. Marais. WebL: a programming language for the Web. In Proc. of WWW, 1998.
http://www.research.digital.com/SRC/WebL/index.html.

[12] James Kobielus. Wireless application protocol. Technical Report v1, The Burton Group, April
2000.

[13] Mind-it. [URL garbled in source]

[14] Omnisky. http://www.omnisky.com.

[15] Portal-to-go. [URL garbled in source]

[16] T. Phelps and R. Wilensky. Robust intra-document locations. In Proc. of WWW, pages 105–118,
2000.

[17] Phonebrowser. http://phonebrowser.research.bell-labs.com/.

[18] ProxiWeb. [URL garbled in source]


[19] Arnaud Sahuguet and Fabien Azavant. Building light-weight wrappers for legacy web data-sources
using W4F. In Proc. of VLDB, pages 738–741, 1999.

[20] Spyglass Prism. http://www.spyglass.com.

[21] Voicexml. http://www.voicexml.org.

[22] http://www.sprintpcs.com/wireless/wwbrowsing providers.html.

[23] WAP push architectural overview. http://www.wapforum.org/, November 1999.

[24] Wireless transport layer security specification version 1.1. http://www.wapforum.org, November
1999.

[25] Wireless Application Protocol Forum. Wireless Application Protocol: The Complete Standard.
Wiley, 1999.

[26] XPath. [URL garbled in source]

[27] XSLT. [URL garbled in source]

[28] Yodlee2go. [URL garbled in source]
