Personalization of information access for electronic catalogs on the web



Electronic Commerce Research and Applications 1 (2002) 20–40

www.elsevier.com/locate/ecra

Personalization of information access for electronic catalogs on the web

Benjamin P.-C. Yen a,*, Robin C.W. Kong b

a School of Business, The University of Hong Kong, Pokfulam Road, Hong Kong, Hong Kong

b Department of Industrial Engineering and Engineering Management, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong

Accepted 10 April 2002

Abstract

Attracting and keeping people at a web site is one of the major challenges in electronic commerce. More advertising, more useful information, or flashier pages cannot prevent Internet surfers from getting lost in the huge amount of information, especially in electronic catalogues. One solution is to customize the web site for each individual user through the analysis of preferences and interests in the user profile. The site should be able to customize itself in terms of information content, information organization and information display. Information collection, analysis, and customization form a process to improve and customize the web site for each user, without Webmaster interference in normal operation. In this paper, a Personalized Electronic Catalogue (PEC) System is proposed to synthesize the customization of information content, organization, and display for electronic catalogs. An industrial application is used to demonstrate the improvement of information access for electronic catalogs. © 2002 Elsevier Science B.V. All rights reserved.

Keywords: Electronic catalogs; Customization; Information access; Electronic commerce; Personalization

*Corresponding author. Tel.: +852-2241-5668; fax: +852-2858-5614. E-mail address: [email protected] (B.P.-C. Yen).

1567-4223/02/$ – see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S1567-4223(02)00004-2

1. Introduction

The World Wide Web (WWW) has emerged as the medium with the greatest market potential. To gain strategic advantages in future Electronic Commerce (EC) competition on the Web, many companies have established web sites as a business frontier, even without any profit. In general, most commercial web sites fall into three categories: (1) web sites that provide information about the company, product or service; (2) web sites that provide well-organized electronic catalogs with functions to help users browse products; and (3) web sites that support transaction functions in addition to electronic catalogs for on-line trading. However, many web sites are loaded with a large amount of information, especially the electronic catalogs in electronic shopping malls (such as Amazon [1] and Yahoo!Shopping [2]) and auction sites (such as eBay [3] and Yahoo!Auctions [4]). For example, eBay provides 4320 categories with 4 million auctions and 450 000 items added each day.

It is not surprising that users feel lost and frustrated due to the disorganized, obsolete, irrelevant
information when they surf the information sphere. The problem gets worse as the number of web sites and the associated volume of data soar. Even though many of these sites provide search engines, they cannot guarantee that users will find what they are looking for, because users may not be able to specify the right attributes or keywords in the query. According to the NUA Internet survey [5], many users have problems searching for product information. The preliminary goal of an EC web site is to attract and keep visitors. It is more efficient and cost effective to keep existing visitors and attract new ones by providing better services than by merely providing more unique information, a more attractive layout, or more advertisements. One way to solve the problem of information overloading is to provide a customized web site for each individual visitor. Information customization is based on the idea that information can be treated as a kind of customizable product. However, before any customization can be done, it is necessary to know the users first, that is, their preferences. Web technology not only enables reaching out to the WWW, but also allows valuable information to be collected from users, implicitly or explicitly. By analyzing this information, we can construct a user profile to know and serve the user, and the web site can then be customized so that each user obtains the information desired.

Many researchers have sought ways to resolve the problem of information overloading. In the following, a literature review of research on electronic catalogs and information access on the web is provided. Much research has been carried out in the electronic catalog area related to EC. Ginsburg et al. [6] introduced three electronic catalog models: the 'Do-It-Yourself' Model, the Third-Party E-catalog Integrator Model, and the Real-Time Knowledge Discovery Model. In the 'Do-It-Yourself' Model, the buyer company initiates the development of electronic catalogs, while in the Third-Party E-catalog Integrator Model, the firm seeks help from a third party to develop the master electronic catalog and then rents the access service. In the Real-Time Knowledge Discovery Model, the buyer firm uses advanced software techniques, such as agents, to carry out real-time information discovery on the Internet. Garcia Gosalvez [7] points out the need for integrating the electronic product catalog into the technological, organizational and strategic aspects of a company in order to gain the real advantages of electronic catalogs, which are capable of helping day-to-day organization management. Koch and Turk [8] illustrate the development process of the electronic product catalog and present the architecture of the EPK-fix system, which constructs electronic product catalogs systematically. Handschuh et al. [9] describe the concept of the Mediating Electronic Product Catalog (MEPC) based on Q-Technology. MEPC is an intermediary between buyers and sellers, which adds value to both suppliers and customers through the formation of a federated system of autonomous electronic product catalogs. Palmer [10] conducted a survey to compare the similarities and differences among four different B2C electronic catalog sites (beverage, music, computer and clothing) with ordering capabilities through either on-line or telephone purchasing. Researchers [11–13] have defined electronic catalogs from informational and functional perspectives, as well as the role of electronic catalogs in electronic commerce.

In general, research on information access on the WWW can be divided into four groups based on application and scope. The first group is web site customization based on user access information. The second group is agent-based intelligent search for information retrieval and discovery. The third group provides intelligent browsers and agents to support user navigation on the Internet based on user preferences. The last group is concerned with the collection of user information on the Web.

Perkowitz and Etzioni [14–17] proposed an Artificial Intelligence (AI) approach to create the Adaptive Web Site, which can improve the site organization based on the user access log, under the assumption that each originating computer corresponds to a particular user. PageGather, based on a clustering algorithm, processes the access log and measures the co-occurrence frequencies between pages to generate a similarity matrix and the corresponding graph. Clusters are then extracted from the graph and ranked to eliminate overlap. The Webmaster selects clusters that are associated with an index page of links within it. Yan et al. [18] proposed the use of user access patterns to generate hyperlinks, which are captured
in the access log and analyzed off-line on an interval basis, to improve information access. Each user session is formulated as an n-dimensional vector where each element refers to a page of interest. A clustering algorithm is used to discover clusters of users with similar interests from the vectors. When a user browses through the web site, the matched cluster is used to generate the corresponding hyperlinks. Sarukkai [19] used a Markov Chain model based on user access information for link prediction and path analysis. URLs or HTTP requests are formulated as the Markov states, and the transition matrix is estimated for path prediction. In addition to link prediction, the Markov Chain model also provides web server HTTP request prediction, adaptive web navigation, web tour generation, and personalized hub/authority. Wang et al. [20] proposed a personalized product information filtering model to filter and rank product information with linear functions of the user preference. Only matched items are presented to the user for selection. The user preference is updated with an inductive learning method during the selection process.

The research in the second group focuses on intelligent agents that help the user seek information on the Internet. Cheung et al. [21] proposed a four-level classification of tools, where a level-four tool has the property of learning the behavior of both the information user and the information source. Their web tool is composed of three agents that help the user retrieve the desired information on the WWW: the Learning Agent, the Monitor Agent, and the Suggestion Agent. Chen and Kuo [22] propose a personalized information retrieval system based on a user profile modelled as the Semantic Relevance (SR) and Co-occurrence (CO) of keywords to capture the real meaning of the user query. The process is composed of issuing the query, enhancing the query, selecting documents, and updating the profile. Chang et al. [23] identified that manually constructed hyperlinks have the property of decreasing relevance as the number of hyperlinks between two pages increases, so that some relevant information far from the root page is difficult to discover. They present a Site Traveling Algorithm (STA) to discover the relevant information, in which the relevance of the retrieved document is evaluated with the content popularity and richness (CPR). Yang et al. [24] presented the development of an intelligent personal Internet agent based on automatic textual analysis of Internet documents and a hybrid simulated annealing algorithm. With the capability of comparing the similarity of Internet documents, the hybrid simulated annealing algorithm is used to search for relevant Internet documents for the user. Tu and Hsiang [25] proposed an Interactive Information Retrieval (IIR) agent architecture made up of a Group agent, which handles group knowledge and preferences, and a Personal agent, which keeps track of the individual user profile. Each agent operates and achieves its goal through the collaboration of its subagents to generate category knowledge of group and individual preferences using the document vector model and a clustering algorithm.

The third group of research concerns navigation assistance for the user during the browsing process. Joachims et al. [26] introduced WebWatcher, based on a learning approach with user feedback to improve the quality of navigation advice interactively. Similarly, Lieberman [27] introduced the intelligent agent Letizia, which works with a conventional web browser to keep track of the user's browsing behavior and interests. Furthermore, Berghel et al. [28] presented a web browser called the 'Cyberbrowser' to customize information access for the content within a web page, which includes keyword and sentence extraction according to user selection.

In addition to the three groups of research mentioned above, some papers are mainly about gathering user information on the Internet. Lin et al. [29] described an approach for capturing user access patterns on the WWW to address the problem that the web server will only recognize the proxy server instead of the individual user. The method is called 'page conversion': each page in the site is encoded into a cipher on the server side. When the user requests a page, a client-side program (deciphering module) is downloaded from the server and reports the page access event to the Access Pattern Collection Server (APCS) before deciphering the encoded page and presenting it to the user. Richardson [30] compares the existing tools for gathering access information on the Internet, such as visitor counters and guest books.

From the review above, although much research has been done on information retrieval and site
organization, an integration of these two aspects is lacking. The research mainly considers general Web information retrieval rather than the electronic catalogs in electronic commerce. In addition, there is little research on the customization of electronic catalogs on the WWW, which are one of the major sources of the information overloading problem. Furthermore, the customization of electronic catalogs should include the personalization of information presentation, which is critical for product selection and evaluation. In this paper, we focus on the issues of customization for electronic catalogs from the aspects of information organization, retrieval and presentation. A system, called Personalized E-Catalog (PEC), is proposed to customize and personalize information access for electronic catalogs on the Web. The system design and implementation are presented with an industrial example. The minimum number of pages accessed (Min NPA) is used as the comparison variable to demonstrate that the PEC system can improve performance in two examples. We conclude with the strengths of the approach and future research directions.

2. Overview and definitions

The catalog originated in the library as a descriptive list of the library collection about 2000 years ago [31]. As time went by, the use of catalogs became more popular and was no longer restricted to books in the library. The styles and facilities of catalogs change along with the media carrying them. Paper catalogs provide product information, colorful photos, and an index for searching. With the adoption of computer technology, the catalog evolved to electronic media (such as CD-ROM) for their cost effectiveness and large storage capacity. In electronic media, the style of the catalogs is no longer restricted to pictures and text; audio, video and hypertext are also used for information presentation. Beyond a simple index, sophisticated functions (such as keyword/image search engines) are also provided for users to search the information in the catalogs. As Internet technology emerges, catalogs are also migrating onto the Web for on-line access, quick update, and easy distribution. In general, the definition of an electronic catalog can be viewed from two aspects: function and structure.

2.1. Function

In a broad view, electronic catalogs can be seen as virtual gateways for the buyer to obtain information about the product, purchase the product, make payment, access support services, and cooperate with the sellers [12]. In a narrow view, electronic catalogs help the buyer to browse through the collection of product information in multimedia and retrieve the relevant subset of product information [13]. They enable the buyer to retrieve the reference for the selection of a product from the entire collection to assist the sourcing process [9]. A functional definition of electronic catalogs provided by Stanoevska-Slabeva and Schmid [11] is "... interactive and multimedia interface between buyers and sellers on the Internet, which support product representation, search and classification and have interface to market services as negotiation, ordering and payment..." In addition, the functional definition of the electronic catalog can be viewed from the perspective of both the seller and the buyer. For the seller, it is an integrated back-end system for data update, customization of catalog layout, and collection of customer information; for the buyer, the searching and browsing functions help product selection and the ordering process [7]. In this paper, we define the electronic catalog as an interactive and multimedia information space on the Internet with information creation, update, browsing, searching, presentation, classification, and customization facilities to assist catalog users in constructing, maintaining, and retrieving product information. On top of the primitive information processing functions mentioned above, electronic catalogs also provide negotiation functions for communication and transaction functions for purchasing to facilitate the procurement business processes (Fig. 1). In general, sellers are more concerned about creating and updating information in the electronic catalogue. On the other hand, buyers are more interested in browsing, searching, presenting, and customizing the electronic catalogue. The classification, negotiation, and transaction functions are important for both parties.

Fig. 1. Functional definition of electronic catalogs.

2.2. Structure

From a structural point of view, the electronic catalog can be defined as a collection of classified information presented in the form of a catalog tree, as shown in Fig. 2. In the following, the word record (a node in Fig. 2) is used to represent the set of information describing a particular product or a group of products inside the electronic catalog. Each record in the catalog represents a product and consists of a set of attributes and the corresponding values, $\{(A, \nu)\}$, to denote the product information, such as price, manufacturer, functions, and images. The records are classified according to the attribute and value pair $(A, \nu)$. The records share some common set of attributes for classification. The attributes of a record in the catalog can be denoted as a set $\{\{A_a\}, \{A_c\}, \{A_o\}\}$, where $\{A_a\}$ is the set of common attributes for all records in the catalog and must not be an empty set, $\{A_c\}$ is a set of common attributes for some records in the catalog, and $\{A_o\}$ is a set of attributes only for a particular record. For example, suppose there are three records $\alpha$, $\beta$, $\gamma$ in the catalog, and denote $A_{\{s\}}$ as the common attribute for the set of records $\{s\}$. In Fig. 3, in order to form a catalog, $\{A_{\alpha\beta\gamma}\} \neq \emptyset$, but $\{A_{\alpha\beta}\}$, $\{A_{\alpha\gamma}\}$, $\{A_{\beta\gamma}\}$, $\{A_{\alpha}\}$, $\{A_{\beta}\}$ or $\{A_{\gamma}\}$ can be empty. The records can be classified by the value of the common attribute in the catalog as follows (Fig. 3):

$\{A_{\alpha\beta\gamma}\} = \{A_{L11}, A_{L12}\}$
$\{A_{\beta\gamma}\} = \{A_{L2}\}$
$\{A_{\alpha}\} = \{A_{1}\}$
$\{A_{\beta}\} = \{A_{2}\}$
$\{A_{\gamma}\} = \{A_{3}\}$

Fig. 2. A catalog tree.
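As a hedged illustration of the attribute sets described in this section, the following Python sketch partitions attribute names into those common to all records ($\{A_a\}$), those shared by some but not all records ($\{A_c\}$), and those belonging to a single record ($\{A_o\}$). The dictionary layout, the function name `partition_attributes`, and the placeholder value strings are assumptions made for illustration only; the paper does not prescribe a data format.

```python
from collections import Counter

# Hypothetical encoding of the three example records: attribute name -> value.
# The value strings are placeholders standing in for the paper's value symbols.
records = {
    "alpha": {"A_L11": "v111", "A_L12": "v121", "A_1": "v_alpha"},
    "beta":  {"A_L11": "v112", "A_L12": "v121", "A_L2": "v21", "A_2": "v_beta"},
    "gamma": {"A_L11": "v112", "A_L12": "v122", "A_L2": "v22", "A_3": "v_gamma"},
}

def partition_attributes(records):
    """Split attribute names into (common to all, common to some, own per record)."""
    # Count in how many records each attribute name occurs.
    counts = Counter(attr for rec in records.values() for attr in rec)
    n = len(records)
    common_all = {a for a, c in counts.items() if c == n}
    common_some = {a for a, c in counts.items() if 1 < c < n}
    own = {name: {a for a in rec if counts[a] == 1}
           for name, rec in records.items()}
    return common_all, common_some, own

common_all, common_some, own = partition_attributes(records)
print(sorted(common_all))   # attributes shared by alpha, beta and gamma
print(sorted(common_some))  # attributes shared by a proper subset of records
print(own)                  # attributes specific to each record
```

For the three example records, the sketch reproduces the sets listed in the text: two attributes common to all records, one shared by the second and third records, and one attribute specific to each record.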


$\alpha = \{(A_{L11}, \nu_{111}), (A_{L12}, \nu_{121}), (A_{1}, \nu_{\alpha})\}$
$\beta = \{(A_{L11}, \nu_{112}), (A_{L12}, \nu_{121}), (A_{L2}, \nu_{21}), (A_{2}, \nu_{\beta})\}$
$\gamma = \{(A_{L11}, \nu_{112}), (A_{L12}, \nu_{122}), (A_{L2}, \nu_{22}), (A_{3}, \nu_{\gamma})\}$

Fig. 3. Attribute set for three records $\alpha$, $\beta$ and $\gamma$ in catalog.

Thus, the catalog tree can be constructed by taking the common attributes in the set of records, for example, $A_{L11}$ and $A_{L12}$ for the set of $\alpha$, $\beta$ and $\gamma$. Based on the value of attribute $A_{L11}$ of each record, two groups of records are identified as shown in Fig. 4(a): $(\alpha)$ with value $\nu_{111}$ and $(\beta, \gamma)$ with value $\nu_{112}$. By applying a similar method to the record set $(\beta, \gamma)$, we can obtain Fig. 4(b). Therefore, searching down the catalog can uniquely identify a record. According to the structural definition, the scope of the electronic catalog not only includes the ordinary electronic catalog of products, services or information, but also Internet directory services such as Yahoo, a catalog of URLs. Therefore, although a standard product catalog is used to demonstrate the personalization of the electronic catalog, similar techniques can also be applied to other kinds of electronic catalogs.

3. Personalization of electronic catalogs

The approach of Web site customization on information content, access, and presentation is proposed to tackle information overloading on the Internet. The customization process starts with the collection of information about the user, explicitly or implicitly, on-line during navigation of the catalog. The information is then analyzed for profile construction. Various personalization approaches can be applied to personalize the electronic catalog based on the profile. The whole process is repeated as shown in Fig. 5. There are three main steps for catalog customization, namely information collection, information analysis, and personalization.

3.1. Electronic catalog personalization cycle

There are two main issues for information collection: what and how. The information collected can

Fig. 4. Record classification process.
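The record classification process of Fig. 4 can be sketched in Python as a recursive grouping on the values of common attributes. This is a minimal sketch under stated assumptions: the record encoding, the function names `build_tree` and `find`, and the rule of trying common attributes in sorted order are hypothetical choices, since the paper does not specify which common attribute is used at each level.

```python
# Hypothetical encoding of the example records: attribute name -> value.
records = {
    "alpha": {"A_L11": "v111", "A_L12": "v121", "A_1": "v_alpha"},
    "beta":  {"A_L11": "v112", "A_L12": "v121", "A_L2": "v21", "A_2": "v_beta"},
    "gamma": {"A_L11": "v112", "A_L12": "v122", "A_L2": "v22", "A_3": "v_gamma"},
}

def build_tree(records, names=None, used=frozenset()):
    """Recursively group records by the value of an attribute they all share."""
    names = sorted(records) if names is None else names
    if len(names) == 1:
        return names[0]                      # a leaf identifies one record
    # Attributes common to every record in this group and not yet used.
    shared = set.intersection(*(set(records[n]) for n in names)) - used
    for attr in sorted(shared):              # assumed tie-breaking order
        groups = {}
        for n in names:
            groups.setdefault(records[n][attr], []).append(n)
        if len(groups) > 1:                  # the attribute separates the group
            return {attr: {value: build_tree(records, members, used | {attr})
                           for value, members in groups.items()}}
    return names                             # no common attribute separates them

def find(tree, query):
    """Search down the tree using the attribute values given in the query."""
    while isinstance(tree, dict):
        (attr, branches), = tree.items()
        tree = branches[query[attr]]
    return tree

tree = build_tree(records)
print(tree)
print(find(tree, {"A_L11": "v112", "A_L12": "v122"}))
```

With the example records, the first split is on `A_L11`, giving one group with the first record and one with the other two, which are then separated by a further common attribute. The `find` helper illustrates the statement that searching down the catalog tree identifies a single record.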


be the date, time, file accessed, and IP address of the remote machine, which are normally available in the web server log. Many commercial software packages, such as WebTrends Log Analyzer [32], are available for analyzing and extracting valuable information from the server log and generating a report of the access patterns and statistics. However, due to the common use of proxy servers for reasons of security and resource utilization, analyzing the web server log has one major drawback: normally it cannot track the access pattern of the individual user. The proxy server functions as a middleman between the user and the web server. The server log only records the access from the proxy server rather than from the user directly, so the log only represents the aggregated access patterns of all users. Though solutions like Page Conversion [33] can alleviate this problem, they require the additional download of a decryption program and processing time for page conversion. In addition, the information collected is normally restricted to the file level. On the other hand, the Common Gateway Interface (CGI) can be used to keep track of user access patterns and record the information in databases. This approach can guarantee that all levels of information are gathered from the user if a login function and dynamic construction of Web pages are provided, which suits electronic catalogs, whose pages are normally constructed dynamically. The additional information collected can be the access sequence, time spent on each page, search queries and criteria, and the preferred web page layout.

Fig. 5. Electronic catalog personalization cycle.

In general, there are many different ways to analyze information depending on the types of data. The detailed information about the page content is normally well organized and ready for use; however, user preferences on presentation are collected implicitly through user navigation and need to be processed by statistical analysis or other techniques (such as clustering) into a usable format for profile construction.

The personalization of electronic catalogs can be viewed from three different aspects: information retrieval, site organization, and information presentation. In information retrieval, the focus is on the information content retrieved from the electronic catalog. After submitting a particular search query, the user receives various amounts of information based on the constraints and criteria. If the volume of returned information is too large, the user may need to refine or continue the query, which is usually time-consuming. In order to customize the search result based on user preferences, the information can be either filtered or ranked. Although the filtering process is straightforward, the user's interests may change from time to time and useful information may be filtered out accidentally. Thus, the query result is ranked or highlighted based on the user preferences.

The main objective of site organization is to provide a well-organized site structure for the user to access the pages containing the information desired. Since physically there are only a few pages in CGI and the pages are inter-connected, improving the physical structure of the web site is not as critical as page prediction. Although the query result can be described as a set of pages, it is actually the subset of the information residing in the electronic catalog, which can be specified by various search criteria. By applying statistical analysis to the user profile, a set of search criteria can be created to describe the query goals and to generate personalized hyperlinks that lead the user to the desired page.

Information presentation mainly concerns how the retrieved information is presented in the pages of electronic catalogs. Since the web page itself is a kind of interface for the user to interact with the electronic catalog, a customized user interface can provide an efficient and familiar working environment to the user, such as desktop customization in Microsoft Windows. There are two ways that the
user can customize the information presentation in the form of templates. The first is modularization of elements inside the page: the system guides the user to customize the web page step by step by choosing the modules and the parameters. The other way is for users to construct the template themselves with HTML (Hyper Text Markup Language) and standard marking. From the standard marking embedded in the HTML, the system is able to present the information according to the user specification.

3.2. System design and specification

The PEC (Personalized Electronic Catalog) system consists of the System Database, the User Profile Manager, the Information Retrieval (IR) Engine, the User Goal Prediction (UGP) Engine, and the Page Generation (PG) Engine. In general, a user is required to log in before he/she can access the electronic catalog through various supporting functions. When the system receives the request, the User Profile Manager first updates the user profile in the system database, and the updated user profile is sent to the IR Engine and the UGP Engine. The IR Engine retrieves the catalog information in the system database as well as the user profile and returns the ranked and highlighted information. At the same time, the UGP Engine performs data analysis on the user profile and generates a set of predicted goals. Next, the results from the IR Engine and the UGP Engine are sent to the PG Engine together with the User Presentation Profile from the User Profile Manager, and the PG Engine generates the resulting catalog web pages, which are sent to the client browser. The PEC System architecture and the process flow for electronic catalog personalization are shown in Fig. 6.

3.2.1. User profile modeling

The user profile denotes the interests toward the information in the electronic catalog, and is classified into two parts, the catalog information profile and the presentation profile. The information structure of the catalog is in the form of the attribute and value pair $(A, \nu)$, which implies an estimation of user preferences on certain kinds of information in the electronic catalog. Although the user may not be concerned with all the attributes of a particular record that he/she is interested in, in the long run, a set of attribute

Fig. 6. Architecture and process flow of PEC system.
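The process flow of Fig. 6 can be sketched in Python as a chain of plain functions, one per component. This is only a sketch under stated assumptions: all function names, the in-memory `db` dictionary, the increment-by-one update, the summed-preference ranking score, and the top-k goal prediction are hypothetical stand-ins (the paper's UGP Engine uses clustering, and its IR and UGP Engines run at the same time rather than in sequence).

```python
def user_profile_manager(profile, criteria):
    """Update the catalog information profile with the submitted criteria."""
    for pair in criteria.items():
        profile[pair] = profile.get(pair, 0) + 1     # assumed update rule
    return profile

def ir_engine(catalog, criteria, profile):
    """Retrieve matching records and rank them by summed preference degree."""
    hits = [r for r in catalog
            if all(r.get(a) == v for a, v in criteria.items())]
    def score(record):                               # assumed scoring function
        return sum(profile.get(pair, 0) for pair in record.items())
    return sorted(hits, key=score, reverse=True)

def ugp_engine(profile, k=2):
    """Predict goals as the k pairs with the highest preference degree."""
    return sorted(profile, key=profile.get, reverse=True)[:k]

def pg_engine(records, goals, template):
    """Assemble a page description from results, goals and the user template."""
    return {"template": template, "records": records, "links": goals}

def handle_request(db, user, criteria):
    """Run one request through the four components in order."""
    profile = user_profile_manager(db["profiles"].setdefault(user, {}), criteria)
    records = ir_engine(db["catalog"], criteria, profile)
    goals = ugp_engine(profile)
    return pg_engine(records, goals, db["templates"].get(user, "default"))

# Hypothetical in-memory stand-in for the System Database.
db = {
    "catalog": [
        {"name": "P1", "color": "red", "size": "M"},
        {"name": "P2", "color": "red", "size": "L"},
        {"name": "P3", "color": "blue", "size": "L"},
    ],
    "profiles": {"u1": {("size", "L"): 3}},
    "templates": {},
}

page = handle_request(db, "u1", {"color": "red"})
print([r["name"] for r in page["records"]])
print(page["links"])
```

In this toy run, both red products match the query, and the one whose size the user has preferred in the past is ranked first. The design point the sketch tries to convey is the ordering of steps: the profile is updated before retrieval, and page generation combines the retrieval result, the predicted goals, and the presentation template.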


and value pairs that the user is really interested in can be formed. Therefore, the attribute and value pair can be an estimator of user preferences toward the electronic catalog information. The catalog information profile $\{(A, \nu, f)\}$ is a collection of tuples of an attribute ($A$), a value ($\nu$), and a preference degree ($f$) representing the degree of user interest, as shown in Fig. 7. The presentation profile is defined as the information on how the information is presented in the electronic catalog web pages. The information of the presentation profile is based on user input and a collection of templates for the generation of electronic catalog web pages.

Fig. 7. Structure of catalog information profile.

3.2.2. System Database

The System Database, as the central information repository, mainly consists of three parts according to the data type: catalog information, user profile and system supporting information. There exists a certain set of attributes (basic catalog information) common to all records in the electronic catalog, and the remaining attributes (additional catalog information) are either common to a subset of records or only for a particular record. The catalog is subject to modification or new additions. The user profile is composed of the catalog information profile (the set of attributes, values and preference degrees) and the user presentation profile (a collection of templates). The system supporting information includes the basic user information, authentication information, a list of available modules and parameters for the generation of a template, and other system information. The design of the System Database is depicted in Fig. 8.

3.2.3. User Profile Manager

The main function of the User Profile Manager is to retrieve and update the user profile. The catalog information profile and presentation profile are processed separately. In the process of updating the user profile as shown in Fig. 9, upon a user request for catalog information, a set of search criteria in the form of attribute and value pairs is forwarded to the User Profile Manager. The manager checks each

Fig. 8. Design of System Database.

B.P.-C. Yen, R.C.W. Kong / Electronic Commerce Research and Applications 1 (2002) 20–40 29

Fig. 9. Update process of Catalog Information Profile in User Profile Manager.

attribute and value pair in the searching criteria in and rank/highlight the result based on the userthe user profile. If the pair is found in the user profile. The record with the preference degree greaterprofile, the corresponding preference degree will be than a certain threshold as interest prediction isupdated; otherwise, a new pair will be added to the ranked at the top of the list or highlighted to catchuser profile. The pair with preference degree smaller the user attention. The process of personalization inthan a given value is removed from the user profile the IR Engine is shown in Fig. 10.due to the issues of management and performance.On the one hand, the user can interact with the User 3 .2.5. UGP EngineProfile Manager through a series of screen guidance The UGP Engine is used to generate a set ofto construct their template from the module pro- hyperlinks linking to the target pages as the goal ofvided. During the process, the user needs to choose the user. As the target pages are dynamically gener-the modules for display on the web page, in which ated web pages, the hyperlinks actually are repre-the customization approach of Bus Modularity [34] sented by the searching criteria. The searchingis adopted. The web page itself functions as a bus criteria for the predicted pages can be generated byand different modules are plugged into the bus. As statistically analyzing the preference degree of theeach module has a set of parameters, the user can attribute and value pairs in the user profile usingconfigure the parameters for each module being clustering algorithms [18,35]. The preference degreeselected, or keep the default value in order to achieve for these pairs in the user profile represents aboth flexibility and usability of the customization distinctive pattern compared to the remaining pairsprocess. The resulting template is presented to the as shown in Fig. 
11 where thex-axis represents theuser for confirmation with the possibility of further preference degree and they-axis represents themodification. On the other hand, the User Profile attribute and value pairs. Therefore, by performingManager can also help the user to generate the the statistical analysis on the preference degree in thetemplate through HTML and standard marking. The user profile, a set of attributes and value pairs can beUser Profile Manager needs to validate and interpret found to predict the user target page. There are twothe document before the final generation. issues regarding the prediction of the user goal. The

first issue is the information used to generate the3 .2.4. IR Engine predicted links. In order to have better performance

The function of the IR Engine is to retrieve the and resource utilization, the non-significant infor-catalog information according to the search criteria, mation is filtered prior to the generation process. The

30 B.P.-C. Yen, R.C.W. Kong / Electronic Commerce Research and Applications 1 (2002) 20–40

Fig. 10. Process flow of IR Engine for personalization.

other issue is about the number of pairs. If the set is sentation profile, and send the resulting pages to thetoo small, the target pages are likely to have too client browser. The user templates define the pre-many records; if the set is too large, it is likely that sentation of the web page, including the element intoo few records or no record qualified for the target the web page, style and location. Therefore, the PGpage. It is essential to determine the threshold for the Engine only needs to integrate the information fromnumber of pairs, but the number of pairs cannot the IR Engine and UGP Engine with the templates.really guarantee the number of records to be re- The integration process is illustrated in Fig. 12.turned. Consequently, it is suggested to performpre-checking of the number of return records in orderto avoid too many or too few records returned. 4 . System implementation and example

3 .2.6. PG Engine In the implementation of the PEC System, theThe function of the PG Engine is to generate the modular approach of implementation is adopted. In

web pages from the templates in the user pre- addition to the Basic Electronic Catalog Web Site,

Fig. 11. Preference degree in user profile.

B.P.-C. Yen, R.C.W. Kong / Electronic Commerce Research and Applications 1 (2002) 20–40 31

with Microsoft Access or SQL Server. As all webpages are dynamically generated, ASP (Active ServerPage) is used to create the web pages as well as theremaining modules. Furthermore, ADO (ActiveXData Object) with ODBC (Open Database Connec-tivity) is used to link between the ASP and theSystem Database. The schematic diagram of thesystem is shown in Fig. 13.

Fig. 12. Web page generation in PG Engine.4 .1. System development

the modules include the System Database, User There are nine physical web page files in the webProfile Manager, IR Engine, UGP Engine, and PG site. The pages can be divided into four groups as inEngine. IIS (Internet Information Server) or PWS Fig. 14, the public, the catalog, the catalog mainte-(Personal Web Server) from Microsoft is used as the nance, and the user profile maintenance pages.web server. The System Database is implemented Amount the four groups of pages only the catalog

Fig. 13. Schematic diagram of PEC system.

Fig. 14. Structure of the PEC site.

32 B.P.-C. Yen, R.C.W. Kong / Electronic Commerce Research and Applications 1 (2002) 20–40

pages need to be customized. When the user accesses accepting new attributes of records (Fig. 15(a)).the catalog page, if they have defined their pre- Therefore, for each item in the electronic catalog,sentation profile, the page is presented in styles as there is only one record in the B-Info, which isdefined in the template; otherwise the default page is mapping to multiple records in A-Info. However,displayed. Before accessing the system, the user first there is a unique key mapping A-Info to B-Info.needs to login or register. The main catalog page is Similar to the catalogue database, the user profiledisplayed and the user can access the catalog through database is also divided into two parts: theuserthe searching functions, create/update their catalog catalogue information profile and the user pre-in the catalog maintenance page, and create/updatesentation profile. In the user profile database, thethe presentation profile in the user profile mainte- structure of the catalog information profile is similarnance page. to the A-Info in the catalog database, including the

user identity, the attribute, the value, and the prefer-4 .1.1. System Database ence degree (Fig. 15(b)). For the user presentation

From the conceptual design, the system database profile, it is a collection of templates and parametersis divided into three parts: thecatalogue database, for the user and template information includes thethe user profile database, and thesystem-supporting user identity, the template, and the content. Thedatabase. In the catalog database, the basic catalog template field is referred to the page using thisinformation (B-Info) stores all the common infor- template, such as the main catalog page, the submation for all records in the electronic catalog and catalog page or the detail page in the electronicadditional catalog information (A-Info) is capable of catalog, while the content is the actual template

generated by the user.The system-supporting database, as shown in Fig.

15(c), consists of theuser information, moduleinformation, and othersystem-supporting informa-tion. Since the objective of implementing the systemis for demonstration purpose, only the simplest andnecessary user information is collected and stored inthe database. The user information table includes theuser name, user identity, and password; whereas themodule information is made of basic module tables,the parameter tables and the parameter list tables.These two tables are used in the template generationprocess by the special tag embedded in the moduleinformation. The module information can be repre-sented as a hierarchal structure of module-parameter-value. Therefore, the design can ensure multipleparameters for a module and multiple parametervalues for a parameter provided for users to select.For the database schema of other system supportingtables as well as the detail schema of the systemdatabase, please refer to Table 1.
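The split between basic and additional catalog information can be illustrated with a minimal sketch. The table and field names (BPInfo, AInfo, APInfo, PCode, ACode, AName, AValue, Image) follow Table 1; everything else is an assumption made for illustration: SQLite stands in for Microsoft Access or SQL Server, the column types and sample rows are invented, and the helper names `build_catalog` and `find_products` are hypothetical. The sketch also assumes each product has at most one value per attribute.

```python
import sqlite3


def build_catalog(conn):
    """Create the three catalog tables named in Table 1 (column types are assumed)."""
    conn.executescript("""
        CREATE TABLE BPInfo (PCode TEXT PRIMARY KEY, Image TEXT);
        CREATE TABLE AInfo  (ACode TEXT PRIMARY KEY, AName TEXT);
        CREATE TABLE APInfo (PCode  TEXT REFERENCES BPInfo(PCode),
                             ACode  TEXT REFERENCES AInfo(ACode),
                             AValue TEXT);
    """)


def find_products(conn, criteria):
    """Return the codes of products matching every (attribute name, value) pair.

    Assumes the attribute names in `criteria` are distinct.
    """
    if not criteria:
        rows = conn.execute("SELECT PCode FROM BPInfo ORDER BY PCode")
        return [r[0] for r in rows]
    clauses = " OR ".join("(a.AName = ? AND p.AValue = ?)" for _ in criteria)
    params = [item for pair in criteria for item in pair]
    sql = f"""
        SELECT p.PCode
        FROM APInfo p JOIN AInfo a ON a.ACode = p.ACode
        WHERE {clauses}
        GROUP BY p.PCode
        HAVING COUNT(DISTINCT a.AName) = ?
        ORDER BY p.PCode
    """
    return [r[0] for r in conn.execute(sql, params + [len(criteria)])]


# Invented sample data: one B-Info row per product, several A-Info rows each.
conn = sqlite3.connect(":memory:")
build_catalog(conn)
conn.executemany("INSERT INTO BPInfo VALUES (?, ?)",
                 [("P1", "p1.jpg"), ("P2", "p2.jpg"), ("P3", "p3.jpg")])
conn.executemany("INSERT INTO AInfo VALUES (?, ?)",
                 [("A1", "Type"), ("A2", "Material")])
conn.executemany("INSERT INTO APInfo VALUES (?, ?, ?)", [
    ("P1", "A1", "button"), ("P1", "A2", "plastic"),
    ("P2", "A1", "button"), ("P2", "A2", "metal"),
    ("P3", "A1", "buckle"), ("P3", "A2", "plastic"),
])

matches = find_products(conn, [("Type", "button"), ("Material", "plastic")])
print(matches)
```

Because additional attributes live in rows rather than columns, a new attribute only needs a new AInfo record and no schema change, which seems to be the flexibility the A-Info design is after. The cost is that a multi-criteria search has to group the matching rows per product, as the `HAVING` clause does here.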

Fig. 15. Structure of System Database.

Table 1
PEC system database schema

| Table name | Key | Field name | Description |
|---|---|---|---|
| AInfo | | | Attribute Information |
| | P | ACode | Attribute Code |
| | | AName | Attribute Name |
| APInfo | | | Additional Product Information |
| | P | PCode | Product Code |
| | | ACode | Attribute Code |
| | | AValue | Attribute Value |
| BPInfo | | | Basic Product Information |
| | P | PCode | Product Code |
| | | Image | Image of the Product |
| DPModule | | | Default Presentation Module Information |
| | P | MID | Module ID |
| | | MData | Module Data |
| DPPara | | | Default Presentation Parameter Information |
| | P | PID | Parameter ID |
| | | PData | Parameter Data |
| PModule | | | Available Presentation Module Information |
| | P | MID | Module ID |
| | | MData | Module Data |
| PPara | | | Available Presentation Parameter Information |
| | P | PID | Parameter ID |
| | | PData | Parameter Data |
| UPModule | | | User Presentation Profile Module Information |
| | P | UserID | User ID |
| | P | MID | Module ID |
| | | Mdata | Module Data |
| UPPara | | | User Presentation Profile Parameter Information |
| | P | UserID | User ID |
| | P | PID | Parameter ID |
| | | PData | Parameter Data |
| UCIProfile | | | User Catalogue Information Profile |
| | P | UserID | User ID |
| | P | ACode | Attribute Code |
| | P | AValue | Attribute Value |
| | | PDegree | Preference Degree |
| UserInfo | | | User Information |
| | P | UserID | User ID |
| | | Pwd | Password |
| | | UserName | User Name |

P, primary key.

4.1.2. User Profile Manager

The User Profile Manager is composed of the catalog information profile and the presentation profile. The main function of this part is to set an appropriate value for the preference degree. The update of the preference degree needs to capture the previous user behavior, while being sensitive enough to detect the shift of user interest. The following is defined for updating the preference degree of an attribute and value pair (A, n):

| The attribute and value pair (A, n) | Status in user profile | Update |
|---|---|---|
| In searching criteria | Not in user profile | $f = f_o$ |
| In searching criteria | In user profile | $f = f' + R$ |
| Not in searching criteria | In user profile | $f = f' - R$ |

where $f'$ is the previous preference degree; $R$, a constant; $f$, the current preference degree after updating; $f_o$, the initial preference degree for a new attribute and value pair being added; and $m$, a constant. If $f < m$, the corresponding pair is removed from the catalog information profile. The new preference degree $f$ is the sum of the previous preference degree and the adjusted value. The sensitivity toward change in the preference depends on the value of $R$: the larger the value of $R$, the more sensitive the system is toward a shift in interest. However, if the value is too large, the system will be too sensitive towards the most recent information and will not be useful for tracking the user preference. Furthermore, the initial preference degree $f_o$, $m$ and $R$ determine how long an un-accessed record will stay in the user profile. For example, a pair that is never requested again loses $R$ at every update and is removed once its degree drops below $m$, that is, after more than $(f' - m)/R$ updates. The updating process of the preference degree is implemented as an independent component so that other update methods can be plugged in. The three constants are implemented as system environment variables in order to provide an easy way to configure the updating process.

The User Profile Manager also functions as an interface to guide the user through module selection and configuration in the template generation process. There are three types of catalog pages, namely the main catalog page, the sub catalog page, and the detail page. The page content can be classified into tool bar, keyword search engine interface, catalog search engine interface, search result listing area and detailed information area, as shown in Fig. 16. Each is associated with a set of configurable parameters, as listed in Table 2. The user interface that guides the template generation process in the User Profile Manager is presented as a series of web pages. The user first selects the catalog page to be customized and chooses the modules in the page. The parameters of each module are then configured, and a preview of the resulting template is provided. The process of template generation for the user presentation profile is shown in Fig. 17.

Fig. 16. Sample layout of catalog pages.

Table 2
Complete list of available modules and corresponding parameter set

| Module | Parameter list |
|---|---|
| Page | Background color |
| Page heading | Font; Font size; Font color; Background; Position |
| Tools bar | Font; Font size; Item name; Background color; Position |
| Lower menu bar | Font; Font size; Background color; Position |
| Keyword search engine | Background color; Position |
| Catalog search engine | Background color; No. of column; Position |
| Search result listing area | Font; Font size; Highlight font; Highlight color; No. of record per row; No. of row of record; Position |
| Detail information area | Font; Font size; Font color; Background; Image size; Position |

Fig. 17. Template generation process in User Profile Manager.

4.1.3. IR Engine

The major functions of the IR Engine are to retrieve information based on the search criteria and to rank/highlight the result based on the catalog information profile. Together with the information retrieved in the first step, the corresponding aggregated preference degree of each result record is calculated as well. The aggregated preference degree is compared with a given configurable threshold value to determine highlighting. After the ranking and highlighting process, the information in a predefined format is sent to the PG Engine, as in Fig. 18.

Fig. 18. Process flow in IR Engine.

4.1.4. UGP Engine

The UGP Engine is used to generate a set of hyperlinks as the user goal, based on the catalog information profile. A simple clustering algorithm, the leader algorithm [18,35], is employed in the profile analysis to generate the predicted user goals. Although the result of the leader algorithm depends on the order of the vectors, and the distance between a cluster vector and the cluster medium is unbounded, the algorithm is fast and memory efficient because it needs only one pass, which guarantees a quick response and low memory usage for each user session. The modular approach of implementation enables plug-in functions for new algorithms. The process flow is shown in Fig. 19.

Fig. 19. Process flow of UGP Engine.

4.1.5. PG Engine

The PG Engine generates the web pages by integrating the catalog information from the IR Engine, the hyperlinks from the UGP Engine, and the template from the user presentation profile information. The template is made up of HTML as well as special tags. During the integration, the PG Engine template reader parses the template line by line for special tags. A line without a special tag is sent to the web page buffer unchanged. Otherwise, the part of the line before the tag is sent to the web page buffer first, the interpreter determines the plug-in information and retrieves it from the IR Engine and the UGP Engine, and the remainder of the line is then interpreted in the same way. The process is shown in Fig. 20.

Fig. 20. Process flow of PG Engine.

4.2. HKTAIGA.COM: an industrial example

HKTAIGA.COM [36] originates from the Electronic Commerce Front End for Hong Kong Apparel Industry Community (EC) Project, an ITF project funded by the HKSAR Government. The objective of the EC project is to build up the information technology and electronic commerce infrastructure for the apparel industry in Hong Kong. In HKTAIGA.COM, the user is able to search for accessory information (such as button, zipper and shoulder pad) and fabric information (such as woven). In addition, users can also create/update their own company Web page and product catalog. In December 2000, there were about 1000 members and 7000 products in about 20 categories in the HKTAIGA catalogue. In the following, a scenario is used to compare the PEC system with HKTAIGA.COM, where the PEC system is an electronic catalogue capable of personalizing itself and HKTAIGA.COM is an ordinary electronic catalogue. The minimum number of pages accessed (Min NPA) from the catalog root is used as the comparison variable.

In the scenario, the user is a merchandiser of a garment manufacturer who searches for various types of buttons and buckles, especially those made of plastic. As the merchandiser always searches for plastic buttons and buckles, the PEC system will recognize that this user prefers the product types button and buckle and the material plastic. When the merchandiser searches for plastic buttons again in the PEC system, he/she only needs to click on the button category in the main catalog: the system recognizes the merchandiser's preference and ranks the plastic buttons at the top of the list, with all the plastic buttons being displayed on the first page. On the other hand, if the merchandiser chooses to use HKTAIGA.COM, he/she needs to specify searching criteria for plastic material after clicking on the button category. Fig. 21 shows that the Min NPA is 1 and 2 for the PEC System and HKTAIGA.COM, respectively.

Fig. 21. Scenario comparison: search for plastic button.

When the merchandiser searches for all the plastic buttons and buckles, the PEC user only needs to access the shortcut page, which is available at any time during navigation and is generated by forming a cluster from the input information; the corresponding Min NPA is only 2. On the other hand, the HKTAIGA.COM user needs to click either the button or the buckle category first, specify the searching criteria, go back to the root page, access the other category, and specify the searching criteria again. In this case, the Min NPA is equal to 5 (Fig. 22). In the case of a keyword search, the PEC system will rank the resulting information according to user preference, while the HKTAIGA.COM user needs to go through all the returned information as usual. The above examples illustrate two more important aspects of the PEC system. The first is the continuously updated user profile, which ensures that personalization can evolve according to the shift of the user interest. The second is the customizable catalogue interfaces that provide a personal and user-friendly working interface.

Fig. 22. Scenario comparison: search for new plastic button and buckle.

5. Conclusions

Information overloading and access performance on the Internet are critical in most EC applications. In this paper, we focus on the issues of customization for electronic catalogs from the aspects of information organization, retrieval and presentation. A system, called Personalized E-Catalog (PEC), is proposed for customization in electronic catalogs on the Web. The system design and implementation are presented with an industrial example. Future directions can be addressed under four aspects:

1. The logon process can be more flexible in order to accommodate a larger domain of applications where the user does not need to log on before they can access any information in the catalog. This can be done by generating a unique identity for each user and storing it in the user computer (e.g. as cookies) as the profile.
2. The user profile can be used in various ways. One example is to 'push' the preferred information to the user for on-line response. The system can be more proactive and send the user the updated information in real time.
3. The user profile can be used for target marketing as well. As we know about the user interest, we can target only those users that are interested in the corresponding promotion. The privacy issue on user information collection is important as well.
4. The user profile and management of personalization can be further elaborated. The impact study of the personalization and user profile on marketing is also worth trying.

Acknowledgements

We thank the anonymous reviewers for the comments. It is a pleasure to thank the members of HKTAIGA project and EC Front-End for Hong Kong Apparel Industry Community project, sponsored by Hong Kong ITDC (under grant AF/114/96 and AF/13/98), for implementing part of the system as in the case study. This research was funded in part by Taran Eastman Publishing Ltd. (under grant TEIL99/00.EG01).

References

[1] Amazon.com (http://www.amazon.com)
[2] Yahoo! Shopping (http://shopping.yahoo.com)
[3] eBay (http://www.ebay.com)
[4] Yahoo! Auctions (http://auctions.yahoo.com)
[5] NUA Internet Surveys, Catalog Shopping Beats Online Shopping (http://www.nua.ie/surveys/?f=VS&art_id=905355366&rel=true)
[6] M. Ginsburg, J. Gebauer, A. Segev, Multi-vendor electronic catalogs to support procurement: Current practice and future directions, in: Global Networked Organization 12th International Bled Electronic Commerce Conference, 1999.
[7] M. Garcia Gosalvez, Electronic product catalog: What is missing?, Electronic Market 7 (3) (1997) 3–5.
[8] N. Koch, A. Turk, Towards a methodical development of electronic catalogs, Electronic Market 7 (3) (1997) 28–31.

[9] S. Handschuh, B.F. Schmid, K. Stanoevska-Slabeva, The concept of mediating electronic product catalog, Electronic Market 7 (3) (1997) 32–35.
[10] J.W. Palmer, Retailing on the WWW: The use of electronic product catalog, Electronic Market 7 (3) (1997) 6–9.
[11] K. Stanoevska-Slabeva, B. Schmid, Internet electronic product catalogs: an approach beyond simple keywords and multimedia, Computer Networks 32 (2000) 701–715.
[12] A. Segev, D. Wan, C. Beam, Designing Electronic Catalogs for Business Value: Results of the CommerceNet Pilot, CITM Working Paper CITM-WP-1005, 1995. (http://haas.berkeley.edu/citm/publications/papers/wp-1005.pdf)
[13] A.N. Keller, M.R. Genesereth, Using Infomaster to create a housewares virtual catalog, International Journal of Electronic Commerce and Business Media 7 (4) (1997) 41–44.
[14] M. Perkowitz, O. Etzioni, Adaptive Web sites: an AI challenge, in: Proceedings of the Fifteenth International Joint Conference on Artificial Intelligence, 1997, pp. 16–21, Vol. 1.
[15] M. Perkowitz, O. Etzioni, Adaptive Web sites: automatically synthesizing Web pages, in: Proceedings of AAAI98, 1998, pp. 727–732.
[16] M. Perkowitz, O. Etzioni, Adaptive Web sites: Conceptual cluster mining, in: Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence, 1999, pp. 264–269, Vol. 2.
[17] M. Perkowitz, O. Etzioni, Towards adaptive Web sites: conceptual framework and case study, Artificial Intelligence 118 (1–2) (2000) 245–275.
[18] T.W. Yan, M. Jacobsen, H. Garcia-Molina, U. Dayal, From user access patterns to dynamic hypertext linking, Computer Networks and ISDN Systems 28 (7–11) (1996) 1007–1014.
[19] R.R. Sarukkai, Link prediction and path analysis using Markov chains, Computer Network 33 (2000) 377–386.
[20] Z. Wang, C.K. Siew, X. Yi, A new personalized filtering model in Internet Commerce, in: Proceedings of SSGRR (Scuola Superiore G. Reiss Romoli), Rome, Italy, 2000.
[21] D.W. Cheung, B. Kao, J. Lee, Discovering user access patterns on the World Wide Web, Knowledge-Based Systems 10 (7) (1998) 463–470.
[22] P.M. Chen, F.C. Kuo, An information retrieval system based on user profile, Journal of System and Software 54 (2000) 3–8.
[23] C.H. Chang, C.C. Hun, C.L. Hou, Exploiting hyperlinks for automatic information discovery on the WWW, in: Proceedings of the Tenth IEEE International Conference on Tools with Artificial Intelligence, 1998, pp. 156–163.
[24] C.C. Yang, J. Yen, H. Chen, Intelligent internet searching agent based on hybrid simulated annealing, Decision Support Systems 28 (2000) 269–277.
[25] H.C. Tu, J. Hsiang, An architecture and category knowledge for intelligent information retrieval agents, Journal of Decision Support Systems 28 (2000) 255–268.
[26] T. Joachims, D. Freitag, T. Mitchell, WebWatcher: A tour guide for the World Wide Web, in: Proceedings of IJCAI-97, Nagoya, Japan, 1997, pp. 770–775.
[27] H. Lieberman, Letizia: an agent that assists Web browsing, in: Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence (IJCAI-95), 1995, pp. 924–929, Vol. 1.
[28] H. Berghel, D. Berleant, T. Foy, M. McGuire, Cyberbrowsing: information customization on the Web, Journal of the American Society for Information Science 50 (6) (1999) 505–513.
[29] I.-Y. Lin, X.M. Huang, M.S. Chen, Capturing user access patterns in the Web for data mining, in: Proceedings of the 11th International Conference on Tools with Artificial Intelligence, 1999, pp. 345–348.
[30] O. Richardson, Gathering accurate client information from World Wide Web sites, Interacting with Computers 12 (6) (2000) 615–622.
[31] The Concise Columbia Electronic Encyclopaedia, 3rd Edition (http://www.encyclopedia.com/articles/02414.html)
[32] WebTrends Log Analyzer (http://www.webtrends.com/)
[33] I.Y. Lin, X.M. Huang, M.S. Chen, Capturing user access patterns in the Web for data mining, in: Proceedings of the 11th International Conference on Tools with Artificial Intelligence, 1999, pp. 345–348.
[34] B.J. Pine, Mass customization: A new frontier in business competition, Harvard Business School Press, Boston, MA, 1993.
[35] J.A. Hartigan, Clustering algorithms, Wiley, New York, 1975.
[36] HKTAIGA.COM (http://www.hktaiga.com)