
Computer Networks 33 (2000) 473–491 · www.elsevier.com/locate/comnet

Semantic community Web portals

S. Staab a,b,*,2, J. Angele b,2, S. Decker a,b,2, M. Erdmann a,1, A. Hotho a,1, A. Maedche a,1, H.-P. Schnurr a,b,2, R. Studer a,b,2, Y. Sure a,1

a Institute AIFB, University of Karlsruhe, 76128 Karlsruhe, Germany
b Ontoprise GmbH, Hermann-Loens-Weg 19, 76275 Ettlingen, Germany

* Corresponding author.
1 http://www.aifb.uni-karlsruhe.de/WBS/
2 http://www.ontoprise.de

Abstract

Community Web portals serve as portals for the information needs of particular communities on the Web. We here discuss how a comprehensive and flexible strategy for building and maintaining a high-value community Web portal has been conceived and implemented. The strategy includes collaborative information provisioning by the community members. It is based on an ontology as a semantic backbone for accessing information on the portal, for contributing information, as well as for developing and maintaining the portal. We have also implemented a set of ontology-based tools that have facilitated the construction of our showcase — the community Web portal of the knowledge acquisition community. © 2000 Published by Elsevier Science B.V. All rights reserved.

Keywords: Web portal; Collaboration; Ontology; Web site management; Information integration

1. Introduction

One of the major strengths of the Web is that virtually everyone who owns a computer may contribute high-value information — the real challenge is to make valuable information easy to find. Obviously, this challenge cannot be met by centralized services alone, since the coverage of even the most powerful crawling and indexing machines has shrunk in the last few years in terms of the percentage of Web pages available on the Web. This means that a proper solution to this dilemma should rather be sought along a principal paradigm of the World Wide Web, viz. self-organization.

Self-organization does not necessarily mean automatic, machine-driven organization. Rather, from the very beginning communities of interest have formed on the Web that covered what they deemed to be of interest to their group of users in what we here call community Web portals. Community Web portals are similar to Yahoo and its likes in their goal of presenting a structured view onto the Web; however, they are dissimilar in the way knowledge is provided: in a collaborative process with only few resources (manpower, money) for maintaining and editing the portal. Another major distinction is that community Web portals count in the millions, since a large percentage, if not the majority, of Web or intranet sites is maintained not by a central department, but by a community of users. Strangely enough, technology for supporting communities of interest has not quite kept up with the complexity of the tasks of managing community Web portals. A few years ago such a community of interest would have had comparatively few sources of information to consider. Hence, the overall complexity of managing this task was low. Now, with so many more people participating, a community portal of only modest size may easily reach the point where it appears to be more a jungle of interests than a convenient portal to start from.

This problem gave us reason to reconsider the techniques for managing community Web portals. We observed that a successful Web portal would weave loose pieces of information into a coherent presentation adequate for sharing knowledge with the user. On the conceptual, knowledge-sharing level we have found that Davenport and Prusak's maxim [6], "people can't share knowledge if they don't speak a common language", is utterly crucial for the case of community Web portals. The only difference with Davenport and Prusak's thoughts derives from the fact that knowledge need not only be shared between people, but also between people and machines.

At this point, ontologies and intelligent reasoning come in as key technologies that allow knowledge sharing at a conceptually concise and elaborate level. These AI techniques support core concerns of the 'Semantic Web' (cf. [3]). In this view, information on the Web is not restricted to HTML only; information may also be formal and, thus, machine-understandable. The combination may be accounted for by an explicit model of knowledge structures in an ontology. The ontology formally represents common knowledge and interests that people share within their community. It is used to support the major tasks of a portal, viz. accessing the portal through manifold, dynamic, conceptually plausible views on the information of interest in a particular community, and providing information in a number of ways that reflect different types of information resources held by the individuals.

Following these principles when putting the semantic community Web portal into practice incurs a range of subtasks that must be accounted for and requires a set of tools that support the accomplishment of these tasks. In Section 2 we discuss requirements that we have derived from a particular application scenario, the KA2 portal, which also serves as our testbed for development. Section 3 describes how an ontology is created and used for structuring information and, thus, serves as the conceptual cornerstone of our community Web portal. We proceed with the actual application of ontologies for the purposes of accessing the KA2 portal by navigating and querying explicit and implicit information through conceptual views on, and rules in, the ontology (Section 4). Section 5 covers the information-provisioning part of the community Web portal, considering problems like information gathering and integration. Then, we describe the engineering process for our approach (Section 6) and present the overall architecture of our system (Section 7). Before we conclude with a tie-up of experiences and further work, we compare our work with related approaches (Section 8).

2. Requirements for a community Web portal — the KA2 example

Examples of community Web portals abound. In fact, one finds portals that succeed very well regarding some of the requirements we describe in this section. For instance, MathNet 3 introduces knowledge sharing through a database relying on Dublin Core metadata. Another example, RiboWeb [1], offers means to navigate a knowledge base about ribosomes. However, these approaches lack an integrated concept covering all phases of a community Web portal, viz. information accessing, information providing, and portal development and maintenance. We pursue a system that goes beyond isolated components towards a comprehensive solution for managing community Web portals.

3 http://www.math-net.de/

2.1. Portal access by users

Navigating through a community Web portal that is unknown is a rather difficult task in general. Information retrieval may facilitate the finding of pieces of text, but its usage is not sufficient to provide novice users with the right means for exploring unknown terrain. This turns out to be a problem particularly when the user does not know much about the domain and does not know what terms to search for. In such cases it is usually more helpful for the user to explore the portal by browsing, given that the portal is well and comprehensively structured. Simple tree-structured portals may be easy to maintain, but the chance is extremely high that an inexperienced user looking for information gets stuck in a dead-end road. Here, we must face the trade-off between the resources used for structuring the portal (money, manpower) and the extent to which a comprehensive navigation structure may be provided. Since information in the community portal will be continually amended by users, a richly interrelated presentation of information would usually require extensive editing, such as is done, e.g., for Yahoo. In contrast, most community Web portals require that comprehensive structuring of information for presentation comes virtually for free.

There has been interesting research (e.g. [13] or [15]) demonstrating that authoring, as well as reading and understanding of Web sites, profits from conceptual models underlying document structures 'in the large', i.e. the interlinking between documents, as well as document structures 'in the small', i.e. the contents of a particular document. Rather naturally, once a common conceptual model for the community exists and is made explicit, it is easier for the individual to access a particular site. Hence, in addition to rich interlinking between document structures 'in the large', comprehensive surveys and indices of contents, and a large number of different views onto the contents of the portal, we require that the semantic structure of the portal is made explicit at some point. The following sections will show that this stipulation does not conflict with other requirements, but that it fits well with the requirements that arise from provisioning of information and maintenance of the portal. Section 4 will elaborate on how such a conceptual level is exploited for a complex Web site with extensive browsing and querying capabilities.

2.2. Information provisioning through community members

An essential feature of a community Web portal is the contribution of information by all (or at least many) members of the community. Though they share some common understanding of their community, the information they want to contribute comes in many different (legacy) formats. Still, presentations of and queries for information contents must be possible in many ways that are rather independent from the way the information was provided originally. The Web portal must remain adaptable to the information sources contributed by its members — and not vice versa. This requirement precludes the application of database-oriented approaches (e.g., [21]), since they presume that a uniform mode of storage exists that allows for the structuring of information at a particular conceptual level, such as a relational database scheme. In real-world settings, one may neither assume that a uniform mode for information storage exists nor that only one particular conceptual level is adequate for structuring the information of a particular community. In fact, even more sophisticated approaches such as XML-based techniques, which separate content from layout and allow for multiple modes of presentation, appear insufficient, because their underlying transformation mechanisms (e.g., XSLT or XQL [9,24]) are too inconvenient for the integration and presentation of various formats at different conceptual levels. The reason is that they do not provide the semantic underpinning required for proper integration of information.

In order to integrate diverse information, we require another layer besides the common distinction into document content and layout, viz. explicit knowledge structures that may structure all the information in different formats for a community at various levels of granularity. Different information formats need to be captured and related to the common ontology:
(1) several types of metadata such as available on Web pages (e.g., HTML META-tags),
(2) manual provision of data to the knowledge base, and
(3) a range of different wrappers that encapsulate structured and semi-structured information sources (e.g., databases or HTML documents).

Section 5 will address exactly these issues. The question now remains as to how this kind of semantic community Web portal is put into practice.

2.3. Development and maintenance

A community Web portal as we have stipulated it constitutes a complex system. Hence, the developers and editors will need comprehensive tool support for presenting contents through views and links and for maintaining consistency in the system, as well as guidelines that describe the procedures for actually building such a portal. Indeed, one of our first experiences with the example portal described in Section 2.4 was that even users who were well acquainted with all the principles hated having to acquire detailed technical knowledge in order to provide information or maintain 'their' information in the portal. Thus, we need a comprehensive concept that integrates tools and methods for building the portal, capturing information, and presenting its contents to the community. While some of the tools will be touched upon in subsequent sections, development and maintenance issues in general will be dealt with in Section 6.

2.4. The example

The example that we draw on in the rest of this paper is the portal for the Knowledge Annotation initiative of the Knowledge Acquisition community (KA2; cf. [2]). The KA2 initiative has been conceived for semantic knowledge retrieval from the Web, building on knowledge created in the KA community. To structure knowledge, an ontology has been built in an international collaboration of researchers. The ontology constitutes the basis for annotating World Wide Web documents of the knowledge acquisition community in order to enable intelligent access to these documents and to infer implicit knowledge from explicitly stated facts and rules from the ontology. Though KA2 has provided much of the background knowledge we now want to exploit, it lacked much of the ease of accessing and providing community knowledge that we aim at with the KA2 community Web portal.

Given this basic scenario, which may easily be transferred to other settings for community Web portals, we have investigated the techniques and built the tools that we describe in the rest of this paper. Nevertheless, the reader may note that we have not yet achieved a complete integration of all tools, and neither have we exploited all our technical capabilities in our up-and-running demonstration portal (http://ka2portal.aifb.uni-karlsruhe.de).

3. Structuring the community Web

Let us now summarize the principal stipulations we have found so far. We need:
• a conceptual structure for presenting information to the user,
• support for integrating information of different granularities stored in various formats,
• comprehensive tool support for providing information, developing and maintaining the portal,
• a methodology for implementing the portal.

In particular, we need an explicit structuring mechanism that pervades the portal and reaches from development and maintenance over provisioning to presentation of information. For this purpose, we use an ontology as the conceptual backbone of our community Web portal.

3.1. The role of ontologies

An ontology is an explicit specification of a shared conceptualization [14]. The usefulness of ontologies for information presentation (e.g. [13]), information integration (e.g. [30]) and system development (e.g. [2]) has been demonstrated recently. We introduce another application area, viz. the use of ontologies for intranet management or, more specifically in our example, for community portal management. The role of ontologies is the capturing of domain knowledge in a generic way and the provision of a commonly agreed understanding of a domain, which may be reused and shared within communities or applications. Though only few communities have explicitly modeled their knowledge structures yet (examples are [1,26]), practically all share a common understanding of their particular domain.

Hence, our strategy is to use an ontology as the backbone of our community Web portal. In fact, we even allow the usage of multiple views that reflect diverging standards of understanding and usage of terminology in different subcommunities or for different groups of users (e.g., novice vs. expert). Rules may then be used to translate between different views, such that one may view the information contributed by another subcommunity.
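To give the flavor of such a translation rule, consider the following hedged sketch in F-Logic. The attribute names LEADS and HEADOF are invented for illustration (they are not part of the KA2 ontology as presented in this paper); the rule merely indicates how one subcommunity's terminology may be mapped onto another's:

    FORALL X, Y
        X:Researcher[LEADS ->> Y:Project] <-
            X:Researcher[HEADOF ->> Y:Project].

With rules like this in place, a query posed in one view's vocabulary also retrieves facts that were contributed using the other view's vocabulary.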

3.2. Modeling

The KA2 ontology consists of (i) concepts defining and structuring important terms, (ii) their attributes specifying properties and relations, and (iii) rules allowing inferences and the generation of new knowledge. Our representation language for defining the ontology is F-Logic [16], which provides adequate modeling primitives integrated into a logical framework.

Fig. 1. Part of the KA2 ontology in an OntoEdit screenshot.

To illustrate the structure of the ontology, the screenshot in Fig. 1 depicts part of the KA2 ontology as it is seen in the ontology development environment OntoEdit (cf. Section 6.2). The leftmost window shows the is-a relationship that structures the concepts of the domain in a (possibly multiple) taxonomy. Attributes and relations of concepts are inherited by subconcepts. Multiple inheritance is allowed, as a concept may fit into different branches of the taxonomy. In Fig. 2, attributes and relations of the concept Researcher appear in the middle window. Some of these attributes, like FIRSTNAME and LASTNAME, are inherited from the superordinate concept Person. Relations refer to other concepts, like WORKSATPROJECT denoting a relation between Researcher and Project.
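In F-Logic, the structures visible in Figs. 1 and 2 can be written down concisely. The following lines are a sketch only: the paper does not list the exact signature declarations, and we assume the common F-Logic conventions of '::' for the is-a relationship and '=>>' for attribute and relation signatures (the annotations after // are explanatory):

    Researcher :: Person.                     // Researcher is a subconcept of Person
    Person[FIRSTNAME =>> STRING;              // attributes, inherited by all subconcepts
           LASTNAME  =>> STRING].
    Researcher[WORKSATPROJECT =>> Project;    // relation between Researcher and Project
               COOPERATESWITH =>> Researcher].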

We use the KA2 ontology to manage and structure the community portal. The structure of concepts with their attributes and relations supports user navigation through the domain, as detailed in Section 4. Beyond simple structuring, we model rules that allow inferencing and, by that means, the generation of new knowledge. The rightmost window in Fig. 1 shows some rules of our KA2 ontology. One of them, the ProjectCooperation rule, describes the cooperation of two researchers working on the same project. The rule states that two researchers cooperate if a Researcher X works at a Project Proj, a Researcher Y works at the same Project Proj, and X is a different person than Y. The rule is formulated in F-Logic as follows:

(1) FORALL X, Y, Proj
        X:Researcher[COOPERATESWITH ->> Y:Researcher] <-
            X:Researcher[WORKSATPROJECT ->> Proj:Project] AND
            Y:Researcher[WORKSATPROJECT ->> Proj:Project] AND
            NOT equal(X, Y).


Fig. 2. Accessing the community Web portal.

With this rule, we may infer that researcher X cooperates with researcher Y due to the fact that they work on the same project, even if there is no explicit fact that they cooperate.
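For illustration, assume the two facts below were provided to the knowledge warehouse (the identifiers sst, ysu, and proj1 are invented for this example):

    sst:Researcher[WORKSATPROJECT ->> proj1:Project].
    ysu:Researcher[WORKSATPROJECT ->> proj1:Project].

A query FORALL X, Y X:Researcher[COOPERATESWITH ->> Y] would then return the pairs (sst, ysu) and (ysu, sst), although no COOPERATESWITH fact has been asserted explicitly.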

4. Accessing the community Web portal

A major requirement from Section 2 has been that navigation and querying of the community Web portal need to be conceptually founded, because only then can a structured, richly interwoven presentation be compiled on the fly. In fact, we elaborate in this section how a semantic underpinning like the KA2 ontology described above lets us define a multitude of views that dynamically arrange information. Thus, our system may provide the rich interlinking that is most adequate for the individual user and her navigation and querying of the community Web portal that we have aimed at from the beginning. We start with a description of the query capabilities of our representation framework. The framework builds on the very same F-Logic mechanism for querying as it does for ontology representation and, thus, it may also exploit the ontological background knowledge. Through this semantic level we achieve the independence from the original, syntactically proprietary information sources that we stipulated earlier. Nevertheless, F-Logic is as poorly suited for presentation to naive users as any other query language. Hence, its use is mostly disguised in various easy-to-use mechanisms that more properly serve the needs of the common user (cf. Section 4.2), while it still gives the editor all the power of the principal F-Logic representation and query capabilities. Finally in this section, we touch upon some mission-critical issues of the actual inference engine that answers queries and derives new facts by combining facts with structures and rules from the ontology.

4.1. Query capabilities

Though information may be provided in a number of different formats, our underlying language for representation and querying is F-Logic. For instance, using a concrete example from our showcase, the following query asks for all publications of the researcher 'Studer':

(2) FORALL Pub EXISTS ResID
        ResID:Researcher[LASTNAME ->> "Studer";
                         PUBLICATION ->> Pub].

The substitutions for the variable Pub constitute the publications queried by this expression. The expressiveness and usability of such queries is improved by the possibility of using a simple form of information retrieval based on regular expressions within queries. For instance, the following query asks for abstracts that contain the word 'portal':

(3) FORALL Abstr EXISTS Pub, X
        Pub:Publication[ABSTRACT ->> Abstr] AND
        regexp("[p|P]ortal", Abstr, X).

The substitutions for the variable Abstr are the abstracts of publications that contain the word 'portal'. In addition, the query capabilities allow implicit information to be made explicit. They use the background knowledge expressed in the KA2 ontology, including rules as introduced in Section 3.2. If we have a look at Web pages about research projects, information about the researchers involved in the projects (e.g. their names, their affiliation, ...) is often explicitly stated in HTML. However, the fact that researchers who are working together in projects are cooperating is not explicitly given. A question might be: "Which researchers are cooperating with other researchers?" When querying for cooperating researchers, the implicit information about project cooperation of researchers is exploited. The query may be formulated as:

(4) FORALL ResID1, ResID2
        ResID1:Researcher[COOPERATESWITH ->> ResID2].

The result set includes explicit information about a researcher's cooperation relationships, which is stored in the knowledge warehouse, and also implicit information about project cooperation between researchers, derived using the project-cooperation rule modeled in the ontology.

4.2. Navigating and querying the portal

Usually, it is too inconvenient for users to query the portal using F-Logic. Therefore we offer a range of techniques that allow for navigating and querying the community Web:
• A hypertext link may contain a query which is dynamically evaluated when one clicks on the link (see the query sketch after this list). Browsing is made possible through the definition of views onto top-level concepts of the KA2 ontology, such as Persons, Projects, Organizations, Publications, Research Topics and Events. Each of these topics can be browsed using predefined views. For example, a click on the Projects hyperlink results in a query for all projects known at the portal. The query is evaluated and the results are presented to the user in a table.
• A choice of concepts, instances, or combinations of both may be issued to the user in HTML forms. Choice options may be selected through check boxes, selection lists or radio buttons. For instance, when clicking on the Projects link (cf. upper part of Fig. 2), an F-Logic query is evaluated and all projects contained in the portal are retrieved. The results can be restricted using topic-specific attributes contained in the KA2 ontology for projects, such as the topics of a project, the people involved, etc. The selection list (e.g. for all people involved in projects) is generated dynamically from the information contained in the knowledge warehouse (cf. Section 5.4). Using the form's contents, a query may be compiled and evaluated.
• A query may also be generated by using the hyperbolic view interface (cf. Fig. 3). The hyperbolic view visualizes the ontology as a hierarchy of concepts. The presentation is based on hyperbolic geometry (cf. [18]), where nodes in the center are depicted with a large circle, whereas nodes at the border of the surrounding circle are only marked with a small circle. This visualization technique allows a survey over all concepts, quick navigation to nodes far away from the center, as well as a closer examination of nodes and their vicinity. When a user selects a node from the hyperbolic view, a form is presented which allows the user to select attributes or to insert values for the attributes. An example is shown in Fig. 4: the user is searching for the community member 'Studer' and his photo. Based on the selected node and the corresponding attributes, a query is compiled. The query result is shown in the right part of Fig. 2.
• Furthermore, queries created by the hyperbolic view interface may be stored using the personalization feature. Queries are personalized for the different users and are available to each user in a selection list. The stored queries can be considered as semantic bookmarks. By selecting a previously created bookmark, the underlying query is evaluated and the updated results are presented to the user. In this way, every user may create a personalized view on the portal.
• Finally, we offer an expert mode. The most technical (but also most powerful and flexible) way of querying the portal requires that F-Logic is typed in by the user. This mode is only appropriate for users who are very familiar with F-Logic and the KA2 ontology.

Fig. 3. Hyperbolic view interface.
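As announced in the first item above, the query behind, e.g., the Projects hyperlink can be an ordinary F-Logic query. A minimal sketch (assuming a TITLE attribute for Project, which is not spelled out in this paper) might be:

    FORALL ProjID, Title
        ProjID:Project[TITLE ->> Title].

Its substitutions are then rendered as the table of all projects known at the portal.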

4.3. The inference engine

Fig. 4. OntoPad — providing semantics in HTML documents using HTML-A.

The inference engine answers queries and performs derivations of new knowledge by an intelligent combination of facts with an ontology denoted in F-Logic, like the examples described above. While the expressiveness of F-Logic and its Java-powered realization in our inference engine constitute two major arguments for using it in a semantic community Web portal, wide acceptance of a service like this also depends on prima facie unexciting features like speed of service. The principal problem we encounter here is that there exist worst-case situations (not always recognizable as such by the user) where a very large set of facts must be derived by the inference engine in order to solve a particular query. While we cannot guarantee extremely fast response times, unless we drastically cut back on the expressiveness of our representation formalism, we provide several strategies to cope with performance problems:
• The inference engine may be configured to deliver answers to a query incrementally instead of waiting for the entire set of answers before these are presented to the user. Thus, answers that are directly available as facts may be presented immediately, while other answers that have to be derived using rules are presented later.
• The inference engine caches all facts and intermediate facts derived from earlier queries. Thus, similar queries or queries that build on previously derived facts may be answered fast.
• Finally, we allow the inference engine to be split into several inference engines that execute in parallel. Every engine may run on a different processor or even a different computer. Every inference engine administers a subset of the rules and facts. A master engine coordinates user queries and distributes subqueries to the slave engines. These slave engines either answer these subqueries directly or distribute incoming subqueries to other inference engines.


The reader may note that, though we have provided all the technical means to pursue one or several of these strategies, our showcase has not yet reached the amount of facts that really necessitates any performance-enhancing strategies.

5. Providing information

'One method fits all' does not meet the requirements we have sketched above for the information-provisioning part of community Web portals. What one rather needs is a set of methods and tools that can account for the diversity of information sources of potential interest to the community portal. While these methods and tools need to obey different syntactic mechanisms, coherent integration of information is only possible with a conceptual basis that may sort loose pieces of information into a well-defined knowledge warehouse. In our setting, the conceptual basis is given through the ontology, which provides the background knowledge and supports the presentation of information by semantic, i.e. rule-enhanced, F-Logic queries. On the syntactic and/or interface side, we support three major, different modes of information provisioning. First, we handle metadata-based information sources that explicitly describe the contents of documents on a semantic basis. Second, we align regularities found in documents or data structures with the corresponding semantic background knowledge in wrapper-based approaches. Thus, we may create a common conceptual denominator for previously unrelated pieces of information. Finally, we allow the direct provisioning of facts through our fact editor. All the information is brought together in a knowledge warehouse that stores data and metadata alike. Thus, it mediates between the original information sources and the navigating and querying needs discussed in the previous section.

5.1. Metadata-based information

Metadata-based information enriches documents with semantic information by explicitly adding metadata to the information sources. Over the last years, several metadata languages have been proposed which can be used to annotate information sources. In our approach, the specified ontology constitutes the conceptual backbone for the different syntactic mechanisms.

Current Web standards for representing metadata like RDF [28] or XML [27] can be handled within our semantic Web portal approach. On the one hand, RDF facts serve as direct input for the knowledge warehouse, while on the other hand, RDF facts can be generated from information contained in the portal knowledge warehouse. We have developed SiLRI (Simple Logic-based RDF Interpreter), a logic-based inference engine implemented in Java that can draw inferences based on the RDF data model [7].
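To sketch this correspondence (the concrete mapping is not spelled out in this paper, and the URI below is a made-up placeholder), an RDF statement typing a resource as a Researcher and stating its last name might enter the knowledge warehouse as an F-Logic fact along the lines of:

    "http://example.org/staff/studer":Researcher[LASTNAME ->> "Studer"].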

XML provides the chance to get metadata for free, i.e. as a side product of defining the document structure. For this reason, we have developed a method and a tool called DTDMaker for generating DTDs out of ontologies [10]. DTDMaker derives an XML document type definition from a given ontology in F-Logic, so that XML instances can be linked to an ontology. This linkage has the advantage that the document structure is grounded on a true semantic basis and, thus, facts from XML documents may be directly integrated into the knowledge warehouse.
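For illustration, DTDMaker's input consists of ordinary F-Logic ontology definitions, e.g. (under the same hedged syntax assumptions as the ontology sketch in Section 3.2):

    Researcher :: Person.
    Researcher[LASTNAME =>> STRING].

From definitions like these, a DTD can be derived in which, e.g., a Researcher element may contain a LASTNAME element, so that conforming XML documents stay linked to the ontology.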

HTML-A, proposed early on in [11], is an HTML extension which adds annotations to HTML documents using an ontology as a metadata schema. HTML-A has the advantage of smoothly integrating semantic annotations into HTML and preventing the duplication of information. An example is an HTML page that states that the text string 'Rudi Studer' is the name of a researcher, where the URL of his homepage is used as his object identifier. Using HTML-A, this could be realized by:

<HTML><BODY>
<A onto="page:Researcher"/>
<H1> HomePage of
<A onto="page[firstName=body]">Rudi</A>
<A onto="page[lastName=body]">Studer</A>
</H1>
...
</BODY></HTML>


The keyword page refers to the Web page that contains the ontological mark-up. The first annotation denotes an object of type Researcher that represents the homepage of Rudi Studer. Subsequent annotations define the FIRSTNAME and the LASTNAME attributes of this object by relating to the values from the body of the corresponding anchor tags. To facilitate the annotation of HTML, we have developed an HTML-A annotation tool called OntoPad. An example annotation of an e-mail of the Researcher Rudi Studer using OntoPad is illustrated in Fig. 4. Similarly to HTML-A, it is possible to enrich documents generated with Microsoft Office applications with metadata by using our plug-ins Word-A and Excel-A.
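Once crawled and interpreted, annotations like the ones above would contribute facts along the following lines to the knowledge warehouse (a sketch; the URL standing in for the keyword page is a placeholder):

    "http://example.org/studer":Researcher
        [FIRSTNAME ->> "Rudi";
         LASTNAME  ->> "Studer"].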

5.2. Wrapper-based information

In general, annotating information sources by hand is a time-consuming task. Often, however, annotation may be automated when one finds regularities in a larger number of documents. The principal idea behind wrapper-based information is that there are large information collections that have a similar structure. We here distinguish between semi-structured information sources (e.g. HTML) and structured information sources (e.g. relational databases).

5.2.1. Semi-structured sources
In recent years, several approaches have been proposed for wrapping semi-structured documents, such as HTML documents. Wrapper factories (cf. [25]) and wrapper induction (cf. [17]) have considerably facilitated the task of wrapper construction. In order to wrap directly into our knowledge warehouse, we are currently developing our own wrapper approach that directly aligns regularities in semi-structured documents with their corresponding ontological meaning.

5.2.2. Structured sources
Though in the KA2 community Web there are no existing information systems, we would like to emphasize that existing databases and other legacy systems may contain valuable information for building a community Web portal. Ontologies have shown their usefulness in the area of intelligent database integration. They act as information mediators (cf. [30]) between distributed and heterogeneous information sources and the applications that use these information sources. Existing entities in legacy systems are mapped onto concepts and attributes defined in the ontology. Thus, existing information may be pumped into the knowledge warehouse by a batch process, or it may be accessed on the fly, as the sketch below indicates.
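Such a mapping can be sketched as an F-Logic rule over a predicate delivered by a database wrapper. Both the predicate db_person and its argument layout are hypothetical; the rule merely illustrates how legacy tuples are lifted onto the ontology:

    // db_person(Id, First, Last) is assumed to be exported by a wrapper around a legacy table
    FORALL Id, First, Last
        Id:Researcher[FIRSTNAME ->> First; LASTNAME ->> Last] <-
            db_person(Id, First, Last).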

5.3. Fact Editor

The process of providing new facts to the knowledge warehouse should be as easy as possible. For this reason, the hyperbolic interface tool (cf. Fig. 3) may also be used as a Fact Editor. In this mode, its forms are not used to ask for values, but to insert values for attributes of instances of corresponding concepts from the ontology. The Fact Editor is also used for maintaining the portal, viz. to add, modify, or delete facts.

5.4. Knowledge warehouse

The different methods and tools we have just described feed into the knowledge warehouse either directly or indirectly, when they are triggered by a Web crawl. The warehouse itself hosts the ontology, i.e. the metadata level, as well as the data proper. The knowledge warehouse is accessed indirectly, through a user query or a query by an inference engine as described in Section 4. Hence, one may take full advantage of the distribution capabilities of the inference engine and, likewise, separate the knowledge warehouse into several knowledge bases or knowledge marts. Facts and concepts are stored in a relational database; however, they are stored in a reified format that treats relations and concepts as first-order objects and that is therefore very flexible with regard to changes and amendments of the ontology.

6. Development of Web portals

6.1. The development and maintenance process

Even with the methodological and tool support we have described so far, developing a Web portal for a community of non-trivial size remains a complex task. Strictly ad hoc rapid-prototyping approaches easily doom the construction to fail, or they easily lead to unsatisfactory results. Hence, we have thought about a more principled approach towards the development process that serves as a means for documenting development, as well as for communicating principal structures to co-developers and editors of the Web portal. We distinguish different phases in the development process, illustrated in Fig. 5. For the main part, this model is a sequential one. Nevertheless, at each stage there is an evaluation as to whether and how easily further development may proceed with the design decisions that have been accomplished before. The results feed back into the results of earlier stages of development. In fact, experiences gained by running the operational system often find their way back into the system.

Fig. 5. The development process of the community Web portal.

The main stages of the development process and their key characteristics are given in the following.
• The process starts with the elicitation of user requirements in the requirements elicitation phase. In this phase, requirements about important and interesting topics in the domain are collected, the information goals of potential users of the portal are elicited, and preferences or expectations concerning the structure and layout of presented information are documented. Results of this very first phase constitute the input for the design of the Web site and for preliminary HTML pages, and affect the formal domain model embodied in the ontology.
• The requirements determine, e.g., which views and queries are useful for users of the portal, which navigation paths they expect, how different Web pages are linked, or which functionality is provided in different areas of the portal. Requirements like these are realized in the Web site design. This design phase may be performed largely independently of the underlying formal structuring, i.e. the ontology. Since a mock-up version of the Web site is developed early in the development phase, one may check early whether the system to be developed really meets the users' needs.
• In parallel to the development of the structure and layout of the Web site, an ontology engineering process is started. The first phase elicits relevant domain terms that need to be refined and amended in the ontology engineering phase. First, the static ontology parts, i.e. the concept hierarchy, the attributes, and the relations between concepts, are formally defined. Thereafter, rules and constraints are developed. Rule development may necessitate a major revision of the concept hierarchy. For instance, new subconcepts may have to be introduced, attributes may have to turn into relations or into other concepts, or relations may have to become concepts. This (intra-ontology) engineering cycle must be performed until the resulting ontology remains sufficiently stable.
• In the query formulation step, the views and queries described in one of the earlier phases are formalized. At first, their functionality is tested independently of the Web site design. To express the information needs formally, the developer has to access the ontology, whereby additional rules or relations that define new views or ease the definition of queries may become necessary. In order to test the ontology and queries, a set of test facts has to be prepared. During this process of testing, inconsistencies in the ontology may be detected, which may lead to a feedback loop into the ontology engineering phase.
• Finally, Web pages are populated, i.e. the queries and views developed during Web site design, and formalized and tested during query formulation, are integrated into the operational portal. Information may be accessed via the portal as soon as a sufficient amount has been made available, as outlined in Section 5.
During operation of the community portal, it must be fed and maintained:
• The user community provides facts via numerous input channels (cf. Section 5).
• These facts may contain errors or undesired contents, or the integration of different sources may lead to inconsistencies. To counter problems like these, an editor is responsible for detecting such cases and acting appropriately. The detection of inconsistencies is supported by the inference engine via constraints formulated in F-Logic (see the sketch after this list). The editor then has to decide how to proceed: he may contact the responsible authors, simply ignore conflicting information, or manually edit the contents.
• Changing requirements of the community must be reflected in the portal: e.g., the increasing popularity of new fields of interest or technologies, or shifting viewpoints, may incur changes to the ontology, new queries, or even a new Web site structure. In order to meet such new requirements, the above-mentioned development process may have to be partially restarted.
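To give a flavor of such constraint checking (a hedged sketch, since the paper does not list its actual constraints), the editor could be alerted by a query that reports, for instance, publications for which no author is known:

    FORALL Pub
        Pub:Publication AND
        NOT (EXISTS Res Res:Researcher[PUBLICATION ->> Pub]).

Every substitution for Pub returned by this query marks a fact that the editor should follow up on.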

6.2. Tools for development and maintenance

The previous subsection has described the principal steps for developing a community Web portal. For efficient development, however, the process must be supported by tools. In the following, we describe the most important tools that allow us to facilitate and speed up the development and maintenance process. The tools cover the whole range from ontology engineering (OntoEdit) and query formulation (Query Builder) up to the creation of dynamic Web pages with the help of HTML/JavaScript templates. We just want to note here that we also rely on common HTML editing tools for Web site design.

6.2.1. OntoEdit
The OntoEdit toolset has already been mentioned in Section 3.2, and a screenshot is shown in Fig. 1. OntoEdit is used during terminology engineering and rule development. It delivers a wide range of functionalities for the engineering of ontologies. As introduced in Section 3, ontology modeling, from our point of view, includes the creation of concepts, attributes, relations, rules, and general metadata. To reduce complexity and to simplify the difficult task of ontology modeling, OntoEdit offers different views on the main modeling elements. The modeling task usually starts with introducing new concepts and organizing them into a hierarchy (cf. left part of Fig. 1). The next step uses the concepts to model attributes of and relations between concepts. On the basis of these knowledge structures, OntoEdit includes a rule view enabling the user to model rules which state common regularities (e.g. the symmetry of the cooperation relationship between two persons).
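The symmetry rule mentioned as an example could be modeled along the following lines (a sketch in the notation of Section 3.2):

    FORALL X, Y
        X:Person[COOPERATESWITH ->> Y:Person] <-
            Y:Person[COOPERATESWITH ->> X:Person].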

6.2.2. Query Builder
While queries of low complexity can be expressed in F-Logic using the rule debugger alone, in other cases it is more convenient to create queries using our Query Builder tool. The hyperbolic view interface (cf. Section 4) may be configured for this tool to allow the user to generate queries instead of posing them. Such queries may then be integrated as links into a Web page with the help of a common Web page editor by copying and pasting them into the Web editor's form.

6.2.3. HTML/JavaScript templates
Another time-consuming activity is the development of the Web pages that assemble queries from parts and that display the query results. For that purpose we have developed a library of template pages:
• Templates with check boxes, radio boxes, and selection lists are available. These HTML forms produce data which are used in JavaScript functions to generate queries. These queries can then be sent to the inference engine using submit buttons.
• The results of a query are fed into a template page as JavaScript arrays. From these data, different presentation forms may be generated (for the future, we envision that XML is returned by the inference engine and laid out by XSL style sheets):
  – A general-purpose template contains a table that presents answer tuples returned by the inference engine in an HTML table. The template provides functions to sort the table in ascending or descending order on different columns. Substitutions of certain variables may be used as URLs for other entries in the table. Different data formats are recognized and processed according to their suffixes, i.e. a '.gif' or '.jpg' suffix is interpreted as a picture and rendered as such (cf. Fig. 2 for an example).
  – Results may also be fed into selection lists, radio boxes, or check lists. Thus, query results can provide the initial setting of further HTML form fields. These forms can be used to create new queries based on the results of previous ones.
• A user can create queries using the hyperbolic view interface (cf. Section 4). As a personalization feature of our Web portal, he can store these queries by assigning a personal label. All stored queries can be accessed through a selection list, to restart the query and retrieve the most up-to-date answers. This list of stored queries provides individual shortcuts to often-needed information.
The template pages are compatible with some standard Web editors, i.e. the Web designer is able to rearrange or redesign the elements without destroying their functionality.

7. The system architecture

This section summarizes the major components of our system. An overall view of our system is depicted in Fig. 6, which includes the core modules for accessing and maintaining a community Web portal:
• Providing information in our community Web portal has already been introduced in Section 5. In our approach we distinguish between metadata-based, wrapper-based and fact-based information. Metadata-based information (such as HTML-A, Word-A, Excel-A, RDF, XML) is collected from the Web using a fact crawler. Wrapper-based information means integrating semi-structured and structured information semi-automatically into the knowledge warehouse. Using the fact editor, factual information can be directly added to the knowledge warehouse.
• The knowledge warehouse is the knowledge base of the community Web portal. It is structured according to the ontology. The facts contained in the knowledge warehouse and the ontology itself serve as the input for the inference engine.
• The inference engine uses information from the knowledge warehouse to answer queries. In addition, it uses ontological structures and rules to derive additional factual knowledge that is only implicitly provided. This inference mechanism may also be used to reduce the maintenance effort, because facts that may be automatically derived from other facts need not be provided by some member of the community.
• Accessing the community Web portal means navigating and querying for information as described in Section 4. Queries embedded in the portal or formulated using the hyperbolic view interface (cf. Fig. 3) are posted to the inference engine. The results may be delivered in different forms like HTML, XML, or RDF.

Fig. 6. System architecture.

8. Related work

This section positions our work in the context of existing Web portals like Yahoo and Netscape and also relates our work to other technologies that are or could be deployed for the construction of community Web portals.

One of the well-established Web portals on the Web is Yahoo 4, a manually maintained Web index. Yahoo allows information seekers to retrieve Web documents by navigating a tree-like taxonomy of topics. Each Web document indexed by Yahoo is classified manually according to a topic in the taxonomy. In contrast to our approach, Yahoo only utilizes a very lightweight ontology that solely consists of categories arranged in a hierarchical manner. Yahoo offers keyword search (local to a selected topic or global) in addition to hierarchical navigation, but it is only able to retrieve complete documents, i.e. it is not able to answer queries concerning the contents of documents, let alone present or combine facts found in different documents. Due to its weak ontology, Yahoo cannot extend information to include facts that could be derived through ontological axioms. These points are realized in our portal, which builds upon a rich ontology enabling the portal to give detailed and integrated answers to queries. Furthermore, our portal supports the active provision of information by the user community. Thus we get rid of the manual and centralized classification of documents: the portal is made by the community for the community.

4 http://www.yahoo.com

A portal that is specialized for a scientific community has been built by the Math-Net project, an initiative for "setting up the technical and organizational infrastructure for efficient, inexpensive and user driven information services for mathematics" [5]. At http://www.math-net.de/ the portal for the (German) mathematics community is installed, which makes distributed information from several mathematical departments available. The scope of offered information ranges from publications, such as preprints and reports, to information about research projects, organizations, faculty, etc. All these data are accompanied by metadata according to the Dublin Core 5 standard [29], which makes it comparatively easy to provide structured views on the stored information. The 15 elements of Dublin Core primarily describe metadata about a resource, e.g. its title, its author, or its format. The Dublin Core element 'Subject' is used to classify resources as students, as conferences, as research groups, as preprints, etc. A finer classification (e.g. via attributes) is not possible, except for instances of the publication category. Here the common MSC (Mathematical Subject Classification 6) is used, which resembles an ontology of the field of mathematics. The pros of Math-Net lie in the "important organizational task" of "establishing a network of persons [...] dedicating their time and work to the electronic information system" [5] and in the technical task of collecting various resources and integrating them to make them uniformly accessible. The cons of Math-Net are the lack of a rich ontology that could enhance the quality of search results (esp. via inferencing), and the restriction to information in the DC format.

5 http://www.purl.org/dc
6 http://www.ams.org/msc/

Parts of Math-Net are implemented on the basis of the Hyperwave system [21], an elaborate Web server that is based on databases providing information in a structured manner. Hyperwave has a lot of fancy and useful features, such as external links, automatic handling of outdated pages, avoidance of dangling links, presentation of different views for different users, etc. The system is a useful basis for the development of a portal, but since it does not have the notion of an ontology, the portal is restricted to the power of the underlying database. Its capabilities clearly lag behind the inferential capabilities of Ontobroker's inference engine.

Another community-focused portal is RiboWeb [1], an ontology-based data resource of published ribosome data and computational models that can process these data. RiboWeb exploits several types of ontologies to provide semantics for all the data and computational models that are offered. These ontologies are specified in the OKBC (Open Knowledge Base Connectivity) representation language [4]. The primary source of data is given by published scientific literature, which is manually linked to the different ontologies. Both systems, RiboWeb and our community portal, rely on ontologies for offering semantics-based access to the stored data. However, the OKBC knowledge base component of RiboWeb does not support the kind of automatic deduction that is offered by the inference engine of Ontobroker. Furthermore, RiboWeb does not include wrappers for automatically extracting information from the given published articles. On the other hand, the computational modules of RiboWeb offer processing functionalities that are not part of (but also not intended for) our community Web portal.

The Ontobroker project [8] lays the technological foundations for the KA2 portal. On top of Ontobroker the portal has been built, and organizational structures for developing and maintaining it have been established. Therefore, we compare our system against approaches that are similar to Ontobroker.

The approach closest to Ontobroker is SHOE [19]. In SHOE, HTML pages are annotated via ontologies to support information retrieval based on semantic information. Besides the use of ontologies and the annotation of Web pages, the underlying philosophy of the two systems differs significantly: in SHOE, arbitrary extensions to ontologies can be introduced on Web pages, and no central provider index is maintained. As a consequence, when specifying a query, users cannot know all the ontological terms which have been used, and the Web crawler will miss annotated Web pages. In contrast, Ontobroker relies on the notion of a community defining a group of Web users who share a common understanding and, thus, can agree on an ontology for a given field. Therefore, both the information providers and the clients have complete knowledge of the available ontological terms, a prerequisite for building a community Web portal. SHOE and Ontobroker also differ with respect to their inferencing capabilities. SHOE uses description logic as its basic representation formalism, but it offers only very limited inferencing capabilities. Ontobroker relies on Frame-Logic and supports complex inferencing for query answering.

WebKB [20] aims at providing intelligent access to Web documents. WebKB uses conceptual graphs for representing the semantic content of Web documents. It embeds conceptual graph statements into HTML tags to provide metadata about the contents of HTML documents. These statements are based on an ontology defining the concepts and relations which may be used for annotating the HTML documents. WebKB pursues a rather similar approach to our backbone system Ontobroker: both systems use a general representation language for representing metadata (conceptual graphs and Frame-Logic, respectively) and embed metadata within the HTML source code. However, Ontobroker (and thus our portal) provides additional means for accessing non-HTML resources, e.g. exploiting XML or RDF metadata, or using ontology-based wrappers. Furthermore, the tool environment of WebKB does not offer the methods and tools that are needed to build a community portal on top of WebKB and, thus, to make an application out of a core technology.

The STRUDEL system [12] applies concepts from database management systems to the process of building Web sites. STRUDEL uses a mediator architecture to generate such Web sites. Wrappers transform the external data sources, being either HTML pages, structured files or relational databases, into the semi-structured data model used within STRUDEL's data repository. STRUDEL then uses so-called 'site-definition queries' to create multiple views of the same Web site data. These queries are defined in STRUQL, a query language for manipulating semi-structured data. When compared to our approach, the STRUDEL system lacks the semantic level that is provided in our approach by the domain ontology and the associated inference engine.
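
The following toy Python sketch approximates this idea without STRUQL's actual syntax: two 'site-definition queries', written here as plain functions, derive two different views from the same repository, each of which would drive its own set of generated pages.

    # One repository, several site views; data and field names are invented.
    repository = [
        {"type": "paper", "title": "Semantic community Web portals", "year": 2000},
        {"type": "paper", "title": "Ontobroker: the very high idea", "year": 1998},
    ]

    def by_year(data):
        """View 1: one page per year, listing titles."""
        pages = {}
        for item in data:
            pages.setdefault(item["year"], []).append(item["title"])
        return pages

    def flat_index(data):
        """View 2: a single alphabetical index page."""
        return sorted(item["title"] for item in data)

    # Two views over identical data.
    print(by_year(repository))
    print(flat_index(repository))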

The Observer system [23] uses a network of ontologies to provide access to distributed and heterogeneous information. Each ontology can describe the information contained in one or more information repositories. Since these ontologies are linked explicitly by so-called inter-ontology relationships (synonym, hypernym, and hyponym), the information stored in the different resources is linked as well. A user selects an ontology (a vocabulary) to express his query. The Observer system accesses the resources described by the selected ontology to answer the query. If the answers are not satisfactory, other ontologies and thus other resources can be accessed. Observer provides means to state queries but does not offer predefined or customizable views.
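
A small, purely illustrative Python sketch (all ontology and term names invented) shows how such inter-ontology relationships let a query term from the user's chosen ontology be rewritten into terms of other ontologies, so that further repositories can be consulted:

    # Explicit inter-ontology relationships, as in Observer's approach;
    # the ontologies 'ontoA'..'ontoC' and their terms are hypothetical.
    inter_ontology = {
        ("ontoA", "article"): [("synonym", "ontoB", "paper"),
                               ("hypernym", "ontoC", "publication")],
    }

    def rewrite(term, ontology):
        """Return equivalent or broader terms in other ontologies."""
        return [(onto, t)
                for (rel, onto, t) in inter_ontology.get((ontology, term), [])]

    print(rewrite("article", "ontoA"))
    # [('ontoB', 'paper'), ('ontoC', 'publication')]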


From our point of view, our community portal system is rather unique with respect to the collection of methods used and the functionality provided. Our approaches for accessing information, providing information and maintaining the portal are more comprehensive than those found in other portals. We are able to offer this functionality since our backbone system Ontobroker and its add-ons provide more powerful techniques for, e.g., inferencing or extracting information from various sources than those offered by comparable systems. Moreover, all these methods are integrated into one uniform system environment.

9. Conclusion

We have demonstrated in this paper how a community may build a community Web portal. The portal is centered around an ontology that structures information for the purposes of presentation and provisioning of information, as well as for the development and maintenance of the portal. We have described a small-scale example, the KA2 portal, that illustrates some of our techniques and methods. In particular, we have developed a set of ontology-based tools that allow us to present multiple views onto the same information, appropriate for browsing, querying, and personalizing Web pages. Adding information to the up-and-running portal is possible for members of the community or the editors of the portal through a set of methods that support the integration of metadata and (semi-)structured information sources, as well as the manual manipulation of data. The tools that support development of the portal and maintenance by the editors are also ontology-based and, thus, fit smoothly into the overall process of running the community portal.

The concepts that we have explained are general enough to apply to many other domains. As a second application, we have constructed an intranet management application for an IT service company, the rationale being that the people in a company just form a community with common interests.

For the future, we will need to tackle some very practical issues, like improving and integrating the interfaces of our tools, adding a component for session management, or versioning of views and ontologies. A major experience so far has been that community members are only willing to contribute information if this is very easy to accomplish. We could not provide this quality of service from the start. Hence, our knowledge warehouse is still rather small, but large enough to show the viability of our approach. Besides fancy interfaces, the most urgent need will be the improvement of information provisioning. There are some machine learning approaches (e.g. [22]) that may help with dedicated subtasks of the information provisioning process. Our general experience, however, is that there is no single tool or technique that fits all needs, but there is a conceptual strategy that ties up all the loose ends, viz. ontologies.

References

[1] R. Altman, M. Bada, X. Chai, M.W. Carillo, R. Chen and N. Abernethy, RiboWeb: an ontology-based system for collaborative molecular biology, IEEE Intell. Syst. 14 (5) (1999) 68–76.

[2] R. Benjamins, D. Fensel and S. Decker, KA2: building ontologies for the Internet: a midterm report, Int. J. Human Comput. Stud. 51 (3) (1999) 687.

[3] T. Berners-Lee, Weaving the Web, Harper, New York, 1999.

[4] V. Chaudhri, A. Farquhar, R. Fikes, P. Karp and J. Rice, OKBC: a programmatic foundation for knowledge base interoperability, in: Proc. 15th Nat. Conf. on Artificial Intelligence (AAAI-98), 1998, pp. 600–607.

[5] W. Dalitz, M. Grötschel and J. Lügger, Information services for mathematics in the Internet (Math-Net), in: A. Sydow (Ed.), Proc. 15th IMACS World Congress on Scientific Computation: Modelling and Applied Mathematics, Artif. Intell. Comput. Sci. 4 (1997) 773–778.

[6] T. Davenport and L. Prusak, Working Knowledge: How Organizations Manage What They Know, Harvard Business School Press, Cambridge, MA, 1998.

[7] S. Decker, D. Brickley, J. Saarela and J. Angele, A query and inference service for RDF, in: Proc. W3C Query Language Workshop (QL-98), Boston, MA, December 3–4, 1998.

[8] S. Decker, M. Erdmann, D. Fensel and R. Studer, Ontobroker: ontology based access to distributed and semi-structured information, in: R. Meersman et al. (Eds.), Database Semantics: Semantic Issues in Multimedia Systems, Kluwer, Dordrecht, 1999, pp. 351–369.

[9] A. Deutsch, M. Fernandez, D. Florescu, A. Levy and D. Suciu, A query language for XML, in: Proc. 8th Int. World Wide Web Conf. (WWW’8), Toronto, May 1999, pp. 1155–1169.

[10] M. Erdmann and R. Studer, Ontologies as conceptual models for XML documents, in: Proc. 12th Int. Workshop on Knowledge Acquisition, Modelling and Management (KAW’99), Banff, October 1999.

[11] D. Fensel, S. Decker, M. Erdmann and R. Studer, Ontobroker: the very high idea, in: Proc. 11th Int. Flairs Conf. (FLAIRS-98), Sanibel Island, FL, May 1998.

[12] M. Fernandez, D. Florescu, J. Kang and A. Levy, Catching the boat with Strudel: experiences with a Web-site management system, in: Proc. 1998 ACM Int. Conf. on Management of Data (SIGMOD’98), Seattle, WA, 1998, pp. 414–425.

[13] P. Fröhlich, W. Nejdl and M. Wolpers, KBS-Hyperbook — an open hyperbook system for education, in: Proc. 10th World Conf. on Educational Media and Hypermedia (EDMEDIA’98), Freiburg, 1998.

[14] T.R. Gruber, A translation approach to portable ontology specifications, Knowledge Acquisition 6 (2) (1993) 199–221.

[15] M. Kesseler, A schema based approach to HTML authoring, in: Proc. 4th Int. World Wide Web Conf. (WWW’4), Boston, MA, December 1995.

[16] M. Kifer, G. Lausen and J. Wu, Logical foundations of object-oriented and frame-based languages, J. ACM 42 (1995) 741–843.

[17] N. Kushmerick, Wrapper induction: efficiency and expressiveness, Artif. Intell. 118 (2000) 15–68.

[18] J. Lamping, R. Rao and P. Pirolli, A focus + context technique based on hyperbolic geometry for visualizing large hierarchies, in: Proc. ACM SIGCHI Conf. on Human Factors in Computing Systems, 1995, pp. 401–408.

[19] S. Luke, L. Spector, D. Rager and J. Hendler, Ontology-based Web agents, in: Proc. 1st Int. Conf. on Autonomous Agents, 1997.

[20] P. Martin and P. Eklund, Embedding knowledge in Web documents, in: Proc. 8th Int. World Wide Web Conf. (WWW’8), Toronto, May 1999, pp. 1403–1419.

[21] H. Maurer, Hyperwave: The Next Generation Web Solution, Addison-Wesley, Reading, MA, 1996.

[22] A. McCallum, K. Nigam, J. Rennie and K. Seymore, A machine learning approach to building domain-specific search engines, in: Proc. 16th Int. Joint Conf. on Artificial Intelligence (IJCAI-99), 1999, pp. 662–667.

[23] E. Mena, V. Kashyap, A. Illarramendi and A. Sheth, Domain specific ontologies for semantic information brokering on the global information infrastructure, in: N. Guarino (Ed.), Formal Ontology in Information Systems, IOS Press, Amsterdam, 1998.

[24] J. Robie, J. Lapp and D. Schach, XML Query Language (XQL), in: Proc. W3C Query Language Workshop (QL-98), Boston, MA, December 3–4, 1998.

[25] A. Sahuguet and F. Azavant, Wysiwyg Web Wrapper Factory (W4F), Technical report, http://db.cis.upenn.edu/DL/WWW8/index.html, 1999.

[26] UMLS, Unified Medical Language System, http://www.nlm.nih.gov/research/umls/.

[27] W3C, XML Specification, http://www.w3.org/XML/, 1997.

[28] W3C, RDFS Specification, http://www.w3.org/TR/PR-rdf-schema/, 1999.

[29] S. Weibel, J. Kunze, C. Lagoze and M. Wolf, Dublin Core Metadata for Resource Discovery, RFC 2413, The Internet Society, 1998.

[30] G. Wiederhold and M. Genesereth, The conceptual basis for mediation services, IEEE Expert/Intell. Syst. 12 (5) (1997).

Steffen Staab is assistant professor for Applied Computer Science at Karlsruhe University. He has published in the fields of computational linguistics, information extraction, knowledge representation and reasoning, knowledge management, knowledge discovery, and intelligent systems for the Web. Steffen studied computer science and computational linguistics between 1990 and 1998, earning an M.S.E. from the University of Pennsylvania during a Fulbright scholarship and a Dr. rer. nat. from Freiburg University during a scholarship with Freiburg's graduate program in cognitive science. Since then, he has also been working as a consultant for knowledge management at Fraunhofer IAO and at the start-up company Ontoprise.

Jürgen Angele received the diploma degree in Computer Science in 1985 from the University of Karlsruhe. From 1985 to 1989 he worked for the companies AEG, Konstanz, and SEMA GROUP, Ulm, Germany. From 1989 to 1994 he was a research and teaching assistant at the University of Karlsruhe. He did research on the operationalization of the knowledge acquisition language KARL, which led to a Ph.D. from the University of Karlsruhe in 1993. In 1994 he became a full professor in Applied Computer Science at the University of Applied Sciences, Braunschweig, Germany. In 1999 he cofounded the company Ontoprise together with S. Decker, H.-P. Schnurr, S. Staab, and R. Studer and has been CEO of Ontoprise since then. His interests lie in the development of knowledge management tools and systems, including innovative applications of knowledge-based systems to the World Wide Web.


Stefan Decker is working as a PostDoc at Stanford's Infolab together with Prof. Gio Wiederhold in the Scalable Knowledge Composition project on ontology articulations. He has published in the fields of ontologies, information extraction, knowledge representation and reasoning, knowledge management, problem solving methods and intelligent systems for the Web. He is one of the designers and implementers of the Ontobroker system. Stefan Decker studied computer science and mathematics at the University of Kaiserslautern and finished his studies with the best possible result in 1995. From 1995 to 1999 he did his Ph.D. studies at the University of Karlsruhe, where he worked on the Ontobroker project.

Michael Erdmann gained his master's degree in Computer Science from the University of Koblenz (Germany) in 1995. Since October 1995 he has been working as a junior researcher at the University of Karlsruhe (Germany). He is a member of the Ontobroker project team and is currently finishing his Ph.D. on the relationship between semantic knowledge modeling with ontologies and XML.

Andreas Hotho is a Ph.D. student at the Institute of Applied Computer Science and Formal Description Methods at Karlsruhe University. He earned his master's degree in Information Systems from the University of Braunschweig, Germany, in 1998. His research interests include the application of data mining techniques to very large databases and intelligent Web applications.

Alexander Maedche is a Ph.D. candidate at the Institute of Applied Computer Science and Formal Description Methods at Karlsruhe University. He received his diploma in Industrial Engineering (computer science, operations research) in 1999 from Karlsruhe University. His research interests include ontology engineering, machine learning, data and text mining and ontology-based applications.

Hans-Peter Schnurr is a Ph.D. candidate at the Institute of Applied Computer Science and Formal Description Methods at Karlsruhe University. He received his diploma in Industrial Engineering in 1995 from Karlsruhe University. Between 1995 and 1998 he worked as a researcher and practice analyst at McKinsey and Company, and he is co-founder of the start-up company Ontoprise, a knowledge management solutions provider. His current research interests include knowledge management methodologies and applications, ontology engineering and ontology-based applications.

Rudi Studer obtained a diploma in Computer Science at the University of Stuttgart in 1975. In 1982 he was awarded a Doctor's degree in mathematics and computer science at the University of Stuttgart, and in 1985 he obtained his Habilitation in computer science at the University of Stuttgart. From January 1977 to June 1985 he worked as a research scientist at the University of Stuttgart. From July 1985 to October 1989 he was project leader and manager at the Scientific Center of IBM Germany. Since November 1989 he has been full professor in Applied Computer Science at the University of Karlsruhe. His research interests include knowledge management, intelligent Web applications, knowledge engineering and knowledge discovery. He is co-founder and member of the scientific advisory board of the knowledge management start-up company Ontoprise.

York Sure is a Ph.D. candidate at the Institute of Applied Computer Science and Formal Description Methods at Karlsruhe University. He received his diploma in Industrial Engineering in 1999 from Karlsruhe University. His current research interests include knowledge management, ontology merging and mapping, ontology engineering and ontology-based applications.