HealthFinland - A national semantic publishing network and portal for health information

11
Web Semantics: Science, Services and Agents on the World Wide Web 7 (2009) 287–297 Contents lists available at ScienceDirect Web Semantics: Science, Services and Agents on the World Wide Web journal homepage: www.elsevier.com/locate/websem HealthFinland—A national semantic publishing network and portal for health information Osma Suominen a,, Eero Hyvönen a , Kim Viljanen a , Eija Hukka b a Semantic Computing Research Group (SeCo), Helsinki University of Technology (TKK), Department of Media Technology, University of Helsinki, Department of Computer Science, Finland b National Institute for Health and Welfare (THL), Finland article info Article history: Received 22 April 2009 Received in revised form 9 July 2009 Accepted 16 September 2009 Available online 24 September 2009 Keywords: Health information Semantic portal Metadata Faceted search Content aggregation Ontology service abstract Providing citizens with reliable, up-to-date and individually relevant health information on the Web is done by governmental, non-governmental, business and other organizations. Currently the information is published with little co-ordination and co-operation between the publishers. For publishers, this means duplicated work and costs due to publishing same information twice on many websites. Also maintaining links between websites requires work. From the citizens point of view, finding content is difficult due to e.g. differences in layman’s vocabularies compared to medical terminology and difficulties in aggregating information from several sites. To solve these problems, we present a national scale semantic publishing system HealthFinland which consists of (1) a centralized content infrastructure of health ontologies and services with tools, (2) a dis- tributed semantic content creation channel based on several health organizations, and (3) an intelligent semantic portal aggregating and presenting the contents from intuitive and health promoting end-user perspectives for human users as well as for other websites and portals. © 2009 Elsevier B.V. All rights reserved. 1. Introduction Health information is one of the most frequently searched mate- rials on the Web. Among American Internet users, about 8 million adults searched for information on at least one health topic on a typ- ical day in August 2006, most commonly information about specific diseases and problems, treatments, diet and nutrition or exercise [26]. However, a citizen searching for health information on the Web faces many challenges [18]. General Web search engines are the most common starting points used to find health information [26]. Web search engines require that users know or can guess suitable keywords. Search engines are generally not aware of the context of use and can- not match content to user needs based on, e.g., target audience or other demographic information. Apart from information that search engines can infer from link structure and other statistical measures, there is no notion of trusted information sources or qual- ity measures relevant to assessing the trustworthiness of health information and advice. Corresponding author. Tel.: +358 9 451 3346. E-mail addresses: osma.suominen@tkk.fi (O. Suominen), eero.hyvonen@tkk.fi (E. Hyvönen), kim.viljanen@tkk.fi (K. Viljanen), eija.hukka@thl.fi (E. Hukka). URLs: http://www.seco.tkk.fi (O. Suominen), http://www.seco.tkk.fi (E. Hyvönen), http://www.seco.tkk.fi (K. Viljanen), http://www.thl.fi (E. Hukka). In practice, general Web search result listings for common health topics often contain a heterogeneous mix of health-related Web pages. As an example, the top twenty search results for diabetes retrieved by Google on April 21, 2009 include news sites, non-governmental organizations such as patient groups, a Wikipedia article, commercial sites that promote health-related products and services, as well as government sites that seek to inform citizens. More specific queries such as diabetes preg- nant also turn up user-generated content, e.g., discussion forum postings, blogs and video clips. Sorting out relevant, reliable and useful content from this mix, and assessing the quality of health websites, requires background knowledge about the organizations that publish health informa- tion on the Web. When users do find interesting resources, finding related relevant content is not very easy as no single website can provide everything a user may be interested in, and the link- ing between sites is often insufficient. Many common information needs and situations, such as when a baby is born into a family, require aggregation of content from several information providers. The quality and trustworthiness of information varies. In many cases, it is difficult to know whether information on a Web page is based on scientific research, layman opinions and rumors, or whether it is motivated by commercial interests. Even basic qual- ity indicators such as information sources and publication dates are often absent on health websites, and users do not consistently check for them [26]. Even when good quality information is found, 1570-8268/$ – see front matter © 2009 Elsevier B.V. All rights reserved. doi:10.1016/j.websem.2009.09.003

Transcript of HealthFinland - A national semantic publishing network and portal for health information

Ha

Oa

Ub

a

ARRAA

KHSMFCO

1

raid[W

prenosmii

H

(

1d

Web Semantics: Science, Services and Agents on the World Wide Web 7 (2009) 287–297

Contents lists available at ScienceDirect

Web Semantics: Science, Services and Agentson the World Wide Web

journa l homepage: www.e lsev ier .com/ locate /websem

ealthFinland—A national semantic publishing networknd portal for health information

sma Suominena,∗, Eero Hyvönena, Kim Viljanena, Eija Hukkab

Semantic Computing Research Group (SeCo), Helsinki University of Technology (TKK), Department of Media Technology,niversity of Helsinki, Department of Computer Science, FinlandNational Institute for Health and Welfare (THL), Finland

r t i c l e i n f o

rticle history:eceived 22 April 2009eceived in revised form 9 July 2009ccepted 16 September 2009vailable online 24 September 2009

a b s t r a c t

Providing citizens with reliable, up-to-date and individually relevant health information on the Web isdone by governmental, non-governmental, business and other organizations. Currently the informationis published with little co-ordination and co-operation between the publishers. For publishers, this meansduplicated work and costs due to publishing same information twice on many websites. Also maintaininglinks between websites requires work. From the citizens point of view, finding content is difficult due to

eywords:ealth informationemantic portaletadata

aceted searchontent aggregation

e.g. differences in layman’s vocabularies compared to medical terminology and difficulties in aggregatinginformation from several sites.

To solve these problems, we present a national scale semantic publishing systemHealthFinlandwhichconsists of (1) a centralized content infrastructure of health ontologies and services with tools, (2) a dis-tributed semantic content creation channel based on several health organizations, and (3) an intelligentsemantic portal aggregating and presenting the contents from intuitive and health promoting end-user

sers

ntology service perspectives for human u

. Introduction

Health information is one of the most frequently searched mate-ials on the Web. Among American Internet users, about 8 milliondults searched for information on at least one health topic on a typ-cal day in August 2006, most commonly information about specificiseases and problems, treatments, diet and nutrition or exercise26]. However, a citizen searching for health information on the

eb faces many challenges [18].General Web search engines are the most common starting

oints used to find health information [26]. Web search enginesequire that users know or can guess suitable keywords. Searchngines are generally not aware of the context of use and can-ot match content to user needs based on, e.g., target audiencer other demographic information. Apart from information that

earch engines can infer from link structure and other statisticaleasures, there is no notion of trusted information sources or qual-

ty measures relevant to assessing the trustworthiness of healthnformation and advice.

∗ Corresponding author. Tel.: +358 9 451 3346.E-mail addresses: [email protected] (O. Suominen), [email protected] (E.

yvönen), [email protected] (K. Viljanen), [email protected] (E. Hukka).URLs: http://www.seco.tkk.fi (O. Suominen), http://www.seco.tkk.fi

E. Hyvönen), http://www.seco.tkk.fi (K. Viljanen), http://www.thl.fi (E. Hukka).

570-8268/$ – see front matter © 2009 Elsevier B.V. All rights reserved.oi:10.1016/j.websem.2009.09.003

as well as for other websites and portals.© 2009 Elsevier B.V. All rights reserved.

In practice, general Web search result listings for commonhealth topics often contain a heterogeneous mix of health-relatedWeb pages. As an example, the top twenty search results fordiabetes retrieved by Google on April 21, 2009 include newssites, non-governmental organizations such as patient groups, aWikipedia article, commercial sites that promote health-relatedproducts and services, as well as government sites that seek toinform citizens. More specific queries such as diabetes preg-nant also turn up user-generated content, e.g., discussion forumpostings, blogs and video clips.

Sorting out relevant, reliable and useful content from this mix,and assessing the quality of health websites, requires backgroundknowledge about the organizations that publish health informa-tion on the Web. When users do find interesting resources, findingrelated relevant content is not very easy as no single websitecan provide everything a user may be interested in, and the link-ing between sites is often insufficient. Many common informationneeds and situations, such as when a baby is born into a family,require aggregation of content from several information providers.

The quality and trustworthiness of information varies. In manycases, it is difficult to know whether information on a Web page

is based on scientific research, layman opinions and rumors, orwhether it is motivated by commercial interests. Even basic qual-ity indicators such as information sources and publication datesare often absent on health websites, and users do not consistentlycheck for them [26]. Even when good quality information is found,

2 es and Agents on the World Wide Web 7 (2009) 287–297

imf

pccdcqap

mgos

seHiitssaHttba

utdmpldbr

2

Wcaw

cAeaatniba

Fig. 1. Three main components of HealthFinland.

88 O. Suominen et al. / Web Semantics: Science, Servic

t may not be targeted for the right audience: a treatment recom-endation for gestational diabetes issued by a government agency

or doctors may be useless for an anxious future mother.From the health information publishers’ point of view, a central

roblem today is that there are many actors and websites but littleo-ordination and cooperation between them, resulting in dupli-ate work even between non-competitive organizations such asifferent government institutions [18]. Reusing pre-existing Webontent is mainly done by manually linking websites, but such linksuickly expire when websites are restructured or shut down. Cre-ting good quality health content is problematic and requires arocess for quality control, e.g., regular reviews and updates.

The annotation of Web content with metadata is not very com-on on health websites, but it is sometimes mandated by, e.g.,

overnment standards. In these cases, the annotation processes areften tedious and require expertise in the use of large vocabulariesuch as the Medical Subject Headings (MeSH).1

HealthFinland2 is a semantic health information publishingystem that addresses these problems both from the publish-rs’ and citizens’ viewpoints. The concept and ideas behindealthFinland have been presented in [18,17]. The design and

mplementation of the portal information architecture and usernterface was presented in [32], and the design and implementa-ion of the portal in more detail in [34]. The FinnONTO ontologyervice infrastructure and services used in the system are pre-ented in [42,19,36]. This article complements our earlier worknd publications by providing a more detailed description of theealthFinlandportal components, the content creation infrastruc-

ure, the user interface, and the evaluations performed. Based onhe lessons learned in developing the prototype, the system haseen developed further and published in a production environments a national health portal.

In the following, we first present and overview of the ideasnderlying HealthFinland and the major components and archi-ecture of the system. We then present the ontologies and otheromain vocabularies used in the system as well as the metadataodel and the underlying content production system. Finally, we

resent the user interface of the semantic portal. In each section,essons learned in designing and implementing the prototype areiscussed, and related work is pointed out. In conclusion, contri-utions of the research are summarized and directions for furtheresearch are outlined.

. The HealthFinland approach

The main reasons for developingHealthFinland using Semanticeb technologies are [18]: facilitating cost-effective, distributed

ontent creation in an interoperable way, aggregating contentsutomatically based on semantics, and providing the end-usersith intelligent services.

First, HealthFinland minimizes duplicate redundant work andosts in creating health content on a national level by collaboration.goal of theHealthFinland collaborative production network is to

nsure that information about a health topic is produced only oncend by the organization that has the best professional experiencebout the topic. By using semantic technologies, the content canhen be re-used in different Web portals by the other organizations,

ot only in the organization’s own website. This possibility is facil-

tated by annotating the content locally with semantic metadataased on shared ontologies, and by making the global repositoryvailable by a semantic portal and as mash-up Web services.

1 http://www.nlm.nih.gov/mesh/2 http://www.seco.tkk.fi/applications/tervesuomi/

The second key idea behindHealthFinland is to try to minimizethe content maintenance costs of portals by letting the computertake care of semantic link maintenance and aggregation of con-tent from the different publishers. This possibility is also based onshared semantic metadata and ontologies. New content relevantto a topic may be published at any moment by any of the contentproviders, and the system is able to put the new piece of informa-tion in the right context in the portal, and automatically link it withrelated information.

The third major idea ofHealthFinland is to provide the end-userwith intelligent services for finding the right information based onhis or her own information needs or conceptual view to health, andfor browsing the contents based on their semantic relations. Theviews and vocabularies used in the end-user interface are indepen-dent of the content providers’ organizational perspective, and arebased on a “layman’s” vocabulary that is different from the medicalexpert vocabularies used by the content providers in indexing thecontent [32].

To implement these ideas, HealthFinland has three main com-ponents (cf. Fig. 1): (1) The domain vocabularies describe thedifferent aspects of health information such as topics, genres andaudiences, and they are published in an ontology repository aspart of the FinnONTO system of mutually interlinked vocabular-ies/ontologies [19]. (2) The content production system containsspecifications and tools for annotating, harvesting, and verifyingcontent. (3) The Semantic Portal for providing the contents forhuman users through semantic search and browsing services andfor machines, i.e., other Web applications, as semantic widgets(called “floatlets”) [23].

Fig. 2 gives a more detailed overview of the HealthFinlandsystem architecture. Metadata and documents are collected fromthe content publishers either by harvesting content and metadatafrom their content management systems or by annotating con-tent manually using the SAHA metadata editor [37] connected toONKI ontology services [36]. The content is validated and possibleproblems are reported to the content providers. The successfullyvalidated metadata is finally published through the portal forhumans and machines to use.

3. Domain vocabularies

A semantic portal needs a set of vocabularies which are ref-erenced by the metadata. The ontologies should describe all therelevant concepts in the application domain of the portal. InHealthFinland, ontologies and other controlled vocabularies areused for describing document subject matter, content genres andtarget audiences.

O. Suominen et al. / Web Semantics: Science, Services and Agents on the World Wide Web 7 (2009) 287–297 289

nd sy

3

jth

aceaaltu

(

ocMh

Fig. 2. HealthFinla

.1. Subject vocabularies

In the case of health information and health promotion, the sub-ect matter ontologies must describe a large number of possibleopics such as diseases, treatments and anatomy as well as livingabits such as diet, exercise and substance use.

We started by analyzing pre-existing, established ontologiesnd other structured vocabularies for describing health-relatedontent. Reusing existing ontologies offers several advantages: itases semantic interoperability with other applications, saves timend money by avoiding unnecessary ontology engineering work,nd helps to ensure broad coverage of the subject area as estab-ished ontologies typically have been used and developed for a longime. We settled on three core subject domain ontologies that aresed for describing the subject matter of Web contents:

(i) The General Finnish Upper Ontology (YSO)3 that includesapproximately 20 000 concepts. The YSO ontology was cre-ated by transforming the General Finnish Thesaurus YSA4 intoRDF/OWL format using the Protégé editor5 and by manuallycrafting the concepts into rdfs:subClassOf hierarchies [19].

(ii) The international Medical Subject Headings (MeSH) whichincludes approximately 23 000 concepts. We transformed thevocabulary into the SKOS Core format6 without changing thesemantics of the vocabulary or its structure using conversiontools and methods developed at the Free University of Amster-dam [38].

iii) The European Multilingual Thesaurus on Health Promotion7

(HPMULTI), which included a Finnish translation. HPMULTIcontains approximately 1200 concepts related specifically tohealth promotion. We transformed HPMULTI into SKOS/RDFin a similar way as MeSH using a custom conversiontool.

All three vocabularies are required to cover the subject matter

f the portal properly. YSO is broad but does not include enoughoncepts that describe detailed medical content. On the other hand,eSH contains lots of useful medical concepts, is widely used in the

ealth sector, but is focused on clinical healthcare. HPMULTI com-

3 http://www.seco.tkk.fi/ontologies/yso/4 http://www.vesa.lib.helsinki.fi5 http://protege.stanford.edu6 http://www.w3.org/2004/02/skos/7 http://www.hpmulti.net/

stem architecture.

plements the two vocabularies by focusing on health promotionspecific terminology.

To make the three subject vocabularies semantically compat-ible with each other, we created the Health Promotion Ontologyusing YSO as the structural basis and extended it manually, usingthe Protégé ontology editor, with concepts from the two othervocabularies. Currently, the Health Promotion Ontology containsall concepts from YSO and HPMULTI together with some 2500 con-cepts from MeSH.

3.2. Content genre vocabulary

Content genres describe different types of documents basedon purpose, form and content [29]. Genres can be specific to theapplication domain, and in the case of health information, someimportant genres include articles, tests (for assessing your currenthealth or habits), official recommendations and news. Although gen-res are currently not very widely used in Web search, they provideinteresting means for restricting content searches in addition tosubject- or keyword-based searching.

We could not find a suitable pre-existing genre vocabulary forhealth information, so we used a lightweight user-centered processvery similar to Rosso’s card sorting survey method [29] to constructa classification of genres. Five test subjects (Rosso’s experiment hadonly three participants in the initial phase) were asked to sort aselection of forty printed content pages into groups based on theirform and structure, and finally label them with descriptive names.An initial genre vocabulary was constructed based on an analysisof the results. The vocabulary was later extended and restructuredas new genres were identified that were not well represented inthe original experiment. Some categories were merged or renamedbased on experiences from user testing. The resulting genre classi-fication currently includes 21 content genres and is expressed as aSKOS Core vocabulary.

3.3. Target audience vocabulary

Describing the target audience of documents allows content tobe selected and promoted to certain users. Audience-based match-ing requires a vocabulary of audiences and a user interface thatmakes use of audience information.

To describe the different target audiences of the HealthFinlandportal, we initially created a Target Audience Ontology. We con-structed the ontology by taking into account five possible waysof segmenting audiences: by age, gender, occupation, and patientgroup as well as some other special groups and roles, e.g., par-

2 es and

epsaotoft

toTeaeoadtifatH

ttcetpmositpl

3

iccopvSsav

Ovifta

90 O. Suominen et al. / Web Semantics: Science, Servic

nts and exercise teams. We used the classification of occupationsublished by Statistics Finland8 and a smaller target audience clas-ification used in a publication database at the UKK Institute9

s starting points for the ontology. We organized the differentccupations based on a division into health professionals (e.g. doc-ors, nurses and therapists) and other, non-health occupations. Thentology was built using the Protégé ontology editor and trans-ormed into the SKOS Core format. The final ontology contains morehan 800 target audiences.

However, we soon discovered that the size and structure ofhe ontology was unwieldy for annotations, and the hierarchiesften unsuitable for processing search queries in a health portal.he occupation classification we used was intended for a differ-nt purpose (collecting national level statistics about, e.g., salaries)nd mainly distinguished different occupations by the level ofducation they require, which had little bearing on the needsf portal audiences. Additionally, as the needs of professionalsre very different from those of ordinary citizens, we decided torop health professionals as a target audience altogether in ordero better serve ordinary citizens in situations and tasks involv-ng their personal health. Many other websites and portals thatulfil work-related needs of health professionals already exist,nd new ones, such as THL.fi10 published by the National Insti-ute for Health and Welfare, were being developed in parallel toealthFinland.

Finally, we settled on a relatively small audience classificationhat takes into account the needs and goals of users as well ashe tone of discussion in the available content. The classificationonsists of two levels: special categories for specific situations –.g. families with a baby, occupational health, and information forravellers – as well as three general categories for disease- androblem-oriented information, health improvement, and environ-ental issues, respectively. The intent is that content that fits one

f the special categories is classified therein, while the rest is clas-ified into one of the general categories. The current classificationncludes nine categories, but more special categories are expectedo be defined when new content is added into the HealthFinlandortal. The classification is expressed using the SKOS Core vocabu-

ary.

.4. National ontology library service ONKI

Methods for publishing controlled vocabularies and using themn applications are required when used in professional contentreation work such as in the case of the HealthFinland contentreation network. At the start of the HealthFinland project, nonef the Content Management Systems (CMS) used by the contentublishers supported referencing concepts described in controlledocabularies by URIs in the metadata, as would be required by trueemantic Web applications. Vocabularies also evolve constantly,o methods for publishing the latest versions are needed in suchway that all parties are always using the latest version of the

ocabularies.The HealthFinland content creation network uses The Finnish

ntology Library Service ONKI11 for publishing all the domainocabularies discussed above. ONKI is a pilot system for deploy-

ng ontologies and using them on a national scale [41]. ONKIocuses especially on ontology publishing and using them in con-ent indexing and information retrieval through both end-user andpplication interfaces. ONKI ensures that all content creators use

8 http://www.stat.fi9 http://www.ukkinstituutti.fi

10 http://www.thl.fi11 http://www.yso.fi

Agents on the World Wide Web 7 (2009) 287–297

the latest versions of each ontology. From the point of view of thecontent creators, ONKI is used for browsing the ontologies and tosupport annotation of content with ontological concepts. ONKI isalso used by the HealthFinland content aggregator application forfinding concept URIs for the term labels.

4. Content production system

A semantic portal requires rich metadata with well-definedobject types, properties (fields) and value ranges in order to pro-vide rich search facilities to end-users. To ensure interoperabilityand ease integration of systems, metadata schemes should be basedon common standards and frameworks such as Dublin Core [4].However, general-purpose metadata frameworks may be adaptedfor particular domains and applications by specifying additionalconstraints and extensions.

A trade-off exists between the level of semantic richness andease of metadata creation. If annotating objects requires largeamounts of manual work and the results and benefits of meta-data production are not clearly visible, less metadata will likelybe produced, or shortcuts will be taken that compromise metadataquality [8]. Metadata schemas should focus on effective annota-tions with an appropriate breadth and expressivity that does nothinder annotation while still giving benefits for the users who relyon the metadata.

We designed the HealthFinland metadata schema using theDublin Core Metadata Element Set and the Finnish JHS143 recom-mendation on document metadata [20] as starting points, choosingonly fields that were relevant in the domain of health information.The schema was extended with a domain-specific Genre field thatdescribes health information content types such as article, guide,news item and campaign. The original schema has been publishedin [18], and a detailed metadata specification for system implemen-tors is available in [33].

The metadata in HealthFinland is represented using RDF, con-forming to the recommendations for expressing Dublin Core in RDF[6]. A subset of the metadata can also be embedded in (X)HTMLpages using META and LINK elements based on the Dublin Corerecommendation [5].

4.1. Metadata production tools

Creating good quality metadata requires suitable tools thatassist content creators and annotators in the metadata produc-tion process. Many content producers use a content managementsystem (CMS) to produce and publish Web documents. Adding sup-port for metadata production using the HealthFinland metadataschema into such systems makes it easy for content authors toproduce metadata and keep it synchronized with updates in thecontent. In such a setting many metadata fields such as publicationand modification dates can also be automatically assigned values.

To make it easier to assign subjects and other concept valuesfrom the vocabularies used in theHealthFinlandmetadata schema,we have developed the ONKI Selector Widget (Fig. 3) [40]. It is anAJAX-enabled component that can easily be integrated into anyHTML form and allows the selection of concepts from any vocab-ulary available on the ONKI server. The concepts picked using thecomponent are stored in an HTML form field like any other meta-data. This way, the content management system does not need tobe aware of the vocabularies, and the newest vocabulary versions

are always directly available from the ONKI server.

To publish the metadata for use by the HealthFinland system,the content management system must be configured to expose itsmetadata either embedded into individual HTML pages or as RDFdata retrievable using HTTP.

O. Suominen et al. / Web Semantics: Science, Services and Agents on the World Wide Web 7 (2009) 287–297 291

F Widga

mtcemtSa

4

cHvFHah

cuHmotdafP

4

wodub

4.4. Experiences and feedback

To gain some understanding into how the initial content cre-ation infrastructure works from the content publishers’ point of

ig. 3. Finnish Heart Association’s metadata production tool using ONKI Selectorssociation.

Not all content publishers use a CMS, and even if they do, addingetadata support to Web publishing systems may not always be

echnically or economically feasible. To create metadata for suchontent sources, we have developed the browser-based metadataditor SAHA [37] that can be configured with the HealthFinlandetadata schema. It was used in HealthFinland for manual anno-

ation of health information. SAHA uses the above mentioned ONKIelector Widget for supporting concept lookup from vocabulariesvailable on the ONKI server.

.2. Content publishers

Eight Finnish health organizations—the Finnish Diabetes Asso-iation, the Finnish Medical Society Duodecim, National Publicealth Institute, Social and Welfare Services of Oulu, Savonia Uni-ersity of Applied Sciences, Finnish Centre for Health Promotion,innish Institute of Occupational Health, and UKK Institute forealth Promotion Research—contributed to the health contentsvailable in the portal protype. The system contains almost 3500ealth content objects.

Depending on the organization and content, the content wasreated in different ways: two organizations used embedded mark-p conforming to the HealthFinland metadata scheme on theirTML pages; two organizations provided their content in XML for-ats that were then transformed into RDF by special transformers;

ne organization provided a relational database that was similarlyransformed into RDF; and three organizations provided the meta-ata directly in the specified RDF format using the SAHA editorttached to ONKI services. The XML and database sources used dif-erent subject vocabularies that had to be mapped to the Healthromotion Ontology before the transformation could be completed.

.3. Metadata aggregation and quality control

Using heterogeneous data sources presents some challenges

ith regard to data validity and quality control. We have devel-

ped an aggregation and validation tool that collects metadata fromifferent sources and processes it into a well-defined format forse in the HealthFinland portal. The metadata is collected eithery crawling the Web sites of content publishers or by retrieving

et. The widget has been integrated into the Navigo portal software used by the

metadata published in RDF format. Metadata expressed as part ofHTML documents is syntactically transformed into RDF, and anyconcepts expressed using concept labels are converted into URI ref-erences using Web Service facilities for concept lookups providedby the ONKI server. Finally, the aggregator validates the metadataagainst the HealthFinland metadata schema, and produces out-put in an RDF format that is guaranteed to conform strictly to theHealthFinlandmetadata schema. Any problems encountered dur-ing the aggregation and validation process are reported back to thepublisher by creating a problem report, shown in Fig. 4, which isimplemented as a set of browsable static HTML pages.

Fig. 4. Metadata feedback report.

292 O. Suominen et al. / Web Semantics: Science, Services and Agents on the World Wide Web 7 (2009) 287–297

Table 1HealthFinlandmetadata schema. Required fields are marked in bold.

Name Qualified name Cardinality Value type Value range

Identifier dc:identifier 1 URILocator ts:url 0..1 URLTitle dc:title 1 Literal Non-empty stringDescription dc:description 1 Literal Non-empty stringLanguage dc:language 1 Typed literal RFC 4646

Publication time dct:issued 1 Typed literal W3CDTF (ISO 8601)Acceptance time dct:dateAccepted 0..1 Typed literal W3CDTF (ISO 8601)Modification time dct:modified 0..1 Typed literal W3CDTF (ISO 8601)

Publisher dc:publisher 1..* Instance foaf:OrganizationCreator dc:creator 0..* Instance foaf:AgentRights dc:rights 0..* Literal Non-empty stringNoIndex ts:noindex 0..1 Typed literal Boolean value

Genre ts:genre 1 Concept Genre classificationPresentation type dc:type 1 Concept DCMI type vocabularyFormat dc:format 1 Typed literal IANA MIME types

**

vmaoapm

aorbc

itepmt

qoee

tmceg

4

clutiTtcw

Subject dc:subject 1..Keyword ts:keyword 0..Audience dct:audience 1

iew, we collected user feedback from annotators after severalonths of use. The feedback was collected using an e-mail survey

nd a meeting between the content editors where the experiencesf different content producers was discussed and summarized intolist of notable problems and possible improvements. Overall, therocesses and tools were considered functional, but many improve-ents were also suggested.The annotators suggested that the metadata schema and related

nnotation tools could be improved by removing or hiding someptional fields (e.g., Coverage) that were seldom used. They alsoequested tools that could help them avoid repetitive work, e.g.,y using previous annotations as templates for annotating newontent.

The use of three different complementary subject vocabular-es was considered somewhat impractical as it forced annotatorso look up concepts in each vocabulary. The annotators had alsoncountered situations where a concept they needed was notresent in any of the available vocabularies, and requested someeans to add new necessary concepts into the vocabularies during

he annotation work.The feedback reports were considered a useful tool for metadata

uality control, but the details shown in the initial reports wereften overwhelming. A way to suppress warnings about acknowl-dged but unimportant problems, such as systematic syntacticrrors, was requested.

The content publishers also requested a way to suppress con-ent from appearing in theHealthFinland portal even though valid

etadata had been created. This was necessary for situations whereontent had been produced that was useful for the intended audi-nce of the originating website but not relevant or useful to theeneral public.

.5. Refined content creation process and metadata schema

Based on the feedback, we made several improvements to theontent creation process. The usability of SAHA and the prob-em reports has been enhanced in many ways, such as fixingser interface glitches and removing unimportant problems fromhe reports. The metadata schema has been updated, remov-

ng many optional fields that were not widely used in practice.he new schema allows the direct use of the Health Promo-ion Ontology as a subject vocabulary. The content publishersan thus avoid using all the other subject vocabularies excepthen they consider them necessary due to legacy metadata or

Concept Subject vocabulariesLiteral Non-empty stringConcept Audience classification

requirements of direct semantic interoperability towards thirdparty systems. However, as the Health Promotion Ontology con-tains mappings to the original source vocabularies, expressed usingthe SKOS mapping properties, indirect semantic interoperabilitywith systems using the original vocabularies is possible regard-less of which subject vocabularies the content publishers decide touse.

The new metadata schema, shown in Table 1, has been improvedby removing fields that were not considered useful. Some fields,such as Spatial and Temporal coverage, Part of relations, Referencesand Translations, had not been used in practice by annotators. TheMedia type field was intended to be used for distinguishing betweenWeb content and other publications such as printed books and CDs,but it was dropped as the scope of the portal search functions wasrestricted to Web content. The Source field was sometimes used inthe metadata, but its usage was not consistent. The inconsistenciescaused some technical and presentation issues in the portal userinterface, so the field was dropped from the metadata schema andpublishers were instead advised to present sources in the contentitself.

The definition of some metadata fields were also made stricter.In the original schema, documents could be categorized into mul-tiple Genres, Audiences, Presentation types and Languages. Afteranalyzing the situations where multiple categorizations for thesefields had been used, we found that in most cases the content beingdescribed was not suitable for inclusion in the portal. For example,when the metadata for a document claimed that it was intendedfor multiple audiences, in reality it was typically not written foranyone in particular and thus would not be very helpful for anypractical task faced by the users of the portal. Thus, we decidedto allow a single document to be placed in only one category foreach of these fields. Problematic documents that cannot obviouslybe classified into a single category will either be classified intothe most relevant category, reworked, or dropped from the por-tal entirely. As a side benefit, this restriction also simplified theimplementation of the portal user interface and the forms used forannotation.

Some metadata fields were also added to the schema at therequest of annotators. A literal field for Keywords was added with

the intent that annotators can use it to describe subjects that cannotbe found in any of the available subject vocabularies. Frequentlyoccurring keywords can then be considered candidates for inclu-sion as concepts in the Health Promotion Ontology. Additionally, afield for preventing the inclusion of content into theHealthFinland

es and

pcl

aDHdawHdfcoac

5

uitrfv

sfsmfit

svIts

bl

O. Suominen et al. / Web Semantics: Science, Servic

ortal (NoIndex) was added, to give content publishers a degree ofontrol over the publishing of their content through theHealthFin-and portal.

The specification for representing metadata on Web pages haslso been updated to reflect the updated recommendations of theublin Core workgroup [5]. The representation of metadata onTML pages now includes a reference to the Dublin Core meta-ata profile that, in principle, allows the metadata to be parsednd transformed into a valid RDF/XML representation by any soft-are agent that supports the W3C GRDDL recommendation [43].owever, our content aggregator still uses a custom HTML meta-ata parser as it is more robust than typical GRDDL parsers in theace of syntactical problems in the HTML code and can also pro-ess legacy representations of HealthFinland metadata based onur older specifications. Our parser can also make domain-specificssumptions and corrections that a general-purpose GRDDL agentannot be expected to make.

. Semantic portal

In order to fulfil its purpose, a citizens’ health portal must besable from the perspective of ordinary citizens. The usability of an

nformation portal is affected by, e.g., the layout and functionali-ies of the user interface. In a semantic portal used for informationetrieval, additional factors that affect usability include the searchacilities available and the user-visible parts of ontologies and otherocabularies used in the portal.

The faceted search paradigm [16] was chosen as the basis for theearch facilities in theHealthFinlandportal, as it had been success-ully applied in other semantic search systems [9,15]. In facetedearch, users navigate and make search queries in the portal byaking category selections along the available facets. However,

nding a useful and intuitive set of facets and categorizations inhe domain of health information was a challenge.

In many faceted search systems, the facets are directly con-tructed using the values found in metadata fields and theocabularies that define the sets of possible field values [9,27,30].n some systems, such as MuseumFinland, logical rules are used

o define additional facets that do not directly correspond to thetructure of the ontology [15].

In HealthFinland, the main subject matter vocabularies haveeen created for and by professionals such as doctors, nurses and

ibrarians for use at work or for scientific purposes. Their struc-

Fig. 5. HealthFinland portal wit

Agents on the World Wide Web 7 (2009) 287–297 293

ture and choice of terms are not always appropriate for the generalpublic and typical tasks and goals related to personal health, so afaceted navigation system directly based on the vocabularies wouldlikely not be very useful for users of a portal devoted to personalhealth issues. Instead, we constructed new, citizen-centric facetsand mapped these to the underlying ontologies. The end-user facetswere created using a user-centered card sorting method [32]. In thesearch facets, the Subject metadata field is represented by the fourindependent facets Topic, Life event, Group of people and Body part.The additional facets Document type (genre), Publisher and Audiencewere created by using the values of the corresponding metadatafields directly.

5.1. User interface design and implementation

The user interface of the HealthFinland portal was designedusing a design process inspired by the discount usability engineer-ing philosophy [24,21] that stresses the application of lightweightusability evaluations performed between incremental designupdates. The design was based on experiences with earlier por-tals and search systems, lightweight design methods such as paperprototypes, and evaluation using heuristics and opportunistic usertesting. Our evaluation was focused on a qualitative approach as wetried to find problems early on and thus guide the design process.

The first user interface sketches were drawn on paper and wemade extensive use of design pattern collections such as [35] andearlier faceted search systems. The presentation of search resultswas designed to show rich metadata that helps users quickly findrelevant items and filter out non-relevant results. The design ofthe search results listing was based on research by Crystal andGreenberg that evaluated the usefulness of metadata for healthinformation seekers [2]. The display of facets and search resultsis shown in Fig. 5.

In addition to faceted search, the portal provides recommenda-tion links based on ontological knowledge (e.g. “smoking is a riskfactor for lung cancer”) and an alphabetical index of concepts.

The user interface is multilingual and supports Finnish, Swedishand English. Cross-language queries are supported, so that, e.g.,

an English user interface and English-language categories may beused to perform searches for the Finnish-language content items,demonstrating the potential of multilingual ontologies. However,not all facet categories and ontological concepts have been trans-lated into all three languages.

h facets and search results.

294 O. Suominen et al. / Web Semantics: Science, Services and Agents on the World Wide Web 7 (2009) 287–297

F theHea

awpsct[

nafw

5

Toiji

wtbtjss

uwtcbilto

ig. 6. AJAX-based Semantic Web Widget showing dynamic context-based links tosmall amount of JavaScript code into the HTML page.

In addition to serving end-users directly, the portal also providesfloatlet (Fig. 6), i.e., an AJAX-based Semantic Web Widget that can,ith a small amount of JavaScript code, be incorporated into otherortals to display related content items from the HealthFinlandystem. This way, the portal contents are exported for machine pro-essing and mash-up applications. The implementation is similar tohe related museum item display described in an earlier publication42].

The portal was implemented as a Java Servlet application run-ing on Apache Tomcat. It was built using the Tapestry frameworknd uses Jena for RDF functionality. Search and recommendationunctionality was implemented using the Lucene search engine,hich we enhanced to handle category and concept queries.

.2. Evaluation

The portal application was evaluated with a series of user tests.he first user interface mock-ups were presented to potential usersf the portal and the test subjects were asked to describe the usernterface elements they saw. The mock-ups were refined until sub-ects were able to understand the purpose and function of the usernterface.

The first working implementation of the portal was evaluatedith a two-phase usability test. The test subjects represented the

arget audience of the portal, i.e., ordinary citizens with varyingackgrounds such as students of different disciplines, administra-ive staff at the Helsinki University of Technology and a freelanceournalist. A pilot test with two users was used to refine the testetup and the information retrieval tasks, expressed as realisticcenarios involving the test subjects.

In the second phase, six users completed four retrieval taskssing the portal. The tasks were expressed as realistic situationshere the users had to solve problems or seek advice related to

heir own personal health. For a task to be considered successfullyompleted, the user was required to come up with a solution backed

y information retrieved using the portal and to express being sat-

sfied with the solution. Each task was successfully completed by ateast three users, with an average task success rate of 70%. Typicalask completion times were 3–10 min depending on the difficultyf the task. Users started each task from the portal front page that

althFinland content. The widget has been integrated into another portal by adding

shows the top level of the topic hierarchies, constructed using auser-centered card sorting process [32]. The vast majority of thefirst actions of test subjects took them closer to their goal, indi-cating that the front page design was successful and thus that thetopic structure was not confusing or misleading at this level. How-ever, deeper in the topic hierarchy the subdivision of topics wasoften confusing or overwhelming, as it reflected the structure ofthe underlying ontology. Most users did not make use of secondaryquery facets or recommendation links between documents, com-menting that they didn’t seem to help them achieve their task goals.The alphabetical index was seldom used, but was useful for theusers who tried it.

A post-test questionnaire of subjective usability was used to cal-culate a System Usability Scale (SUS) [1] score. The scale representsthe subjective usability of a system on a scale of 0 (worst) to 100(best) points. The calculated SUS score of the portal was 70.8 points,representing the finding that most users considered the usabilityof the portal to be generally good.

5.3. Refined user interface

Based on the user tests and feedback from content publishers,we made several changes to the user interface. The most significantchange was the separation of search and browsing facilities, whichhad been combined into one interface in the prototype portal. Inthe new portal, both the browsing and search user interfaces arebased on the categorization of documents by Section (audience),Topic, Situation, Genre and Publisher. Compared with the prototypeportal, we have dropped the facets for Group of people, Body part andLife event, which were not considered very useful in our usabilitytests, and added the Situation facet, which partly serves the samepurpose as the Life event facet but is specific to particular topics.

The section (audience), situations, genre and publisher for adocument are directly described in the metadata. The topic of adocument is indirectly inferred based on a mapping between the

citizen-centered portal topics and subject concepts described in theHealth Promotion Ontology, as in the prototype portal.

The portal has been split into top level Sections that correspondto the target audiences defined in the Audience Classification.Within each section, the content is organized into browsable Top-

O. Suominen et al. / Web Semantics: Science, Services and Agents on the World Wide Web 7 (2009) 287–297 295

Fig. 7. Browsing interface that is dynamically constructed from faceted metadata.

be re

iciHes

Fig. 8. Search interface, allowing queries to

cs, such as the Exercise topic shown in Fig. 7, and each topic mayontain content areas devoted to specific Situations. The browsing

nterface resembles many traditional health portals such as BBCealth12 and Yahoo! Health,13 but the interface is dynamically gen-rated based on the data model that defines the sections, topics andituations which are referenced in the metadata.

12 http://www.bbc.co.uk/health/13 http://health.yahoo.com

stricted by five facets as well as keywords.

The search facility is shown in Fig. 8. Searches are usually initi-ated by entering keywords, but can be further constrained by thefive available search facets.

6. Related work

The idea of theHealthFinlandportal architecture and ontology-based faceted search was inspired by earlier semantic portals suchas SWED [28], the MultimediaN E-culture demonstrator [31] andMuseumFinland [15]. An early approach of creating and using an

2 es and

oupccsa

asem

dctmiehiu[

oapr[cb

TdHtla

7

isdttctatiplS

d

96 O. Suominen et al. / Web Semantics: Science, Servic

ntology infrastructure is presented in [22]. The infrastructurenderlying HealthFinland was developed within the FinnONTOroject [19] where HealthFinland has been a major applicationase, in addition to the analogous CultureSampo system [14] forultural heritage content. Both systems are based on distributedemantic content creation, centralized ontology library services,nd a portal supporting semantic search and recommendation [12].

Vocabulary and ontology development in the health domain haslong tradition and has produced very large and comprehensive

ystems such as MeSH, UMLS14 and SNOMED CT.15 In our work,xisting vocabularies are used with additional mutual ontologyappings [7] and mappings to a general vocabulary (YSO).The pitfalls of metadata about Web resources have been well

ocumented [8]. We have tried to avoid some of the problems byreating tools that make vocabularies easily available for anno-ators and to report problems back to the source. Transforming

etadata from literal sources and databases has been discussed e.g.n [25]. In our case manual annotation using the online metadataditor SAHA together with ontology services is also used. While weave not performed rigorous evaluations of different annotation

nterfaces, our own experiences are consistent with a recent end-ser study of manual metadata creation using ontology services10].

Faceted search, also called view-based search or dynamic tax-nomies in IR research, has been studied by various authors forlong time [27,9,30]. Several semantic portals are based on the

aradigm, both domain-specific systems such as the SWED envi-onment portal [28] as well as generic search systems, e.g., /facet11]. Separating end-user facets from annotation ontologies is dis-ussed in [13]. The automatic construction of a browsing interfaceased on faceted metadata has been discussed in [3].

Several health portals share important aspects with our portal.he HONsearch system,16 published by the Health on the Net Foun-ation, allows the searching of health Web sites accredited with theONcode quality seal using keywords as well as target audiences

o restrict the search. The commercial Healia search engine17 simi-arly allows searches for health websites to be restricted by gender,ge and heritage.

. Conclusions

The HealthFinland system demonstrates how shared mean-ngs, expressed using ontologies, can be used to bind togetheryntactically and semantically heterogeneous content sources fromifferent publishers. The content creation, validation and aggrega-ion infrastructure, including ontology services and tools, enableshe collaborative publication of health content and reduces dupli-ate work. From an end-user perspective, the underlying semanticechnology enables a citizen-centered faceted search user interfaces well as more traditional services such as keyword search andopical index. All the tools and services discussed above have beenmplemented and tested in a real world setting, and the prototype

ortal has been published on the Web in 2008.18 The HealthFin-and system was awarded the third prize at the Open Track of theemantic Web Challenge 2008 competition.19

The National Institute for Health and Welfare is building a pro-uction version of theHealthFinlandportal. The first public version

14 http://www.nlm.nih.gov/research/umls/15 http://www.ihtsdo.org/snomed-ct/16 http://www.hon.ch/HONsearch/Patients/17 http://www.healia.com18 http://demo.seco.tkk.fi/tervesuomi/19 http://challenge.semanticweb.org

Agents on the World Wide Web 7 (2009) 287–297

of the portal20 was published in May 2009. The portal utilizes thesame content creation infrastructure discussed above, which hasbeen integrated into the Alfresco content management system. Theuser interface is implemented using Java portlet technology andbuilt using Liferay portal software.

8. Future work

Our ongoing research is focused on building personalizationservices into theHealthFinlandportal, such as push e-mail and rec-ommendation links based on user profile information. The portalwill also be expanded with a registry of health services that will besemantically interlinked with the information resources currentlyin the portal.

Creating and maintaining the current portal, the content pro-duction infrastructure, vocabularies and metadata, requires arelatively large amount of manual work. While vocabulary mainte-nance is a necessary burden for most semantic applications, we aredeveloping tools and processes that facilitate collaborative ontol-ogy development and will apply these to the maintenance of thevocabularies used in HealthFinland.

Manual annotation work could further be reduced usingsemiautomatic annotation tools [39] to suggest, e.g., documentlanguages and subjects based on text analysis. We are currentlyinvestigating the use of automatic categorization and topic extrac-tion for dynamic content—such as news feeds and discussion forumpostings—that could then be integrated as content sources into theportal with little or no manual effort.

The situations referenced by documents in the portal are cur-rently only defined within the content repository of the portal. Inthe future they may be published as a controlled vocabulary on theONKI server so that they can be referenced by content producersand third parties.

While some HealthFinland content publishers already exposetheir metadata, mainly through HTML meta tags, the aggregatedand validated RDF data that the portal relies on is not currentlyrepublished back to the Web. In the future the portal will beconfigured to expose at least some of its metadata. We are alsoinvestigating ways to expose the portal search results in machine-processable formats such as RSS feeds, in addition to the AJAXfloatlets that we already provide.

Acknowledgements

This work is part of the national Semantic Web ontology projectFinnONTO21 2003–2010, funded by the National Funding Agencyfor Technology and Innovation (Tekes) a consortium of 38 compa-nies and public organizations, and the Ministry of Social Affairs andHealth. The HealthFinland project is co-ordinated by the NationalInstitute for Health and Welfare in Finland. We thank JohannaEerola, Marko Ruohoranta, Matti Sarjakoski, Katri Seppälä and JouniTuominen for their input to the work reported in this paper.

References

[1] J. Brooke, SUS: a ‘quick and dirty’ usability scale, in: Usability Evaluation inIndustry, Taylor & Francis, London, 1996, pp. 189–194.

[2] A. Crystal, J. Greenberg, Relevance criteria identified by health informationusers during web searches, Journal of the American Society for Information

Science and Technology 57 (10) (2006) 1368–1382.

[3] A. Crystal, Facets are fundamental: rethinking information architecture frame-works, Technical Communication 54 (1) (2007) 15–27.

[4] Dublin Core Workgroup, Dublin core metadata element set, v. 1.1.http://dublincore.org/documents/dces/, 2008.

20 http://www.tervesuomi.fi21 http://www.seco.tkk.fi/projects/finnonto/

es and

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[

[[

[

[

[

[

[

[

[

[[

[

[

[

[

[

[creation and publication for cultural heritage legacy systems, in: Proceedings

O. Suominen et al. / Web Semantics: Science, Servic

[5] Dublin Core Workgroup, Expressing Dublin Core metadata using HTML/XHTMLmeta and link elements. http://dublincore.org/documents/dc-html/, 2008.

[6] Dublin Core Workgroup, Expressing Dublin Core metadata using the ResourceDescription Framework (RDF). http://dublincore.org/documents/dc-rdf/, 2008.

[7] J. Euzenat, P. Shvaiko, Ontology matching, Springer-Verlag, Berlin, 2007.[8] D. Hawking, J. Zobel, Does topic metadata help with web search? Journal of

the American Society for Information Science and Technology 58 (5) (2007)613–628.

[9] M. Hearst, A. Elliott, J. English, R. Sinha, K. Swearingen, K.-P. Lee, Finding theflow in web site search, Communications of the ACM 45 (9) (2002) 42–49.

10] M. Hildebrand, J. van Ossenbruggen, L. Hardman, G. Jacobs, Supporting sub-ject matter annotation using heterogeneous thesauri, a user study in web datareuse., Tech. Re INS-E0902, CWI, Amsterdam, 2009.

11] M. Hildebrand, J. van Ossenbruggen, L. Hardman, /facet: a browser for hetero-geneous Semantic Web repositories, in: Proceedings of ISWC 2006, Athens, GA,USA, Springer-Verlag, Berlin, 2006.

12] M. Hildebrand, J. van Ossenbruggen, L. Hardman, An analysis of search-baseduser interaction on the Semantic Web, Tech. Re INS-E0706, CWI, Amsterdam,2007.

13] M. Holi, E. Hyvönen, Fuzzy view-based semantic search, in: Proceedings ofASWC 2006, Beijing, China, Springer-Verlag, Berlin, 2006.

14] E. Hyvönen, E. Mäkelä, T. Kauppinen, O. Alm, J. Kurki, T. Ruotsalo, K. Sep-pälä, J. Takala, K. Puputti, H. Kuittinen, K. Viljanen, J. Tuominen, T. Palonen,M. Frosterus, R. Sinkkilä, P. Paakkarinen, J. Laitio, K. Nyberg, CultureSampo– Finnish culture on the Semantic Web 2.0. Thematic perspectives for theenduser, in: Proceedings of Museums and the Web 2009 (MW2009), Archives& Museum Informatics, Toronto, Canada, 2009.

15] E. Hyvönen, E. Mäkela, M. Salminen, A. Valo, K. Viljanen, S. Saarela, M. Junnila,S. Kettula, MuseumFinland – Finnish museums on the Semantic Web, Journalof Web Semantics 3 (2) (2005) 224–241.

16] E. Hyvönen, S. Saarela, K. Viljanen, Application of ontology techniques to view-based semantic search and browsing, in: Proceedings of ESWS 2004, Heraklion,Greece, Springer-Verlag, Berlin, 2004.

17] E. Hyvönen, K. Viljanen, O. Suominen, E. Hukka, HealthFinland—publishinghealth promotion information on the Semantic Web, in: Proceedings of DrMED2008: International Workshop on Describing Medical Web Resources. The 21stInternational Congress on the European Federation for Medical Informatics(MIE 2008), Göteborg, Sweden, 2008.

18] E. Hyvönen, K. Viljanen, O. Suominen, HealthFinland—Finnish health informa-tion on the Semantic Web, in: Proceedings of ISWC 2007 + ASWC 2007, Busan,Korea, Springer-Verlag, Berlin, 2007.

19] E. Hyvönen, K. Viljanen, J. Tuominen, K. Seppälä, Building a national SemanticWeb ontology and ontology service infrastructure—the FinnONTO approach,in: Proceedings of ESWC 2008, Tenerife, Spain, Springer-Verlag, Berlin, 2008.

20] JHS workgroup, JHS 143: Asiakirjojen kuvailun ja hallinnan metatiedot.http://www.jhs-suositukset.fi/suomi/jhs143, 2004.

21] S. Krug, Don’t Make Me Think: A Common Sense Approach to Web Usability,2nd ed., New Riders Press, 2005.

22] A. Maedche, B. Motik, L. Stojanomic, R. Studer, R. Volz, An infrastructurefor searching, reusing and evolving distributed ontologies, in: Proceedings ofWWW 2003, Budapest, Hungary, ACM Press, New York, 2003.

23] E. Mäkelä, K. Viljanen, O. Alm, J. Tuominen, O. Valkeapää, T. Kauppinen, J. Kurki,

R. Sinkkilä, T. Känsälä, R. Lindroos, O. Suominen, T. Ruotsalo, E. Hyvönen, Pro-ceedings of First Industrial Results of Semantic Technologies, co-located withISWC 2007 + ASWC 2007, cEUR Workshop Proc., vol. 293, Busan, Korea, 2007.

24] J. Nielsen, Guerrilla HCI: using discount usability engineering to penetrate theintimidation barrier, Cost-justifying Usability, Academic Press, Inc., Orlando,FL, USA, 1994, pp. 245–272.

[

Agents on the World Wide Web 7 (2009) 287–297 297

25] B. Omelayenko, Porting cultural repositories to the semantic web, in: Proceed-ings of the 1st Workshop on Semantic Interoperability in the European DigitalLibrary (SIEDL-2008), 2008.

26] Pew Internet & American Life Project, Online Health Search, 2006.27] A.S. Pollitt, The key role of classification and indexing in view-based

searching, Tech. rep., University of Huddersfield, UK. http://www.ifla.org/IV/ifla63/63polst.pdf, 1998.

28] D. Reynolds, P. Shabajee, S. Cayzer, Semantic Information Portals, in: Proceed-ings of WWW 2004, Alternate track papers & posters, ACM Press, New York,New York, 2004.

29] M.A. Rosso, User-based identification of web genres, Journal of the AmericanSociety for Information Science and Technology 59 (7) (2008) 1053–1072.

30] M. Schraefel, M. Karam, S. Zhao, mSpace: interaction design for user-determined, adaptable domain exploration in hypermedia, in: Proceedings ofAH 2003: Workshop on Adaptive Hypermedia and Adaptive Web Based Sys-tems, 2003.

31] A. Schreiber, A. Amin, M. van Assem, V. de Boer, L. Hardman, M. Hildebrand, L.Hollink, Z. Huang, J. Kersen, M. Niet, B. Omelayenko, J. Ossenbruggen, R. Siebes,J. Taekema, J. Wielemaker, B. Wielinga, MultimediaN E-Culture demonstrator,in: Proceedings of ISWC 2006, Athens, GA, Springer–Verlag, Berlin, 2006.

32] O. Suominen, K. Viljanen, E. Hyvönen, User-centric faceted search for semanticportals, in: Proceedings of ESWC 2007, Innsbruck, Austria, Springer–Verlag,Berlin, 2007.

33] O. Suominen, K. Viljanen, E. Hyvönen, TerveSuomi-portaalin metatietomäärit-tely (HealthFinland portal metadata specification), v. 2.0, Tech. rep., SemanticComputing Research Group, SW 2.0 project, 2009.

34] O. Suominen, Käyttäjäkeskeinen moninäkymähaku semanttisessa portaalissa(user-centric faceted search in a semantic portal), Master’s thesis, Universityof Helsinki, Dept. of Computer Science, February 2008.

35] J. Tidwell, Designing Interfaces, O’Reilly & Assoc, Sebastopol, California, 2005.36] J. Tuominen, M. Frosterus, K. Viljanen, E. Hyvönen, ONKI SKOS server

for publishing and utilizing SKOS vocabularies and ontologies as services,in: Proceedings of ESWC 2009, Heraklion, Greece, Springer–Verlag, Berlin,2009.

37] O. Valkeapää, O. Alm, E. Hyvönen, Efficient content creation on the semanticweb using metadata schemas with domain ontology services, in: Proceedingsof ESWC 2007, Innsbruck, Austria, Springer–Verlag, Berlin, 2007.

38] M. van Assem, V. Malaise, A. Miles, G. Schreiber, A method to convert thesaurito SKOS, in: Proceedings of ESWC 2006, Lecture Notes in Computer Science,Budva, Montenegro, 2006.

39] A. Vehviläinen, E. Hyvönen, O. Alm, A semi-automatic semantic annotationand authoring tool for a library help desk service, in: Emerging Technologiesfor Semantic Work Environments: Techniques, Methods, and Applications, IGIGroup, Hershey, USA, 2008.

40] K. Viljanen, J. Tuominen, E. Hyvönen, Publishing and using ontologies as mash-up services, in: Proceedings of the 4th Workshop on Scripting for the SemanticWeb (SFSW2008) at ESWC, 2008.

41] K. Viljanen, J. Tuominen, E. Hyvönen, Ontology libraries for production use: TheFinnish ontology library service ONKI, in: Proceedings of ESWC 2009, Heraklion,Greece, Springer–Verlag, Berlin, 2009.

42] K. Viljanen, J. Tuominen, T. Känsälä, E. Hyvönen, Distributed semantic content

of the 2008 IEEE International Conference on Distibuted Human-Machine Sys-tems, Athens, Greece, IEEE Press, 2008.

43] W3C, Gleaning resource descriptions from dialects of languages (GRDDL), W3CRecommendation, 11 September 2007. http://www.w3.org/TR/2007/REC-grddl-20070911/, 2007.