Health search engine with e-document analysis for reliable search results

13
International Journal of Medical Informatics (2006) 75, 73—85 Health search engine with e-document analysis for reliable search results Arnaud Gaudinat a,, Patrick Ruch b , Michel Joubert c , Philippe Uziel d , Anne Strauss h , Mich` ele Thonnet e , Robert Baud b , St´ ephane Spahni f , Patrick Weber f , Juan Bonal g , Celia Boyer a , Marius Fieschi c , Antoine Geissbuhler a,b a Health on the Net Foundation, Geneva, Switzerland b SIM, University Hospital of Geneva, Switzerland c LERTIM, Timone Hospital, Marseille, France d XR partner, Paris, France e MISS, Paris, France f NICE Computing, Lausanne, Switzerland g THALES Information System, Paris, France h Institut de Recherche pour le D´ eveloppement (IRD), Paris, France KEYWORDS Web Search engine; Trustworthy information; eHealth Summary Objective: After a review of the existing practical solution available to the citizen to retrieve eHealth document, the paper describes an original specialized search engine WRAPIN. Method: WRAPIN uses advanced cross lingual information retrieval technologies to check information quality by synthesizing medical concepts, conclusions and ref- erences contained in the health literature, to identify accurate, relevant sources. Thanks to MeSH terminology [1] (Medical Subject Headings from the U.S. National Library of Medicine) and advanced approaches such as conclusion extraction from structured document, reformulation of the query, WRAPIN offers to the user a privileged access to navigate through multilingual documents without language or medical prerequisites. Results: The results of an evaluation conducted on the WRAPIN prototype show that results of the WRAPIN search engine are perceived as informative 65% (59% for a general-purpose search engine), reliable and trustworthy 72% (41% for the other engine) by users. But it leaves room for improvement such as the increase of database coverage, the explanation of the original functionalities and an audience adaptability. Corresponding author. E-mail address: [email protected] (A. Gaudinat). 1386-5056/$ — see front matter © 2005 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.ijmedinf.2005.11.002

Transcript of Health search engine with e-document analysis for reliable search results

International Journal of Medical Informatics (2006) 75, 73—85

Health search engine with e-document analysis forreliable search results

Arnaud Gaudinata,∗, Patrick Ruchb, Michel Joubertc, Philippe Uzield,Anne Straussh, Michele Thonnete, Robert Baudb, Stephane Spahni f,Patrick Weber f, Juan Bonalg, Celia Boyera, Marius Fieschic,Antoine Geissbuhlera,b

a Health on the Net Foundation, Geneva, Switzerlandb SIM, University Hospital of Geneva, Switzerland

c LERTIM, Timone Hospital, Marseille, Franced XR partner, Paris, Francee MISS, Paris, Francef NICE Computing, Lausanne, Switzerlandg THALES Information System, Paris, Franceh Institut de Recherche pour le Developpement (IRD), Paris, France

KEYWORDSWeb Search engine;Trustworthyinformation;eHealth

SummaryObjective: After a review of the existing practical solution available to the citizento retrieve eHealth document, the paper describes an original specialized searchengine WRAPIN.Method: WRAPIN uses advanced cross lingual information retrieval technologies tocheck information quality by synthesizing medical concepts, conclusions and ref-erences contained in the health literature, to identify accurate, relevant sources.Thanks to MeSH terminology [1] (Medical Subject Headings from the U.S. NationalLibrary of Medicine) and advanced approaches such as conclusion extraction fromstructured document, reformulation of the query, WRAPIN offers to the user aprivileged access to navigate through multilingual documents without language ormedical prerequisites.Results: The results of an evaluation conducted on the WRAPIN prototype showthat results of the WRAPIN search engine are perceived as informative 65% (59%for a general-purpose search engine), reliable and trustworthy 72% (41% for theother engine) by users. But it leaves room for improvement such as the increase ofdatabase coverage, the explanation of the original functionalities and an audienceadaptability.

∗ Corresponding author.E-mail address: [email protected] (A. Gaudinat).

1386-5056/$ — see front matter © 2005 Elsevier Ireland Ltd. All rights reserved.doi:10.1016/j.ijmedinf.2005.11.002

74 A. Gaudinat et al.

Conclusion: Thanks to evaluation outcomes, WRAPIN is now in exploitation on theHON web site (http://www.healthonnet.org), free of charge. Intended to the citizenit is a good alternative to general-purpose search engines when the user looks up trust-worthy health and medical information or wants to check automatically a doubtfulcontent of a Web page.© 2005 Elsevier Ireland Ltd. All rights reserved.

1. Introduction

Since the Big Bang of the Worldwide Web overa decade ago, emanating from CERN, the hyper-text universe has been in constant expansion andaccelerating. Ubiquitous and pervasive yet eas-ily accessible, the Web is today’s oracle, with aninstant answer to any type of question. Like theall-encompassing library of Babel in the story byBorges [2], the Web can satisfy our every curiosity,yet it is a simple mirror of human thought, mixingthe exalted with the vile. The Web, with a thou-sand responses to every question, offers no clearanswer to the searcher, whose quest leads everdeeper into the vastness of undifferentiated knowl-

found a comfortable and lucrative home on theInternet.

The search engines have dealt remarkably wellwith the ever-increasing volume of information,with over 8 billion pages now indexed by Google.Less certain is the ability of the general searchengines to produce quality results as the databasesize increases. The spectacular success of Google[7] is probably due to its patented ‘‘PageRank’’algorithm Brin and co-workers [8] (or other algo-rithms based on page popularity), based on thenotion that a hyperlink from document A to doc-ument B implies that the authors of document Aconsider document B to be of value. It would be rea-sonable to assume that a page’s popularity wouldbe correlated to the quality of its content. A popu-lar site could ill afford to spread bad information,

links from reputed websites? Only an in-depth studywould reveal with certitude whether a relationshipexists between the number of inbound links and the

1.1. State of the art

Currently, most citizens and patients use generalpurpose search engines (80% according to Jansen[3] such as Google 48% or Yahoo 21.2% whensearching on the Web, for April 2005 accordingto Nielsen/Netrating [4]). According to Jansen [3],health and science occupy the 4th and 6th places,respectively, for the years studied (1997—2002 fordifferent search engines), showing the importanceof the health domain for the Internet citizen.

Other popular search tools include thematicdirectories such as Yahoo or the Open DirectoryProject (DMOZ). Specialized in the health domainare CISMeF [5] (only available in French) and HON-select [6] (in five languages) which present med-ical information arranged under the Medical Sub-ject Headings thesaurus (MeSH) of the U.S. NationalLibrary of Medicine. HONselect also offers advancedmultilingual features to facilitate comprehensionof web pages in languages other than those of theuser.

quality of web page content as judged by humanexperts.

Other practical approaches have been put forthbased on adherence to standard or selection by athird party. These include HON [11], AFGIS,1 WMA[12], and URAC [13]. The HONcode of Health onthe Net Foundation [14,15] is the unique exam-ple to have been deployed on a large scale, withover 5000 sites in 29 languages enrolled in a vol-untary accreditation program. The accreditationprocess is initiated by the site operator, who pre-pares the site for review by a HON medical reviewerwho checks for compliance with eight principles.Accreditation is free and remains in force as longas the site continues to pass an annual review.CISMeF and MEDLINEplus [16] are interesting initia-tives whereby librarians select quality resources,respectively, for scholar content and limited toFrench-language sites, and intended for patients

1 AFGIS: http://www.afgis.de/.

edge. The Web suffers from the overabundance ofinformation, and the highly variable quality of itscontent. On trivial matters, a myriad of sourcesanswer with a common voice, but it is more difficultto obtain an authoritative response to vital ques-tions in the field of health, where charlatans have

as shown by studies such as Amento et al. [9],and especially in the medical field by Borges etal. [10]. Are we therefore to conclude that searchengine results are indicative of quality? What aboutquality pages that lack ‘popularity’, such as newpages or those whose owner has not undertaken thepromotional efforts often needed to obtain back-

Health search engine with e-document analysis for reliable search results 75

Fig. 1 Functional schema of WRAPIN as a guide of this article.

covering English and Spanish languages web sites.Also, the MedCertain project has produced the HID-DEL [17] markup language allowing formalizationof a number of descriptive criteria for web pagecontent in great detail. The system may, however,be difficult to apply on a large scale due to theneed for web publishers to add the complex markupto each page. Human review remains necessarywith HIDDEL, to control misuse (or abuse) of thelanguage.

Numerous actors have thus contributed to theeffort to limit the potential for harm resultingfrom poor quality online health information. Var-ious approaches have been attempted but noneoffers a definitive solution, especially with regardto inexperienced Internet users, persons with alow degree of health literacy or those whosesearch is motivated by a medical crisis. Recentadvances in search ranking technology attemptto take account of topicalization, or theming,see Haveliwala [18]. This narrows the search topages for a relevant topic, but still does not dealwith the quality of the content itself. Generalpurpose search engines appear to be making lit-tle progress toward the analysis of informationqtdreisep

2. Materials and methods

In this section, we survey the set of methods andresources developed or simply used in the WRAPINproject.

2.1. Introduction to WRAPIN

To create an efficient search tool incorporatingbest practices for quality medical information, agroup of medical informatics/Internet experts fromHON, LERTIM, NICE and SIM proposed an innova-tive solution in the form of the EU-WRAPIN project(World Reliable Advice for Patients and Individuals).WRAPIN offers answers to many of the problemsdescribed above and was the fruit of multidisci-plinary collaboration with experts from organiza-tions including MISS and THALES IS. The main pur-pose of WRAPIN is to help users assess the credibilityof online medical information, using a referencebase constituted exclusively of trustworthy docu-ments.

To accomplish this WRAPIN needed advances fea-tures which will be described in the following para-graphs. The simplified functional diagram below(a

2

Tapc

uality. In the critical field of health informa-ion, it is preferable to guide users to restrictedomain search tools, which, while containing feweresources, are more likely to provide reliable, rel-vant results. Our experience has shown that qual-ty information is best delivered by user-centricearch tools that favor resources created to ben-fit the end users rather than the informationroviders.

Fig. 1) outlines the main WRAPIN functionalitiesnd their interactions.

.2. Medical trustworthy resources

he initial goal of WRAPIN was to make avail-ble, in the first place for patients, but also forrofessionals, a tool allowing the assessment of webontent quality for the medical field. In a general

76 A. Gaudinat et al.

way, WRAPIN represents an alternative to existingsearch engines, as it combines the best medicalWeb pages with other ‘hidden’ online documentsthat are not referenced by other search engines.A sample of information resources is presented,giving the user an appreciation of the depth andbreadth of scientific debate, agreement and con-troversy for a given subject, and a better under-standing of the relationships linking diverse ideasand actors. The choice of reference sources is crit-ical for WRAPIN and careful selection is required.Currently, eight sources are used by the system:PubMed of the U.S. National Library of Medicine(NLM), which counts over 15 million citations, Clini-calTrials.gov, also from NLM with the results of morethan 20,000 studies, MedHunt (HON), with some70,000 selected web pages, HONcodeHunt madeup of over 110,000 trustworthy webpages from theHONcode-compliant sites, BookShelf from NLM withover 24,000 pages from digitized medical referencetexts; HONNews with some 13,000 medical newsitems from HealthDay and HON; FDA (U.S. Food andDrug Administration) with a selection of more than1000 relevant pages from the US FDA; OESO, fromthe OESO Foundation, a collection of 1153 scientific

2.3. A large scale of inputs

The greatest innovative feature of WRAPIN is itsability to handle different types of query, especiallyentire web pages (specified by URL) as a query.Whereas, certain search tools can find relatedpages for a given page using a vector approach,WRAPIN analyses a page for the most importantmedical terms, performing a frequencies analysison MeSH terms found on the page. WRAPIN identi-fies keywords which are then used for: (1) weightedqueries to its indexes; (2) translations into lan-guages other than that of the initial query; and(3) serve to highlight the most important med-ical concepts dealt with by the document. Thisapproach is also applied to long texts or natu-ral language expressions that are submitted byusers as queries. This functionality opens up newpossibilities for online information searching com-pared with existing search tools (including thosein the medical field) which limit queries to justa few words. Theoretically — and practically — avoluminous document such as an academic the-sis could be used to query WRAPIN, if computa-tional time and resources are available. Applica-tsswipto(

uery

articles in the form of questions and answers fromthe field of esophageal disorders; and UroFrance,a database of over 2500 French-language articleson urology. By indexing these resources, WRAPINcovers an important range of medical subjects,merging scientific/technical documents with moreaccessible information destined for general readersincluding newly diagnosed patients and those withchronic illnesses as well as health professionals.

Fig. 2 WRAPIN interface (URL, short q

ions such as bibliographic research could be rea-onably envisaged in such cases. Alternatively, ahort query input is handled by standard means,ith the addition language translations and weight-

ng by frequency of medical terms. The figure belowresents the main interface of WRAPIN, showinghe first field for entry of a URL and the sec-nd permitting entry of an arbitrary block of textFig. 2).

or text with no limitation of length).

Health search engine with e-document analysis for reliable search results 77

2.4. MeSH mapper

The MeSH (Medical Subject Headings) of the U.S.National Library of Medicine is a thesaurus [1], ahierarchically arranged terminological resource forthe medical domain containing 22,997 descriptors.It was created to meet the need for indexing ofmedical literature. MEDLINE has proved its useful-ness over many years. The specialized librariansof MEDLINE have carefully selected MeSH termsfor over 15 million scientific articles, some ofwhich date back to the 1950s. The use of a the-saurus for indexing is not a new idea [19] and hasbeen widely applied for medical literature [20].The MeSH remains a precious resource when itcomes to manipulating multilingual medical textdata, or when performing queries expansion. AfterHONSelect, the first multilingual repertory basedon the MeSH hierarchy, and CISMeF, catalog ofFrench-language medical resources based on theMeSH, WRAPIN relies on automatic categorizationof queries and documents based on this nomencla-ture.

A first challenge is to efficiently extract MeSHterms from the analyzed documents. A greataibfrattnmc

extraction task [24]. Recently, this WRAPIN mod-ule (called, HonMeSHmapper system) participatedin an evaluation session with other French Languagemapper systems (conducted by Neveol et al. [25]),in which the results showed that HONMeSHMapperachieved the best overall F-measure.

The mapping of MeSH terms is crucial withinWRAPIN, as it is used throughout the system, forqueries as well as for documents. Mapping servesto categorize text using the most representativeMeSH headings, which are needed at the time ofindexing, searching, translation, scoring, in queryreformulation, and for formatting of the resultspage.

Table 1 presents an example of MeSH mapping(or research for key concepts) following analysisof a web page. The page in question has as itspurpose the sale of shark cartilage for the treat-ment of cancer. WRAPIN attempts to identify keyconcepts from the page in order to create a syn-thetic query to trustworthy databases. These keyconcepts are listed in order of decreasing impor-tance. The first column shows how the word (i.e.Entry Term or MeSH heading in case of a MeSHterm) appears within the text; a frequency is asso-c‘oektcclfa

ww.d

e

inhib

mount of research on concept recognition in med-cal text has already been done [19—22]. Theest known is probably the indexing initiative [21]rom the U.S. National Library of Medicine thatesulted in the well-known MetaMap system [22]nd the related citation algorithm [23], wherehe goal is to help or replace the human anno-ator of MEDLINE. For WRAPIN, we investigated aon-supervised approach based on a space vectorodel, which provided results as good as those that

an be obtained with MetaMap for the MeSH term

Table 1 MeSH mapping results for the URL ‘‘http://w

In text Frequency Key concept

Cartilage 306 Cartilage

Cancer 102 NeoplasmsTumor 36Tumors 36Cancers 3

Cells 111 CellsCell 54

Treatment 96 TherapeuticsTreated 30Treatments 12Therapeutic 9Treat 6

Shark cartilage 146 Shark cartilag

Angiogenesis inhibitors 42 Angiogenesis

iated with each word. Here however, the conceptshark cartilage’ is not a MeSH term, but becausef its frequent occurrence in the page is consid-red as relevant (WRAPIN is capable of identifyingey terms composed of up to three words). In thehird column, the terms are grouped by key con-epts (e.g. MeSH heading). To the right of each keyoncept, the type (MeSH or other) is listed, fol-owed by the cumulative frequency of occurrenceor the different forms of each concept and, finally,score based on the cumulative frequency and the

iscount-vitamins-herbs.net/shark-cartilage.htm’’

Type 4.1.1.1.1 Total 4.1.1.1.2 Score

MeSH 306 0.047

MeSH 177 0.027

MeSH 165 0.025

MeSH 153 0.023

Other 146 0.022

itors MeSH 42 0.012

78 A. Gaudinat et al.

type (medical or non-medical) and its inverse fre-quency (this method favors MeSH terms).

The work done during the WRAPIN project showsthat the UMLS (Unified Medical Language Systemof the U.S. National Library of Medicine) knowl-edge sources may contribute to a better indexingof medical documents by the use of MeSH terms[26]. Special attention has been focused on a (verylarge) piece of knowledge contained in the UMLSknowledge sources: co-occurrences between majorMeSH terms in the Medline literature [27]. Previ-ous work within the ARIANE project demonstrateda way to translate semantic relationships betweenconcepts into MeSH sub-headings [28]. The reverseis done in WRAPIN: translation of sub-headings asso-ciated with several terms by the UMLS, accord-ing to co-occurrence frequencies, into relationshipsbetween the concepts represented by the terms.The aim is to propose to the indexer possible seman-tic associations between concepts it has recognizedin analyzed texts. When associations are validatedaccording to this knowledge database, the indexeris able to refine its results.

UMLS is a project of the U.S. National Libraryof Medicine. UMLS has two main components: the

tionships. Among them, for instance, Diagnosesapplies on the two types, Diagnostic Procedure andPathologic Function, Treats applies on Therapeuticor Preventive Procedure and Pathologic Function.These semantic relationships are defined at sucha general level that is not always possible to mapa type onto its linked concepts by an automaticpertinent inheritance of the meaning that the rela-tionships convey: it is obvious that every diagnosticprocedure cannot be used to diagnose every patho-logic function. Nevertheless, since a Coronarogra-phy is linked to the type Diagnostic Procedure, itcan be used to diagnose a Coronary Arteriosclerosisthat is linked to the type Disease or Syndrome andthus to the type Pathologic Function. The SemanticNetwork provides today a framework for biomedi-cal concepts isolated into the Metathesaurus thatcan be considered as an operational ontology.

Our working hypothesis, which is then applied intreatments, is that significant associations betweenconcepts are those for which there exist semanticrelationships in the literature materialized by co-occurrences in the above UMLS knowledge source.The exploitation of this hypothesis is done onextracted MeSH terms and produces a ranking list ofccplseps

2

Taktfko

fpmaTmMfFcrI

Metathesaurus and the Semantic Network. TheMetathesaurus contains not only MeSH but alsothe most useful medical nomenclatures. The coreconcepts of the Metathesaurus are connected togeneric types of concepts in the Semantic Network.These types of concepts are interconnected bysemantic relationships [29]. The data structure ofthe Metathesaurus is based on hierarchies and asso-ciations. The association relationship links a giventerm to related terms and to a preferred term.The (pre-)order relationship structures the pre-ferred terms into more generic terms and more spe-cific ones. This later relation divides the thesaurusinto several so-called microthesauri, according toa local specificity. For example, the term Coro-nary Arteriosclerosis appears twice in the Metathe-saurus: firstly, as a process involved in coronarydiseases viewed as heart diseases, and secondly,as an arteriosclerosis localized into the coronaryarteries and causing arterial occlusive diseases. Thepresence of micro-thesauri translates the variouscontexts from which a same medical concept canbe viewed and, thus, the complexity of the medi-cal domain.

The Semantic Network associates types of medi-cal concepts with semantic relationships. The typesof concepts are organized in a hierarchy where,for instance, Physiologic Function and PathologicFunction are children of Biologic Function, and Dis-ease or Syndrome is a child of Pathologic Func-tion. There are about 30 different semantic rela-

andidate terms, which are those terms that bestharacterize a document. Two steps make up therocess: (1) for each pair of terms present in theist of terms, a cumulative weight of their relation-hips is computed, and (2) the weight affected toach term is then computed according to all theairs in which it is present. This approach has beenuccessfully experimented [30].

.5. Advanced search engine

he search engine is the hub of an application suchs WRAPIN. Pioneers like Salton and McGill [31]new how to use the nature of information itselfo create efficient models and technologies. Con-erences such as TREC [32] have helped further ournowledge of information searching in general andnline search engines in particular.

Within WRAPIN exist various types of documents;rom MEDLINE citations in XML format to HTMLages created by practicing physicians, there areany types in between, more or less structured,

ll of which need to be handled by one system.he goal is to get at the essence of each docu-ent, by the use of MeSH terms for indexing forEDLINE documents, analysis of formatting markup

or HTML, to identify key elements in the text.or reasons of efficiency and coherence in the cal-ulation of scores, all of the trustworthy medicalesources cited in this paper were indexed locally.nformation searching with WRAPIN is based on

Health search engine with e-document analysis for reliable search results 79

evidence developed over the past decades in thefield of classic information searching, as well asmore recent research into web-based informationsearching, and takes into account the specificitiesof the medical domain, making use of the numer-ous specialized resources available (terminological,semantic, etc.). Searching within WRAPIN use prin-ciples whereby: (1) the frequency of terms in eachdocument as well as the inverse frequency of doc-uments for each term (model known as TF-IDF);(2) MeSH are boosted; (3) synonyms of found MeSHterms are used; and (4) the spatial proximity of key-words is taken into account. The TF-IDF is appliedto the title, content and URL of the document.

2.6. Smart display results

Presentation of results is of the greatest importancefor search engines, many of which have chosena simple, sober design, while others have optedfor a complex interface which, while intriguing,may be daunting to new users. The inspiration ofthe WRAPIN interface comes from the simplicity ofclassic web search tools, with added functionalityappropriate to the medical domain. The most inter-embmtTsv

2sKLopat

key sentence from a MEDLINE abstract. Followingrecent developments in information retrieval [34]and machine learning [35], which show that con-clusions are the most content-bearing sentencesto perform related articles search and index prun-ing tasks in MEDLINE, we assume that conclusionsentences would be good candidates for such keysentences in scientific texts. Selecting argumenta-tive contents is formally a classification task: foreach piece of text the system will have to decidewhether it is a relevant conclusion or not. In textclassification tasks, two types of strategies are com-peting: expert-driven and data-driven approaches.While the former, which rely on a domain expert,are often time and labour-intensive, the latter aredirectly dependent on the availability of large train-ing sets. Fortunately, training data for our task canbe acquired in a cheap manner. Most abstracts inMEDLINE are unstructured (i.e. provided withoutexplicit argumentative markers, such as METHODS,PURPOSES, . . .), but fortunately, a significant frac-tion of these abstracts contain explicit argumenta-tive markers. Using PubMed and its Boolean queryinterface, we collected a set of 12,000 MEDLINEcitations containing strings such as ‘‘PURPOSE:’’,‘F

toqmatbsa(s8pm

stru

sting functionality is without a doubt the auto-atic identification of the conclusion, as proposedy Ruch et al. [33] for implicitly structured docu-ents, such as MEDLINE abstracts, to complement

he classic KWiC (KeyWord in Context) extraction.he basic hypothesis behind this is that the conclu-ion of a scientific article contains the most rele-ant information for the searcher.

.6.1. Key sentences with latent argumentativetructuringey word assignment has been largely used in MED-INE to provide an indicative ‘‘gist’’ of the contentf articles. Abstracts are also used for this pur-ose. However, with usually more than 300 words,bstracts can still be regarded as long documents;herefore we designed a system to select a unique

Fig. 3 Example of explicitly

‘METHODS:’’, ‘‘RESULTS’’, ‘‘CONCLUSION:’’ (cf.ig. 3).

We use a Bayesian classifier which has the advan-age of showing linear complexity [36], while mostther top performing algorithms tend to have auadratic complexity; therefore they are oftenore adapted for rapid application developments

nd exploratory studies [37,38]. Three types of fea-ures are linearly combined to get a final proba-ility ranking per class: stems; stem bigrams andtem trigrams. This approach has been evaluatednd finally for the CONCLUSION class, the F-scorei.e. the harmonic mean, with recall and preci-ion having the same importance; cf [39]) reaches4%. While recall shows excellent effectiveness,recision could still be improved: conclusion seg-ents are well classified, but some non-conclusion

ctured abstracts in MEDLINE.

80 A. Gaudinat et al.

Fig. 4 Automatic conclusion detection for the query‘‘Montelukast for children with asthma’’.

sentences are classified as conclusion (false posi-tives). This is problematic for RESULTS segments,which is found ill-defined by the classifier, but look-ing at the corpus, the distinction between RESULTSand CONCLUSION appears questionable, so thatmerging these two classes could be both benefi-cial and legitimate. So, Naive Bayes classifiers pro-vide an adapted framework to perform argumenta-tive classification and outperforming expert-drivenapproaches.

Fig. 4 presents an example of a conclusionreturned on a results page from WRAPIN. This exam-ple perfectly illustrates the usefulness of this func-tionality for the query ‘Montelukast for childrenwith asthma’ where the conclusion provides a clearanswer to the user’s query. However, it happensthat the conclusion, while providing an argumen-tative summary, is not directly related to the infor-mative content of the query. Therefore the selectedpassage is displayed to the user only when it sharesa minimal lexical similarity with the query. Other-wise, the classic KeyWord in Context (KWiC) displayis often a more appropriate choice for category-driven passage extraction.

attempts to find the segments that maximize thenumber of medical terms and other keywords (akind of ‘‘keyword diversity’’, including synonyms).For this search for segments, each term is weightedaccording to its weight in the initial query, wheremedical terms are favored (Fig. 5).

In both types of resume, each medical term(MeSH) is displayed in orange (with rapid access toits definition, for synonyms as well) and in red fornon-medical keywords.

2.7. Reformulation by concepts

According to Hearst [40] search engine users oftenhave only a vague idea of how to express the objectof their queries. A great number of queries sub-mitted to search engines contain only a word ortwo, according to Jansen and Spink [41], with theresult that the multitude of documents returnedmay be relevant to the (vague) query, but uselessto the searcher. In light of these considerations,an efficient search system should interact with theuser, offering meaningful suggestions to add preci-sion to the query. This would be especially valuablein the medical field where users may not have therdtVaqiWstpmt(

or th

2.6.2. Advanced Keyword In Context systemIn the case of non-structured documents such asweb pages, a classical solution was applied. Theuse of a KWiC algorithm offers users an approx-imate synthesis of the document. In our case, it

Fig. 5 Sample of WRAPIN results f

equisite knowledge or vocabulary to specify theesired results. Web search engines now use clus-ering methods (the best know of these is probablyivissimo [42]) based on automatic classificationpproaches that allow users to refine and specifyueries according to subject classes found automat-cally in the documents returned for a query [43].

RAPIN offers similar functionality based exclu-ively on the MeSH. Preprocessing uses the MeSHo categorize all documents prior to indexing, thenerform a query and return the MeSH terms com-on to the first 10 documents and propose a list of

erms related to: (1) the query (intrinsically) and2) the documents for the query.

e query ‘‘fever children aspirin’’.

Health search engine with e-document analysis for reliable search results 81

Table 2 Reformulation suggestions for the query‘‘fever children aspirin’’

Reye syndromeRespiratory tract infectionsAcetaminophenBacterial infectionsInfectionVirusesPneumoniaPharyngitisBronchitis chronicPharynxMucusCommon coldRespiratory sounds

Table 2 presents a list of MeSH terms proposed byWRAPIN for the query ‘‘fever children aspirin’’. Thefirst term returned, ‘‘Reye syndrome’’ is entirelyrelevant since the use of aspirin to treat fever inchildren can provoke this condition; as an alternatetreatment one can use ‘‘acetaminophen’’, which isoffered as a third MeSH term. A major differenceof this system is that it does not restrict as a resultof user interaction the user’s search space, insteadproposing concepts related to the original query, inorder to refine it.

2.8. Multilingual approach, Spell checkingand language guessing

Drawing on experience from HONSelect, WRAPINuses the MeSH as departure point for multilingualfunctionality. Analysis of the query allows recog-nition of the most relevant MeSH terms in thequery. These terms are then translated into theother languages for which MeSH has been madeavailable within WRAPIN (English, German, French,Spanish and Portuguese). The advantage of usinga thesaurus for the translation in the CLIR (CrossLanguage Information Retrieval) has been shown,

notably by Eichmann [44]. This researcher uses theOHSUMED corpus, with queries translated manuallyinto Spanish and French with the goal of findingthe same documents in all three languages follow-ing automatic translation of the queries into Englishusing UMLS (with various strategies) as the centralterminology. Very good results have been obtainedfor Spanish with less favorable results for French,which he attributes to linguistic differences. In ourcase, we use a stemmer for French, synonyms,and the 2005 version of the MeSH, which under-goes annual refinements. In WRAPIN, four queriesare performed in parallel in order to query thedatabases in the other languages.

Fig. 6 shows part of a page of results for the query‘‘fever children aspirin’’. This query (translated)produced 84 documents in French, 46 in Spanish and7 in German. Fig. 7 presents the same page with theMeSH terms translated into French. Note that thereformulation is now offered in French, since thequery has been translated into French.

Prior to handling of the query, a language detec-tor and spelling corrector are called. The latteris valuable for non-professional users who mayapproach the medical domain in an approximatewccwomtrrbttclsM

ery ‘

Fig. 6 Results for the qu

ay (for a discussion of the importance of spellingorrection in information searching, see [45]). Theorrection tackles the form of a single suggestion,hile the query continues to execute. This stylef correction is also known by the name, ‘‘Did Youean’’, popularized by Google. In his evaluation of

he correction tools used by NLM, Crowell et al. [46]eports that the tool GNU Aspell [47] gives betteresults than the tool GNU Gspell. Our corrector isased on Aspell and a classic DTW (better known forext under the name, Levenstein edit distance [48])o select the most similar candidate, where medi-al terms are favored. Aspell has the advantage of aarge panel of dictionaries (52 languages in the ver-ion we used). Medical terms were added using theeSH in the different languages used in WRAPIN,

‘fever children aspirin’’.

82 A. Gaudinat et al.

Fig. 7 Multilingual results for the query ‘‘fever children aspirin’’.

since Aspell allows creation of user-defined dictio-naries. The language detector is a hybrid, combin-ing a lexical approach based on a list of stopwordsand the MeSH with an ngram approach (cf. [49]),also derived from the MeSH.

3. WRAPIN evaluation

The WRAPIN search engine in exploitation today isthe result of the prototype tested and improved bythe evaluation outcomes.

The evaluation presented here has been con-ducted on the prototype in order to assess therelevance of the WRAPIN concept and functional-ities. The evaluation has been conducted at theend of the project (March 2004). The intent wasto expose various citizens, patients and individ-uals’ and medical professionals, to the concreteuse of WRAPIN’s functionalities and to analyzetheir perception in terms of ergonomic and per-ceived quality and usability of the replies givenby WRAPIN for a set of health questions proposed.These health questions have been extracted fromFAQ (Frequently Asked Questions) found on gen-eral public medical portals and disease specific sites

show that future developments should embodythe capacity for WRAPIN to cover a larger numberof specialized bases (as well as larger bases) andsubjects. Better information about the capacitiesof WRAPIN is essential. Some functions not avail-able with any other search engine, such as theURL evaluation need to be clearly explained to theusers.

Concerning the man/machine interface, and itsergonomic in general, WRAPIN needs more improve-ments at three different levels: general layout ofthe pages, better management of colors, fonts,logos and pictograms, light on line help, and a littlewritten documentation.

With a view to exploitation, the audience ques-tion remains central: WRAPIN has to be adaptedto an audience of citizens, or to differentiatebetween a ‘technical WRAPIN’ turned towardshealth professionals, and a ‘patients and individ-uals WRAPIN’. In order to take into account thisresult WRAPIN has introduced the ‘‘stethoscope’’pictogram to designate those more technicaldatabases.

In the view of the testers, WRAPIN will over-perform other engines when the replies are perti-natosm

torioi

bat

dedicated to the citizens.

4. Results

The use of WRAPIN has, on the whole, beenperceived as informative 65% (59% for the otherengine), reliable and trustworthy 72% (41% fora general-purpose search engine) by users. Yetin a number of cases, but this was also true forthe general-purpose search engine, the repliesgiven by the system were irrelevant. The rate ofirrelevance/impertinence has been scientificallyreckoned, and finally reduced. The evaluationresults show that the upfront lexical analysis ofthe queries, as well as the ergonomic of the refor-mulation left room for improvement. The results

ent. By providing a synthetic and reliable reply toquery, or an assessment of a document further to

he submission of an URL, WRAPIN goes beyond thether engines, and brings a valuable tool to userseeking trustworthy medical and health care infor-ation.As regards content, the evaluation has shown

hat users would tend to trust WRAPIN more thanther engines, due to the fact that certified mate-ial is considered, against lower quality material,f not obnoxious material, possibly some times inther cases. Nevertheless, trustworthiness is frag-le and needs to be constantly reinforced.

In the case of WRAPIN, trustworthiness can beroken down into at least three main components:lgorithmic trustworthiness, data and informationrustworthiness and organisation trustworthiness.

Health search engine with e-document analysis for reliable search results 83

5. Discussion

WRAPIN offers in many different ways innovatingfunctionalities related to the search of medical andhealth online information. WRAPIN specialized onthe health domain and with its features and know-how tries to remedy to the gaps of the most popu-lar Web search tools. These general-purpose searchengines, not really adapted to the quality of infor-mation, benefit of great success because of thelarge coverage of indexing and the rapidity of theprocessing. If the algorithms such as PageRankingmake it possible to integrate in an inherent way aconcept of quality of information yet it makes itdifficult to quantify it. Studies made on this sub-ject are too approximate in order to reach any finalconclusion especially in the health domain whereskepticism is required.

The alternative suggested by WRAPIN is foundedon sources of reliable information and on function-alities which are based on detailed terminologi-cal resources specifically related to the medicaldomain (not easily transposable to other domainsbecause less studied). The reliable resources beingfewer, the major disadvantage of the approach iswoWqfq

aeWwtbbabp

rast

tiOindef

and not the date of update of the document. So,this functionality is useless. On the contrary ref-erence sites such as PubMed, logically favor thelatest articles for a given subject. In pursuing thesame goal, HONs News allows an intelligent combi-nation between the relevance of the query and itspublication date. Why would relevant news fromthe point of view of the subject be useful if theinformation is obsolete? Or what is the interest toobtain the latest news if it does not correspond toits query? With regard to WRAPIN, resources such asMEDLINE or News authorize a search according tothe date. Moreover, principle 4 of the HONcode—–information must be documented: referenced anddated—–intrinsically allows to guarantee that theupdated information is at least available on thepages of accredited sites. WRAPIN is therefore notthe ultimate solution to the existing problems ofthe health Web. But with its innovating and mul-tilingual functionalities and its reliable resources,it is certainly the most powerful alternative in thehealth domain when it comes to navigate with truston this extraordinary resource that is the WorldWide Web.

We wish to dedicate this article to the memoryoo

A

Ts‘(twp

R

ithout doubt the lack of coverage. But at a guessf the redundancy of information present on theeb and the increased number of pages of pooruality, let us guarantee, following the example ofoodstuffs, that the users prefer a better quality touantity.

The evaluation enabled us to check that ourpproach was useful and that it answers to thexpectation of most users. Since this evaluation,RAPIN has greatly evolved and has still got someays to go. For resources reasons, the algorithm of

he PageRanking type was not integrated in WRAPINut its adaptation should make it possible to obtainetter results. However, it is difficult to quantify indvance the interest of this method within WRAPINecause of the particularity of the sub-networksroduced by quality resources.

The integration of new sources of information iselatively easy. With regard to the Web, the cover-ge of the tools such as MedHunt and HONcodeHunthould be improved soon while respecting at anyime the quality of this information.

In addition to the ‘‘relevance’’ of information,he quality is indeed an important dimension thats often excluded from most Web search engines.ne of the important points concerning quality, also

gnored among Web tools, is certainly the fresh-ess of the information (i.e. update date of theocument). Even if the advanced options of searchngines propose to search for a period of time, theeature is related to the date of the indexing date

f the Professor Jean-Raoul Scherrer, the initiatorf this new generation of search engine.

cknowledgements

he WRAPIN project, IST-2001-33260, has beenupported by the European Commission and the‘Office Federal de l’Education et la Science’’OFES, Switzerland). The authors wish to thank allhe WRAPIN members as well as the WRAPIN testerith their precious contribution in this ambitiousroject.

eferences

[1] National Library of Medicine, Medical subject headings,http://www.nlm.nih.gov/mesh.

[2] J.L. Borges, The library of Babel, in: Labyrinths, SelectedStories, & Other Writings, New Directions Pub. Corp., NewYork, 1964.

[3] B.J. Jansen, A. Spink, How are we searching the World WideWeb? A comparison of nine search engine transaction logs,Information Processing and Management, 2004.

[4] Nielsen NetRatings Search Engine Ratings: http://searchenginewatch.com/reports/article.php/2156451(June 2005).

[5] S.J. Darmoni, J.P. Leroy, F. Baudic, M. Douyere, J. Piot, B.Thirion, CISMeF: a structured health resource guide, Meth-ods Inf. Med. 39 (1) (2000) 30.

[6] C. Boyer, V. Baujard, V. Griesser, J.R. Scherrer, Rela, LinksHONselect: a multilingual and intelligent search tool inte-

84 A. Gaudinat et al.

grating heterogeneous web resources, Int. J. Med. Inform.64 (2—3) (2001) 253—258.

[7] S. Brin, L. Page, The anatomy of a large-scale hypertextualweb search engine, vol.3, in: Proceedings of the SeventhInternational World Wide Web Conference, Brisbane, Aus-tralia, 1997, ACM Press.

[8] L. Page, S. Brin, R. Motwani, T. Winograd, The pager-ank citation ranking: bringing order to the web, Technicalreport, Stanford Digital Libraries, 1998.

[9] B. Amento, L. Terveen, W. Hill, in: Proceedings of theTwenty-Third Annual International ACM SIGIR Conference onResearch and Development in Information Retrieval, Does‘‘Authority’’ Mean Quality? Predicting Expert Quality Rat-ings of Web Documents, 2000.

[10] H. Borges, M. Cervi, P.T. Alvarez de Arcaya, G. Guardado, R.Rabaza, J. Sosa, Rate of compliance with the HON code ofconduct versus number of inbound links as quality mark-ers of pediatric web sites, in: Proceedings of the SixthWorld Congress on the Internet in Medicine, Udine, Italy, 29November—2 December 2001, http://mednet2001.drmm.uniud.it/proceedings/paper.php?id = 75.

[11] C. Boyer, M. Selby, R.D. Appel, The health on the netcode of conduct for medical and health web sites, 1997,MEDNET97—–European Congress on the Internet in Medicine,Brighton, UK, 3—6 November 1997.

[12] M.A. Mayer, A. Leis, R. Sarrias, P. Ruiz, Web Medica Acred-itada Guidelines: Reliability and Quality of Health Infor-mation on Spanish-language websites, vol. I, No. 1, in: R.Engelbrecht, et al. (Eds), Connecting Medical Informaticsand Bioinformatics, Proceedings of the 19th International

[23] Computation of related articles: http://www.ncbi.nlm.nih.gov/entrez/query/static/computation.html.

[24] A. Gaudinat, C. Boyer, Automatic extraction of MeSH termsfrom MEDLINEs abstracts, Workshop on Natural LanguageProcessing in Biomedical Applications, NLPBA, 2002, pp.53—57.

[25] A. Neveol, V. Mary, A. Gaudinat, C. Boyer, A. Rogozan,S. Darmoni, A benchmark evaluation of the French MeSHindexers, in: 10th Conference on Artificial Intelligence inMedicine (AIME 05), 23—27 July 2005, Aberdeen, Scotland.

[26] B.L. Humphreys, D.A.B. Lindberg, Building the Unified Med-ical Language System, in: Proceedings of the 13rd SCAMC,IEEE Computer Society Press, 1989, pp. 475—480.

[27] A. Burgun, O. Bodenreider, Methods for exploring thesemantics of the relationships between co-occurring UMLSconcepts, Proc. Medinfo. (2001) 171—175.

[28] M. Joubert, S. Aymard, D. Fieschi, F. Volot, et al., ARI-ANE: Integration of Information Databases within a HospitalIntranet, Int. J. Med. Inf. 49 (1998) 297—309.

[29] A.T. McCray, The UMLS Semantic Network, in: Proceedingsof the 13rd SCAMC, IEEE Computer Society Press, 1989, pp.503—507.

[30] A. Gaudinat, M. Joubert, S. Aymard, L. Falco, C. Boyer,M. Fieschi, WRAPIN: new generation health search engineusing UMLS knowledge sources for MeSH term extrac-tion from health documentation, Medinfo (2004) 356—360.

[31] G. Salton, M. McGill, The SMART Retrieval System—–Experiments in Automatic Document Retrieval, PrenticeHall, Englewood Cliff, 1971.

Congress of the European Federation for Medical Informat-ics, MIE 2005, Geneve, Switzerland, Munich-Heuherberg,Germany, 2005, pp. 1287—1292.

[13] G. D’Andrea, Health web site accreditation: opportunitiesand challenges, Manage. Care Q. 10 (1) (2002) 1—6.

[14] C. Boyer, A. Geissbuhler, A decade devoted to improvingonline health information quality, in: Proceedings of the19th International Congress of the European Federationfor Medical Informatics, MIE 2005, Geneve, Switzerland,Munich-Heuherberg, Germany, 2005.

[15] D. Fallis, M. Fricke, Indicators of accuracy of consumerhealth information on the Internet: a study of indicatorsrelating to information for managing fever in children inthe home, J. Am. Med. Inform. Assoc. 9 (1) (2002) 73—79.

[16] N. Miller, E.M. Lacroix, J.E. Backus, MEDLINEplus: buildingand maintaining the National Library of Medicine’s Con-sumer Health Web Service, Bull. Med. Libr. Assoc. 88 (1)(2000) 11—17.

[17] G. Eysenbach, G. Yihune, K. Lampe, P. Cross, D. Brickley, Ametadata vocabulary for self- and third-party labeling ofhealth web-sites: health information disclosure, descrip-tion and evaluation language (HIDDEL), Proc. AMIA Symp.(2001) 169—173.

[18] T.H. Haveliwala, Topic-sensitive PageRank: a context-sensitive ranking algorithm for web search, IEEE Trans.Knowl. Data Eng. 15 (4) (2003) 784—796.

[19] K.J. Sparck, Synonymy and Semantic Classification, Edin-burgh University Press, 1986.

[20] A.R. Aronson, T.C. Rindflesch, A.C. Browne, Exploiting alarge thesaurus for information retrieval, in: Proceedingsof RIAO, 1994.

[21] A.R. Aronson, O. Bodenreider, H.F. Chang, S.M. Humphrey,J.G. Mork, S.J. Nelson, T.C. Rindflesch, W.J. Wilbur, TheNLM indexing initiative, Proc. AMIA Symp. (2000) 17—21.

[22] A.R. Aronson, Effective mapping of biomedical text to theUMLS Metathesaurus: The MetaMap Program, Proc. AMIASymp. (2001) 17—21.

[32] Text REtrieval Conference (TREC): http://trec.nist.gov.[33] P. Ruch, R. Baud, C. Chichester, A. Geissbuhler, Latent argu-

mentative structuring for extraction of key sentences, vol.I, No. 1, in: Proceedings of the 19th International Congressof the European Federation for Medical Informatics, MIE2005, Geneve, Switzerland, Munich-Heuherberg, Germany,2005.

[34] I. Tbahriti, C. Chichester, F. Lisacek, P. Ruch, Usingargumentation to retrieve articles with similar citations:an inquiry into improving related articles search in theMEDLINE Digital Library, Int. J. Med. Inform., 2005, inpress.

[35] P. Ruch, R. Baud, J. Marty, A. Geissbuhler, I. Tbahriti, A.-L.Veuthey, Latent Argumentative Pruning for Compact MED-LINE Indexing, AIME (2005) 246—250.

[36] P. Domingos, M. Pazzani, On the optimality of the simpleBayesian classifier under zero-one loss, Machine Learn. 29(2—3) (1997) 103—130.

[37] Y. Yang, J. Pedersen, A comparative study on feature selec-tion in text categorization, in: Proceedings of the ICML, 14thInternational Conference on Machine Learning, 1997, pp.114—121.

[38] L. McKnight, P. Srinivasan, Categorization of sentence typesin medical abstracts, AMIA (2003).

[39] S. Teufel, M. Moens, Sentence extraction and rhetoricalclassification for flexible abstracts, AAAI Spring Sympo-sium on Intelligent Text Summarization, 1998, pp. 89—97.

[40] M. Hearst, User interfaces and visualization, in: ModernInformation Retrieval by R. Baeza-Yates and B. Ribeiro-Neto, Addison Wesley.1999.

[41] B.J. Jansen, A. Spink, How are we searching the WorldWide Web? A comparison of nine search engine transac-tion logs, Inf. Process. Manage. 42 (2006) 248—263, inpress.

[42] Vivisimo clustering engine, 2004, http://vivisimo.com,accessed Oct. 2005.

Health search engine with e-document analysis for reliable search results 85

[43] R.B. Allen, P. Obry, M. Littman, An interface for navigatingclustered document sets returned by queries, in: Proceed-ings of the ACM Conference on Organizational ComputingSystems, 1993, pp. 166—171.

[44] D. Eichmann, M. Ruiz, P. Srinivasan, Cross-Language Infor-mation Retrieval with the UMLS Metathesaurus. In Proceed-ings of the 21st Annual International ACM SIGIR Conferenceon Research and Development in Information Retrieval, Mel-bourne, Australia, 1998.

[45] P. Ruch, R. Baud, A. Geissbuhler, Evaluating and reducingthe effect of data corruption when applying bag of wordsapproaches to medical records, Int J Med Inform. 67 (1—3)(2002 Dec 4) 75—83.

[46] J. Crowell, Q. Zeng, L. Ngo, E.M. Lacroix, A frequency-based technique to improve the spelling suggestionrank in medical queries, J. Am. Med. Inform. Assoc.(2004).

[47] K. Atkinson, GNU ASpell, version 0.50.3, Produced bySourceForge.Net, available at: http://aspell.sourceforge.net/, accessed on: February 2004.

[48] A. Levenstein, Binary codes capable of correcting deletions,insertions and reversals, Soviet Phys., Doklandy 10 (1966).

[49] W.B. Cavnar, J.M. Trenkle, N-gram-based text cate-gorization, in: Proceedings of SDAIR-94, Third AnnualSymposium on Document Analysis and InformationRetrieval.