Semantic representation of consumer questions and physician answers



International Journal of Medical Informatics (2006) 75, 513—529

Semantic representation of consumer questions and physician answers

Laura A. Slaughter a,∗, Dagobert Soergel b, Thomas C. Rindflesch c

a Center for Shared Decision Making and Nursing Research, Rikshospitalet-Radiumhospitalet RH, Forskningsveien 2b, 4. etg., 0027 Oslo, Norway
b College of Information Studies, University of Maryland, College Park, MD, USA
c National Library of Medicine, Bethesda, MD, USA

Received 29 October 2004; received in revised form 29 June 2005; accepted 17 July 2005

KEYWORDS
Semantic processing;
Public health;
Unified Medical Language System;
Information retrieval;
Natural language processing

Summary The aim of this study was to identify the underlying semantics of health consumers’ questions and physicians’ answers in order to analyze the semantic patterns within these texts. We manually identified semantic relationships within question—answer pairs from Ask-the-Doctor Web sites. Identification of the semantic relationship instances within the texts was based on the relationship classes and structure of the Unified Medical Language System (UMLS) Semantic Network. We calculated the frequency of occurrence of each semantic relationship class, and conceptual graphs were generated, joining concepts together through the semantic relationships identified. We then analyzed whether representations of physicians’ answers exactly matched the form of the question representations. Lastly, we examined characteristics of the answer conceptual graphs. We identified 97 semantic relationship instances in the questions and 334 instances in the answers. The most frequently identified semantic relationship in both questions and answers was brings about (causal). We found that the semantic relationship propositions identified in answers that most frequently contain a concept also expressed in the question were: brings about, isa, co-occurs with, diagnoses, and treats. Using extracted semantic relationships from real-life questions and answers can produce a valuable analysis of the characteristics of these texts. This can lead to clues for creating semantic-based retrieval techniques that guide users to further information. For example, we determined that both consumers and physicians often express causative relationships, and these play a key role in leading to further related concepts.
© 2005 Elsevier Ireland Ltd. All rights reserved.

∗ Corresponding author. Tel.: +47 23 07 54 63; fax: +47 23 07 50 10.

E-mail address: [email protected] (L.A. Slaughter).

1. Introduction

Recent research in medical information processing has focused on health care consumers. These users

1386-5056/$ — see front matter © 2005 Elsevier Ireland Ltd. All rights reserved. doi:10.1016/j.ijmedinf.2005.07.025

514 L.A. Slaughter et al.

often experience frustration while seeking online information [1—3], due to their lack of understanding of medical concepts and unfamiliarity with effective search strategies. We are exploring the use of semantic relationships as a way of addressing these issues. Semantic information can guide the lay health consumer by suggesting concepts not overtly expressed in an initial query. For example, imagine that a user submits a full question to a search system in the health care domain to find out whether exercise helps prevent osteoporosis. The semantic relationship prevents in the proposition representing the question, namely ‘‘exercise prevents osteoporosis’’, can support this effort; prevents might be used with osteoporosis to determine additional ways of avoiding this disorder.
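The query-expansion idea described above can be sketched as a small proposition store. This is an illustrative sketch only; the triples and the `suggest` helper are our own invention, not part of the UMLS or of any deployed system.

```python
# Toy store of (subject, relation, object) propositions. The triples are
# invented examples, not actual UMLS content.
TRIPLES = [
    ("exercise", "prevents", "osteoporosis"),
    ("calcium intake", "prevents", "osteoporosis"),
    ("estrogen therapy", "treats", "osteoporosis"),
]

def suggest(concept, relation):
    """Return every subject linked to `concept` by `relation`."""
    return [subj for subj, rel, obj in TRIPLES
            if rel == relation and obj == concept]

# A question coded as ''exercise prevents osteoporosis'' can be expanded
# to other ways of avoiding the disorder:
print(suggest("osteoporosis", "prevents"))
# ['exercise', 'calcium intake']
```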

We present an analysis of semantic relationships that were manually extracted from questions asked by health consumers as well as answers provided by physicians. Our work concentrates on samples from Ask-the-Doctor Web sites. The Semantic Network from the Unified Medical Language System (UMLS) [4,5] served as a source for semantic relationship types, and this inventory was modified as we gained experience with relationship types identified in the health consumer texts.

The objective of this study was to characterize the semantics of consumer health texts in order to develop better search functions within systems that provide health information to consumers. Our first task towards accomplishing this was to identify propositions within these texts and calculate the frequency of occurrence of semantic relationships. For example, we record the proposition ‘‘X <follows> bypass surgery’’ from the question, ‘‘What happens after (follows) bypass surgery?’’ Our second task was to identify the ways that questions are connected to answers, since this provides a useful start for constructing query strategies involving semantic information. For instance, we record when the physician answers the question directly with ‘‘loss of appetite follows bypass surgery.’’ We also look for cases in which the physician’s answer contains an implied relationship, such as the isa relation between ‘‘bypass surgery’’ in the above question and ‘‘operation’’ in the answer, ‘‘If you’re undergoing a scheduled operation for debilitating angina. . .’’. Our final task was to study the patterns of semantic relationships within answers. We record when the physician expands on the concept ‘‘debilitating angina’’, for instance, with an expression such as ‘‘angioplasty treats debilitating angina.’’

2. Background

2.1. Semantic relationships

A semantic relationship associates two (or more) concepts expressed in text and conveys a meaning connecting those concepts. A large variety of such relationships have been identified in several disciplines, including linguistics, philosophy, computer science, and information science. Some researchers have organized hierarchies of semantic relationships into meaningful but not formal structures [6,7]. Others examine specific relationships in depth, for instance, subsumption [8,9], temporality [10], and meronymy [11]. In addition, ontologies (general purpose and medical) contain semantic relationships that are elements of the overall system. WordNet, for example, contains these primary relationships between concepts: hypernymy (superordination), antonymy, entailment (something inferred), and meronymy (part—whole) [12].

A number of projects have involved the study of semantic relationships specifically within the domain of medicine. Work on the GALEN Common Reference Model examined part—whole relationships [13] and other aspects of ‘‘tangled’’ taxonomies [14]. Other ontology projects, such as the foundational model of anatomy (FMA), are central to the delineation of relationships for use in specific types of applications, in this case representation of anatomical structures [15]. Smith and Rosse [16], for example, address how the formal treatment of taxonomy and partonomy in the FMA can support alignment of ontologies.

The UMLS Semantic Network [5] contains 54 semantic relationships, which serve as the basis of our analysis of health consumer texts. This system was chosen as a starting point for our study because it contains a comprehensive hierarchy of relationships linking to a broad range of medical concepts. Each relationship type in the Semantic Network has a definition and is also connected to two semantic types to form a binary proposition (this provides context about what the relationship might mean as well).

2.2. Semantic relationships in information systems

Several researchers have proposed the integration of semantic relations in information retrieval systems as a way of addressing inadequate domain knowledge in users. For example, Chakravarthy and Haase [17] describe an application that explicitly shows the semantic relationships within an


expressive query language used for retrieval. The HIBROWSE [18] project illustrated the idea of semantic relationships in ‘‘view-based searching’’ using ‘‘knowledge structure hierarchies.’’ More recently, Khoo et al. [19,20] explored techniques for using causal relationships for retrieval and proposed a way that causal semantic relationships extracted from Web documents can be ‘‘chained or connected to give a conceptual map of the information available on a particular topic.’’ Miles-Board et al. [21] illustrated how ‘‘ontological hypertext’’ can be used to provide ‘‘principled and intelligent navigation of knowledge’’ through exposing underlying domain knowledge to users.

Several systems within the medical domain have employed the use of semantic relationships in order to aid health-care professionals’ search. Mendonca et al. [22,23] used conceptual map representations (terms and their relationships) from patients’ records to add contextual information to searches of medical literature for clinicians. Brandt et al. [24] built a retrieval system for an anesthesiology information source that displayed semantic relationships within the retrieved results that linked to the searched keyword. The TAMBIS/STARCH projects [25] have demonstrated the use of ‘‘ontology-driven’’ interfaces for query formulation assistance when searching biological information sources. Detwiler et al. [15] created a query engine for the foundational model of anatomy (FMA) that enables users to explore structural relationships. The major goal of the FMA query engine is to support educational computer-based anatomy programs.

Although semantic information used in information retrieval systems can improve performance, there is no direct evidence that explicit semantic representations help lay users understand medical texts and formulate effective questions about health concerns. However, research in text comprehension and cognitive models of health consumers suggests that such representations may provide an effective method for lessening user frustration.

Zeng et al. [1] state that ‘‘consumers employ a different mental model from clinicians.’’ When learners with low domain knowledge read texts, the solution to improve coherence is to ‘‘make the relations between items in the text or between general knowledge and the text fully explicit’’ [26]. Jonassen et al. [27] suggest that technology-based external representation tools (such as Semantic Networks) can have a positive effect on students’ problem-solving performance. In this study we provide an analysis of some of the semantic representations underlying dialogue between lay health consumers and professional information providers.

We see our work as underpinning the construction of tools that exploit semantic relationships to help lay users navigate medical knowledge.

3. Methods

3.1. Overview

Our goal was to characterize the semantics of consumers’ health questions and physicians’ answers found on Ask-the-Doctor Web sites. We had three main research questions: (1) what semantic relationships exist in health consumers’ questions and in the answers provided by physicians, (2) how are concepts in the answers connected to those in the questions, and (3) how are the propositions within answers interconnected to form an overall semantic structure? Our methods are described below in three sections that correspond to the research questions: identifying propositions, identifying the ways that questions are connected to answers, and identifying patterns of semantic relationships in answers.

3.2. Materials: characteristics of question and answer texts

We analyzed 12 question—answer pairs from seven Ask-the-Doctor Web sites, such as Ask the Diabetes Team. All selected sites are designated trustworthy using American Medical Association guidelines [28]. In order to avoid institutional bias and to maximize the types of semantic relationships identified, we selected sites that cover a variety of medical topics. An example of a question and answer is given in Fig. 1. The average length of each question—answer pair in our sample was 275 words. The average question message sent to a physician contains four sentences: three assertions providing background and one actual question sentence. Each individual question sentence was labelled as a ‘‘sub-question.’’ For example, in Fig. 1, the question message contains only one sub-question. The maximum number of sub-questions asked in a message was three. There were a total of 20 sub-question sentences in the 12 question messages sampled. The average number of words per answer was 199 and the average number of sentences per answer was 10.

3.3. Identifying propositions: the coding process

We completed several iterations of manual coding before arriving at the final set of propositions from


Fig. 1 An example of a question—answer pair from an Ask-the-Doctor Web site. The question message contains one sub-question and five background statements.

the question—answer texts. The first author was the main coder throughout the process of iterative coding. The second and third authors were often consulted while coding instances of semantic relationships.

As the first step in the process, we created an initial conceptual graph [29] to help sort out areas of inconsistent coding and uncertainty in class assignment. Using the graphs as a guide, the relationships between the concepts were reviewed. At this stage, the classes of relationships found in the UMLS Semantic Network were revised in order to capture the perceived meaning underlying the textual data. When we observed instances that did not quite fit, we consulted other sources and adapted the inventory to the relationship classes identified in the health consumer texts. The final list of semantic relationship classes is shown in Table 1.
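The conceptual-graph step can be sketched with a dictionary keyed by concept. This is a hedged illustration; the propositions below are invented examples, not the study's coded data.

```python
from collections import defaultdict

# Join binary propositions into a graph: each concept maps to its
# outgoing (relation, concept) edges. Propositions are invented examples.
propositions = [
    ("bypass surgery", "treats", "angina"),
    ("angioplasty", "treats", "angina"),
    ("vessel constriction", "brings about", "TIA"),
]

def build_graph(props):
    graph = defaultdict(list)
    for subj, rel, obj in props:
        graph[subj].append((rel, obj))
    return graph

g = build_graph(propositions)
print(g["bypass surgery"])
# [('treats', 'angina')]
```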

The propositions were then represented as frame structures. Each class of relationship became a frame with multiple possible slots (now n-ary rather than binary relationships). Slot names for the frames were constructed, reviewed, and revised during coding iterations. In the final iteration, the propositions were checked for errors. We also regenerated the conceptual graphs, based on the frame instances, for further analysis of the characteristics of the inter-connections between the many concepts in each text.

The first author was the primary coder of semantic relationships within the question and answer texts for this project. To assess coding consistency, the first author recoded three of the questions and answers several months after the last iteration was completed, without re-examining the previously identified semantic relationships. Intra-coder reliability was calculated using percent agreement.

Two additional coders, a physician and a medical librarian, coded one question—answer pair. The main purpose of this was to assess differences between coding styles. Agreement is difficult to achieve between coders because at least three pieces must match: the relationship identified must be the same, and the concepts selected as arguments must also be the same. We obtained mean percent agreement for the inter-rater coding. The main use of this data was, however, to refine our coding methods.

3.3.1. Guidelines for coding propositions
Selecting concepts from the texts requires some amount of subjective speculation on the part of


Table 1 Frequencies of semantic relationship types found in the questions (Qs) and answers (As)

Relationship                          Q (%)    A (%)
0 associated with                       —        —
1 topologically related to              —        —
1.1 part of                             —        —
1.1.1 consists of                       —        1.2
1.1.2 contained in                      —        3.9
1.1.3 ingredient of                     —        1.8
1.1.4 component of                      —        0.3
1.2 connected to                        —        —
1.2.1 branch of                         —        —
1.2.2 interconnected with               —        —
1.2.3 tributary of                      —        —
1.3 location of                         4.1      3.3
1.3.1 adjacent to                       1.0      0.3
1.3.2 surrounds                         —        0.9
1.3.3 traverses                         —        0.3
2 functionally related to               —        —
2.1 affects                             —        0.6
2.1.1 absorbs                           —        0.6
2.1.2 delays (also 3.5)                 1.0      0.6
2.1.3 complicates                       —        0.3
2.1.4 disrupts                          —        0.3
2.1.5 facilitates                       —        0.3
2.1.6 increases                         —        1.2
2.1.7 decreases                         —        1.5
2.1.8 interacts with                    1.0      3.6
2.1.9 manages                           —        —
2.1.10 prevents                         5.2      2.7
2.1.11 treats                           11.3     9.0
2.2 brings about                        27.8     15.6
2.3 performs                            1.0      0.3
2.3.1 carries out                       —        —
2.3.2 exhibits                          —        0.3
2.3.3 practices                         —        —
2.4 occurs in                           4.1      1.2
2.5 process of                          1.0      0.6
2.6 uses                                —        —
3 temporally related to                 —        —
3.1 co-occurs with                      6.2      2.1
3.2 precedes                            6.2      3.0
3.3 age of                              2.1      —
3.4 cyclic frequency of                 —        0.6
3.5 delays (also 2.1.2)                 1.0      0.6
3.6 duration of                         6.2      0.9
3.7 time position of                    2.1      1.5
4 conceptually related to               —        —
4.1 analyzes                            —        0.9
4.1.1 assesses effect of                —        2.4
4.1.2 diagnoses                         4.1      6.0
4.1.3 measures                          —        0.9
4.1.4 evaluation of                     —        —
4.1.5 degree of                         1.0      —
4.1.6 measurement of                    —        —
4.1.7 compared to                       2.1      5.4
4.2 property of                         2.1      4.2
4.3 requires                            1.0      1.8
4.4 derivative of                       —        —
4.5 developmental form of               —        —
4.6 method of                           —        —
4.7 issue in                            —        0.6
5 isa                                   1.0      6.3

Additional relations
definition of                           —        5.1
has family relationship                 3.1      0.9
same concept (syn term)                 —        2.4
relation x (unknown relationship)       4.1      —

For questions (Q) N = 97 and for answers (A) N = 334. The relationships occurring with a frequency greater than 5% are shown in bold italic type.

the coder. Within the data, the string chosen to participate in a relationship is usually the same as what has been specified in the text (or a normalized string). In general, complex concepts were avoided, especially when they consisted of full phrases that included verbs and prepositions. Noun compounds, such as ‘‘saliva protein,’’ were not broken down if they formed one distinct concept.

We recorded an anaphor by replacing it with its referent and writing the anaphor in brackets after it. For example, the concept ‘‘flu’’ in the second sentence of ‘‘I thought that I had the flu. Later, I found out it was bronchitis’’ appears in the data as ‘‘flu [it].’’
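The bracket convention can be captured in a one-line helper (our own sketch; deciding which referent an anaphor actually points to is a much harder problem and was done manually in this study):

```python
def record_anaphor(referent, anaphor):
    """Replace an anaphor by its referent, keeping the anaphor in brackets."""
    return f"{referent} [{anaphor}]"

# ''it'' in ''Later, I found out it was bronchitis'' refers to ''flu'':
print(record_anaphor("flu", "it"))
# flu [it]
```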

The UMLS Semantic Network relationship classes are hierarchical, and we coded for the most specific


relationship in the hierarchy whenever possible. Therefore, many of the general categories of relationship types, such as temporally related to, were never coded within the texts, since we always made an attempt to identify the most specific relationship possible.
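One way to operationalize the most-specific rule is to compare the depth of the numbered codes in Table 1: deeper codes (more dots) are more specific. A sketch under that assumption, with only a few codes shown:

```python
# Partial mapping from relationship name to its Table 1 code (illustrative).
CODES = {
    "functionally related to": "2",
    "brings about": "2.2",
    "treats": "2.1.11",
}

def most_specific(candidates):
    """Pick the candidate whose hierarchy code is deepest."""
    return max(candidates, key=lambda rel: CODES[rel].count("."))

print(most_specific(["functionally related to", "brings about"]))
# brings about
```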

Inferred relationships recover from the text something that was not overtly asserted. Some of the identified relationships were inferred within the question texts or within the answer texts. For example, ‘‘He broke his arm and went to the hospital’’ does not explicitly state the relationship that going to the hospital was the result of breaking an arm.

3.3.2. Illustration of coding process of semantic relationships in health consumer texts
In this section, we illustrate the process of coding using one sentence taken from a question statement:

‘‘I have had migraines frequently for the last twenty years and during the last ten years I have had two TIA’s (Transient Ischemic Attack).’’

The concepts identified in this sentence are ‘‘migraines’’ and ‘‘TIA’s (transient ischemic attack).’’ They are discussed with temporal concepts: ‘‘frequently,’’ ‘‘last twenty years,’’ and ‘‘last ten years.’’ ‘‘Migraines occurring in the time period of last 20 years’’ and ‘‘TIAs occurring in the last 10 years’’ might be represented using the UMLS relation occurs in.

The occurs in (inverse: has occurrence) relation of the UMLS Semantic Network incorporates aspects of time within its definition: ‘‘Takes place in or happens under given conditions, circumstances, or time periods, or in a given location or population. This includes appears in, transpires, comes about, is present in, and exists in.’’ However, the main difficulty is that the occurs in relation appears subsumed under functionally related to, while other time relationships are placed under temporally related to.

Although we did not code the semantic types of the concepts identified, we did look at the semantic types paired with each UMLS semantic relationship as a method of improving our understanding of the definitions provided by the Semantic Network. A brief review of the semantic types in the Semantic Network that can be related through occurs in illustrates that the relation is associated with populations (semantic type ‘group’) and other entities and processes (e.g. semantic types ‘organism function’, ‘disease or syndrome’, and ‘injury or poisoning’) in addition to time (semantic type ‘temporal concept’). However, as seen in this example, the semantic type of migraines and TIAs, ‘disease or syndrome’, does not connect to semantic type ‘temporal concept’ using the occurs in relationship. We then reviewed some of the other coded instances from our question—answer texts to find that occurs in was mainly used to represent ‘‘occurring in a population,’’ subsumed under the relation functionally related to.

To untangle the multiple uses of occurs in, temporal relations were required to be subsumed under temporally related to, and we therefore modified the definition of occurs in to: ‘‘Takes place in or happens in a given population. This includes appears in, comes about in, is present in, and exists in a population.’’ We created a new relationship, subsumed under temporally related to, called duration of (inverse: has duration) with the definition ‘‘Related to the length of time an activity continues.’’

We then identified two instances of the relationship duration of that served the purpose of relating the concepts ‘‘migraine’’ and ‘‘TIA (transient ischemic attack)’’ to the time periods in which they were experienced. The frame slots PhenomenonInTime, FrequencyWithinDuration, and OverallDuration were added to hold the values identified in the text:

‘‘have had migraines frequently for the last twenty years’’

duration of-1
PhenomenonInTime [migraines]
FrequencyWithinDuration [frequently]
OverallDuration [last twenty years]

‘‘during the last ten years I have had two TIA’s (Transient Ischemic Attack)’’

duration of-2
PhenomenonInTime [TIA’s (Transient Ischemic Attack)]
FrequencyWithinDuration [two times]
OverallDuration [last ten years]

3.3.3. Calculating the frequency of semantic relationships
We computed frequencies for each semantic relationship type in order to assess how often each semantic relationship type appears in consumer health question and answer texts. We computed these separately for propositions coded in the questions and propositions coded within the answer messages.
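The frequency computation amounts to counting relationship types and normalizing by the number of propositions, separately for questions and answers. A minimal sketch with invented propositions:

```python
from collections import Counter

# Invented question propositions; the study's own data is not reproduced here.
question_props = [
    ("exercise", "prevents", "osteoporosis"),
    ("X", "follows", "bypass surgery"),
    ("stress", "brings about", "migraine"),
]

def relation_frequencies(props):
    """Percentage of propositions using each relationship type."""
    counts = Counter(rel for _, rel, _ in props)
    total = sum(counts.values())
    return {rel: round(100 * n / total, 1) for rel, n in counts.items()}

print(relation_frequencies(question_props))
# {'prevents': 33.3, 'follows': 33.3, 'brings about': 33.3}
```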


Fig. 2 An example of coded semantic relationships in question and answer texts.

3.4. Identifying the ways that questions and answers are connected

The 12 question—answer pairs we analyzed contained 20 sub-questions. Each of these sub-questions has at least one semantic relationship instance that represents the question itself, that is, what information is lacking but desired by the questioner. In the semantic relationship frames, the unknown could be one or more of the following three: an unknown concept in a slot, an unknown relationship between two or more concepts, or a request for verification (V) of the truth of the statement.
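The three kinds of unknowns can be made concrete as fields of a question frame. The field names below are our own illustration, not the study's actual slot inventory:

```python
# An unknown slot concept is marked "?"; a verification request sets
# verify=True on an otherwise complete frame.
follows_question = {
    "relation": "follows",
    "slots": {"Earlier": "bypass surgery", "Later": "?"},
    "verify": False,
}

verification_question = {
    "relation": "prevents",
    "slots": {"Agent": "exercise", "Disorder": "osteoporosis"},
    "verify": True,  # ''does exercise prevent osteoporosis?''
}

def unknowns(frame):
    """List what the questioner is asking for in this frame."""
    missing = [slot for slot, value in frame["slots"].items() if value == "?"]
    return missing or (["verification"] if frame["verify"] else [])

print(unknowns(follows_question))        # ['Later']
print(unknowns(verification_question))   # ['verification']
```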

We assessed how the sub-questions are connected to answers through their semantic relationships. Fig. 2 illustrates how the frames were connected for the analysis. In this example, we chose two frame instances, duration of and brings about, from a question statement and two frame instances, precedes and brings about, from its answer. The slot value of PhenomenonInTime ‘‘transient ischemic attack’’ in duration of contains the same concept as the Effect slot ‘‘transient ischemic attack’’ of brings about from the answer. We refer to these as co-referenced concepts. This forms a link between the question and the answer. In addition to these question-to-answer connections, an implied link from question to answer was identified using subjective assessment of the unstated relationships between question concepts and answer concepts. ‘‘Hemiplegic migraine isa migraine’’ is an example of an implied relationship that was not explicitly stated in the text itself. Co-referenced concepts within the answers, for example ‘‘vessel constriction,’’ are also of concern for our analysis; these link semantic relationship instances to form large graph structures representing answer texts.

3.4.1. Analysis of the question to answer link
We first computed the frequency of semantic relationship types containing co-referenced concepts. This means that we calculated the frequency with which each semantic relationship type occurred for the subset of question messages’ propositions that contain concepts physicians repeated in the answers. We also did the same for the subset of answer propositions containing concept arguments from the questions. From this analysis, we were able to tell what semantic relationship types are most often used in leading the questioner


to concepts that might be new to them. For example, when a question states ‘‘Wolfram’s syndrome is diagnosed by genetic tests . . .’’ and the answer discusses ‘‘these genetic tests (the co-referenced concept) analyzes nuclear and microsomal disorders,’’ the questioner did not express ‘‘nuclear disorders and microsomal disorders,’’ so these might be unknown concepts to that person.
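Detecting these question-to-answer links can be sketched as a search for answer propositions that re-use a question concept. The data here is an invented fragment echoing the Wolfram's syndrome example above:

```python
question_props = [("Wolfram's syndrome", "diagnosed by", "genetic tests")]
answer_props = [
    ("genetic tests", "analyzes", "nuclear and microsomal disorders"),
    ("insulin", "treats", "diabetes"),
]

def co_referenced(q_props, a_props):
    """Answer propositions sharing at least one concept with the question."""
    q_concepts = {c for subj, _, obj in q_props for c in (subj, obj)}
    return [p for p in a_props if p[0] in q_concepts or p[2] in q_concepts]

print(co_referenced(question_props, answer_props))
# [('genetic tests', 'analyzes', 'nuclear and microsomal disorders')]
```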

We then calculated the frequency with which implied relationships occurred between the questions and the answers. These are the relationships between a question concept and an answer concept that are not explicitly written; they require external knowledge to understand. For example, ‘‘hemiplegic migraine isa migraine’’ was not explicitly stated by the physician when he wrote, ‘‘The headache that accompanies migraine is due to dilation of superficial vessels. This is often preceded by vessel constriction and this narrowing, if severe, can cause TIAs and this is known as hemiplegic migraine.’’ The questioner had only asked about migraines in the question ‘‘I have had migraines frequently for the past twenty years. . .’’ and therefore the reader must deduce that the physician is discussing a more specific type of migraine, called hemiplegic migraine.

3.5. Patterns of semantic relationships within answers

We calculated the frequency of what we term ‘‘direct answers’’ within the set of semantic relationships in the answers. A direct answer is provided by physicians when they have repeated the semantic relationship of the questioner in a completed form: they fill in the missing concept or relationship, or answer ‘‘yes, true’’ or ‘‘no, false’’ to verifications requested. The only other method for receiving an answer to questions is through inferences to further concepts using the semantic relationship connections in the answers.

Our next step was to examine the graphs of these intra-answer connections. The main reason for creating graphs of the propositions was to examine the discourse structure of physicians’ answers. Fig. 3 illustrates how several sentences from an answer form a sub-graph (extracted from a larger complete answer graph). Due to the small sample size in this study, it is not possible to do calculations on the characteristics and patterns within the graphs. However, we were able to make several observations concerning the answers by looking at graph size and the patterns that formed the discourse structure.

4. Results

The results are organized according to the three main research questions. In the first section, we discuss the types of semantic relationships expressed in the question—answer texts and their frequency of occurrence. We then give the results concerning how answer concepts are connected to the questions asked by health consumers. The last section covers the discourse structure of the propositions within answers. These results, taken together, characterize the semantic structure of consumer health questions and physicians’ answers.

4.1. Results of coding semantic relationships

4.1.1. New relationships added to the UMLS SN
The changes to the Semantic Network included revisions to the definitions and the hierarchical structure as well as the addition of new semantic relationships. The final set of semantic relationships is listed in Table 1. This section briefly reviews the modifications made to the SN inventory of relationships. Each change was essential to making the inventory functional; however, future work is required to reach the ideal. A more detailed review of the changes made to the SN during coding can be found in [30].

The SN relationships subsumed under physically related to and spatially related to were moved under a new heading, topologically related to, with three main subordinate relationships: part of, connected to, and location of. The relationships subsumed under the Semantic Network affects relationship were not modified substantially; we added facilitates, increases, decreases, and absorbs as children of affects. There are four causal relationships in the UMLS Semantic Network: brings about, causes, produces, and results in. Because these relationships are similar, causes, produces, and results in were removed, and all the causal relationships were coded as brings about. The UMLS definition for occurs in was modified to remove references to time and location. Additional time relationships were added under temporally related to: age of, delays, duration of, time position of, and cyclic frequency of.

A significant number of relationships under the parent conceptually related to in the Semantic Network addressed evaluation and measurement. In coding the texts, it became apparent that comparison relationships were necessary. Compared to was introduced and placed under analyzes.

Semantic representation of consumer questions and physician answers 521

Fig. 3 An example of a segment of answer text represented as a conceptual graph. The bolded relationship, treats, was a direct answer to the question statement "what treats Parkinsonian symptoms?".

4.1.2. Coding agreement
In our assessment of stability of coding, intra-rater reliability percent agreement was 88%. For the inter-rater coding, Coder 1 identified a total of eight relationships in the question text and 38 in the answer texts. Coder 2 identified a total of 12 relationships in the question text and 119 in the answer texts. Of these, 24 instances matched between the two coders, with both the semantic relationship and the arguments identified being exactly the same. The mean percentage of exact agreement between the two coders was 35%. Differences between the relationships identified, such as the use of frequency of and property of, led to re-examination of those relationships (including definitions and often structural arrangement) within the inventory we used for coding.
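The 35% figure follows from the counts just given: the 24 exact matches are taken as a fraction of each coder's total (46 and 131 relationship instances, respectively) and the two fractions are averaged. A sketch of that arithmetic (our reconstruction of how the mean was computed, not code from the study):

```python
def mean_exact_agreement(n_matches, coder_totals):
    """Mean, over coders, of the share of a coder's identified
    relationship instances that matched the other coder exactly."""
    return sum(n_matches / total for total in coder_totals) / len(coder_totals)

# Coder 1: 8 question + 38 answer relationships; Coder 2: 12 + 119.
agreement = mean_exact_agreement(24, [8 + 38, 12 + 119])
print(f"{agreement:.0%}")  # → 35%
```
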

We show the results of the coding performed by our two experts in Table 2. The table illustrates the differences between coders' coding styles and selection of semantic relationships. From the text "Most parents are quite familiar with middle ear infections, which occur behind the eardrum", Coder 2 and the authors both identified middle ear infections having the location "behind the eardrum." The difficult part of the sentence was to represent "Most parents are quite familiar with middle ear infections." Coder 1 chose to make "familiarity" or "commonality" a property of "middle ear infection", while Coder 2 inserted a new relationship into the coding scheme, frequency of. The authors felt that the relationship expressed, familiarity, is a cognitive state relationship (such as to know and to think) not represented in the UMLS.

4.1.3. Semantic relationship frequencies
In total, we identified 97 relationship instances in the questions and 334 instances in the answers. The frequency of relationship occurrence is shown

522 L.A. Slaughter et al.

Table 2 Semantic relationship coding: inter-rater comparison

Semantic relationships identified:
- location of (Coder 2 and the authors): Object = middle ear infections; LocationSite = behind eardrum
- property of (Coder 1): Phenomenon = middle ear infection; Property value = common
- frequency of (Coder 2): Event = middle ear infection; Frequency = common
- has family relationship: FamilyMemberA = child; FamilyMemberB = parent

Source text: "There are two main types of ear infection: otitis media (infection of the middle ear) and otitis externa (infection of the outer ear). Most parents are quite familiar with middle ear infections, which occur behind the eardrum. The patient, usually a young child, is feverish, irritable and has ear pain. Children too young to speak often simply tug at their ears to indicate their discomfort. Diagnosis, which involves viewing the ear canal and eardrum with an instrument called an otoscope, is quite simple. Treatment frequently involves a course of oral antibiotics." Note: The semantic relationship instances were identified within the italicized sentence ("Most parents are quite familiar with middle ear infections, which occur behind the eardrum").

in Table 1. The relationships most often occurring in the questions were: prevents, treats, brings about, co occurs with, precedes, and duration of. The most frequently identified relationships in the answers were: treats, brings about, diagnoses, compared to, and isa. An additional relationship, definition of, was created in order to track segments of text that provide definitions of health concepts, and these occurred with high frequency in the answers. We used relationship x when the relationship asserted between two (or more) concepts was the focus of the question, as in "Do you know of any differences between Prilosec and Prevacid?" Four percent of all the semantic relationship instances identified in questions were relationship x.
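The frequency tabulation underlying Table 1 is a straightforward count of relationship labels over the coded instances. A sketch (the instance list is invented for illustration; only the relationship labels matter for the count):

```python
from collections import Counter

# Hypothetical coded instances: (concept, relationship, concept)
instances = [
    ("levodopa", "treats", "Parkinson symptoms"),
    ("smoking", "brings about", "lung cancer"),
    ("stress", "brings about", "headache"),
    ("vaccine", "prevents", "influenza"),
]
freq = Counter(rel for _subj, rel, _obj in instances)
print(freq.most_common())  # 'brings about' ranks first, with count 2
```
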

4.2. Question-to-answer co-referenced patterns

A total of 497 co-references occurred between the 97 question propositions and the 334 answer propositions identified. Question relationships that most frequently contained co-referenced arguments with answer relationship instances were: brings about (127/497, 25.6%), duration of (45/497, 9.1%), co occurs with (44/497, 8.9%), relationship x (42/497, 8.5%), and treats (41/497, 8.2%). Answer relationships that most frequently contained arguments that co-referenced concepts in the question relationship instances were: brings about (121/497, 24.3%), isa (62/497, 12.5%), co occurs with (48/497, 9.7%), diagnoses (42/497, 8.5%), and treats (36/497, 7.2%).

In 70% (350/497) of the co-references above, the answer relationship linked a question concept with a concept not stated in the sub-question. We calculated that 21% (74/350) of these were not stated in any part of the question text and are potentially new concepts to the questioner. Some example concepts introduced by the physicians are listed in the last column of Table 3.

The idea of an "implied" relationship is that concepts in questions and answers are linked together through external knowledge, meaning that a relationship was never stated in the text. The search for "implied" relationships was very subjective; however, it provided an account of some of the implicit relationships existing between question and answer concepts. We identified 78 "implied" semantic relationships between the questions and answers; these are shown in Table 4. The most prevalent "implied" relationship identified was isa. Most often (70%), a broader concept is stated in the question and a more specific concept is presented in the answer; in the other 30%, the opposite occurred.

4.3. Related concepts in answers

4.3.1. How answers relate to questions — direct answers
At least one direct answer was provided to 6 of the 20 sub-questions (6 out of the 12 question—answer messages had a direct answer). For the 14 sub-questions that do not have at least one direct answer, the questioner receives an answer either


Table 3 Examples of linked question-to-answer propositions leading to concepts that are potentially new to the questioner

Example | Concept in question proposition | Question relationship | Co-referenced concept | Answer relationship | Related concept in answer proposition (further search term)
1 | severe spinal stenosis | precedes | dizziness | co occurs with | disequilibrium
2 | Wolfram's syndrome | diagnoses (inverse) | genetic tests | analyzes | microsomal disorders
3 | mosquito bites | brings about | allergic reaction | brings about | lightheadedness
4 | buzzing sound | co occurs with | dizziness | brings about (inverse) | acoustic neuroma

Table 4 Frequency of implied relationships

Relationship type                %
1 topologically related to
  1.1 part of                    6.4
  1.3 location of                3.9
2 functionally related to
  2.1 affects                    4.3
  2.1.11 treats                  13.8
  2.2 brings about               16.7
3 temporally related to
  3.1 co-occurs with             3.9
4 conceptually related to
  4.1 analyses                   5.2
  4.2 property of                1.0
  4.3 requires                   3.1
5 isa                            37.4
Additional relationships
  synonym specified              4.2

The relationships occurring with a frequency greater than 5% are shown in bold italic type. N = 78.

through links to co-referenced answer frames or through implied relationships (or through both). Of all the answer relationship instances summarized in Table 1, those providing direct answers make up only 3% (10/334) of all the relationship instances in the answers. Not many answers contain a semantic representation of the exact format specified by the questioner. We found that only 16% (53/334) contain a co-referenced concept with a sub-question and 7% (23/334) contain a co-referenced concept with a background sentence. The remaining 74% (247/334) of answer relationship instances are not directly connected to a question concept but may be linked with other answer relationships.

4.3.2. Discourse patterns
In the previous section, we found that the answers we analyzed contained many semantic relationship propositions that are not connected at all to a question concept (74% of them). We next analyzed the structure of answers by connecting the propositions as graphs. We identified semantic relationships that involved anaphor participants, and so it was possible to connect statements from sentence to sentence. Hence, we assumed and observed that most of the concepts discussed in these paragraphs were interconnected through the semantic relationships. In all answers, there is one main graph along with one or more smaller graphs. Overall, we see one large connected graph because the answers physicians wrote consisted of well-formed and conceptually related paragraphs. These smaller graphs exist because there is no


semantic relationship to connect them with any of the concepts within the main (larger) answer graph. However, these smaller graphs should be connected to the main graph because, in no case, did a physician talk about two disparate topics. On further inspection we found two possible reasons why they were not connected: either a semantic relationship was missed in the analysis, or making these connections required external knowledge not explicitly stated in the answer. In Table 5, a summary of the number of graphs constructed from each answer is shown. We provide both the number of concept nodes and the number of semantic relationships in order to give an idea of the size of the graphs within each answer. Relationships that were not part of the semantic relationship inventory (definition of, has family relationship, same concept/synonymous term, and relationship x) were not included in these graphs.
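Splitting an answer's propositions into one main graph plus smaller graphs, as in Table 5, is a connected-components computation over the concept nodes. A sketch (the triples are invented; components are returned largest first, mirroring the "a", "b", ... labeling of Table 5):

```python
def components(triples):
    """Connected components of concepts linked by any relationship,
    sorted largest first."""
    adj = {}
    for s, _r, o in triples:
        adj.setdefault(s, set()).add(o)
        adj.setdefault(o, set()).add(s)
    seen, comps = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:  # iterative depth-first search
            n = stack.pop()
            if n in comp:
                continue
            comp.add(n)
            stack.extend(adj[n] - comp)
        seen |= comp
        comps.append(comp)
    return sorted(comps, key=len, reverse=True)

# Hypothetical answer: one main graph plus one disconnected pair
triples = [
    ("otitis media", "isa", "ear infection"),
    ("antibiotics", "treats", "otitis media"),
    ("otoscope", "diagnoses", "otitis media"),
    ("fever", "co occurs with", "ear pain"),
]
print([len(c) for c in components(triples)])  # → [4, 2]
```
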

There are some concepts in answers that participate in numerous semantic relationship frames. In the 12 answer graphs, we observed that one or two concepts have a very large number of semantic relationships connected with them, in comparison with

Table 5 Size and number of conceptual graphs generated from answer frame instances

Answer 1:  a (16, 20), b (9, 8), c (3, 2), d (2, 1), e (2, 1)
Answer 2:  a (19, 20), b (2, 1)
Answer 3:  a (14, 16), b (8, 7), c (2, 1), d (2, 1)
Answer 4:  a (19, 25), b (2, 1)
Answer 5:  a (10, 14), b (5, 4), c (4, 3)
Answer 6:  a (14, 16)
Answer 7:  a (13, 12), b (4, 4), c (3, 2), d (3, 2)
Answer 8:  a (24, 28), b (3, 2), c (2, 1)
Answer 9:  a (11, 10), b (2, 1), c (2, 1)
Answer 10: a (16, 15)
Answer 11: a (14, 16), b (6, 5), c (3, 2), d (2, 1), e (2, 1)
Answer 12: a (33, 35), b (3, 2)
Total: 279 concept nodes, 281 semantic relationships

Each graph is listed as letter (|C|, ||SR||), where |C| denotes the number of concept nodes in the graph and ||SR|| denotes the number of semantic relationships. Note: Only 281 of the 334 semantic relationships were represented in the conceptual graphs because definition of, has fam relation, relationship x, and synonym relationships were not included. Each graph constructed from the answer relationships was assigned a lower case letter according to graph size (the "main" largest graph is labeled "a").

the other concepts. For example, in Fig. 3 above, the concept "traditional anti-Parkinsonian medication" is found in eight propositions. Concepts with a relatively high number of semantic relationships connected to them, compared with the other concepts in the graph, can be thought of as the focal concepts within the answer. We counted 18 of these focal concepts in the 12 answer graphs: 7 were concepts also expressed in the question statement, 10 participated in a semantic relationship with a question concept (were neighbors of a question concept), and 1 was a distance of two semantic relationships from a question concept.

In the above cases, we found that sub-patterns in the shape of a star formed around the focal concepts. So, for example, in Fig. 3 a star forms around the node "traditional anti-Parkinsonian medication." This focal concept is connected to many concepts, and these connected concepts are not connected to anything else. Larger star graphs than the example above are found in the other 12 answers; an example is shown in Fig. 4. Star patterns are often found in explanations that involve short descriptions of concepts' properties.

Most graphs, but not all, contained cycles. Cycles are closed paths that form a loop, as in the closed loop "levodopa increases dopamine," "dopamine affects Parkinson symptoms," "anti-Parkinsonian medication treats Parkinson symptoms," and "levodopa isa anti-Parkinsonian medication." In our 12 examples, many cycles consisting of semantic relationships between 3 or 4 concept nodes were identified within the larger graph structures, making it possible to see explanations in the discourse structure.


Fig. 4 An example of a star-shaped subgraph within an answer shows that there are many concepts discussed that relate to the concept lesions.
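In graph terms, a focal concept is a node of unusually high degree, and a star pattern is a focal node whose neighbors participate in no other relationship. A sketch of both checks; the edges are invented stand-ins echoing Fig. 4, where lesions is the hub:

```python
from collections import defaultdict

def degrees(edges):
    """Node degree from (concept, relationship, concept) triples."""
    deg = defaultdict(int)
    for subj, _rel, obj in edges:
        deg[subj] += 1
        deg[obj] += 1
    return deg

def star_leaves(edges, center):
    """Neighbors of `center` that appear in no other relationship."""
    deg = degrees(edges)
    leaves = set()
    for subj, _rel, obj in edges:
        if subj == center and deg[obj] == 1:
            leaves.add(obj)
        elif obj == center and deg[subj] == 1:
            leaves.add(subj)
    return leaves

# Invented edges around the focal concept of Fig. 4
edges = [
    ("lesions", "location of", "brain"),
    ("lesions", "property of", "demyelinated"),
    ("MRI", "diagnoses", "lesions"),
]
deg = degrees(edges)
focal = max(deg, key=deg.get)  # highest-degree node = focal concept
print(focal, sorted(star_leaves(edges, focal)))
```
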

5. Discussion

Our first task in this study was to identify propositions in question—answer texts and calculate the frequency of occurrence of semantic relationships. A number of useful observations are based on the frequency of semantic relationship occurrence within the texts. For one, we show which semantic relationships from the UMLS Semantic Network are found within these texts. Also, we learned what other relationships may be needed if these texts are to be represented in a retrieval system that makes use of semantic metadata.

Semantic relationships expressing comparison, combined with the frequently expressed treats, are essential to providing explanations of different treatment options, and a frequent purpose of a consumer question is to compare treatments [31]. We found that these physicians often make comparisons in answers, and this led to a modification in the UMLS Semantic Network inventory. The compared to, equivalent, similar to, and different from (see Table 1, #4.1.7) relationships were not part of the UMLS Semantic Network, but we, along with other researchers, have identified a need for the addition [32]. Another observation of the study is that some semantic relationships occur predominantly in questions and others occur predominantly in answers. Health consumers, for example, often use temporal semantic relationships in explaining their information need. The work also provides evidence that identification of causal relationships is fundamental to question answering. The brings about (causal) relationship is the most frequently expressed relationship in both the questions and the answers.

The results from looking at the question-to-answer links provide a picture of how answers are related to the concepts asked in the questions. Co-referenced concepts often link brings about, duration of, co occurs with, and treats from questions to brings about, isa, co occurs with, diagnoses, and treats in the answers. These are the most common "entry points" into answer documents that lead to additional information that the questioner might not be aware of. Also, in the implied relationship analysis results, we are not surprised to see a large number of isa relationships that are implied links from question concept to answer concept. This shows that physicians assume that they can introduce hierarchically related concepts without further explanation and that laypersons will be able to understand, for instance, that "leukemia isa cancer," through their own external knowledge. Lastly, in our analysis of whether physicians provide what we call direct answers, meaning that the physician repeats the question proposition exactly as stated, we have shown this does not always occur. Instead, we see that physicians expand on the question concepts in their answers to provide explanations; these results have been briefly described in an earlier paper [31].

From the graph analysis of within-answer semantic relationships, we learned about semantic relationship patterns within answers. Graphs give a visual representation of the discourse structure of answers. If, for example, a graph contains few concepts and a large number of semantic relationships, we know that the physician has made many connections between very few ideas. However, our graphs contain an almost equal number of concepts and semantic relationships, meaning that in most cases, the physician writes answers containing a variety of concepts and connects them with an almost equal number of semantic relationships. Of interest are the star sub-patterns that form around the focal concepts of answers and the closed-loop cycles. One method of arriving at information that might prove helpful for lay users would be to identify potential focal points and then locate many different semantic relationships with that "focal concept" participant (e.g. what are its properties using property of, what treats it, what prevents it, who diagnoses this). Searching for cycles might be a way to locate segments of text providing "closed" explanations used for reasoning and helping users understand inter-related concepts.

5.1. Implications

One reason for completing the research reported in this paper is to begin to explore how semantic-based retrieval techniques and external representations of semantic relationships can be used within health consumer information and decision systems. We have made two important assumptions with this work. While conducting this study we assumed that the questions posed in these "Ask-the-Doctor" texts are representative of lay information needs and that our analysis of the semantic relationships can be applied to future information retrieval/educational systems for health consumers. A second assumption is that physicians believe these answers are useful and appropriate for laypersons. This work can form the basis for an interpretive layer to mediate between lay (illness model) and professional (disease model) as proposed by Soergel et al. [33]. Building techniques that help systems bridge consumer-level language to professional-level language will require semantic interpretation components.

A finding that is important for semantic-based information retrieval research concerns the brings about (causal) relationship; it is the most frequently expressed relationship in both the questions and the answers. Our results might help clarify why Khoo et al. [20] found that partial relation matching, in which one member (i.e. term) of the relation is a wildcard, is especially helpful for retrieval using causal relationships. They determined that the most successful type of causal relation matching occurred with the use of a wildcard ("*"). So, for example, "smoking cause *" or "* cause cancer" was more effective than the complete concept-relation-concept triple "smoking cause cancer." In our work, we saw that brings about relationships found in answers often contain concepts that are new to the questioner, and hence could only be recovered with a wildcard search. Brings about is the semantic relationship with the highest number of co-referenced arguments linking the questions and answers. We further observed that long causal chains exist in the answer texts, and these might be part of a retrieval strategy that provides results for assisting users in understanding multiple causative factors leading to an illness or certain symptoms.

Causal relationships play a key role in these texts, but extraction of other relationships can also be expected to improve retrieval performance. Our results provide a picture of the combinations of semantic relationships that might be useful in partial matching techniques for lay health information. We found that co-referenced concepts often link brings about, duration of, co occurs with, and treats from questions to brings about, isa, co occurs with, diagnoses, and treats in the answers. For retrieval applications, if a questioner asks "smoking brings about lung cancer", then we expect that it will be useful to search the paths "lung cancer isa (*)", "(*) treats lung cancer",


"(*) co occurs with lung cancer", and/or "(*) diagnoses lung cancer" in addition to "lung cancer brings about (*)."
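The wildcard path searches suggested above reduce to matching concept-relationship-concept triples with one argument left open. A minimal sketch (the triple store contents are invented for illustration):

```python
def match(triples, subj, rel, obj):
    """Return triples matching the pattern; '*' matches any concept."""
    return [
        t for t in triples
        if (subj == "*" or t[0] == subj)
        and t[1] == rel
        and (obj == "*" or t[2] == obj)
    ]

# Hypothetical answer propositions indexed for retrieval
triples = [
    ("smoking", "brings about", "lung cancer"),
    ("lung cancer", "isa", "cancer"),
    ("chemotherapy", "treats", "lung cancer"),
    ("persistent cough", "co occurs with", "lung cancer"),
]

# Entry point "smoking brings about lung cancer" expands into:
print(match(triples, "lung cancer", "isa", "*"))   # → [('lung cancer', 'isa', 'cancer')]
print(match(triples, "*", "treats", "lung cancer"))  # → [('chemotherapy', 'treats', 'lung cancer')]
```
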

5.2. Limitations

The major limitation of this study involves issues related to coding propositions within texts. The semantic relationship instances coded from the text went through numerous iterations before arriving at a set that, although not perfect, reflects a usable representation. The first author manually identified semantic relationships using a set of coding rules that are documented in [30] and was consistent in applying the instructions. We feel that 88% intra-rater reliability is acceptable and did not warrant another iteration of coding.

Identification of semantic relationships, like all indexing, is subjective and represents the reaction of human beings to the information they are processing. The results indicated differences in the granularity of coding for the two coders who were asked to look at one question and answer pair. Coder 2, probably due to training as an indexer, was more exhaustive than Coder 1. This resulted in revisions to our own coding rules, since we realized that we needed to define the level of representation of texts. We also found that the coders differed on process of and property of codings, resulting in modifications to our coding scheme definitions and a closer look at these relationships. In all, the two coders contributed significantly to improving the quality of the coding procedures.

We felt that 35% mean percent agreement for the inter-rater reliability was low but expected. Having coders identify concepts and relationships together and obtaining exact agreement is quite difficult. For this to happen 24 times in the texts they coded once with minimal training was actually quite surprising. We felt that with additional training and practice, we could achieve a much higher agreement rating.

How two or more concepts are related to one another can lead to philosophical questions that can be difficult to resolve. The purpose of this work was not to construct an ontology of medical semantic relationships for consumer health question answering. We made the assumption that the UMLS Semantic Network relationships would be a valid set to build from and that it would be adequate for representing the texts. Our intent was to create a useable knowledge base that will allow us to characterize these texts in order to lay a foundation for the construction of consumer health retrieval systems.

5.3. Future work

The perspective taken from the onset was that the analysis results and products (both the semantic relationship classes and instances) would be applied to future experimentation with systems such as health-consumer concept exploration and question formulation interfaces. The next stage of research might focus on how semantic information presented in the user interface affects laypersons' cognitive structures and abilities to express their information needs. Further work will determine whether and how views of a knowledge structure help health consumers to understand the context of the search, aid in understanding, and facilitate construction of correct mental models.

This work also outlines a methodology for analyzing semantic characteristics of texts. We completed a small sample of question—answer pairs for this study because manual extraction is a difficult task and we did not know whether any useful results would emerge. Future work will concentrate on automating extraction of semantic relationship instances from a larger set of question—answer pairs. Use of natural language processing systems, such as SemRep [34] and AQUA [35], that use the UMLS to recover semantic propositions from biomedical texts can facilitate this process. Extracting semantic information from many question—answer pairs will make it possible to repeat our analysis and statistically determine additional patterns that result in rules for finding relevant texts, for example, through search of specific paths of connections within conceptual graph representations.

6. Conclusions

The study identified some important findings concerning the semantic representation of consumers' questions and physicians' answers in a quasi-formal electronic environment. We determined that both health consumers and physicians express causative relationships, and these play a key role in leading to further related concepts in the answer texts. Regarding the answering strategy of physicians, a direct correspondence between the semantic representations of the questions and those from the answers exists only about 30% of the time. These results might be exploited to help determine the typical characteristics of answers to be used in


retrieval and might be used to present related concepts to users who are browsing medical information.

Acknowledgements

The authors would like to thank Keith Cogdill, Stephen B. Johnson, Kent L. Norman, and Marilyn D. White for their help with this study. This work was supported in part by the Beta Phi Mu Doctoral Dissertation Fellowship, the Eugene Garfield Doctoral Dissertation Fellowship, and National Library of Medicine (NLM) Training Grant LM07079-11.

References

[1] Q. Zeng, S. Kogan, N. Ash, R. Greenes, A. Boxwala, Characteristics of consumer terminology for health information retrieval, Meth. Inform. Med. 41 (2002) 289—298.

[2] Q.T. Zeng, S. Kogan, R. Plovnick, J. Crowell, R.A. Greenes, E.M. Lacroix, Positive attitude and failed queries: an exploration of the conundrums of consumer health information retrieval, Int. J. Med. Inform. 73 (1) (2004) 45—55.

[3] J.L. Bader, M.F. Theofanos, Searching for cancer information on the internet: analyzing natural language search queries, J. Med. Internet Res. 5 (4) (2003) e31.

[4] D. Lindberg, B. Humphrey, A. McCray, The Unified Medical Language System, Meth. Inform. Med. 32 (1993) 281—289.

[5] A. McCray, W. Hole, The scope and structure of the first version of the UMLS Semantic Network, Proc. Annu. Symp. Comput. Appl. Med. Care (1990) 126—130.

[6] D. Soergel, Some remarks on information languages, their analysis and comparison, Inform. Storage Retrieval 3 (1967) 219—291.

[7] R. Green, Relationships in the organization of knowledge: an overview, in: C.A. Bean, R. Green (Eds.), Relationships in the Organization of Knowledge, Kluwer Academic Publishers, 2001, pp. 3—18.

[8] N. Guarino, C. Welty, Identity and subsumption, in: R. Green, C. Bean, S. Hyon Myaeng (Eds.), The Semantics of Relationships: An Interdisciplinary Perspective, Kluwer Academic Publishers, 2002, pp. 111—126.

[9] R.J. Brachman, What IS-A is and isn't: an analysis of taxonomic links in semantic networks, IEEE Comput. 16 (10) (1983) 30—36.

[10] J.F. Allen, Time and time again: the many ways to represent time, Int. J. Intell. Syst. 4 (1991) 341—356.

[11] J. Odell, Six different kinds of composition, J. Object-Oriented Programm. (1994) 8.

[12] G. Miller, R. Beckwith, C. Fellbaum, D. Gross, K. Miller, Five papers on WordNet, Int. J. Lexicogr. 3 (4) (1990), special issue.

[13] J. Rogers, A. Rector, GALEN's model of parts and wholes: experience and comparison, in: J.M. Overhage (Ed.), American Medical Informatics Association Symposium, Hanley & Belfus Inc., Philadelphia, 2000, pp. 714—718.

[14] A.L. Rector, C. Wroe, J. Rogers, A. Roberts, Untangling taxonomies and relationships: personal and practical problems in loosely coupled development of large ontologies, in: Proceedings of K-CAP'01, ACM, Victoria, BC, 2001, pp. 139—146.

[15] L.T. Detwiler, E. Chung, A. Li, J.L. Mejino, A.V. Agoncillo, J.F. Brinkley, C. Rosse, L.G. Shapiro, A relation-centric query engine for the foundational model of anatomy, in: Proceedings of MedInfo'2004, San Francisco, CA, 2004, pp. 341—345.

[16] B. Smith, C. Rosse, The role of foundational relations in the alignment of biomedical ontologies, in: Proceedings of MedInfo'2004, San Francisco, CA, 2004, pp. 444—448.

[17] A. Chakravarthy, K. Haase, NetSerf: using semantic knowledge to find internet information archives, in: Proceedings of the 18th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Seattle, WA, 1995, pp. 4—11.

[18] A.S. Pollitt, M.P. Smith, M. Treglown, P.A.J. Braekevelt, View-based searching systems — progress towards effective disintermediation, in: Proceedings of the 20th International Online Information Meeting, London, December, Learned Information Limited, 1996, pp. 433—446.

[19] C. Khoo, S.H. Myaeng, Identifying semantic relations in text for information retrieval and information extraction, in: R. Green, C.A. Bean, S.H. Myaeng (Eds.), The Semantics of Relationships: An Interdisciplinary Perspective, Kluwer Academic Publishers, Dordrecht, 2002, pp. 161—180.

[20] C. Khoo, S.H. Myaeng, R. Oddy, Using cause-effect relations in text to improve information retrieval precision, Inform. Process. Manage. 37 (1) (2001) 119—145.

[21] T. Miles-Board, S. Kampa, L. Carr, W. Hall, Hypertext in the semantic web, in: Proceedings of HT'01, Aarhus, Denmark, 2001.

[22] E. Mendonca, S.B. Johnson, Y.H. Seol, J.J. Cimino, Analyzing the semantics of patient data to rank records of literature retrieval, in: Proceedings of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain, June 11, Association for Computational Linguistics, Philadelphia, PA, 2002.

[23] E.A. Mendonca, J.J. Cimino, Automated knowledge extraction from MEDLINE citations, Proc. AMIA Symp. (2000) 575—579.

[24] C.A. Brandt, L.A. Fernandes, S.M. Powsner, K.J. Ruskin, P.L. Miller, Using coded semantic relationships to retrieve and structure clinical reference information, in: Proceedings of the 1997 American Medical Informatics Association Annual Fall Symposium, Nashville, TN, October 25—29, 1997.

[25] S. Bechhofer, R. Stevens, G. Ng, A. Jacoby, C.A. Goble, Guiding the user: an ontology driven interface, in: N.W. Paton, T. Griffiths (Eds.), UIDIS: Workshop on User Interfaces to Data Intensive Systems, Edinburgh, 1999, pp. 158—161.

[26] W. Kintsch, Comprehension: A Paradigm for Cognition, Cambridge University Press, New York, 1998.

[27] D. Jonassen, K. Beissner, M. Yacci, Structural Knowledge: Techniques for Representing, Conveying, and Acquiring Structural Knowledge, Lawrence Erlbaum Associates, Hillsdale, NJ, 1993.

[28] Medical Library Association, MLANET: A User's Guide to Finding and Evaluating Health Info on the Web. http://www.mlanet.org/resources/userguide.html.

[29] J.F. Sowa, Conceptual graphs summary, in: P. Eklund, T. Nagle, J. Nagle, L. Gerholz (Eds.), Conceptual Structures: Current Research and Practice, Ellis Horwood, 1992, pp. 3—52.

[30] L. Slaughter, Semantic relationships in health consumer questions and physicians' answers: a basis for representing medical knowledge and for concept exploration interfaces, Doctoral Dissertation, University of Maryland at College Park, 2002, Dissertation Abstracts International.

[31] L.A. Slaughter, D. Soergel, How physicians' answers relate to health consumers' questions, in: Proceedings of the American Society for Information Science and Technology (ASIST), 2003, pp. 28—39.

[32] H. Yu, C. Friedman, A. Rhzetsky, P. Kra, Representing genomic knowledge in the UMLS Semantic Network, Proc. AMIA Symp. (1999) 181—185.

[33] D. Soergel, T. Tse, L. Slaughter, Helping healthcare consumers understand: an "interpretive layer" for finding and making sense of medical information, Medinfo (2004) 931—935.

[34] T.C. Rindflesch, M. Fiszman, The interaction of domain knowledge and linguistic structure in natural language processing: interpreting hypernymic propositions in biomedical text, J. Biomed. Inform. 36 (6) (2003) 462—477.

[35] S.B. Johnson, A. Aguirre, P. Peng, J. Cimino, Interpreting natural language queries using the UMLS, Proc. Annu. Symp. Comput. Appl. Med. Care (1993) 294—298.