A Logic-Based Approach to Thesauri as a Resource for Information Retrieval



A Logic-Based Approach to Thesaurus Modelling

Sebastian Goeser

IBM Deutschland Entwicklung GmbH

Abt. Retrieval Entwicklung 3

Hanns-Klemm-Str. 45

71034 Böblingen

Abstract: This paper discusses a logic-based approach to thesaurus modelling. Thesauri are seen as semantic resources for so-called semantic information retrieval systems which use conceptual descriptions in the matching component of a retrieval system. The relational model underlying traditional thesauri is discussed. The logic-based approach extends this model to a thesaural semantic network based on a relation type hierarchy, that supports inferential mechanisms both in definition and retrieval of thesaurus items. The logic-based approach is implemented in IBM's product Thesaurus Administrator/2, whose editing and display capabilities are explained.

Introduction

Thesauri are the terminological resources of semantic information retrieval (SIR) systems. Just as factual expert knowledge consists of terminological "T-boxes" plus contingent knowledge [5], SIR knowledge consists of both thesaurus data structures representing a domain terminology, and a set of document content descriptions. Again analogous to classification of observations in expert systems, SIR knowledge is applied to the classification of document structure using methods from concept-based automatic indexing.

SIR systems are characterized through matching search topics and document content on the level of explicit and semantically unique descriptions [11]. The basic elements in all kinds of SIR descriptions are concepts, i.e. abstractions from natural language with a unique meaning contribution to the matching process. A thesaurus is a complex data structure that can be used to model SIR concepts. The unique mapping from document structure to concepts is a process of constructing the document's content description. Mapping documents and topics into one-and-the-same thesaurus may solve the vocabulary gap problem in information retrieval, i.e. the problem that the same concept's descriptions by searchers and document authors may differ radically in vocabulary. While thesauri preserve semantic distances between concepts, they abstract away from many of those surface characteristics of documents that negatively impact retrieval effectiveness.

Traditional thesauri, as documented in a number of industry standards [7], do not cover the needs of an SIR resource. The construction of traditional thesauri continues to be very expensive and to require domain experts and information specialists to cooperate (see e.g. [2]). Many important generalizations over concepts cannot be represented, and in spite of their apparent uniformity, not too much is known about their automatic acquisition. Nevertheless, the mere existence of around 600 thesauri in European languages (see [12]) calls for a preservation of traditional thesaurus capabilities.

Our thesaurus approach reformulates traditional thesauri on the basis of an abstract data structure called the Logical Model (LM), and extends them through a relation type hierarchy, an attribute concept, and theorem based definition, derivation, and checking of an arbitrary number of relations. The LM contains so-called axiom instances, i.e. truth-functional rules allowing relation inference at retrieval time that considerably reduce thesaurus construction effort. Our approach clearly separates relation definition, where relation behaviour is forwardly inferred, from backward application of these definitions, thus obtaining modifiability of any LM part at any point in time.

The organization of this paper is as follows: The following chapter discusses the structure of traditional thesauri and some problems we found in its modelization. "The logic-based thesaurus approach" gives a detailed account of the logic-based thesaurus approach and its implementation in IBM's thesaurus products. "Outlook: Thesauri in semantic IR" gives a summary and an outlook to further SIR progress.

Modelling traditional thesauri

Historically, several lexicon traditions lead to what we call "thesaurus" today. We have to distinguish at least stylistic, terminological and some more experimental information retrieval thesauri. From these, the terminological thesaurus concept covers most existing thesaurus data including many stylistic thesauri. Terminological thesauri are mainly used in manual indexing and as a professional retrieval aid. The informal style of much thesaurus literature can be explained by the fact that it is not knowledge engineers but experienced indexing practitioners who are addressed. In the following, we assume the industrial standard [7] as a kind of baseline thesaurus characterization. This excludes many one-relation collections from being thesauri proper, though terminological thesauri are clearly subsumed through these collections.

Basically, thesauri are semantic networks consisting of a set of (multi-word) terms that are interconnected through links of different kinds. Links are labelled, and a set of links with the same relation label is called a relation. Vocabulary control is reflected in the types of thesaurus terms, which are either descriptors that may classify documents, or non-descriptors. A non-descriptor is a term "pointing" to exactly one descriptor through an asymmetric synonymy relation which is often called "USE", whereas a descriptor may point to an arbitrary number of non-descriptors through the converse of the USE relation, which is called USE FOR. Non-descriptors are assumed to participate only in synonymy relations.

For a set T of terms, thesaurus relations are binary relations in the Cartesian product T × T.¹ Generally, introduction of n-ary relations seems to be a straightforward extension of the binary case. Binary relations are either unidirectional or bidirectional, i.e. symmetric. Traditional thesauri have exactly three types of relations, namely the two unidirectional types "hierarchic" and "synonymous" (see above), and the bidirectional "associative" type, where hierarchic and associative relations may form general graphs. Figure 1 shows some examples of thesaurus relations.

¹ n-ary relations with n > 2 have been proposed [10], e.g. the relation t1 BETWEEN t2 AND t3 for place names t1, t2, and t3, but remain marginal.

Figure 1. Hierarchical, associative and synonymous relations

From a knowledge representation viewpoint, two different types of knowledge are embodied in traditional thesauri. Object domain knowledge is concerned with the characterization of a controlled vocabulary and of the terminological relationships observable in it. Note that, in classical thesauri as opposed to facetted ones, the relations themselves are domain-independent (see [7]). Library science knowledge, which is usually left implicit in printed thesauri, is concerned with the form of descriptors, the formal definition of relations, and the presentation format(s) of thesauri. Modelling (object domain) terms explicitly as first order terms has the (unwelcome) effect that statements quantifying over terms, as in Figure 2, become second order constructs. Therefore, if library knowledge is to be subject to thesaurus modelization, a more literal view of thesaurus terms seems advisable.

There is no pair of terms in a thesaurus that has both a hierarchic and an associative relation.

For each link aRb, a and b are terms in the thesaurus.

Subnets of a thesaurus are disjoint if their maximal environments are different.

Figure 2. Statements quantifying over thesaurus terms

In the following, a number of algebraic relation properties are checked for their validity with respect to the relations present in traditional thesauri.

Transitivity

The transitive hull of a binary relation R is a relation of all term pairs connected through a nonempty chain of R. That is, the transitive of a relation is always a superset of it. Most thesaurus relations will not be transitives in themselves since the distance between each pair of terms bears information on how closely they are related. For synonym relations, whose depth does not exceed 1, nothing is gained through transitivity. However, for an associative or hierarchical relation it may be reasonable to have another relation as its transitive. This will add to a thesaurus the capability to see terms related to another term independent of their distance from it.
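The transitive hull described above can be sketched in a few lines. This is only an illustration under stated assumptions: a relation is represented as a Python set of (left, right) term pairs, and the function name `transitive_hull` and the example terms are hypothetical, not taken from the system discussed in this paper.

```python
from collections import defaultdict


def transitive_hull(links):
    """Return all pairs (a, b) connected through a nonempty chain of links."""
    successors = defaultdict(set)
    for a, b in links:
        successors[a].add(b)

    hull = set()
    for start in list(successors):
        # Depth-first walk over every term reachable from `start`.
        stack = list(successors[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            hull.add((start, node))
            stack.extend(successors.get(node, ()))
    return hull


# Hypothetical hierarchical links (narrower term, broader term).
broader = {("dogs", "pets"), ("pets", "animals")}
hull = transitive_hull(broader)
```

The hull keeps the original links and adds the pair that spans both levels, so a term becomes visible independent of its distance from another term.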

Reflexivity

A thesaurus relation is reflexive if it relates every term to itself. Obviously, synonymy relations are never reflexive. And intuitively, one would not expect any term to be in the set of terms associated or hierarchically subordinated to it. On the other hand, it is easily verified that the transitive of a cyclic relation is a reflexive relation. Therefore, excluding reflexivity means to exclude any cycles from relations that have a transitive. This seems to be in accordance with relevant literature (e.g. [10]).

Symmetry

A relation is symmetric if, for any pair <a,b>, it also contains the inverse <b,a> of this pair. So, a bidirectional relation is always symmetric, and a synonymy relation is never since the two sets it relates (descriptors and non-descriptors) are disjunct. However, a (unidirectional) hierarchical relation can accidentally become symmetric. This will be the case for all unidirectional relations that coincide with their converses.

Left Identitivity

A relation is left identitive when it has at most one left argument for any right argument. (Analogous for right identitivity). A bidirectional relation is left identitive iff it is right identitive. Most associative relations will be too "loose" for this kind of 1 : 1 relationship. Synonym relations "pointing" to descriptors are right identitive (and their converses left identitive), since no non-descriptor should point to more than one descriptor. While hierarchical relations generally are of the n : m type, some of them are left identitive; think e.g. of an INSTance relation where every object belongs to at most one class.

Connexity

A relation is connex w.r.t. a certain set iff all elements of this set participate in it. For many thesauri, the main hierarchical relation is connex w.r.t. the set of descriptors since there are no descriptors outside the hierarchy. Analogously, most synonym relations are connex w.r.t. the set of non-descriptors. Associative relations traditionally have a more accidental status, different from the hierarchical backbone of a thesaurus, and are not connex therefore. Note however that this fact does not derive from formal properties of this relation type, but from a tradition of thesaurus writing.
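The algebraic properties discussed in this chapter can be written as small predicates over a set of links. The sketch below assumes a relation is a set of (left, right) pairs; the function names and the example USE links are hypothetical and serve only to make the definitions concrete.

```python
def is_reflexive(links, terms):
    # Every term of the given set is related to itself.
    return all((t, t) in links for t in terms)


def is_symmetric(links):
    # For any pair <a,b> the relation also contains <b,a>.
    return all((b, a) in links for a, b in links)


def is_left_identitive(links):
    # At most one left argument for any right argument.
    left_of = {}
    for a, b in links:
        if left_of.setdefault(b, a) != a:
            return False
    return True


def is_right_identitive(links):
    # Left identitivity of the converse relation.
    return is_left_identitive({(b, a) for a, b in links})


def is_connex(links, terms):
    # All elements of `terms` participate in the relation.
    participating = {t for pair in links for t in pair}
    return set(terms) <= participating


# Hypothetical USE links: (non-descriptor, descriptor).
use_links = {("advocate", "lawyer"), ("barrister", "lawyer")}
non_descriptors = {"advocate", "barrister"}
```

With these example links the USE relation is right identitive and connex w.r.t. the non-descriptors, but neither left identitive nor symmetric, which matches the characterization of synonym relations given above.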

The logic-based thesaurus approach

The logical model

In this section, an approach to thesaurus modelling based on relational logic will be outlined. First, its basic representation construct, the logical model (LM), will be introduced. The LM is a static, explicit, formal representation of both object domain knowledge, i.e. a semantic network of terms and labelled relation links (see [1]), and library knowledge allowing to describe formal properties of this network. Second, several inference mechanisms will be described, that support, to a certain extent, consistency checking and the retrieval of complex views from a thesaurus.

The LM is a semantic network model for thesauri. A thesaurus management system (TMS) can be defined as the shell program interpreting this kind of model. The LM is relational in the sense that its access primitives are both thesaurus terms and labelled links between these terms. This is in contrast to dictionary-oriented thesaurus approaches (see e.g. [9]) where the access primitives are terms and relation information is accessed through term entries. We think generally a relational thesaurus model will be less redundant and provide more access paths than a dictionary-oriented one. The principled difference here is that (conceptual) thesauri have many access paths to each node, but not much information associated with a single node, whereas (lexical) dictionaries access complex information through single entries. The terms in a thesaurus are multiword expressions (see [7]), which may take the type descriptor or non-descriptor. Terms, relations, and attributes are represented internally through unique numbers which are allocated when creating the respective item and preserve identity over modification of any of its properties.

A relation consists first of a relation description containing its name, unique number and its type and, second, a (possibly empty) set of links of this relation. A relation link is a triple <R,t1,t2> where R is a relation number and t1 and t2 are term numbers. In the same way, an attribute consists first of an attribute description with its name, unique number, the term type(s) it applies to and a default value which may be left unspecified. Note that there is a distinction between "no value" (i.e. also no default value) and "value unspecified". Second, an attribute consists of zero or more attribute triples <a,t,v> where a is an attribute number, t a term number and v their (unique) value. Terms, links and attribute triples and the respective numberings are also referred to as item data.
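The item data just described might be held in a structure like the following. This is a minimal sketch, not the product's internal format: the class `LogicalModel`, its method names, and the choice of `None` for "no value" versus a sentinel `UNSPECIFIED` for "value unspecified" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import count

# Assumed encoding: UNSPECIFIED means "value unspecified", None means "no value".
UNSPECIFIED = object()


@dataclass
class LogicalModel:
    terms: dict = field(default_factory=dict)       # term number -> description
    relations: dict = field(default_factory=dict)   # relation number -> description
    links: set = field(default_factory=set)         # triples (R, t1, t2)
    attributes: dict = field(default_factory=dict)  # attribute number -> description
    values: dict = field(default_factory=dict)      # (a, t) -> v
    _next: count = field(default_factory=lambda: count(1), repr=False)

    def add_term(self, text, term_type="descriptor"):
        number = next(self._next)
        self.terms[number] = {"text": text, "type": term_type}
        return number

    def add_relation(self, name, rel_type):
        number = next(self._next)
        self.relations[number] = {"name": name, "type": rel_type}
        return number

    def add_link(self, relation, t1, t2):
        if relation not in self.relations or t1 not in self.terms or t2 not in self.terms:
            raise KeyError("unknown relation or term number")
        self.links.add((relation, t1, t2))

    def add_attribute(self, name, term_types, default=UNSPECIFIED):
        number = next(self._next)
        self.attributes[number] = {
            "name": name, "term_types": set(term_types), "default": default}
        return number

    def set_value(self, attribute, term, value):
        # One (unique) value per attribute/term pair.
        self.values[(attribute, term)] = value

    def value_of(self, attribute, term):
        if (attribute, term) in self.values:
            return self.values[(attribute, term)]
        return self.attributes[attribute]["default"]


lm = LogicalModel()
dogs = lm.add_term("dogs")
animals = lm.add_term("animals")
bt = lm.add_relation("BroaderTerm", "hierarchic")
lm.add_link(bt, dogs, animals)
note = lm.add_attribute("scope note", ["descriptor"])
lm.terms[dogs]["text"] = "dog"   # renaming a term leaves its number and links intact
```

Because links and attribute triples refer to numbers rather than strings, modifying a term's text does not invalidate any item that refers to it.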

The inferential part of the logical model is a set of production rules called axiom instances. An axiom, in our nomenclature, is a statement schema ranging over relation tuples, which is parametrized for relations (i.e. relation numbers). Formally, it is an implication schema. Its basic expressions are relation tuples <R,A1,A2> where R is a relation variable and A1 and A2 are term variables. In the antecedens of the implication, arbitrary Boolean combinations of basic expressions are allowed, as well as negation of basic expressions. The consequens is either a single basic expression or one of a number of PROLOG goals. The semantics of axiom schemata is truth-functional; no procedural side effects are allowed. Axiom schemata are currently part of the TMS shell code, however we consider making them LM parts. An axiom instance of a schema, now, is a licensed binding of every unbound variable of the schema to a relation number. See "A TMS inference concept" for an explanation of the licensing process. Axiom instances are part of the LM (and hence modifiable), so that the content of a thesaurus can be described as the implication closure of item data under the set of axiom instances.

The LM also includes, of course, numerous global data on the thesaurus as a whole such as its internal name, fully-qualified file names, descriptions of the thesaurus, each attribute and each relation, the number of terms contained, and, most notably, the maximum inference depth (MID). With the MID, thesaurus administrators can balance the tradeoff between completeness of inference processes and the performance of their computation.

A TMS inference concept

There are two basic types of inference in our TMS shell, which are called retrieving and licensing. Retrieving is a backward application of axiom instances to the effect that the closure of the thesaurus up to a certain MID is computed. The MID, precisely, is the maximum length of the derivation needed to retrieve a relation item. A common view on derived and non-derived relation tuples is provided to the end-user, with the exception that deletion of derived tuples or terms means to disable their derivation. (This can be done either through deletion of tuples they are derived from, or deletion of certain axiom instances). Licensing, on the other hand, is a forward inference process based on a set of theorems and on relation types. Restrictive licensing blocks the application of certain theorems, if the resulting axiom instances are not admissible. Generative licensing applies theorems to derive new axiom instances from given ones. Licensing is performed at relation definition time and is independent of the item data of a relation.

Axioms and retrieving inference

In our current TMS shell, there are the following few axioms, taking the explicit form of PROLOG terms. It should be mentioned that this set can be quite easily extended. Let each R, R1, ..., Rn be binary relations in the set T of term (number)s.

Transitivity: <R,a,b> ∨ (<R,a,c> & <R1,c,b>) → <R1,a,b>

Compatibility: EX a,b ∈ T: <R1,a,b> & (<R,a,b> ∨ <R,b,a>) → R compatible R1

Converse: <R,a,b> → <R1,b,a>

Implication: <R1,a,b> → <R,a,b>

Translation: <R1,a,b> & <R,a,a'> & <R,b,b'> → <R1,a',b'>

Transitivity introduces a relation folding an entire hierarchy into one single level. This relation can be new, but is not necessarily. Compatibility means that "double links" between pairs of terms are allowed. Note that, while incompatibility means that double links are rejected, compatibility does not mean that there are double links, but that there could be some. A translation is a structure-preserving mapping into another term-space. The instantiation of this axiom is rather restricted, e.g. the R relation is always translation type and the R1 relation is never. The fact that axioms use implication instead of equivalence for introduction of new relations means that hand-coded links can always be added to a derived relation without triggering any inference steps. On the other hand, equivalences can be introduced through those pairs of axiom instances that are licensed (see below). When retrieving a derived relation, different inference engines apply depending on the view requested (see below). For performance reasons, inference is mostly performed not on the level of items but on subtree level.

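To make the effect of axiom instances and of the MID more tangible, the following sketch computes a depth-bounded closure of item data. It is an illustration only and swaps in a naive forward iteration, whereas the shell described here applies axiom instances backwardly in PROLOG. Axiom instances are assumed to be tuples (kind, R, R1) following the schemata as printed above; the function names, the relation numbers, and the use of one iteration round as one unit of derivation length are assumptions.

```python
def apply_instance(instance, links):
    """Return the links derivable in one step from `links` by one axiom instance."""
    kind, r, r1 = instance
    derived = set()
    if kind == "converse":        # <R,a,b> -> <R1,b,a>
        derived = {(r1, b, a) for (rel, a, b) in links if rel == r}
    elif kind == "implication":   # <R1,a,b> -> <R,a,b>
        derived = {(r, a, b) for (rel, a, b) in links if rel == r1}
    elif kind == "transitivity":  # <R,a,b> v (<R,a,c> & <R1,c,b>) -> <R1,a,b>
        derived = {(r1, a, b) for (rel, a, b) in links if rel == r}
        derived |= {(r1, a, b)
                    for (rel, a, c) in links if rel == r
                    for (rel1, c2, b) in links if rel1 == r1 and c2 == c}
    return derived


def retrieve(item_links, instances, mid):
    """Closure of item data under the axiom instances, cut off after `mid` rounds."""
    closure = set(item_links)
    for _ in range(mid):
        new = set()
        for instance in instances:
            new |= apply_instance(instance, closure)
        if new <= closure:
            break                 # nothing further can be derived
        closure |= new
    return closure


# Hypothetical relation numbers: 1 = broader term, 2 = its transitive, 3 = its converse.
item_links = {(1, "dogs", "pets"), (1, "pets", "animals")}
instances = [("transitivity", 1, 2), ("converse", 1, 3)]
shallow = retrieve(item_links, instances, mid=1)
deep = retrieve(item_links, instances, mid=3)
```

With a depth of 1 only the direct consequences are retrieved; the two-step link between "dogs" and "animals" in relation 2 appears only at a greater depth, which is the completeness versus performance tradeoff controlled by the MID.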
Relation types and theorems

Our TMS shell embeds the three classical relation types (see "Modelling traditional thesauri") in a relation type hierarchy as shown in Figure 3. Relations are ordered by specificity, i.e. the lower a relation the more specific is its description. Lower relations are subsumed by higher ones. The more specific a relation type, the more correctness checking can be performed through the system (see below). The following types are known to the system; others, e.g. a bidirectional relation of limited depth, might be considered:

other - Most general type; other relations can be bidirectional or unidirectional, any numbers of terms can be connected, there is no restriction on chaining depth, and the relations may hold between descriptors or non-descriptors.

unidir - A unidir relation is a unidirectional other relation.

associative - An associative relation is a bidirectional other relation.

synonymous - A synonymous relation is a unidirectional relation between a descriptor and a non-descriptor. It always has a depth of 1.

hierarchic - A hierarchic relation is a unidirectional relation between descriptors.

instance/generalization - An instance/generalization relation is a hierarchic relation that is 1 : n (or n : 1 for generalizations) and has a depth of 1.

translation - A translation relation is an associative relation that is 1 : 1, with arbitrary (finite) depth.
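The subsumption ordering among these types can be sketched as a parent table. The placement of each type follows the descriptions in the list above; the table name `PARENT` and the helper functions are hypothetical and not part of the shell.

```python
# Each type points to the next more general type; "other" is the root.
PARENT = {
    "other": None,
    "unidir": "other",
    "associative": "other",
    "synonymous": "unidir",
    "hierarchic": "unidir",
    "instance/generalization": "hierarchic",
    "translation": "associative",
}


def ancestors(rel_type):
    """Return the type itself followed by all more general types."""
    chain = []
    while rel_type is not None:
        chain.append(rel_type)
        rel_type = PARENT[rel_type]
    return chain


def subsumes(general, specific):
    """True if `general` is the same type as `specific` or lies above it."""
    return general in ancestors(specific)
```

A check such as `subsumes("unidir", "instance/generalization")` succeeds, whereas `subsumes("associative", "hierarchic")` fails because the two types sit on different branches.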

Figure 3. Relation type hierarchy

Licensing by relation types means that the system holds so-called connection tables with (maximally) n+1 columns for a maximum number of n unbound variables in any axiom, indicating the compatibility of the types of these variables for a given axiom. E.g. the converse or the transitive of a given relation must have the same or any higher type, and a translation relation must not be applied to another or the same translation relation, etc. Licensing by relation types is always restrictive.

The second factor in licensing is a set of theorems, which are applied forwardly to axiom instances (see following list). The theorems are part of the shell code, but may be enhanced very easily. While axioms have variables ranging over terms, theorems range over axiom instances and have variables for relations only.

As a notation for theorems, we abbreviate the fact that there is an axiom instance <relation1,a,b> → <relation2,b,a> as relation1 converse relation2, and similarly for any other axioms. Furthermore, we assume (implicit) universal quantification of relation variables, unless explicitly quantified. Generative theorems are marked with G, restrictive ones with R. While the identitivity theorems are restrictive, theorems like (19), which states that all non-translation relations are translated, are generative in nature.

Some of the theorems immediately follow from the axioms described above plus other theorems, but most of them are stipulations intended to ascribe desirable closedness properties to the LM. For instance, theorem (12) stating that a transitive relation is always implied, immediately follows from the axioms on transitivity and implication. On the other hand, there are several theorems on right or left identitivity of an axiom instance (2, 9, 10, 14), which do not follow from anything but are stipulated in order to enforce closedness of the LM. E.g. theorem (2) states that the converse axiom instance is right identitive. By the axiom "Converse", a relation R might have two different relations as its converses which will have common links as long as R is not empty. Theorem (2) now enforces that there is at most one converse for any given relation R. By the converse axiom, this relation may be a superset of the set of converses of R.

1. R converse R1

2. [R] R1 converse R & R1 converse R2 → R = R2

3. [R] R converse R → bidirectional(R)

4. [G] R converse R1 → R compatible R1

5. [G] R compatible R1 & R1 compatible R2 → R compatible R2

6. [G] R compatible R1 → R1 compatible R

7. [R] ¬(R compatible R1 & R1 translate R)

8. [R] R transitive R1 & R1 transitive R2 → R1 = R2

9. [R] R transitive R1 & R transitive R2 → R1 = R2

10. [R] R transitive R1 & R2 transitive R1 → R = R2

11. [G] (ALL R,R2)(EX R1,R1') R transitive R1 & R1 converse R2 ↔ R converse R1' & R1' transitive R2

12. [G] R transitive R1 → R implication R1

13. [R] ¬(R1 transitive R2 & R1 converse R2)

14. [R] implication is not right or left identitive

15. [G] for all R ∈ LM: R implication R ²

16. [G] R1 implication R2 & R2 implication R3 → R1 implication R3

17. [G] R implication R1 → R compatible R1

18. [R] ¬(R1 implication R2 & R1 converse R2)

19. [G] translation(R1) & ¬translation(R2) → R1 translate R2
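How generative and restrictive theorems interact can be illustrated with a small forward-chaining sketch. It covers only theorems 4, 6, 12 and 17 (generative) and 13 and 18 (restrictive) from the list above, represents an axiom instance "R kind R1" as the tuple (kind, R, R1), and raises an error where the shell would block an inadmissible instance. The function name and this error behaviour are assumptions, not the product's interface.

```python
def license_instances(instances):
    """Close a set of axiom instances under theorems 4, 6, 12 and 17,
    then reject combinations excluded by theorems 13 and 18."""
    closed = set(instances)
    changed = True
    while changed:
        changed = False
        for kind, r, r1 in list(closed):
            new = set()
            if kind == "converse":       # theorem 4
                new.add(("compatible", r, r1))
            elif kind == "compatible":   # theorem 6
                new.add(("compatible", r1, r))
            elif kind == "transitive":   # theorem 12
                new.add(("implication", r, r1))
            elif kind == "implication":  # theorem 17
                new.add(("compatible", r, r1))
            if not new <= closed:
                closed |= new
                changed = True

    for kind, r, r1 in closed:           # theorems 13 and 18
        if kind in ("transitive", "implication") and ("converse", r, r1) in closed:
            raise ValueError(f"relation {r1} cannot be both {kind} and converse of {r}")
    return closed


# Hypothetical relation numbers 1 and 2: "1 transitive 2".
licensed = license_instances({("transitive", 1, 2)})
```

Defining relation 2 as the transitive of relation 1 thus yields, without further user input, the implication instance and the compatibility instances in both directions.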

The Thesaurus Administrator/2 TMS

The relational logic approach is implemented in IBM's TMS product Thesaurus Administrator/2. Two implementation characteristics of this system, namely, the update process for sets of item changes and the view concept of Thesaurus Administrator/2, will be explained in detail in this section. Generally, Thesaurus Administrator/2 is a two-process system where the background process is a PROLOG program which does the logical administration of the LM, and the foreground process runs an object oriented network editor. While the update/verify action transfers user data to the LM, views contain subsets of LM item data and are retrieved from the LM according to certain parameters.

Updating the LM

Thesaurus Administrator/2 allows to update all LM data, both global and specific ones, at any point in time. The update action is to send a set of user modifications from the foreground editor to the background, check their consistency with the LM, and change the LM accordingly. With respect to item data, possible actions on terms are add, delete, and modify. Link items can be added or deleted, but not modified. All item data is updated so that a set of item modifications, i.e. pairs of items and actions, is evaluated against the LM under a best-solution regime. This means that for each item to be changed it is tried to find an interpretation with respect to the local context ³ and the current LM such that application of the item-action pair to the current LM is admissible. As an (easy) example, adding a link incompatible with another link and deleting this other link within the same set of modifications should not elicit an incompatibility message from the system.
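One way to picture the best-solution regime is a loop that keeps retrying pending item-action pairs until no further pair becomes admissible. The sketch below is an assumption about how such a regime could behave, not the algorithm of Thesaurus Administrator/2; the function names and the simple incompatibility test are hypothetical.

```python
def apply_modifications(links, modifications, incompatible):
    """Apply (action, link) pairs in whatever order makes each one admissible."""
    current = set(links)
    pending = list(modifications)  # the local context: pairs not yet applied
    progress = True
    while pending and progress:
        progress = False
        for pair in list(pending):
            action, link = pair
            if action == "delete":
                current.discard(link)
            elif action == "add" and not any(incompatible(link, other) for other in current):
                current.add(link)
            else:
                continue           # not admissible yet; retry after other pairs
            pending.remove(pair)
            progress = True
    return current, pending        # pending now holds the pairs that were rejected


def same_pair_other_relation(x, y):
    # Assumed rule: two different relations must not link the same two terms.
    return x[0] != y[0] and {x[1], x[2]} == {y[1], y[2]}


# Hypothetical relation numbers: 1 = hierarchic, 2 = associative.
old_link = (1, "dogs", "animals")
new_link = (2, "dogs", "animals")
result, rejected = apply_modifications(
    {old_link},
    [("add", new_link), ("delete", old_link)],
    same_pair_other_relation,
)
```

Although the add is listed first and conflicts with the existing link, it is deferred until the delete has been applied, so the set of modifications goes through without an incompatibility message.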

For global data, i.e. relation or attribute specifications,⁴ the update actions modify, delete, and add are applied to single actions. Three cases have to be distinguished:

1. Trivial changes of the LM, e.g. modifying a relation name, do not affect the inferential behaviour of the LM. No checking is performed.

2. Lowering restrictions, e.g. deleting a converse connection, going up in the type hierarchy, or defining a default value for an attribute, may indeed affect inferential behaviour of the LM, but cannot introduce any consistency conflicts. Especially, no conflict involving item data arises through lowering restrictions, so that checking is performed on relation definition level, but not on item level.

3. Raising restrictions may introduce consistency conflicts, though, of course, it does not necessarily. Examples of restriction raising are the introduction of a new, derived relation, deleting a relation, or going down in the type hierarchy for a given relation. Raising restrictions requires checking both at the item and the relation level. At the item level, all links of the respective relation are deleted and re-added, so that consistency with the LM is checked, though not on a best-solution basis, and messages can be brought up in case of consistency conflicts.

² Holds, but is not asserted explicitly for performance reasons.

³ The local context is the set of all item-action pairs that have not yet been applied to the LM.

⁴ We do not consider here global changes that are unrelated to the inferential properties of an LM.

Thesaurus Administrator/2 view concept

Thesaurus systems often have been measured by the kind of display they offer (e.g. [9]). Thesaurus Administrator/2 organizes its displays on the basis of subsets of the LM retrieved with the inferential capabilities that have been explained above. These subsets, enriched with display information (mainly rectangular coordinates), are called views. A view is specified through a set of parameters including an optional root term or partial string, from none to all relations, an optional display depth and, of course, the view type. Currently, there are four view types supported in Thesaurus Administrator/2:

1. The tree view shows one unidirectional relation in its direction, departing from a root term, or from all top level terms when no root is specified. It is limited by a tree depth parameter, which indicates the maximum number of levels (with root = level 1) to be displayed. The tree view is intended to provide a quick, dense display of the hierarchical backbone of a thesaurus, as illustrated in Figure 4.

Figure 4. Display of a tree view in Thesaurus Administrator/2

2. The area view shows the environment of a center term, over one to all relations and again for a limiting area depth. Relations are displayed independent of their direction in order to support relation development. Several different display algorithms are available for this view. For example, Figure 5 shows a radial display of an area comprising a hierarchical relation in German, translations of its terms into English, and the computed hierarchical relation over the English term set.


Figure 5. Display of an area view in Thesaurus Administrator/2

3. Different from tree and area view, the level view does not have any links. It is defined for unidirectional relations: if a center term is given, the level view displays an alphabetical sort of all its sister terms, otherwise the top level terms of the relation.


4. The find function uses, graphically, a level view. It brings up a set of terms matching a term specification which may contain masking characters and any Boolean ANDing of attribute-value pairs.
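The retrieval side of the four view types can be sketched as follows. Display information such as coordinates is left out. Everything here is an assumption made for illustration: links of one unidirectional relation are given as (source, target) pairs, the level view includes the center term among its sisters, and the masking characters are taken to be the shell-style wildcards `*` and `?` of Python's `fnmatch` module.

```python
from collections import deque
from fnmatch import fnmatchcase


def children(links, term):
    return sorted(t for s, t in links if s == term)


def top_terms(links):
    """Terms that occur as a source but never as a target."""
    return sorted({s for s, _ in links} - {t for _, t in links})


def tree_view(links, root=None, depth=2, level=1):
    """Nested (term, subtrees) pairs down to `depth` levels, root = level 1."""
    if root is None:
        return [tree_view(links, top, depth) for top in top_terms(links)]
    if level >= depth:
        return (root, [])
    return (root, [tree_view(links, c, depth, level + 1)
                   for c in children(links, root)])


def area_view(links, center, depth=1):
    """Terms within `depth` links of `center`, ignoring link direction."""
    seen = {center: 0}
    queue = deque([center])
    while queue:
        term = queue.popleft()
        if seen[term] == depth:
            continue
        for s, t in links:
            if term in (s, t):
                other = t if s == term else s
                if other not in seen:
                    seen[other] = seen[term] + 1
                    queue.append(other)
    return seen


def level_view(links, center=None):
    """Alphabetical sister terms of `center`, or the top level terms."""
    parents = {s for s, t in links if t == center}
    if center is None or not parents:
        return top_terms(links)
    return sorted({t for s, t in links if s in parents})


def find(terms, pattern, **attributes):
    """Names matching `pattern` that carry all given attribute values."""
    return sorted(
        name for name, attrs in terms.items()
        if fnmatchcase(name, pattern)
        and all(attrs.get(k) == v for k, v in attributes.items())
    )


links = [("vehicle", "car"), ("vehicle", "truck"), ("car", "sedan")]
terms = {"car": {"lang": "en"}, "cart": {"lang": "en"}, "Karre": {"lang": "de"}}

tree = tree_view(links, "vehicle", depth=2)
area = area_view(links, "car", depth=1)
level = level_view(links, "car")
hits = find(terms, "car*", lang="en")
```

With a depth of 2 the tree view stops at "car" and "truck", while the area view around "car" reaches both "vehicle" and "sedan" because direction is ignored. An area view over several relations could be obtained by passing the union of their links.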

Outlook: Thesauri in Semantic IR

The logic-based TMS approach takes a step in generalizing the traditional thesaurus concepts in the sense of [1]. Reformulating thesauri as semantic networks clarifies that thesaurus terms have the epistemological status of concepts. Once representation issues are settled along the established lines of terminological logic, questions on the acquisition of semantic data and their application in SIR arise. A considerable literature has addressed thesauri as a critical resource in SIR, e.g. [4,5,9,11]. We expect further SIR development to profit significantly from a well-founded, logic-based thesaurus representation.

Acknowledgement

I would like to thank all my colleagues at IBM whose ongoing support has been a permanent motivation for me. But especially I would like to thank my friend and colleague Franz Faerber, now at the SAP company in Walldorf/Germany, who has done most of the coding of IBM's thesaurus systems and whose cooperation was a prerequisite for this project to become real.

References

1. Brachman, R.: On the Epistemological Status of Semantic Networks, in: Brachman, R. and Levesque, H. (eds.): Knowledge Representation, Amsterdam (1985)

2. Chan, L.M.: Subject Analysis Tools Online: The Challenge Ahead, Information Technology and Libraries 9, 3, 1990

3. Enguehard, Ch., Malvache, P., Trigano, P.: Automatic Acquisition of a Natural Semantic Network for Information Retrieval Systems, Proc. of Applications of Artificial Intelligence X, Orlando 1992

4. Fox, E.A.: Development of the CODER system: A testbed for artificial intelligence methods in information retrieval, Inf. Proc. and Mgmt. 23, 4, 1987

5. Larsen, H.L. and Yager, R.R.: The Use of Fuzzy Relational Thesauri for Classificatory Problem Solving in Information Retrieval and Expert Systems, IEEE Transactions on Systems, Man, and Cybernetics, Vol. 23, 1

6. Meyer, I. and Skuce, D.: Towards a new generation of terminological resources: An experiment in building a terminological knowledge base, Proc. COLING-92, Nantes 1992

7. National Information Standards Organization: Proposed ANSI Guidelines for Thesaurus Structure, Construction and Use, ANSI Z39.19-1990

8. Paice, Ch.: A Thesaural Model of Information Retrieval, Inf. Proc. and Mgmt. 1991, 2

9. Pollard, R.: Hypertext presentation of thesauri used in online searching, Electronic Publishing 3, 3, 1990

10. Rohou, C.: La Gestion Automatisée des Thesaurus - Étude comparative des Logiciels, Documentaliste 24, 3, 1987