On Legal Texts and Cases

11
On Legal Texts and Cases Rosina Weber [email protected] RuaMaestro AldoKrieger, 138/403 Florian6polis, SC88.037-500 BRAZIL phone: 55-48-234-6574 Alejandro Martins [email protected] phone: 55-48-331-7055 Ricardo M. Barcia [email protected] phone: 55-48-331-7008 Federal Universityof Santa Catarina, BRAZIL Graduate Program of Production Engineering Laboratory of Artificial Intelligence Campus Universitfirio -Trindade CaixaPostal: 476 CEP 88.010-970 Florian6polis - SC Abstract The search employed by judicial professionalswhen seeking for past similar legal decisions is known as jurisprudence research. Humansemploy analogical reasoning when comparing a given actual situation with past decisions, noting the affinities between them. In the process of being reminded of a similar situation when faced to a new one, Case-Based Reasoning (CBR) systems simulate analogical reasoning. Judicial professionals havetwo sources of jurisprudence research: books and database systems.The search in books is time-consuming and imprecise due to the limitations of humans’ memory. Available text database systems do not guarantee the retrieval of useful documents. PRUDENTIA is the case-based reasoner tailored to the Brazilian system that confers efficiency to jurisprudence research. Judicial cases are described with natural language text, comprising a collection of textual documents. These texts are the experiences that require case engineering to be modeled in a structured representationof cases. We have developed an automatic means of performing the case engineering, that is, converting legal texts into structured representation of cases. Examples of PRUDENTIA demonstrate the power of similarity-based retrieval in a textual CBR systemagainst text database applications improving the usefulness of the documents retrieved. Copyright ©1998,American Association for Artificial Intelligence(www.aaai.org). All rights reserved. 1. Introduction The issue of textual case-based reasoning comes up when textual documents contain descriptions of experiences of a given domain knowledge. A case should express an experience, but cases also have to be manipulated within the Case-Based Reasoning (CBR) architecture. So, textual documents have to be represented as cases; texts shouldbe referred to as the original source of description of an experience, whereas cases are the entities in a case-basedreasoner that permit experiencesto be manipulated and retrieved. Allowing the proper handling of such experiences demands a more structured representation of the experiences. Eachexperience, or class of experiences, has to be evaluated in order to be properly associated with the goal task in a CBR system. Medical experiences might be related to diagnostic tasks, while legal experiences may relate to interpretation or classification. Once the reasoner’s task is defined, a knowledge engineer can evaluate whether one or more classes of experiences should be used as the knowledgesource of the system. After defining the scope and the task of the system, it is easier to envision a proper structure to represent the chosen experiences. Knowledge engineering deals with the acquisition and representation of domain theory, rules of thumb and any available knowledge that humanexperts can provide and express. The wayof facing an experience within a given domain knowledge in terms of what to highlight in a case representation is a controversial and polemic issue. Narrowing such abstraction into more specific problemsis an appropriate guideline. Hence, let us discuss the viewpoint of representing experiences with 4O From: AAAI Technical Report WS-98-12. Compilation copyright © 1998, AAAI (www.aaai.org). All rights reserved.

Transcript of On Legal Texts and Cases

On Legal Texts and Cases

Rosina [email protected]

Rua Maestro Aldo Krieger, 138/403Florian6polis, SC 88.037-500 BRAZIL

phone: 55-48-234-6574

Alejandro [email protected]

phone: 55-48-331-7055

Ricardo M. [email protected]

phone: 55-48-331-7008Federal University of Santa Catarina, BRAZILGraduate Program of Production Engineering

Laboratory of Artificial IntelligenceCampus Universitfirio -Trindade

Caixa Postal: 476 CEP 88.010-970Florian6polis - SC

AbstractThe search employed by judicial

professionals when seeking for past similarlegal decisions is known as jurisprudenceresearch. Humans employ analogicalreasoning when comparing a given actualsituation with past decisions, noting theaffinities between them. In the process ofbeing reminded of a similar situation whenfaced to a new one, Case-Based Reasoning(CBR) systems simulate analogicalreasoning. Judicial professionals have twosources of jurisprudence research: booksand database systems. The search in books istime-consuming and imprecise due to thelimitations of humans’ memory. Availabletext database systems do not guarantee theretrieval of useful documents. PRUDENTIA isthe case-based reasoner tailored to theBrazilian system that confers efficiency tojurisprudence research. Judicial cases aredescribed with natural language text,comprising a collection of textualdocuments. These texts are the experiencesthat require case engineering to be modeledin a structured representation of cases. Wehave developed an automatic means ofperforming the case engineering, that is,converting legal texts into structuredrepresentation of cases. Examples ofPRUDENTIA demonstrate the power ofsimilarity-based retrieval in a textual CBRsystem against text database applicationsimproving the usefulness of the documentsretrieved.

Copyright ©1998, American Association for ArtificialIntelligence (www.aaai.org). All rights reserved.

1. Introduction

The issue of textual case-based reasoning comesup when textual documents contain descriptions ofexperiences of a given domain knowledge. A case shouldexpress an experience, but cases also have to bemanipulated within the Case-Based Reasoning (CBR)architecture. So, textual documents have to be representedas cases; texts should be referred to as the original sourceof description of an experience, whereas cases are theentities in a case-based reasoner that permit experiences tobe manipulated and retrieved. Allowing the properhandling of such experiences demands a more structuredrepresentation of the experiences.

Each experience, or class of experiences, has tobe evaluated in order to be properly associated with thegoal task in a CBR system. Medical experiences might berelated to diagnostic tasks, while legal experiences mayrelate to interpretation or classification. Once thereasoner’s task is defined, a knowledge engineer canevaluate whether one or more classes of experiencesshould be used as the knowledge source of the system.After defining the scope and the task of the system, it iseasier to envision a proper structure to represent thechosen experiences.

Knowledge engineering deals with theacquisition and representation of domain theory, rules ofthumb and any available knowledge that human expertscan provide and express. The way of facing an experiencewithin a given domain knowledge in terms of what tohighlight in a case representation is a controversial andpolemic issue. Narrowing such abstraction into morespecific problems is an appropriate guideline. Hence, letus discuss the viewpoint of representing experiences with

4O

From: AAAI Technical Report WS-98-12. Compilation copyright © 1998, AAAI (www.aaai.org). All rights reserved.

cases within the legal domain and figure out some lessonsthat may be extended to other fields.

The distinction between domain theory and factsof life has been pursued by several authors in ArtificialIntelligence and Law (AI&Law) such as Smith (1987,1997a, 1997b), Branting (1991), Valente (1995). Tryingto discriminate between these two different aspects inlegal experiences has not proved efficient, as most ofthese researchers are still searching for approaches thatenable intelligent legal systems to succeed in real worldapplications. Hence, we suggest to interpreting the legalexperiences in a very human fashion in instead ofseparating domain theory from other aspects. Judicialprofessionals are the experts that interpret and deal withlegal experiences. The lawyers’ expertise is indeedinterpreting legal texts that describe legal experiences.Thus, this is the ability that might be represented in anintelligent legal system: the lawyers’ expertise ininterpreting legal experiences. Consequently, the casestructure should follow guidelines that comprise the legalexpert’s standpoints of the legal experiences, and theknowledge about this interpretation is the object to beelicited.

Once the structured representation of a case isobtained through a knowledge engineering effort, theproblem results in mapping the textual experiences thatembed the case content into the structured representationof cases. This conversion is what enables the developmentof the PRUDENTIA system, an interpretive case-basedreasoning system that retrieves the most useful cases tosupport jurisprudence research. The ultimate goal of thisproject is to provide judicial professionals with anintelligent research tool enabling a quick and efficientjudicial system. Cases in this reasoner are descriptions oflegal decisions that are originally written in naturallanguage text. PRUDENTIA searches for legal situationsthat can be useful in teaching lessons to a new situation.The system returns similar situations that are foundthrough analogical reasoning simulated by the CBRinference. This current version of PRUDENTIA comprisesdescriptions of 3,500 legal decisions that representknowledge source of jurisprudence research. The case-based reasoner performs analogical reasoning, comparinga new legal situation to the legal decisions in the casebase, and returns a set of similar situations.

2. Mapping Texts into Cases in the LegalDomain

Our specific problem is mapping a textualdescription of a legal experience into a structuredrepresentation under the guidelines provided by expertknowledge. The guidelines impose goals and constraintsto keep the structured representation consistent with theexpert interpretation of a legal experience. In practicalterms, the correct representation must result in legalexperts comprehending the same content when reading the

textual description as when reading the structuredrepresentation.

Employing expert guidelines does not excludeknown CBR guidelines to define an indexing vocabulary.On the contrary, they transcend them. One basicrequirement is that the expert is able to envision the wholecollection of experiences as to anticipate values for everycharacteristic. Let us describe the methodology applied inthe development of the PRUDENT|A system.

The system performs the same task as judicialprofessionals when searching for legal cases injurisprudence. When this task is performed by humanexperts, they conduct the search by comparing aninterpretation of a given legal situation to interpretationsof the legal descriptions. Experts seek for similar legalsituations that might provide insights to the new situation.Human experts employ reasoning by analogy (Durkin,1994) when performing this task. Therefore, CBR is theappropriate technology to accomplish the task of research,as the only intelligent paradigm that simulates analogy.The result of the development of this large CBR system isequivalent to furnishing a human expert with the memorycapacity and speed of a computer.

Jurisprudence research is employed and requiredin several activities within the legal domain. The legalprofession embodies different activities ranging fromadjudication and consultation to legal administration andeducation. Predominant activities of judicial professionalscan be categorized into different fields of legal activity:legal planning, argumentation, adjudication, legalmanagement, legal analysis, teaching and legislation,among others.

In the legal activity of adjudication, judges aresubject to a methodology supported by laws to buildingsentencing. In this methodology, one of the requirementsis that judges must use jurisprudence research as part ofthe process of stating and supporting their decisions.Meanwhile, defense attorneys and prosecutors attempt toprove their points, laying the groundwork ofargumentation on jurisprudence. Within most of theseactivities, jurisprudence research stands as a relevant toolthat augments the correctness of every task. Effectivejurisprudence research promotes a just society.

The focus of our research is Brazilianjurisprudence. The Brazilian system of law is civil law,which is derived from Roman law and which is practicedin most European countries. Civil law has organized codesas the main source of law. This systems differs from thecommon law system of America and England in which thebasic source is case-by-case judicial decisions. In theBrazilian system, decisions are consequently one sourceof the Law but it is not the most important one.

As the source for legal decisions, we are focusingon legal decisions produced for criminal appeals by theState Court of Justice (SCJ) of Santa Catarina - intermediate appellate court - in the period from 1990 to1996. SCJ records from this period consist of 17.2 Mb of3,447 machine readable complete descriptions of legal

41

cases (not only abstracts). These records comprise 2.5x106words, with 107 characters. These descriptions are thebasic entity of our application. They describe theexperiences that are the cases in the CBR system.

3. Mapping Legal Decisions into Cases in

PRUDENTIA

The presence of stereotypical substructures in thelegal texts facilitates the process of performing theautomatic mapping of the experiences into cases. Branting& Lester (1996) suggested rhetoric structures of legaldocuments what oriented us in the definition ofsubstructures that can easily relate to some aspects of eachexperience. Experts can associate each substructure tosome important information that can be used to valuefeatures in a formlike representation of cases.

Inew legal I

situation

SmJA’nONASSESSmeNT

targetcase

user

~s sorted by

L.~ similarity

TSIMILARITY ,I

ASSESSIVENT P_,~~

.= base

Figure 1. System architecture.

The conversion of textual experiences intostructured representations of cases is performed throughthe steps of organizing the textual experiences infunctional substructures and associating thesesubstructures with features in a formlike caserepresentation. Since we have small parts of texts wheresome information can be extracted, the problem is nolonger CBR related, but from this point is instead a naturallanguage problem.

PRUDENTIA’S basic architecture is laid out inFigure 1. The inference starts with the identification of anew legal situation. This occurs when a judicialprofessional performing usual legal activities encounters anew legal situation that requires jurisprudence research.The legal professional starts a session in PRUDENTIA withan interpretation of this new legal situation in mind. Thesystem attempts to elicit the new legal situation from theuser’s mind through the process of situation assessment.Situation assessment methods infer values to assign theattributes in the formlike representation of system’s cases,

modeling the new legal situation in the same fashion ascases in the case base. The system then compares the newlegal situation - henceforth referred to as target case - toevery candidate case in the case base. A similarity metricmeasures the value of each similarity that is used to sortcandidate cases to comprise the outcome of an iteration.The next sections present the participating processes andissues in the implementation of PRUDENTIA.

The case base comprises the collection of casesand the mechanisms used to connect cases to thearchitecture. Organizational structure in CBR theoryrefers to the way cases are organized in the case base. InPRUDENTIA we make use of a formlike representation ofcases that are organized in a flat structure. The flatorganization of cases is implemented through a relationaldatabase, allowing a great amount of cases in the casebase. Cases in this reasoner are modeled with a formlikerepresentation, i.e., a set of fields (attributes) properlyvalued.

The case engineering problem starts from atextual description of a legal experience that has to bemapped into a structured representation under theguidelines provided by expert knowledge. The guidelinesimpose goals and constraints that keep the structuredrepresentation in accordance with an expert interpretationof a legal experience. In practical terms, the correctrepresentation must result in most legal expertscomprehending the same content when reading the textualdescription as when reading the structured representation.

petition type The entire scope of the currentdomain comprises around 200 petitiontypes that vary in occurrence andimportance. Crim/nal appeals, cross-appeals, and habeas corpus areexamples.

number The number given to order legaldecisions.

reporter The name of the reporter who issueslegal decisions.

district The district where the original act thattriggered the lawsuit has taken place.

page Localizes the decision in the textualfile.

date Date of decision.foundation(1 ,n) Foundation is the basis on which an

appealisfounded.theme(1 ,n) Seconda~ legal issues and

circumstances.seconda~laws Secondary laws that may be brought

up to support formal actions.category The charges (felony or misdemeanor)

that correspond to an article of a lawor to the Constitution.

result The court decision, either positive,negative or neutral.

unanimity Unanimous decisions are not eligibleto be reviewed by a hiQher court.

Table 1. Attributes in PRUDENTIA.

The formlike representation consists of a set ofattributes that embody the content and context of the

42

experience in which the knowledge will be conveyed.Every attribute properly valued represent a descriptor thatsupports case representation. Attributes in PRUDENTIA are:petition type, number, reporter, district, page, date,foundation (1,n) [for 1 to n values], theme (1,n),secondary laws, category, result, unanimity. Theseattributes are described next in Table 1.

The indexing vocabulary is the essence ofretrieval. Essential indexes are those that areindispensable in guaranteeing similarity assessmentsuccess and consequently ensuring an efficient retrieval.Foundation and theme comprise the set called essentialindexes. The qualification proposed for such indexesstems from their nature and relevancies in retrieval.

Basic indexes consist of values for attributescategory and petition type and at least two values ofessential indexes. The basic indexes constitute theminimum subset of indexes needed to start the situationassessment process. Hence, we consider these basicindexes as the minimum values to represent an experiencewithin the current context.

4. Natural Language, InformationExtraction and Template Mining

Text-based applications of Natural LanguageUnderstanding deal with several issues, among them, theone of extracting information from texts. InformationExtraction (IE) is the area concerned with extractingspecific types of information from large volumes ofunrestricted text containing information in some domain(Lehnert, 1993, 1994, 1996). An IE system must be inputwith domain guidelines that specify what to find and whatto extract. Within this field of knowledge, one very simpleand easy-to-use technique that has proven efficient isTemplate Mining.

Template Mining is a NLP technique thatextracts data from texts when the text forms recognizablepatterns from the target to be extracted or itssurroundings. A template carries information on what tosearch in the text and it is triggered to extract the partsindicated (Lawson et. al., 1996).

Employing template mining to the rhetoricalsubstructures defined in the textual documents representsa solution to the conversion of legal textual experiencesinto structured cases. Template mining methods search thesubstructures of the texts extracting values that are used toascribe features in the structured cases, representing andindexing them. Next, we briefly describe the methodsemployed to extract values to the following attributes.

4.1 Extraction of Values for IndexingAssignment

Cases in PRUDENTIA are represented through aformlike representation that comprises the attributes asdescribed above. Most attributes are single, i.e., receiveone single value, while indexes theme and foundation can

have multiple (1,n) values. The methods that makepossible the indexing assignment of each attribute arepresented next.

The value for petition type is extracted directlywithout any expert knowledge because the second line inthe heading of legal texts begin with this value. Only aconfirmation that the complete expression is extracted isrequired. This is performed by checking the list thatcomprises 204 petition types. This same procedure isexecuted for other attributes in the substructure heading,that are reporter, date, district, number and page.

Category refers to the law that has originated thelawsuit, e.g., a felony or a misdemeanor. The format of theattribute is one of a list of words that represent the title ofthe article or the law. The possibility to value the attributewith the number of the article or law is future work.Meanwhile, we use the number of the law as a means ofeliciting the textual value for the attribute.

The method for category starts on thesubstructure body:categorization. This substructure is aparagraph that usually brings the specific article, law andsource. It is written in a sequence following the districtwhere the felony has been committed and after anexpression equivalent to "for infringing articles 26 & 97of Penal Code" (e.g., penal code).

The method extracts the law and its source andtranslates the information to the title of the category. Inthe current application, this process has ascribed values inabout 2,600 cases. The remaining texts do not contain thelaw specified and we have to turn to the following stage.

The second process in this method is held by asearch on the substructure abstract:main. This secondstage searches for the title of the category that is fed bythe results in first stage, complementing informationreused in former versions. This process resulted in another800 cases with values for category. The 100 left weregiven back to the experts for evaluation. Again, here weface a difficult problem: recognizing when a given valueis not present in the original text. This happens at times inappeals that are rejected without detailed considerations.The incremental process continues until every case isproperly valued.

Index foundation represents legal aspects ormaterial facts that substantiate an appeal or its decision. Adifferent strategy is deployed to extract values forfoundation: direct search for lists of expressions in therespective substructures. The most relevant values forfoundation are usually given in the abstract:main. Othersare found in substructure body:conclusion. Therefore, wemine first in abstract:main and afterwards inbody:conclusion.

The very interesting hint concluded from theknowledge acquisition is that some specific expressionssimply cannot mean anything else but a foundation,especially if they are mentioned in certain portions of thetext. Thus, experts come up with heuristics where we canassume that an expression such as first offender or

43

negligence necessarily indicates a foundation if it appearsin those substructures.

However, some expressions have the semanticmeaning guaranteed when extracted from the abstract thatdoes not hold for the substructure body. Hence, thismethod has an intermediate stage that treats some wordsdepending on the substructure they have been extractedfrom. Examples would be expressions such as blame anddifferent conjugations of the verb to confess that revealtheir relevance in that decision if they are mentioned inthe substructure abstract. These expressions within bodymay be simply part of an explanation, not necessarilyindicating an important issue within the context of thedecision.

The attribute foundation is multiple-valued. Thisis because it is an attribute responsible for representingevery aspect grounding the appeal. Moreover, we searchfor expressions in different substructures resulting in alarge number of values. However, it is possible that, insome cases, there are only one or two values forfoundation.

After extracting a number of values from acollection of cases, experts have noticed the resemblancebetween some words. The list of expressions had to bereviewed by experts who indicated expressions that workas synonyms within the legal context. For instance, thewords jail, custody, prison and penitentiary; andsometimes the verb to arrest. This list of synonymsimproves efficiency augmenting the retrieval of usefulcases in a human fashion.

In the current prototype, the process started fromthe reutilization of the list of expressions of the formerones. The incremental process took place until every casewas properly valued.

The index secondary laws refers to articles oflaws that are mentioned throughout the texts. This mayhappen when a different categorization is pursued or, forinstance, when no substantial matter is to be considereddue to an annulment caused by some error of law. The lawerrors are usually indicated by the respective law.

The law articles may indicate arguments used byone of the parts in validating assumptions. The differentsources of law demand a two dimensional valued attributerepresented at the level of the number of the article or lawand the source, such as Article 12 of Federal Constitution.

Values for secondary laws are not searched inany specific substructure, but in the whole text. Themethod is implemented by first extracting the articles thatrefer to the categorization, since they are the value of theattribute category. Then, we select only portions of thetext that contain numbers. Template mining is used to findthe valid sources after expressions, such as article and itsvariations, through wild cards.

In the substructure abstract, the last paragraphstarts with a sentence where the value for unanimity canbe extracted: abstract:result. In the occurrence of adissenting opinion, the text reads under majority of votes.There might be at most ten inflections to express such

characteristic, making it very simple. The value forunanimity is Boolean, because a decision is eitherunanimous or not.

There are different fashions to express if apetition has been affirmed or not, and these forms vary incorrespondence with the type of petition. The substructurein the texts is very stereotypical. The knowledgeacquisition was carried out for the second prototype forhabeas corpus and criminal appeals. As the forms ofindicating the result vary in accord with the petition type,we were able to implement only one method. The methodselects the substructure abstract:result and runs a kind of ademo rule verifying the petition type and orienting to aspecific knowledge base where the respective list ofexpressions is.

This is the only method that was implemented inProlog, as the use of logic programming turned out to bemore efficient than the use of wild cards for this attribute.Except for the rules and the design of different knowledgebases, this method is not amenable to template mining forsingle expressions. The expressions resulting from theknowledge acquisition process yielded expressions thatdemanded a NLP treatment.

The first requirement for the rules related to theresult is the petition type, because the result depends uponthe petition type, and the result may be expressed withdifferent terms. For instance, in petitions for habeascorpus, the verb used to express its acceptance isconceder (concede, affirm, accept), whereas the verbdenegar (refute, reject) is used to reject the petition. different types of petitions, other verbs are employed toexpress acceptance, such as the verb prover, that is asynonym of accept although it is not used in certain typesof petitions. This information is obtained by theknowledge acquisition step. It narrows the problem in asuch a way that we can draw rules as, "If petition type ishabeas corpus then search in the substructureabstract:result for the verbs conceder and denegar".

This example demonstrates the use of expertknowledge in orienting the search for the proper values inthe text. The system is designed to return a warning if avalue is not found. Whenever a new expression is used bya reporter avoiding the system to trigger any rule, thesystem informs this failure and a new rule is created. Thisdevice guarantees efficiency and aids the maintenance ofthe system.

The index theme refers to some secondaryaspects or circumstances that characterize cases. Thecomplexity of these indexes stems from the fact that theywere defined to complete the universe of the attributes indescribing the content and context of the experiences onlegal decisions. Values for attribute theme that may bepresent in a legal case can be grouped into classes of thesame nature: the class of tests required (mental healthevaluation required, evaluation of drug dependencyrequired); the class of application (application forabatement, application for suspension, application forabatement); the class of external context (traffic accident,

44

strikes, penalty reduction). The assignment of values forthis index completes the task of automatic indexassignment, as the definition of this attribute hascompleted the task of case representation.

In the current prototype, this index has beenvalued via template mining applied to the selectedportions of substructures abstract:main and body. Theincremental process of knowledge acquisition was used on5% of the cases. This is the only attribute to which analternative method was conceived, i.e., the reuse of caseswith elaboration.

5. Retrieving and Reusing ExperiencesCase retrieval results from the similarity

assessment performed between each candidate case andthe target case. Similarity assessment follows expertguidelines in terms of comparing and contrasting relevantvalues to the proper interpretation of the content of thelegal experiences. The fact that similarity assessment isemployed at the level of the values indicated by expertsensure a reliable comparison. The legal expert is the onlyone who knows what makes a legal experience similar toanother. Following experts’ guidelines ensures an effectivesimilarity assessment.

Besides indicating at what dimensions tocompare experiences, experts also specify how to comparethem in terms of a range of similarity. For instance, if twovalues are absolutely different or similar, the valuesassigned are 0 and 1, respectively; if they are very similaris assigned a value .8.

The advantage of comparing legal experiencesunder expertise guidance is that it reduces the gapbetween usefulness and similarity. Useful experiences aremore likely to be reused. Experts can indicate what typesof values better index cases in order to explore theirusefulness for further reuse. The representation ofexperiences through structured cases results in an abstractinterpretation of the experience, another advantage incomparison to database approaches.

6. Textual CBR vs. Text DatabasesThe structured representation provided by the

knowledge engineering effort results in an interpretationof the legal experiences proving another advantagebesides enabling the similarity assessment betweenexperiences. The valued features comprise aninterpretation - an abstraction - that aims at providing tothe expert the same information as the original text does.This results in a huge time savings, as experts and usersdo not need to read the whole textual description tounderstand and evaluate the usefulness of a givenexperience to the current problem.

Text databases employ statistical indexing thatserves exclusively the purpose of retrieving cases. Textdatabases retrieve the whole texts from each query,

forcing the users to read each text to decide their utility.The low precision of these systems causes the retrieval ofmany useless documents increasing difficulty of the task.Text databases evaluation of efficiency can be performedby two parameters: recall and precision Salton (1975).Recall is the proportion of useful documents that areactually retrieved from the base. Precision refers to theratio of actually useful documents that are retrieved.Database systems have been found to be limited to a recallof 25% of relevant cases (Blair and Maron, 1985),meaning that the user has to read all the texts retrieved toconclude that only 25% are useful. The low quality of theretrieval may be dangerous in the legal domain where realrelevant issues are under question.

The low accuracy of text databases might resultfrom the use of statistical methods of indexing. Statisticalmethods do not use knowledge, i.e., they select termsdepending upon their frequency of occurrence. Bycontrast, the similarity-based retrieval employed in CBRsystems is essentially based on knowledge. A knowledge-based indexing process guarantees more efficiency,because precision increases as the indexes guidingsimilarity and retrieval are chosen with expertiseknowledge increasing the chances of retrieving usefulexperiences. A statistical indexing might select twoindexes that are two versions of the same expression,increasing the importance of the documents that use thetwo versions and decreasing the chances of an equallysimilar document that may have used only one version.Moreover, knowledge-based indexing avoids low levels ofprecision since the chances of retrieving uselessexperiences decrease.

7. ExamplesA new session in PRUDENT1A begins with a

research issue brought up by a judicial professionalperforming any legal activity. This legal expert becomesthe expert user that uses PRUDENTIA to perform anintelligent jurisprudence research to figure out lessons andsolutions to a given legal situation carried in mind.

The first goal of the system is to elicit this newsituation from the expert user’s mind. The user is firstasked to fill out values for petition type and category andthen asked to write down a brief summary of the situationthat originated the session.

7.1 Example possessionIn this example, the summary of the legal

situation reads as follows: "The defendant wants toappeal from his convictions of illegal possession of drugsbased on the small quantity of cocaine confiscated thatindicates the absence of commercial purposes".

From the summary written by the user, thesituation assessment process in PRUDENTIA is able toassign values for indexes as follows:petition type = criminal appeal

45

category = illegal possession of drugsfoundation (1) commercial purposefoundation (2) confiscatedtheme (1) quantity of drugtheme (2) cocaine

In the present example, the system considers theindexes assigned to be sufficient and uses these values tocreate the target case. The retrieval results in two casesvalued with 100% similarity. The system offers an optionwhere you can see all descriptors of the selected case.This is a very important feature because one of thecomplaints of users of available database systems is thenecessity of reading the whole legal decision in order toidentify the its usefulness. The way cases are modeledprovides to the user the same result as a brief reading ofthe text.

If the user is motivated by a client who isresearching for an appeal, one of the aspects to check iswhether both cases scored with 100 had a positive result,meaning that the appeal was affirmed. Usually, a positiveresult is more likely to indicate a direction for the userwho wants an affirmative result, while negative results canwarn of possible failures. In this case, the first ranked caseis the only positive, thus this is the natural choice ofconducting research.

This example brings up the issue of whether theattribute result should be used as an index. A retrievalentailing result as an index would cause retrieval withhigher scores in cases in compliance with the resultdesired in target case. These are two combinations ofindexes that should be left to the user to decide.

Besides the fact that the quantity of the drugisIundersized, there are the lack of other elements toIauthorize the conviction in terms of the article 12 fromIthe Law number 6.368/76, such as the identification oflany witnesses that could have or had purchased the drugas well as devices usually used in drug traffic.

Figure 2. Excerpt from legal decision text.

The view of the formlike representation of theselected case allows the user to understand and interpretthe case without reading it. With such a knowledge-basedinterpretation of the experience, the user can decidewhether or not to reuse such experience. In the currentexample, the user considers the context of theinterpretation worthy of further reading.

From reading the legal decision, the user selectsthe excerpt laid out in Figure 2, since it teaches animportant lesson. The excerpt in Figure 2 is a paragraphthat states that besides the tiny quantity of drug, there arestill other elements missing that are necessary to form aconviction in terms of the applicable law. The missingelements mentioned could be the identification of awitness who has actually acquired drugs from thedefendant or even a device (such as a precision scale)used for commercial purposes.

The user should not stop the research processyet. There are two ways to continue: either searching for asecond similar case with an affirmative result, or selectingvisually by looking at the attributes of the first rankedretrieved cases.

Looking at the attributes that summarize thecontent of other cases, the user notes the fifth best rankedalso has a positive result and has a value for foundation(2) = annulment. This suggests taking a closer look other attributes and at the text. The interface showing theretrieved cases keeps a small window with part of thedecision of the selected case. The fifth case in the rank,describes an explanation for an annulment, i.e., lack ofconsciousness of the intention to sell the drug. Taking asecond look at the case attribute values, it is found that thevalue of theme (6) is violation of principle. From thedecision, the user extracts another lesson to supportargumentation, that is, "the violation of the correlationprinciple between sentencing and indictment nullifiesdecision".

The example demonstrates how easy it is for thejudicial professional to perform effective jurisprudenceresearch using PRUDENTIA. The example above alsohighlights the usefulness of keeping values for attributesthat are lessons and not necessarily only indexes to guideretrieval. Let us now compare a similar search in a textdatabase.

7.2 Comparison I

The first point of comparison refers to theselection of keywords to compose a query. Theknowledge-based interpretation in PRUDENTIA provides alexicon that allows words to be identified automatically bythe system. That is, when the user types a small abstract ofthe situation, the system is able to recognize if there arewords that are part in the lexicons. The closest procedurein a text database would be a manual search in the fieldsof words that occur in each field.

In the text database, the user has to compose aquery. Let us build a query with the same words as thevalues assigned in the example in PRUDENTIA. The queryis as shown in Figure 3:

"criminal appeal" cocaine confiscated commercialIpurpose illegal possession of drugs ’ quantity of drug" I

Figure 3. Query for example "possession".

This query resulted 23 documents. The user nowhas 23 legal decisions to read in order to decide on theirpossible usefulness to the initial situation. According toBlair and Maron’s evaluation, these 23 results indicate thatthere might be as many as 100 useful decisions in thisbase. We can agree with this estimate because:

¯ . Possible misspellings have not beenconsidered;

°. Documents from other types and categories arenot retrieved, since there is no partial matching;

46

°. Keywords are used at the same level ofimportance;

°. Mistakes are possible in building the query.One solution to possible misspellings is the use

of wild cards. However, even when available, wild cardsdelay the time of search significantly. The databasesystem illustrated does not allow more than one wild cardat the same query. Moreover, even if the system allowedas many wild cards as necessary, the user would have tolook at the field of words manually, searching for everyvariation of occurrence of each word, a time-consumingjob that few users would bear.

The query comprises an AND connector causingretrieval of only those documents that actually carry everyword in it. This avoids the retrieval of any document thatmight have at least one alternative value for an index.According to expert interpretation, a document containingthe same list of expressions with even two or three swapsmight be also useful. An alternative would be the use ofqueries with other connectors such as OR, and XOR thatare also available. The use of XOR is also exclusive andthe connector OR would result in a cost-benefit paradox,since the more documents are allowed to be retrieved, thelower is the precision.

The problem is indeed the equivalence in theparticipation of indexes in the query. The expressions donot contribute equally in building a content and this iswhat prevents retrieval of a set of documents sorted bytheir relevancy.

Finally, we have to consider that even one wrongdigit is enough to lower recall of a query. The chances ofmaking mistakes increases with larger queries. The textdatabase system showed in the example, for instance, doesnot have enough space to show intermediate results inlarge queries. We conclude that this system was notengineered with this purpose.

7.3 Example desertion

Another example refers to the legal situation thatoriginates the research has the following basic indexes:petition type = criminal appealcategory = child desertionfoundation (1) good causetheme (1) civil imprisonment

Retrieved cases are laid out in Figure 1. The fifthcolumn is indicates the category. The first seven cases arecategorized with child desertion except for the fifth one.This emphasizes the question upon the reasons that mighthave caused this case to be retrieved. Even before viewingthe attribute values or the legal decision, we conclude thatthis might be one case where the content is so similar thatits importance grows in relation to the category. This isenough to assume that this case might be useful to the newsituation.

........ ~7 fr ..................IF[I"TI ..............]1 ............................................................IIIII ..........

2B.4!y I’. j 10Q.00...::I/~pela~EoC,I,nhleIiAOANDONOtqLATERIAL extln;|odopunlhllldedcfzlted~mp,,24.6E0:21 i 84.00 ilApele~Eo Criminal ~.ABANDONO ~TERIALonlecedenles Imulel~llo d| lie2B 451 31 ! 84 00 iiiApelefmo C m ell iARANDONO MATERIALcertcledze;lo conlltale27.502:4P i GO.O0 !iiApelefllo Cr m nap ~ABANDONO MATF.RtALeonfhlelo cl;’n/urrte plebe

2E.6416’ G0.O0 iiApele;l~o Cdmlnel IAOANOONO MATERIAL32.387it i 60.00 iilApel,~l* Criminal AI~ANDONO MATERIAL eulpahllldade

:’E LAC~O CRIMINAL. ARGUW..~O DE NULIDADE PROCESSUAL. INEXI~’I~NCIA. RETORNO O0g AUTOII A ,JI~ROCURADORIA GERAL DE JUgTI(~A PAPA AN~LISE DO MI~ IRTO 00 RECURRO.q~toe, ,¢ietndoe ¯ dleculldos el~lee lule~ do ap~laQgo cflmlnel n. 20.~1. de ~om~lce de (~rld(Jml |21, Vale], em ~lUe ~ epelenle Deilton Joe~ Lutz. etndo ep©lede a Justt;i. per leo Promoter:

ACORDAM, em Segunde Cimace Cdmlnel, pot declaRo un&nlme, reiellar ̄ p~ellrnlnar ergUlda ¯ remeler ot autos |doute Procurndodo Gerel de J,stl~a. pace qua ee mnnlleste sable o m/~dlo.Co,tee ne foeme de lel.Ne eomeeee de Cddfimn. D/dLTON JOS~ LUIZ Iol denuncledo ¯ proceeeado pole prMIce do mime do ~d. 111. I2o. Vl. de C~dlgo Penal. rondo ~m ~nte qua no die 29 de lethe de 1987. emilio eem a deride provleh dt rondo|,o cheque n, 678373-2, do BESC daquela cldade, oo valor de Ct9 4O, OOO, OO, caulando dane econEmleo t emplesll

I I I I I.,IJJ ... J ill IUJIII

Figure 4. Retrieved cases from the example "desertion".

The fifth case ranked is labeled with number28.271 in the first column (henceforth referred to as legaldecision 28.271). This case is categorized asembezzlement. Reading the text we find that the decisionaffirms the defendant’s appeal for nullity because thedefendant had not been subpoenaed since he was alreadyin custody due to a civil imprisonment. This is the lessonthat justifies this case as a useful one to the research anddemonstrates the case deserves the position among thebest retrieved cases.

7.4 Comparison

Supposing that the same situation from theexample desertion that originated this research hadmotivated a similar research in a text database system.The user has to build a query, and the same values thatwere assigned in PRUDENTIA are used as keywords,namely criminal appeal, child desertion, good cause, civilimprisonment.

There are two ways of using keywords in textdatabases: as primary indexes that partition the base; andas simple keywords, that search for every occurrence inthe base. Values of the category, foundation, and thememay occur in any document associated to any type ofpetition. Hence, it seems that the type of petition shouldalways be used as a primary index in order to reduce thepossible number of documents retrieved. Conversely, ifone tries to use the type of petition as a keyword, it willcause retrieval of every occurrence of this expression,even in documents of different types.

Using special designated fields as primaryindexes always decreases precision in favor of a betterrecall. Therefore, to have a similar result for the researchas the one obtained in PRUDENTIA, the search could notexclude other values of category, for instance, becausethat would avoid the retrieval of useful case such as thelegal decision 28.271 described in the previous example.

47

8. Concluding Remarks

Initially, we have developed a prototype usingonly court decisions on habeas corpus petitions in murderlawsuits to demonstrate the potential of a case-basedreasoner to retrieve legal cases. The descriptors thatindexed the cases were chosen attempting to capturestrengths and weaknesses of the texts to provideusefulness in retrieval. The first prototype was developedin an application development tool and was tested with 22cases.

The response from legal experts motivated us todevelop a reasoner able to embody all types of legaldecisions. The legal experts suggested relevant descriptorsand some features to the interface. They also suggested afeature to perform new retrievals based on a smaller set ofdescriptors to be chosen by the user. The requirements ofdomain expert knowledge became evident in thedevelopment of the CBR problem areas such as similarityassessment and situation assessment. The implementationof the reasoner is essentially guided by expert domainknowledge.

The second prototype was then developed fortwo petition types in the criminal area: criminal appealsand habeas corpus. The case base comprises 138 casesthat have been modeled semi-automatically.

The third prototype, PRUDENTIA Prototype IIIembodies a collection of 3,500 cases that have beenautonomously converted into cases. The cases representthe experience of all criminal appeals that were submittedin the State Court of Santa Catarina in the period from1990 to 1996.

The next stages will be to generate a case base,first for all habeas corpus petitions in this same period andnext for all other petitions in the criminal area. This stepwill increase the size of the case base to approximately10,000, and the knowledge already elicited will be reused.The required knowledge acquisition will focus on legalaspects related to the new categorizations of the remainingsub-domains. These next stages focus on improvingexecution time and precision.

The next big effort in knowledge acquisitiontakes place in the beginning of the modeling of decisionsunder the civil area. The inclusion of civil decisions in thecase base comprises a new implementation of themethodology. As explained in section 5.2, the incrementalprocess starts from reusing knowledge from the previousimplementations and performs new knowledge acquisitionprocesses while it is necessary. In the assignment of thefoundation, most formal principles are reused. Concerningcircumstantial values, new issues must be considered sincethere will be nothing such as murder weapon. Howeverintention to cause harm is the same intention to commit acrime.

The knowledge embedded in texts has to bemade available for future use. The problem of retrievinginformation and knowledge from texts stems from theincreasing amount of information and knowledge thathumankind must deal with. Storing such information and

knowledge was facilitated by the advent of writing and ithas become even easier with the computing technology.Hence, the problem of accessing information has onlycome up as larger amounts of information were stored.Retrieving the information demands a computationalsolution, but this solution should consider human needsand approaches to retrieving information and knowledgeIt seems appropriate to consider a solution that embedsthe representation of some aspect of human cognition suchas the analogical reasoning.

Textual CBR systems outperform text databasesystems in efficiency in retrieving knowledge andinformation from texts. The advantage of CBR systemsstems mainly from the knowledge-based approach toindexing that is the essence of similarity-based retrieval.Since statistical methods select only indexes with amedium ratio of occurrence, terms that appear very oftenor very seldom may not be selected. In addition, termswith similar meanings might be selected misleadingretrieval results. This seems to be the main reason of thelow efficiency of such systems that supports the use ofknowledge-based systems to the legal domain.

The use of a case-based reasoning system toretrieve textual documents is a means of representing theknowledge-based part of the search through AItechnologies. In comparison with the use of a textdatabase system, parts such as the construction of a queryand the definition of the relative importance of thedocuments are automated. This increases efficiency sincethe search is not subject to human errors. Moreover, theintelligent system is consistent concerning domainknowledge.

9. References

Ashley, K. D. 1990. Modeling Legal Argument:reasoning with cases and hypotheticals. ABradford book. The MIT Press, Cambridge,Massachussetts.

Blair, D. C. and Maron, M.E. 1985. An Evaluation ofRetrieval Effectiveness for a Full-TextDocument-Retrieval System. Communications ofthe ACM 28(3):280-299.

Blair, D. and Maron, M. A. 1990. Full-text informationretrieval: further analysis and clarification.Information Processing and Management26(3):437-447.

Blair, D. 1996. STAIRS Redux: Thoughts on theSTAIRS Evaluation, Ten Years After. Journal ofthe american society for information science47(1):4-22.

Branting, L. K. 1991. Building explanations from rulesand structured cases. International Journal ofMan-Machine Studies 34:797-837.

Branting, L. K. and Lester, J. C. 1996. JustificationStructures for Document Reuse. In Advances inCase-Based Reasoning: third European

48

Workshop; proceedings/ EWCBR-96, 14-16,Lausanne, Switzerland, Ian Smith; Boi Faltings(ed.), Berlin: Springer.

Daniels, J. J. and Rissland, E. L. 1995. A Case-BasedApproach to Intelligent Information Retrieval. InProceedings of the SIGIR ’95 Conference.Seattle WA, USA: ACM.

Gelbart, D. and Smith, J. C. 1997. FLEXICON: AnEvaluation of a Statistical Ranking ModelAdapted to Intelligent Legal Text Management.In the Fourth International Conference onArtificial Intelligence and Law, ICAIL-93, 142-151. Vrije Universiteit Amsterdam, TheNetherlands: ACM.

Hahn, U. and Schnattinger, K. 1997. Deep KnowledgeDiscovery from Natural Language Texts. InProceedings of the 3rd Conference onKnowledge Discovery and Data Mining,KDD’97.

Lawson, M.; Kemp, N.; Lynch, M. and Chowdhury, G.1996. Automatic Extraction of citations from thetext of english-language patents -an example oftemplate mining. Journal of information science22(6):423-436.

Lehnert, W. and Sundheim, B., 1991. A PerformanceEvaluation of Text-Analysis Technologies. AIMagazine:81-94.

Riloff, E. and Lehnert W., 1992 Classifying Texts usingRelevancy Signatures. Proceedings of the TenthNational Conference of Artificial Intelligence,329-334.

Riloff, E. and Lehnert W., 1994 Information Extraction asa Basis for High-Precision Text Classification.ACM Transactions on Information Systems12(3):296-333.

Rissland, E.; Skalak, D. and Friedman, M. 1996. UsingHeuristic Search to Retrieve Cases that SupportArguments. In Case-Based Reasoning:Experiences, Lessons, and FutureDirections, Ill-124. David Leake, ed.:AAAIPress/The MIT Press.

Salton, G. 1975. Dynamic Information and LibraryProcessing. Englewood Cliffs, New Jersey:Prentice-Hall, Inc.

Smeaton, A.; Kelledy, F.; O’Donnell, R.; Quigley, I.;Richardson, R. and Townsend, E. 1995. LowLevel Language Processing for Large ScaleInformation Retrieval: What TechniquesActually Work. In Proceedings of the Workshopon Terminology, Information Retrieval andLinguistics, CNR, Rome, October 1995. In AlanSmeaton’s Online Publications. [Online]Availablehttp://lorca.compapp.dcu.ie/~asmeaton/pubs-list.html, [September 12,1996].

Smeaton, A. 1997. Information Retrieval: Still ButtingHeads with Natural Language Processing? InLecture Notes in Computer Science, Pazienza(ed.):Springer-Verlag. In Alan Smeaton’s OnlinePublications. [Online] Availablehttp://lorca.compapp.dcu.ie/~asmeaton/pubs-iist.html, [May 21,1997].

Smith, J.C. 1997. The Use of Lexicons in InformationRetrieval in Legal Databases. In Proceedings ofICAIL-97, 29-38. Melbourne, Australia:ACM.

Smeaton, A. F. 1995. Low Level Language Processing forLarge Scale Information Retrieval: WhatTechniques Actually Work. In Proceedings of aworkshop Terminology, Information Retrievaland Linguistics:CNR, Rome.

Smith, J.C. and Deedman, C. 1987. The Application ofExpert Systems Technology to Case-Based Law.In The First International Conference onArtificial Intelligence and Law, ICAIL-87, 84-93. Boston, MA:ACM.

Smith, J.C. 1997b (a work in progress) An Introduction Artificial Intelligence and Law: or, CanMachines Be Made to Think Like Lawyers? InArtificial Intelligence and Law, The Focus of MyResearch and Writing by JC Smith. [Online]Availablehttp://www.flair.law.ubc.ca/jcsmith/logos/noos/machine.html.

Smith, J.C., Gelbart, D. and Graham, D. 1992. BuildingExpert Systems in Case-Based Law. ExpertSystems with Applications 4:335-342.

Sundheim, B.M. (ed.) 1992. Proceedings of the FourthMessage Understanding Conference (DefenseAdvanced Research Projects Agency. Los Altos,CA: Morgan Kaufmann.

Turtle H. R. 1991. Inference Networks for DocumentRetrieval. Ph.D. diss., Center for IntelligentInformation Retrieval at the University ofMassachusetts, Amherst. Available onlinehttp://ciir.cs.umass.edu/info/psfiles/irpubs/ir.html¯ [March 12, 1997].

Uyttendaele, C., Moens, M.F. and Dumortier, J. 1996.SALOMON: Automatic Abstracting of LegalCases for Effective Access to Court Decisions. InJURIX.

Valente, A. 1995. Legal Knowledge Engineering: AModelling Approach. (Amsterdam) and Omsha(Tokyo): lOS Press.

Weber-Lee, R.; Barcia, R.; Costa, M.; Rodrigues Filho, I.;Hoeschl, H. C.; Bueno, T.; Martins, A; andPacheco, R. 1997a. A Large Case-BasedReasoner for Legal Cases. Lecture Notes inArtificial Intelligence: 2nd Int. Conference on

CBR, ICCBR97. David Leake, Enric Plaza (ed.)-Berlin: Springer.

49

Weber, R. 1997b PRUDENTIA: Enabling a real worldapplication of Case-Based Reasoning tojurisprudence research. Qualifying Examinationapproved by the Graduate Program of ProductionEngineering at the Federal University of SantaCatarina. In Activities. Available onlinehttp://www.eps.ufsc.br/~rosina/html/activities.html, [October, 1997].

Weber, R.; Barcia, R., Pacheco, R.; Martins, A.; Hoeschl,H.; Bueno, T.; Costa, M.; and Rodrigues Filho, I.1997c. Representing Cases From Texts In Case-Based Reasoning. III Congresso Internacional deEngenharia Industrial e XVII ENEGEP. Canela,RS, Brasil.

Weber, R.; Martins, A., Mattos, E.; Bueno, T.;Hoeschl, H.; Pacheco, R., von Wangenheim, C.;and Barcia, R. 1998. Reusing Cases to theAutomatic Index Assignment from TextualDocuments. In Proceedings of the 6th GermanWorkshop On Case-Based ReasoningFoundations, Systems, and Applications.’249-255. Berlin.

5O