
Analytica Chimica Acta, 134 (1982) 139-151 Elsevier Scientific Publishing Company, Amsterdam - Printed in The Netherlands

POTENTIAL METHODS IN PATTERN RECOGNITION Part 5. ALLOC, Action-orientated Decision Making

D. COOMANS and D. L. MASSART*

Farmaceutisch Instituut, Vrije Universiteit Brussel, Laarbeeklaan 103, B-1090 Brussels (Belgium)

I. BROECKAERT

Dienst Gastro-Enterologie, Sint Pieter Hospitaal, Hoogstraat 323, B-1000 Brussels (Belgium)

(Received 20th July 1981)

SUMMARY

The possibilities of action-orientated pattern recognition with the supervised pattern recognition technique, ALLOC, are discussed. The emphasis is on the importance of the definition of overlapping regions between classes as a way of obtaining more information about the separation between classes. Action-orientated classification and feature selection with ALLOC are discussed using the results obtained for two data bases concerning the characterization of the functional state of the thyroid and the determination of the origin of milk samples.

In a previous article of this series [1], classification of objects with the supervised pattern recognition method ALLOC of Hermans and Habbema [2] was discussed. A single boundary between each pair of related classes is then needed in the pattern space. The boundary corresponds with the minimum a posteriori probability of error [1, 2]. This means that an object is classified in the class to which it seems to belong with the largest a posteriori probability (as obtained with the Bayes equation). The use of the minimum a posteriori probability of error rule is based on the assumption that it is not worse to misclassify an object from one class than to misclassify an object from another class. The usual way to evaluate the performance of such a classification rule is to calculate the correct classification rate.

Such a classification problem is often called an identification problem [2]. In analytical chemistry, classification problems are often pure identification problems. For identification problems, one often uses non-probabilistic methods such as KNN (k-nearest neighbour rule) [3-5] and LLM (linear learning machine) [3, 4, 6, 7]. In the related domain of medical decision-making based on the results of laboratory tests, however, non-probabilistic methods seldom find application. The reason is that a medical decision is never related to a simple identification but to an action which may have important consequences.



For instance, a physician will not consider a patient as healthy when the probability that he is healthy is only slightly larger than the probability that he is not; a more certain diagnosis is needed because of the risk associated with a misclassification. Such a risk is usually not equal for the classification of an ill patient as normal and of a healthy one as ill. All this leads to the use of action-orientated decision boundaries and the construction of overlapping regions (regions of doubt) between the diagnostic classes. Moreover, it is often seen that patients belonging to the overlapping regions require an action distinct from those for whom the diagnosis has been established with a high degree of certainty.

On the same basis, some classification problems encountered in non-clinical analytical chemistry may also need an action-orientated approach. For instance, the determination of the origin of milk samples [8, 9] may be considered action-orientated. This happens when the decisions are associated with food quality control where, for instance, pure samples have to be differentiated from adulterated samples.

The aim of this paper is to discuss the action-orientated possibilities of ALLOC on the basis of the THYROID [1, 10-12] and MILK [8, 9] examples used in previous papers. It is also shown that, even for identification problems, an action-orientated approach, and more especially the evaluation of the amount of overlap between classes, gives rise to a more complete picture of the performance of a probabilistic classification technique.

ACTION-ORIENTATED PROCEDURES

Action-orientated classification

An action-orientated classification procedure starts with the determination of the a posteriori probabilities of class membership. This is done on the basis of the Bayes equation [13, 14]. In a previous article of this series [1], it was seen that ALLOC differs from other probabilistic techniques in the way in which the probability densities in the Bayes equation are estimated.
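ALLOC estimates the class-conditional densities with potential (kernel) functions whose smoothing parameters are derived from the data; the exact procedure is described in refs. [1, 2] and is not repeated here. As a rough illustration only, the following sketch uses a fixed-bandwidth Gaussian kernel and equal priors to obtain a posteriori probabilities from the Bayes equation; the bandwidth h and the example data are hypothetical.

```python
import numpy as np

def kernel_density(x, train, h):
    """Fixed-bandwidth Gaussian kernel density estimate at point x.

    ALLOC itself uses potential functions with data-driven smoothing
    parameters [1, 2]; a single bandwidth h is a simplification."""
    d = train.shape[1]
    u = (train - x) / h
    return np.mean(np.exp(-0.5 * np.sum(u * u, axis=1))) / ((2.0 * np.pi) ** (d / 2) * h ** d)

def posteriors(x, class_sets, priors, h=1.0):
    """A posteriori probabilities P(w_q | x) from the Bayes equation."""
    dens = np.array([kernel_density(x, X, h) for X in class_sets])
    joint = priors * dens
    return joint / joint.sum()

# Hypothetical two-class learning set in two variables.
rng = np.random.default_rng(0)
w1 = rng.normal(0.0, 1.0, size=(20, 2))
w2 = rng.normal(3.0, 1.0, size=(20, 2))
print(posteriors(np.array([1.5, 1.5]), [w1, w2], priors=np.array([0.5, 0.5])))
```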

Action-orientated classification with ALLOC can be carried out according to two different philosophies. First, the minimum probability of error classification rule is replaced by the minimum overall risk rule. This means that the boundary between two classes is displaced so that it is closer to the class with the smaller misclassification risk (meaning that it is considered a smaller risk that a patient who belongs in reality to this class is classified into another one). Alternatively, a region containing doubtful cases is defined, which means that the boundary is now a zone instead of a line.

The minimum overall risk rule. The minimum overall risk classification corresponds with the classification of an object in the class with the smallest conditional risk. The conditional risk of an object i with measurements x_i for a class ω_p, out of K classes, is given by the following equation, in which l_pq denotes the loss incurred when an object whose true class is ω_q is assigned to class ω_p.

R(\omega_p \mid x_i) = \sum_{q=1}^{K} l_{pq}\, P(\omega_q \mid x_i)        (1)

When all the losses associated with misclassification are equal, the minimum conditional risk classification rule reduces to the minimum a posteriori probability of error rule; in other words, the action-orientated approach reduces to the identification approach. If one assumes that the prior probabilities in the Bayes equation are the same, the decision boundary between the classes is situated in the position where the two probability density curves cross each other in Fig. 1.
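Eqn. (1) is a simple matrix-vector product over the a posteriori probabilities. The sketch below assumes the posteriors have already been computed and uses a hypothetical three-class loss matrix merely to illustrate the rule; it also shows how unequal losses can move the decision away from the class with the largest posterior.

```python
import numpy as np

def conditional_risks(post, loss):
    """Conditional risks R(w_p | x_i) = sum_q l_pq P(w_q | x_i), Eqn. (1).

    post : a posteriori probabilities P(w_q | x_i), shape (K,)
    loss : loss[p, q] = loss for assigning an object whose true class is
           w_(q+1) to class w_(p+1); zero on the diagonal."""
    return loss @ post

# Hypothetical three-class loss matrix with unequal misclassification losses.
loss = np.array([[0.0, 5.0, 5.0],
                 [1.0, 0.0, 1.0],
                 [2.0, 2.0, 0.0]])
post = np.array([0.45, 0.40, 0.15])
risks = conditional_risks(post, loss)
print(risks)                  # [2.75, 0.60, 1.70]
print(int(np.argmin(risks)))  # 1: although w_1 has the largest posterior,
                              # the smaller losses for assigning to w_2 win
```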

A boundary region can be obtained by considering two different ratios a/b. The boundary which is closest to the ω_2 class is determined by considering b > a. This means that a misclassification of an object belonging to class ω_1 is considered to be worse than a misclassification of an object belonging to ω_2. Therefore the decision boundary with this loss matrix (III in Fig. 1) is displaced to the right of the identification boundary II. The second boundary can be obtained by taking b < a. This boundary, I, is closer to ω_1 than the identification boundary II. The region between boundaries I and III is the overlapping region ω_12. On the basis of these boundaries, it is possible to obtain an estimate of the degree of overlap of classes ω_1 and ω_2. The number of objects situated in the boundary region is counted and the correct classification rate is calculated. This refers to objects which are classified in the correct class (not in the overlapping region). A more complete picture is obtained when the degree of overlap is explored in a dynamic way by varying the boundaries I and III. Up to now, only two adjoining learning classes have been considered. However, the concept of overlapping regions can be extended to more adjoining classes and to the multivariate case.

ALLOC defines, for the boundaries I and III of the two-class problem of Fig. 1, a threshold value δ for the a posteriori probabilities. The decision rule for a K-class problem then becomes: classify i in class ω_p if

P(\omega_p \mid x_i) = \max_q \,[P(\omega_q \mid x_i)], \qquad q = 1, \ldots, K        (8)

and

P(\omega_p \mid x_i) > \delta        (9)

Fig. 1. Classification boundaries between two classes ω_1 and ω_2 according to the loss matrix given by Eqn. (6). Boundaries: I, a > b; II, a = b; III, a < b.


If object i is not classified in any class by application of expressions (8) and (9), it is a boundary zone case.
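Expressions (8) and (9) amount to thresholding the largest a posteriori probability. A minimal sketch, with hypothetical posterior values and δ = 0.95:

```python
import numpy as np

DOUBT = -1   # marker for boundary-zone (doubtful) cases

def classify_with_threshold(post, delta):
    """Expressions (8) and (9): assign the class with the largest a
    posteriori probability, but only when that probability exceeds the
    threshold delta; otherwise the object is a boundary-zone case."""
    p = int(np.argmax(post))
    return p if post[p] > delta else DOUBT

# Hypothetical posteriors for three objects in a two-class problem.
P = np.array([[0.97, 0.03],
              [0.60, 0.40],
              [0.52, 0.48]])
print([classify_with_threshold(row, delta=0.95) for row in P])
# -> [0, -1, -1]: only the first object is classified with 0.95 certainty
```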

A close relationship exists between the loss matrix and δ. It can be shown for the binary decision problem that the relationship between a, b and δ is given by

\delta = b/(a + b) \quad \text{for } P(\omega_1 \mid x_i)        (10)

and

\delta = a/(a + b) \quad \text{for } P(\omega_2 \mid x_i)        (11)
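Eqns. (10) and (11) follow from the two-class risk comparison, assuming the usual zero-diagonal loss matrix of Eqn. (6) with off-diagonal entries a and b placed so that the result agrees with the two equations above:

```latex
% Assuming l_{12} = b (a true \omega_2 object assigned to \omega_1) and
% l_{21} = a (a true \omega_1 object assigned to \omega_2):
\begin{align*}
R(\omega_1 \mid x_i) < R(\omega_2 \mid x_i)
  &\iff b\,P(\omega_2 \mid x_i) < a\,P(\omega_1 \mid x_i) \\
  &\iff P(\omega_1 \mid x_i) > \frac{b}{a+b},
\end{align*}
% and symmetrically P(\omega_2 \mid x_i) > a/(a+b) for assignment to \omega_2,
% which are Eqns. (10) and (11). For a = b both thresholds reduce to 0.5; for
% a/b = 1/100 the threshold on P(\omega_1 \mid x_i) is 100/101, about 0.99.
```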

When symmetrical overlap regions are used, two boundaries are considered, so that for the first boundary the loss matrix of Eqn. (6) is used (loss matrix L with b > a) and for the second boundary a loss matrix where a is replaced by b and b by a (loss matrix L̄ with a > b). In this situation the decision rule as formulated in expressions (8) and (9) for two classes is equivalent to: classify i in class ω_1 if

R(\omega_1 \mid x_i) < R(\omega_2 \mid x_i)        (12)

and in class ω_2 if

\bar{R}(\omega_2 \mid x_i) < \bar{R}(\omega_1 \mid x_i)        (13)

If object i is not classified, it is a boundary zone case. R and R̄ indicate that the loss matrices L and L̄, respectively, are used. The latter decision rule (expressions 12 and 13) can be considered as an extension of the rule given by expression (5).
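A minimal sketch of the two-class rule of expressions (12) and (13), assuming the placement of a and b inside L that is consistent with Eqns. (10) and (11); the values a = 1 and b = 19 are hypothetical and give a symmetric 0.95 threshold:

```python
import numpy as np

def overlap_classify(post, a, b):
    """Two-class rule of expressions (12) and (13): class w_1 when its
    conditional risk under L is smaller, class w_2 when its risk under the
    interchanged matrix L-bar is smaller, boundary zone otherwise.

    The placement of a and b inside L is an assumption chosen so that the
    resulting thresholds agree with Eqns. (10) and (11)."""
    p = np.asarray(post, dtype=float)
    L    = np.array([[0.0, b], [a, 0.0]])    # loss matrix L (b > a in the text)
    Lbar = np.array([[0.0, a], [b, 0.0]])    # L with a and b interchanged
    R, Rbar = L @ p, Lbar @ p
    if R[0] < R[1]:
        return 1          # expression (12): classify in w_1
    if Rbar[1] < Rbar[0]:
        return 2          # expression (13): classify in w_2
    return 0              # boundary-zone case

# With a = 1 and b = 19 both conditions require a posterior above 0.95.
print([overlap_classify(p, a=1.0, b=19.0)
       for p in ([0.97, 0.03], [0.60, 0.40], [0.03, 0.97])])
# -> [1, 0, 2]
```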

Further extensions of the concept of overlapping regions have been discussed by Habbema et al. [16] and Hermans et al. [17], but they have not been implemented in ALLOC.

Action-orientated feature selection

It has already been shown [9] that the forward feature selection procedure of ALLOC is directly based on classification rates. An accurate way to express a classification rate in an identification problem is the probability of error estimated by means of the leave-one-out classification of the members of the learning set. The probability of error is then given by

P(\mathrm{error}) = \sum_{q=1}^{K} P(\omega_q)\, n_q(\mathrm{error}) / n_q        (14)

where, for ω_q, q = 1, ..., K (the K learning classes), P(ω_q) is the prior probability of ω_q (for explanation, see [9]), n_q(error) is the number of objects belonging to ω_q misclassified on the basis of the leave-one-out procedure, and n_q is the total number of objects belonging to ω_q.
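A sketch of the leave-one-out estimate of Eqn. (14), again with a simplified fixed-bandwidth Gaussian kernel standing in for ALLOC's potential-function density estimation; the data, bandwidth and priors are hypothetical:

```python
import numpy as np

def kde(x, train, h=1.0):
    """Fixed-bandwidth Gaussian kernel density estimate (a simplified
    stand-in for ALLOC's potential-function density estimation)."""
    d = train.shape[1]
    u = (train - x) / h
    return np.mean(np.exp(-0.5 * np.sum(u * u, axis=1))) / ((2.0 * np.pi) ** (d / 2) * h ** d)

def loo_error(classes, priors, h=1.0):
    """P(error) = sum_q P(w_q) n_q(error) / n_q, Eqn. (14), with the error
    counts obtained by leave-one-out classification of the learning set."""
    p_err = 0.0
    for q, Xq in enumerate(classes):
        errors = 0
        for i in range(len(Xq)):
            dens = [kde(Xq[i], np.delete(Xq, i, axis=0) if r == q else Xr, h)
                    for r, Xr in enumerate(classes)]
            if int(np.argmax(priors * np.array(dens))) != q:
                errors += 1
        p_err += priors[q] * errors / len(Xq)
    return p_err

# Hypothetical two-class learning set with 20 objects per class.
rng = np.random.default_rng(1)
classes = [rng.normal(0.0, 1.0, (20, 2)), rng.normal(2.5, 1.0, (20, 2))]
print(loo_error(classes, priors=np.array([0.5, 0.5])))
```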

The feature selection procedure described earlier [9] may also be applied in action-orientated feature selection on the condition that P be replaced by R, the overall conditional risk estimated by means of the leave-one-out classification of the members of the learning set. The overall conditional risk is given by

R = \sum_{q=1}^{K} P(\omega_q) \sum_{p=1}^{K} l_{pq}\, n_{pq}(\mathrm{error}) / n_q        (15)

where n_pq(error) indicates the number of objects belonging to ω_q but which are misclassified in ω_p.

The selection procedure of ALLOC proceeds until a subset of variables is selected that gives rise to a decrease of R which is smaller than a threshold value T. However, the selection of the subset that minimizes R seems to be a better choice (see also [9]).
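The forward selection loop itself is independent of how R is estimated. A sketch, with a hypothetical risk function standing in for the leave-one-out estimate of Eqn. (15), that keeps the subset with the overall minimum of R rather than stopping at a threshold T:

```python
def forward_selection(n_features, risk_of):
    """Greedy forward selection based on an estimated overall risk R
    (Eqn. 15): at each step the feature whose inclusion gives the lowest
    R is added, and the subset with the overall minimum of R is kept
    rather than stopping when the decrease falls below a threshold T."""
    remaining = set(range(n_features))
    selected, history = [], []
    while remaining:
        best = min(remaining, key=lambda f: risk_of(selected + [f]))
        selected = selected + [best]
        remaining.remove(best)
        history.append((selected, risk_of(selected)))
    return min(history, key=lambda step: step[1])

# Hypothetical risk values standing in for the leave-one-out estimate;
# variable 2 is the most informative, variables 1 and 3 mainly add noise.
toy_risk = {(0,): 0.25, (1,): 0.35, (2,): 0.15, (3,): 0.38,
            (2, 0): 0.08, (2, 1): 0.14, (2, 3): 0.16,
            (2, 0, 1): 0.09, (2, 0, 3): 0.10, (2, 0, 1, 3): 0.11}
print(forward_selection(4, lambda subset: toy_risk[tuple(subset)]))
# -> ([2, 0], 0.08): the two-variable subset minimizes the estimated risk
```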

DATA BASES

Differentiation of pure milk from different species and mixtures (MILK)

Statistical linear discriminant analysis was applied to these data by Smeyers-Verbeke et al. [8] and the same data were used earlier [9] for discussion of the feature selection procedure of ALLOC. The data base consists of different milk classes, each of which has 20 samples characterized by 15 gas chromatographic peak measurements. Besides the three pure milk classes - cow (C), goat (G) and sheep (S) - synthetic binary mixtures are also considered. In the present study, the same binary differentiations as studied earlier [9] are investigated, i.e., the differentiation between a pure milk class and another pure milk class or a related 9/1 mixture class.

The MILK differentiation problem is a typical example in analytical chemistry where an action-orientated approach may be of interest in practice.

Functional state of the thyroid gland (THYROID)

This data base consists of three learning classes: EU, HYPER and HYPO. For each patient (blood sample), five laboratory results (RT3U, T4, T3, TSH, ΔTSH) are available. From the clinical point of view, the differentiation EU/HYPER and EU/HYPO is of interest. Studies on this data base have been reported earlier [1, 10-12].

RESULTS AND DISCUSSION

Evaluation of the overlap between classes

For this purpose, symmetrical overlap regions are used. For symmetrical overlap regions, the methods discussed above under Action-orientated procedures are the same, and so the method corresponding to expressions (8) and (9) is used. The threshold probability δ defines the symmetrical overlap region. In the MILK example as well as in the THYROID example, the overlaps are evaluated on the basis of the percentage of objects which are correctly classified with a δ degree of certainty, i.e., objects that are classified in the correct class (not in the overlapping region). In a binary decision problem, the first δ value considered (0.500) corresponds to a single identification boundary.

For the MILK example, the optimal subset of variables selected by the ALLOC selection procedure [9, 18] was used initially. Table 1 shows the number of samples which are correctly classified with a δ degree of certainty as a function of δ. It can be seen that more information can be obtained about the separation of the classes by considering different δ values instead of one single identification boundary with δ = 0.500. For instance, although the pure milk classes seem to be separated completely from each other on the basis of δ = 0.500, a further comparison can be made on the basis of the degree of certainty of the classification; for the differentiation S/G (sheep/goat), no overlap is observed up to a δ value equal to 0.950, but for the other differentiations C/S (cow/sheep) and C/G (cow/goat) the separation is not so clear-cut. It can be concluded that S/G can be better distinguished than C/G and C/S, and C/S better than C/G. Furthermore, it can be seen that differentiation of the mixture C9G1 from C is not so obvious as indicated by the straightforward identification results. Moreover, the results obtained with large δ values show that the separation C/C9G1 is somewhat more difficult than the separation C/G. Although this was expected, it could not be observed from the classification rate of the simple identification approach. When a safe overlap region is created by taking, e.g., δ = 0.95, then Table 1 shows that very few, if any, samples can be identified with certainty for the differentiations C/C9S1, S/C1S9 and G/S1G9. This means that, although a relatively good classification rate is obtained for δ = 0.500, in practice predictions are very difficult and doubtful in these situations.
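Counting the samples "correctly classified with δ degree of certainty", as tabulated in Table 1, requires only the leave-one-out posteriors and the true class labels. A small sketch with hypothetical posteriors:

```python
import numpy as np

def counts_with_certainty(post, labels, deltas):
    """Number of objects classified in the correct class, and not in the
    overlapping region, for a series of symmetric thresholds delta
    (the quantity tabulated in Table 1)."""
    winners = np.argmax(post, axis=1)
    certainty = post[np.arange(len(labels)), winners]
    return {d: int(np.sum((winners == labels) & (certainty > d))) for d in deltas}

# Hypothetical leave-one-out posteriors for six objects, two classes.
post = np.array([[0.99, 0.01], [0.96, 0.04], [0.80, 0.20],
                 [0.55, 0.45], [0.30, 0.70], [0.05, 0.95]])
labels = np.array([0, 0, 0, 1, 1, 1])
print(counts_with_certainty(post, labels, (0.500, 0.750, 0.950, 0.999)))
# -> {0.5: 5, 0.75: 4, 0.95: 2, 0.999: 0}; raising delta moves the less
#    certain of the correctly classified objects into the overlap region
```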

The THYROID example also shows clearly that the evaluation of overlap regions gives more complete information about the separability of the learning classes and about the confidence that one may have in the classification of new objects.

TABLE 1

MILK differentiations showing the number of samples (the maximum, i.e., 100% correct classification, is 40) correctly classified with δ degree of certainty

δ        C/G (1)^a   C/S (1)   S/G (1)   C/C9S1 (3)   S/C1S9 (3)   C/C9G1 (3)   G/S1G9 (2)
0.500    40          40        40        37           32           40           32
0.600    39          40        40        29           22           39           26
0.750    38          40        40        20           12           37           10
0.800    38          40        40        15            6           35            4
0.900    37          39        40         7            3           34            1
0.950    37          38        40         2            1           18            1
0.975    35          37        39         2            1           12            1
0.990    32          32        39         2            0            6            1
0.999    19          20        37         0            0            4            0

^a The number in brackets is the number of variables used in the particular differentiation problem.


For the ALLOC classification on the basis of the original set of 5 laboratory tests, Table 2 reveals that a better classification rate is obtained with δ = 0.500 for the differentiation EU/HYPER than for the differentiation EU/HYPO. However, as the δ values are increased, it can be seen that for the HYPO class, in contrast to the HYPER and EU classes, there is no decrease of the correct classification rate. This means that a larger number of HYPO cases can be detected with a higher degree of certainty (see δ > 0.95).

This way of presenting results is much more useful in (medical) decision-making. In fact, non-probabilistic pattern recognition techniques such as the linear learning machine (LLM) are not useful at all in medical decision-making and in many problems in analytical chemistry where a degree of certainty is required for a decision, even when excellent classification results are obtained, as has been shown earlier [1].

Another example concerns the difference between an ALLOC classification on the original data and one after preprocessing by statistical linear discriminant analysis (SLDA + ALLOC); the latter has been discussed earlier [12]. Table 2 shows that on the basis of a simple leave-one-out classification of the learning classes (δ = 0.500), no important difference is obtained between ALLOC and SLDA + ALLOC for either the EU/HYPER or the EU/HYPO differentiation. However, SLDA + ALLOC performs less well than ALLOC does, because the correct classification rate decreases faster with increasing δ value for SLDA + ALLOC; the EU cases are usually not detected with a high degree of certainty on the basis of SLDA + ALLOC. A reason for the worse results of SLDA + ALLOC is certainly the loss of information. In the case of binary decision problems, the SLDA + ALLOC procedure supposes that the different probability levels (chosen by means of the δ values) are parallel hyperplanes in the pattern space. In reality, this is seldom the case. The original ALLOC classification procedure, however, is able to discover the real form of the decision boundaries. For the examples given in Table 2, the loss of information was observed only when higher δ values were considered instead of simply 0.500. The loss for EU/HYPO is larger than for EU/HYPER. For practical reasons and in situations discussed earlier [12], however, SLDA + ALLOC may still be preferred to ALLOC applied on the original variables although the performance is somewhat less good.
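The SLDA + ALLOC idea, project the data onto the linear discriminant and then apply the kernel-density classification to the projected scores, can be sketched as follows. This is a simplified two-class Fisher-discriminant stand-in for the actual procedure of ref. [12], and the data are hypothetical.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Two-class Fisher linear discriminant direction; a minimal stand-in
    for the SLDA preprocessing step (see ref. [12] for the actual procedure)."""
    Sw = np.cov(X1, rowvar=False) + np.cov(X2, rowvar=False)   # within-class scatter
    return np.linalg.solve(Sw, X1.mean(axis=0) - X2.mean(axis=0))

def kde1d(x, scores, h=0.5):
    """One-dimensional Gaussian kernel density estimate on the SLDA scores."""
    return np.mean(np.exp(-0.5 * ((scores - x) / h) ** 2)) / (np.sqrt(2.0 * np.pi) * h)

# Hypothetical five-variable learning classes (e.g. EU-like and HYPO-like).
rng = np.random.default_rng(2)
X1 = rng.normal([0, 0, 0, 0, 0], 1.0, (30, 5))
X2 = rng.normal([2, 1, 0, 1, 2], 1.0, (30, 5))
w = fisher_direction(X1, X2)
s1, s2 = X1 @ w, X2 @ w                     # projected (SLDA) scores

x_new = rng.normal([1, 1, 0, 1, 1], 1.0)    # a new pattern
score = x_new @ w
dens = np.array([kde1d(score, s1), kde1d(score, s2)])
print(dens / dens.sum())                    # a posteriori probabilities, equal priors
```

Because all probability levels are taken on a single projected axis, the corresponding boundaries in the original pattern space are parallel hyperplanes, which is exactly the loss of information discussed above.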

Feature selection in the case of symmetrical overlapping regions

The feature selection procedure discussed earlier [9] was based on the maximum correct classification rate without considering overlapping regions. The criterion for the selection of the optimal subset of variables is given above by Eqn. (14). The question whether the selected optimal subset of variables also minimizes the overlap between the classes, i.e., maximizes the correct classification rate according to "classification with δ degree of certainty", is of some importance.


TABLE 2

THYROID example showing the classification rates with δ degree of certainty

(Rows for δ = 0.500, 0.600, 0.750, 0.800, 0.900, 0.950, 0.975, 0.990 and 0.999, giving the correct classification rates for ALLOC and for SLDA + ALLOC on the EU/HYPER and EU/HYPO differentiations; the individual entries are not recoverable from this transcript.)

In the MILK example this was evaluated by calculating, at each step of the feature selection procedure, the number of samples which were correctly classified with δ degree of certainty for δ = 0.500, 0.750, 0.950 and 0.999. Figure 2 shows the diagrams for the binary differentiation of the pure milk samples C/G, C/S and S/G, and Fig. 3 shows the diagrams for binary differentiations between mixtures and the related pure milk class. Figure 2 shows that the pure milk classes are completely distinguishable for each degree of overlap, but higher δ values require more variables than considered in the optimal subsets of Table 1. This is clearest for the differentiation C/S. For a simple classification, one variable suffices, but at least 4 and 8 variables, respectively, are necessary in order to obtain certainties of 0.950 and 0.999. For the differentiation between a pure milk and a mixture (Fig. 3), the behaviour of the curves is more complicated. When the four binary differentiations are observed, the following conclusions can be reached. First, an important overlap is observed between the classes in each step of the selection procedure; even where a complete separation is obtained with δ = 0.500 (see C/C9G1), the number of samples which are classified with a high degree of certainty is rather small. Secondly, in order to optimize the classification rate for a higher degree of certainty than 0.5, more variables are needed, i.e., more information is required. Thirdly, the negative influence observed on the classification rate for low δ levels when the number of variables used for ALLOC classification is increased up to the original set of 15 variables is not observed for higher δ levels. This means that lower δ level classifications are more sensitive to noise, possibly because the more doubtful (i.e., closer to the borderline) the objects are, the more sensitive they are to noise.

The feature selection in the THYROID example was done in the same way as in the MILK example. Figure 4 shows, for the EU/HYPER and EU/HYPO differentiations, the relationship for δ = 0.500, 0.750, 0.950 and 0.999 between the percentage of correctly classified patients (leave-one-out procedure) and the number of selected variables in each step of the procedure. It can be seen for the EU/HYPER differentiation that at all levels of certainty a set of 3 variables (i.e., T4, ΔTSH and T3) is the most appropriate choice. No important change was obtained when the other variables were added to the selected set.


Fig. 2. Differentiation between pure MILK classes showing the relationship between the number of correctly classified samples with δ degree of certainty and the number of variables (n) selected by the ALLOC feature selection procedure according to Eqn. (14). Curves for δ = 0.500, 0.750, 0.950 and 0.999. The final lines shown up to n = 10 continue to n = 15.


Fig. 3. Differentiation between a pure MILK class and a 9/1 mixture class showing the relationship between the number of correctly classified samples with δ degree of certainty and the number of variables (n) selected by the ALLOC feature selection procedure according to Eqn. (14). Curves for δ = 0.500, 0.750, 0.950 and 0.999.


The separation is very acceptable: on the basis of the three variables, 74% of the cases can be diagnosed with a certainty of 0.999. For the EU/HYPO differentiation, it was concluded in previous papers [9, 10] that the combination of two variables (T4 + ΔTSH) provides the best classification rate and that a further addition of variables causes a slight decrease of the classification rate. This is true for the simple identification (δ = 0.500) or with δ values such as δ = 0.750. However, Fig. 4 shows that for the larger δ values, a slight increase of the classification rate is observed when more variables are added to the proposed set of two. It can therefore be concluded that the first two variables are essential for a good separation between the classes; further addition of variables produces greater certainty about the classification. This may not be trivial for medical diagnosis. In this respect, the set of five variables is theoretically the most appropriate choice because 77% of the cases can be diagnosed with a certainty of 0.999. As in the MILK example, a decrease of the classification rate for small δ values together with an increase for the larger δ values is observed in the EU/HYPO differentiation. However, this phenomenon is here less pronounced.

Action-orientated feature selection

In the previous section, the feature selection procedure for simple classification given by Eqn. (14) was evaluated in view of minimizing the amount of overlap between classes. In this section the action-orientated feature selection procedure given by Eqn. (15) is discussed on the basis of the THYROID example. It is assumed in this case that the losses associated with a correct diagnosis are zero. Five different ratios between a and b of the loss matrix of Eqn. (6) were considered, i.e., a/b = 100/1, 20/1, 1/1, 1/20 and 1/100. The value b is the loss associated with classifying an individual of the EU class in the pathological class and the value a is the loss associated with the inverse situation. In this way, five single boundaries are assumed.

Fig. 4. EU/HYPO and EU/HYPER differentiation of the THYROID example showing the relationship between % correct classification with δ degree of certainty and the number of variables selected by the ALLOC feature selection procedure according to Eqn. (14). Curves for δ = 0.500, 0.750, 0.950 and 0.999.


TABLE 3

Action-orientated selection of features for the EU/HYPO differentiation according to different loss ratios a/b

Step   a/b = 100        a/b = 20         a/b = 1          a/b = 0.05       a/b = 0.01
1      T4 (0.026)^a     T4 (0.213)       T4 (0.087)       TSH (0.183)      TSH (0.012)
2      TSH (0.015)      TSH (0.057)      ΔTSH (0.027)     T4 (0.167)       T4 (0.010)
3      ΔTSH (0.005)     RT3U (0.050)     T3 (0.040)       ΔTSH (0.150)     T3 (0.010)
4      RT3U (0.005)
5      T3 (0.005)

^a Numbers in brackets are the overall conditional risks obtained in the particular step of the selection (see Eqn. 15).

For the ratio 1/1, the feature selection is equivalent to the simple classification by Eqn. (14); the ratios 100/1 and 20/1 correspond to a selection of variables which are essential for screening pathological cases, and the ratios 1/20 and 1/100 correspond to a selection of variables essential for screening the EU cases. Table 3 shows the sequence in which the variables were selected for the EU/HYPO differentiation according to the different loss ratios; the appropriate subset of variables is shown separated from the superfluous variables by underlines. It can be seen that the optimal subset of variables differs according to the ratio a/b. This means that screening of HYPO cases is preferentially done by measuring the variables T4, TSH and ΔTSH (a/b = 100) or T4, TSH and RT3U (a/b = 20). For screening of EU cases, only TSH and T4 are needed. It is surprising that T4 is most important for the HYPO screening while TSH is most important for the EU screening. These results can be explained on a biochemical basis, but this is outside the scope of this paper. Furthermore, it is possible to define overlapping regions on the basis of two screening boundaries (e.g., the ratios 100/1 and 1/100, which in fact correspond to δ = 0.99). However, this approach differs from the one discussed in the previous section because it leads to a classification based on two different subsets of variables, one for each boundary. With this approach the classification rate found was only slightly better than with the single optimal set of variables proposed above (see Fig. 4). For the EU/HYPER differentiation, the influence of the loss ratio on the selection of variables was similar.
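The correspondence between the loss ratios and δ follows from Eqns. (10) and (11); as a small check, the two extreme screening ratios both demand a posterior probability of about 0.99 before an object is classified.

```python
# Thresholds implied by Eqns. (10) and (11) for the loss ratios used above;
# the ratios 100/1 and 1/100 both correspond to a threshold of about 0.99.
for a, b in [(100, 1), (20, 1), (1, 1), (1, 20), (1, 100)]:
    print(f"a/b = {a}/{b}: delta for P(w1|x) = {b/(a+b):.3f}, "
          f"delta for P(w2|x) = {a/(a+b):.3f}")
```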

CONCLUSIONS

The action-orientated approach can be utilized in two ways, involving either the construction of a decision boundary which takes the different risks associated with the decision into account or the construction of overlapping regions. Even in identification problems, a more detailed picture will be obtained about the separability of the classes when overlapping regions are taken into account.


In this respect, ALLOC is a very attractive software package for pattern recognition. It combines an accurate method for probability estimation with the possibility of action-orientated classification and feature selection. A disadvantage, however, is the absence of criteria for the detection of outliers.

REFERENCES

1 D. Coomans, I. Broeckaert, A. Tassin and D. L. Massart, Anal. Chim. Acta, 133 (1981) 215.
2 J. Hermans and J. D. F. Habbema, Manual for the ALLOC discriminant analysis program, Department of Medical Statistics, University of Leiden, P.O. Box 2060, Leiden, Netherlands, 1976.
3 D. L. Duewer, J. R. Koskinen and B. R. Kowalski, ARTHUR (available from B. R. Kowalski, University of Washington, Seattle).
4 A. M. Harper, D. L. Duewer and B. R. Kowalski, in B. R. Kowalski (Ed.), Chemometrics, Theory and Practice, Am. Chem. Soc. Symp. Ser., No. 52, 1977.
5 B. R. Kowalski and C. F. Bender, Pattern Recognition, 8 (1976) 1.
6 N. J. Nilsson, Learning Machines, McGraw-Hill, New York, 1965.
7 D. R. Preuss and P. C. Jurs, Anal. Chem., 46 (1974) 520.
8 J. Smeyers-Verbeke, D. L. Massart and D. Coomans, J. Assoc. Off. Anal. Chem., 60 (1977) 1383.
9 D. Coomans, M. P. Derde, I. Broeckaert and D. L. Massart, Anal. Chim. Acta, 133 (1981) 241.
10 D. Coomans, I. Broeckaert, M. Jonckheer, P. Blockx and D. L. Massart, Anal. Chim. Acta, 103 (1978) 409.
11 D. Coomans, L. Kaufman and D. L. Massart, Anal. Chim. Acta, 112 (1979) 97.
12 D. Coomans, I. Broeckaert and D. L. Massart, Anal. Chim. Acta, 132 (1981) 69.
13 G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, New York, 1973.
14 R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, Wiley-Interscience, New York, 1973.
15 E. A. Patrick, Decision Analysis in Medicine: Methods and Applications, CRC Press, Boca Raton, Florida, 1979.
16 J. D. F. Habbema, J. Hermans and A. T. van der Burgt, Biometrika, 61 (1974) 313.
17 J. Hermans, J. D. F. Habbema and A. T. van der Burgt, Bull. I.S.I., 45 (1973) 523.
18 J. D. F. Habbema and J. Hermans, Technometrics, 19 (1977) 487.