Multiple classifier decision combination strategies for character recognition: A review

Digital Object Identifier (DOI) 10.1007/s10032-002-0090-8
IJDAR (2003) 5: 166–194

A. F. R. Rahman 1, M. C. Fairhurst 2

1 BCL Technologies, 990 Linden Dr., Suite #203, Santa Clara, CA, 95050, USA; e-mail: [email protected]
2 Department of Electronics, University of Kent, Canterbury, Kent, CT2 7NT, UK; e-mail: [email protected]

Received: 3 February 2002 / Accepted: 9 October 2002
Published online: 6 June 2003 – © Springer-Verlag 2003

Abstract. Two research strands, each identifying an area of markedly increasing importance in the current development of pattern analysis technology, underlie the review covered by this paper, and are drawn together to offer both a task-oriented and a fundamentally generic perspective on the discipline of pattern recognition. The first of these is the concept of decision fusion for high-performance pattern recognition, where (often very diverse) classification technologies, each providing complementary sources of information about class membership, can be integrated to provide more accurate, robust and reliable classification decisions. The second is the rapid expansion in technology for the automated analysis of (especially) handwritten data for OCR applications including document and form processing, pen-based computing, forensic analysis, biometrics and security, and many other areas, especially those which seek to provide online or offline processing of data which is available in a human-oriented medium. Classifier combination/multiple expert processing has a long history, but the sheer volume and diversity of possible strategies now available suggest that it is timely to consider a structured review of the field. Handwritten character processing provides an ideal context for such a review, both allowing engagement with a problem area which lends itself ideally to the performance enhancements offered by multi-classifier configurations, and allowing a clearer focus to what otherwise, because of the unlimited application horizons, would be a task of unmanageable proportions. Hence, this paper explicitly reviews the field of multiple classifier decision combination strategies for character recognition, from some of its early roots to the present day. In order to give structure and a sense of direction to the review, a new taxonomy for categorising approaches is defined and explored, and this both imposes a discipline on the presentation of the material available and helps to clarify the mechanisms by which multi-classifier configurations deliver performance enhancements. The review incorporates a discussion both of processing structures themselves and of a range of important related topics which are essential to maximise an understanding of the potential of such structures. Most importantly, the paper illustrates explicitly how the principles underlying the application of multi-classifier approaches to character recognition can easily generalise to a wide variety of different task domains.

1. Introduction

In recent years, techniques for multiple classifier decision combination have been reported extensively in a multitude of task domains. These include various text and document analysis problems covering isolated and cursive handwritten and printed character or word recognition (e.g., Ho et al. [1], Xu et al. [2], Suen et al. [3], Fairhurst and Rahman [4, 5], Yuan et al. [6], Yaeger et al. [7]), medical imaging (e.g., Kittler et al. [8]), biometric verification problems such as face, signature or fingerprint recognition (e.g., Bajaj and Chaudhury [9]), robot vision, speech recognition (e.g., Chen et al. [10]), information retrieval (e.g., Larkey and Croft [11]), expert systems (e.g., Cherkauer [12]), online searching in image databases, and many more. It has generally been found that multiple expert (classifier) decision combination strategies can produce more robust, reliable and efficient recognition performance than the application of single expert classifiers. It is also noted that a single classifier with a single feature set and a single generalised classification strategy often does not comprehensively capture the large degree of variability and complexity encountered in many practical task domains. Multiple expert decision combination can help to alleviate many of these problems by acquiring multiple-source information through multiple features extracted from multiple processes, introducing different classification criteria and a sense of modularity in system design which leads to more flexible recognition systems.

Although some of these decision combination approaches are task-specific, most are generic, and it is usually possible to apply the same technique to a variety of tasks. Nevertheless, it is very difficult to review all the



research undertaken in all task domains, given the variety and sheer volume of publications available. Hence, it is prudent to choose a specific task domain (such as character recognition) which has generated enormous interest and used many decision combination techniques, in order to demonstrate the various approaches which can be taken in combining decisions provided by multiple classifiers.

This paper reviews the advances made in this respect explicitly within the character recognition community, and presents a view of the state of the art of multiple expert decision combination techniques. The approaches documented examine a problem that has a direct practical application in many different processes, such as automatic processing of documents, cheques and vouchers, conversion of paper documents to electronic formats, visual inspection, pen-based computing, forensic examinations, biometric evaluation systems, security, access control, archiving and so on. As already noted, however, the decision combination approaches described here are not limited to character recognition applications alone, and the review also aims to illustrate the essentially generic nature of many of the techniques which are represented in the literature.

In reviewing this field, a new classification strategy is proposed to categorise the numerous multiple expert approaches reported. This is based solely on the structural design of the information exchange channels existing among the experts cooperating in a multiple expert platform, irrespective of the type of algorithms the individual experts employ, the type of decision summation algorithm applied to form the final decision, or the type of software or hardware platforms on which the multiple expert configurations are implemented. In addition, the various diverse approaches are analysed in the framework of a generalised decision fusion strategy, enabling a comparison of these approaches from a completely generic point of view, and thus providing a solid foundation for the understanding of the different approaches available. The approach adopted for comparing and analysing different multiple expert approaches will lead to a better understanding of the underlying themes and of the process of information fusion, which implicitly defines a set of formal criteria to identify the nature and extent of performance enhancement achievable with a given set of individual expert components.

2. Multiple expert decision combination: Problem statement

The problem of combination of multiple experts should be expressed formally before a detailed analysis of various solutions is considered. If n classifiers (experts), working on the same problem, deliver a set of classification responses, then the decision combination process has to combine the decisions of all these different classifiers in such a way that the final decision improves on the decisions taken by any of the individual experts. Hence, the decision fusion process has to take into account the individual strengths and weaknesses of the different cooperating classifiers and must build on these to deliver a more robust final decision. As noted by Xu et al. [13], there can be three distinctly different types of problem for multiple expert classifier combination, based on the type of individual classification response delivered. These can be summarised as:

– The cooperating classifiers deliver the classification responses in the form of absolute output labels. Each of the classifiers identifies the character in question definitely as belonging to a particular class, and no information other than this assigned label is available. The combination method must make its final decision based solely on this information.

– The cooperating classifiers deliver the classification response in the form of a sorted ranking list. Each of the classifiers gives a preference list based on the likelihood of a particular character belonging to a particular class. The previous category is seen to be a special case of this solution, the output label being the top choice of the ranking list. Here, however, much more information is available to determine the final response of the combined classifier.

– The cooperating classifiers deliver the classification response in the form of confidence values. Each of the classifiers gives a preference list based on the likelihood of a particular character belonging to a particular class, together with a set of confidence measurement values generated in the original decision-making process. The ranking list and top choice responses can be seen as special cases of these responses, which are the most generalised form, from which both the ranking list and the top choice response can be generated. However, these responses are difficult to utilise, as the measurement values need to be converted to a normalised scale before any incorporation of information involving a comparison of the individual cooperating classifiers can take place.

It is clear that decision combination of multiple experts needs to consider the information related to different classifier types and their individual responses before a suitable combination scheme can be formulated. The following sections will review different methods which take into account this information management issue and which utilise this information to deliver an optimised combined decision.
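The three response types above can be illustrated with a minimal sketch; the class labels and score values here are illustrative assumptions, not taken from any cited system:

```python
# Three forms of classifier output for a 3-class problem (classes 0, 1, 2).
# All labels and values below are invented for illustration.

# Type 1: absolute output label -- only the winning class is reported.
abstract_label = 2

# Type 2: ranking list -- classes sorted from most to least likely.
ranking = [2, 0, 1]

# Type 3: confidence values -- a score per class, the most generalised form.
confidences = {0: 0.15, 1: 0.05, 2: 0.80}

# The less informative forms can be derived from the more general one:
derived_ranking = sorted(confidences, key=confidences.get, reverse=True)
derived_label = derived_ranking[0]

assert derived_ranking == ranking
assert derived_label == abstract_label
```

Note that the derivation only runs one way: an absolute label cannot recover the ranking, which is why confidence-value outputs carry the most combinable information.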

3. Organisation of multiple expert approaches

The number and diversity of multiple expert decision combination approaches encountered in the literature make the task of defining a comprehensive structuring methodology very difficult. The traditional way of discussing the various methods is to analyse their inherent design approach. This leads to the classification of the various techniques in terms of the following five categories:


– Analytical methods: Development, formalisation and implementation of formal methods to combine multiple experts.

– Pseudo-analytical methods: Development and implementation of quasi-formal methods to combine multiple experts.

– Empirical methods: Development and implementation of specialised and task-oriented tailored methods.

– Neural network-based methods: Development of formal or informal methods for multiple expert decision combination by employing neural network techniques.

– Support methods: Development of formal or informal methods supporting decision combination from multiple independent sources.

The view adopted here is that this information management issue is vital in classifying various multiple expert decision combination approaches. For example, the approaches reported by Chi et al. [14] and Lam and Suen [15] can have elements of analytical and empirical methods embedded inside the decision combination frameworks.

Before details are discussed, it is important to introduce some basic terminology. Multiple classifier configurations are required to generate classification decisions based on the combination of individual classifiers or experts. Each expert performs classification as directed by the overall decision combination algorithm. A number of processing elements are employed to implement the algorithm, the implemented algorithm providing a decision combination framework. There can be two different types of processing elements in a decision combination framework: active elements and passive elements. The former specifically perform recognition-oriented computations, directly produce or alter an overall classification decision as a result of their classification strategy, and are identified with the experts previously introduced and defined. The passive elements, on the other hand, perform the tasks of data re-routing and establishing structural pathways among the different active elements in the configuration. The passive elements, therefore, alter classification decisions based on the information produced by other active elements (experts). For example, passive elements might be used for resolving ambiguous decisions or confusions based on a priori information about the reliability of the active elements. The presence of passive elements in the combined framework, therefore, indicates a greater degree of organisation within the decision combination algorithm. In any decision combination framework, there are interconnections between the processing elements as dictated by the decision combination algorithm.
These interconnections are physical pathways creating bridges between the processing elements and carrying information between them. They can be conceptually visualised as information exchange channels. Figure 1 presents a simple illustration of the terminology introduced here, with two experts participating in a decision combination framework, which act as active elements. There are also two passive elements in the framework. The interconnections among these four processing elements can be envisaged as bi-directional information exchange channels.

Fig. 1. Combination of multiple experts: the terminology

Although the extent of information extraction and its incorporation in the combined decision-making process is reflected in the number of processing elements employed by a decision combination framework, a more powerful indicator of the organisation of the decision combination process is the way the information exchange channels are formed between the various processing elements. Hence, the incorporation of extracted information is reflected not only in the number of processing elements, but more so in the design of the physical structure of the combined framework. These ideas are pursued in the next section, and will form the primary method for classifying the various multiple expert decision combination techniques.

3.1. Information exchange pathways: The topology

The information exchange channels between the different processing elements represent the interlinks connecting the different experts in the overall framework, dictating the structural configuration of the combined framework. There are three different decision combination topologies to consider, as follows:

– Class I: Vertical combination scheme: in this case, the information exchange channels connect the various processing elements in a way which facilitates interaction in a vertical fashion, implementing a one-to-two connection. This basically implements a type of connection where each processing element can be connected to a maximum of two other processing elements, receiving information from one and supplying information to another. This type of interconnection translates to a physical structure where the processing elements are applied sequentially. A simplified schematic of the vertical combination scheme is presented in Fig. 2.

– Class II: Horizontal combination scheme: in this case, the information exchange channels connect the various processing elements in a way that facilitates a type of connection where each processing element is independent of the others. Each processing element participating in a horizontal combination scheme is connected to a single passive element, implementing a one-to-one connection. The passive element usually manipulates the responses of the various experts in the horizontal formation to produce the final solution. This type of interconnection translates to a physical structure where the processing elements are applied concurrently and independently. A simplified schematic of the horizontal (parallel) combination scheme is presented in Fig. 3.

Fig. 2. Vertical combination of multiple experts

Fig. 3. Horizontal combination of multiple experts

– Class III: Hybrid combination scheme: in this case, a combination of Class-I and Class-II schemes is constructed. A simplified schematic of the hybrid combination scheme is presented in Fig. 4.
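A minimal sketch of the three topologies, with toy threshold functions standing in for trained experts; the thresholds, the "consult the second expert when unsure" routing rule, and the majority-vote combiner are all illustrative assumptions:

```python
# Sketch of the three combination topologies (Classes I-III).
# The "experts" are toy threshold functions, not trained classifiers.

def expert_a(x):
    return 1 if x > 5 else 0

def expert_b(x):
    return 1 if x > 3 else 0

# Class I -- vertical (serial): elements applied sequentially; here the
# second expert is consulted only when the first is near its threshold.
def vertical(x):
    decision = expert_a(x)
    if abs(x - 5) < 2:
        decision = expert_b(x)
    return decision

# Class II -- horizontal (parallel): independent experts feeding a single
# passive element, here a simple majority vote over their labels.
def horizontal(x):
    votes = [expert_a(x), expert_b(x)]
    return max(set(votes), key=votes.count)

# Class III -- hybrid: a serial chain used as one input to a parallel stage.
def hybrid(x):
    votes = [vertical(x), expert_b(x)]
    return max(set(votes), key=votes.count)
```

The point of the sketch is structural: in `vertical` the elements form a chain, in `horizontal` they never see each other's output, and in `hybrid` a whole serial chain behaves as one participant in a parallel vote.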

3.2. Organisation of the paper

Taking the classifying strategies introduced in Sect. 3.1 as a starting point, the paper now reviews the various multiple expert decision combination strategies in a systematic way. Figure 5 describes this organisation. It is noted how each of the primary classes, the vertical, horizontal and hybrid decision combination approaches, can be further subdivided into several sub-classes based on other classifying characteristics specific to these approaches.

Fig. 4. Hybrid combination of multiple experts

We make a conscious decision not to focus on the disparate data sources available to researchers in this field; interested readers are referred to [16–24]. In addition, there are many important issues relating to multiple expert approaches which are not directly concerned with the framework identified above, but which nevertheless are essential for a comprehensive understanding of the whole field. Examples include research from related disciplines (e.g., psychology), the comparative evaluation of individual approaches, and so on. Figure 6 identifies and structures these related research areas, which are reviewed in Sect. 4.3.

4. Review

4.1. Processing environments

4.1.1. Decision combination engines. The basic aim of multiple expert systems is to combine decisions delivered by individual classifiers. There are some fundamental decision combination frameworks (engines) that have been used extensively in all decision combination configurations, and it is important to review these briefly before a survey of the overall field is made.

Bayesian systems

The Bayesian system is one of the most useful systems for the combination of multiple experts. The probability P(θi = ωj | x, y) with which an object θi is assigned to class ωj, given the measurements x and y that characterise the object, can be expressed by Bayes' theorem:

P(θi = ωj | x, y) = P(x, y | θi = ωj) P(ωj) / P(x, y)    (1)

Using Bayesian systems, it is possible to model various types of ambiguity: the relative importance of the contributing factors can be expressed by the probability P(θi = ωj | x, y), errors in the measurement system can be expressed by P(x, y), interclass variability can be modelled by P(x, y | θi = ωj), and understanding of the real world can be expressed by P(ωj).

Fig. 5. Organisation of various multiple expert approaches

Fig. 6. Organisation of associated multiple expert research

A detailed description of Bayesian systems can be found in Fairhurst [25] and Xu et al. [13].
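As a concrete, simplified illustration, one common Bayesian-style combination rule multiplies per-class posteriors from conditionally independent experts and renormalises. The sketch below, with invented posterior values, is our illustration of that product rule rather than a method from a specific cited paper:

```python
# Product-rule fusion sketch: with conditionally independent experts,
# per-class posteriors are multiplied and renormalised.
# The posterior values below are invented for illustration.

def product_rule(posteriors_per_expert):
    classes = posteriors_per_expert[0].keys()
    fused = {}
    for c in classes:
        p = 1.0
        for post in posteriors_per_expert:
            p *= post[c]          # multiply each expert's posterior for class c
        fused[c] = p
    total = sum(fused.values())
    return {c: p / total for c, p in fused.items()}   # renormalise

expert_1 = {"a": 0.6, "b": 0.4}
expert_2 = {"a": 0.7, "b": 0.3}
fused = product_rule([expert_1, expert_2])

# Class "a", favoured by both experts, is reinforced by the fusion.
assert max(fused, key=fused.get) == "a"
```

The independence assumption is the weak point of this rule in practice: experts trained on overlapping features produce correlated errors, and the product then over-counts the shared evidence.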

Rule-based systems

These are systems based on supervised or unsupervised rules generated by training modules to encompass the variability of the dataset, reflecting this variability in the design of heuristic classification frameworks. These systems work well in situations where there are no errors in the measurements and no errors in knowledge elicitation. Unfortunately, such rule-based systems are often too simplistic to be of real use in solving problems involving “live” data. Nevertheless, they have been very successful when the data variability is small and the task domain remains comparatively constant. Examples include the background processing of bank cheques, where the account number (for example) is printed using a very well-defined and constrained character set.
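A rule of this kind can be as simple as a character-set and length constraint on a recognised field. The sketch below assumes, purely for illustration, an eight-digit account field; the rule and field format are not taken from any cited system:

```python
# Rule-based sanity check of the kind described above: a cheque account
# number printed in a constrained character set (digits only, fixed length).
# The eight-digit format is an illustrative assumption.
import re

ACCOUNT_RULE = re.compile(r"^\d{8}$")

def accept_account_field(text):
    """Return True if the recognised string satisfies the domain rule."""
    return bool(ACCOUNT_RULE.match(text))

assert accept_account_field("12345678")
# A typical OCR confusion (5 read as S) violates the rule and is rejected.
assert not accept_account_field("1234S678")
```

Such rules are brittle on unconstrained handwriting, but on constrained printed fields they reject many recognition errors at essentially no cost.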


Neural systems

It is possible to use neural networks as a generalised engine for classifier combination. Neural network combination is, however, a very wide theoretical field in its own right. In general, there are two types of decision combination using neural networks: ensemble and modular approaches (Hansen and Salamon [26]). In the ensemble approach, each of the neural nets provides a solution to the problem, and the task is to combine them using an appropriate framework. In the modular approach, the problem is decomposed into several sub-problems, each of which is then solved by a neural network. The final result is derived by combining the outputs of these subnets. The various types of approaches in the field of neural combination are well represented in [27, 28]. Another excellent source is Sharkey [29], which deals in detail with various types of decision combination using neural networks and presents a theoretical background to the entire field.
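The ensemble/modular distinction can be sketched with plain functions standing in for trained networks; the score values and the one-subnet-per-class decomposition are illustrative assumptions:

```python
# Sketch contrasting the ensemble and modular approaches described above.
# Each "net" returns invented class scores for a 2-class problem.

def net_1(x): return [0.8, 0.2]
def net_2(x): return [0.6, 0.4]
def net_3(x): return [0.3, 0.7]

# Ensemble: every net solves the whole problem; their outputs are averaged.
def ensemble(x, nets=(net_1, net_2, net_3)):
    scores = [sum(n(x)[c] for n in nets) / len(nets) for c in (0, 1)]
    return scores.index(max(scores))

# Modular: the problem is decomposed so each subnet handles one sub-problem
# (here, one class each), and a simple rule combines the sub-decisions.
def modular(x):
    score_class_0 = net_1(x)[0]   # subnet specialised for class 0
    score_class_1 = net_3(x)[1]   # subnet specialised for class 1
    return 0 if score_class_0 >= score_class_1 else 1
```

In the ensemble every network is redundant with the others; in the modular design no single subnet can solve the full task, which is precisely what makes the combining stage essential.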

Fuzzy neural systems

Fuzzy logic deals with situations where class membership is not certain, and there is a probability that a particular pattern might belong to more than one class. This uncertainty is expressed in terms of a membership function. Fuzzy neural networks use fuzzy combination instead of linear functions of the input variables, and the result is fed through a sigmoid for the final decision. This is a relatively new approach, and encouraging levels of performance have recently been reported (Sung-Bae and Kim [30–32]).

Expert systems and artificial intelligence

Interestingly, recent analysis has shown that the basic building blocks involved in constructing systems such as those described above (the cooperating classifiers) share many basic characteristics with the experts defined in the Artificial Intelligence (AI) literature, which are generally rule-based systems that operate by exploiting a knowledge base built on past experience, and which are strictly optimised within a specific task domain. Thus, although there are obvious differences in terms of conceptual and implementational details between such AI-oriented systems and the engineering approaches used by the character recognition community (cooperating classifiers are independently trained classification modules, while the AI experts are primarily implemented as inference engines exploiting a given task-specific knowledge base), there are also some striking similarities.

Encouragingly, in work reported to date there have been significant advances in understanding the nature of the combination of these experts. Wolpert [33] has introduced the concept of stacked generalisation, where expert combination yields results with improved accuracy compared with individual elements. This work lays some important foundations for further work in analysing the nature of decision combination based on heuristics or empirical reasoning. Kearns and Mansour [34] have addressed this point by demonstrating that some popular and empirically successful heuristics surprisingly “meet the criteria of an independently motivated theoretical model”. However, their investigation was restricted to a class of top-down algorithms for decision tree learning, and the implications of this study for other decision combination approaches are still largely unexplored. Similarly, Bagging (Bootstrap Aggregating) techniques for generating multiple versions of a predictor and exploiting these to obtain an aggregated predictor (Breiman [35]) have been proposed, but it has been shown that Bagging does not improve accuracy for all types of experts, because one precondition for improvement is that perturbing the learning set must cause significant changes in the predictor constructed.
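A minimal Bagging sketch in the spirit of Breiman [35], using a 1-nearest-neighbour predictor on toy one-dimensional data; both the predictor and the data are our illustrative choices:

```python
# Bagging sketch: bootstrap resamples of the training set yield perturbed
# predictors whose votes are aggregated. The 1-NN predictor and the toy
# one-dimensional data are illustrative assumptions.
import random

def nearest_neighbour(train, x):
    """Label of the training point closest to x."""
    return min(train, key=lambda pt: abs(pt[0] - x))[1]

def bagged_predict(train, x, rounds=25, seed=0):
    rng = random.Random(seed)
    votes = []
    for _ in range(rounds):
        boot = [rng.choice(train) for _ in train]   # bootstrap resample
        votes.append(nearest_neighbour(boot, x))
    return max(set(votes), key=votes.count)          # aggregate by voting

train = [(0.0, "a"), (1.0, "a"), (5.0, "b"), (6.0, "b")]
assert bagged_predict(train, 0.5) == "a"
assert bagged_predict(train, 5.5) == "b"
```

The precondition noted above is visible here: 1-NN is unstable under resampling (a left-out point changes its decision boundary), so Bagging has something to average over; a predictor insensitive to the resamples would gain nothing.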

Knowledge-based systems

If K classifiers, working on the same problem, deliver a set of classification responses, then the decision combination process can exploit three distinctly different sources of information. These can be summarised as:

– Sample confidence index: The classifiers deliver the classification response in the form of confidence values. This can be expressed by αijk, which denotes the value assigned by the ith expert to the kth sample coming from the jth class in the test set.

– Preference lists: The classifiers deliver the classification response in the form of a sorted ranking list. Assuming the individual classifier responses to be ζ1, ζ2, …, ζK, they can be sorted in descending order of magnitude and normalised as η1, η2, …, ηK. The corresponding class labels can be expressed as χ1, χ2, …, χK.

– Absolute labels: The classifiers deliver the classification responses in the form of absolute output labels. If the preference list is expressed as χ1, χ2, …, χK, then the top choice from that list, χtop, is assigned as the absolute label.

These three sources of information are directly derived from the test samples during the recognition phase and are therefore defined as “First-Order Information”.

It is also possible to extract additional information about data diversity, the relative performance indices of the individual experts on specific databases, and specific domain information about the characteristics of a particular target dataset. These are collectively defined as “Second-Order Information”. All this information is a measure of the a priori knowledge about the specified database, the performance of the chosen classification methods on that database, and the relative strengths or weaknesses of these methods in dealing with a particular problem domain. In order to extract and evaluate these sources of information, it is necessary that an independent evaluating dataset is utilised. Thus, in this case, the available data needs to be partitioned into three mutually exclusive subsets: one for training, the second for evaluating the information indices, and the final subset for testing. The key information sources are then:

– Overall confidence index: The overall confidence index is directly related to the average of all the individual recognition rates on each individual class in the complete database. If K is the number of participating experts, then the overall confidence index is expressed by γk, 1 ≤ k ≤ K, representing the ranking of the experts (γk = 1, 2, …, K) based on overall recognition rates.

– Class confidence values: Class confidence values provide information about the confidence of classification on a class-by-class basis. The Class Confidence Index, βij (1 ≤ i ≤ K, 1 ≤ j ≤ N, where N is the number of classes under consideration), denotes the ranking of the different experts (βij = 1, 2, ..., K) on a class-by-class basis.

– Data diversity: The degree of similarity and dissimilarity among the classes defines the overall data diversity of the target character set. Extracting quantitative information about the data diversity can be a very powerful tool in designing multiple expert decision combination approaches.

– Specific consideration: It is possible to exploit localised characteristics of a target set in designing special algorithms when attempting to combine decisions delivered by multiple experts. Incorporation of this information can be reflected in the overall decision-making process.

These indices can be used to form an overall knowledge-base. Various decision combination methodologies can be designed to exploit this extensive knowledge-base.
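As an illustration, the overall and class confidence indices above might be derived from an evaluation set along the following lines. This is a minimal sketch; the function and variable names are our own, not the authors', and recognition rates are assumed to be available per expert and per class.

```python
def rank_desc(values):
    """Assign ranks 1..K, with rank 1 going to the largest value."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    ranks = [0] * len(values)
    for r, i in enumerate(order):
        ranks[i] = r + 1
    return ranks

def confidence_indices(rates):
    """Build knowledge-base rankings from an evaluation set.
    rates[k][j] is expert k's recognition rate on class j.
    Returns (gamma, beta): gamma[k] is expert k's overall confidence
    index and beta[k][j] its class confidence index (1 = best)."""
    overall = [sum(row) / len(row) for row in rates]
    gamma = rank_desc(overall)                    # overall confidence index
    per_class = [rank_desc(col) for col in zip(*rates)]
    beta = [[per_class[j][k] for j in range(len(per_class))]
            for k in range(len(rates))]
    return gamma, beta
```

With two experts and two classes, rates = [[0.9, 0.5], [0.7, 0.8]] yields gamma = [2, 1] (the second expert is stronger on average) and beta = [[1, 2], [2, 1]].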

Consensus systems

Consensus-based schemes are very important in decision combination, the simplest of which is the majority voting system. If there are n independent experts having the same probability of being correct, and each of these experts produces a unique decision regarding the identity of the unknown sample, then the sample is assigned to the class for which there is a consensus, i.e., when at least k of the experts agree, where k can be defined as:

k = n/2 + 1,      if n is even,
k = (n + 1)/2,    if n is odd.                                  (2)
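In code, the consensus threshold of Eq. (2) is simply the following (a Python sketch; the function name is ours):

```python
def consensus_threshold(n):
    """Minimum number of agreeing experts needed for a consensus
    among n equally reliable experts, following Eq. (2)."""
    return n // 2 + 1 if n % 2 == 0 else (n + 1) // 2
```

So four experts need three agreeing votes, while five experts also need only three.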

Assuming each expert makes a decision on an individual basis, without being influenced by any other expert in the decision-making process, the probabilities of the various different final decisions, when x + y experts are trying to reach a decision, are given by the different terms of the expansion of

(Pc + Pe)^(x+y)                                                 (3)

where Pc is the probability of each expert making a correct decision, Pe is the probability of each expert making a wrong decision, and obviously Pc + Pe = 1. Bernoulli [36] is credited with first realising this group decision distribution. The probability that x experts would arrive at the correct decision is

[(x + y)! / (x! y!)] (Pc)^x (Pe)^y                              (4)

and the probability that they arrive at the wrong decision is

[(x + y)! / (x! y!)] (Pc)^y (Pe)^x                              (5)

Thus, in general, the precondition of correctness (Condorcet [37]) of the combined decision can be evaluated from (4) and (5), provided x > y, and can be conveniently expressed as

κ = (Pc)^x (Pe)^y / [ (Pc)^x (Pe)^y + (Pc)^y (Pe)^x ]          (6)

Rearranging Eq. (6) and assuming the fraction of the experts arriving at the correct decision to be fixed (i.e., x and y constant), it is possible to show that

∂κ/∂Pc = κ^2 (x − y) (Pe)^(x−y−1) (Pc + Pe) / (Pc)^(x−y+1)     (7)

Since x > y, ∂κ/∂Pc is always positive. Thus, when x and y are given, κ increases continuously from zero to unity as Pc increases.
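The behaviour of Eq. (6) is easy to check numerically (a sketch; the function name is ours):

```python
def kappa(p_c, x, y):
    """Probability that the x-expert majority is correct (Eq. 6),
    given each expert is correct with probability p_c and x > y."""
    p_e = 1.0 - p_c
    num = p_c ** x * p_e ** y
    return num / (num + p_c ** y * p_e ** x)

# For fixed x and y, kappa rises monotonically with p_c:
# kappa(0.5, 7, 3) = 0.5, and kappa approaches 1 as p_c does.
```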

This demonstrates that the success of the Majority Voting Scheme (like most decision combination schemes) directly depends on the reliability of the decision confidences delivered by the participating experts. It is also clear that as the confidences of the delivered decisions increase, the quality of the combined decision increases. There are a huge number of variations based on the majority voting scheme, and all try to reach a consensus based on the decision emphasis of a particular problem domain. Some of the variations are Ranking (e.g., Ho et al. [1]), Committee (e.g., Mazurov et al. [38]), and Regression-based models (e.g., Ho et al. [39]).

Dempster-Shafer approach

For a given number of exhaustive and mutually exclusive propositions αi, where i = 1, 2, ..., N, and N is the total number of classes, the Dempster-Shafer Theory of Evidence represents the belief in a proposition by a numeric value between 0 and 1 inclusive. This value, denoted bel(α), expresses the degree to which evidence η supports the proposition α. bel(α) can be estimated by a function known as the basic probability assignment (BPA), which is a generalisation of a probability mass distribution. Given that a subset α represents the disjunction of all the elements in α, the truth of β ⊂ α implies the truth of α. bel(α) can, therefore, be expressed as

bel(α) = Σ_{β ⊆ α} BPA(β)                                       (8)

If there are multiple sources of evidence, they can be combined by

BPA_combined = BPA_1 ⊕ BPA_2 ⊕ ... ⊕ BPA_N                      (9)

Additional details on the Dempster-Shafer rule can be found in Mandler and Schurmann [40], Jian-Bo and Singh [41], and Lingras and Wong [42].
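A minimal sketch of Eqs. (8) and (9) in Python, with our own function names; a BPA is represented here as a dictionary mapping frozensets of class labels to mass:

```python
from itertools import product

def combine_bpa(m1, m2):
    """Dempster's rule of combination (the ⊕ of Eq. 9) for two
    basic probability assignments over the same frame."""
    combined, conflict = {}, 0.0
    for (b, mb), (c, mc) in product(m1.items(), m2.items()):
        inter = b & c
        if inter:
            combined[inter] = combined.get(inter, 0.0) + mb * mc
        else:
            conflict += mb * mc          # mass falling on the empty set
    # Normalise away the conflicting mass (assumes conflict < 1).
    return {a: v / (1.0 - conflict) for a, v in combined.items()}

def bel(m, alpha):
    """Belief in alpha: total mass of all subsets of alpha (Eq. 8)."""
    return sum(v for b, v in m.items() if b <= alpha)
```

Combining two BPAs that both favour the same class raises the belief in that class above either individual belief, which is the intuition behind using the rule for classifier fusion.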

4.2. Multiple expert approaches

4.2.1. Vertical decision combination. The principal feature of this configuration is that the individual experts are applied sequentially. Hence, at each stage (layer) there is only one classifier (expert) operating and processing the patterns. Depending on the principle which is employed in formulating the decision combination algorithm, there can be two basic approaches to this generalised serial decision combination configuration.

Reevaluation approach

In this case, the underlying principle is to incorporate a scheme that ensures re-evaluation, by experts appearing at subsequent layers, of those patterns which are either rejected or recognised with very low confidence at any layer. The input character (pattern) is presented to the first expert in the configuration. Instead of generating a list of possible recognition indices indicating the class to which the current character might belong, the first expert generates a confidence value corresponding to the strength of its decision. Based on this confidence, three different scenarios might arise:

– The confidence is higher than a threshold and the decision is accepted, rendering subsequent re-evaluation unnecessary.

– The confidence falls within the range defined by the first and a second threshold, indicating a degree of uncertainty in the decision.

– The confidence is lower than the second threshold, and the decision is rejected. In this case, a re-evaluation is necessary.

These three categories indicate that, in the first case, no re-evaluation is carried out and, in the third case, a re-evaluation is necessary. What is done in the intermediate case depends on local configuration restrictions. In cases where a re-evaluation is necessary, the subsequent expert does not limit further investigation to the domain of the classes handed down by the primary expert, but rather searches for a solution in the complete class domain. It then generates another confidence value indicating the possible class identity, and this process continues until a decision with a sufficiently high confidence is found by an expert or the final expert picks the ultimate class index.

Fig. 7. Vertical combination of multiple experts: Reevaluation approach

Figure 7 presents a schematic of the series combination of multiple experts based on the Reevaluation Approach. The first expert in the hierarchy receives a list of possible classes which might be expressed as S0 = {0, 1, 2, ..., m}, where m is the total number of classes present in the problem domain. From the responses obtained, it generates a confidence quotient for each test pattern. α^i_jk denotes the sample confidence value assigned by the ith expert to the kth sample coming from the jth class in the test set (Rahman and Fairhurst [43]). At any layer, therefore, a pattern is classified with an associated sample confidence value which is compared with a threshold value. Depending on this comparison, the pattern is either recognised with high or low confidence, or rejected. If it is rejected, or classified with a low confidence, then the next expert in the hierarchy compares the responses from all the m prototypes which describe the original solution space. It then decides whether to accept or reject it, this process continuing until a classification with a higher confidence is found.


In general, an expert appearing at the nth layer generates κ_n responses, with κ_{n−1} = κ_n = κ_{n+1}, where κ_n represents the number of class possibilities considered at this layer, κ_{n−1} is the number of classes that were considered at the previous layer and κ_{n+1} is the number of classes considered at the next layer. Hence, in general, the actions taken at the nth layer are as follows:

– Generate κ_n responses, where κ_{n−1} = κ_n = κ_{n+1}.
– Sort the κ_n responses.
– Pick the top response.

– The decision associated with the sample confidence value α is accepted as the final decision, provided the sample confidence value is greater than or equal to a threshold value, so that α^u_wt ≥ ψc, where u is the expert appearing at the nth layer, w is the class under consideration, t is the sample in question and ψc is the first threshold.

– The decision associated with the sample confidence value α is recognised with low confidence provided the sample confidence value is less than the first threshold value but greater than or equal to a second threshold value, so that ψc > α^u_wt ≥ θc, where θc is the second threshold.

– The decision associated with the sample confidence value α is rejected provided the sample confidence value is less than the second threshold value, so that α^u_wt < θc.
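The layer-by-layer logic above can be sketched as follows. This is our own simplification: each expert is modelled as a callable returning a (class label, sample confidence) pair, and the two thresholds correspond to ψc and θc.

```python
def reevaluation_cascade(pattern, experts, psi_c=0.9, theta_c=0.5):
    """Vertical re-evaluation: each expert searches the full class
    set; a decision is accepted outright above psi_c, held with low
    confidence between theta_c and psi_c, and rejected below theta_c."""
    best = None
    for expert in experts:
        label, conf = expert(pattern)
        if conf >= psi_c:                    # accepted: stop here
            return label
        if best is None or conf > best[1]:   # remember the strongest
            best = (label, conf)             # low-confidence decision
    # No expert was confident enough; fall back on the best decision
    # seen, or reject the pattern entirely.
    return best[0] if best and best[1] >= theta_c else None
```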

Many researchers have explored this approach for combining multiple experts, especially Kovacs et al. [44], Lam and Suen [15], Wu et al. [45], and Dimauro et al. [46]. Other researchers, such as Vlontzos and Kung [47], Shridhar and Badreldin [48], Ahmed and Suen [49], and Shridhar and Badreldin [50], have offered different solutions in this category by combining distinctive feature sources. Brown et al. [51], on the other hand, explored an idea for integrating recognition logic with feature extraction algorithms. The basic idea of performing a rough classification in the initial stages and then incrementally increasing the quality of classification has been explored by many researchers, especially Rodriguez et al. [52], Zhou et al. [53], Jing et al. [54], and Suzuki et al. [55, 56]. Some researchers have employed neural networks or a combination of neural and non-neural techniques in a vertical decision combination framework, such as Wang and Wang [57], Park et al. [58], Lee et al. [59], Wang et al. [60], Kimura et al. [61], Hsin-Chia and Kuo-Ping [62], Lee and Gomes [63], De Carvalho et al. [64, 65, 66, 67], Cao et al. [68], Huang et al. [69], and Shih and Chung [70]. The use of fuzzy reasoning has also been explored in relation to vertical decision combination frameworks; the research of Xuefang et al. [71], Masulli et al. [72], Gwo-En and Jhing-Fa [73], Tanprasert et al. [74] and Chi et al. [14] in this field is important. Researchers have also used relaxation matching techniques in combining multiple experts in the vertical framework: Kimura et al. [75], Kimura and Shridhar [76], Gader and Forester [77], Byung et al. [78], and Shindo et al. [79] have all contributed to this approach.

Class set reduction

The underlying principle in the class set reduction approach is to make certain that the number of possible target classes is reduced continuously as a vertical configuration is traversed. In this case, the pattern (character) to be recognised is presented to the first expert in the configuration, which generates a list of possible recognition indices indicating the class to which it is predicted that the current character might belong, this list being a subset of the total number of classes under consideration. The next expert then limits further investigation to the domain of the classes handed down by the primary expert, generating another ordered list, and so on. This process continues until the final expert picks the ultimate class index. Here multiple algorithms are applied in increasing order of complexity and performance (Rahman and Fairhurst [80]). As the experts appearing later in the hierarchy generally have more weight attached to their decisions, it is important to make sure that more powerful experts appear late in the configuration. Since the experts appearing later in the hierarchy need to search a smaller solution space, this sequential ordering leads to configurations offering very high throughput.

Figure 8 presents a schematic of the class set reduction approach. The first expert in the hierarchy receives a list of possible classes which might be expressed as S0 = {0, 1, 2, ..., m}, where m is the total number of classes present in the problem domain. It extracts relevant features and compares the character to the m prototypes built during a training phase. From the responses obtained, it generates a candidate list where the possible candidate identifiers for the pattern in question are listed in decreasing order of probability. The top p choices are handed down to the second expert, where p < m. If #S_n denotes the number of members in the set S relating to the nth layer, then #S_0 = m and #S_1 = p. The second expert receives the input pattern, along with the candidate list handed down by the first expert. It then goes on to generate its own features and compares the responses from the p prototypes of the candidate list, rather than considering all the m prototypes, generating its own candidate list and handing this down to the next expert, and so on. The last expert makes the final decision by selecting the top choice from its own candidate list as the class to which the input pattern should be assigned.

In general, an expert appearing at the nth layer generates κ_n responses, with κ_n < κ_{n−1}, where κ_n represents the number of class possibilities considered at this layer and κ_{n−1} is the number of classes that were considered at the previous layer. It is also noted that the set of class indices ζ_n considered at the nth layer is a subset of the set ζ_{n−1} considered at the previous layer. ζ_n ⊂ ζ_{n−1} automatically implies that the class indices dropped by the (n−1)th layer are not considered at the nth layer. The structure also suggests that once a class index is dropped from the list of possible candidates, it is no longer considered at any subsequent layer. Hence, in general, the actions taken at the nth layer are as follows:

– Generate κ_n responses, where κ_n < κ_{n−1}.


Fig. 8. Vertical combination of multiple experts: Class set reduction approach

– Sort the κ_n responses.
– Pass on the top κ_{n+1} responses to the (n+1)th layer, where κ_{n+1} < κ_n.
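The class set reduction traversal can be sketched as follows (the interface names are ours; each expert is modelled as a callable that ranks only the candidate classes it is handed):

```python
def class_set_reduction(pattern, experts, keep, all_classes):
    """Serial class-set reduction: each layer ranks the classes handed
    down by its predecessor and passes on a strictly smaller subset."""
    candidates = list(all_classes)             # S0: the full class set
    for expert, k in zip(experts, keep):
        ranked = expert(pattern, candidates)   # ranking restricted to candidates
        candidates = ranked[:k]                # the candidate set shrinks
    return candidates[0]                       # final expert's top choice
```

A class dropped at one layer never reappears, which is what makes the configuration fast, and also why the most powerful experts should sit late in the chain.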

Researchers applying these ideas include Kao and Parng [81], Toda et al. [82, 83], Gader et al. [84], Fairhurst and Mattoso Maia [85], Fairhurst and Abdel Wahab [86], and many more. Specifically, vertical combination using neural networks and a combination of neural and non-neural classifiers is explored by Ng et al. [87] and Jeong and Seong [88].

4.2.2. Horizontal decision combination. There are numerous ways to combine the decisions supplied in parallel by multiple experts. The various rules proposed for this depend on the amount of information that is available from the individual experts. If n classifiers (experts), working on the same problem, deliver a set of classification responses, then the decision combination process has to combine all the individual decisions in such a way that the final decision improves on the decisions taken by any of the individual experts. Hence, the decision fusion process has to take into account both the individual weaknesses of the different cooperating classifiers, and the individual strengths of these classifiers, building particularly on the latter to deliver a more robust final decision. The general approaches adopted are now reviewed.

Bayesian combination method

Xu et al. [13] discussed the averaged Bayes rule for combining decisions, provided all the classifiers cooperating under the combination rule are Bayesian classifiers. They also demonstrate that it is possible to combine decisions delivered by different kinds of classifiers by the averaged Bayes rule, provided some form of post-probabilities is computable from all the classifier outputs. Haralick [89] gives the error bounds when the intersection approach is applied to combine results from multiple Bayesian classifiers. Kang and Kim [90, 91] discuss the issue of the dependency relationship among classifiers in combining multiple decisions, and the importance of analysing the nature of their inter-dependence. Kang and Kim [92] have proposed a probabilistic framework for combining multiple classifiers at an abstract level. Kang et al. [93, 94], in related work, propose how to approximate optimally the (K+1)th-order distribution with a product set of kth-order dependencies, where 1 ≤ k ≤ K, which are identified by a systematic dependency-directed approach. In general, a probabilistic combination of K classifiers' decisions obtained from samples needs a (K+1)th-order probability distribution. Both Chow and Liu, and Lewis [94], proposed an approximation scheme for a high-order distribution with a product of only first-order tree dependencies. They also propose a new method to combine probabilistically multiple decisions with the product set of the kth-order dependencies, using a Bayesian formalism.

Majority voting and its variations

Simple majority voting

Majority Voting Schemes are very important for combining decisions by multiple experts. Many researchers have used this technique to recognise handwritten and printed characters, and recently it has been demonstrated that, although majority voting is by far the simplest combination strategy, properly applied it can also be very effective (Suen et al. [3]). Lam and Suen [95] explored this further, seeking a deeper understanding of how and why it works. Other important work in this area is reported in Ng and Singh [96], Stajniak et al. [97], Belaid and Anigbogu [98], Parker [99], He [100], and Ji and Ma [101].
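A sketch of simple majority voting over absolute labels, using the consensus threshold of Eq. (2) (function names are ours):

```python
from collections import Counter

def majority_vote(labels):
    """Return the class supported by at least k of the n expert
    labels (Eq. 2), or None when no such consensus exists."""
    n = len(labels)
    k = n // 2 + 1 if n % 2 == 0 else (n + 1) // 2
    winner, count = Counter(labels).most_common(1)[0]
    return winner if count >= k else None
```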

Weighted majority voting

An obvious enhancement to the simple majority systems can be made if the decision of each classifier is multiplied by a weight to reflect the individual confidence of each decision. Lam and Suen [102, 103] report the performance of combination methods including a Bayesian formulation and a weighted majority vote with weights obtained through a genetic algorithm. Alpaydin [104] employs a weighted majority voting scheme by adopting a Bayesian framework where ‘weights’ in voting may be interpreted as plausibilities of the participating classifiers.
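The enhancement amounts to little more than a weighted tally (a sketch; how the weights are obtained, whether by genetic search, Bayesian plausibilities or plain validation accuracy, is the substantive question the cited papers address):

```python
from collections import defaultdict

def weighted_vote(decisions):
    """Weighted majority voting: `decisions` is a list of
    (class_label, weight) pairs, one per expert; the class with the
    largest total weight wins."""
    totals = defaultdict(float)
    for label, weight in decisions:
        totals[label] += weight
    return max(totals, key=totals.get)
```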

Restricted majority voting

Sometimes it is important to shift the emphasis of decision combination towards selecting the most appropriate classifier from an array of classifiers. In effect, this restricts the majority voting to accepting the decision of the classifier which is most appropriate for that particular problem, which may easily be found by ranking the classifiers in terms of their past performance. Gang et al. [105] describe such a modularised neuroclassifier for enhancing the recognition accuracy of mixed printed and handwritten numerals.

Enhanced majority voting

Simple and weighted majority voting are very robust techniques provided an adequate number of classifiers is available to reach the correct consensus. There are various ways in which qualitative enhancement can be added to this framework. Rovatti et al. [106] discuss a form of cooperation between k-nearest neighbour (k-NN) classification and its neural-like property of adaptation. A tunable, high-level k-nearest neighbour decision rule is defined that encapsulates most previous generalisations of the common majority rule. A learning procedure is developed that applies this rule and exploits those statistical features which can be deduced from the training set. The overall approach is tested on handwritten character recognition, and experiments show that adaptivity in the decision rule may improve the recognition and rejection capability of standard k-NN classifiers.

The success of majority voting depends on the consensus reached among the experts. It can therefore be argued that the quality of the combined decision should increase if the quality of the consensus is enhanced. Fairhurst and Rahman [107] report a new classifier structure based on this concept. The approach implements a decision consensus approach, but the quality of the consensus is evaluated in terms of the past track record of the experts before it is accepted.

Ranked majority voting

It is possible to include additional information derived from participating classifiers in reaching the final consensus. Instead of using only final class labels, it is also possible to produce a ranked list of suggested decisions covering multiple classes, which can then be manipulated to reach a final consensus. A very interesting approach to majority voting is reported by Ho et al. [1], where decisions by the classifiers are represented as rankings of classes so that they are comparable across different types of classifiers and different instances of a problem. Ho et al. [108] emphasise the re-ranking of the ranked outputs delivered by the cooperating classifiers. A detailed method is presented that utilises the results of applying the classifiers to training data in order to determine a set of thresholds on the rankings. This technique also identifies the classifiers that are redundant and removes them from the recognition system. In [109], Ho et al. describe a concise and focused version of the ideas presented in [1], again emphasising the substantial improvements achievable from these multiple expert systems.
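One common way to fuse ranked lists of this kind is a Borda-style count. The sketch below illustrates the general idea, not Ho et al.'s specific method:

```python
def borda_combine(rankings):
    """Fuse ranked class lists: each class scores the number of
    classes ranked below it in each list; return the consensus
    ranking, best class first."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, label in enumerate(ranking):
            scores[label] = scores.get(label, 0) + (n - 1 - pos)
    return sorted(scores, key=scores.get, reverse=True)
```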

Duong [110] has discussed the problem of combination of forecasts employing a ranking and subset selection approach. Difficulties with the Bayesian approach (e.g., choice of the prior), together with the availability of a large number of forecast methods in any given situation, point to the need for new directions in this important problem. The ranking and subset selection approach is suggested as a statistical procedure for ranking alternative forecasts. The main objective of this procedure is to screen out ‘bad’ forecasts based on past performances, and at the same time, allow some control by the decision-maker. An adaptive method based on the forecast error ranges has been developed to combine the chosen forecasts, and is shown to compare favourably with those based on other optimality criteria when applied to real series. Although not directly related to character recognition, the issues discussed here are entirely applicable to this task domain.

Alternate confidence combination methods

The majority rule and its variations are special cases of a generalised Sum Rule (Kittler et al. [111]). In this rule, the probabilities of correctness (often expressed as the decision confidence) of decisions delivered by various classifiers are added to build a combined confidence. The advantage of this rule is that, given an adequate number of classifiers and provided they are not significantly poorer in performance than other participating classifiers, the final decision reflects an average of the opinions of various classifiers, and specifically it is not biased heavily by an over-performing or an under-performing expert. However, there is another generalised rule, the Product Rule (Kittler et al. [111]), which takes a different view of decision combination. In this case, the probabilities of correctness of decisions delivered by various classifiers are multiplied to build a combined confidence. This rule has the disadvantage that if one of the classifiers produces a decision with very poor confidence, the overall decision combination suffers a dramatic decrease in confidence. On the other hand, this rule is very useful in cases where the number of classifiers participating in decision combination is small, typically less than five, when using the Sum Rule often produces a stalemate, resulting in indecision within the combination framework. In these cases, the Product Rule delivers a better solution.
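The two rules can be stated compactly, with each expert's class posteriors represented as a dictionary (a sketch; function names are ours):

```python
def sum_rule(posteriors):
    """Sum rule: average the per-class confidences of all experts."""
    n = len(posteriors)
    return {c: sum(p[c] for p in posteriors) / n for c in posteriors[0]}

def product_rule(posteriors):
    """Product rule: multiply the per-class confidences; one
    near-zero confidence is enough to veto a class."""
    combined = dict.fromkeys(posteriors[0], 1.0)
    for p in posteriors:
        for c in combined:
            combined[c] *= p[c]
    return combined
```

With p1 = {'a': 0.6, 'b': 0.4} and p2 = {'a': 0.7, 'b': 0.3}, both rules favour 'a'; a third expert reporting {'a': 0.01, 'b': 0.99} would drag the product for 'a' down by two orders of magnitude, while the sum changes only gradually.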


Elms and Illingworth [112] report an efficient recognition scheme for large-set handwritten characters in the framework of multiple stochastic models (in this case, first-order hidden Markov models) which can model stochastically the input pattern with numerous variations. In this scheme, after extracting four kinds of regional projection contours for an input pattern, four kinds of HMMs are constructed during the training phase based on the direction components of these contours. In the recognition phase, the four kinds of HMMs constructed in the training phase are combined using the Product Rule to output the final recognition result for an input pattern. Elms and Illingworth [113, 114] propose elsewhere a model for the recognition of printed characters, and its extension to the segmentation and recognition of noisy printed words is outlined. The method is based on the representation of the shape of a character by two hidden Markov models. Recognition is achieved by scoring these models against the test pattern and combining the results. The method is evaluated using Baird's noise model [114]. The method generalises to recognise characters with noise levels greater than those included in the training set, and suggests that much of the effect of noise on the recognition performance on images of natural language text could be overcome using a word recogniser employing shallow contextual knowledge. Again, decision combination is achieved using a Product Rule.

Neural networks

Neural network solutions to the problem of classification of handwritten and printed characters also demand serious attention. Although the possibility of applying neural network techniques to multiple expert configurations was initially not very promising (see, for example, Nadal et al. [115]), researchers have more recently applied these techniques with much enthusiasm and comparative success. Before reviewing this work, it is important to clarify several issues relating to the ways in which these individual neural networks are combined.

Sharkey [116] identified two main approaches. The first is classified as ensemble-based, in which a set of neural networks is trained on what is essentially the same task and then the outputs of the neural networks are combined using some form of decision combination framework. The second is a modular-based approach, in which a problem is decomposed into a number of sub-tasks and an expert is associated with each sub-task. Modular combination methods can in turn be classified as cooperative, competitive, sequential, and supervisory.

In this section, various types of multiple expert framework employing neural networks in a horizontal configuration are discussed. Other frameworks reported in the literature will be reviewed in appropriate sections. It is useful to classify the various multiple expert approaches using horizontal decision combination frameworks into the following two classes.

Ensemble approaches

Battiti and Colla [117] report some possible ways to combine the outputs of a set of neural network classifiers to reach a combined decision, ranging from the requirement of a complete agreement among the individual classifications to election schemes based on the distribution of votes collected by the different classes. The same ideas have been explored by Autret and Thepaut [118], Song et al. [119], Lin et al. [120], and Cho and Kim [30, 31].

It is also possible to use some of the generalised multiple-expert decision combination frameworks exclusively in relation to neural networks, such as Khotanzad and Chung [121] (Behaviour-Knowledge Space Method [122]), Ifarraguerri [123] and Rogova [124] (Dempster-Shafer (D-S) Theory of Evidence), Cho [125] (genetic algorithms), and many more.

Modular approaches

There have been many attempts to formalise the method of multiple neural networks. Lee et al. [126], Bebis et al. [127], and Kovacs et al. [128, 129] have explored issues associated with this topic. Others have explored ways of combining multiple neural networks at the design level, such as Yildirim and Marsland [130]. Other approaches include the application of genetic algorithms in combining multiple neural networks, such as Cho [131], Yuanhui et al. [132], and Lee [133]. Researchers have also explored ways of incorporating self-configuration into multiple neural network configurations (Cho [134, 135], Scofield et al. [136], d'Acierno et al. [137, 138], etc.). Other interesting methods of decision combination using multiple neural networks include Xu et al. [2], Vico et al. [139], Bellegarda et al. [140], Wu et al. [141], Loncelle et al. [142], Wang and Jean [143], and Idan and Auger [144].

Knowledge-based methods

Knowledge about a specific task domain is a very important tool in designing successful recognition systems, especially for character recognition. There are, however, two different ways in which this information can be used. The first is to define formal knowledge acquisition frameworks, which collect and exploit the task-specific knowledge in a formalised fashion. In general, these approaches have deep roots in theoretical studies, e.g., information modelling, science of information, entropy, statistics, probability theories, etc., and are usually generic, so that they can be applied to different task domains with relative ease. These methods can collectively be called generalised knowledge-based methods. On the other hand, there are methods that are primarily dependent on intuitive reasoning, which try to maximise local information usage by adapting the configuration to suit a particular task domain. Although these methods are very versatile in design and have the additional flexibility to generate very efficient solutions to a specific problem, they sometimes lack a formal functionality description. These systems may also lack generality, which might lead to a localised rather than a global solution, but the freedom from rigorous theoretical treatment enjoyed by these methods means that additional a priori knowledge can be extensively used, driving up the ultimate performance of the complete system. These methods can be collectively called empirical knowledge-based methods. In general they are specialised for a localised problem, but can be adapted to different situations by manipulating the associated knowledge base, making sure that any specific fine-tuning applied is flexible enough to accommodate changed environments.

Both generalised and empirical knowledge-based methods can be categorised by analysing the approach they take in collecting and exploiting the knowledge base. It is possible to extract relevant quotients characterising various classifiers (experts) based on the relative performance of the experts on the chosen database, exploiting these quotients in designing decision combination methodologies. An alternative way of extracting information can seek to categorise the target task domain by extracting quotients characterising it, and reflecting that knowledge in designing decision combination methodologies. The former can be called an expert evaluation approach and the latter a data evaluation approach.

Expert evaluation

One way of using the knowledge about the participating experts in a decision combination framework is to assess how they perform on a dataset different from the training and the test datasets. This involves recording the success and failure rates of these experts on the chosen task domain and, provided enough data is available, a statistically accurate estimate of the behaviour of the experts in that specific task domain can be made. Huang and Suen [145, 146, 147] use this concept in developing the ‘Behaviour-Knowledge Space Method’. There are more direct ways to exploit additional information about expert reliability in a specific task domain. Various indices can be extracted to characterise the relative competence of the various experts. These ideas have been explored by Rahman and Fairhurst [148], Cordella et al. [149], Gorski [150] and Hull et al. [151]. Attempts to formalise this multi-index decision combination strategy have been made by Rahman and Fairhurst [152, 153], Huang and Suen [154] and Lin et al. [155]. Others have used a more theoretical approach to combine multiple experts using accumulated information from the knowledge base. The research of Guo et al. [156], Vuurpijl and Schomaker [157], and Kawatani and Shimizu [158] should be mentioned here.
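The core of the Behaviour-Knowledge Space idea can be sketched as a lookup table built from the independent evaluation set. This is our simplification of Huang and Suen's method, not a faithful reimplementation:

```python
from collections import Counter, defaultdict

def build_bks(joint_decisions, true_labels):
    """Map each joint decision tuple (one label per expert) to the
    true class most frequently observed with it in the evaluation set."""
    cells = defaultdict(Counter)
    for decisions, truth in zip(joint_decisions, true_labels):
        cells[tuple(decisions)][truth] += 1
    return {cell: counts.most_common(1)[0][0]
            for cell, counts in cells.items()}

# At recognition time the experts' joint decision indexes the table;
# cells never seen during evaluation need a fallback rule
# (e.g. a plain majority vote).
```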

Data evaluation

Data evaluation is another very powerful tool in designing successful decision combination strategies. In this context, the research of Ho [159], Jones et al. [160], Srihari et al. [161], Huang and Chuang [162], and Prevost and Milgram [163] can be mentioned. Others, such as Puuronen and Terziyan [164], have concentrated on either refining or re-evaluating the information contained in a knowledge base.

Fuzzy approach

The fuzzy approach has been used to achieve decision combination. In most cases fuzzy logic is combined with neural networks to build better decision combination schemes. The research of Cho [165], Yi and Yamaoka [166] and Mitra and Kim [167] should be mentioned in this respect.

4.2.3. Hybrid decision combination. As discussed in Sects. 4.2.1 and 4.2.2, it is possible to design multiple expert decision combination strategies where information flow is restricted to either horizontal or vertical directions based on a uniform arrangement of the cooperating experts. Both have their advantages and disadvantages, suggesting that a hybridisation may also be fruitful. In hybrid configurations, the experts are arranged so that in some parts of the configuration they are vertically organised and in others horizontally organised. This section reviews such multiple expert decision combination strategies.

Filtering

In this approach, multiple experts are initially applied to generate a rough classification, and the quality of this classification is then assessed. Based on this assessment, multiple experts appearing later in the hierarchy are applied to produce a finer classification. A generalised schematic of such a configuration is presented in Fig. 9. The basic classifier performs an initial classification of the input characters. Based on locally defined criteria, initially classified characters are regrouped to form smaller groups of clusters, incorporating characters that have a common element in terms of the chosen criteria. A number of such groups may be formed. Some of the characters are classified directly using a generalised classifier, while the different groups formed in the previous stage undergo group-wise classification. The final decision is formed by combining the decisions of the generalised classifier and the specialised group-wise classifiers. This configuration can therefore be defined as a filter structure.
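The control flow of such a filter structure can be sketched as follows; the function names and the grouping criterion are hypothetical stand-ins for the components of Fig. 9, not a prescribed implementation.

```python
def filter_classify(x, basic, confusion_group, group_experts, generalised):
    """Sketch of a filter configuration (names are illustrative).

    basic(x)               -> initial class label
    confusion_group(label) -> group id for labels that need finer,
                              group-wise treatment, or None for a
                              direct decision
    group_experts[g](x)    -> specialised group-wise classifier
    generalised(x)         -> decision of the generalised classifier
    """
    initial = basic(x)
    group = confusion_group(initial)   # locally defined criterion
    if group is None:
        return generalised(x)          # character classified directly
    return group_experts[group](x)     # finer group-wise decision
```

Here the grouping criterion plays the role of the locally defined regrouping stage, routing confusable characters (e.g., ‘O’, ‘0’, ‘Q’) to a specialised classifier.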

Depending on the way in which the groups formed in the first layer are re-classified, these filter-based configurations can be classified into the following two broad categories.

Decision support and reevaluation

One way of designing a group-wise classification configuration is to implement multiple special classifiers (often capable of trained rejection) along with a generalised

A.F.R. Rahman, M.C. Fairhurst: Multiple classifier decision combination strategies for character recognition: A review 179

Fig. 9. Generalised multi-expert filter configuration

reject recovery classifier. A schematic of such a configuration is presented in Fig. 10. In the following stages, specialised classifiers can be designed to reclassify members of that particular group and reject the others. For an m class problem, where m < n, n being the total number of classes under investigation, this leads to the design of a classifier with m + 1 output classes. The characters rejected at these stages may be channelled to the reject recovery classifier. A final group-wise decision is achieved by combining the decisions made by the specialised classifiers and the decisions of the reject recovery classifier. The reject recovery classifiers are trained to produce n output classes, as the rejected characters may belong to any class. In general, the additional stages provide effective backup for the first stages of classification. Sometimes this might be in the form of decision support, where experts appearing in later stages provide decision verification (in terms of support and confirmation) to the experts appearing earlier in the hierarchy. In other configurations, the additional stages might provide support in the form of re-evaluation, where experts appearing in later stages may override (or confirm) decisions of the experts appearing earlier in the hierarchy.

Research on these concepts has been reported by Duerr et al. [168], Suen et al. [169], Anisimovich et al. [170], Jonghyun et al. [171], Zhou et al. [172], Rahman and Fairhurst [173, 174, 175, 176, 177, 178], Kwon et al. [179], Tung and Lee [180], and Nadal et al. [115].

Fig. 10. Combination of multiple experts: the terminology

Divide and conquer

If the ideas of decision support and re-evaluation are stretched further, it is seen that one way of achieving better performance is to form multiple groups by adopting localised criteria, dividing a large problem into a finite number of smaller problems. A schematic of such a configuration is presented in Fig. 11. In this case, the primary expert appearing in the first layer can be considered as a multiple output filter structure, trained to separate the incoming stream intended for a particular group into two streams, the first stream consisting of the characters which are accepted as correctly belonging to that particular group and the other comprising the characters which are not. Hence, implementation of such a filter for an m class problem requires the design of a classifier with two output classes. Each accepted group is then processed by designated classifiers appearing later in the hierarchy and the accepted characters are then re-evaluated by these tailor-made classifiers dealing with particular classes only. The specialised classifiers recognising the individual members of the small groups are trained on m classes and hence designed to produce m output classes, where m << n, n again being the total number of classes under investigation and typically m = 2, 3. Any rejected characters are processed by the reject recovery classifier, exactly as described in the previous section. The final decision is reached by combining the corresponding decisions of these two types of classifier.

Fig. 11. Generalised hybrid multi-expert structure: divide and conquer

Research based on these ideas has been reported by Zhou and Pavlidis [181], Errico et al. [182], Rahman and Fairhurst [4, 183], Mao et al. [184], Wu et al. [185], Ha et al. [186], Cho and Kim [187], and Corwin et al. [188]. Other methods based on hybrid decision combination strategies are to be found in [189, 190, 191, 192].

Neural approaches

Some approaches using multiple layers of experts interconnected in the hybrid configuration use neural networks exclusively, both as constituent experts and as a main expert combining other experts. Especially important are the contributions of Reddy and Nagabhushan [193, 194], Price et al. [195], Kovacs [196], Cao et al. [197, 198], Liu et al. [199], Rahman and Fairhurst [200], Wang and Jean [201], Zhou and Pavlidis [202], Teo and Shinghal [203], Lin et al. [204], Cho et al. [205], Kovacs et al. [206], Cho and Kim [207], Cho and Kim [208], Vlontzos and Kung [209], Huang and Suen [210], Jung-Hsien Chiang [12], Cai et al. [211], Hiang et al. [212], Khofanzad and Chung [213], Pawlicki [214], Jouny and Sheridan [215], Houle and Eom [216], Mandalia et al. [217], Benediktsson et al. [218], Alexandre and Guyot [219], Xiaoyan and Song [220], Rahman and Fairhurst [221], Wang et al. [222] and, finally, Sung-Bae and Kim [223]. Other neural network techniques embodying the multiple expert decision combination paradigm can be found in [224, 225, 226, 227].

Fig. 12. A decision tree classifier configuration using multiple experts

Decision tree classifiers

Decision tree classification is an important classification methodology. A decision tree classifier configuration consists of a root node, a number of non-terminal nodes and a number of terminal nodes. Associated with the root node is the entire set of classes into which a pattern may be classified. A non-terminal node represents an intermediate decision and its immediate descendant nodes represent the decisions originating from that particular node. The decision-making process terminates at a terminal node, and the pattern can be classified as belonging to the associated class. The schematic of a system incorporating these ideas is presented in Fig. 12. The main attraction of this configuration is the way the number of alternatives to a possible identification is reduced, which allows the corresponding application of more focused algorithms to refine ambiguous decisions. It is possible to design the configuration in a way that enables only the most suitable features to be used in training the decision-making process associated with each node, thereby maximising the information usage of the corresponding feature space. The important parameters in these tree classifiers include the specific tree structure, the identification of the specific sub-feature space to be associated with each node and the decision rule to be implemented at each node.
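The node structure described above can be sketched as follows; the class and rules here are minimal illustrative assumptions, with each non-terminal node holding a decision rule over its own sub-feature space.

```python
class Node:
    """One node of a decision tree classifier configuration (sketch).

    Terminal nodes hold a class label; non-terminal nodes hold a
    decision rule mapping a pattern to the index of a child node.
    """

    def __init__(self, label=None, rule=None, children=None):
        self.label = label            # set only for terminal nodes
        self.rule = rule              # pattern -> child index
        self.children = children or []

    def classify(self, pattern):
        node = self
        while node.label is None:     # descend until a terminal node
            node = node.children[node.rule(pattern)]
        return node.label
```

Each rule inspects only the features relevant at that node, reflecting the sub-feature-space design choice mentioned above.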

The research contributions of Lijia and Franklin [228], Sethi and Yoo [229], Perrone and Intrator [230], Perrone [231, 232], Aggarwal and Lereno [233], Shlien [234], Shinozawa and Okoma [235], and Kurzynski [236] are especially important in this respect.

4.3. Related research

This review has so far dealt directly with various multiple expert decision combination strategies, and it is clear how versatile and varied these approaches are in terms of their style, formation and underlying principles. It is important to analyse these approaches in order to appreciate fully where the similarities and dissimilarities lie, and to take advantage of their diversity to select an appropriate solution for a specific task domain. Researchers have explored various ways of analysing these multiple expert strategies, and their approaches range from knowledge-based methods to neural and expectation maximisation theorems. This section discusses some of these approaches and reviews the relevant literature.

Rahman and Fairhurst [5] analyse the problem of decision combination of multiple experts from a perspective based on the assumption that the success or failure of the decision combination strategy largely depends on the extent to which the various possible sources of information are exploited. They argue that it is possible to evaluate multiple expert decision combination approaches based on their capability for exploiting diverse information sources as a yardstick in estimating the level of performance that is achievable. This “information management” approach is also explored by other researchers. Given that the purpose of a decision fusion system is to combine related data from multiple sources to provide enhanced information, one way of assessing the performance is to measure the enhancement or degradation in the information provided by the system. The most commonly used method employs the relationship which exists between measures of information and measures of uncertainty. By convention, measures of uncertainty for a system can also be treated as measures of the potential information in the system, and a decrease in uncertainty should amount to an increase in the potential information. Oxenham et al. [237] describe a rule-based expert system which performs high-level data fusion for human decision support and discuss measures of information devised for assessing the performance of the system. They also discuss ways of combining these measures to gauge the enhancement or degradation in the information provided by the system. Bollacker and Ghosh [238], on the other hand, propose a framework which reuses knowledge from previously trained classifiers to improve performance in a current and possibly related classification task. They test their system on a typical character recognition task and achieve very encouraging results.

Many researchers apply probability theories in analysing the behaviour of multiple expert systems. Suen et al. [239] propose an advanced hierarchical model to produce a more effective character recogniser based on the probability of occurrence of the patterns. New definitions such as crucial parts, efficiency ratios, degree of confusion and similar character pairs are also given to facilitate pattern analysis and character recognition. Using these definitions, algorithms are developed to recognise the characters by parts. The recognition rates are analysed and compared to those obtained from subjective experiments.

Another way of analysing these methods is through application of the theory of entropy. Tang et al. [240] propose an approach to analyse systematically the changes in entropy which occur in the different stages of a pattern recogniser. This models the entire pattern recognition system as a multiple level information source (MLIS). For a typical recognition system there are four levels in this information source, IS1 to IS4, and these can be divided into two categories: entropy-reduced and entropy-increased. By examining the internal structures of a pattern recognition system, it is possible to use the MLIS to address all the different factors which increase the entropy at the different levels of the entire system. A theoretical analysis of the entropy distribution in the MLIS indicates that in order to improve the performance of a pattern recognition system, the entropy of the MLIS must be reduced at all the different levels. Elsewhere, Wang and Suen [241] analyse the general decision tree classifier with overlap based on a recursive process of reducing entropy. They show that when the number of pattern classes is very large, formal theorems can reveal both the advantages of tree classifiers and the main difficulties in their implementation.
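The entropy-reduction view can be made concrete with a small numerical sketch (the distributions below are invented purely for illustration): a well-behaved recognition stage should lower the Shannon entropy of the class distribution it passes on.

```python
import math

def entropy(dist):
    """Shannon entropy (in bits) of a discrete class distribution."""
    return -sum(p * math.log2(p) for p in dist if p > 0)

# Hypothetical illustration: a recognition stage narrows a uniform
# 10-class distribution to one concentrated on two candidate classes.
before = [0.1] * 10                  # maximal uncertainty, log2(10) bits
after = [0.8, 0.2] + [0.0] * 8       # reduced uncertainty after the stage
```

Under the MLIS view, each entropy-reduced level moves the system from a distribution like `before` towards one like `after`.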

The Expectation-Maximisation (EM) algorithm is another promising tool for analysis of these configurations. Jordan and Xu [242] have shown that the EM algorithm, which is an iterative approach to maximum likelihood parameter estimation, can be regarded as a variable metric algorithm with its searching direction having a positive projection on the gradient of the log likelihood. This work is based on Jordan and Jacobs’ [243] EM algorithm involving a mixture of various experts (experts originally proposed by Jacobs, Jordan, Nowlan and Hinton (1991)). Similar arguments can also be found in [244], and other researchers have expressed interest in the use of multiple models for classification and regression. The Hierarchical Mixture of Experts (HME) has been successful in a number of regression problems, yielding faster training through the use of the EM algorithm. Waterhouse and Robinson [245] have extended the HME to classification and have reported results for common classification problems.
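As a minimal illustration of EM in this setting (a sketch of estimating fixed mixture weights over experts, not the HME architecture itself), the weights can be fitted from the probability each expert assigns to the true class of each validation sample:

```python
def em_mixture_weights(expert_probs, n_iter=50):
    """EM estimate of mixture weights for combining experts (sketch).

    expert_probs[i][k] is the probability expert k assigned to the
    true class of sample i.  EM maximises the data likelihood under
    the mixture  sum_k pi_k * p_k  over the weights pi.
    """
    n_experts = len(expert_probs[0])
    pi = [1.0 / n_experts] * n_experts
    for _ in range(n_iter):
        counts = [0.0] * n_experts
        for probs in expert_probs:
            joint = [w * p for w, p in zip(pi, probs)]
            total = sum(joint) or 1.0
            for k, j in enumerate(joint):        # E-step: responsibilities
                counts[k] += j / total
        pi = [c / len(expert_probs) for c in counts]  # M-step: re-estimate
    return pi
```

On data where one expert is consistently more reliable, the weights concentrate on that expert, mirroring the gating behaviour that the HME learns per input.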

Researchers have also formulated theoretical frameworks concerning multiple level decision combination configurations as applied to character recognition. Pudil et al. [246] report the idea of constructing a multi-stage pattern classification system with a reject option. They derive the conditions, in terms of upper bounds on the cost of higher stage measurements, for a multi-stage classifier to give lower decision risk than a single-stage classifier, and demonstrate that such a system with a reject option in all the stages except the final one can yield a lower average decision risk than the commonly used single-stage classification systems. Others have sought to define the identity of optimal stages for a multi-stage classification system and the determination of optimal recognition algorithms. A very interesting problem in multiple expert decision fusion is determining the upper limit of performance enhancement when multiple classifiers are combined in any framework. Tumer and Ghosh [247] address this problem, reporting two approaches to the estimation of performance limits in hybrid networks. The first involves a framework that estimates Bayes error rates when linear combiners are used, and the second is a more general method that provides decision confidences and error bounds based on error types arising from the training data. Although they demonstrate the applicability of their methods in a different task domain, the issues are generalised enough to be relevant to handwritten character recognition. Foggia et al. [248], on the other hand, discuss reject criteria for a Bayesian combiner, and extend this to determine the trade-off between error and rejection rate for a multi-expert system. The method is based on estimation of the reliability of each classification act. The authors demonstrate the applicability of their approach on a standard handwritten character database.

Another interesting problem is the optimality of multiple expert decision combination frameworks. Bubnicki [249] discusses the problem of optimisation of a two-level classifier as opposed to the relatively common single-level classifier combinations. Pattern recognition in the two-level system depends on taking into account not only the directly measured features, but also features which have been previously transformed during identification or recognition procedures. Similarly, Jozefczyk [250] investigates a specific two-level system and discusses the application of linear decision functions to the determination of the optimal multi-stage recognition algorithms.

Many practical classification problems and processing structures require the implementation of optimal pattern dichotomisers to distinguish solely between class pairs. Examples are found in problem-specific domains, in optimising preprocessing for neurally-based architectures and in the realisation of high-performance multi-expert classifier structures ([197], [251], [173], etc.). In many cases, efficient dichotomiser structures become difficult to define because of the high similarity among the classes to be distinguished. A technique for finding areas of maximum dissimilarity between the statistical models of two closely resembling classes is introduced by Rahman and Fairhurst [252]. It is shown that this technique combines statistical and syntactic recognition techniques by extracting features from structurally distinctive regions of the models, and is applicable in a variety of image classification and recognition tasks. Argentiero et al. [253] analyse an automated technique for effective decision tree design which relies only on a priori statistics. They utilise canonical transforms and Bayes look-up decision rules, and also produce a procedure for computing the global probability of correct classification.

Attention has been given to the problem of classifier selection in a multiple expert configuration. In general, questions about how many experts are to be combined, and which experts are to be combined to achieve the best performance, are solved by experimentation (typically trial and error). Automation of this process is essential to make these solutions versatile enough to be applied to a variety of applications. Kim et al. [254] propose such an approach by introducing a similarity measure which can be calculated from the errors of individual classifiers and used as an index to select experts. Lee and Srihari [255] argue that effective combination of multiple experts can be achieved if dependency information is used to dynamically combine classifiers. They identify two types of dynamic selection. Postconditioned selection seeks better approximation to the unconditioned classifier output distribution, whereas preconditioned selection captures the variations in the density function of classifier outputs conditioned on the inputs. Although both types of selection have the potential to improve combination performance, they argue that preconditioned selections have lower error bounds than postconditioned selections.
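An error-based similarity index of the kind Kim et al. describe can be sketched as the overlap between two experts’ error sets; the exact measure used in [254] may differ, and this form is offered only as an illustration. A low value indicates complementary experts that are worth combining.

```python
def error_similarity(errors_a, errors_b):
    """Fraction of samples on which two experts fail together, relative
    to the samples on which at least one of them fails (sketch of an
    error-based similarity index for expert selection)."""
    both = sum(a and b for a, b in zip(errors_a, errors_b))
    either = sum(a or b for a, b in zip(errors_a, errors_b))
    return both / either if either else 0.0
```

Experts whose pairwise similarity is lowest make the strongest candidates for combination, since their errors rarely coincide.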

The solutions discussed so far are applicable only when combining classifiers in a horizontal hierarchy. For example, a hybrid multiple expert classification structure needs to be re-configured and re-tuned to take account of the statistics of the new data if the target dataset changes. Rahman and Fairhurst [256] address this issue by developing an approach by which this type of optimisation and task orientation can be achieved automatically. The reconfiguration process customarily involves cycling through the process of training and re-evaluation on the available algorithms, and the novelty of the proposed approach is to streamline this operation and make it effective through a process of structured optimisation by applying a genetic algorithm. Hashem [257, 258, 259] discusses the problems of selecting classifiers in a multiple expert framework when all the individual experts are neural networks. Since the corresponding outputs of the combined networks approximate the same physical quantity (or quantities), the linear dependency (collinearity) among these outputs may affect the estimation of the optimal weights for combining the networks, resulting in a combined model which is inferior to the apparent best network. The author presents two algorithms for selecting the component networks for the combination in order to reduce the undesirable effects of collinearity, thus improving the overall generalisation ability. Elsewhere, the approximation accuracy of this combined model is compared with two common alternatives: the apparent best network or the simple average of the corresponding network outputs (Hashem [260]).

5. Conclusion

This paper has reviewed the field of multiple expert combination techniques for recognising handwritten and machine-printed characters. As has been demonstrated, there are numerous ways to combine the decisions of multiple classifiers. A new method of categorising these methods, based on how information is exchanged between the various cooperating experts, has been presented, and reported methods have been reviewed within this proposed framework. Many of these methods are generic and are applicable to other task domains, a generality which is also highlighted in the course of the review. The issue of information flow and its management is one of the most important criteria of successful decision fusion, a topic that has been discussed at length in this paper.

Recently, attempts have been made to unify these diverse multiple expert approaches within a common framework. Kittler et al. [111], for example, have developed a framework for combining classifiers which use distinct pattern representations. They show that many existing schemes can be considered as special cases of compound classification, where all the pattern representations are used jointly to make a decision. This, however, covers only approaches based on a horizontal information flow system. Another generic framework is proposed by Rahman and Fairhurst [261], modelling multi-layer decision fusion in terms of the errors accumulated at each stage. This model is completely generalised and can approximate any multi-layer decision combination configuration in terms of information flow. Elsewhere, Rahman and Fairhurst [262] present a generalised multi-expert multi-level decision combination strategy, the vertical (serial) combination approach, from the dual viewpoints of theoretical analysis and practical implementation.

A very important issue associated with multiple expert approaches that needs to be addressed is the question of comparability. The degree of difficulty in comparing the various approaches is high, as the number of variables in these configurations is much greater than in the case of stand-alone classifiers. In some cases recognition performance alone is considered, whereas in others throughput has been selected as the primary parameter of interest. This is a very important issue, and the choice of a particular algorithm for a specific task depends on the availability of such comparative data. It is beyond the scope of this paper to review these issues in detail, but typical papers of relevance include Wesolkowski and Hassanein [263], Matsui et al. [264], Tsutsumida et al. [265], Sabourin et al. [266], Gascuel et al. [267], Antanas et al. [268], Auda and Kamel [269], Alimouglu and Alpaydin [270], Chen and Li [271], Cheng and Xia [272], Mitra and Kim [167], Rahman and Fairhurst [273, 274, 43, 275], and Fairhurst and Mattaso Maia [276].

As multiple expert approaches are increasingly applied to real-time applications, speed of execution (throughput) has become a very important issue. Aside from optimisation of these decision combination algorithms on traditional software platforms, possible implementations using different platforms (e.g., DSP, FPGA) have been explored (Maruyama et al. [277], Martonfi and Szigeti [278], Rahman et al. [279]). In some cases, a combination of software and hardware platforms has also been explored (e.g., Rahman and Fairhurst [280]).

The methods cited so far are directly related to the recognition of handwritten or printed characters based on the concept of multiple experts, but there are other research activities that contribute to this very interesting and important field. Although their contribution is indirect, the impact of this research on the overall advancement of this research area is important enough to deserve a mention here. The Theory of Probabilities has been applied by various researchers to explore ways of fusing decision algorithms (e.g., Chen and Ansari [281]). Evidence Theory has been used extensively, both for theoretical and practical studies concerning decision fusion. References include Shafer and Logan [282], Franke and Mandler [283], Mandler and Schurmann [40], Ng and Singh [284], Denoeux [285], Bloch [286], Lingras and Wong [42], Lu [287, 288], Yen [289], Ruspini et al. [290], Deutsch-McLeish [291], Spillman [292], Tchamova [293], Haralick [294], Zhu et al. [295], Norton and Hirsh [296], and Mogre et al. [297]. Some researchers have emphasised the need to gain a better understanding of human expertise (psychology) in solving recognition problems as a prerequisite of developing an efficient computer system to emulate the recognition capacity of humans (e.g., Legault et al. [298, 299]). There has also been a tremendous contribution to the understanding of the problem of combination of multiple experts from researchers working on expert, control and fuzzy systems (e.g., Wang and Feng [300], Edwards et al. [301], Barlow et al. [302], Suen et al. [303], Chatterji [304], Srihari [305, 306]). Some have considered the problem of decision fusion where information sources are imprecise (e.g., Yagar [307]), while others have exploited probabilistic reasoning (e.g., Kwoh and Gillies [308]), competition among distinct expert systems (e.g., Holsapple et al. [309]), methods of aggregation (e.g., Troutt et al.), belief functions (e.g., Kramosil [310]), forecasting methods (e.g., Finnie [311]), or adaptive combination (e.g., Loo-Nin and Ah-Hwee [312]). Problems concerning decisions derived from statistical reasoning also include the uncertainty and ignorance aspects of the combined decisions.

Fuzzy rules have been put forward by some researchers as a possible solution to this problem (e.g., De Mathelin and Perneel [313], Cao et al. [314, 315], Chatterji [304]). Researchers from social science disciplines have also utilised various concepts related to multiple expert decision combination for applications in the fields of economics (Pattanaik [316]), the study of elections (Black [317]), voting practices of the general public (Farquharson [318]), politics (Pattanaik [319]), committee decisions (Black and Newing [320]) and game theory (Peleg [321]). Historically, Bernoulli (Cantor [322]), Borda [323], Condorcet [37], Laplace (Pearson [324]), Nanson [325], Galton [326], and Dodgson [327] are key researchers who have contributed significantly to the understanding of the very complex problem of decision combination by exploiting multiple sources of information and multiple individual decisions. Further information about these and rules such as the Social Welfare Function or Group Consensus Functions can be found in Fishburn [328] and Goodman and Markowitz [329]. It is fascinating to see how widespread the application of multiple expert methods is. The fields described here are by no means exhaustive, and should be treated as a preliminary guide to further study by interested readers.

The final issue to be discussed is the idea of combining decision combination methods. Since various multiple expert methodologies are available, it is possible to combine various combination methods to make the final classification even more robust and to drastically reduce error rates, a requirement in many practical applications (Knerr et al. [330], Leroux et al. [331], Lee et al. [332], etc.). Paik et al. [333] report such a system. For the recogniser, they use three representative recognisers: a structural, a statistical, and a neural network approach. They also use three combining methods (a vote, a Bayesian, and BKS) to combine the results obtained from the three recognisers. A very interesting modelling of human problem solving is reported by Anzai et al. [334], where a combination of a serial (vertical) and a parallel (horizontal) information-processing model is investigated. The model is characterised by a structure with mutual feedback between parallel perceptual processing and serial conceptual processing modules. It is argued that the model sharply reflects the macro-level functional locality of the human brain, and also provides a basis for constructing inference-driven pattern recognition systems. This has a very interesting relation to the information fusion model discussed above in classifying various multiple expert configurations.
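The simplest of the combining methods mentioned above, a plurality vote over combiner outputs, can be sketched as follows (rejecting on ties and empty ballots is one common convention, assumed here for illustration):

```python
from collections import Counter

def plurality_vote(decisions, reject=None):
    """Plurality vote over combiner or recogniser outputs (sketch);
    ties between leading classes, or an empty ballot, yield a reject."""
    tally = Counter(d for d in decisions if d is not None)
    if not tally:
        return reject
    (top, n), *rest = tally.most_common()
    if rest and rest[0][1] == n:      # tie between the leading classes
        return reject
    return top
```

In a system like that of Paik et al., the inputs to such a vote would themselves be the decisions of combining methods, giving a second level of fusion.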

One final word about the scope of this review. Like any review of this kind, it is neither complete, nor is it meant to be. The aim of this review has been to demonstrate the advances in the field of decision combination involving multiple classifiers, the breadth of application of these concepts, and to present a comprehensive picture of the state of the art. Similarly, the list of papers is by no means exhaustive, and is only a selection to demonstrate the richness of the field. For further reading, the reader is directed to the proceedings of established pattern recognition conferences, especially the International Conference on Document Analysis and Recognition (ICDAR), the International Conference on Pattern Recognition (ICPR) and the International Workshop on Frontiers in Handwriting Recognition (IWFHR), all of which have had special sessions on multiple classifier combination techniques. In particular, the annual series of workshops on Multiple Classifier Systems (MCS) concentrates exclusively on this topic.

Although this paper has reviewed multiple expert decision combination approaches as applied to the character recognition problem, most of the methods described here are essentially generic in nature and are applicable to other task domains. Typical examples include signature verification (e.g., Bajaj and Chaudhury [9]), medical imaging (e.g., Kittler et al. [8]), shop-floor scheduling (e.g., Choi and Park [335]), writer verification (e.g., Huang et al. [336]), medical systems (e.g., Terziyan et al. [337]), visual classification (e.g., Ellis et al. [338]), word recognition (e.g., Gader and Mohamed [339]), forecasting (e.g., Golovchenko and Noskov [340]), Geographical Information Systems (GIS) (e.g., Goodenough and Robson [341]), civil engineering (e.g., Gouveia and Barthes [342]), industrial sensor fusion (e.g., Hanson et al. [343]), software reliability (e.g., Hee-Joong and Kim [91]), image processing (e.g., Ke et al. [344]) and remote sensing (e.g., Sawaragi et al. [345]), to mention just a few. It is hoped that this review will give researchers in different disciplines who are interested in combining information from multiple sources the inspiration to explore these issues further, and to use this material as a preliminary guide to the study of decision combination. For researchers in the field of character recognition, it is hoped that this review will provide an overview of the huge advances apparent in this field to date and help to support a deeper understanding of how these methods can achieve performance enhancement.

References

1. Ho TK, Hull JJ, Srihari SN (1994) Decision combination in multiple classifier systems. IEEE Trans Pattern Anal Mach Intell 16(1):66–75

2. Xu L, Krzyzak A, Suen CY (1991) Associative switch for combining multiple classifiers. In: Int. Joint Conf. on Neural Networks, vol. 1, pp 43–48, Seattle, Wash., USA

3. Suen CY, Nadal C, Mai TA, Legault R, Lam L (1990) Recognition of totally unconstrained handwritten numerals based on the concept of multiple experts. In: Proc. IWFHR, pp 131–143, Montreal, Canada

4. Fairhurst MC, Rahman AFR (1997) A generalised approach to the recognition of structurally similar handwritten characters. IEE Proc Vision Image Signal Process 144(1):15–22

5. Rahman AFR, Fairhurst MC (1999) Enhancing multiple expert decision combination strategies through exploitation of a priori information sources. IEE Proc Vision Image Signal Process 146(1):1–10

6. Yuan T, Lo-Ting T, Jiming L, Seong-Whan L, Win-Win L (1998) Off-line recognition of Chinese handwriting by multifeature and multilevel classification. IEEE Trans Pattern Anal Mach Intell 20(5):556–561

7. Yaeger LS, Webb BJ, Lyon RF (1998) Combining neural networks and context-driven search for online, printed handwriting recognition in the NEWTON. AI Mag 19(1):73–89

8. Kittler J, Hojjatoleslami A, Windeatt T (1997) Weighting factors in multiple expert fusion. In: Proc. British Mach Vision Conference, pp 41–50

9. Bajaj R, Chaudhury S (1997) Signature verification using multiple neural classifiers. Pattern Recogn 30(1):1–7

10. Chen K, Wang L, Chi H (1997) Methods of combining multiple classifiers with different features and their applications to text-independent speaker recognition. Int J Pattern Recognit Artif Intell 11(3):417–445

11. Larkey LS, Bruce Croft W (1996) Combining classifiers in text categorization. In: 19th Conference on Research and Development in Information Retrieval, pp 289–297, Zurich, Switzerland

12. Cherkauer KJ (1996) Human expert-level performance on a scientific image analysis task by a system using combined artificial neural networks. In: 13th National Conference on Artif Intelligence: Working Notes, Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms Workshop, Portland, Ore., USA

13. Xu L, Krzyzak A, Suen CY (1992) Methods of combining multiple classifiers and their applications to handwriting recognition. IEEE Trans Syst Man Cybern 23(3):418–435

14. Chi Z, Wu J, Yan H (1995) Handwritten numeral recognition using self-organising maps and fuzzy rules. Pattern Recogn 28(1):59–66

15. Lam L, Suen CY (1988) Structural classification and relaxation matching of totally unconstrained handwritten zip-code numbers. Pattern Recogn 21(1):19–31

16. CEDAR CDROM-1, State University of New York at Buffalo, UB Commons, 520 Lee Entrance, Suite 202, Amherst, NY 14228-2567, USA

17. Lucas S, Amiri A (1996) Statistical syntactic methods for high-performance OCR. IEE Proc Vision Image Signal Process 143(1):23–30

18. Vision Speech Signal Processing Group, Department of Electronic & Electrical Engineering, University of Surrey, Guildford, Surrey GU2 5XH, UK

19. Image Processing & Computer Vision Research Group, Electronic Engineering, University of Kent, Canterbury, CT2 7NT, UK

20. Intelligent Systems Laboratory, Dept. of Electrical Engineering, FT-10, University of Washington, Seattle, WA 98195, USA

21. NIST Special Databases 1–3, 6–8, 19, 20, National Institute of Standards and Technology, Gaithersburg, MD 20899, USA

22. CEDAR CDROM-2, State University of New York at Buffalo, UB Commons, 520 Lee Entrance, Suite 202, Amherst, NY 14228-2567, USA

23. ERIM: Environmental Research Institute of Michigan, Document Processing Research Program, P.O. Box 134001, Ann Arbor, MI 48113-4001, USA

24. Handwriting Recognition Group, Nijmegen Institute for Cognition and Information, P.O. Box 9104, 6500 HE Nijmegen, The Netherlands

25. Fairhurst MC (1988) Computer vision for robotic systems: an introduction. Prentice-Hall, Reading, Mass., USA

26. Hansen LK, Salamon P (1990) Neural network ensembles. IEEE Trans Pattern Anal Mach Intell 12(10):993–1001

27. Special issue on combining artificial neural nets: ensemble approaches (1996) Int J Connection Sci 8(3–4)

28. Special issue on combining artificial neural nets: modular approaches (1997) Int J Connection Sci 9(1)

29. Sharkey AJC (ed) (1999) Combining artificial neural nets: ensemble and modular multi-net systems. Perspectives in neural computing. Springer, Berlin Heidelberg New York

30. Sung-Bae C, Kim JH (1993) A multiple network architecture combined by fuzzy integral. In: IJCNN '93-Nagoya. Proc. 1993 International Joint Conference on Neural Networks (Cat. No. 93CH3353-0), pp 1373–1376

31. Sung-Bae C, Kim JH (1995) Combining multiple neural networks by fuzzy integral for robust classification. IEEE Trans Syst Man Cybern 25(1):380–384

32. Sung-Bae C, Kim JH (1995) Multiple network fusion using fuzzy logic. IEEE Trans Neural Networks 6(1):497–501

33. Wolpert DH (1992) Stacked generalization. Neural Networks 5(1):241–259

34. Kearns M, Mansour Y (1996) On the boosting ability of top-down decision tree learning algorithms. In: 28th Annual ACM Symposium on the Theory of Computing, Philadelphia, Pa., USA

35. Breiman L (1996) Bagging predictors. Mach Learn 24(1):123–140

36. Todhunter I (1865) A history of the mathematical theory of probability from the time of Pascal to that of Laplace. Macmillan, Cambridge, UK

37. de Condorcet NC (1785) Essai sur l'application de l'analyse à la probabilité des décisions rendues à la pluralité des voix. Imprimerie Royale, Paris, France

38. Mazurov VD, Krivonogov AI, Kazantsev VL (1987) Solving of optimisation and identification problems by the committee methods. Pattern Recogn 20(2):371–378

39. Ho TK, Hull JJ, Srihari SN (1992) A regression approach to combination of decisions by multiple character recognition algorithms. Proc. SPIE 1661:137–145

40. Mandler E, Schurmann J (1988) Combining the classification results of independent classifiers based on the Dempster/Shafer theory of evidence. In: Gelsema ES, Kanal LN (eds) North-Holland, Amsterdam, pp 381–393

41. Jian-Bo Y, Singh MG (1994) An evidential reasoning approach for multiple-attribute decision making with uncertainty. IEEE Trans Syst Man Cybern 24(1):1–18

42. Lingras P, Wong SKM (1988) An optimistic rule for accumulation of evidence. In: Methodologies for intelligent systems 3. Proc. 3rd International Symposium, pp 60–69

43. Rahman AFR, Fairhurst MC (1998) An evaluation of multi-expert configurations for recognition of handwritten numerals. Pattern Recogn 31(9):1255–1273

44. Kovacs ZM, Ragazzoni R, Rovatti R, Guerrieri R (1993) Improved handwritten character recognition using second-order information from training set. Electron Lett 29(14):1308–1310

45. Wu QZ, Jou IC, LeCun Y (1995) A pyramid multiresolution classifier for online large vocabulary Chinese character recognition. In: Proc. Conf. on Visual Communications and Image Processing: SPIE 2501:158–165, Taipei, Taiwan

46. Dimauro G, Gerardi G, Impedovo S, Pirlo G, Tegolo D (1992) Integration of a structural features-based preclassifier and a man-machine interactive classifier for a fast multi-stroke character recognition. In: 11th IAPR Int. Conf. on Pattern Recogn, pp 190–194, The Hague, Netherlands

47. Vlontzos JA, Kung SY (1989) A hierarchical system for character recognition. In: IEEE Int. Symp. on Circuits and Systems (ISCAS), pp 1–4, Portland, Ore., USA

48. Shridhar M, Badreldin A (1983) A 2-stage character recognition algorithm combining Fourier and topological descriptors. In: Int. Conf. on Systems Man Cybern 1/2:658–662

49. Ahmed P, Suen CY (1988) A decision-making method and its application in unconstrained handwritten character recognition. In: Proc. 1st Int. Conf. on Industrial and Engineering Applications of Artificial Intelligence and Expert Systems (IEA/AIE-88), vols. 1 and 2, pp 638–644, Baltimore, Md., USA


50. Shridhar M, Badreldin A (1984) High-accuracy character recognition algorithm using Fourier and topological descriptors. Pattern Recogn 17(5):515–524

51. Brown RM, Fay TH, Walker CL (1988) Handprinted symbol recognition systems. Pattern Recogn 21(1):91–118

52. Rodriguez C, Muguerza J, Navarro M, Zarate A, Martin JI, Perez JM (1997) Digit recognition in forms: a two-phase solution. Informatica Automatica 30(3):34–41

53. Zhou X-L, Hua X-C, Li F (1995) A method of Jia Gu Wen recognition based on a two-level classification. In: Proc. 3rd Int. Conf. on Document Analysis Recognition, pp 833–836, Montreal, Que., Canada

54. Jing Z, Xiaoqing D, Youshou W, Fanxia G (1997) Combinatorial coarse classification method for OLCCR. In: Computer Vision – ACCV '98. 3rd Asian Conference on Computer Vision. Proc. pp 145–152

55. Suzuki T, Nishida H, Nakajima Y, Yamagata H, Tachikawa M, Sato G (1995) A handwritten character recognition system by efficient combination of multiple classifiers. In: International Association for Pattern Recognition Workshop on Document Analysis Systems, pp 169–187

56. Yamagata H, Nishida H, Suzuki T, Tachikawa M, Nakajima Y, Sato G (1996) A handwritten character recognition system by efficient combination of multiple classifiers. IEICE Trans Inf Syst E79-D(5):498–503

57. Wang G-E, Wang J-F (1995) A multi-layer classifier for recognition of unconstrained handwritten numerals. In: Proc. 3rd Int. Conf. on Document Analysis Recognition, pp 849–852, Montreal, Quebec, Canada

58. Park H-S, Song H-H, Lee S-W (1998) A self-organizing hierarchical classifier for multi-lingual large-set oriental character recognition. Int J Pattern Recognit Artif Intell 12(1):191–208

59. Lee S-W, Park H-S (1996) Multi-lingual large-set Oriental character recognition using a hierarchical neural network classifier. Comput Process Oriental Lang 10(1):129–145

60. Wang H, Bell D, Ojha P (1995) A new approach to pattern classification: cooperation between constrained associative network and MLP. In: Proc. Int. Conf. on Digital Image Computing: Techniques and Applications, pp 169–173, Brisbane, Australia

61. Kimura Y, Wakahara T, Odaka K (1997) Combining statistical pattern recognition approach with neural networks for recognition of large-set categories. In: Proc. Int. Conf. on Neural Networks, pp 1429–1432, Houston, Tex., USA

62. Hsin-Chia F, Kuo-Ping C (1995) Recognition of handwritten Chinese characters by multi-stage neural network classifiers. In: 1995 IEEE Int Conference on Neural Networks Proceedings (Cat. No. 95CH35828), pp 2149–2153

63. Lee LL, Gomes NR (1997) Handwritten numeral recognition using a sequential classifier. In: Proc. IEEE Int. Symposium on Information Theory, p 212, Ulm, Germany

64. De Carvalho A, Fairhurst MC, Bisset DL (1992) Integrated Boolean neural networks for feature extraction and image classification. In: IEE Colloquium on Neural Networks for Image Processing Applications (Digest No. 186), pp 2/1–2/3

65. De Carvalho A, Fairhurst MC, Bisset DL (1994) An integrated Boolean neural network for pattern classification. Pattern Recogn Lett 15(8):807–813

66. De Carvalho A, Fairhurst MC, Bisset DL (1997) Combining Boolean neural architectures for image recognition. Connection Sci 9(2):405–418

67. De Carvalho A, Fairhurst MC, Bisset DL (1998) Combining two Boolean neural networks for image classification in RAM-based neural networks, ch. 3. In: Austin J (ed) Progress in neural processing 9. World Scientific, Singapore

68. Cao J, Ahmadi M, Shridhar M (1997) A hierarchical neural network architecture for handwritten numeral recognition. Pattern Recogn 30(1):289–294

69. Huang YS, Liu K, Suen CY (1995) The combination of multiple classifiers by a neural-network approach. Int J Pattern Recognit Artif Intell 9(3):579–597

70. Shih J-L, Chung P-C (1994) Two-dimensional invariant pattern recognition using a back-propagation network improved by distributed associative memory. In: 4th Conference on Hybrid Image Signal Processing: SPIE 2238:140–147, Orlando, Fla., USA

71. Xuefang Z, Qingyun S, Minteh C (1998) Research on automatic recognition of unconstrained handwritten digits. High Technol Lett 8(3):25–28

72. Masulli F, Sperduti A, Alfonso D (1996) A hybrid pattern recognition scheme. Proc. SPIE 2761:154–164

73. Gwo-En W, Jhing-Fa W (1994) A new hierarchical approach for recognition of unconstrained handwritten numerals. IEEE Trans Consumer Electron 40(3):428–436

74. Tanprasert C, Sinthupinyo S, Dubey P, Tanprasert T (1998) Improved mixed Thai and English OCR using two-step neural net classification. In: Proc. Int. Conf. on Neural Information Processing and Intelligent Information Systems, pp 1227–1230, Dunedin, New Zealand

75. Kimura M, Ejima T, Aso H, Yashiro H, Son N, Suzuki M (1988) An intelligent character recognition system with high accuracy and high speed by integrating image-type and logical-type information processings. In: 9th Int Conference on Pattern Recognition (IEEE Cat. No. 88CH2614-6), pp 38–40

76. Kimura F, Shridhar M (1991) Handwritten numeral recognition based on multiple algorithms. Pattern Recogn 24(10):969–983

77. Gader P, Forester B (1990) Integrating template and model matching for unconstrained handwritten numeral recognition. In: SPSE's 43rd Annual Conference, pp 60–62

78. Byung KA, Chin TC, Seong HK, Ki SH (1992) Two-stage hand-written digit recognition system by using the expert-classifiers. J Korean Inst Telematics Electron 29B(12):1195–1206

79. Shindo N, Aso H, Kimura M (1994) A compound recognition algorithm good for printed characters of bad quality. Trans Inf Process Soc Japan 35(9):1714–1721

80. Rahman AFR, Fairhurst MC (1998) A multiple-expert decision combination strategy for handwritten character recognition. In: Proc. Int. Conf. on Computational Linguistics, Speech and Document Processing, February 18–20, pp A23–A28, Calcutta, India

81. Kao WC, Parng TM (1998) Integrating statistical and structural approaches to handprinted Chinese character recognition. IEICE Trans Inf Syst E81-D(2):391–400


82. Toda M, Magome Y, Kubota K (1997) A high-speed rough classification method based on associative matching technique. Trans Inst Electron Inf Comm Eng D-II, J80D-II(11):2920–2929

83. Toda M, Magome Y, Jinling H, Kubota T (1996) An on-line character recognition system using a new preprocessing and detailed classification algorithm. Research Reports of the Faculty of Engineering, Tokyo Denki University (44):17–25

84. Gader P, Forester B, Ganzberger M, Gillies A, Mitchell B, Whalen M, Yocum T (1991) Recognition of handwritten digits using template and model matching. Pattern Recogn 24(5):421–431

85. Fairhurst MC, Mattaso Maia MAG (1983) A two-layer memory network architecture for a pattern classifier. Pattern Recogn Lett 1(2):267–271

86. Fairhurst MC, Abdel Wahab HMS (1990) An interactive two-level architecture for a memory network pattern classifier. Pattern Recogn Lett 11(8):537–540

87. Ng GS, Low SK, Lau CT (1992) A hybrid system for handwritten digit recognition. In: Proc. 1st Singapore Int Conference on Intelligent Systems (SPICIS '92). Intelligent Systems 2000, pp 397–401

88. Jeong SP, Seong WL (1994) Efficient two-step pattern matching method for off-line recognition of handwritten Hangul. J Korean Inst Telematics Electron 31B(2):1–8

89. Haralick RM (1976) The table look-up rule. Comm Statistics-Theory Meth A5(12):1163–1191

90. Kang H-J, Kim JH (1995) Dependency relationship-based decision combination in multiple classifier systems. In: Proc. 14th Int. Joint Conf. on Artif Intell, pp 1130–1136, Montreal, Canada

91. Hee-Joong K, Kim JH (1995) Combining multiple classifiers based on dependency and its application. J KISS [B] Software Appl 22(11):1590–1599

92. Kang HJ, Kim JH (1997) A probabilistic framework for combining multiple classifiers at abstract level. In: Proc. 4th Int. Conference on Document Analysis Recognition ICDAR97, vol. 2, pp 870–874, Ulm, Germany

93. Kang HJ, Kim K, Kim JH (1997) Approximating optimally discrete probability distribution with kth-order dependency for combining multiple decisions. Information Processing Lett 62(1):67–75

94. Kang HJ, Kim K, Kim JH (1997) Optimal approximation of discrete probability distribution with kth-order dependency and its application to combining multiple classifiers. Pattern Recogn Lett 18(6):515–523

95. Lam L, Suen CY (1997) Application of majority voting to pattern recognition: an analysis of its behavior and performance. IEEE Trans Syst Man Cybern A 27(5):553–568

96. Ng CS, Singh H (1998) Democracy in pattern classifications: combinations of votes from various pattern classifiers. Artif Intelligence Eng 12(3):189–204

97. Stajniak A, Szostakowski J, Skoneczny S (1997) Mixed neural-traditional classifier for character recognition. In: Proc. Int. Conf. on Imaging Sciences and Display Technologies: SPIE 2949:102–110, Berlin, Germany

98. Belaid A, Anigbogu JC (1994) Use of many classifiers for multifont text recognition. Traitement du Signal 11(1):57–75

99. Parker JR (1995) Voting methods for multiple autonomous agents. In: Proc. 3rd Australian and New Zealand Conf. on Intelligent Information Systems, pp 128–133, Perth, Australia

100. Parker JR (1995) Recognition of hand printed digits using multiple parallel methods. In: Intelligent Systems. 3rd Golden West Int Conference, pp 923–931

101. Ji C, Ma S (1997) Combination of weak classifiers. IEEE Trans Neural Networks 8(1):32–42

102. Lam L, Suen CY (1994) A theoretical analysis of the application of majority voting to pattern recognition. In: Proc. 12th IAPR Int. Conf. on Pattern Recognition, Conf. B: Pattern Recognition and Neural Networks, vol. 2, pp 418–420, Jerusalem, Israel

103. Lam L, Suen CY (1995) Optimal combinations of pattern classifiers. Pattern Recogn Lett 16(9):945–954

104. Alpaydin E (1994) Improved classification accuracy by training multiple models and taking a vote. In: 6th Italian Workshop. Neural Nets Wirn Vietri-93, pp 180–185

105. Gang JR, Woo TK, Sung IC (1995) Recognition of printed and handwritten numerals using multiple features and modularized neural networks. J Korean Inst Telematics 32B(10):101–1111

106. Rovatti R, Ragazzoni R, Kovacs ZM, Guerrieri R (1995) Voting rules for k-nearest neighbors classifiers. Neural Comput 7(3):594–605

107. Fairhurst MC, Rahman AFR (2000) Enhancing consensus in multiple expert decision fusion. IEE Proc Vision Image Signal Process 147(1):39–46

108. Ho TK, Hull JJ, Srihari SN (1992) Combination of decisions by multiple classifiers in structured document image analysis. In: Baird HS, Bunke H, Yamamoto K (eds), pp 188–202

109. Ho TK, Hull JJ, Srihari SN (1992) On multiple classifier systems for pattern recognition. In: Proc. 11th ICPR, pp 84–87, The Hague, Netherlands

110. Duong QP (1989) The combination of forecasts: a ranking and subset selection approach. Math Comput Model 12(9):1131–1143

111. Kittler J, Hatef M, Duin RPW, Matas J (1998) On combining classifiers. IEEE Trans Pattern Anal Mach Intell 20(3):226–239

112. Elms AJ, Illingworth J (1994) Combining HMMs for the recognition of noisy printed characters. In: BMVC94. Proc. 5th British Machine Vision Conference, pp 185–194

113. Elms AJ, Illingworth J (1995) Combination of HMMs for the representation of printed characters in noisy document images. Image Vision Comput 13(5):385–392

114. Elms AJ, Illingworth J (1994) A hidden Markov model approach for degraded and connected character recognition. In: IEE European Workshop on Handwriting Analysis Recognition, pp 8/1–8/7

115. Nadal C, Legault R, Suen CY (1990) Complementary algorithms for the recognition of totally unconstrained handwritten numerals. In: Proc. 10th ICPR, pp 443–449, Atlantic City, N.J., USA

116. Sharkey AJC (1999) Multi-net systems. In: Combining artificial neural nets: ensemble and modular multi-net systems. Perspectives in neural computing. Springer, Berlin Heidelberg New York

117. Battiti R, Colla AM (1994) Democracy in neural nets: voting schemes for classification. Neural Networks 7(2):691–707


118. Autret Y, Thepaut A (1997) Neural network cooperation for handwritten digit recognition: a comparison of four methods. In: Proc. 5th European Symposium on Artif Neural Networks, pp 7–12, Bruges, Belgium

119. Song W, Xiaoyan X, Yijian J (1996) Multiple experts recognition system based on neural network. In: Proc. 13th Int Conference on Pattern Recogn, pp 452–456

120. Lin X, Wu Y, Ding X (1998) Linear regression based combination of neural classifiers. In: Proc. Int. Conf. on Neural Information Processing and Intelligent Information Systems, pp 608–611, Dunedin, New Zealand

121. Khotanzad A, Chung C (1994) Handwritten digit recognition using BKS combination of neural network classifiers. In: Proc. IEEE Southwest Symposium on Image Analysis Interpretation, pp 94–99

122. Huang YS, Suen CY (1993) A knowledge model with decision making for combination of multiple classifiers. In: 4th Int Conference on Cognitive and Computer Sciences for Organizations. Communicating Knowledge in Organizations. Proc. pp 194–202

123. Ifarraguerri A (1995) Evidence processing with empirical belief functions. In: 1995 IEEE Int Conference on Systems Man and Cybernetics. Intelligent Systems for the 21st Century (Cat. No. 95CH3576-7), pp 818–822

124. Rogova G (1994) Combining the results of several neural network classifiers. Neural Networks 7(5):777–781

125. Sung-Bae C (1999) Pattern recognition with neural networks combined by genetic algorithm. Fuzzy Sets Syst 103:339–347

126. Lee DS, Srihari SN (1995) A theory of classifier combination: the neural network approach. In: Proc. 3rd Int. Conf. on Document Analysis Recognition, IEEE Computer Society Press, vol. 1, pp 42–45, Montreal, Canada

127. Bebis GN, Georgiopoulos M, Papadourakis GM, Heileman GL (1992) Increasing classification accuracy using multiple neural network schemes. Proc. SPIE 1709(1):221–231

128. Kovacs ZM, Guerrieri R, Baccarani G (1994) Cooperative classifiers for high-quality handprinted character recognition. Biosensors Bioelectron 9(9–10):611–615

129. Kovacs ZM, Guerrieri R, Baccarani G (1994) A hybrid system for handprinted character recognition. In: 6th Italian Workshop. Neural Nets Wirn Vietri-93, pp 322–327

130. Yildirim T, Marsland JS (1996) An RBF/MLP hybrid neural network implemented in VLSI hardware. In: Proc. Neural Networks and Their Application, pp 156–160, Marseille, France

131. Sung-Bae C (1997) Combining modular neural networks developed by evolutionary algorithm. In: Proc. Int. Conf. on Evolutionary Comput, pp 647–650, Indianapolis, Ind., USA

132. Yuanhui Z, Zhaohui Z, Yuchang L, Chunyi S (1996) Multistrategy learning using genetic algorithms and neural networks for pattern classification. In: Proc. Int. Conf. on Systems, Man, and Cybernetics: Information Intelligence and Systems, pp 1686–1689, Beijing, China

133. Seong-Whan L (1996) Off-line recognition of totally unconstrained handwritten numerals using multilayer cluster neural network. IEEE Trans Pattern Anal Mach Intell 18(6):648–652

134. Sung-Bae C (1996) Recognition of unconstrained handwritten numerals by doubly self-organizing neural network. In: Proc. 13th Int Conference on Pattern Recogn, pp 426–430

135. Sung-Bae C (1998) Handwritten digit recognition by combining structure-adaptive self-organizing maps. In: Proc. Int. Conf. on Neural Information Processing and Intelligent Information Systems, pp 1231–1234, Dunedin, New Zealand

136. Scofield CL, Kenton L, Chang JC (1991) Multiple neural net architectures for character recognition. In: COMPCON Spring '91. Digest of Papers (Cat. No. 91CH2961-1), pp 487–491

137. d'Acierno A, De Stefano C, Vento M (1991) A multi-net neural classifier tailored by means of test-set characterization. In: 4th Italian Workshop. Parallel Architectures and Neural Networks, pp 321–325

138. Cordella LP, De Stefano C, Tortorella F, Vento M (1992) Improving character recognition rate by a multi-net neural classifier. In: 11th IAPR Int. Conf. on Pattern Recognition, pp 615–618, The Hague, Netherlands

139. Vico FJ, Ortega F, Sandoval F (1995) Character recognition with neural assemblies. In: Lecture Notes in Computer Science. Springer, Berlin Heidelberg New York, pp 486–491

140. Bellegarda EJ, Bellegarda JR, Kim JH (1994) Online handwritten character recognition using parallel neural networks. In: Proc. ICASSP-94, 1994 IEEE Int. Conf. on Acoustics, Speech Signal Process, vol. 2, pp 605–608, Adelaide, Australia

141. Wu QZ, Cun YL, Jackel LD, Jeng BS (1993) Online recognition of limited-vocabulary Chinese characters using multiple convolutional neural networks. In: 1993 IEEE Int. Symp. on Circuits and Systems, pp 2435–2438, Chicago, Ill., USA

142. Loncelle J, Derycke N, Soulie FF (1992) Optical character recognition and cooperating neural networks techniques. In: Aleksander I, Taylor J (eds) Artificial neural networks. Elsevier, New York, pp 1591–1594

143. Wang J, Jean J (1991) Automatic rule generation for machine printed character recognition using multiple neural networks. In: IEEE Int. Conf. on Systems Engineering, pp 343–346, Fairborn, Ohio, USA

144. Idan Y, Auger JM (1992) Pattern recognition by cooperating neural networks. Proc. SPIE 1766:437–443

145. Huang YS, Suen CY (1995) A method of combining multiple experts for the recognition of unconstrained handwritten numerals. IEEE Trans Pattern Anal Mach Intell 17(1):90–94

146. Huang YS, Suen CY (1993) A knowledge model with decision making for combination of multiple classifiers. In: 4th Int Conference on Cognitive and Computer Sciences for Organizations. Communicating Knowledge in Organizations. Proc. pp 194–202

147. Huang YS, Suen CY (1993) The behavior-knowledge space method for combination of multiple classifiers. In: Proc. IEEE Computer Soc. Conf. on Computer Vision and Pattern Recognition (CVPR 93), pp 347–352, N.Y., USA

148. Rahman AFR, Fairhurst MC (1997) Exploiting second order information to design a novel multiple expert decision combination platform for pattern classification. Electron Lett 33(6):476–477

149. Cordella LP, Foggia P, Sansone C, Tortorella F, Vento M (1997) Classification reliability and its use in multi-classifier systems. In: Proc. 9th Int. Conf. on Image Analysis Process ICIAP'97, pp 46–53, Florence, Italy

150. Gorski M (1997) Information-based combination of multiple classifiers. In: Proc. 4th Int. Conf. on Pattern Recognition and Information Processing, pp 81–86, Minsk, Byelorussia

151. Hull JJ, Commike AA, Ho TK (1990) Multiple algorithms for handwritten character recognition. In: Proc. 1st Int. Workshop on Frontiers in Handwriting Recognition, pp 117–124, Montreal, Canada

152. Rahman AFR, Fairhurst MC (1998) A novel confidence-based framework for multiple expert decision fusion. In: Proc. 9th British Mach Vision Conference, September 14–17, Southampton, UK, vol. 2, pp 205–213

153. Rahman AFR, Fairhurst MC (1999) A new multiple expert framework for decision fusion. In: Proc. 9th Int. Graphonomics Society Conference (IGS'99), pp 161–166, Singapore

154. Huang YS, Suen CY (1993) Combination of multiple classifiers with measurement values. In: Proc. 2nd Int. Conference on Document Analysis Recognition (Cat. No. 93TH0578-5), pp 598–601

155. Lin X, Ding X, Chen M, Zhang R, Wu Y (1998) Adaptive confidence transform based classifier combination for Chinese character recognition. Pattern Recogn Lett 19:975–988

156. Guo H, Ding X, Guo F, Wu Y (1997) New method of combining multiple classifiers for Chinese character recognition. J Tsinghua University. Sci Technol 37(10):91–444

157. Vuurpijl L, Schomaker L (1998) A framework for using multiple classifiers in a multiple-agent architecture. In: Proc. 3rd European Workshop on Handwriting Analysis Recognition, pp 8/1–8/6, Brussels, Belgium

158. Kawatani H, Shimizu H (1997) Complementary classifier design using difference principal components. In: Proc. 4th Int. Conference on Document Analysis Recognition ICDAR97, vol. 2, pp 875–880, Ulm, Germany

159. Ho TK (1993) Recognition of handwritten digits by combining independent learning vector quantizations. In: Proc. 2nd Int. Conf. on Document Analysis Recognition, pp 818–821, Tsukuba Science City, Japan

160. Jones MA, Story GA, Ballard GW (1991) Integrating multiple knowledge sources in a Bayesian OCR post-processor. In: Proc. 1st Int. Conf. on Document Analysis Recognition, pp 925–933, Saint-Malo, France

161. Srihari SN, Hull JJ, Choudhari R (1983) Integrating diverse knowledge sources in text recognition. ACM Trans Office Inf Syst 1(1):68–87

162. Huang JS, Chuang K (1986) Heuristic approach to handwritten numeral recognition. Pattern Recogn 19(1):15–19

163. Prevost L, Milgram M (1997) Static and dynamic classifier fusion for character recognition. In: Proc. 4th Int. Conference on Document Analysis Recognition ICDAR97, vol. 2, pp 499–506, Ulm, Germany

164. Puuronen S, Terziyan V (1997) The voting-type technique in the refinement of multiple expert knowledge. In: Proc. 13th Hawaii Int Conference on System Sciences, pp 287–296, Hawaii, USA

165. Sung-Bae C (1996) Genetic combining multiple neural networks for handwritten numeral recognition. In: Proc. Int. Conf. on Soft Computing: Methodologies for the Conception, Design, and Application of Intelligent Systems, pp 774–777, Fukuoka, Japan

166. Yi L, Yamaoka F (1997) Fuzzy integration of classifica-tion results. Pattern Recogn 30(11):1877–9111

167. Mitra S, Kim YS (1994) Neuro-fuzzy models in patternrecognition. In: SPIE 2304:344–364, Orlando, Fla., USA

168. Duerr B, Haettich W, Tropf H, Winkler G (1980) Acombination of statistical and syntactical pattern recog-nition applied to classification of unconstrained hand-written numerals. Pattern Recogn 12:189–199

169. Suen CY, Nadal C, Legault R, Mai TA, Lam L (1992)Computer recognition of unconstrained handwritten nu-merals. Proc. IEEE 80(7):1162–1180

170. Anisimovich K, Rybkin V, Shamis A, Tereshchenko V(1997) Using combination of structural, feature andraster classifiers for recognition of handprinted charac-ters. In: Proc. 4th Int. Conference on Document Anal-ysis Recognition ICDAR97, vol. 2, pp 881–885, Ulm,Germany

171. Jonghyun P, Sung-Bae C, Kwanyong L, Yillbyung L(1996) Multiple recognizers system using two-stage com-bination. In: Proc. the 13th Int Conference on PatternRecognition pp 581–585

172. Zhou J, Gan Q, Suen CY (1997) A high performancehand-printed numeral recognition system with verifica-tion module. In: Proc. 4th Int. Conference on Docu-ment Analysis Recognition ICDAR97, vol. 1, pp 293–297, Ulm, Germany

173. Fairhurst MC, Rahman AFR (1996) A new multi-expertarchitecture for high performance object recognition. In:Proc. Int. Symp. on Intelligent Systems and AdvancedManufacturing, Machine Vision Applications, Architec-tures, and Systems Integration V, SPIE 2908:140–151

174. Rahman AFR, Fairhurst MC (1997) Comparative study of two different multiple expert architectures for robust object recognition. In: Proc. Int. Symp. on Intelligent Systems and Automated Manufacturing, Machine Vision Applications, Architectures, and Systems Integration VI, SPIE 3205:187–196, Pittsburgh, Pa., USA

175. Rahman AFR, Fairhurst MC (1997) A new hybrid approach in combining multiple experts to recognise handwritten numerals. Pattern Recogn Lett 18(8):781–790

176. Rahman AFR, Fairhurst MC (1996) A new approach to handwritten character recognition using multiple experts. In: Proc. 5th Int. Workshop on Frontiers of Handwriting Recognition, pp 283–286, University of Essex, UK

177. Rahman AFR, Fairhurst MC (1997) A new approach to handwritten character recognition using multiple experts. In: Downton AC, Impedovo S (eds) Progress in handwriting recognition, pp 321–325. World Scientific, Singapore

178. Rahman AFR, Fairhurst MC (1996) Recognition of handwritten characters with a multi-expert system. In: Proc. 2nd IEE European Workshop on Handwriting Analysis Recognition. Reference No. 1996/165, IEE, London

179. Kwon JO, Sin B, Kim JH (1997) Recognition of on-line cursive Korean characters combining statistical and structural methods. Pattern Recogn 30(8):1255–1263

180. Tung CH, Lee JH (1993) 2-stage character recognition by detection and correction of erroneously-identified characters. In: Proc. 2nd Int Conference on Document Analysis Recognition (Cat. No. 93TH0578-5), pp 834–837

181. Zhou J, Pavlidis T (1994) Discrimination of characters by a multi-stage recognition process. Pattern Recogn 27(11):1539–1549

182. Errico GD, Dimauro G, Impedovo S, Pirlo G (1996) A new multi-expert system for handwritten digit recognition. In: Proc. 5th Int. Workshop on Frontiers of Handwriting Recognition, pp 341–344

183. Rahman AFR, Fairhurst MC (1997) High performance shape recognition: a novel multiple expert recogniser configuration. In: Proc. IEE Colloquium on Industrial Inspection. Reference No. 1997/041, IEE, London

184. Mao J, Mohiuddin K, Fujisaki T (1995) A two-stage multi-network OCR system with a soft pre-classifier and a network selector. In: Proc. 3rd Int. Conf. on Document Analysis Recognition, pp 78–81, Montreal, Quebec, Canada

185. Wu Y, Ding X, Zhao M, Guo H, Guo F (1996) Study on high performance OCR system implemented by hybrid neural networks. In: Proc. Int. Conf. on Neural Information Processing: Progress in Neural Information Process, pp 410–414, Hong Kong

186. Ha TM, Zimmermann M, Bunke H (1998) Off-line handwritten numeral string recognition by combining segmentation-based and segmentation-free methods. Pattern Recogn 31(3):257–272

187. Sung-Bae C, Kim JH (1992) Recognition of large-set printed Hangul Korean script by two-stage backpropagation neural classifier. Pattern Recogn 25(11):1353–1360

188. Corwin E, Greni S, Logar A, Whitehead K, Welch R (1994) A multi-stage neural network classifier. In: World Congress on Neural Networks – San Diego. 1994 Int Neural Network Society Annual Meeting, pp III/198–III/203

189. Bunke H, Henderson TC, Baird H, Cristobal G, Haralick RM, Kittler J, Ressler S, Sanfeliu A, Siromoney R, Subramanian KG (1988) Working Group-C – Hybrid techniques in syntactic and structural pattern recognition, pp 453–456. Springer, Berlin Heidelberg New York

190. Hull JJ, Srihari SN, Cohen E, Kuan CL, Cullen P, Palumbo P (1988) A blackboard-based approach to handwritten zip code recognition. In: Proc. US Postal Service Adv. Tech. Conf., pp 1018–1032

191. Ahmed P, Suen CY (1987) Computer recognition of totally unconstrained handwritten zip codes. Int J Pattern Recognit Artif Intell 1(1):1–15

192. Kuan CL, Srihari SN (1988) A stroke-based approach to handwritten numeral recognition. In: Proc. US Postal Service Adv. Tech. Conf., pp 1033–1041

193. Reddy NVS, Nagabhushan P (1997) A multi-stage neural network model for unconstrained handwritten numeral recognition. Vivek 10(1):3–11

194. Reddy NVS, Nagabhushan P (1998) A connectionist expert system model for conflict resolution in unconstrained handwritten numeral recognition. Pattern Recogn Lett 19(1):161–169

195. Price D, Knerr S, Personnaz L, Dreyfus G (1995) Pairwise neural network classifiers with probabilistic outputs. In: Adv Neural Inf Process Syst 7:1109–1116

196. Kovacs-V ZsM (1995) A novel architecture for high quality hand-printed character recognition. Pattern Recogn 28(11):1685–1692

197. Cao J, Ahmadi M, Shridhar M (1995) Recognition of handwritten numerals with multiple feature and multistage classifier. Pattern Recogn 28(1):153–160

198. Cao J, Ahmadi M, Shridhar M (1994) Handwritten numeral recognition with multiple features and multistage classifiers. In: 1994 IEEE Int Symposium on Circuits and Systems (Cat. No.94CH3435-5), pp 323–326

199. Liu K, Huang YS, Suen CY (1994) Image classification by classifier combining technique. In: Conf. on Neural and Stochastic Methods in Image Signal Processing III: SPIE 2304:210–217. SPIE-INT, Bellingham, Wash., USA

200. Rahman AFR, Fairhurst MC (1997) A novel pairwise recognition scheme for handwritten characters in the framework of a multi-expert configuration. In: Del Bimbo A (ed) Lecture Notes in Computer Science, vol. 1311. Proc. 9th Int. Conf. on Image Analysis Process, vol. 2, pp 624–631, Florence, Italy. Springer, Berlin Heidelberg New York

201. Wang J, Jean J (1993) Resolving multifont character confusion with neural networks. Pattern Recogn 26(1):175–187

202. Jiangying Z, Pavlidis T (1993) Disambiguation of characters by a second stage classifier. Proc. SPIE 1906:166–171

203. Teo RY, Shinghal R (1997) A hybrid classifier for recognizing handwritten numerals. In: Proc. 4th Int. Conference on Document Analysis Recognition ICDAR97, vol. 1, pp 283–287, Ulm, Germany

204. Lin X, Ding X, Wu Y (1997) Handwritten numeral recognition using MFNN based multiexpert combination strategy. In: Proc. 4th Int. Conference on Document Analysis Recognition ICDAR97, vol. 2, pp 471–474, Ulm, Germany

205. Cho JW, Lee SY, Park CH (1995) Online handwritten character recognition by a hybrid method based on neural networks and pattern matching. In: Mira J, Sandoval F (eds) Lecture Notes in Computer Science, pp 926–933. Springer, Berlin Heidelberg New York

206. Kovacs ZM, Guerrieri R, Baccarani G (1993) Cooperative classifiers for high-quality handprinted character recognition. In: Proc. WCNN'93 – Portland, World Congress on Neural Networks, pp 186–189, Portland, Ore., USA

207. Cho SB, Kim JH (1992) A hybrid method of hidden Markov model and neural network classifier for on-line handwritten character recognition. In: Aleksander I, Taylor J (eds) Artificial neural networks, pp 741–744. Elsevier, New York

208. Cho SB, Kim JH (1990) Hierarchically structured neural networks for printed Hangul character recognition. In: IJCNN Int. Joint Conf. on Neural Networks, pp 265–270, San Diego, Calif., USA

209. Vlontzos JA, Kung SY (1988) A hierarchical system for character recognition with stochastic knowledge representation. In: IEEE Int. Conf. on Neural Networks, pp 601–608, San Diego, Calif., USA

210. Huang YS, Suen CY (1994) A method of combining multiple classifiers – a neural-network approach. In: Proc. 12th IAPR Int. Conf. on Pattern Recognition, Conf. B: Pattern Recognition and Neural Networks, vol. 2, pp 473–475, Jerusalem, Israel


211. Jun C, Tong L, Shan F (1996) Multiple neural networks paradigm for function modeling in expert system. In: Proc. Critical Technology: Proc. 3rd World Congress on Expert Systems, pp 1377–1384, Seoul, South Korea

212. Hiang PCK, Erdogan SS, Ng GS (1996) Fusion of neural network experts. In: Proc. Int. Conf. on Neural Information Processing, pp 204–209, Hong Kong

213. Khotanzad A, Chung C (1998) Handwritten digit recognition using combination of neural network classifiers. In: Proc. Southwest Symposium on Image Analysis Interpretation, pp 168–173, Tucson, Ariz., USA

214. Pawlicki T (1988) A neural network architecture for evidence combination. Proc. SPIE 931:149–153

215. Jouny I, Sheridan M (1992) Character recognition using a multistage neural network. In: Sadjadi FA (ed) Automatic object recognition II, pp 73–82. Elsevier, New York

216. Houle G, Eom KB (1992) Use of a priori knowledge for character recognition. Proc. SPIE 1661:146–156

217. Mandalia AD, Pandya AS, Sudhakar R (1992) A hybrid approach to recognize handwritten alphanumeric characters. In: 1992 IEEE Int Conference on Systems, Man, and Cybernetics (Cat. No.92CH3176-5), pp 723–726

218. Benediktsson JA, Sveinsson JA, Ersoy OK (1996) Optimized combination of neural networks. In: 1996 IEEE Int Symposium on Circuits and Systems. Circuits and Systems Connecting the World, ISCAS 96 (Cat. No.96CH35876), pp 535–538

219. Alexandre F, Guyot F (1995) Neurobiological inspiration for the architecture and functioning of cooperating neural networks. In: From Natural to Artif Neural Computation. Int Workshop on Artif Neural Networks, Proc., pp 24–30

220. Xiaoyan Z, Song W (1995) Multiple neural networks model and its application in pattern recognition. In: Proc. Int Conference on Neural Information Processing (ICONIP '95), pp 966–969

221. Rahman AFR, Fairhurst MC (1999) Clifford Network: A novel application in character recognition. In: Proc. Int. Conf. on Advances in Intelligent Data Analysis (AIDA), pp 248–253, Rochester, N.Y., USA

222. Wang M, Zhong Y, Sheng L (1995) Handwritten numeral classification using cascaded multi-layer neural networks. In: Proc. Int Conference on Neural Information Processing (ICONIP '95), pp 943–946

223. Sung-Bae C, Kim JH (1992) A two-stage classification scheme with backpropagation neural network classifiers. Pattern Recogn Lett 13(5):309–313

224. Hung CA, Lin SF (1993) An ARTMAP based hybrid neural network for shift-invariant Chinese character recognition. In: IEEE Int. Conf. on Neural Networks, pp 1600–1605, IEEE Service Center, Piscataway, N.J., USA

225. Wang J, Jean J (1993) Multiresolution neural networks for omnifont character recognition. In: IEEE Int. Conf. on Neural Networks, pp 1588–1593, IEEE Service Center, Piscataway, N.J., USA

226. Rice JM (1991) A novel character recognition system using a contextual feedback connectionist module to enhance system performance. In: Artificial neural networks, pp 1095–1098. Elsevier, New York

227. Pawlicki T (1988) A neural network architecture for evidence combination. In: Conf. on Sensor Fusion, pp 149–153, Orlando, Fla., USA

228. Lijia S, Franklin S (1993) ANN-TREE: a hybrid method for pattern recognition. Proc. SPIE 1965:358–363

229. Sethi K, Yoo JH (1997) Structure-driven induction of decision tree classifiers through neural learning. Pattern Recogn 30(11):1893–1904

230. Perrone MP, Intrator N (1992) Unsupervised splitting rules for neural tree classifiers. In: IJCNN Int. Joint Conf. on Neural Networks, pp 820–825

231. Perrone MP (1992) A soft-competitive splitting rule for adaptive tree-structured neural networks. In: IJCNN Int. Joint Conf. on Neural Networks, pp 689–693

232. Perrone MP (1991) A novel recursive partitioning criterion. In: IJCNN Int. Joint Conf. on Neural Networks, p 989

233. Aggarwal N, Lereno M (1996) Combining decision tree classifiers and neural networks in group technology. In: Proc. 14th Int Conference on Applied Informatics, pp 405–408, Innsbruck, Austria

234. Shlien S (1990) Multiple binary decision tree classifiers. Pattern Recogn 23(7):757–763

235. Shinozawa Y, Okoma S (1997) The construction of decision tree for rough classification in handwritten character recognition. Trans. Information Processing Society of Japan 38(12):2479–2489

236. Kurzynski MW (1989) On the identity of optimal strategies for multistage classifiers. Pattern Recogn Lett 10(1):39–46

237. Oxenham MG, Kewley DJ, Nelson MJ (1996) Performance assessment of data fusion systems. In: Proc. 1st Conf. Australian Data Fusion Symposium, pp 36–41, Adelaide, Australia

238. Bollacker KD, Ghosh J (1997) Knowledge reuse in multiple classifier systems. Pattern Recogn Lett 18:1385–1390

239. Suen CY, Guo J, Li ZC (1994) Analysis recognition of alphanumeric handprints by parts. IEEE Trans Systems Man Cybern 24(2):614–631

240. Tang YY, Qu YZ, Suen CY (1991) Multiple-level information source and entropy-reduced transformation models. Pattern Recogn 24(2):341–357

241. Wang QR, Suen CY (1984) Analysis design of a decision tree based on entropy reduction and its application to large character set recognition. IEEE Trans Pattern Anal Mach Intell 6(2):406–414

242. Jordan MI, Xu L (1995) Convergence results for the EM approach to mixtures of experts architectures. Neural Networks 8(9):1409–1431

243. Jordan MI, Jacobs RA (1994) Hierarchical mixtures of experts and the EM algorithm. Neural Comput 6:181–214

244. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J Royal Stat Soc 39:1–38

245. Waterhouse SR, Robinson AJ (1994) Classification using hierarchical mixtures of experts. In: Proc. IEEE Workshop on Neural Networks for Signal Processing IV, pp 177–186

246. Pudil P, Novovi J, Blaha S (1992) Multistage pattern recognition with reject option. In: Proc. 11th IAPR ICPR, pp 92–95

247. Tumer K, Ghosh J (1995) Limits to performance gains in combined neural classifiers. In: Proc. Int. Conf. on Intelligent Engineering Systems Through Artif Neural Networks, pp 419–424, St. Louis, Mo., USA


248. Foggia P, Sansone C, Tortorella F, Vento M (1999) Multiclassification: reject criteria for the Bayesian combiner. Pattern Recogn 32:1435–1447

249. Bubnicki Z (1982) Pattern recognition systems and their applications. In: Proc. 2nd Int. Conf. on Systems Engineering, pp 1–8. Coventry (Lanchester) Polytechnic

250. Jozefczyk J (1986) Determination of optimal recognition algorithms in the two-level system. Pattern Recogn Lett 4(6):413–420

251. Carvalho A, Fairhurst MC, Bisset DL (1993) Image classification using integrated GSN networks. In: Proc. 13th IEE Pt. I, Communications, Speech Vision, vol. 140, pp 12–19

252. Rahman AFR, Fairhurst MC (1997) Selective partition algorithm for finding regions of maximum pairwise dissimilarity among statistical class models. Pattern Recogn Lett 18(7):605–611

253. Argentiero P, Chin Y, Beaudet P (1982) An automated approach to the design of decision tree classifiers. IEEE Trans Pattern Anal Mach Intell 4(1):51–57

254. Kim J, Seo K, Chung K (1997) A systematic approach to classifier selection on combining multiple classifiers for handwritten digit recognition. In: Proc. 4th Int. Conference on Document Analysis Recognition ICDAR97, vol. 2, pp 459–462, Ulm, Germany

255. Dar-Shyang L, Srihari SN (1995) Dynamic classifier combination using neural network. Proc. SPIE 2422:26–37

256. Rahman AFR, Fairhurst MC (1999) Automatic self-configuration of a novel multiple-expert classifier using a genetic algorithm. In: Proc. Int. Conf. on Image Processing and Appl (IPA'99), vol. 1, pp 57–61

257. Hashem S (1997) Algorithms for optimal linear combinations of neural networks. In: IEEE Int. Conf. on Neural Networks, pp 242–247

258. Hashem S (1997) Optimal linear combinations of neural networks. Neural Networks 10(2):599–614

259. Hashem S, Schmeiser B (1995) Improving model accuracy using optimal linear combinations of trained neural networks. IEEE Trans Neural Networks 6(3):792–795

260. Hashem S (1996) Effects of collinearity on combining neural networks. Connection Sci 8(3–4):315–336

261. Rahman AFR, Fairhurst MC (1998) Towards a theoretical framework for multilayer decision fusion. In: Proc. 3rd IEE European Workshop on Handwriting Analysis Recognition, July 14–15, Reference No. 1998/440, pp 7/1–7/7, Brussels, Belgium

262. Rahman AFR, Fairhurst MC (1999) Serial combination of multiple experts: A unified evaluation. Pattern Anal Appl 2:292–311

263. Wesolkowski S, Hassanein K (1997) A comparative study of combination schemes for an ensemble of digit recognition neural networks. In: Proc. Int. Conf. on Systems, Man, and Cybernetics: Computational Cybernetics and Simulation, pp 3534–3539, Orlando, Fla., USA

264. Matsui T, Yamashita I, Wakahara T (1994) The results of the first IPTP character recognition competition and studies on multi-expert recognition for handwritten numerals. IEICE Trans Inf Syst E77-D(7):801–809

265. Tsutsumida T, Matsui T, Noumi T, Wakahara T (1996) Results of IPTP Character Recognition Competitions and studies on multi-expert system for handprinted numeral recognition. IEICE Trans Inf Syst E79-D(5):429–435

266. Sabourin M, Mitiche A, Thomas D, Nagy D (1993) Classifier combination for hand-printed digit recognition. In: Proc. 2nd Int Conference on Document Analysis Recognition (Cat. No. 93TH0578-5), pp 163–166

267. Gascuel O, Bouchon-Meunier B, Caraux G, Gallinari P, Gunoche A, Guermeur Y, Lechevallier Y, Marsala C, Miclet L, Nicolas J, Nock R, Ramdani M, Sebag M, Tallur B, Venturini G, Vitte P (1998) Twelve numerical, symbolic and hybrid supervised classification methods. Int J Pattern Recogn Artif Intell 12(5):517–571

268. Verikas A, Lipnickas A, Malmqvist K, Bacauskiene M, Gelzinis A (1999) Soft combination of neural classifiers: a comparative study. Pattern Recogn Lett 20:429–444

269. Auda G, Kamel M (1998) Modular neural network classifiers: a comparative study. J Intell Robotic Syst: Theory Appl 21(1):117–129

270. Alimouglu F, Alpaydin E (1997) Combining multiple representations and classifiers for pen-based handwritten digit recognition. In: Proc. 4th Int. Conference on Document Analysis Recognition ICDAR97, vol. 2, pp 637–640, Ulm, Germany

271. Chen YS, Li DH (1994) A hybrid optical correlator for character recognition. In: 4th Conference on Hybrid Image Signal Processing: SPIE 2238:42–47, Orlando, Fla., USA

272. Cheng HD, Xia DC (1996) A novel parallel approach to character recognition and its VLSI implementation. Pattern Recogn 29(1):97–119

273. Rahman AFR, Fairhurst MC (1997) A comparative study of decision combination strategies for a novel multiple-expert classifier. In: Proc. 6th Int. Conference on Image Processing and Its Appl, vol. 1, pp 131–135, Dublin, Ireland

274. Rahman AFR, Fairhurst MC (1999) A comparative study of some multiple expert recognition strategies. In: Proc. IEE Col. on Document Image Processing and Multimedia (DIPM'99), vol. 99/041, pp 10/1–10/4, London, UK

275. Rahman AFR, Fairhurst MC (1998) Machine-printed character recognition revisited: Re-application of recent advances in handwritten character recognition research. Special Issue on Document Image Processing and Multimedia Environments. Image Vision Comput 16(12–13):819–842

276. Fairhurst MC, Mattaso Maia MAG (1986) Performance comparison in hierarchical architectures for memory network pattern classifiers. Pattern Recogn Lett 4(1):121–124

277. Maruyama M, Nakahira H, Fukuda M, Sakiyama S, Kouda T, Imagaw T, Maruno S (1996) A self-configurable digital neuro chip addressing to multi-network architecture. In: 1996 Symposium on VLSI Circuits. Digest of Technical Papers (IEEE Cat. No.96CH35943), pp 38–39

278. Martonfi Z, Szigeti Z (1995) XILINX based hardware for picture processing and character recognition purposes. Periodica Polytechnica Electr Eng 39(1):103–114

279. Rahman AFR, Fairhurst MC, Lee P (1998) Design considerations in real-time implementation of multiple expert image classifiers within a modular and flexible multiple-platform design environment. Special Issue on Real-Time Visual Monitoring and Inspection. Real Time Imaging 4:361–376


280. Rahman AFR, Fairhurst MC (1999) A study of some multi-expert recognition strategies for industrial applications: Issues of processing speed and implementability. In: Proc. 12th Int. Conf. on Vision Interfaces (VI'99), pp 569–574, Quebec, Canada

281. Chen JG, Ansari N (1998) Adaptive fusion of correlated local decisions. IEEE Trans Syst Man Cybern Part C, Appl Rev 28(1):276–281

282. Shafer G, Logan R (1987) Implementing Dempster's rule for hierarchical evidence. Artif Intell 33:271–298

283. Frank J, Mandler E (1992) A comparison of two approaches for combining the votes of cooperating classifiers. In: Proc. 11th ICPR, pp 611–614, The Hague, Netherlands

284. Ng GS, Singh H (1998) Data equalisation with evidence combination for pattern recognition. Pattern Recogn Lett 19:227–235

285. Denoeux T (1995) A k-nearest neighbour classification rule based on Dempster-Shafer theory. IEEE Trans Syst Man Cybern 25(5):804–813

286. Bloch I (1996) Some aspects of Dempster-Shafer evidence theory for classification of multi-modality medical images taking partial volume effect into account. Pattern Recogn Lett 17:905–919

287. Yi L (1993) Evidential reasoning in a multiple classifier system. In: Industrial and Engineering Appl of Artif Intelligence and Expert Systems. IEA/AIE 93. Proc. 6th Int Conference, pp 476–479

288. Lu Y (1994) Integration of knowledge in a multiple classifier system. In: Industrial and Engineering Application of Artificial Intelligence and Expert Systems. Proc. 7th Int Conference, pp 557–564

289. Yen J (1990) Generalizing the Dempster-Shafer theory to fuzzy sets. IEEE Trans Syst Man Cybern 20(3):559–570

290. Ruspini EH, Lowrance JD, Strat TM (1992) Understanding evidential reasoning. Int J Approx Reason 6(3):401–424

291. Deutsch-McLeish M (1991) A study of probabilities and belief functions under conflicting evidence: comparisons and new methods. In: Uncertainty in Knowledge Bases. 3rd Int. Conference on Information Processing and Management of Uncertainty in Knowledge-Based Systems IPMU '90, pp 41–49

292. Spillman R (1990) Managing uncertainty with belief functions. AI Expert 5(5):44–49

293. Tchamova A (1996) Evidence reasoning logic and fuzzy set theory: combined method for resolving Dempster's rule indefiniteness in a case of conflicting evidence. In: Proc. 4th European Congress on Intelligent Techniques and Soft Computing Proceedings, EUFIT '96, pp 135–138, Aachen, Germany

294. Haralick RM (1992) Contextual decision making with degrees of belief. In: Proc. 11th IAPR ICPR, pp 105–111

295. Shijing Z, Xianjia W, Ting C (1997) An approach for integrating quantitative decision model with qualitative judgment. J Systems Eng Electron 8(1):45–52

296. Norton SW, Hirsh H (1992) Classifier learning from noisy data as probabilistic evidence combination. In: 10th National Conf. on Artif Intell, pp 141–146. AAAI-92, San Jose, Calif., USA

297. Mogre AM, McLaren RW, Keller JM (1989) A technique to compensate for disparate sources in evidence combination. In: IEEE Int. Conf. on Systems, Man, and Cybernetics, pp 1022–1023. Decision-making in Large-scale Systems. Cambridge, Mass., USA

298. Legault R, Suen CY, Nadal C (1990) Classification of confusing handwritten numerals by human subjects. In: Proc. IWFHR, pp 181–191, Montreal, Canada

299. Legault R, Suen CY, Nadal C (1992) Difficult cases in handwritten numeral recognition. In: Structured document image analysis, pp 235–249. Springer, Berlin Heidelberg New York

300. Wang ZJ, Feng S (1993) The research on multiple expert-systems integrated decision-support systems for solving complex problems. In: Liu B, Chen T, Zheng YP (eds) Large Scale Systems: Theory and Applications, pp 241–246. Pergamon, New York

301. Edwards JS, Barrett AR, Moores TT (1994) Expert-systems in conventional software engineering – the problems of multiple experts. In: Liebowitz J (ed) Moving Toward Expert Systems Globally in the 21st Century, pp 753–759. Scholium

302. Barlow RE, Mensing RW, Smiriga NG (1986) Combination of experts' opinions based on decision-theory. In: Basu AP (ed) Reliability and Quality Control, pp 9–19. Elsevier, New York

303. Suen CY, Huang YS, Bloch A (1994) Multiple expert-systems and multiexpert systems. In: Liebowitz J (ed) Moving Toward Expert Systems Globally in the 21st Century, pp 207–212. Scholium

304. Chatterji BN (1984) A combined fuzzy set theoretic and heuristic method for character recognition. In: Int. Symp. on Fuzzy Information, Knowledge Representation and Decision Analysis of Federation, pp 187–190, Marseilles, France

305. Srihari SN (1982) Reliability-analysis of majority vote systems. Inf Sci 26(3):243–256

306. Srihari SN (1982) Reliability-analysis of biased majority-vote systems. IEEE Trans Reliability 31(1):117–118

307. Yager RR (1997) A general approach to the fusion of imprecise information. Int J Intell Syst 12(1):1–29

308. Kwoh C-K, Gillies DF (1998) Probabilistic reasoning and multiple-expert methodology for correlated objective data. Artif Intell Eng 12(1–2):21–33

309. Holsapple CW, Lee A, Otto J (1997) A machine learning method for multi-expert decision support. Ann Oper Res 75:171–188

310. Kramosil I (1988) Expert systems with non-numerical belief functions. Prob Control Inf Theory 17(5):285–295

311. Finnie GR (1991) Knowledge-based selection and combination of forecasting methods. South Afr Comput J (2):55–63

312. Loo-Nin T, Ah-Hwee T (1995) Adaptive integration of multiple experts. In: 1995 IEEE Int Conference on Neural Networks Proceedings (Cat No. 95CH35828), pp 1215–1220

313. De Mathelin M, Perneel C (1993) Knowledge combination and decision algorithms for expert-systems based on fuzzy-logic. In: Applications of Fuzzy Logic Technology, vol. 2061, pp 270–281. SPIE-INT, Bellingham, USA

314. Cao J, Shridhar M, Ahmadi M (1995) Fusion of classifiers with fuzzy integrals. In: Proc. 3rd Int. Conf. on Document Analysis Recognition, pp 108–111, Montreal, Quebec, Canada


315. Cao J, Shridhar M, Ahmadi M (1994) Handwritten numerical recognition with neural networks and information fusion. In: Proc. 37th Midwest Symposium on Circuits and Systems (Cat. No. 94CH35731), pp 569–572

316. Pattanaik PK (1978) Strategy and group choice. North-Holland, Amsterdam

317. Black D (1958) The theory of committees and elections. Cambridge University, Cambridge

318. Farquharson R (1969) Theory of voting. Yale University, USA

319. Pattanaik PK (1971) Voting and collective choice. Cambridge University, London, UK

320. Black D, Newing RA (1951) Committee decisions with complementary valuation. William Hodge, London

321. Peleg B (1984) Game theoretic analysis of voting in committees. Cambridge University Press, Cambridge

322. Cantor (1980) Vorlesungen über Geschichte der Mathematik. Op. cit. Vol. IV, pp 257

323. de Borda JC (1781) Mémoire sur les Elections au Scrutin. Mémoire de l'Académie Royale des Sciences. (English translation by A. de Grazia, Isis, 44, 1953). Imprimerie Royale, Paris, France

324. Pearson K (1929) Laplace, being extracts from lectures. Biometrika, xxi

325. Nanson EJ (1907) Methods of election in British Government blue book, Misc. No. 3, Cd. 3501

326. One vote, one value. Nature, Vol. LXXV, 1907, pp 414. See also: Vox populi, ibid. pp 450; The ballot box, ibid. pp 509; Memories of my life, ibid. pp 278–283

327. Dodgson CL (1873) A discussion of the various methods of procedure in conducting elections. Preface dated 18 Dec., pp 15, Princeton University Library

328. Fishburn PC (1972) The theory of social choice. Princeton University, Princeton

329. Goodman LA, Markowitz H (1952) Social welfare functions based on individual rankings. Am J Sociol 58:257–262

330. Knerr S, Anisimov Y, Baret O, Gorski N, Price D, Simon JC (1997) The A2iA Intercheque System: courtesy amount and legal amount recognition for French checks. Int J Pattern Recognit Artif Intell 11(2):505–548

331. Leroux M, Lethelier E, Gilloux M, Lemarie B (1997) Automatic reading of handwritten amounts on French checks. Int J Pattern Recognit Artif Intell 11(2):619–638

332. Lee LL, Lizarraga MG, Gomes NR, Koerich AL (1997) A prototype for Brazilian bankcheck recognition. Int J Pattern Recognit Artif Intell 11(2):549–569

333. Paik J, Jung S, Lee Y (1993) Multiple combined recognition system for automatic processing of credit card slip applications. In: Proc. 2nd Int Conference on Document Analysis Recognition (Cat. No. 93TH0578-5), pp 520–523

334. Anzai Y, Mori H, Ito M, Hayashi Y (1987) A serial-parallel integrated information-processing model for complex human problem solving. In: Cognitive engineering in the design of human-computer interaction and expert systems. Proc. 2nd Int. Conference on Human-Computer Interaction, Vol. II, pp 175–182

335. Choi HS, Park KH (1997) Shop-floor scheduling at shipbuilding yards using the multiple intelligent agent system. Intell Manufact 8(6):505–515

336. Huang K, Wu J, Yan H (1997) Off-line writer verification utilizing multiple neural networks. Opt Eng 36(11):3127–3133

337. Terziyan V, Puuronen S, Kovalainen M (1997) Decision support system for telemedicine based on multiple expertise. In: Proc. 10th IEEE Symposium on Computer-Based Medical Systems, pp 8–13, Maribor, Slovenia

338. Ellis R, Simpson R, Culverhouse PF, Parisini T, Williams R, Reguera B, Moore B, Lowe D (1994) Expert visual classification and neural networks: can general solutions be found? In: OCEANS 94. Oceans Engineering for Today's Technology and Tomorrow's Preservation. Proc. (Cat. No. 94CH3472-8), pp I/330–I/334

339. Gader PD, Mohamed MA (1995) Multiple classifier fusion for handwritten word recognition. In: 1995 IEEE Int Conference on Systems Man Cybernetics. Intelligent Systems for the 21st Century (Cat. No. 95CH3576-7), pp 2329–2334

340. Golovchenko VB, Noskov NI (1992) Combining forecasts with expert information. Autom Remote Control 53(11):1746–1753

341. Goodenough DG, Robson MA (1988) Data fusion and object recognition. In: Proc. Vision Interface '88, pp 42–56

342. Gouveia FR, Barthes JPA (1993) Cooperative agents in engineering environments. In: Advanced Technologies: architecture – planning – civil engineering. 4th EuropIA Int Conference on the Application of Artif Intell, Robotics and Image Processing to Architecture, Building Engineering, Civil Engineering, and Urban Design and Urban Planning, pp 319–326

343. Hanson AR, Riseman EM, Williams TD (1988) Sensor and information fusion from knowledge-based constraints. Proc. SPIE 931:186–196

344. Ke L, Yea-Shuan H, Suen CY (1994) Image classification by classifier combining technique. Proc. SPIE 2304:210–217

345. Sawaragi T, Umemura J, Katai O, Iwai S (1996) Fusing multiple data and knowledge sources for signal understanding by genetic algorithm. IEEE Trans Ind Electron 43(3):411–421