PCA-based Offline Handwritten Character Recognition System

Smart Computing Review, vol. 3, no. 5, October 2013

DOI: 10.6029/smartcr.2013.05.005

Munish Kumar 1, M. K. Jindal 2, and R. K. Sharma 3

1 Computer Science Department, P. U. Rural Centre / Kauni, Muktsar, Punjab, India / [email protected]

2 Department of Computer Science & Applications, P. U. Regional Centre / Muktsar, Punjab, India

3 School of Mathematics & Computer Applications, Thapar University / Patiala, India

* Corresponding Author: Munish Kumar

Received July 4, 2013; Revised September 22, 2013; Accepted September 29, 2013; Published October 31, 2013

Abstract: Principal component analysis (PCA) has been widely used in pattern recognition to reduce the dimensionality of data. In this paper, we apply this technique to the recognition of offline handwritten Gurmukhi characters and propose a PCA-based recognition system. The system first prepares a skeleton of the character so that meaningful feature information can be extracted. For classification, we used k-nearest neighbor (k-NN), Linear-SVM, Polynomial-SVM and RBF-SVM classifiers, as well as combinations of these. We collected 16,800 samples of isolated offline handwritten Gurmukhi characters and divided them into three categories. In category 1 (5600 samples), each Gurmukhi character was written 100 times by a single writer. In category 2 (5600 samples), each character was written 10 times by each of 10 different writers. In category 3 (5600 samples), each character was written once by each of 100 different writers. The set of the basic 35 akhars of Gurmukhi has been considered here. A partitioning strategy for selecting the training and testing patterns is also explored. We used zoning, diagonal, directional, transition, intersection and open end point, parabola curve fitting–based and power curve fitting–based features to build the feature set for a given character. The proposed system achieves a recognition accuracy of 99.06% in category 1, 98.73% in category 2 and 78.30% in category 3.

Keywords: Handwritten character recognition, Feature extraction, PCA, k-NN, SVM

Introduction

Offline handwritten character recognition, usually abbreviated as offline HCR, is the process of converting offline handwritten characters into a machine-processable format. In this paper, we present an offline handwritten Gurmukhi character recognition system using principal component analysis (PCA). A handwritten character recognition system consists of several phases, namely digitization, preprocessing, feature extraction and classification. The feature extraction stage analyzes a handwritten character image and selects a set of features that can be used to uniquely recognize that character. Different feature extraction methods have been proposed for representing characters, such as projection histograms, contour profiles, zoning, Zernike moments, gradient features and Gabor features. Singh et al. [17] presented a study of different feature extractors and classifiers for handwritten Devanagari character recognition. Aradhya et al. [1] presented a multilingual OCR system for south Indian scripts based on PCA. Deepu et al. [5] presented a PCA-based system for online handwritten character recognition. Sundaram and Ramakrishnan [18] presented 2D-PCA for online Tamil character recognition. Bhattacharya et al. [3] presented an efficient two-stage approach for handwritten Bangla character recognition. Kumar et al. [7] presented an offline handwritten Gurmukhi character recognition system based on support vector machines (SVM). In that work, recognition was performed without PCA, using only an SVM classifier. They also presented an offline handwritten Gurmukhi character recognition system using a k-nearest neighbor (k-NN) classifier [8]. Sharma et al. [16] presented an online handwritten Gurmukhi script recognition system. They used an elastic matching method in which a character is recognized in two stages: the first stage recognizes the strokes, and the second stage constructs the character from the recognized strokes. In the present work, a PCA-based offline handwritten Gurmukhi character recognition system is proposed, based on experiments with different recognition methods, namely k-NN, Linear-SVM, Polynomial-SVM, RBF-SVM and combinations of these.

Data Collection

In this study, 16,800 samples of offline handwritten Gurmukhi characters were collected and divided into three categories of 5600 samples each. In category 1, each character was written 100 times by a single writer. In category 2, each character was written 10 times by each of 10 different writers. In category 3, each character was written once by each of 100 different writers. All characters were scanned at a resolution of 300 dots per inch. This yields a sufficiently large database of offline handwritten Gurmukhi characters. The three categories are analyzed and discussed in the following sections.

Gurmukhi Script

Gurmukhi is the script used for writing the Punjabi language. Its name is derived from the old Punjabi term “Guramukhi”, which means “from the mouth of the Guru.” Gurmukhi is the 12th most widely used script in the world. It is written from top to bottom and left to right, and it is not case sensitive. Gurmukhi script has 3 vowel bearers, 32 consonants, 6 additional consonants, 9 vowel modifiers, 3 auxiliary signs and 3 half characters.

The Proposed Recognition System

The proposed recognition system consists of four phases: digitization, preprocessing, feature extraction, and classification.

■ Digitization

Digitization is the process of converting a paper-based handwritten document into an electronic format. Here, each document contains only one Gurmukhi character. The document is scanned to produce an electronic representation of the original in Tagged Image File Format (TIFF). We used an HP-1400 scanner for digitization, and the resulting digital image was passed to the preprocessing phase.

■ Preprocessing

In this phase, the gray-level character image is normalized to a 100×100 window. A bitmap image is then produced from the normalized image and transformed into a thinned image using a parallel thinning algorithm [20].
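The thinning step can be illustrated with a short sketch. It assumes a Zhang-Suen style two-pass parallel thinning scheme, which may differ in detail from the algorithm of [20]. It also assumes the bitmap is a list of rows of 0/1 values whose outermost rows and columns are background. The function name `zhang_suen_thin` is illustrative and does not come from the paper.

```python
def zhang_suen_thin(bitmap):
    """Thin a 0/1 bitmap (list of lists) with a Zhang-Suen style scheme.

    Assumption: the border rows and columns contain only background (0).
    The input is not modified; a thinned copy is returned.
    """
    img = [row[:] for row in bitmap]
    rows, cols = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    # Neighbours P2..P9, clockwise, starting from north.
                    n = [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1],
                         img[r + 1][c + 1], img[r + 1][c], img[r + 1][c - 1],
                         img[r][c - 1], img[r - 1][c - 1]]
                    p2, _, p4, _, p6, _, p8, _ = n
                    b = sum(n)  # number of foreground neighbours
                    # Number of 0 -> 1 transitions around the pixel.
                    a = sum(1 for i in range(8)
                            if n[i] == 0 and n[(i + 1) % 8] == 1)
                    if step == 0:
                        cond = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        cond = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and cond:
                        to_clear.append((r, c))
            # "Parallel": all decisions use the same snapshot of the image.
            for r, c in to_clear:
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img
```

Collecting the pixels to delete before clearing them is what makes each pass parallel: every decision within a pass is based on the same image state.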

■ Feature Extraction

In this phase, features are extracted from the input characters. The performance of a handwritten character recognition system depends primarily on the extracted features, which should allow each character to be classified uniquely. We used diagonal features [7], intersection and open end point features [7], transition features [8], zoning features [9], directional features [9], parabola curve fitting–based features [10], and power curve fitting–based features [10] to build the feature set for a given character.
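As an illustration of one of these feature types, the sketch below computes a simple zoning feature: the fraction of foreground pixels in each zone of a square bitmap. The zone count, the density definition and the function name `zoning_features` are assumptions made for illustration; the exact zoning scheme used in [9] may differ.

```python
def zoning_features(bitmap, zones_per_side=10):
    """Return one foreground-density value per zone, in row-major order.

    Assumptions: `bitmap` is a square list of lists of 0/1 values and its
    side length is divisible by `zones_per_side`.
    """
    size = len(bitmap)
    step = size // zones_per_side
    features = []
    for zr in range(zones_per_side):
        for zc in range(zones_per_side):
            count = sum(bitmap[r][c]
                        for r in range(zr * step, (zr + 1) * step)
                        for c in range(zc * step, (zc + 1) * step))
            features.append(count / (step * step))
    return features


# Tiny 4x4 example split into 2x2 zones.
example = [[1, 1, 0, 0],
           [1, 1, 0, 0],
           [0, 0, 0, 1],
           [0, 0, 0, 0]]
print(zoning_features(example, zones_per_side=2))  # [1.0, 0.0, 0.0, 0.25]
```

For a 100×100 thinned image and the default of 10 zones per side, this yields a 100-element feature vector.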

■ Classification

The classification phase uses the features extracted in the previous phase to assign class membership. In this work, we used k-NN and SVM classifiers for character recognition. The SVM classifier was considered with three different kernels: linear, polynomial, and RBF. A C-SVC type classifier from the LIBSVM tool was used for SVM classification. We also ran the classifiers in parallel and combined their outputs, with recognition decided by a voting scheme. We considered the following combinations of classifiers:

LPR (Linear-SVM + Polynomial-SVM + RBF-SVM)

PRK (Polynomial-SVM + RBF-SVM + k-NN)

LRK (Linear-SVM + RBF-SVM + k-NN)

LPK (Linear-SVM + Polynomial-SVM + k-NN)
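The voting over three classifier outputs can be sketched as follows. The paper does not specify how ties are resolved, so this sketch assumes that when all three classifiers disagree, the label of the first classifier in the combination is returned. The dictionary keys and the example labels are hypothetical.

```python
from collections import Counter

# Classifier combinations named as in the text above.
COMBINATIONS = {
    "LPR": ("linear", "polynomial", "rbf"),
    "PRK": ("polynomial", "rbf", "knn"),
    "LRK": ("linear", "rbf", "knn"),
    "LPK": ("linear", "polynomial", "knn"),
}


def majority_vote(predictions):
    """Return the most frequent label; ties go to the earliest prediction."""
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


def combine(outputs, combination):
    """Vote over the outputs of the classifiers in the named combination.

    `outputs` maps a classifier key (see COMBINATIONS) to its predicted label.
    """
    return majority_vote([outputs[name] for name in COMBINATIONS[combination]])


# Hypothetical predictions for one test character.
outputs = {"linear": "ka", "polynomial": "kha", "rbf": "kha", "knn": "ka"}
print(combine(outputs, "LPR"))  # kha
print(combine(outputs, "LPK"))  # ka
```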

Principal Component Analysis

PCA is a mathematical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated features into a set of values of uncorrelated features called principal components. PCA is a well-established technique for extracting representative features for character recognition and is used to reduce the dimensionality of the data. The technique is useful when a large number of variables prohibits effective interpretation of the relationships between features. By reducing dimensionality, one can interpret the data from a few features rather than a large number of them. The number of principal components is less than or equal to the number of original variables. By selecting the top j eigenvectors with the largest eigenvalues for subspace approximation, PCA provides a lower-dimensional representation that exposes the underlying structure of complex data sets. Let there be P features for handwritten character recognition. First, the symmetric P×P matrix S of correlation coefficients between these features is calculated. Next, the eigenvectors of S and their corresponding eigenvalues are calculated. From these P eigenvectors, only the j eigenvectors corresponding to the largest eigenvalues are chosen. An eigenvector with a higher eigenvalue captures more of the variation among characters. Feature extraction is then performed by projecting the data onto these j eigenvectors. In the present work, seven features of a Gurmukhi character were considered, and the experiments were conducted with 2, 3, 4, 5, 6 and 7 principal components.
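The steps described above (correlation matrix, eigen-decomposition, projection onto the top j eigenvectors) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation. It assumes each row of `X` holds the P feature values of one sample and that no feature is constant. Power iteration with deflation is swapped in here only to keep the sketch free of dependencies; a library eigen-solver would normally be used. All function names are illustrative.

```python
import math
import random


def standardize(X):
    """Center each feature and scale it to unit sample standard deviation."""
    n, p = len(X), len(X[0])
    means = [sum(r[k] for r in X) / n for k in range(p)]
    stds = [math.sqrt(sum((r[k] - means[k]) ** 2 for r in X) / (n - 1))
            for k in range(p)]
    return [[(r[k] - means[k]) / stds[k] for k in range(p)] for r in X]


def correlation_matrix(Z):
    """Symmetric P x P correlation matrix S of the standardized data Z."""
    n, p = len(Z), len(Z[0])
    return [[sum(z[a] * z[b] for z in Z) / (n - 1) for b in range(p)]
            for a in range(p)]


def top_eigenpairs(S, j, iters=1000, seed=0):
    """Approximate the j largest eigenpairs of S by power iteration."""
    rng = random.Random(seed)
    p = len(S)
    A = [row[:] for row in S]
    pairs = []
    for _ in range(j):
        v = [rng.uniform(-1.0, 1.0) for _ in range(p)]
        lam = 0.0
        for _ in range(iters):
            w = [sum(A[a][b] * v[b] for b in range(p)) for a in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:  # remaining eigenvalues are numerically zero
                break
            v = [x / norm for x in w]
            lam = norm  # S is positive semidefinite, so the norm tends to lambda
        pairs.append((lam, v))
        # Deflation: remove the component just found before the next pass.
        A = [[A[a][b] - lam * v[a] * v[b] for b in range(p)] for a in range(p)]
    return pairs


def project(Z, pairs):
    """Project each standardized sample onto the selected eigenvectors."""
    return [[sum(zk * vk for zk, vk in zip(z, v)) for _, v in pairs] for z in Z]
```

With seven features per character, `project(Z, top_eigenpairs(S, 2))` would give a 2-PC representation in the sense used above, under the stated assumptions.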

Experimental Results and Discussion

In this section, the results of the PCA-based offline handwritten Gurmukhi character recognition system are presented. The recognition results are based on the k-NN, Linear-SVM, Polynomial-SVM and RBF-SVM classifiers and combinations of these. As stated earlier, we also experimented with partitioning strategies, dividing the data set of each category in five ways. Strategy a uses 50% of the data for training and the other 50% for testing. Strategy b uses 60% for training and 40% for testing. Strategy c uses 70% for training and 30% for testing. Strategy d uses 80% for training and 20% for testing, whereas strategy e uses 90% for training and the remaining 10% for testing.
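The five partitioning strategies can be expressed as a small helper. The paper does not state how samples are assigned to the two sets, so this sketch assumes a seeded random split applied separately to each character class. The names `STRATEGIES` and `partition` and the sample layout are illustrative.

```python
import random

# Percentage of each class placed in the training set, per strategy.
STRATEGIES = {"a": 50, "b": 60, "c": 70, "d": 80, "e": 90}


def partition(samples, strategy, seed=0):
    """Split (features, label) pairs into training and testing lists.

    Assumption: the split is random and done per class, so that every
    character keeps the same train/test ratio. Labels must be sortable.
    """
    rng = random.Random(seed)
    by_label = {}
    for sample in samples:
        by_label.setdefault(sample[1], []).append(sample)
    train, test = [], []
    for label in sorted(by_label):
        group = by_label[label][:]
        rng.shuffle(group)
        cut = len(group) * STRATEGIES[strategy] // 100
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Hypothetical data: two classes with 100 samples each.
samples = [((i,), label) for label in ("ka", "kha") for i in range(100)]
train, test = partition(samples, "c")
print(len(train), len(test))  # 140 60
```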

Category-wise results of the PCA-based recognition system are presented in the following subsections.

■ Recognition Accuracy for Category 1 Database

In this section, we consider each Gurmukhi character written 100 times by a single writer. The features considered are the seven features discussed in the Feature Extraction subsection (Section 4.3). To compare the performance of different numbers of principal components, two principal components (2-PC), three principal components (3-PC), …, seven principal components (7-PC) were taken as input to the classifiers. Test results for each partitioning strategy are presented in the following subsections.

Recognition accuracy using strategy a

In this subsection, classifier recognition results for partitioning strategy a are presented. PRK is the best classifier combination for offline handwritten Gurmukhi character recognition under this strategy, with a maximum accuracy of 97.48%. Recognition results of the classifiers and their combinations are given in Table 1 for the principal components (2-PC to 7-PC) and for all seven features (7-Feature).

Table 1. Classifier recognition accuracy for Category 1, Strategy a

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  94.00%   92.97%   92.80%   93.60%   94.40%   94.63%   94.00%     93.77%
Poly.-SVM   95.43%   91.31%   69.67%   80.52%   89.09%   83.67%   95.20%     86.41%
RBF-SVM     95.43%   93.43%   91.09%   92.46%   93.82%   94.34%   17.81%     82.63%
k-NN        94.71%   97.41%   93.20%   91.94%   84.68%   82.57%   70.28%     87.83%
LPR         97.25%   95.77%   94.39%   95.25%   96.17%   96.11%   95.54%     95.78%
PRK         97.48%   94.45%   86.17%   90.51%   93.65%   91.19%   56.17%     87.09%
LRK         97.19%   95.65%   95.37%   95.82%   96.11%   95.94%   55.88%     90.28%
LPK         97.08%   94.68%   86.28%   90.85%   94.17%   91.54%   96.17%     92.97%
Average     96.07%   94.46%   88.62%   91.37%   92.76%   91.25%   72.63%     89.59%
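The "Average" column in these tables appears to be the arithmetic mean of the seven entries in the row (2-PC to 7-PC plus 7-Feature). The sketch below checks this reading against the Linear-SVM row of Table 1; the function name `row_average` is illustrative.

```python
def row_average(accuracies):
    """Arithmetic mean of a row of accuracies, rounded to two decimals."""
    return round(sum(accuracies) / len(accuracies), 2)


# Linear-SVM row of Table 1: 2-PC ... 7-PC, then 7-Feature.
linear_svm = [94.00, 92.97, 92.80, 93.60, 94.40, 94.63, 94.00]
print(row_average(linear_svm))  # 93.77
```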

Recognition accuracy using strategy b

We achieved a maximum accuracy of 97.99% with strategy b, for which LPR is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results for the principal components and for all seven features (7-Feature) under partitioning strategy b are given in Table 2.

Table 2. Classifier recognition accuracy for Category 1, Strategy b

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  95.14%   94.93%   95.14%   94.93%   95.71%   95.50%   95.78%     95.30%
Poly.-SVM   95.36%   93.64%   79.44%   86.58%   91.93%   89.00%   92.17%     89.73%
RBF-SVM     95.36%   94.29%   92.57%   93.43%   94.71%   94.78%   20.27%     83.63%
k-NN        97.55%   97.42%   96.00%   95.35%   89.21%   86.42%   73.85%     90.83%
LPR         97.99%   97.14%   91.35%   94.35%   96.71%   95.35%   95.50%     95.48%
PRK         97.64%   96.71%   90.85%   94.35%   96.21%   94.64%   60.42%     90.12%
LRK         97.71%   97.57%   97.42%   97.37%   94.71%   97.64%   60.28%     91.81%
LPK         97.92%   97.14%   91.35%   94.35%   96.71%   95.35%   94.71%     95.36%
Average     96.83%   96.11%   91.77%   93.84%   94.49%   93.59%   74.12%     91.53%

Recognition accuracy using strategy c

With partitioning strategy c, the maximum accuracy achieved is 98.85%. PRK is again the best classifier combination for offline handwritten Gurmukhi character recognition under this strategy. Recognition results for the principal components and for all seven features (7-Feature) are given in Table 3.

Table 3. Classifier recognition accuracy for Category 1, Strategy c

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  95.96%   95.05%   94.95%   94.95%   95.24%   95.24%   95.14%     95.22%
Poly.-SVM   95.24%   93.72%   83.63%   88.68%   92.77%   90.48%   85.81%     90.05%
RBF-SVM     95.14%   94.29%   92.77%   93.62%   95.05%   94.48%   25.50%     84.41%
k-NN        97.42%   97.33%   97.80%   96.38%   90.09%   86.47%   77.52%     91.86%
LPR         98.57%   98.09%   98.00%   98.57%   98.57%   98.66%   98.28%     98.39%
PRK         98.85%   97.61%   93.33%   96.09%   97.42%   96.28%   65.61%     92.17%
LRK         98.57%   98.38%   98.47%   98.47%   98.66%   98.66%   65.23%     93.78%
LPK         98.66%   97.90%   93.04%   95.99%   97.71%   96.28%   94.48%     96.29%
Average     97.30%   96.55%   93.99%   95.34%   95.68%   94.56%   75.94%     92.77%

Recognition accuracy using strategy d

In this subsection, recognition results for strategy d are presented. With this strategy, we achieved a maximum accuracy of 99.28% using the LRK classifier combination. Recognition results for the features and principal components under consideration are given in Table 4.

Table 4. Classifier recognition accuracy for Category 1, Strategy d

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  94.00%   94.08%   93.86%   93.72%   93.86%   93.72%   93.10%     93.76%
Poly.-SVM   94.00%   92.43%   85.30%   87.30%   92.15%   90.44%   94.43%     90.86%
RBF-SVM     94.00%   93.29%   92.15%   93.01%   93.86%   93.72%   36.51%     85.22%
k-NN        94.74%   96.14%   97.14%   97.42%   92.57%   87.28%   76.42%     91.67%
LPR         99.00%   98.57%   97.71%   98.00%   98.85%   98.85%   93.86%     97.83%
PRK         99.14%   98.42%   94.71%   96.42%   98.42%   97.71%   92.15%     96.71%
LRK         99.28%   99.14%   98.28%   98.42%   99.14%   99.28%   94.08%     98.23%
LPK         99.14%   98.71%   95.14%   96.28%   98.14%   97.28%   92.57%     96.73%
Average     96.66%   96.34%   94.28%   95.07%   95.87%   94.78%   84.14%     93.88%

Recognition accuracy using strategy e

In this subsection, classifier recognition results for partitioning strategy e are presented. LPK is the best classifier combination under this strategy. For the features and principal components under consideration, a maximum accuracy of 99.71% was achieved. Recognition results of the classifiers and their combinations for the principal components and for all seven features (7-Feature) are given in Table 5.

■ Recognition Accuracy for Category 2 Database

In this section, we consider each Gurmukhi character written 10 times by each of 10 different writers. Again, the seven features discussed in the Feature Extraction subsection (Section 4.3) and two principal components (2-PC), three principal components (3-PC), …, seven principal components (7-PC) were taken as input to the classifiers. Experimental results for each partitioning strategy are presented in the following subsections.

Table 5. Classifier recognition accuracy for Category 1, Strategy e

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  89.45%   89.45%   89.46%   89.46%   89.46%   89.46%   89.45%     89.46%
Poly.-SVM   89.46%   87.17%   80.62%   84.90%   87.74%   86.03%   89.46%     86.48%
RBF-SVM     89.74%   89.46%   88.31%   88.60%   98.74%   90.02%   70.08%     87.85%
k-NN        96.42%   97.71%   96.57%   97.14%   88.57%   79.14%   69.71%     89.32%
LPR         99.42%   98.28%   97.99%   98.57%   99.42%   99.14%   99.42%     98.89%
PRK         98.85%   98.28%   94.85%   97.14%   99.14%   97.99%   98.57%     97.83%
LRK         98.85%   99.14%   98.85%   99.42%   99.14%   99.14%   98.85%     99.06%
LPK         99.42%   99.14%   95.14%   96.85%   99.71%   98.00%   98.00%     98.04%
Average     95.20%   94.83%   92.72%   94.01%   95.24%   92.36%   89.19%     93.37%

Recognition accuracy using strategy a

In this subsection, classifier recognition results for partitioning strategy a are presented. Under this strategy, k-NN is the best classifier for offline handwritten Gurmukhi character recognition, with a maximum accuracy of 94.51%. Recognition results of the classifiers and their combinations are given in Table 6.

Table 6. Classifier recognition accuracy for Category 2, Strategy a

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  77.78%   76.58%   74.75%   75.61%   79.55%   81.38%   76.52%     77.45%
Poly.-SVM   75.96%   53.68%   25.58%   33.75%   41.86%   45.34%   53.68%     47.12%
RBF-SVM     80.29%   75.32%   73.04%   75.67%   78.64%   79.72%   15.07%     68.25%
k-NN        91.42%   94.51%   83.71%   75.77%   69.77%   71.42%   60.51%     78.16%
LPR         80.45%   74.62%   70.62%   73.42%   77.99%   78.97%   75.19%     75.89%
PRK         79.85%   65.02%   49.25%   51.65%   58.17%   61.54%   38.68%     57.74%
LRK         79.82%   76.57%   75.60%   75.48%   79.65%   81.14%   37.37%     72.23%
LPK         78.34%   65.34%   49.71%   51.77%   58.00%   61.65%   78.74%     63.36%
Average     80.49%   72.71%   62.78%   64.14%   67.95%   70.14%   54.47%     67.52%

Recognition accuracy using strategy b

With partitioning strategy b, the maximum accuracy achieved is 94.50%. k-NN is again the best classifier for offline handwritten Gurmukhi character recognition under this strategy. Recognition results for the principal components and for all seven features (7-Feature) are given in Table 7.

Recognition accuracy using strategy c

We achieved an accuracy of 95.14% with strategy c, and we infer that LPR is the best classifier combination for offline handwritten Gurmukhi character recognition under this strategy. Recognition results for this partitioning strategy are given in Table 8.

Recognition accuracy using strategy d

In this subsection, classifier recognition results for partitioning strategy d are presented. Under this strategy, LPK is the best classifier combination for offline handwritten Gurmukhi character recognition, with a maximum accuracy of 97.71%. Recognition results are given in Table 9.

Recognition accuracy using strategy e

With partitioning strategy e, the maximum accuracy achieved is 99.42%. LPR is again the best classifier combination for offline handwritten Gurmukhi character recognition under this strategy. Recognition results for the features and principal components under consideration are given in Table 10.

Table 7. Classifier recognition accuracy for Category 2, Strategy b

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  56.24%   79.15%   78.37%   79.22%   82.51%   83.94%   80.01%     77.06%
Poly.-SVM   55.67%   62.04%   34.33%   40.82%   51.32%   56.81%   82.29%     54.75%
RBF-SVM     57.05%   79.37%   77.37%   79.73%   81.44%   83.87%   17.91%     68.11%
k-NN        93.14%   94.50%   86.21%   77.85%   73.28%   73.00%   59.92%     79.70%
LPR         84.50%   80.00%   77.14%   79.57%   82.42%   83.57%   79.42%     80.95%
PRK         83.85%   72.00%   58.07%   59.92%   65.42%   70.28%   43.28%     64.69%
LRK         83.64%   81.35%   80.07%   80.92%   83.78%   85.50%   42.07%     76.76%
LPK         82.57%   72.07%   57.78%   59.14%   65.78%   70.07%   82.78%     70.03%
Average     74.58%   77.56%   68.66%   69.64%   73.24%   75.88%   60.96%     71.50%

Table 8. Classifier recognition accuracy for Category 2, Strategy c

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  82.68%   82.20%   82.39%   82.49%   84.49%   86.58%   82.20%     83.29%
Poly.-SVM   83.06%   70.40%   42.43%   50.04%   59.72%   67.07%   84.87%     65.37%
RBF-SVM     84.97%   81.25%   79.82%   81.44%   83.92%   85.82%   22.64%     74.27%
k-NN        94.57%   95.61%   86.19%   83.04%   79.52%   76.57%   66.95%     83.21%
LPR         95.14%   83.14%   80.57%   82.47%   84.85%   87.42%   82.66%     85.18%
PRK         86.38%   78.66%   64.66%   68.09%   74.28%   78.00%   50.19%     71.47%
LRK         86.57%   84.66%   82.95%   83.99%   86.66%   88.76%   48.85%     80.35%
LPK         84.95%   79.42%   65.33%   68.66%   74.19%   77.61%   84.95%     76.44%
Average     87.29%   81.92%   73.04%   75.02%   78.45%   80.97%   65.41%     77.44%

■ Recognition Accuracy for Category 3 Database

In this section, we consider each Gurmukhi character written once by each of 100 different writers. Here, the seven features discussed in the Feature Extraction subsection (Section 4.3) and two principal components (2-PC), three principal components (3-PC), …, seven principal components (7-PC) were again taken as input to the classifiers. The results are presented in the following subsections.

Recognition accuracy using strategy a

In this subsection, we present classifier recognition results for partitioning strategy a. With this strategy, the maximum accuracy achieved is 79.48%, and LPR is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results of the classifiers and their combinations are given in Table 11 for the principal components and for all seven features (7-Feature).

Table 9. Classifier recognition accuracy for Category 2, Strategy d

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  90.72%   90.01%   90.58%   91.01%   92.01%   93.01%   88.59%     90.85%
Poly.-SVM   91.87%   83.02%   55.63%   65.19%   75.89%   80.59%   92.72%     77.84%
RBF-SVM     91.58%   88.73%   88.01%   88.87%   90.44%   91.72%   31.66%     81.57%
k-NN        93.57%   94.42%   87.42%   82.57%   83.71%   77.86%   67.57%     83.87%
LPR         97.28%   93.57%   92.48%   93.28%   94.85%   96.71%   93.28%     94.49%
PRK         97.42%   92.00%   78.42%   82.85%   89.00%   90.85%   61.28%     84.55%
LRK         97.28%   95.28%   94.57%   94.14%   96.14%   97.28%   59.42%     90.59%
LPK         97.71%   93.71%   80.00%   83.99%   89.71%   91.71%   95.28%     90.30%
Average     94.67%   91.34%   83.38%   85.23%   88.96%   89.96%   73.72%     86.75%

Table 10. Classifier recognition accuracy for Category 2, Strategy e

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  89.45%   89.46%   89.45%   89.46%   89.45%   89.74%   88.89%     89.41%
Poly.-SVM   89.17%   87.17%   67.80%   80.63%   84.90%   84.33%   80.63%     82.09%
RBF-SVM     88.60%   87.46%   87.46%   88.03%   88.31%   88.60%   75.49%     86.28%
k-NN        92.85%   95.71%   93.71%   85.14%   78.00%   62.86%   48.00%     79.47%
LPR         99.42%   98.00%   97.99%   98.57%   99.14%   98.57%   99.42%     98.73%
PRK         97.42%   97.42%   90.28%   94.00%   95.42%   95.14%   99.42%     95.59%
LRK         98.85%   98.57%   98.57%   98.57%   97.99%   99.14%   98.85%     98.65%
LPK         98.57%   98.57%   90.57%   94.28%   95.99%   95.71%   98.85%     96.08%
Average     94.29%   94.04%   89.47%   91.08%   91.15%   89.26%   86.19%     90.78%

Recognition accuracy using strategy b

With partitioning strategy b, the maximum accuracy achieved is 81.78%. LPR is again the best classifier combination for offline handwritten Gurmukhi character recognition under this strategy. Recognition results for this strategy are given in Table 12.

Recognition accuracy using strategy c

In this subsection, classifier recognition results for partitioning strategy c are presented. LPR is again the best classifier combination under this strategy, with a maximum recognition accuracy of 81.80%. Recognition results of the classifiers and their combinations for the principal components and for all seven features (7-Feature) are given in Table 13.

Recognition accuracy using strategy d

With partitioning strategy d, the maximum accuracy achieved is 84.00%. PRK is the best classifier combination for offline handwritten Gurmukhi character recognition under this strategy. Recognition results for this strategy are given in Table 14.

Table 11. Classifier recognition accuracy for Category 3, Strategy a

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  74.87%   72.81%   71.62%   72.87%   77.27%   77.84%   75.78%     74.72%
Poly.-SVM   75.04%   30.21%   14.39%   17.70%   23.87%   34.72%   69.67%     37.94%
RBF-SVM     78.35%   69.33%   66.59%   67.10%   72.92%   72.92%   17.81%     63.57%
k-NN        77.27%   75.71%   64.11%   58.45%   48.80%   57.54%   43.88%     60.82%
LPR         79.48%   68.99%   64.91%   65.94%   72.62%   74.85%   74.57%     71.62%
PRK         78.34%   51.37%   38.45%   37.54%   41.37%   48.74%   26.57%     46.05%
LRK         78.74%   72.79%   69.88%   69.95%   75.37%   75.54%   25.59%     66.84%
LPK         76.62%   53.31%   40.45%   39.31%   43.42%   50.97%   77.14%     54.46%
Average     77.33%   61.81%   53.80%   53.60%   56.95%   61.64%   51.37%     59.50%

Table 12. Classifier recognition accuracy for Category 3, Strategy b

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  75.80%   73.73%   73.94%   74.08%   77.73%   78.08%   75.44%     75.54%
Poly.-SVM   76.15%   35.68%   16.27%   21.77%   30.69%   40.54%   51.32%     38.92%
RBF-SVM     78.44%   70.59%   67.16%   68.45%   73.16%   74.23%   20.27%     64.61%
k-NN        79.71%   77.57%   67.00%   59.28%   47.14%   57.57%   40.71%     61.28%
LPR         81.78%   70.35%   68.78%   68.42%   74.14%   75.71%   75.14%     57.06%
PRK         79.57%   55.42%   40.49%   39.85%   44.85%   52.92%   25.00%     48.30%
LRK         77.00%   74.14%   72.57%   71.64%   76.07%   77.21%   23.71%     67.48%
LPK         78.35%   56.64%   43.50%   42.07%   46.57%   55.00%   77.28%     57.06%
Average     78.35%   64.26%   56.21%   55.69%   58.79%   63.90%   48.60%     58.78%

Table 13. Classifier recognition accuracy for Category 3, Strategy c

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  77.73%   74.50%   74.59%   75.45%   79.92%   80.01%   75.73%     76.85%
Poly.-SVM   77.35%   43.67%   21.12%   27.40%   38.24%   45.09%   59.72%     44.66%
RBF-SVM     80.20%   72.78%   70.02%   70.98%   75.26%   75.74%   25.50%     67.21%
k-NN        81.14%   80.28%   68.57%   60.66%   50.76%   59.71%   42.76%     63.41%
LPR         81.80%   72.47%   69.52%   70.30%   77.23%   77.14%   81.80%     75.75%
PRK         80.95%   59.52%   44.66%   44.85%   52.85%   57.33%   75.71%     59.41%
LRK         81.04%   74.85%   73.14%   73.14%   78.19%   78.76%   72.09%     75.89%
LPK         80.00%   60.47%   47.52%   46.47%   54.19%   59.80%   77.61%     60.87%
Average     80.02%   67.31%   58.64%   58.65%   63.33%   66.69%   63.86%     65.50%

Table 14. Classifier recognition accuracy for Category 3, Strategy d

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  78.03%   75.89%   75.04%   75.03%   79.03%   80.74%   75.03%     76.97%
Poly.-SVM   78.60%   51.92%   26.24%   29.81%   46.21%   51.21%   81.45%     52.21%
RBF-SVM     80.74%   70.89%   69.32%   71.04%   75.74%   76.31%   32.46%     68.07%
k-NN        80.28%   77.71%   62.57%   54.86%   42.42%   51.28%   35.28%     57.77%
LPR         83.28%   76.14%   71.99%   73.57%   79.71%   80.85%   81.45%     78.14%
PRK         84.00%   66.14%   45.57%   48.42%   57.85%   61.57%   78.71%     63.18%
LRK         83.99%   77.42%   75.14%   77.00%   79.71%   80.42%   74.42%     78.30%
LPK         83.28%   68.14%   50.71%   49.57%   59.85%   62.85%   80.57%     65.00%
Average     81.52%   70.53%   59.57%   59.91%   65.06%   68.15%   67.42%     67.45%

Recognition accuracy using strategy e

In this subsection, classifier recognition results for partitioning strategy e are presented. PRK is the best classifier combination for offline handwritten Gurmukhi character recognition under this strategy, with a maximum recognition accuracy of 84.90%. Recognition results are given in Table 15.

Table 15. Classifier recognition accuracy for Category 3, Strategy e

Classifier  2-PC     3-PC     4-PC     5-PC     6-PC     7-PC     7-Feature  Average
Linear-SVM  74.64%   70.65%   70.94%   70.08%   73.21%   76.35%   69.23%     72.16%
Poly.-SVM   76.07%   51.85%   28.49%   33.33%   47.86%   54.70%   67.00%     51.33%
RBF-SVM     79.48%   65.24%   62.39%   64.96%   69.23%   72.36%   65.24%     68.41%
k-NN        77.14%   72.85%   57.14%   51.42%   35.71%   35.42%   27.71%     51.06%
LPR         83.71%   72.28%   65.42%   68.57%   76.00%   79.42%   79.99%     75.06%
PRK         84.90%   62.28%   42.28%   48.57%   54.85%   59.42%   73.71%     60.86%
LRK         83.71%   73.14%   69.71%   72.28%   73.14%   76.85%   69.14%     74.00%
LPK         81.42%   66.57%   45.14%   48.28%   57.71%   60.85%   75.42%     62.20%
Average     80.13%   66.85%   55.18%   57.18%   60.96%   64.42%   65.93%     64.38%

Conclusion

This paper proposes a PCA-based offline handwritten Gurmukhi character recognition system. The character features considered in this work are zoning features, diagonal features, directional features, transition features, intersection and open end point features, parabola curve fitting–based features and power curve fitting–based features. The classifiers employed are k-NN, Linear-SVM, Polynomial-SVM and RBF-SVM and combinations of these. The best recognition accuracy for each database category and partitioning strategy is given in Table 16, from which we conclude that 2-PC is more effective than the other feature sets. The proposed system achieves an average recognition accuracy of 99.06% on the category 1 database when strategy e and the LRK classifier combination are used, 98.73% on the category 2 database when strategy e and the LPR classifier combination are used, and 78.30% on the category 3 database when strategy d and the LRK classifier combination are used. This accuracy may be increased further by using a larger data set to train the classifiers. This work can also be extended to offline handwritten character recognition of other Indian scripts.


Table 16. Database category wise recognition accuracy

Database category  Strategy    Feature  Classifier  Accuracy
Category 1         Strategy a  2-PC     PRK         97.48%
Category 1         Strategy b  2-PC     LPR         97.99%
Category 1         Strategy c  2-PC     PRK         98.85%
Category 1         Strategy d  2-PC     LRK         99.28%
Category 1         Strategy e  6-PC     LPK         99.71%
Category 2         Strategy a  3-PC     k-NN        94.51%
Category 2         Strategy b  3-PC     k-NN        94.50%
Category 2         Strategy c  2-PC     LPR         95.14%
Category 2         Strategy d  2-PC     LPK         97.71%
Category 2         Strategy e  2-PC     LPR         99.42%
Category 3         Strategy a  2-PC     LPR         79.48%
Category 3         Strategy b  2-PC     LPR         81.78%
Category 3         Strategy c  2-PC     LPR         81.80%
Category 3         Strategy d  2-PC     PRK         84.00%
Category 3         Strategy e  2-PC     PRK         84.90%
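To make the role of PCA in these results concrete, the following sketch projects feature vectors onto their first k principal components. It assumes that "2-PC" denotes the first two principal components of the extracted feature vectors. It is not the authors' implementation: it uses only the Python standard library and finds the components by power iteration with deflation, where a numerical library's eigen-solver would normally be used. All function names and the toy data are hypothetical.

```python
import math


def mean_center(rows):
    """Subtract the per-feature mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means


def covariance(centered):
    """Sample covariance matrix (d x d) of mean-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def mat_vec(matrix, v):
    return [sum(a * b for a, b in zip(row, v)) for row in matrix]


def top_eigenpair(matrix, iterations=200):
    """Dominant eigenvector and eigenvalue by power iteration.

    Assumption: the fixed start vector is not orthogonal to the dominant
    eigenvector; a production version would guard against that case.
    """
    d = len(matrix)
    v = [float(i + 1) for i in range(d)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iterations):
        w = mat_vec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # matrix has no variance left in this direction
            break
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, mat_vec(matrix, v)))
    return v, eigenvalue


def principal_components(rows, k):
    """Return the feature means and the first k principal directions."""
    centered, means = mean_center(rows)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(k):
        v, lam = top_eigenpair(cov)
        components.append(v)
        # Deflation: remove the variance explained by the component just found.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return means, components


def project(row, means, components):
    """Coordinates of one feature vector in the principal-component basis."""
    return [sum((x - m) * c for x, m, c in zip(row, means, comp))
            for comp in components]


# Toy 3-dimensional feature vectors: most variance lies along the first axis.
SAMPLES = [[0, 0, 1], [2, 1, 1], [4, 0, 1], [6, 1, 1], [8, 0, 1], [10, 1, 1]]
MEANS, COMPONENTS = principal_components(SAMPLES, k=2)
REDUCED = [project(row, MEANS, COMPONENTS) for row in SAMPLES]
```

The reduced vectors in `REDUCED` would then be passed to the classifier in place of the full feature vectors.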


Munish Kumar received his Master's degree in Computer Science & Engineering from Thapar University, Patiala, India in 2008. He started his career as an Assistant Professor in computer applications at the Jaito Centre of Punjabi University, Patiala. He is working as an Assistant Professor in the Computer Science Department, Panjab University Rural Centre, Kauni, Muktsar, Punjab, India. He is currently pursuing his Ph.D. degree at Thapar University, Patiala, Punjab, India. His research interests include Character Recognition.

Manish Kumar Jindal received his Bachelor's degree in science in 1996 and his postgraduate degree in Computer Applications from Punjabi University, Patiala, India in 1999. He was awarded a Gold Medal for his postgraduate studies. He received his Ph.D. degree in Computer Science & Engineering from Thapar University, Patiala, India in 2008. He is working as an Associate Professor at Panjab University Regional Centre, Muktsar, Punjab, India. His research interests include Character Recognition and Pattern Recognition.

Rajendra Kumar Sharma received his Ph.D. degree in Mathematics from the University of Roorkee (now IIT Roorkee), India in 1993. He is currently working as a Professor at Thapar University, Patiala, India, where he teaches, among other things, statistical models and their usage in computer science. He has been involved in the organization of a number of conferences and other courses at Thapar University, Patiala. His main research interests are statistical models in computer science, Neural Networks, and Pattern Recognition.

Copyrights © 2013 KAIS