PCA-based Offline Handwritten Character Recognition System
Smart Computing Review, vol. 3, no. 5, October 2013
DOI: 10.6029/smartcr.2013.05.005
Munish Kumar 1, M. K. Jindal 2, and R. K. Sharma 3
1 Computer Science Department, P. U. Rural Centre / Kauni, Muktsar, Punjab, India / [email protected]
2 Department of Computer Science & Applications, P. U. Regional Centre / Muktsar, Punjab, India
3 School of Mathematics & Computer Applications, Thapar University / Patiala, India
* Corresponding Author: Munish Kumar
Received July 4, 2013; Revised September 22, 2013; Accepted September 29, 2013; Published October 31, 2013
Abstract: Principal component analysis (PCA) has been used widely in pattern recognition to reduce the dimensionality of data. In this paper, we explore using this technique to recognize offline
handwritten Gurmukhi characters, and a system for offline handwritten Gurmukhi character
recognition using PCA is proposed. The system first prepares a skeleton of the character so that
meaningful feature information about the character can be extracted. For classification, we used k-
nearest neighbor, Linear-SVM, polynomial-SVM and RBF-SVM based approaches and
combinations of these approaches. In this work, we collected 16,800 samples of isolated offline
handwritten Gurmukhi characters. These samples were divided into three categories. In category 1
(5600 samples), each Gurmukhi character was written 100 times by a single writer. In category 2
(5600 samples), each Gurmukhi character was written 10 times by 10 different writers, and in
category 3 (5600 samples), each Gurmukhi character was written by 100 different writers. The set
of the basic 35 akhars of Gurmukhi has been considered here. A partitioning strategy for selecting
the training and testing patterns is also explored in this work. We used zoning, diagonal,
directional, transition, intersection and open end point, parabola curve fitting–based and power
curve fitting–based feature extraction in order to find the feature set for a given character. The
proposed system achieves a recognition accuracy of 99.06% in category 1, 98.73% in category 2
and 78.30% in category 3.
Keywords: Handwritten character recognition, Feature extraction, PCA, k-NN, SVM
Introduction
Offline handwritten character recognition, usually abbreviated as offline HCR, is the process of converting offline
handwritten characters into a machine process-able format. In this paper, we present an offline handwritten
Gurmukhi character recognition system using principal component analysis (PCA). A handwritten character recognition
system consists of several phases, namely digitization, preprocessing, feature extraction and classification. The feature
extraction stage analyzes a handwritten character image and selects a set of features that can be used to uniquely recognize that character. Different feature extraction methods have been proposed for representing characters, such as projection histograms, contour profiles, zoning, Zernike moments, gradient features and Gabor features. Singh et al.
[17] presented a study of different feature extractors and classifiers for handwritten Devanagari character recognition.
Aradhya et al. [1] presented a multilingual OCR system for south Indian scripts based on PCA. Deepu et al. [5] presented a
system based on PCA for online handwritten character recognition. Sundaram and Ramakrishnan [18] presented 2D-PCA
for online Tamil character recognition. Bhattacharya et al. [3] presented an efficient two-stage approach for handwritten
Bangla character recognition. Kumar et al. [7] presented an offline handwritten Gurmukhi character recognition system
based on support vector machines (SVM). In that work, they performed recognition without using PCA and used only an
SVM classifier for classification purpose. They also provided an offline handwritten Gurmukhi character recognition
system using a k-nearest neighbor (k-NN) classifier [8]. Sharma et al. [16] presented an online handwritten Gurmukhi script
recognition system. They used an elastic matching method in which the character is recognized in two stages. The first
stage recognizes the strokes and, in the second stage, the character is constructed on the basis of recognized strokes. In the
present work, a PCA-based offline handwritten Gurmukhi character recognition system is proposed, based on experiments with different recognition methods, namely k-NN, Linear-SVM, Polynomial-SVM, RBF-SVM and combinations of these methods.
Data Collection
In this study, 16,800 samples of offline handwritten Gurmukhi characters have been collected. These samples have further
been divided into three categories. Category 1 consists of 5600 samples of Gurmukhi characters where each character was
written 100 times by a single writer. Category 2 also contains 5600 samples, and each Gurmukhi character was written 10
times by 10 different writers. In category 3, each Gurmukhi character was written by 100 different writers. This category
also consists of 5600 samples. All these characters were scanned at 300 dots per inch resolution. As such, a sufficiently
large database has been collected for offline handwritten Gurmukhi characters. These three categories have further been
analyzed and discussed in this paper.
Gurmukhi Script
Gurmukhi script is the script used for writing the Punjabi language and is derived from the old Punjabi term “Guramukhi”,
which means “from the mouth of the Guru.” Gurmukhi script is the 12th most widely used script in the world. The writing
style of Gurmukhi script is top to bottom, left to right, and it is not case sensitive. Gurmukhi script has 3 vowel bearers, 32
consonants, 6 additional consonants, 9 vowel modifiers, 3 auxiliary signs and 3 half characters.
The Proposed Recognition System
The proposed recognition system consists of several phases: digitization, preprocessing, feature extraction, and
classification.
■ Digitization
Digitization is the process of translating a paper-based handwritten document into electronic format. Here, each document consists of only one Gurmukhi character. The electronic conversion is accomplished by scanning each document to produce an electronic representation of the original in Tagged Image File Format. We used an HP-1400 scanner for digitization, and the digital image was fed to the preprocessing phase.
■ Preprocessing
In this phase, the gray-level character image is normalized into a window of size 100×100. After normalization, we produced
a bitmap image of the normalized image. Then, the bitmap image was transformed into a thinned image using a parallel
thinning algorithm [20].
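The normalization and binarization steps described above can be sketched as follows. This is an illustrative nearest-neighbour resize with a global threshold, not the authors' implementation: the function name, the threshold value of 128, and the resize method are assumptions, and the parallel thinning step of [20] is omitted.

```python
import numpy as np

def normalize_and_binarize(img, size=100, threshold=128):
    """Nearest-neighbour resize of a grayscale image to size x size,
    then threshold into a 0/1 bitmap (1 = ink). Illustrative sketch."""
    h, w = img.shape
    rows = np.arange(size) * h // size   # source row for each target row
    cols = np.arange(size) * w // size   # source column for each target column
    resized = img[rows][:, cols]
    return (resized < threshold).astype(np.uint8)

# toy 4x6 "scan": a dark stroke on a light background
img = np.full((4, 6), 255, dtype=np.uint8)
img[1:3, 2:4] = 0
bmp = normalize_and_binarize(img, size=100)
print(bmp.shape)  # (100, 100)
```

The resulting 100×100 bitmap would then be passed to the thinning algorithm to produce the character skeleton.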
■ Feature Extraction
In this phase, features from input characters are extracted. The performance of a handwritten character recognition system
primarily depends on the features that are extracted. The extracted features should allow classification of a character in a
unique way. We used diagonal features [7], intersection and open end points features [7], transition features [8], zoning
features [9], directional features [9], parabola curve fitting–based features [10], and power curve fitting–based features [10]
in order to find the feature set for a given character.
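As one concrete example of these descriptors, zoning features divide the character image into cells and measure the ink distribution in each cell. The sketch below computes zone-wise pixel density; the 5×5 grid and the density measure are assumptions for illustration, since the paper's exact zoning variant is the one defined in [9].

```python
import numpy as np

def zoning_features(bitmap, zones=5):
    """Split a square 0/1 character bitmap into zones x zones cells and
    return each cell's ink density as a feature vector (one common
    reading of 'zoning features'; the paper's variant may differ)."""
    side = bitmap.shape[0]
    step = side // zones
    feats = []
    for i in range(zones):
        for j in range(zones):
            cell = bitmap[i * step:(i + 1) * step, j * step:(j + 1) * step]
            feats.append(cell.mean())   # fraction of ink pixels in the cell
    return np.array(feats)

bmp = np.zeros((100, 100), dtype=np.uint8)
bmp[40:60, 40:60] = 1                   # a blob in the centre of the window
f = zoning_features(bmp, zones=5)
print(len(f))  # 25
```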
■ Classification
The classification phase uses the features extracted in the previous phase for setting class membership. In this work, we
used k-NN and SVM classifiers for character recognition. The SVM classifier was considered with three different kernels:
linear, polynomial, and RBF. In addition, a C-SVC type classifier in the Lib-SVM tool has been used for SVM
classification purposes. We also ran the classifiers in parallel and combined their outputs using a voting scheme. The following combinations of classifiers were considered:
LPR (Linear-SVM + Polynomial-SVM + RBF-SVM)
PRK (Polynomial-SVM + RBF-SVM + k-NN)
LRK (Linear-SVM + RBF-SVM + k-NN)
LPK (Linear-SVM + Polynomial-SVM + k-NN)
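The parallel combinations above can be illustrated with a simple voting function over the three member classifiers' predicted labels. The tie-breaking rule (fall back to the first classifier's answer) is an assumption; the paper does not state how a three-way disagreement is resolved.

```python
from collections import Counter

def vote(predictions):
    """Majority vote over per-classifier labels. When all three
    classifiers disagree, the first classifier's label is returned
    (an assumed tie rule, not stated in the paper)."""
    label, count = Counter(predictions).most_common(1)[0]
    if count > 1:
        return label
    return predictions[0]

# e.g. the PRK combination: Polynomial-SVM, RBF-SVM and k-NN each predict
print(vote(["ka", "ka", "gha"]))  # majority -> ka
print(vote(["ka", "gha", "nga"]))  # no majority -> first classifier: ka
```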
Principal Component Analysis
PCA is a mathematical procedure that uses a transformation to convert a set of observations of possibly correlated features into a set of values of uncorrelated features called principal components. PCA is a well-established technique for extracting representative features for character recognition and is used to reduce the dimensionality of the data. The technique is useful when a large number of variables prohibits effective interpretation of the relationships between different features. By reducing dimensionality, one can interpret the data using a few features rather than a large number of them. The number of principal components is less than or equal to the number of original variables. By selecting the top j eigenvectors with the largest eigenvalues for subspace approximation, PCA can provide a lower dimensional representation that exposes the underlying structure of complex data sets. Let there be P features for handwritten character recognition. In the next step, the symmetric matrix S of correlation coefficients between these features is calculated. Then, the eigenvectors and the corresponding eigenvalues of S are calculated.
From these P eigenvectors, only the j eigenvectors corresponding to the largest eigenvalues are chosen. An eigenvector corresponding to a higher eigenvalue describes more characteristics of a character. Using these j eigenvectors, feature extraction is done using PCA. In the present work, seven features for a Gurmukhi character have been considered, and the experiments were conducted by taking 2, 3, 4, 5, 6 and 7 principal components.
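The procedure above — correlation matrix S, eigendecomposition, and projection onto the top j eigenvectors — can be sketched as follows. This is not the authors' code; the standardization step and the random test data are assumptions made for illustration.

```python
import numpy as np

def pca_project(X, j):
    """Project a feature matrix X (samples x P features) onto the j
    eigenvectors of the correlation matrix with the largest eigenvalues,
    mirroring the procedure described in the text (sketch only)."""
    S = np.corrcoef(X, rowvar=False)           # P x P correlation matrix
    eigvals, eigvecs = np.linalg.eigh(S)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:j]      # indices of the top-j eigenvalues
    W = eigvecs[:, order]
    Xc = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize to match S
    return Xc @ W                              # samples x j principal components

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 7))                   # 7 features, as in the paper
Z = pca_project(X, j=2)                        # keep 2 principal components
print(Z.shape)  # (50, 2)
```

The projected components are mutually uncorrelated, which is the defining property the text relies on.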
Experimental Results and Discussion
In this section, the results of the offline handwritten Gurmukhi character recognition system using PCA are presented. The
recognition results are based on the k-NN, Linear-SVM, Polynomial-SVM and RBF-SVM classifiers, and combinations of
these. As stated earlier, we also experimented with partitioning strategies. We divided the data set of each category using
five partitioning strategies. In the first partitioning strategy (strategy a), we have taken 50% of the data in the training set
and the other 50% of the data in the testing set. In the second partitioning strategy (strategy b), we considered 60% of the
data in the training set and the remaining 40% of the data in the testing set. Partitioning strategy c has 70% of the data in
the training set and 30% of the data in the testing set. Similarly, partitioning strategy d has 80% of the data in the training
set and 20% of the data in the testing set, whereas partitioning strategy e was formulated by taking 90% of the data in the
training set and the remaining 10% of the data in the testing set.
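The five partitioning strategies amount to a set of training fractions, and a split can be sketched as below. The random shuffle and fixed seed are assumptions; the paper does not describe how training patterns were selected from each category.

```python
import random

def partition(samples, train_frac, seed=42):
    """Split a sample list into training/testing sets for strategies
    a-e (50/50 up to 90/10). Shuffle and seed are assumed details."""
    rng = random.Random(seed)
    idx = list(range(len(samples)))
    rng.shuffle(idx)
    cut = round(len(samples) * train_frac)
    train = [samples[i] for i in idx[:cut]]
    test = [samples[i] for i in idx[cut:]]
    return train, test

strategies = {"a": 0.5, "b": 0.6, "c": 0.7, "d": 0.8, "e": 0.9}
data = list(range(5600))            # one category's 5600 samples
sizes = {name: tuple(map(len, partition(data, frac)))
         for name, frac in strategies.items()}
print(sizes["e"])  # (5040, 560)
```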
Category results of the recognition system based on PCA are presented in the following subsections.
■ Recognition Accuracy for Category 1 Database
In this section, we considered each Gurmukhi character written 100 times by a single writer. The features considered here
are the seven features discussed in Section 4.3. For the sake of comparison between the performance of principal
components, two principal components (2-PC), three principal components (3-PC), …, seven principal components (7-PC)
have been considered and taken as input for the classifiers. Experimental testing results for each partitioning strategy are presented in the following subsections.
Recognition accuracy using strategy a
In this subsection, classifier recognition results of partitioning strategy a are presented. PRK is the best classifier
combination for offline handwritten Gurmukhi character recognition when this strategy is followed. A maximum accuracy
of 97.48% can be achieved with this strategy. Recognition results of classifiers and their combinations are given in Table 1
for up to seven features (7-feature) and the principal components.
Table 1. Classifier recognition accuracy for Category 1, Strategy a
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 94.00% 92.97% 92.80% 93.60% 94.40% 94.63% 94.00% 93.77%
Poly.-SVM 95.43% 91.31% 69.67% 80.52% 89.09% 83.67% 95.20% 86.41%
RBF-SVM 95.43% 93.43% 91.09% 92.46% 93.82% 94.34% 17.81% 82.63%
k-NN 94.71% 97.41% 93.20% 91.94% 84.68% 82.57% 70.28% 87.83%
LPR 97.25% 95.77% 94.39% 95.25% 96.17% 96.11% 95.54% 95.78%
PRK 97.48% 94.45% 86.17% 90.51% 93.65% 91.19% 56.17% 87.09%
LRK 97.19% 95.65% 95.37% 95.82% 96.11% 95.94% 55.88% 90.28%
LPK 97.08% 94.68% 86.28% 90.85% 94.17% 91.54% 96.17% 92.97%
Average 96.07% 94.46% 88.62% 91.37% 92.76% 91.25% 72.63% 89.59%
Recognition accuracy using strategy b
We achieved an accuracy of 97.99% when we used strategy b, and we saw that LPR is the best classifier combination for
offline handwritten Gurmukhi character recognition with this strategy. Recognition results for up to seven features (7-
feature) and the principal components of partitioning strategy b are depicted in Table 2.
Table 2. Classifier recognition accuracy for Category 1, Strategy b
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 95.14% 94.93% 95.14% 94.93% 95.71% 95.50% 95.78% 95.30%
Poly.-SVM 95.36% 93.64% 79.44% 86.58% 91.93% 89.00% 92.17% 89.73%
RBF-SVM 95.36% 94.29% 92.57% 93.43% 94.71% 94.78% 20.27% 83.63%
k-NN 97.55% 97.42% 96.00% 95.35% 89.21% 86.42% 73.85% 90.83%
LPR 97.99% 97.14% 91.35% 94.35% 96.71% 95.35% 95.50% 95.48%
PRK 97.64% 96.71% 90.85% 94.35% 96.21% 94.64% 60.42% 90.12%
LRK 97.71% 97.57% 97.42% 97.37% 94.71% 97.64% 60.28% 91.81%
LPK 97.92% 97.14% 91.35% 94.35% 96.71% 95.35% 94.71% 95.36%
Average 96.83% 96.11% 91.77% 93.84% 94.49% 93.59% 74.12% 91.53%
Recognition accuracy using strategy c
In partitioning strategy c, the maximum accuracy that could be achieved is 98.85%. Using this strategy, we again saw that
PRK is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results of this
partitioning strategy, for up to seven features (7-feature) and the principal components are given in Table 3.
Table 3. Classifier recognition accuracy for Category 1, Strategy c
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 95.96% 95.05% 94.95% 94.95% 95.24% 95.24% 95.14% 95.22%
Poly.-SVM 95.24% 93.72% 83.63% 88.68% 92.77% 90.48% 85.81% 90.05%
RBF-SVM 95.14% 94.29% 92.77% 93.62% 95.05% 94.48% 25.50% 84.41%
k-NN 97.42% 97.33% 97.80% 96.38% 90.09% 86.47% 77.52% 91.86%
LPR 98.57% 98.09% 98.00% 98.57% 98.57% 98.66% 98.28% 98.39%
PRK 98.85% 97.61% 93.33% 96.09% 97.42% 96.28% 65.61% 92.17%
LRK 98.57% 98.38% 98.47% 98.47% 98.66% 98.66% 65.23% 93.78%
LPK 98.66% 97.9% 93.04% 95.99% 97.71% 96.28% 94.48% 96.29%
Average 97.30% 96.55% 93.99% 95.34% 95.68% 94.56% 75.94% 92.77%
Recognition accuracy using strategy d
In this subsection, recognition results using strategy d are presented. Using this strategy, we achieved a maximum accuracy of 99.28% when we used the LRK classifier combination. Recognition results for the features and the principal
components under consideration using this strategy are illustrated in Table 4.
Table 4. Classifier recognition accuracy for Category 1, Strategy d
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 94.00% 94.08% 93.86% 93.72% 93.86% 93.72% 93.10% 93.76%
Poly.-SVM 94.00% 92.43% 85.30% 87.30% 92.15% 90.44% 94.43% 90.86%
RBF-SVM 94.00% 93.29% 92.15% 93.01% 93.86% 93.72% 36.51% 85.22%
k-NN 94.74% 96.14% 97.14% 97.42% 92.57% 87.28% 76.42% 91.67%
LPR 99.00% 98.57% 97.71% 98.00% 98.85% 98.85% 93.86% 97.83%
PRK 99.14% 98.42% 94.71% 96.42% 98.42% 97.71% 92.15% 96.71%
LRK 99.28% 99.14% 98.28% 98.42% 99.14% 99.28% 94.08% 98.23%
LPK 99.14% 98.71% 95.14% 96.28% 98.14% 97.28% 92.57% 96.73%
Average 96.66% 96.34% 94.28% 95.07% 95.87% 94.78% 84.14% 93.88%
Recognition accuracy using strategy e
In this subsection, classifier recognition results of partitioning strategy e are presented. LPK is the best classifier
combination when we follow this strategy. For the features and the principal components under consideration, a maximum
accuracy of 99.71% could be achieved. Recognition results of classifiers and their combinations for up to seven features
(7-feature) and the principal components are given in Table 5.
■ Recognition Accuracy for Category 2 Database
In this section, we consider each Gurmukhi character written 10 times by 10 different writers. Again, the features that have
been considered here are the seven features discussed in Section 4.3 and the principal components, two principal
components (2-PC), three principal components (3-PC), …, seven principal components (7-PC) have been considered and
taken as input for the classifiers. Partitioning strategy experimental results are presented in the following subsections.
Table 5. Classifier recognition accuracy for Category 1, Strategy e
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 89.45% 89.45% 89.46% 89.46% 89.46% 89.46% 89.45% 89.46%
Poly.-SVM 89.46% 87.17% 80.62% 84.90% 87.74% 86.03% 89.46% 86.48%
RBF-SVM 89.74% 89.46% 88.31% 88.60% 98.74% 90.02% 70.08% 87.85%
k-NN 96.42% 97.71% 96.57% 97.14% 88.57% 79.14% 69.71% 89.32%
LPR 99.42% 98.28% 97.99% 98.57% 99.42% 99.14% 99.42% 98.89%
PRK 98.85% 98.28% 94.85% 97.14% 99.14% 97.99% 98.57% 97.83%
LRK 98.85% 99.14% 98.85% 99.42% 99.14% 99.14% 98.85% 99.06%
LPK 99.42% 99.14% 95.14% 96.85% 99.71% 98.00% 98.00% 98.04%
Average 95.20% 94.83% 92.72% 94.01% 95.24% 92.36% 89.19% 93.37%
Recognition accuracy using strategy a
In this subsection, classifier recognition results of partitioning strategy a are presented. When we consider this strategy, k-
NN is the best classifier for offline handwritten Gurmukhi character recognition. The maximum accuracy achieved was
94.51% for this strategy. Recognition results of classifiers and their combinations are given in Table 6.
Table 6. Classifier recognition accuracy for Category 2, Strategy a
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 77.78% 76.58% 74.75% 75.61% 79.55% 81.38% 76.52% 77.45%
Poly.-SVM 75.96% 53.68% 25.58% 33.75% 41.86% 45.34% 53.68% 47.12%
RBF-SVM 80.29% 75.32% 73.04% 75.67% 78.64% 79.72% 15.07% 68.25%
k-NN 91.42% 94.51% 83.71% 75.77% 69.77% 71.42% 60.51% 78.16%
LPR 80.45% 74.62% 70.62% 73.42% 77.99% 78.97% 75.19% 75.89%
PRK 79.85% 65.02% 49.25% 51.65% 58.17% 61.54% 38.68% 57.74%
LRK 79.82% 76.57% 75.60% 75.48% 79.65% 81.14% 37.37% 72.23%
LPK 78.34% 65.34% 49.71% 51.77% 58.00% 61.65% 78.74% 63.36%
Average 80.49% 72.71% 62.78% 64.14% 67.95% 70.14% 54.47% 67.52%
Recognition accuracy using strategy b
In partitioning strategy b, the maximum accuracy that could be achieved is 94.5%. Using this strategy, we again observed
that k-NN is the best classifier for offline handwritten Gurmukhi character recognition. Recognition results of this
partitioning strategy, for up to seven features (7-feature) and the principal components are depicted in Table 7.
Recognition accuracy using strategy c
We achieved an accuracy of 95.14% when we used strategy c, and we infer that LPR is the best classifier combination for
offline handwritten Gurmukhi character recognition with this strategy. Recognition results for this partitioning strategy are
given in Table 8.
Recognition accuracy using strategy d
In this subsection, classifier recognition results of partitioning strategy d are presented. When we consider this strategy,
LPK is the best classifier combination for offline handwritten Gurmukhi character recognition. The maximum accuracy
that could be achieved is 97.71% with this strategy. Recognition results are depicted in Table 9.
Recognition accuracy using strategy e
In partitioning strategy e, the maximum accuracy that could be achieved is 99.42%. Using this strategy, we noticed that,
again, LPR is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results
for the features and the principal components under consideration using this strategy are illustrated in Table 10.
Table 7. Classifier recognition accuracy for Category 2, Strategy b
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 56.24% 79.15% 78.37% 79.22% 82.51% 83.94% 80.01% 77.06%
Poly.-SVM 55.67% 62.04% 34.33% 40.82% 51.32% 56.81% 82.29% 54.75%
RBF-SVM 57.05% 79.37% 77.37% 79.73% 81.44% 83.87% 17.91% 68.11%
k-NN 93.14% 94.50% 86.21% 77.85% 73.28% 73.00% 59.92% 79.70%
LPR 84.50% 80.00% 77.14% 79.57% 82.42% 83.57% 79.42% 80.95%
PRK 83.85% 72.00% 58.07% 59.92% 65.42% 70.28% 43.28% 64.69%
LRK 83.64% 81.35% 80.07% 80.92% 83.78% 85.50% 42.07% 76.76%
LPK 82.57% 72.07% 57.78% 59.14% 65.78% 70.07% 82.78% 70.03%
Average 74.58% 77.56% 68.66% 69.64% 73.24% 75.88% 60.96% 71.50%
Table 8. Classifier recognition accuracy for Category 2, Strategy c
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 82.68% 82.20% 82.39% 82.49% 84.49% 86.58% 82.20% 83.29%
Poly.-SVM 83.06% 70.40% 42.43% 50.04% 59.72% 67.07% 84.87% 65.37%
RBF-SVM 84.97% 81.25% 79.82% 81.44% 83.92% 85.82% 22.64% 74.27%
k-NN 94.57% 95.61% 86.19% 83.04% 79.52% 76.57% 66.95% 83.21%
LPR 95.14% 83.14% 80.57% 82.47% 84.85% 87.42% 82.66% 85.18%
PRK 86.38% 78.66% 64.66% 68.09% 74.28% 78.00% 50.19% 71.47%
LRK 86.57% 84.66% 82.95% 83.99% 86.66% 88.76% 48.85% 80.35%
LPK 84.95% 79.42% 65.33% 68.66% 74.19% 77.61% 84.95% 76.44%
Average 87.29% 81.92% 73.04% 75.02% 78.45% 80.97% 65.41% 77.44%
■ Recognition Accuracy for Category 3 Database
In this section, we consider each Gurmukhi character written by 100 different writers. Here, the seven features discussed in
Section 4.3 and the principal components—two principal components (2-PC), three principal components (3-PC), …, seven
principal components (7-PC)—have again been considered and taken as input to the classifiers. The results are presented in
the following subsections.
Recognition accuracy using strategy a
In this subsection, we present classifier recognition results of partitioning strategy a. In this strategy, the maximum
accuracy that could be achieved is 79.48%. Using this strategy, we observed that LPR is the best classifier combination
for offline handwritten Gurmukhi character recognition. Recognition results of classifiers and their combinations are
given in Table 11 for up to seven features (7-feature) and the principal components.
Table 9. Classifier recognition accuracy for Category 2, Strategy d
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 90.72% 90.01% 90.58% 91.01% 92.01% 93.01% 88.59% 90.85%
Poly.-SVM 91.87% 83.02% 55.63% 65.19% 75.89% 80.59% 92.72% 77.84%
RBF-SVM 91.58% 88.73% 88.01% 88.87% 90.44% 91.72% 31.66% 81.57%
k-NN 93.57% 94.42% 87.42% 82.57% 83.71% 77.86% 67.57% 83.87%
LPR 97.28% 93.57% 92.48% 93.28% 94.85% 96.71% 93.28% 94.49%
PRK 97.42% 92.00% 78.42% 82.85% 89.00% 90.85% 61.28% 84.55%
LRK 97.28% 95.28% 94.57% 94.14% 96.14% 97.28% 59.42% 90.59%
LPK 97.71% 93.71% 80.00% 83.99% 89.71% 91.71% 95.28% 90.30%
Average 94.67% 91.34% 83.38% 85.23% 88.96% 89.96% 73.72% 86.75%
Table 10. Classifier recognition accuracy for Category 2, Strategy e
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 89.45% 89.46% 89.45% 89.46% 89.45% 89.74% 88.89% 89.41%
Poly.-SVM 89.17% 87.17% 67.80% 80.63% 84.90% 84.33% 80.63% 82.09%
RBF-SVM 88.60% 87.46% 87.46% 88.03% 88.31% 88.60% 75.49% 86.28%
k-NN 92.85% 95.71% 93.71% 85.14% 78.00% 62.86% 48.00% 79.47%
LPR 99.42% 98.00% 97.99% 98.57% 99.14% 98.57% 99.42% 98.73%
PRK 97.42% 97.42% 90.28% 94.00% 95.42% 95.14% 99.42% 95.59%
LRK 98.85% 98.57% 98.57% 98.57% 97.99% 99.14% 98.85% 98.65%
LPK 98.57% 98.57% 90.57% 94.28% 95.99% 95.71% 98.85% 96.08%
Average 94.29% 94.04% 89.47% 91.08% 91.15% 89.26% 86.19% 90.78%
Recognition accuracy using strategy b
In partitioning strategy b, the maximum accuracy that could be achieved is 81.78%. Using this strategy, we saw that,
again, LPR is the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition
results for this strategy are illustrated in Table 12.
Recognition accuracy using strategy c
In this subsection, classifier recognition results of partitioning strategy c have been presented. Here, LPR is again the best
classifier combination when we followed this strategy. A maximum recognition accuracy of 81.8% could be achieved
with this strategy. Recognition results of classifiers and their combinations for up to seven features (7-feature) and the
principal components are given in Table 13.
Recognition accuracy using strategy d
In partitioning strategy d, the maximum accuracy that could be achieved is 84%. Using this strategy, we found PRK is
the best classifier combination for offline handwritten Gurmukhi character recognition. Recognition results for this
strategy are given in Table 14.
Table 11. Classifier recognition accuracy for Category 3, Strategy a
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 74.87% 72.81% 71.62% 72.87% 77.27% 77.84% 75.78% 74.72%
Poly.-SVM 75.04% 30.21% 14.39% 17.70% 23.87% 34.72% 69.67% 37.94%
RBF-SVM 78.35% 69.33% 66.59% 67.10% 72.92% 72.92% 17.81% 63.57%
k-NN 77.27% 75.71% 64.11% 58.45% 48.80% 57.54% 43.88% 60.82%
LPR 79.48% 68.99% 64.91% 65.94% 72.62% 74.85% 74.57% 71.62%
PRK 78.34% 51.37% 38.45% 37.54% 41.37% 48.74% 26.57% 46.05%
LRK 78.74% 72.79% 69.88% 69.95% 75.37% 75.54% 25.59% 66.84%
LPK 76.62% 53.31% 40.45% 39.31% 43.42% 50.97% 77.14% 54.46%
Average 77.33% 61.81% 53.8% 53.60% 56.95% 61.64% 51.37% 59.50%
Table 12. Classifier recognition accuracy for Category 3, Strategy b
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 75.80% 73.73% 73.94% 74.08% 77.73% 78.08% 75.44% 75.54%
Poly.-SVM 76.15% 35.68% 16.27% 21.77% 30.69% 40.54% 51.32% 38.92%
RBF-SVM 78.44% 70.59% 67.16% 68.45% 73.16% 74.23% 20.27% 64.61%
k-NN 79.71% 77.57% 67.00% 59.28% 47.14% 57.57% 40.71% 61.28%
LPR 81.78% 70.35% 68.78% 68.42% 74.14% 75.71% 75.14% 57.06%
PRK 79.57% 55.42% 40.49% 39.85% 44.85% 52.92% 25.00% 48.30%
LRK 77.00% 74.14% 72.57% 71.64% 76.07% 77.21% 23.71% 67.48%
LPK 78.35% 56.64% 43.50% 42.07% 46.57% 55.00% 77.28% 57.06%
Average 78.35% 64.26% 56.21% 55.69% 58.79% 63.90% 48.60% 58.78%
Table 13. Classifier recognition accuracy for Category 3, Strategy c
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 77.73% 74.50% 74.59% 75.45% 79.92% 80.01% 75.73% 76.85%
Poly.-SVM 77.35% 43.67% 21.12% 27.40% 38.24% 45.09% 59.72% 44.66%
RBF-SVM 80.20% 72.78% 70.02% 70.98% 75.26% 75.74% 25.50% 67.21%
k-NN 81.14% 80.28% 68.57% 60.66% 50.76% 59.71% 42.76% 63.41%
LPR 81.80% 72.47% 69.52% 70.30% 77.23% 77.14% 81.80% 75.75%
PRK 80.95% 59.52% 44.66% 44.85% 52.85% 57.33% 75.71% 59.41%
LRK 81.04% 74.85% 73.14% 73.14% 78.19% 78.76% 72.09% 75.89%
LPK 80.00% 60.47% 47.52% 46.47% 54.19% 59.80% 77.61% 60.87%
Average 80.02% 67.31% 58.64% 58.65% 63.33% 66.69% 63.86% 65.50%
Table 14. Classifier recognition accuracy for Category 3, Strategy d
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 78.03% 75.89% 75.04% 75.03% 79.03% 80.74% 75.03% 76.97%
Poly.-SVM 78.60% 51.92% 26.24% 29.81% 46.21% 51.21% 81.45% 52.21%
RBF-SVM 80.74% 70.89% 69.32% 71.04% 75.74% 76.31% 32.46% 68.07%
k-NN 80.28% 77.71% 62.57% 54.86% 42.42% 51.28% 35.28% 57.77%
LPR 83.28% 76.14% 71.99% 73.57% 79.71% 80.85% 81.45% 78.14%
PRK 84.00% 66.14% 45.57% 48.42% 57.85% 61.57% 78.71% 63.18%
LRK 83.99% 77.42% 75.14% 77.00% 79.71% 80.42% 74.42% 78.30%
LPK 83.28% 68.14% 50.71% 49.57% 59.85% 62.85% 80.57% 65.00%
Average 81.52% 70.53% 59.57% 59.91% 65.06% 68.15% 67.42% 67.45%
Recognition accuracy using strategy e
In this subsection, classifier recognition results of partitioning strategy e are presented. Here, PRK is the best classifier
combination for offline handwritten Gurmukhi character recognition. We achieved a maximum recognition accuracy of
84.90% with this strategy. Recognition results are shown in Table 15.
Table 15. Classifier recognition accuracy for Category 3, Strategy e
Classifier 2-PC 3-PC 4-PC 5-PC 6-PC 7-PC 7-Feature Average
Linear-SVM 74.64% 70.65% 70.94% 70.08% 73.21% 76.35% 69.23% 72.16%
Poly.-SVM 76.07% 51.85% 28.49% 33.33% 47.86% 54.70% 67.00% 51.33%
RBF-SVM 79.48% 65.24% 62.39% 64.96% 69.23% 72.36% 65.24% 68.41%
k-NN 77.14% 72.85% 57.14% 51.42% 35.71% 35.42% 27.71% 51.06%
LPR 83.71% 72.28% 65.42% 68.57% 76.00% 79.42% 79.99% 75.06%
PRK 84.90% 62.28% 42.28% 48.57% 54.85% 59.42% 73.71% 60.86%
LRK 83.71% 73.14% 69.71% 72.28% 73.14% 76.85% 69.14% 74.00%
LPK 81.42% 66.57% 45.14% 48.28% 57.71% 60.85% 75.42% 62.20%
Average 80.13% 66.85% 55.18% 57.18% 60.96% 64.42% 65.93% 64.38%
Conclusion
The work presented in this paper proposes an offline handwritten Gurmukhi character recognition system using PCA. The
features of a character that have been considered in this work include zoning features, diagonal features, directional
features, transition features, intersection and open end points features, parabola curve fitting–based features and power
curve fitting–based features. The classifiers employed in this work are k-NN, Linear-SVM, Polynomial-SVM and RBF-
SVM, and combinations of these. Database category and strategy recognition accuracy is depicted in Table 16, and we conclude that 2-PC is more efficient than the other feature sets. The proposed system achieves an average recognition accuracy of 99.06% on the category 1 database when strategy e and the LRK classifier are used, 98.73% on the category 2 database when strategy e and the LPR classifier are used, and 78.30% on the category 3 database when strategy d and the LRK classifier are used. This accuracy can be increased further by considering a larger data set while training the
classifier. This work can also be extended for offline handwritten character recognition of other Indian scripts.
Table 16. Database category wise recognition accuracy
Database category Strategy Feature Classifier Accuracy (%)
Category 1 Strategy a 2-PC PRK 97.48%
Category 1 Strategy b 2-PC LPR 97.99%
Category 1 Strategy c 2-PC PRK 98.85%
Category 1 Strategy d 2-PC LRK 99.28%
Category 1 Strategy e 6-PC LPK 99.71%
Category 2 Strategy a 3-PC k-NN 94.51%
Category 2 Strategy b 3-PC k-NN 94.50%
Category 2 Strategy c 2-PC LPR 95.14%
Category 2 Strategy d 2-PC LPK 97.71%
Category 2 Strategy e 2-PC LPR 99.42%
Category 3 Strategy a 2-PC LPR 79.48%
Category 3 Strategy b 2-PC LPR 81.78%
Category 3 Strategy c 2-PC LPR 81.80%
Category 3 Strategy d 2-PC PRK 84.00%
Category 3 Strategy e 2-PC PRK 84.90%
References
[1] V. N. M. Aradhya, G. H. Kumar, S. Noushath, "Multilingual OCR system for south Indian scripts and English documents: An approach based on Fourier transform and principal component analysis," Engineering Applications of Artificial Intelligence, vol. 21, pp. 658-668, 2008. Article(CrossRef Link)
[2] S. Basu, N. Das, R. Sarkar, M. Kundu, M. Nasipuri, D. K. Basu, "A hierarchical approach to recognition of handwritten Bangla characters," Pattern Recognition, vol. 42, no. 7, pp. 1467-1484, 2009. Article(CrossRef Link)
[3] U. Bhattacharya, M. Shridhar, S. K. Parui, P. K. Sen, B. B. Chaudhuri, "Offline recognition of handwritten Bangla characters: an efficient two-stage approach," Pattern Analysis and Applications, vol. 15, no. 4, pp. 445-458, 2012. Article(CrossRef Link)
[4] T. K. Bhowmik, P. Ghanty, A. Roy, S. K. Parui, "SVM-based hierarchical architectures for handwritten Bangla character recognition," International Journal on Document Analysis and Recognition, vol. 12, no. 2, pp. 97-108, 2009. Article(CrossRef Link)
[5] V. Deepu, S. Madhvanath, R. G. Ramakrishnan, "Principal component analysis for online handwritten character recognition," in Proc. of 17th International Conference on Pattern Recognition, vol. 2, pp. 327-330, 2004.
[6] P. D. Gader, M. Mohamed, J. H. Chiang, "Handwritten word recognition with character and inter-character neural networks," IEEE Transactions on Systems, Man, and Cybernetics-Part B: Cybernetics, vol. 27, no. 1, pp. 158-164, 1997. Article(CrossRef Link)
[7] M. Kumar, M. K. Jindal, R. K. Sharma, "SVM based offline handwritten Gurmukhi character recognition," in Proc. of International Workshop on Soft Computing and Knowledge Discovery, vol. 758, pp. 51-62, 2011.
[8] M. Kumar, M. K. Jindal, R. K. Sharma, "k-NN based offline handwritten Gurmukhi character recognition," in Proc. of International Conference on Information and Image Processing, pp. 1-4, 2011.
[9] M. Kumar, M. K. Jindal, R. K. Sharma, "Classification of characters and grading writers in offline handwritten Gurmukhi script," in Proc. of International Conference on Information and Image Processing, pp. 1-4, 2011.
[10] M. Kumar, M. K. Jindal, R. K. Sharma, "Offline handwritten Gurmukhi character recognition using curvature feature," in Proc. of International Conference on AMOC, pp. 981-989, 2011.
[11] G. S. Lehal, C. Singh, "A Gurmukhi script recognition system," in Proc. of 15th International Conference on Pattern Recognition, vol. 2, pp. 557-560, 2000.
[12] U. Pal, B. B. Chaudhuri, "Indian script character recognition: A survey," Pattern Recognition, vol. 37, no. 9, pp. 1887-1899, 2004. Article(CrossRef Link)
Smart Computing Review, vol. 3, no. 5, October 2013
357
[13] U. Pal, T. Wakabayashi, F. Kimura, ―Handwritten Bangla Compound Character Recognition using Gradient Feature,‖
in Proc. of 10th
International Conference on Information Technology, pp. 208-213, 2007
[14] U. Pal, T. Wakabayashi, F. Kimura, ―Handwritten numeral recognition of six popular scripts,‖ in Proc. of
International Conference on Document Analysis and Recognition (ICDAR 07), vol. 2, pp. 749-753, 2007.
[15] U. Pal, T. Wakabayashi, F. Kimura, ―A system for off-line Oriya handwritten character recognition using curvature
feature,‖ in Proc. of 10th International Conference on Information Technology, pp. 227-229, 2007.
[16] A. Sharma, R. Kumar, R. K. Sharma, ―Online handwritten Gurmukhi character recognition using elastic matching,‖
International Journal of Congress on Image and Signal Processing, vol. 2, pp. 391-396, 2008.
[17] B. Singh, A. Mittal, D. Ghosh, ―An Evaluation of Different feature extractors and classifiers for offline handwritten
Devanagri character recognition,‖ Journal of Pattern Recognition Research, vol. 2, pp. 269-277, 2011.
Article(CrossRef Link)
[18] S. Sundaram, A. G. Ramakrishnan, ―Two Dimensional Principal Component Analysis for Online Tamil Character
Recognition,‖ in Proc. of 11th International Conference Frontiers in Handwriting Recognition, pp. 88-94, 2008.
[19] Y. Wen, Y. Lub, P. Shi, ―Handwritten Bangla numeral recognition system and its application to postal automation,‖
Pattern Recognition, vol. 40, no. 1, pp. 99-107, 2007. Article(CrossRef Link)
[20] T. Y. Zhang, C. Y. Suen, ―A fast parallel algorithm for thinning digital patterns,‖ Communications of the ACM, vol.
27, no. 3, pp. 236-239, 1984. Article(CrossRef Link)
Munish Kumar received his Master's degree in Computer Science & Engineering from Thapar
University, Patiala, India in 2008. He started his career as an Assistant Professor in computer
applications at the Jaito Centre of Punjabi University, Patiala. He is now working as an Assistant
Professor in the Computer Science Department, Panjab University Rural Centre, Kauni, Muktsar,
Punjab, India, and is currently pursuing his Ph.D. degree at Thapar University, Patiala, Punjab,
India. His research interests include character recognition.
Manish Kumar Jindal received his Bachelor's degree in science in 1996 and a postgraduate degree
in Computer Applications from Punjabi University, Patiala, India in 1999, where he was awarded a
Gold Medal. He received his Ph.D. degree in Computer Science & Engineering from Thapar
University, Patiala, India in 2008. He is working as an Associate Professor at Panjab University
Regional Centre, Muktsar, Punjab, India. His research interests include character recognition and
pattern recognition.
Rajendra Kumar Sharma received his Ph.D. degree in Mathematics from the University of
Roorkee (now IIT Roorkee), India in 1993. He is currently a Professor at Thapar University,
Patiala, India, where he teaches, among other things, statistical models and their use in computer
science. He has been involved in organizing a number of conferences and other courses at Thapar
University, Patiala. His main research interests are statistical models in computer science, neural
networks, and pattern recognition.
Copyright © 2013 KAIS