American Journal of Signal Processing
p-ISSN: 2165-9354 e-ISSN: 2165-9362
2011; 1(1): 34-39
doi: 10.5923/j.ajsp.20110101.06
Hassiba Nemmour, Youcef Chibani
Signal Processing Laboratory, Faculty of Electronics, University of Houari Boumediene, Algiers, Algeria
Correspondence to: Hassiba Nemmour, Signal Processing Laboratory, Faculty of Electronics, University of Houari Boumediene, Algiers, Algeria.
Copyright © 2012 Scientific & Academic Publishing. All Rights Reserved.
This paper proposes a fast and robust system for handwritten alphanumeric character recognition. Specifically, a neural SVM (N-SVM) combination is adopted for the classification stage in order to reduce the running time of SVM classifiers. In addition, we investigate the use of tangent similarities to deal with data variability. Experimental analysis is conducted on a database obtained by combining the well-known USPS database with C-Cube uppercase letters, where the N-SVM combination is evaluated in comparison with the One-Against-All implementation. The results indicate that the N-SVM system gives the best performance in terms of training time and error rate.
Keywords: Handwritten Alphanumeric Characters, SVMs, Tangent Vectors
Cite this paper: Hassiba Nemmour, Youcef Chibani, "Training Tangent Similarities with N-SVM for Alphanumeric Character Recognition", American Journal of Signal Processing, Vol. 1 No. 1, 2011, pp. 34-39. doi: 10.5923/j.ajsp.20110101.06.
SVMs are trained on a set of labeled samples $\{(x_i, y_i)\}$, $i = 1, \dots, N$, where $x_i$ denotes a training sample and $y_i \in \{-1, +1\}$ its class label ($N$ is the number of training data). SVMs seek the linear separating hyperplane with the largest margin by solving the following optimization problem[3,15]:

$$\min_{w, b} \ \frac{1}{2}\|w\|^{2} \qquad (1)$$

subject to:

$$y_i (w \cdot x_i + b) \geq 1, \quad i = 1, \dots, N \qquad (2)$$

When the training data are not linearly separable, slack variables $\xi_i$ are introduced into the decision surface[3]. Then, the goal is to find a hyperplane which minimizes misclassifications while maximizing the margin of separation, such that:

$$\min_{w, b, \xi} \ \frac{1}{2}\|w\|^{2} + C \sum_{i=1}^{N} \xi_i \qquad (3)$$

subject to:

$$y_i (w \cdot x_i + b) \geq 1 - \xi_i, \quad \xi_i \geq 0 \qquad (4)$$

$C$ is a user-defined parameter that controls the tradeoff between the machine complexity and the number of non-separable data[4]. Commonly, a dual Lagrangian formulation of the problem, in which data appear only in the form of dot products, is used:

$$\max_{\alpha} \ W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) \qquad (5)$$

subject to:

$$0 \leq \alpha_i \leq C, \quad \sum_{i=1}^{N} \alpha_i y_i = 0 \qquad (6)$$

The $\alpha_i$ are Lagrange multipliers. The dual problem is useful when data are not linearly separable in input space. In such a case, they are mapped into a feature space via a kernel function such that $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$. The dual problem becomes:

$$\max_{\alpha} \ W(\alpha) = \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (7)$$

and the resulting decision function is:

$$f(x) = \sum_{i=1}^{N_s} \alpha_i y_i K(x_i, x) + b \qquad (8)$$

where $N_s$ is the number of support vectors, which are the training data for which $\alpha_i \neq 0$. The optimal hyperplane corresponds to $f(x) = 0$, while test data are classified according to:

$$\mathrm{class}(x) = \operatorname{sign}\big(f(x)\big) \qquad (9)$$
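To make the decision rule concrete, the following is a minimal sketch, not taken from the paper, of the kernel decision function of Eqs. (8)-(9). It assumes the Lagrange multipliers, support vectors, labels and bias have already been obtained from a quadratic-programming solver, and uses numpy with an RBF kernel purely for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma=0.5):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2); gamma is an illustrative value."""
    return np.exp(-gamma * np.sum((a - b) ** 2))

def svm_decision(x, support_vectors, alphas, labels, bias, kernel=rbf_kernel):
    """Eq. (8): f(x) = sum_i alpha_i * y_i * K(x_i, x) + b over the support vectors."""
    f = sum(a * y * kernel(sv, x) for a, y, sv in zip(alphas, labels, support_vectors))
    return f + bias

def svm_classify(x, support_vectors, alphas, labels, bias, kernel=rbf_kernel):
    """Eq. (9): class(x) = sign(f(x))."""
    return np.sign(svm_decision(x, support_vectors, alphas, labels, bias, kernel))
```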
The kernel function and its parameters are user defined. Furthermore, various approaches have been proposed to extend SVMs to multi-class classification problems[16]. The One-Against-All (OAA) implementation is the earliest and the most commonly used. For a C-class problem, it performs C SVMs, each of which is designed to separate one class from all the others. The c-th SVM is trained with all of the examples of the c-th class with positive labels and all other examples with negative labels, which leads to a computational time of roughly C times that of a single SVM trained on the whole dataset. Test data are then assigned to the class whose SVM gives the highest output:

$$\mathrm{class}(x) = \arg\max_{c = 1, \dots, C} f_c(x) \qquad (10)$$

where $f_c$ is the decision function of the c-th SVM.
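A compact sketch of this OAA scheme, assuming scikit-learn's SVC as the underlying binary SVM (the kernel choice and parameters are placeholders, not the paper's settings):

```python
import numpy as np
from sklearn.svm import SVC

def train_one_against_all(X, y, classes, **svm_params):
    """Train one binary SVM per class: class c against all the others."""
    machines = {}
    for c in classes:
        binary_labels = np.where(y == c, 1, -1)
        machines[c] = SVC(**svm_params).fit(X, binary_labels)
    return machines

def classify_oaa(machines, X):
    """Eq. (10): assign each sample to the class whose SVM output is highest."""
    classes = list(machines)
    scores = np.column_stack([machines[c].decision_function(X) for c in classes])
    return np.array(classes)[np.argmax(scores, axis=1)]

# Illustrative usage:
# machines = train_one_against_all(X_train, y_train, np.unique(y_train), kernel="rbf", C=10.0)
# predictions = classify_oaa(machines, X_test)
```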
To reduce the computational burden of the OAA scheme, the N-SVM combination proceeds as follows:
a) Select a pair of classes $c_i$ and $c_j$ and assign it to an SVM.
b) Delete classes $c_i$ and $c_j$ from the set of classes.
c) Return to step a) if the number of selected pairs is less than C/2.
d) Train SVM classifiers over the selected pairs of classes.
e) Train a feed-forward neural network over the SVM outputs (a sketch of steps a) to e) is given below).
The neural network is a MultiLayer Perceptron (MLP) with a single hidden layer. The input layer contains 18 nodes, which receive the SVM outputs, whereas each node in the output layer corresponds to an alphanumeric class from '0' to 'Z'. On the other hand, the number of hidden nodes is fixed experimentally. The MLP is trained by the standard back-propagation algorithm including a momentum factor[18]. In addition, the N-SVM system receives feature vectors which are constituted by tangent similarities with respect to all classes. The computation of such similarities is detailed in the following section.
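The following is an illustrative reconstruction of this pipeline, not the authors' implementation. It assumes scikit-learn's SVC and MLPClassifier as stand-ins for the binary SVMs and the MLP combiner, and pairs the classes in arbitrary order since the paper does not specify how pairs are selected.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

def train_nsvm(X, y, classes, **svm_params):
    """Steps a)-e): pair the classes, train one SVM per pair, then an MLP on the SVM outputs."""
    remaining = list(classes)
    pairs = []
    while len(pairs) < len(classes) // 2:            # steps a)-c): form C/2 disjoint pairs
        ci, cj = remaining.pop(0), remaining.pop(0)
        pairs.append((ci, cj))
    machines = []
    for ci, cj in pairs:                             # step d): one binary SVM per pair
        mask = np.isin(y, [ci, cj])
        labels = np.where(y[mask] == ci, 1, -1)
        machines.append(SVC(**svm_params).fit(X[mask], labels))
    svm_outputs = np.column_stack([m.decision_function(X) for m in machines])
    mlp = MLPClassifier().fit(svm_outputs, y)        # step e): hidden size fixed experimentally
    return machines, mlp

def classify_nsvm(machines, mlp, X):
    svm_outputs = np.column_stack([m.decision_function(X) for m in machines])
    return mlp.predict(svm_outputs)
```

With 36 alphanumeric classes this yields 18 pair-wise SVMs, which matches the 18 input nodes of the MLP.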
Let $t(x, \alpha)$ denote a transformation of a pattern $x$ that depends on a set of parameters $\alpha = (\alpha_1, \dots, \alpha_L)$. The linear approximation of $t(x, \alpha)$ using a Taylor expansion around $\alpha = 0$ can be written as[14]:

$$t(x, \alpha) \approx x + \sum_{l=1}^{L} \alpha_l T_l(x) \qquad (11)$$

where the $T_l(x)$ are Tangent Vectors (TV) corresponding to the partial derivatives of the transformation $t$ with respect to the parameters $\alpha_l$, so that:

$$T_l(x) = \frac{\partial t(x, \alpha)}{\partial \alpha_l}\bigg|_{\alpha = 0} \qquad (12)$$

By construction, $t(x, \alpha) = x$ for $\alpha = 0$, while for small values of $\alpha$ the transformation does not change the class membership of $x$.
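As a concrete illustration, not taken from the paper, a tangent vector can be approximated numerically by a finite difference of Eq. (12): apply the transformation with a small parameter value and divide the resulting displacement by that value. The sketch below does this for a horizontal image translation and assumes scipy's `shift` routine.

```python
import numpy as np
from scipy.ndimage import shift

def translation_tangent_vector(image, epsilon=0.5):
    """Finite-difference approximation of Eq. (12) for horizontal translation:
    T(x) ~ (t(x, epsilon) - x) / epsilon."""
    shifted = shift(image, (0.0, epsilon), order=1, mode="nearest")
    return (shifted - image) / epsilon

def approximate_translation(image, alpha):
    """Linear approximation of Eq. (11) for a small translation alpha:
    t(x, alpha) ~ x + alpha * T(x)."""
    return image + alpha * translation_tangent_vector(image)
```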
In this way, the linear approximation describes transformations such as translation, rotation and axis deformations by one prototype and its corresponding tangent vectors. The TV were first used through the so-called Tangent distance with the K-NN classifier[12]. This distance gave very good precision for handwritten digit recognition, but its computational cost was very high. Also for handwritten digit recognition, TV were used with SVM classifiers by introducing the Tangent distance into distance-based kernels such as the RBF and the negative distance[20]. However, the runtime of the SVMs was significantly extended. More recently, similarity and dissimilarity measures were introduced in order to allow using tangent vectors with all SVM kernels[21]. These measures are based on class-specific tangent vectors whose calculation is faster than with the conventional scheme. The time complexity is thus reduced, but the runtime is still extended. In the present work, we aim to introduce the TV concept for alphanumeric character recognition without extending the runtime of SVMs and N-SVM. Specifically, we employ tangent similarities, which are based on class-specific TV, as data features. The class-specific TV are computed as follows[14]. For a set of classes $\{c_1, \dots, c_C\}$ with training data $x_i^c$, tangent vectors maximizing the dissimilarity between classes are chosen such that the $T_l^c$
are the eigenvectors with the largest eigenvalues of the matrix:

$$\Sigma^{-1} S_c \qquad (13)$$
where:
$\Sigma$: covariance matrix computed from the full dataset.
$S_c$: class-dependent scatter matrix given by:

$$S_c = \sum_{i=1}^{N_c} (x_i^c - \mu_c)(x_i^c - \mu_c)^{T} \qquad (14)$$

$\mu_c$: mean of the class $c$ ($N_c$ being the number of training samples of class $c$).
Note that each tangent vector refers to a transformation or a specific source of variability, and the number of tangent vectors must be fixed a priori by the user. Once estimated, the tangent vectors are incorporated into the covariance matrix specific to their class, such that[14]:

$$\Sigma_c^{TV} = S_c + \gamma \sum_{l=1}^{L} T_l^c \, (T_l^c)^{T} \qquad (15)$$
where:
$\gamma$: user-defined parameter.
$\Sigma_c^{TV}$: TV-based covariance matrix for the class $c$.
Furthermore, the tangent similarity of a pattern $x$ with respect to a class $c$ corresponds to the Tangent Vector-based Mahalanobis (TVM) distance:

$$d_{TVM}(x, c) = (x - \mu_c)^{T} \left(\Sigma_c^{TV}\right)^{-1} (x - \mu_c) \qquad (16)$$
Each pattern $x$ is then represented by a feature vector $D(x) = \big[d_{TVM}(x, 1), \dots, d_{TVM}(x, C)\big]$ constituted by its TVM distances with respect to all classes. Then, to evaluate kernel functions, the distance $\|x_i - x_j\|$ is replaced by $\|D(x_i) - D(x_j)\|$, while the dot product $x_i \cdot x_j$ is substituted by $D(x_i) \cdot D(x_j)$. Note that the benefits of such feature vectors are not only the implicit incorporation of invariance knowledge into SVMs but also the reduction of the data size. For alphanumeric character recognition, the data size decreases from the character image size to a vector of 36 components (which is the number of classes). This makes the kernel evaluation as well as the training stage faster.

Figure 1. Training samples from the USPS-C-Cube database.
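The whole tangent-similarity feature extraction of Eqs. (13)-(16) can be summarized in a short sketch. This is an illustrative reconstruction under the assumptions stated above (in particular the exact form of the eigenvalue problem and of the TV-augmented covariance), using numpy only; `n_tv` and `gamma` are the user-defined number of tangent vectors and weighting parameter.

```python
import numpy as np

def class_tangent_vectors(X, y, n_tv=10):
    """Class-specific tangent vectors: leading eigenvectors of Sigma^-1 S_c (Eqs. 13-14)."""
    sigma = np.cov(X, rowvar=False)                          # covariance of the full dataset
    tv, scatter, means = {}, {}, {}
    for c in np.unique(y):
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        Sc = (Xc - mu).T @ (Xc - mu)                         # class scatter matrix, Eq. (14)
        eigval, eigvec = np.linalg.eig(np.linalg.pinv(sigma) @ Sc)
        order = np.argsort(eigval.real)[::-1][:n_tv]
        tv[c], scatter[c], means[c] = eigvec[:, order].real, Sc, mu
    return tv, scatter, means

def tvm_features(X, tv, scatter, means, gamma=1.0):
    """TVM distance of each sample to every class (Eqs. 15-16): one feature per class."""
    feats = []
    for c in sorted(tv):
        cov_tv = scatter[c] + gamma * tv[c] @ tv[c].T        # TV-based covariance, Eq. (15)
        inv_cov = np.linalg.pinv(cov_tv)
        diff = X - means[c]
        feats.append(np.einsum("ij,jk,ik->i", diff, inv_cov, diff))  # Mahalanobis, Eq. (16)
    return np.column_stack(feats)
```

For the 36-class alphanumeric problem, `tvm_features` yields the 36-dimensional vectors that are fed to the OAA and N-SVM classifiers in place of the raw character images.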
The kernel parameters, notably for the negative distance and sigmoid kernels, are tuned experimentally. The neural network of the N-SVM is a multilayer perceptron with 90 hidden nodes; the step size equals 0.08, the momentum is set to 0.999, and the network is trained for 5000 iterations, each of which corresponds to one pass of all training data through the network. In order to assess the behavior of tangent similarities with kernels based on a dot product and on a distance measure, we evaluated the Error Rate (ER) of both OAA and N-SVM using these kernels. The results in terms of error rate are summarized in Table 1. In this test, the number of tangent vectors was arbitrarily fixed at 10.
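As a rough illustration of this training configuration (not the authors' code), the MLP combiner could be instantiated as follows, assuming scikit-learn's MLPClassifier and mapping the stated step size, momentum and iteration budget onto its parameters.

```python
from sklearn.neural_network import MLPClassifier

# One hidden layer of 90 nodes, SGD with momentum, 5000 epochs,
# mirroring the hyperparameters reported above.
mlp = MLPClassifier(
    hidden_layer_sizes=(90,),
    solver="sgd",
    learning_rate_init=0.08,   # step size
    momentum=0.999,
    max_iter=5000,             # one iteration = one pass over the training data
    activation="logistic",     # assumption: classical sigmoid units
)
```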
Figure 2. Error rate variations according to the number of tangent vectors.
Figure 3. Error rates with respect to each class of interest.