Building A Compact MQDF Classifier by Sparse Coding and Vector Quantization Technique

Abstract—The modified quadratic discriminant function (MQDF) is a very popular handwritten Chinese character classifier thanks to its high performance and low computational complexity. However, it suffers from a high memory requirement for the storage of its parameters. This paper proposes a compact MQDF classifier developed by integrating sparse coding and the vector quantization (VQ) technique. To be specific, we first use sparse coding to obtain a sparse representation of the MQDF parameters, and then employ the VQ technique to further compress the sparse codes. The proposed method is evaluated by comparing its performance with three models, i.e., the original MQDF classifier, the compact MQDF classifier using the VQ technique, and the compact MQDF classifier using sparse coding. The effectiveness of our proposed approach has been confirmed and demonstrated by comparative experiments on the ICDAR2013 competition dataset.

Keywords—compact MQDF classifier; sparse coding; vector quantization technique

I. Introduction

The modified quadratic discriminant function (MQDF) [1] is a very popular classifier thanks to its high performance and low computational complexity, and it has been applied prevalently in handwritten Chinese character recognition (HCCR) during the past two decades [2–6]. However, the storage requirement of an MQDF classifier is usually very large [7, 8], so such a classifier cannot be directly embedded into memory-limited hand-held devices.

In order to make the MQDF classifier more compact, the vector quantization (VQ) technique has often been applied to compress the parameters of an MQDF classifier [9, 10]. It compresses the principal eigenvectors of the MQDF classifier by subspace-distribution sharing, which combines the VQ technique with a clustering algorithm.

Recently, sparse coding has emerged as a successful tool for analyzing a large class of signals [11–13]. Many signals can be efficiently represented as a linear superposition of a few properly chosen basis functions. Such a compact representation of signals is desirable and has been widely used in applications such as signal compression and denoising [14, 15]. In this paper, we propose an approach for building a compact MQDF classifier by integrating sparse coding with the VQ technique. Our goal is to further reduce the memory footprint of the MQDF classifier without sacrificing recognition accuracy.

The effectiveness of the proposed compact MQDF classifier is tested in a handwritten Chinese character recognition system, whose block diagram is given in Fig. 1. As shown in the figure, discriminative features are extracted from the pre-processed training images and are then used to train the MQDF parameters. To compress the storage, the parameters are first put into a sparse form via sparse coding. To be specific, the parameters are used to learn an overcomplete dictionary, and are then encoded over the learned dictionary to generate a sparse representation, i.e., each parameter vector is represented as a linear combination of a small number of dictionary elements. To further compress the storage, the VQ technique is then employed to compress the sparse codes. In this way, a compact MQDF classifier is derived. In the recognition phase, the parameters are reconstructed as sparse linear combinations of the atoms to recognize the input images.

Figure 1. The block diagram of our proposed compact MQDF classifier integrating sparse coding and the VQ technique.

The remainder of this paper is organized as follows: Section 2 describes the MQDF classifier and its compact
reduce the storage size of the MQDF classifier. Assume that we have calculated a set of eigenvectors Φ = {φ_ij} (φ_ij ∈ R^D, i = 1, …, M; j = 1, …, K) from the covariance matrices of all classes. Then, we learn an overcomplete dictionary Θ = {θ_j}_{j=1}^{d} (θ_j ∈ R^D, d ≥ D) by fitting the d basis vectors (also called atoms) {θ_j} to those eigenvectors, so that each eigenvector φ_ij ∈ R^D can be represented as a sparse linear combination of these atoms. In other words, φ_ij is approximated as φ_ij ≈ Θa_ij, where a_ij ∈ R^d is the sparse code of φ_ij, which contains T_0 (T_0 < D) or fewer nonzero elements. The problem described above can be formulated as follows:

    min_{Θ,A} ‖Φ − ΘA‖₂²  subject to  ∀ i, j: ‖a_ij‖₀ ≤ T_0,    (3)

where ‖·‖₀ is the ℓ₀-norm, counting the nonzero elements of a_ij, and A = {a_ij} (a_ij ∈ R^d, i = 1, …, M; j = 1, …, K) denotes the set of all sparse codes.
The widely used algorithm for solving the minimization objective given in Equation (3) is the K-SVD method [16]. K-SVD is an iterative method that alternates between sparse coding of the data based on the current dictionary and a process of updating the dictionary atoms to better fit the data. Typically, in the sparse coding step, the dictionary Θ is fixed and the sparse codes A are computed using the Orthogonal Matching Pursuit (OMP) algorithm. The OMP algorithm is fast, easy to implement, and fairly accurate [16]. In the dictionary update step, the atoms θ_j are updated sequentially using the Singular Value Decomposition (SVD), and the relevant coefficients in A are changed at the same time. For a detailed description of the K-SVD algorithm, please refer to [16].
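To make this alternation concrete, below is a minimal NumPy sketch of the K-SVD loop with OMP as the sparse coding step. The names Phi, Theta, and T0 mirror the notation above, but the code is a generic illustration of the algorithm of [16], not the authors' implementation, and it omits refinements such as early stopping and atom replacement.

```python
import numpy as np

def omp(D, x, T0):
    """Orthogonal Matching Pursuit: greedily pick at most T0 atoms of D for x."""
    residual, support = x.copy(), []
    coef = np.zeros(D.shape[1])
    for _ in range(T0):
        support.append(int(np.argmax(np.abs(D.T @ residual))))
        # Least-squares fit on the current support, then refresh the residual.
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
    coef[support] = sol
    return coef

def ksvd(Phi, n_atoms, T0, n_iter=50):
    """Alternate OMP coding and SVD-based atom updates (K-SVD [16])."""
    rng = np.random.default_rng(0)
    Theta = rng.standard_normal((Phi.shape[0], n_atoms))
    Theta /= np.linalg.norm(Theta, axis=0)              # unit-norm atoms
    for _ in range(n_iter):
        # Sparse coding step: Theta fixed, code every column of Phi.
        A = np.column_stack([omp(Theta, x, T0) for x in Phi.T])
        # Dictionary update step: revise atoms one at a time.
        for j in range(n_atoms):
            users = np.nonzero(A[j])[0]                 # signals using atom j
            if users.size == 0:
                continue
            # Error without atom j's contribution, restricted to its users.
            E = (Phi[:, users] - Theta @ A[:, users]
                 + np.outer(Theta[:, j], A[j, users]))
            U, s, Vt = np.linalg.svd(E, full_matrices=False)
            Theta[:, j] = U[:, 0]                       # leading left singular vector
            A[j, users] = s[0] * Vt[0]                  # matching coefficients
    return Theta, A
```

With the settings reported later (d = D = 160, 50 iterations), Phi would hold the eigenvectors as columns; under the multiple-dictionary strategy described next, one such call is made per eigenvector order.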
Furthermore, we use two strategies to reduce the computational complexity and to alleviate the degradation of recognition accuracy when representing the eigenvectors sparsely:

1) Multiple dictionary learning. Learning a single dictionary for a large set of data with the K-SVD method may be computationally expensive, since the SVD takes O(n³) operations [17]. To deal with this problem, we split the eigenvectors into small parts and learn one dictionary for each part. Specifically, the eigenvectors of the same order across all classes are used to train the same dictionary.

2) Weight-based assignment of the number of nonzero elements T_0. The eigenvectors of one class differ in how much they influence the recognition accuracy of the MQDF classifier, so we assign a different T_0 to each eigenvector of a class. Specifically, we use an arithmetic progression with a common difference of −1 to reflect the weight of each eigenvector within a class (see the sketch following this list). Letting T_total denote the total number of nonzero elements for each class, we thus have T_total = (T_max + T_min) × K/2, where T_max ≤ d and T_min ≥ 2 are the numbers of nonzero elements assigned to the first and last eigenvectors of each class, respectively.
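As a concrete check of this budget, the small helper below (our own illustration, not from the paper) generates the per-eigenvector budgets as an arithmetic progression with common difference −1 and verifies the closed-form total; the values T_max = 41, T_min = 2, and K = 40 are purely illustrative.

```python
import numpy as np

def nonzero_budgets(t_max, t_min, k):
    """Sparsity budgets T_1..T_K for one class, decreasing linearly from
    t_max (first eigenvector) to t_min (K-th eigenvector)."""
    return np.rint(np.linspace(t_max, t_min, k)).astype(int)

budgets = nonzero_budgets(t_max=41, t_min=2, k=40)  # common difference -1
assert budgets.sum() == (41 + 2) * 40 // 2          # T_total = (T_max + T_min) * K / 2
```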
Finally, we only store the nonzero elements of A and the learned dictionary instead of all the eigenvectors. When recognizing characters, the reconstructed eigenvectors Φ̂ = ΘA are used to compute the MQDF distance.

D. Compact MQDF classifier via combining sparse coding and vector quantization technique

Sparse representation seems to be a seductive solution to the storage problem, since each vector is stored as a sparse linear combination over an overcomplete basis (dictionary). However, both the dictionary and the sparse codes must be stored to reconstruct those vectors, which may itself require considerable storage. For the VQ technique, according to Ref. [9], the storage size of the eigenvectors is reduced to nearly one fourth with almost no accuracy loss under the setting D_Q = 1. Moreover, the compression ratio grows as D_Q increases; however, the degradation of recognition accuracy grows as well. In consideration of the advantage of the VQ technique under the setting D_Q = 1, we further compress the sparse codes using the VQ technique, in the hope that the loss of accuracy is minimized.

Note that we cannot apply sparse coding to the indexes produced by the VQ technique, since the indexes must be integers no greater than 255, whereas sparse codes are always floating-point values. That is, we can use sparse coding to compress the memory space of the MQDF classifier first and then employ the VQ technique to further compress the sparse codes, but the reversed order is not feasible.

Fig. 2 illustrates the combination of sparse coding and the VQ technique for compressing the eigenvectors. Firstly, each eigenvector φ_ij is represented as a sparse linear combination (with sparse code a_ij) of the atoms of the dictionary Θ, that is, φ_ij = Θa_ij. The zero elements of a_ij (the white bins shown in Fig. 2) are not stored, which saves space. Secondly, we use the VQ technique to further compress the memory space, i.e., the nonzero elements of a_ij are replaced by the indexes (denoted by I) of the corresponding cluster centers (denoted by C). In this case, half of the remaining storage is saved, because each 2-byte nonzero element can be substituted by an index that needs only 1 byte. Note that D_Q and L are chosen as 1 and 256, respectively, which prevents accuracy loss well; a sketch of this quantization step follows.
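Below is a minimal sketch of this quantization step, assuming scalar k-means (D_Q = 1) over the pooled nonzero coefficients with L = 256 centers; SciPy's kmeans2 is a stand-in for whatever clustering routine an actual implementation would use, and the function names are ours.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def vq_compress(nonzero_vals, L=256, seed=0):
    """Quantize the pooled nonzero sparse coefficients with D_Q = 1 (scalars):
    return the codebook C (L floats) and one 1-byte index per coefficient."""
    C, I = kmeans2(nonzero_vals.astype(np.float64), L, minit='++', seed=seed)
    return C.astype(np.float32), I.astype(np.uint8)

def vq_reconstruct(C, I):
    """Look the coefficients back up; eigenvectors are then rebuilt as Theta @ a_ij."""
    return C[I]
```

Each 2-byte coefficient is thus replaced by a 1-byte index into the shared codebook C, matching the halving of storage described above.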
E. Storage analysis

The parameters of sparse coding include the dictionary Θ and the sparse codes A. For the sparse codes A, since we only need to store the nonzero elements, the positions of the
Figure 3. The curve of inequality (9) when D = 160, M = 3755, K = 40, D_Q = 1, L = 256, and = 2. The MQDF classifier can be compacted when ζ lies in the shaded area.
A. Handwritten Chinese character recognition system

A handwritten Chinese character recognition system based on the MQDF classifier generally consists of three major components, i.e., character normalization, feature extraction, and classification using the MQDF classifier.

Firstly, each gray-scale character image, whatever its size, is normalized to 64 × 64 by nonlinear gray-level normalization [18] and line density projection interpolation (LDPI) [19]. Then, gradient elements are extracted from the normalized image using the normalization-cooperated gradient feature (NCGF) method [20]. After that, the gradient elements are decomposed into eight directions, and in each direction 8 × 8 values are extracted by Gaussian blurring, so that in total 512-dimensional features are extracted for each character image. We take the square root of each feature empirically, which helps improve the classification performance of statistical classifiers [21]. In order to reduce the classifier's complexity, Fisher linear discriminant analysis (FLDA) is used to reduce the feature dimensionality from 512 to 160 [21].
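The last two steps of this pipeline, the square-root transform and the FLDA projection from 512 to 160 dimensions, can be sketched with scikit-learn as follows; the random X and y are dummy stand-ins for the real NCGF features and class labels, so the block is illustrative rather than the authors' code.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Dummy stand-ins for the (n_samples, 512) non-negative direction features
# and the class labels produced by the extraction step above.
rng = np.random.default_rng(0)
X = rng.random((2000, 512))
y = rng.integers(0, 200, size=2000)

X_sqrt = np.sqrt(X)                           # empirical square-root transform [21]

# FLDA keeps the 160 most discriminative directions (valid while 160 <= n_classes - 1).
flda = LinearDiscriminantAnalysis(n_components=160)
X_160 = flda.fit_transform(X_sqrt, y)         # (n_samples, 160) features for the MQDF
```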
Considering that the MQDF is time-consuming, multiple levels of classifiers can be used to accelerate the recognition process. In our experiments, a two-level classification system is employed: the first-level classifier is a coarse classifier that selects the top 200 ranked classes according to Euclidean distance, and the second-level classifier, the MQDF, is then computed on those 200 candidate classes only.
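A sketch of this coarse-to-fine search is given below; class_means and mqdf_distance are placeholder names for the per-class Euclidean prototypes and the MQDF distance function, neither of which is spelled out in this excerpt.

```python
import numpy as np

def classify(x, class_means, mqdf_distance, n_candidates=200):
    """Two-level classification: a cheap Euclidean coarse pass proposes
    n_candidates classes, then the MQDF re-ranks only that shortlist."""
    # Level 1: squared Euclidean distance to every class mean.
    d2 = np.sum((class_means - x) ** 2, axis=1)
    candidates = np.argsort(d2)[:n_candidates]
    # Level 2: evaluate the expensive MQDF on the 200 candidates only.
    return min(candidates, key=lambda c: mqdf_distance(x, c))
```

The payoff is that the quadratic-cost MQDF is evaluated on 200 classes instead of all 3755.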
B. Parameter settings and dataset

For the solution of the sparse coding problem, the dictionary Θ is first randomly initialized and each dictionary atom is normalized to unit ℓ₂-norm. The number of dictionary atoms d is set equal to the dimension of the extracted features, that is, d = D = 160. The maximum number of K-SVD iterations is set to 50. In general, parameter values are stored as 4-byte floating-point numbers. To further reduce the memory space, we convert the 4-byte floating-point parameters into 2-byte short integers by multiplying by a constant. Accordingly, during the computation of the MQDF distance, the parameters stored as 2-byte short integers are recovered by dividing by the same constant.
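This fixed-point trick can be made concrete as follows; the scale constant is illustrative, since the paper does not state its value.

```python
import numpy as np

SCALE = 1024.0  # illustrative constant; the paper does not specify it

def to_int16(params):
    """Store 4-byte floats as 2-byte short integers by scaling and rounding."""
    scaled = np.rint(params * SCALE)
    assert np.abs(scaled).max() <= np.iinfo(np.int16).max, "scale too large"
    return scaled.astype(np.int16)

def to_float32(params_i16):
    """Recover approximate floats (divide by the same constant) for the MQDF distance."""
    return params_i16.astype(np.float32) / SCALE
```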
The performance of the MQDF-VQ classifier is evaluated on the ICDAR2013 competition dataset [8]. On this dataset, the method of Ref. [9] achieved 86.01% recognition accuracy with 6.51 MB of storage (parameter settings: K = 12, with the split VQ technique applied only to compress the principal eigenvectors) [8]. Using the same settings, the accuracy of our uncompressed HCCR system reaches 90.2% on this dataset. Under this condition, we compare the performance of three compact MQDF classifiers, viz., the MQDF-VQ classifier, the MQDF-SP classifier, and the MQDF-SP-VQ classifier.

C. Comparison results

The reported performance covers recognition accuracy (%), total storage size (MB), and compression ratio. We compare the performance of the MQDF-SP and MQDF-VQ classifiers under varying settings of T_max and D_Q, respectively. The value of T_max is taken in the range [25, 90], and D_Q is chosen as 1, 2, 3, or 4. The number of cluster centers L is set to 255, the maximal value a 1-byte index can store. For the MQDF-SP-VQ classifier, D_Q is fixed to 1 and T_max is likewise taken in the range [25, 90]. The comparison results of the three compact MQDF classifiers and the original MQDF classifier are shown in Fig. 4.

Figure 4. Comparison of different methods: (a) recognition accuracy vs. storage size; (b) recognition accuracy vs. compression ratio.

We can see from the results that the quantization used in the MQDF-VQ classifier with D_Q = 1 is nearly perfect: the compression ratio is more than 3 with almost no accuracy loss. However, as D_Q increases, the degradation of recognition accuracy becomes larger and larger, especially for D_Q ≥ 3. As for the MQDF-SP classifier, although its recognition accuracy is lower than that of MQDF-VQ under the setting D_Q = 1, the sparse coding it uses shows its advantage as the compression ratio grows. The recognition accuracy of the MQDF-SP classifier
is higher than that of the MQDF-VQ classifier when the compression ratio is larger than 4.

When we apply the VQ technique with D_Q = 1 to compress the sparse codes, we can see that the MQDF-SP-VQ classifier not only further reduces the total storage size but also suffers almost no recognition accuracy loss compared to the MQDF-SP classifier under the same settings.
IV. Conclusion

In this paper, we proposed to build a compact MQDF classifier by integrating sparse coding and the VQ technique. Many existing methods apply the VQ technique to compress the parameters of the MQDF classifier. These methods can yield a compact classifier whose size is nearly one fourth of the original classifier's, with almost no accuracy loss, provided that the dimension of the sub-vectors is 1 (that is, D_Q = 1). However, with a further increase of the compression ratio, the accuracy of the MQDF-VQ classifier decreases significantly. The sparse coding used to build the compact MQDF-SP classifier can alleviate this degradation of recognition accuracy, especially when the compression ratio becomes larger than that of the MQDF-VQ classifier. In consideration of the advantage of the VQ technique under the setting D_Q = 1, we further compress the sparse codes using the VQ technique. The experimental results have shown that this strategy not only further reduces the storage size but also incurs almost no recognition accuracy loss compared to the MQDF-SP classifier under the same settings.

Acknowledgment

This work is jointly supported by the Science and Technology Commission of Shanghai Municipality under research grants 14511105500 and 14DZ2260800.
References

[1] F. Kimura, K. Takashina, S. Tsuruoka, and Y. Miyake, "Modified quadratic discriminant functions and the application to Chinese character recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 9, no. 1, pp. 149–153, 1987.
[2] Cheng-Lin Liu, Hiroshi Sakou, and Hiromichi Fujisawa, "Discriminative learning quadratic discriminant function for handwriting recognition," IEEE Transactions on Neural Networks, vol. 15, no. 2, pp. 430–444, 2004.
[3] Yan-Wei Wang, Xiao-Qing Ding, and Chang-Song Liu, "MQDF discriminative learning based offline handwritten Chinese character recognition," in International Conference on Document Analysis and Recognition, 2011, pp. 1100–1104.
[4] Yan-Wei Wang, Xiao-Qing Ding, and Chang-Song Liu, "MQDF retrained on selected sample set," IEICE Transactions on Information and Systems, vol. 94, no. 10, pp. 1933–1936, 2011.
[5] Xu-Yao Zhang and Cheng-Lin Liu, "Locally smoothed modified quadratic discriminant function," in International Conference on Document Analysis and Recognition, 2013, pp. 8–12.
[6] Ming-Ke Zhou, Xu-Yao Zhang, Fei Yin, and Cheng-Lin Liu, "Improving handwritten Chinese character recognition with discriminative quadratic feature extraction," in International Conference on Pattern Recognition, 2014, pp. 244–249.
[7] Hai-Long Liu and Xiao-Qing Ding, "Handwritten character recognition using gradient feature and quadratic classifier with multiple discrimination schemes," in International Conference on Document Analysis and Recognition, 2005, pp. 19–23.
[8] Cheng-Lin Liu, Fei Yin, Qiu-Feng Wang, and Da-Han Wang, "ICDAR 2011 Chinese handwriting recognition competition," in International Conference on Document Analysis and Recognition, 2011, pp. 1464–1469.
[9] Teng Long and Lian-Wen Jin, "Building compact MQDF classifier for large character set recognition by subspace distribution sharing," Pattern Recognition, vol. 41, no. 9, pp. 2916–2925, 2008.
[10] Jin-Feng Gao, Bi-Lan Zhu, and Masaki Nakagawa, "Development of a robust and compact on-line handwritten Japanese text recognizer for hand-held devices," IEICE Transactions on Information and Systems, vol. 96, no. 4, pp. 927–938, 2013.
[11] Tanaya Guha and Rabab Kreidieh Ward, "Learning sparse representations for human action recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 34, no. 8, pp. 1576–1588, 2012.
[12] John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma, "Robust face recognition via sparse representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 210–227, 2009.
[13] Anil M. Cheriyadat, "Unsupervised feature learning for aerial scene classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 52, no. 1, pp. 439–451, 2014.
[14] Michael Elad and Michal Aharon, "Image denoising via sparse and redundant representations over learned dictionaries," IEEE Transactions on Image Processing, vol. 15, no. 12, pp. 3736–3745, 2006.
[15] Ehsan Elhamifar and Rene Vidal, "Sparse subspace clustering: Algorithm, theory, and applications," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 11, pp. 2765–2781, 2013.
[16] Michal Aharon, Michael Elad, and Alfred Bruckstein, "K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation," IEEE Transactions on Signal Processing, vol. 54, no. 11, pp. 4311–4322, 2006.
[17] James Demmel and W. Kahan, "Accurate singular values of bidiagonal matrices," SIAM Journal on Scientific and Statistical Computing, vol. 11, no. 5, pp. 873–912, 1990.
[18] Cheng-Lin Liu, Fei Yin, Da-Han Wang, and Qiu-Feng Wang, "Online and offline handwritten Chinese character recognition: Benchmarking on new databases," Pattern Recognition, vol. 46, no. 1, pp. 155–162, 2013.
[19] Cheng-Lin Liu and K. Marukawa, "Pseudo two-dimensional shape normalization methods for handwritten Chinese character recognition," Pattern Recognition, vol. 38, no. 12, pp. 2242–2255, 2005.
[20] Cheng-Lin Liu, "Normalization-cooperated gradient feature extraction for handwritten character recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 8, pp. 1465–1469, 2007.
[21] Cheng-Lin Liu, "Handwritten Chinese character recognition: Effects of shape normalization and feature extraction," in Proceedings of the 2006 Conference on Arabic and Chinese Handwriting Recognition, 2006, pp. 104–128.