Abstract
Support vector machines (SVMs) have been recognized as one of the most successful classifica-
tion methods for many applications including text classification. Even though the learning ability
and computational complexity of training in support vector machines may be independent of the
dimension of the feature space, reducing computational complexity is an essential issue to effi-
ciently handle a large number of terms in practical applications of text classification. In this paper,
we adopt novel dimension reduction methods to reduce the dimension of the document vectors
dramatically. We also introduce decision functions for the centroid-based classification algorithm
and support vector classifiers to handle the classification problem where a document may belong to
multiple classes. Our substantial experimental results show that with several dimension reduction
methods that are designed particularly for clustered data, higher efficiency for both training and
testing can be achieved without sacrificing prediction accuracy of text classification even when the
dimension of the input space is significantly reduced.
Keywords: dimension reduction, support vector machines, text classification, linear discriminant
analysis, centroids
1. Introduction
Text classification is a supervised learning task for assigning text documents to pre-defined classes
of documents. It is used to find valuable information from a huge collection of text documents
available in digital libraries, knowledge databases, the world wide web (WWW), and company-wide
intranets, to name a few. Several characteristics have been observed in vector space based methods
for text classification (20; 21), including the high dimensionality of the input space, sparsity of
document vectors, linear separability in most text classification problems, and the belief that few
features are irrelevant. It has been conjectured that an aggressive dimension reduction may result in
a significant loss of information, and therefore, result in poor classification results (13).
Assume that training data (xi , yi ) with yi ∈ {−1, +1} for 1 ≤ i ≤ n are given. The dual formula-
tion of soft margin support vector machines (SVMs) with a kernel function K and control parameter
C is
max_{α_i}   ∑_{i=1}^{n} α_i − (1/2) ∑_{i,j=1}^{n} α_i α_j y_i y_j K(x_i, x_j),    (1)
s.t.   ∑_{i=1}^{n} α_i y_i = 0,   0 ≤ α_i ≤ C,   i = 1, . . . , n.
A commonly used kernel function is the Gaussian RBF (radial basis function) kernel
K(x, x_i) = exp(−γ‖x − x_i‖^2),
where γ is a parameter that controls the width of the Gaussian function. The evaluation of the kernel function depends on the dimension of
the input data, since the kernel functions contain the inner product of two input vectors for the linear
or polynomial kernels or the distance of two vectors for the Gaussian RBF kernel. Let α∗i denote
the optimal solution for (1). The optimal separating hyperplane f (x, α∗ , b) also requires evaluation
of the kernel function since
f(x, α^*, b) = ∑_{x_i ∈ SV} α^*_i y_i K(x_i, x) + b,
where SV denotes the set of support vectors.
Therefore, more efficient testing as well as training is expected from dimension reduction.
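To make this dependence on the input dimension concrete, the following sketch evaluates the RBF decision function above. It is a minimal illustration with numpy and hypothetical helper names; the multipliers α_i, labels y_i, and bias b are assumed to come from solving (1) with some SVM solver. Every kernel evaluation costs O(m) in the input dimension m, so shorter vectors directly reduce both training and testing time.

import numpy as np

def rbf_kernel(x, z, gamma):
    # K(x, z) = exp(-gamma * ||x - z||^2); each evaluation is O(m) in the input dimension m.
    d = x - z
    return np.exp(-gamma * d.dot(d))

def svm_decision(x, support_vectors, alpha, y, b, gamma):
    # f(x, alpha, b) = sum over support vectors of alpha_i * y_i * K(x_i, x) + b
    return sum(a * yi * rbf_kernel(xi, x, gamma)
               for xi, a, yi in zip(support_vectors, alpha, y)) + b

# Toy comparison: the same decision function with full and reduced dimensional vectors.
rng = np.random.default_rng(0)
n_sv = 100
for m in (22095, 5):                      # full term space vs. a centroid-reduced space
    SV = rng.standard_normal((n_sv, m))
    alpha, y = rng.random(n_sv), rng.choice([-1.0, 1.0], size=n_sv)
    x = rng.standard_normal(m)
    print(m, svm_decision(x, SV, alpha, y, b=0.0, gamma=1.0 / m))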
Throughout the paper, we will assume that the document set is represented in an m × n term-
document matrix A = (ai j ), in which each column represents a document, and each entry ai j repre-
sents the weighted frequency of term i in document j (1; 2). The clustering of data is assumed to be
performed previously.
In the next section, we review Latent Semantic Indexing (LSI) (2; 1), which uses the truncated
singular value decomposition (SVD) as a low-rank approximation of A. Although the truncated SVD
provides the closest approximation to A in Frobenius or L2 norm, LSI ignores the cluster structure
while reducing the dimension of the data. In contrast, in Section 3, we review several dimension
reduction methods that are especially effective for classification of clustered data: two methods
based on centroids (16; 12), and one method which is a generalization of linear discriminant analysis
(LDA) using the generalized singular value decomposition (GSVD) (10). With dimension reduction,
computational complexity can be dramatically reduced for all classifiers including support vector
machines and k-nearest neighbor classification. For k-nearest neighbor classification (kNN), the
distances of vector pairs need to be computed when finding k nearest neighbors. Therefore, one can
significantly reduce computational complexity by dimension reduction.
In many document data sets, documents can be assigned to more than one cluster upon clas-
sification. To handle this problem more effectively, we introduce a threshold based extension of
several classification algorithms in Section 4. Our numerical experiments illustrate that the cluster-
preserving dimension reduction algorithms we employ reduce the data dimension without any sig-
nificant loss of information. In fact, in many cases, they seem to have the effect of noise reduction,
since prediction accuracy becomes better after dimension reduction when compared to that in the
original high dimensional input space.
The rank-l truncated SVD gives the approximation A ≈ U_l Σ_l V_l^T, where the columns of U_l are the leading l left singular vectors, Σ_l is an l × l diagonal matrix with the l largest singular values in nonincreasing order along its diagonal, and the columns of V_l are the leading l right singular vectors. Then Σ_l V_l^T is the reduced dimensional representation of A, or equivalently, a new document q ∈ R^{m×1} can be represented in the l-dimensional space as q̂ = U_l^T q.
This low-rank approximation has been widely applied in information retrieval (2). Since complete orthogonal decompositions such as ULV or URV have computational advantages over the
SVD including easier updating (22; 23; 24) and downdating (17), dimension reduction by these
faster low-rank orthogonal decompositions has also been exploited (3). However, LSI ignores the
cluster structure while reducing the dimension. In addition, since there is no theoretical optimum
value for the reduced dimension, potentially expensive experimentation may be required to deter-
mine a reduced dimension l. As we report in Section 5, classification results after LSI vary de-
pending upon the reduced dimension, classification method, and similarity measure employed. The
experimental results confirm that when the data set is already clustered, the dimension reduction
methods we present in the next section are more effective for classification of new data.
The dimension reduction methods reviewed in this section are based on a low rank approximation of the term-document matrix A,
A ≈ BY,    (2)
where B ∈ Rm×l with rank(B) = l and Y ∈ Rl×n with rank(Y ) = l. The matrix B accounts for the
dimension reducing transformation. However, it is not necessary to compute the dimension reducing
transformation G from B explicitly, as long as we can find the reduced dimensional representation
of a given data item. If the matrix B is already determined, the matrix Y can be computed by solving the least squares problem
min_{Y ∈ R^{l×n}} ‖BY − A‖_F.    (3)
Any given document q ∈ Rm×1 can be transformed to the lower dimensional space by solving the
minimization problem
min_{q̂ ∈ R^{l×1}} ‖Bq̂ − q‖_2.    (4)
Latent Semantic Indexing that utilizes the SVD (LSI/SVD) can be viewed as a variation of the
model (2) with B = Ul (16), where Ul Σl VlT is the rank l truncated SVD of A. Then q̂ = UlT q is
obtained by solving the least squares problem
min_{q̂ ∈ R^{l×1}} ‖Bq̂ − q‖_2 = min_{q̂ ∈ R^{l×1}} ‖U_l q̂ − q‖_2.    (5)
In the Centroid dimension reduction algorithm (see Algorithm 1), the ith column of B is the
centroid vector of the ith cluster, which is the average of the data items in the ith cluster, for 1 ≤ i ≤ p.
This matrix B is called the centroid matrix. Then, any vector q ∈ Rm×1 can be represented in the
p dimensional space as q̂, the solution of the least squares problem (4), where B is the centroid
matrix. In the Orthogonal Centroid algorithm (see Algorithm 2), the p dimensional representation
of a data vector q ∈ Rm×1 is given as q̂ = QTp q where Q p is an orthonormal basis for the centroid
matrix obtained from its QR decomposition.
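The two centroid-based reductions can be sketched as follows. This is a minimal illustration with numpy and assumed function names, not the paper's implementation: the Centroid method solves the least squares problem (4) with the centroid matrix B, while the Orthogonal Centroid method projects onto an orthonormal basis Q_p of B obtained from its QR decomposition.

import numpy as np

def centroid_matrix(A, labels, p):
    # Column i of B is the centroid (mean) of the documents in cluster i.
    return np.column_stack([A[:, labels == i].mean(axis=1) for i in range(p)])

def centroid_reduce(B, q):
    # Centroid method: q_hat solves the least squares problem min ||B q_hat - q||_2.
    q_hat, *_ = np.linalg.lstsq(B, q, rcond=None)
    return q_hat

def orthogonal_centroid_reduce(B, q):
    # Orthogonal Centroid method: q_hat = Q_p^T q, where B = Q_p R is the reduced QR.
    Q_p, _ = np.linalg.qr(B)
    return Q_p.T @ q

rng = np.random.default_rng(0)
A = rng.random((100, 30))                 # toy 100-term, 30-document matrix
labels = rng.integers(0, 3, size=30)      # cluster labels assumed given
B = centroid_matrix(A, labels, p=3)
q_hat_c = centroid_reduce(B, A[:, 0])
q_hat_oc = orthogonal_centroid_reduce(B, A[:, 0])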
The centroid-based dimension reduction algorithms are computationally less costly than LSI/SVD.
They are also more effective when the data are already clustered. Although the centroid-based
schemes can be applied only when the data are linearly separable, they are suitable for text classifi-
cation problems, since text data is usually linearly separable in the original dimensional space (13).
For a nonlinear extension of the Orthogonal Centroid method that utilizes kernel functions, see (18).
Algorithm 3 LDA/GSVD
Given a data matrix A ∈ Rm×n with p clusters, this algorithm computes the columns of the matrix
G ∈ Rm×(p−1) , which preserves the cluster structure in the reduced dimensional space, and it also
computes the p − 1 dimensional representation Y of A.
1. Compute Hb ∈ Rm×p and Hw ∈ Rm×n from A according to Eqns. (7) and (6), respectively.
3. Let t = rank(H).
X = Q [ R^{-1}W   0
        0         I ],
6. Y = GT A
Since
trace(S_w) = ∑_{i=1}^{p} ∑_{j∈N_i} ‖a_j − c_i‖_2^2
measures the closeness within the clusters, and
trace(S_b) = ∑_{i=1}^{p} ∑_{j∈N_i} ‖c_i − c‖_2^2
measures the remoteness between the clusters, the goal is to minimize the former while maximizing
the latter in the reduced dimensional space. Once again letting GT ∈ Rl×m denote the transformation
that maps a column of A in the m dimensional space to a vector in the l dimensional space, the
goal can be expressed as the simultaneous minimization of trace(GT Sw G) and maximization of
trace(GT Sb G).
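For intuition, here is a small sketch of the classical solution of this criterion, using numpy and scipy with hypothetical function names and the standard scatter-matrix definitions (given below via H_w and H_b). It is not the paper's LDA/GSVD algorithm and applies only when S_w is nonsingular, as discussed next.

import numpy as np
from scipy.linalg import eigh

def classical_lda(A, labels, p, l):
    # Columns of G maximize trace((G^T S_w G)^{-1} (G^T S_b G)); requires S_w nonsingular.
    c = A.mean(axis=1, keepdims=True)
    m = A.shape[0]
    S_w, S_b = np.zeros((m, m)), np.zeros((m, m))
    for i in range(p):
        A_i = A[:, labels == i]
        c_i = A_i.mean(axis=1, keepdims=True)
        S_w += (A_i - c_i) @ (A_i - c_i).T
        S_b += A_i.shape[1] * (c_i - c) @ (c_i - c).T
    # Generalized symmetric eigenproblem S_b x = lambda S_w x; eigh returns the
    # eigenvalues in ascending order, so keep the eigenvectors of the l largest.
    _, V = eigh(S_b, S_w)
    G = V[:, -l:]
    return G, G.T @ A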
When Sw is nonsingular, this simultaneous optimization is commonly approximated by maxi-
mizing
J_1(G) = trace((G^T S_w G)^{-1} (G^T S_b G)).
It is well known that the global maximum is achieved when the columns of G are the eigenvectors
of S_w^{-1} S_b that correspond to the l largest eigenvalues (7; 25). In fact, when the reduced dimension
l ≥ p − 1, trace(S_w^{-1} S_b) is exactly preserved upon dimension reduction, and equals λ_1 + · · · + λ_{p−1},
where each λi ≥ 0. Without loss of generality, we assume that the term-document matrix A is parti-
tioned as
A = [A1 , · · · , A p ]
where the columns of each block A_i ∈ R^{m×n_i} belong to the cluster i. Letting e^{(i)} = (1, . . . , 1)^T ∈ R^{n_i×1} and defining the matrices
H_w = [A_1 − c_1 e^{(1)T}, . . . , A_p − c_p e^{(p)T}] ∈ R^{m×n}    (6)
and
H_b = [√n_1 (c_1 − c), . . . , √n_p (c_p − c)] ∈ R^{m×p},    (7)
then
S_w = H_w H_w^T   and   S_b = H_b H_b^T.
As the product of an m × n matrix with an n × m matrix, Sw will be singular when the number of
terms m exceeds the number of documents n. In that case, classical discriminant analysis fails.
However, if we rewrite the eigenvalue problem S_w^{-1} S_b x_i = λ_i x_i as
where the columns of Q_H Z ∈ R^{m×(p+n)} are orthonormal. There exists an orthogonal Q ∈ R^{m×m} whose
first p + n columns are the columns of Q_H Z. Hence
H = P [ Σ_H   0
        0     0 ] Q^T,
where there are now m − t zero columns to the right of ΣH . Since RH ∈ R(p+n)×(p+n) is a much
smaller matrix than H ∈ R(p+n)×m , the required memory is substantially reduced. In addition, the
computational complexity of the algorithm is reduced to O(mn^2) + O(n^3) (8), since this step is the
dominating part.
4. Classification Methods
To test the effect of dimension reduction in text classification, three different classification methods
were used: centroid-based classification, k-nearest neighbor (kNN), and support vector machines
(SVMs). Each classification method is modified by introducing some threshold values to perform
classification correctly when a document has membership in multiple classes. In this section, we
briefly review the three classification methods and discuss their modifications.
• find the index j such that sim(q, c_i), 1 ≤ i ≤ p, is minimum (or maximum), where sim(q, c_i)
is the similarity measure between q and c_i. (For example, sim(q, c_i) = ‖q − c_i‖_2 using the L2
norm, and we take the index with the minimum value. Using the cosine measure,
sim(q, c_i) = cos(q, c_i) = q^T c_i / (‖q‖_2 ‖c_i‖_2),
we take the index with the maximum value.) With the cosine similarity measure, the decision rule can be written as
arg max_{1≤i≤p}  q^T c_i / (‖q‖_2 ‖c_i‖_2),    (8)
where ci is the centroid of the ith cluster of the training data. When dimension reduction is per-
formed by the Centroid algorithm, the centroids of the full space become the columns ei ∈ R p×1 of
the identity matrix. Then the decision rule becomes
arg max_{1≤i≤p}  q̂^T e_i / (‖q̂‖_2 ‖e_i‖_2),    (9)
where q̂ is the reduced dimensional representation of the document q. This shows that classification
can be performed by simply finding the index i of the vector q̂ with the largest component. Centroid-
based classification has the advantage that the computation involved is extremely simple. We can
also classify using the L2 norm similarity measure by finding the centroid that is closest to q in L2
norm.
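In code, rule (9) amounts to a single argmax over the components of q̂, since dividing by the norms does not change the maximizer; a trivial sketch, assuming numpy:

import numpy as np

def centroid_classify_reduced(q_hat):
    # Rule (9): the e_i are unit vectors, so the cosine argmax is just the largest component.
    return int(np.argmax(q_hat))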
The original form of centroid-based classification finds the nearest centroid and assigns the
corresponding class as the predicted class. To allow an assignment of any document to multiple
classes, we introduce the decision rule for centroid-based classification as
y(x, j) = sign( sim(x, c_j) − θ_j^c ),    (10)
where y(x, j) ∈ {+1, −1} is the classification for document x with respect to class j (if y > 0 then
the class is j, else the class is not j), sim(x, c j ) is the similarity between the test document x and the
centroid vector c_j for the class j, and θ_j^c is the class specific threshold for the binary decision for
y(x, j) in centroid-based classification. In this way, document x will be a member of class j if its
similarity to the centroid vector c j for the class is above the threshold.
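A minimal sketch of the thresholded rule (10) follows; the names are hypothetical, numpy is assumed, and the thresholds θ_j^c would be tuned by numerical experiments as in Section 5.

import numpy as np

def cosine_sim(x, c):
    return (x @ c) / (np.linalg.norm(x) * np.linalg.norm(c))

def centroid_multilabel(x, centroids, thresholds):
    # Assign x to every class j with sim(x, c_j) - theta_j^c > 0, i.e. y(x, j) = +1.
    return [j for j, (c_j, theta_j) in enumerate(zip(centroids, thresholds))
            if cosine_sim(x, c_j) > theta_j]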
1. Using the similarity measure sim(q, a_j) for 1 ≤ j ≤ n, find the k nearest neighbors of q.
2. Among these k neighbors, count the number that belong to each cluster.
3. Assign q to the cluster with the greatest count in the previous step.
To allow multiple class membership, the decision rule for kNN classification is
y(x, j) = sign( ∑_{d_i ∈ kNN} sim(x, d_i) y(d_i, j) − θ_j^kNN ),    (11)
where kNN is the set of k nearest neighbors for document x, y(d_i, j) ∈ {+1, −1} is the classification
for document d_i with respect to class j (if y > 0 then the class is j, else the class is not j), sim(x, d_i)
is the similarity between the test document x and the training document d_i, and θ_j^kNN is the class
specific threshold for kNN classification.
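A corresponding sketch for the thresholded kNN rule (11), assuming cosine similarity, dense numpy arrays, and hypothetical names:

import numpy as np

def knn_multilabel(x, docs, doc_labels, k, thresholds):
    # docs: n x m matrix of training documents (rows); doc_labels[i]: set of classes of doc i.
    sims = (docs @ x) / (np.linalg.norm(docs, axis=1) * np.linalg.norm(x))
    nn = np.argsort(sims)[-k:]            # indices of the k most similar training documents
    assigned = []
    for j, theta_j in enumerate(thresholds):
        vote = sum(sims[i] * (1.0 if j in doc_labels[i] else -1.0) for i in nn)
        if vote > theta_j:                # y(x, j) = sign(vote - theta_j^kNN)
            assigned.append(j)
    return assigned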
Similarly, the decision rule for the support vector classifier is
y(x, j) = sign( ∑_{x_i ∈ SV} α_i y_i K(x, x_i) + b − θ_j^SVM ),    (12)
where y(x, j) ∈ {+1, −1} is the classification for document x with respect to class j, SV is the set
of support vectors, and θ_j^SVM is the class specific threshold for the binary decision. This threshold is
set so that a new document x is not classified as belonging to class j when it is located very close
to the optimal separating hyperplane, i.e., when the decision is made with low reliability. We use
the linear kernel K = ⟨x, x_i⟩, the polynomial kernel K = [⟨x, x_i⟩ + 1]^d, where d is the degree of
the polynomial, and the Gaussian RBF (radial basis function) kernel K = exp(−γ‖x − x_i‖^2), where
γ is a parameter that controls the width of the Gaussian function.
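For concreteness, here is a sketch of per-class binary SVMs with class specific thresholds. It uses scikit-learn's SVC as a stand-in solver, which is an assumption on our part; the paper does not specify this implementation.

import numpy as np
from sklearn.svm import SVC

def train_binary_svms(X, Y, C=1.0, gamma=0.1):
    # One binary SVM per class j; Y[:, j] in {+1, -1} encodes membership in class j.
    return [SVC(kernel="rbf", C=C, gamma=gamma).fit(X, Y[:, j]) for j in range(Y.shape[1])]

def svm_multilabel(models, x, thresholds):
    # Assign x to class j when the signed distance to the separating hyperplane
    # exceeds the class specific threshold theta_j^SVM.
    x = np.asarray(x).reshape(1, -1)
    return [j for j, (model, theta_j) in enumerate(zip(models, thresholds))
            if model.decision_function(x)[0] > theta_j]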
5. Experimental Results
Prediction results are compared for the test documents in the full space without any dimension re-
duction as well as those in the reduced space obtained by LSI/SVD, Centroid, Orthogonal Centroid,
and LDA/GSVD dimension reduction methods. For SVMs, we optimized the regularization param-
eter C, polynomial degree d for the polynomial kernel, and γ for the Gaussian RBF (radial basis
function) kernel for each full and reduced dimension data set.
Table 1: Text classification accuracy (%) using centroid-based classification, k-nearest neighbor
classification, and SVMs, with LSI/SVD dimension reduction on the MEDLINE data set.
The Euclidean norm (L2 ) and the cosine similarity measure (Cosine) were used for the
centroid-based and kNN classification.
The first data set that we used was a subset of the MEDLINE database with 5 classes. Each class
has 500 documents. The set was divided into 1250 training documents and 1250 test documents.
After stemming and stoplist removal, the training set contains 22095 distinct terms. For this data,
each document belongs to only one class, and we used the original form of the three classification
algorithms without introducing the threshold.
The second data set was the “ModApte” split of the Reuters-21578 text collection. We used only the
90 classes for which there is at least one training and one test example in each class. It contains
7769 training documents and 3019 test documents. The training set contains 11941 distinct terms
after preprocessing with stoplist removal and stemming. The Reuters data set contains documents
that belong to multiple classes, so the classification methods utilize thresholds.
We used a standard weight factor for each word stem:
φ_i(x) = (tf_i · log(idf_i)) / κ,    (13)
where tf_i is the number of occurrences of term i in document x, idf_i = n/d is the ratio between
the total number of documents n and the number of documents d containing the term, and κ is the
normalization constant that makes ‖φ(x)‖_2 = 1.
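A small sketch of this weighting for one document, with hypothetical names and numpy assumed:

import numpy as np

def tfidf_weights(tf, df, n_docs):
    # phi_i(x) = tf_i * log(idf_i) / kappa, idf_i = n/d_i, with kappa normalizing ||phi(x)||_2 to 1.
    tf = np.asarray(tf, dtype=float)
    idf = n_docs / np.asarray(df, dtype=float)
    phi = tf * np.log(idf)
    kappa = np.linalg.norm(phi)
    return phi / kappa if kappa > 0 else phi

# Term frequencies of one document and document frequencies of the same terms.
print(tfidf_weights(tf=[3, 0, 1, 2], df=[10, 50, 5, 25], n_docs=100))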
Table 1 reports text classification accuracy for the MEDLINE data set using LSI/SVD with a
range of values for the reduced dimension. The smallest reduced dimension, l = 5, is included in
order to compare with centroid-based and LDA/GSVD methods, which reduce the dimension to 5
and 4, respectively. Since the training set has the nearly-full rank of 1246, we include the reduced
dimensions 1246 and 1247 at the high end of the range. For a training set of size 1250, the reduced
dimension l = 300 is generous. However, we observe that kNN classification with L2 norm simi-
larity produces poor classification results for l values from 100 to 500. This is consistent with the
common belief that cosine similarity performs better with unnormalized text data. Also, classifica-
tion accuracy using 5NN lags that for higher values of k, suggesting that k=5 is too small for classes
Table 2: Text classification accuracy (%) with different kernels in SVMs with and without dimen-
sion reduction on the MEDLINE data set. The regularization parameter C for each case
was optimized by numerical experiments. Dimension of each training term-document ma-
trix is shown. LDA/GSVD4 and LDA/GSVD5 represent the results from LDA/GSVD
where the reduced dimensions are 4 and 5, respectively.
of size 250. It is noteworthy that even with LSI, which makes no attempt to preserve the cluster
structure upon dimension reduction, SVM classification achieves very consistent classification re-
sults for reduced dimensions of 100 or greater, and the SVM accuracy exceeds that of the other
classification methods.
Table 2 shows text classification accuracy (%) with different kernels in SVMs, with and without
dimension reduction on the MEDLINE data set. Note that the linearopt values are optimal over all
the values of the regularization parameter C that we tried, and the RBFopt values are optimal over
all the γ values we tried. This table shows that the prediction results in the reduced dimension are
similar to those in the original full dimensional space, while achieving a significant reduction in
time and space complexity. In the reduced space obtained by the Orthogonal Centroid dimension
reduction algorithm, the classification accuracy is insensitive to the choice of the kernel. Thus, we
can choose the linear kernel in this case instead of the computationally more expensive polynomial
or RBF kernel.
Table 3 shows classification accuracy obtained by all three classification methods – centroid-
based, kNN with three different values of k, and the optimal result from SVM – for each dimension
reduced data set and the full space. For the LDA/GSVD dimension reduction method, the classi-
fication accuracy with cosine similarity measure is lower with centroid-based classification as well
as with kNN, while the results with L2 norm are better. This is due to the formulation of trace
optimization criteria in terms of the L2 norm. With LDA/GSVD, documents from the same class in
Table 3: Text classification accuracy (%) using centroid-based classification, k-nearest neighbor
classification, and SVMs, with and without dimension reduction on the MEDLINE data
set. The Euclidean norm (L2 ) and the cosine similarity measure (Cosine) were used for
centroid-based and kNN classification.
Table 4: Text classification accuracy (%) of the 5 classes and the microaveraged performance over
all 5 classes on the MEDLINE data set. All results are from SVMs using optimal kernels.
the full dimensional space tend to be transformed to a very tight cluster or even to a single point in
the reduced space, since the LDA/GSVD algorithm tends to minimize the trace of the within cluster
scatter. This seems to make it difficult for SVMs to find a binary classifier with low generalization
error.
Table 4 shows text classification accuracy for the 5 classes using SVMs with and without dimen-
sion reduction methods on the MEDLINE data set. The colon cancer and oral cancer documents
were relatively hard to classify correctly.
The REUTERS data set has many documents that are classified to more than 2 classes, whereas
no document is classified to belong to more than one class in the MEDLINE data set. While we
Table 5: Comparison of micro-averaged F1 scores for 3 different classification methods with and
without dimension reduction on the REUTERS data set. The Euclidean norm (L2 ) and the
cosine similarity measure (Cosine) were used for the centroid-based classification. The
cosine similarity measure was used for the kNN classification. The dimension of the full
training term-document matrix is 11941×9579 and that of the reduced matrix is 90×9579.
could handle relatively large matrices using a sparse matrix representation and sparse QR decom-
position in the Centroid and Orthogonal Centroid dimension reduction methods, results for the
LDA/GSVD dimension reduction method are not reported, since we ran out of memory while com-
puting the GSVD. For this data set, we built a series of threshold-based classifiers, optimizing the
thresholds to capture the multiple class membership. All class specific thresholds (θ_j^kNN, θ_j^c, θ_j^SVM)
are determined by numerical experiments. Though we obtained precision/recall break even points
by optimizing the thresholds, we report values of the F1 measure (26) which is defined as
F1 = 2rp / (r + p),    (14)
where r is recall and p is precision for a binary classification. Table 5 shows that the effectiveness
of classification was preserved for the Orthogonal Centroid dimension reduction algorithm, while it
became worse for the Centroid dimension reduction algorithm. This is due to a property of the Cen-
troid algorithm that the centroids of the full space are projected to the columns of the identity matrix
in the reduced space. This orthogonality between the centroids may make it difficult to represent the
multiclass membership of a document by separating closely related classes after dimension reduc-
tion. The pattern of prediction measure F1 for each class is also preserved by Orthogonal Centroid
in Table 6. The macro-averaged F1 and micro-averaged F1 for the 10 most frequent classes are also
presented.
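For reference, micro- and macro-averaging of the F1 measure (14) can be computed as in the following sketch (assumed names; the per-class counts are true positives, false positives, and false negatives):

def f1_score(tp, fp, fn):
    p = tp / (tp + fp) if tp + fp else 0.0      # precision
    r = tp / (tp + fn) if tp + fn else 0.0      # recall
    return 2 * r * p / (r + p) if r + p else 0.0

def micro_macro_f1(per_class_counts):
    # per_class_counts: list of (tp, fp, fn) tuples, one tuple per class.
    macro = sum(f1_score(*c) for c in per_class_counts) / len(per_class_counts)
    tp, fp, fn = (sum(c[i] for c in per_class_counts) for i in range(3))
    micro = f1_score(tp, fp, fn)
    return micro, macro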
Table 6: F1 scores of the 10 most frequent classes and micro-averaged performance over all 90
classes on the REUTERS data set. All results are from SVMs using optimal kernels.
The dimension of the full training term-document matrix is 11941×9579 and that of the
reduced matrix is 90×9579.
We have evaluated the dimension reduction methods with three classifiers:
SVMs, kNN, and centroid-based classification. For the three cluster-preserving methods, the re-
sults show surprisingly high prediction accuracy, which is essentially the same as in the original
full space, even with very dramatic dimension reduction. They justify dimension reduction as a
worthwhile preprocessing stage for achieving high efficiency and effectiveness. Especially for kNN
classification, the savings in computational complexity in classification after dimension reduction
are significant. In the case of SVMs the savings are also clear, since distances or inner products between
pairs of input data points need to be computed repeatedly, whether or not a kernel function is used,
and the vectors become significantly shorter with dimension reduction.
We have also introduced threshold based classifiers for centroid-based classification and SVMs
in order to capture the overlap structure between closely related classes. Prediction results with the
Centroid dimension reduction method became better compared to those from the full space for the
completely disjoint MEDLINE data set, but became worse for the REUTERS data set. Since the
Centroid dimension reduction method maps the centroids to unit vectors ei which are orthogonal
to each other, it is helpful for the disjoint data set, but not for a data set which contains documents
belonging to multiple classes. We observed that prediction accuracy with the Orthogonal Centroid di-
mension reduction algorithm was preserved for SVMs as well as with centroid-based classification.
The Orthogonal Centroid dimension reduction method maximizes the between cluster relationship
using the relatively inexpensive reduced QR decomposition, compared to LDA/GSVD which also
considers the within cluster relationship but requires a more expensive rank revealing decomposition
such as the singular value decomposition (10; 11).
The better prediction accuracy of SVMs is due to the low generalization error achieved by maximizing
the margin, and to their capability to handle non-linearity through the choice of kernel. Although most classes of
the Reuters-21578 data set are linearly separable (13), there seems to be some level of non-linearity.
For non-linearly separable data, SVMs with appropriate nonlinear kernel functions would work as a
better classifier. Another way to handle non-linearly separable data is to apply nonlinear extensions
of the dimension reduction methods, including those presented in (18; 19). All of the dimension
reduction methods presented here can also be applied to visualize the higher dimensional structure
by reducing the dimension to 2- or 3-dimensional space.
We conclude that dramatic dimension reduction of text documents can be achieved, without
sacrificing classification accuracy. For the document sets we tested, the Orthogonal Centroid method
did particularly well at preserving the cluster structure from the full dimensional representation.
That is, the prediction accuracies for Orthogonal Centroid rival those of the full space, even though
the dimension is reduced to the number of clusters. The savings in computational complexity are
significant using either kNN classification or SVM.
Acknowledgments
This material is based upon work supported by the National Science Foundation Grant No. CCR-
0204109. Any opinions, findings and conclusions or recommendations expressed in this material
are those of the authors and do not necessarily reflect the views of the National Science Foundation
(NSF). The authors would also like to thank the University of Minnesota Supercomputing Institute
(MSI) for providing the computing facilities.
References
[1] M. W. Berry, Z. Drmac, and E. R. Jessup. Matrices, vector spaces, and information retrieval.
SIAM Review, 41:335–362, 1999.
[2] M. W. Berry, S. T. Dumais, and G. W. O’Brien. Using linear algebra for intelligent information
retrieval. SIAM Review, 37:573–595, 1995.
[3] M. W. Berry and R. D. Fierro. Low-rank orthogonal decompositions for information retrieval
applications. Numerical Linear Algebra with Applications, 3(4):301–327, 1996.
[4] Å. Björck. Numerical Methods for Least Squares Problems. SIAM, Philadelphia, PA, 1996.
[5] N. Cristianini and J. Shawe-Taylor. Support Vector Machines and Other Kernel-based Learn-
ing Methods. Cambridge University Press, 2000.
[6] S. Deerwester, S. T. Dumais, G. W. Furnas, T. K. Landauer, and R. Harshman. Indexing by latent
semantic analysis. Journal of the American Society for Information Science, 41:391–407, 1990.
[7] K. Fukunaga, Introduction to Statistical Pattern Recognition, Second ed., Academic Press,
1990.
[8] G. H. Golub and C. F. Van Loan. Matrix Computations, third edition. Johns Hopkins Univer-
sity Press, Baltimore, 1996.
[9] M. Heiler. Optimization Criteria and Learning Algorithms for Large Margin Classifiers.
Diploma Thesis, University of Mannheim., 2002.
[10] P. Howland, M. Jeon, and H. Park. Structure Preserving Dimension Reduction for Clustered
Text Data based on the Generalized Singular Value Decomposition. SIAM Journal of Matrix
Analysis and Applications, 25(1):165–179, 2003.
[11] P. Howland and H. Park. Generalizing discriminant analysis using the generalized singular
value decomposition, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(8):
995-1006, 2004.
[12] M. Jeon, H. Park, and J. B. Rosen. Dimensional reduction based on centroids and least squares
for efficient processing of text data. In Proceedings for the First SIAM International Workshop
on Text Mining. Chicago, IL, 2001.
[13] T. Joachims. Text categorization with support vector machines: Learning with many relevant
features. In Proceedings of the European Conference on Machine Learning, pages 137–142,
Berlin, 1998.
[14] H. Lodhi, N. Cristianini, J. Shawe-Taylor, and C. Watkins. Text classification using string
kernels. Advances in Neural Information Processing Systems, 13:563–569, 2000.
[15] C. C. Paige and M. A. Saunders, Towards a generalized singular value decomposition, SIAM
Journal of Numerical Analysis, 18, pp. 398–405, 1981.
[16] H. Park, M. Jeon, and J. B. Rosen. Lower dimensional representation of text data based on
centroids and least squares, BIT Numerical Mathematics, 42(2):1–22, 2003.
[17] H. Park and L. Eldén. Downdating the rank-revealing URV decomposition. SIAM Journal of
Matrix Analysis and Applications, 16, pp. 138–155, 1995.
[18] C. Park and H. Park. Nonlinear feature extraction based on centroids and kernel functions.
Pattern Recognition, to appear.
[19] C. Park and H. Park. Kernel discriminant analysis based on the generalized singular value
decomposition. Technical report 03-017, Department of Computer Science and Engineering,
University of Minnesota, 2003.
[22] G. W. Stewart. An updating algorithm for subspace tracking. IEEE Transactions on Signal
Processing, 40:1535–1541, 1992.
[24] M. Stewart and P. Van Dooren. Updating a generalized URV decomposition. SIAM Journal of
Matrix Analysis and Applications, 22(2):479–500, 2000.
[27] V. Vapnik. The Nature of Statistical Learning Theory. Springer-Verlag, New York, 1995.
[28] V. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York, 1998.
[29] Y. Yang and X. Liu. A re-examination of text categorization methods. In 22nd Annual Inter-
national SIGIR, pages 42–49, Berkeley, August 1999.