
Text Representation: from Vector to Tensor*

Ning Liu1, Benyu Zhang2, Jun Yan3, Zheng Chen2, Wenyin Liu4, Fengshan Bai1, Lee-Feng Chien5

1 Department of Mathematical Science, Tsinghua University, Beijing 100084, P.R. China
{liun01, fbai}@mails.tsinghua.edu.cn
2 Microsoft Research Asia, 49 Zhichun Road, Beijing 100080, P.R. China
{byzhang, zhengc}@microsoft.com
3 School of Mathematical Sciences, Peking University, Beijing 100871, P.R. China
[email protected]
4 Department of Computer Science, City University of Hong Kong, P.R. China
[email protected]
5 Institute of Information Science, Academia Sinica
[email protected]

Abstract

In this paper, we propose a text representation model, the Tensor Space Model (TSM), which models a text document as a multilinear algebraic high-order tensor instead of the traditional vector. Supported by the techniques of multilinear algebra, TSM offers a potent mathematical framework for analyzing multifactor structures. TSM is further supported by particular operations and tools introduced here, such as the High-Order Singular Value Decomposition (HOSVD), for dimension reduction and other applications. Experimental results on the 20 Newsgroups dataset show that TSM is consistently better than VSM for text classification.

1. Introduction and Related Work

Information Retrieval (IR) [2] techniques have attracted much attention during the past decades, since people are frustrated by being drowned in a huge amount of data while still being unable to obtain useful information. The Vector Space Model (VSM) [2] is the cornerstone of many information retrieval techniques; it is used to represent text documents and to define the similarity among them.

Bag of Words (BOW) [2] is the earliest approach to representing a document, as a bag of words under the VSM. In the BOW representation, a document is encoded as a feature vector, with each element in the vector indicating the presence or absence of a word in the document by TFIDF indexing [5]. However, the major limitation of BOW is that it only retains the frequency of the words in the document and loses the sequence information.

In the past decade, attempts have been made to incorporate word-order knowledge into the vector space representation. The N-gram statistical language model [3, 4] is a well-known one among them. The entries of a document vector under the N-gram representation are strings of n consecutive words extracted from the collection. N-grams are effective approximations: they not only keep the word-order information but also solve the language-independence problem. However, their high-dimensional feature vectors make many powerful information retrieval technologies, such as Latent Semantic Indexing (LSI) [2] and Principal Component Analysis (PCA) [6], infeasible for large datasets.

During the past few years, IR researchers have proposed a variety of effective representation approaches for text documents based on VSM. However, since the volume of available text data is increasing very fast nowadays, more and more researchers ask [1]:

"Are the further improvements likely to require a broad range of techniques in addition to IR area?"

This motivates us to seek a new model for text document representation based on new techniques. The requirements for the new model are to grasp the context of each word, to be language independent, and to scale to large datasets. In this paper, we propose a

* This work was done while the first author was an intern at Microsoft Research Asia.

novel Tensor Space Model (TSM) for text document representation. The proposed TSM is based on character-level high-order tensors (the natural generalization of matrices) and offers a potent mathematical framework for analyzing multifactor structure [9]. In contrast to VSM, TSM represents a text document by high-order tensors instead of vectors (1-order tensors) or matrices (2-order tensors). The features of each coordinate are the letters "a" to "z", and all other non-alphabetic symbols, such as punctuation marks, are denoted by "_". Moreover, a dimensionality reduction algorithm is proposed based on a tensor extension of the conventional matrix Singular Value Decomposition (SVD), known as the High-Order SVD (HOSVD) [8]. The HOSVD technique can find underlying latent structure in documents and makes algorithms such as LSI and PCA easy to implement under TSM. Furthermore, theoretical analysis and experiments show that HOSVD under TSM can significantly outperform VSM on classification problems with small training data. Another contribution of TSM is that it can draw on many multilinear algebra techniques to increase the performance of IR.

The rest of this paper is organized as follows. In Section 2, we focus on the multilinear model. The HOSVD algorithm, which is used for computing the underlying space, is presented in Section 3. The experimental results on the 20 Newsgroups dataset [7] are given in Section 4. Conclusion and future work are presented in Section 5.

2. Tensor Space Model

"Tensor" is a term from multilinear algebra. It is a generalization of the concepts of "vector" and "matrix" from linear algebra. Intuitively, a vector data structure is called a 1-order tensor and a matrix data structure is called a 2-order tensor; a cube-like data structure is then called a 3-order tensor, and so on. In other words, higher-order tensors are abstract data structures that generalize vectors and matrices. TSM uses the tensor structure to describe text documents and uses the techniques of multilinear algebra to increase the performance of IR.

To start with, we introduce the following notation. In this paper, scalars are denoted by lower-case letters $a, b, \ldots$; vectors by normal-font capital letters $A, B, \ldots$; matrices by bold capital letters $\mathbf{A}, \mathbf{B}, \ldots$; and higher-order tensors by calligraphic capital letters $\mathcal{A}, \mathcal{B}, \ldots$. We define the order of a tensor to be $N$ if $\mathcal{A} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$. The entries of $\mathcal{A}$ are denoted by $\mathcal{A}_{i_1 \cdots i_n \cdots i_N}$ or $a_{i_1 \cdots i_n \cdots i_N}$, where $1 \le i_n \le I_n$ for $1 \le n \le N$.

The traditional BOW cannot capture and utilize the valuable word order information. Conversely, although the N-gram representation of documents can encode this word sequence information, the high-dimensional vectors it generates lead to very high storage and computational complexity. This high complexity defeats many powerful tools, such as LSI and PCA, in the text mining and information retrieval process. Hence, we propose to use higher-order tensors to represent text documents so that both the word order information and the complexity problem are addressed. Moreover, we will show that the proposed TSM offers many other advantages compared with the popular BOW and N-gram models.

The TSM is a model of text document representation. We start from a simple example. Consider the simple document below, which consists of six words:

"Text representation: from vector to tensor"

We use a 3-order tensor $\mathcal{A} = \{a_{i_1 i_2 i_3}\} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ to represent this document and index it by the 26 English letters. All other characters, such as punctuation marks and spaces, are treated as the same and denoted by "_". The character string of this document can then be separated into character triples:

"tex, ext, xt_, t_r, _re, rep, ..."

The 26 letters "a" to "z" and "_" scale each axis of the tensor space, so the document is represented by a 27 × 27 × 27 tensor. The "_" character corresponds to position 0 of each axis, and "a" to "z" correspond to positions 1 to 26. For example, the position of "tex" is (20, 5, 24), since "t" is the 20th character among the 26 English letters, "e" is the 5th, and "x" is the 24th. As another example, "xt_" corresponds to (24, 20, 0). We then use the TFIDF method to weight each position of the tensor, in the same way as for VSM. By doing so, each document is represented by a character-level 3-order tensor, as shown in Figure 1.

If we put a corpus of documents together, we obtain a 4-order tensor in a 27 × 27 × 27 × m space, where m is the number of documents, as illustrated in Figure 2. Figure 2 only shows a 4-order TSM; in our model, the order of the tensor for each document is not limited to 3, so the order of the tensor for a corpus of documents is not limited to 4. Without loss of generality, a corpus with m documents can be represented as a character-level tensor $\mathcal{A} = \{a_{i_1 i_2 \cdots i_N}\} \in \mathbb{R}^{27 \times 27 \times \cdots \times 27 \times m}$, where each document is represented by an (N-1)-order tensor.
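For concreteness, the construction above can be sketched in a few lines of Python/NumPy. This is our illustration rather than the authors' code: the helper names (`char_index`, `char_tensor`) are our own, raw trigram counts are shown, and the TFIDF weighting the paper applies on top of these counts is omitted.

```python
import numpy as np

def char_index(c: str) -> int:
    """Map 'a'..'z' to 1..26; every other character to 0 ('_')."""
    return ord(c) - ord('a') + 1 if 'a' <= c <= 'z' else 0

def char_tensor(doc: str, order: int = 3) -> np.ndarray:
    """Build the character-level `order`-order count tensor of a document.

    Each axis has 27 positions (0 for '_', 1..26 for 'a'..'z'); entry
    (i1, ..., iN) counts how often that character N-gram occurs.
    """
    t = np.zeros((27,) * order)
    chars = [char_index(c) for c in doc.lower()]
    for i in range(len(chars) - order + 1):
        t[tuple(chars[i:i + order])] += 1   # one count per sliding window
    return t

doc = "Text representation: from vector to tensor"
t = char_tensor(doc)
print(t.shape)        # (27, 27, 27)
print(t[20, 5, 24])   # count of "tex" -> 1.0
# A corpus becomes a 4-order tensor by stacking the per-document tensors:
# D = np.stack([t1, t2, ...], axis=-1)
```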

Figure 1. A document is represented as a character-level 3-order tensor.

Figure 2. A corpus of documents is represented as a 4-order tensor.

3. HOSVD Algorithm

VSM represents a group of objects as a "term by object" matrix and uses the SVD technique to decompose the matrix as $D = U_1 \Sigma U_2^T$, which is the essential technique behind PCA and LSI. Similarly, a tensor $\mathcal{D}$ in TSM undergoes the Higher-Order SVD, which is an extension of the matrix SVD. This process is illustrated in Figure 3 for the case N = 3.

Figure 3. Illustration of a Higher-Order SVD (N = 3).

The HOSVD algorithm based on TSM is described as follows:

Step 1. Represent a group of documents as a character-level tensor $\mathcal{D} = \{d_{i_1 i_2 \cdots i_{N+1}}\} \in \mathbb{R}^{27 \times 27 \times \cdots \times 27 \times m}$, where m is the number of documents.

Step 2. For $n = 1, \ldots, N$, compute the matrix $U_n$ by performing the SVD of the flattened matrix $D_{(n)}$, where $U_n$ is the left matrix in the SVD result.

Step 3. Compute the core tensor as $\mathcal{Z} = \mathcal{D} \times_1 U_1^T \times_2 U_2^T \cdots \times_N U_N^T$, where the matrix $U_n$ contains the orthogonal vectors spanning the column space of $D_{(n)}$. Here $D_{(n)}$ is the matrix unfolding of the tensor, i.e., the matrix representation of the tensor in which all the mode-n column vectors are ranked sequentially.
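The three steps can be sketched in Python/NumPy as follows. This is a minimal illustration under our own conventions, not the authors' implementation: `unfold`, `fold`, and `mode_dot` are hypothetical helper names, and the optional `ranks` argument truncates each $U_n$ to realize the dimensionality reduction used in Section 4.

```python
import numpy as np

def unfold(tensor: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n unfolding D_(n): the mode-n fibers become the columns."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def fold(matrix: np.ndarray, mode: int, shape) -> np.ndarray:
    """Inverse of unfold, for a tensor of the given target shape."""
    rest = [s for i, s in enumerate(shape) if i != mode]
    return np.moveaxis(matrix.reshape([shape[mode]] + rest), 0, mode)

def mode_dot(tensor: np.ndarray, matrix: np.ndarray, mode: int) -> np.ndarray:
    """Mode-n product: multiply `matrix` into the n-th mode of `tensor`."""
    shape = list(tensor.shape)
    shape[mode] = matrix.shape[0]
    return fold(matrix @ unfold(tensor, mode), mode, shape)

def hosvd(D: np.ndarray, ranks=None):
    """Steps 2-3: SVD each character-mode unfolding (the last, document
    mode is left untouched, matching n = 1..N in Step 2), then form the
    core tensor Z = D x1 U1^T x2 U2^T ... xN UN^T."""
    Us = []
    for n in range(D.ndim - 1):                          # Step 2
        U, _, _ = np.linalg.svd(unfold(D, n), full_matrices=False)
        if ranks is not None:
            U = U[:, :ranks[n]]                          # truncate for reduction
        Us.append(U)
    Z = D
    for n, U in enumerate(Us):                           # Step 3
        Z = mode_dot(Z, U.T, n)
    return Z, Us

# Example: reduce a small random 27 x 27 x 27 x m corpus tensor to 12^3 x m.
D = np.random.rand(27, 27, 27, 8)
Z, Us = hosvd(D, ranks=[12, 12, 12])
print(Z.shape)   # (12, 12, 12, 8)
```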

4. Experiments

4.1. Experiment Setup

We have conducted experiments to compare the proposed TSM with VSM on the 20 Newsgroups dataset, which has become a popular dataset for experiments in text applications of machine learning techniques. In this paper, we select the five classes of the 20 Newsgroups collection about computer science, which are very closely related to each other.

The widely used performance measurements for text categorization problems are Precision, Recall and Micro F1 [2]. Precision is the ratio of correctly categorized data over all data assigned to a category; Recall is the ratio of correctly categorized data over all testing data belonging to that category. Micro F1 is a common measure in text categorization that combines recall and precision into a single score according to the following formula:

$$\text{Micro F1} = \frac{2 P \times R}{P + R}$$

where P is the Precision and R is the Recall [2].
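As a small illustration (ours, not the paper's), micro-averaging pools the per-class decision counts before computing P, R, and F1; the function and variable names below are our own.

```python
def micro_f1(tp: list[int], fp: list[int], fn: list[int]) -> float:
    """Micro-averaged F1: pool true positives, false positives, and false
    negatives across all classes, then apply F1 = 2PR / (P + R)."""
    TP, FP, FN = sum(tp), sum(fp), sum(fn)
    p = TP / (TP + FP)   # precision: correct / all assigned
    r = TP / (TP + FN)   # recall: correct / all that should be assigned
    return 2 * p * r / (p + r)

# Example: three classes with per-class (tp, fp, fn) counts.
print(micro_f1(tp=[50, 30, 20], fp=[5, 10, 5], fn=[10, 5, 5]))
```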

4.2. Experiment Results

Each text document of this dataset is mapped into a 131,072-dimension vector under VSM. We use a 4-order tensor to represent each document under TSM; the original dimension of each tensor is 27^4 = 531,441. (Here and in the figures, x^y denotes x raised to the power y.) We use the HOSVD technique to reduce the tensor to different dimensions (26^4, 20^4 and 10^4). In Figure 4, we report the results of VSM in contrast to TSM at different reduced dimensions.

It can be seen that the result of the 4-order tensor with the original dimension (27^4) is better than VSM, and the result of the 4-order tensor whose dimension is reduced to 26^4 by HOSVD is the best of all. This shows that HOSVD can find the principal components and remove the noise. Although the performance of the 10^4-dimension reduced tensor is much lower than that of VSM and of the original 4-order TSM, the result is acceptable, and this low-dimensional representation makes the computation more efficient.

Figure 4. Text categorization with VSM and TSM on 20 Newsgroups (Micro F1 for VSM(131,072), TSM(27^4), TSM(26^4), TSM(20^4) and TSM(10^4)).

We do not reduce the dimension of the 20 Newsgroups data by SVD under VSM, since decomposing the huge term-by-document matrix of 20 Newsgroups is hard due to its high time and space complexity. To compare VSM with SVD against TSM with HOSVD, we randomly sampled a subset of the 20 Newsgroups at a ratio of about 5%, such that the data dimension is about 8,000 under VSM re-indexing. The subset contains 230 documents in two classes, 165 for training and 65 for testing. By doing so, we can perform the matrix SVD on this sampled data. Figure 5 shows the results.

Figure 5. Text categorization on a subset of 20 Newsgroups (Micro F1 for VSM(8,192), VSM(125), VSM(64), TSM(27^3), TSM(12^3), TSM(5^3) and TSM(4^3)).

It can be seen that if we reduce the data under VSM and the data under TSM to the same dimension (125 versus 5^3, 64 versus 4^3, etc.), the reduced TSM data always outperform their VSM counterparts. Moreover, the dimension of the reduced data under VSM cannot be larger than 165, since the number of documents (smaller than the term dimension) determines the rank of the "term by document" matrix, and this rank allows at most 165 singular vectors. On the contrary, HOSVD under TSM can reduce the data to any dimension and is not limited by the number of samples. Figure 5 shows that the 12 × 12 × 12 reduced tensors achieve better performance than all the others, while SVD under VSM cannot reduce the data to such a dimension.

5. Conclusion and Future Work

In this paper, we propose to use the Tensor Space Model to represent text documents. By using TSM and HOSVD, underlying latent structure in documents can be found. Theoretical analysis and experimental results show that the proposed TSM keeps the merits of VSM while improving on several of its disadvantages for certain IR problems.

The TSM proposed in this paper implies many new tasks to be done. For instance, the design of tensor kernels for better similarity measurement, testing the performance of TSM on non-language datasets, customizing and applying techniques originally designed for the traditional VSM to TSM, and investigating and applying more multilinear algebra theorems to increase the performance of IR under TSM are all on our future work agenda.

6. References

[1] Aslam, J., Belkin, N., Zhai, C., Callan, J., Hiemstra, D., Hofmann, T., Dumais, S., Harper, D.J., et al. Challenges in Information Retrieval and Language Modeling: Report of a Workshop Held at the Center for Intelligent Information Retrieval, University of Massachusetts Amherst, 2001.
[2] Baeza-Yates, R. and Ribeiro-Neto, B. Modern Information Retrieval. Addison-Wesley, 1999.
[3] Cavnar, W.B. and Trenkle, J.M. N-Gram-Based Text Categorization. In Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, 1994, 161-169.
[4] Croft, W.B. and Lafferty, J. Language Modeling for Information Retrieval. Kluwer Academic, 2003.
[5] Gerard, S. and Chris, B. Term Weighting Approaches in Automatic Text Retrieval. Technical Report TR87-881, Department of Computer Science, Cornell University, 1987.
[6] Jolliffe, I.T. Principal Component Analysis. Springer-Verlag, New York, 1986.
[7] Lang, K. NewsWeeder: Learning to Filter Netnews. In Proceedings of the 12th International Conference on Machine Learning (ICML 1995), 331-339.
[8] Lathauwer, L.D., Moor, B.D. and Vandewalle, J. A Multilinear Singular Value Decomposition. SIAM Journal on Matrix Analysis and Applications, 21, 1253-1278.
[9] Wrede, R.C. Introduction to Vector and Tensor Analysis. Wiley, 1963.

