Topic Classification and Sentiment Analysis For Vietnamese Education Survey System



Article in Asian Journal of Computer Science and Information Technology · May 2016

DOI: 10.15520/ajcsit.v6i3.44.g31


Asian Journal of Computer Science And Information Technology 6: 3, May (2016) 27 -34.

Contents lists available at www.innovativejournal.in

Asian Journal of Computer Science And Information Technology

Journal Homepage: http://innovativejournal.in/ajcsit/index.php/ajcsit

TOPIC CLASSIFICATION AND SENTIMENT ANALYSIS FOR VIETNAMESE

EDUCATION SURVEY SYSTEM
Hung T. Vo, Hai C. Lam, Duc Dung Nguyen, Nguyen Huynh Tuong
Faculty of Computer Science and Engineering
Ho Chi Minh city University of Technology, VNU-HCM
268 Ly Thuong Kiet St., Dist. 10, Ho Chi Minh city, Vietnam
[email protected]

ARTICLE INFO

Corresponding Author:
Nguyen Huynh Tuong
Faculty of Computer Science and Engineering
Ho Chi Minh city University of Technology, VNU-HCM
268 Ly Thuong Kiet St., Dist. 10, Ho Chi Minh city, Vietnam
[email protected]

Keywords: Topic Classification; Education Survey; Machine Learning; Sentiment Analysis.

DOI: http://dx.doi.org/10.15520/ajcsit.v6i3.44
©2016, AJCSIT, All Right Reserved.

ABSTRACT

Collecting surveys and feedback and analyzing them for useful information plays an important role in many fields such as business, marketing, and management. In education, this analysis is key to improving teaching quality and the management process. We are interested in the comments on students that are collected in surveys. To evaluate the progress of students, surveys are collected from related persons, including lecturers, teaching assistants, and companies. Currently, the comments on students are processed manually. In this work, we propose a method for classifying the topic and analyzing the sentiment of these comments. We employ machine learning techniques to solve these two sub-problems. We introduce a new concept called Bag-of-Structure (BoS) for the sentiment mining and classification process. The experimental results demonstrate that our proposed method yields useful information for making decisions to improve the current education system.

1. INTRODUCTION
Improving education quality is a critical task for any qualified university. In our faculty, senior students are offered internship positions in companies in order to gain more work experience. The company environment helps students understand and apply the knowledge learned at university and improve their skills. The companies send feedback on students in the form of comments in our surveys. The collected comments help us improve the quality of our education program and narrow the gap between the academic environment and industry. Analyzing these data, however, is a nontrivial task.

Since a huge number of documents needs to be processed, automatically categorizing and summarizing these documents makes the analysis easier. Unfortunately, document processing algorithms are still far from extracting information as well as humans do. In order to extract sentiment from comments, we must exploit textual properties and analyze the language structure, which is a challenge even for current state-of-the-art methods [21]. Recently, Sentiment Analysis (SA) in Natural Language Processing (NLP) has gained more attention in the community. Previous research mostly discusses identifying the polarity and subjectivity of a text document or a sentence. In practice, people's opinions are mixed or vague. Thus, it is hard to measure these opinions using simple scales, such as positive, negative, or neutral [6].

SA, also known as opinion mining, is one of the most active topics in text mining. The goal of SA is to extract the sentiment of a given comment. This goal is achieved in two phases. In the first phase, the algorithm checks whether the document or sentence is subjective or objective. The second phase focuses on extracting sentiment from the document. Text classification, on the other hand, focuses on assigning a predefined label to a given sentence or document. It is also called text categorization and is widely used in NLP.

Related works. In an education survey system, we need to extract the topic of each statement and perform sentiment analysis on the comments. Many text classification methods can be used to extract the statement topic. These techniques usually follow either a statistical approach or a natural language processing approach.

At present, there are many text classification methods based on statistics. Some classic techniques in this field are Neural Networks (Li et al. [20]) and Support Vector Machines (Joachims [14]). Using another approach, Suli Zhang et al. applied the Mahalanobis distance to K Nearest Neighbors (KNN), and

Author(s) agree that this article remain permanently open access under the terms of the
Creative Commons Attribution License 4.0 International License Page 27
Tuong / Topic classification and sentiment analysis for Vietnamese education survey system

proposed MDKNN to improve KNN for text classification [27]. In the work of Huang et al. [13], the authors apply a collected word list to improve Bayesian classification. In this method, they combine predefined and extracted word lists by topic to obtain feature terms for classification. Other studies employ genetic algorithms [24, 7] or the Naive Bayes technique, as in the work of Altheneyan [3].

Other approaches use natural language processing for the text classification problem. Lexical dependency and pruning were proposed by Ozgur [22]. In that study, lexical dependency was embedded in the feature vector alongside the standard Bag-of-Words (BoW) approach. In [16], a comparison of word-based and sense-based approaches shows no big difference between the two. For Vietnamese, there are not many works in this field. One notable study in Vietnamese text classification is the work of Hoang et al. [11], who used BoW and Statistical N-Gram Language Modeling (N-Gram) approaches.

Sentiment analysis is the second key component of an education survey system. Sentiment analysis is widely used in various applications. In [5], C. Bucur proposed a platform for extracting and summarizing opinions in the tourism business. Another work comes from E. D'Avanzo et al. [8], which focuses on social network mining to obtain user opinions for customer-care services. There are three approaches to sentiment analysis: knowledge-based techniques, statistical methods, and hybrid approaches. Knowledge-based techniques use affect words such as happy, sad, afraid, and bored to determine statement categories. In this case, a dictionary is needed, for example WordNet-Affect or SentiWordNet. Khan et al. [17] used SentiWordNet for polarity detection. Jibran et al. [25] proposed a model for aspect-based opinion mining and used SentiWordNet to get the opinion of words. In another work, Khalid et al. [1] introduced BiSAL, a bilingual sentiment analysis lexicon for the cyber security domain.

Statistical methods for sentiment analysis based on machine learning include SVM and KNN. In [19], Youngjoong et al. studied term weighting schemes and applied them in machine learning for sentiment analysis. In [26], dependency trees and named entities were extracted for the sentiment classifier. Hybrid approaches, on the other hand, combine machine learning with knowledge representation such as ontologies and semantic networks. In [18], Khan et al. proposed a Twitter opinion mining framework using a hybrid classification scheme. In that research, several lexical steps, including the Enhanced Emoticon Classifier, the Improved Polarity Classifier, and the SentiWordNet Classifier, produce the input for the machine learning algorithm, which is then applied to the tweet classification problem.

BoW is a general model for representing a document in Natural Language Processing (NLP). In this representation, each word collected from the document(s) forms one dimension of the feature vector. A text (document or sentence) is represented as the bag of its words, disregarding grammar and even word order but keeping multiplicity [23]. BoW records which words occur in the corpus but ignores the structure of the document. Although this model has been successful in several applications, it can be improved for sentiment analysis.

The following section presents the problem discussed in our research. Section 3 describes our proposed model and the details of the algorithms. The experiments are discussed in Section 4. Finally, Section 5 provides the conclusion and a discussion of future work.

2. Mining from education surveys

Continuously improving quality is the key to success in any education system. In order to perform this task, we have to collect information from various sources. The available information includes course grades, thesis quality, the number of students getting a job after graduation, etc. Extra information collected from companies, organizations, and students is extremely useful in the mining process. Education quality can be evaluated in many ways. Besides the transcripts of students, the comments in surveys from other parties are valuable responses for our education system. The comments contain a lot of valuable information, but they are hard to analyze due to the complexity of natural language. Automatically analyzing these comments is a non-trivial task that requires expert knowledge and state-of-the-art machine learning techniques. In this work, we focus on classifying and extracting opinions from these documents.

In natural language, we may express our opinion about a subject in various ways. One may use long sentences, or sometimes sentences with complex structure, to present ideas on different problems. In order to reduce the complexity, we restrict our problem to analyzing single sentences, i.e. sentences with a simple and well-defined structure. For each sentence, the following information needs to be extracted:
• The skill of the student/engineer mentioned in the sentence.
• The problem addressed in the sentence.
• The opinion implied in the sentence.
• Whether the opinion is positive or negative, and which level we assign to it.

This is done by solving two sub-problems: topic classification and sentiment analysis. In this work, we focus on solving these two problems and analyzing the results to improve the quality of our education system.

Unlike detailed reports, the comments on students' skills are often restricted to some specific topics that we should be able to classify. For computer science students in our faculty, the skills mentioned most often are programming, design, communication, presentation, foreign language, etc. Thus, a finite set of skills is defined for classifying the comments. We propose a method based on machine learning which uses a dictionary to recognize the topic of a given sentence.

For sentiment analysis, we are concerned with positive and negative opinions in the comments. We are also interested in the rank of those opinions. The rank can be evaluated on a 1-5 scale, a 1-10 scale, an alphabetic scale, etc. In this case, we select the 1-5 scale to rank the opinions extracted from a comment. The rank is defined as follows: 5-excellent, 4-good, 3-normal, 2-bad, 1-too bad.

3. Proposed methods

3.1 System model

Figure 1 describes the proposed system with two main modules that can operate independently in parallel:
• Text classification module: the mission of this module is to classify the input into classes. In this case, it gives information about the topic mentioned in the sentence.
• Sentiment analysis module: this module detects the opinion in a given sentence (e.g., good, bad, etc.).

Figure 1: Structure of the proposed system
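
The per-sentence output described in Section 2 (a skill, a problem, an opinion, and a rank on the 1-5 scale) can be pictured as a small record that combines the results of the two modules. The sketch below is illustrative only: the class name `SentenceAnalysis`, its field names, and the rule that maps a rank to a polarity are assumptions introduced here, not part of the proposed system. Only the five rank labels are taken from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Rank labels as defined in Section 2 (1-5 scale).
RANK_LABELS = {5: "excellent", 4: "good", 3: "normal", 2: "bad", 1: "too bad"}


@dataclass(frozen=True)
class SentenceAnalysis:
    """Hypothetical container for what is extracted from one sentence."""
    skill: str                      # topic, e.g. "programming" (topic module)
    rank: int                       # opinion rank on the 1-5 scale (sentiment module)
    problem: Optional[str] = None   # problem addressed in the sentence, if any
    opinion: Optional[str] = None   # opinion phrase found in the sentence, if any

    def __post_init__(self) -> None:
        # Reject ranks outside the 1-5 scale.
        if self.rank not in RANK_LABELS:
            raise ValueError(f"rank must be between 1 and 5, got {self.rank}")

    @property
    def rank_label(self) -> str:
        return RANK_LABELS[self.rank]

    @property
    def polarity(self) -> str:
        # Assumption: ranks above 3 count as positive, below 3 as negative.
        if self.rank > 3:
            return "positive"
        if self.rank < 3:
            return "negative"
        return "neutral"


example = SentenceAnalysis(skill="programming", rank=2, opinion="cần cố gắng thêm")
print(example.skill, example.rank_label, example.polarity)
```

Because the two modules run independently, such a record could be filled in two steps, with `skill` coming from the topic classifier and `rank` from the sentiment classifier.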

These two modules perform different training steps. The training processes are represented by the rectangles surrounded by dotted lines in Figure 1. The main technique depends on the way the sentence structure is built; the details are presented in section 3.2. A sentence structure may contain one or many atomic structures (e.g. in a compound sentence). An atomic structure includes a set of objects and their relationships.

In figure 1, the corpus is preprocessed at each training step. The BoS module is used to build the structures of comments. After the structures are built, we use the vectors of all comments in the training module to build models for topic classification and sentiment analysis. Both learned models are later used for comment analysis. In the predicting step, comments are the inputs of both the topic classification and the sentiment analysis systems. These comments are preprocessed before we pass them to the predicting module, which uses the trained models. We propose a structure model, named Bag-of-Structures (BoS), to collect the atomic structures in a document. The occurrence of each structure is used as a feature for training a classifier (as presented in section 3.3).

3.2 Building structure of comment

Many studies in the literature use BoW (Bag-of-Words) to represent a document as a vector in NLP [2, 4, 15]. BoW observes only the frequency of words in a document. In natural language, however, word order and structure play important roles in understanding the meaning of a sentence.

For example, consider two sentences: “Nam tham gia vào các hoạt động học tập, không vi phạm quy chế” (Nam participates in course activities and does not violate regulations) and “Nam không tham gia vào các hoạt động học tập, vi phạm quy chế” (Nam does not participate in course activities and violates regulations). They consist of the same set of words but have different meanings. The list of words and the number of occurrences in the two sentences are the same (["Nam":1, "tham gia":1, "vào":1, "các":1, "hoạt động":1, "học tập":1, "không":1, "vi phạm":1, "quy chế":1]). However, the first sentence expresses a good opinion, while the second expresses a bad one.

People express opinions in many ways, such as description, comparison, and metaphor. Our approach addresses the descriptive way first; the other two are not considered in this study.

Figure 2 gives the phrase structure of “Tiến nên cải thiện khả năng lập trình tốt hơn” (Tien should improve programming skill). The part-of-speech (POS) tags of the words are ["Tiến"/np, "nên"/au, "cải thiện"/v, "khả năng"/n, "lập trình"/n, "tốt"/a, "hơn"/r]. Omitting the noun phrases, the structure of the statement is obtained as in figure 2. In this case, the verb "cải thiện" is the root of the tree, and nên/au and the adjective phrase support this verb. In the subtree, tốt/a is supported by hơn/r.

Figure 2: Structure of a Vietnamese statement: 'Tien should improve programming skill'

Two kinds of phrases are used in our method: adjective phrases and verb phrases.

• An adjective phrase is usually used to describe an opinion. For example, "đẹp" (beautiful), "tốt" (good), "chất lượng cao" (high quality), ...
• A verb phrase is sometimes used to express a strong sentiment. For example: "cần cố gắng thêm" (need to try more), "cần tuân thủ" (should comply).

The general structure of a statement is given in figure 3. Each node in the structure is represented by a tuple <POS, text, Dep>. POS is the part-of-speech, e.g. V, A, Au, ...; text is the word; and Dep contains information about the relation between this node and its parent.

Figure 3: General structure of statement

There are some constraints in our method:
• Each sentence is parsed independently.
• One statement may have one or more separate structures.

After all needed patterns are detected, the normalization phase is performed. It is an important step since many comparisons and searches over the structure must be done in the next step, feature selection.

In the proposed structure, the root of the tree is the adjective declared in the corresponding statement. The left child, if it exists, is always the verb. The middle child refers to the adjunct, and the right child is for another adjunct. Notice that not all sentences have the full structure. In that case, the missing value is null. Figure 4 shows the normalized structure.

Figure 4: Final structure of statement

From the comments in the collected surveys, we observed the following properties based on the structure of the Vietnamese language:
Observation 1. The depth of a tree representing the structure of a simple statement is not greater than three.
Observation 2. The root of a tree representing the structure of a simple statement has at most three children.
These two observations are discussed further in the study of Hieu [10], where the analysis is applied to single sentences.
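
The limitation of BoW noted at the start of this section can be checked directly on the two example sentences about Nam. The sketch below is a minimal illustration, assuming the sentences have already been segmented into the Vietnamese words listed in the text (punctuation dropped). The helper `negated_words` is a hypothetical stand-in that shows one piece of structure BoW discards; it is not the BoS construction itself.

```python
from collections import Counter

# Pre-segmented tokens of the two example sentences (segmentation assumed done).
s1 = ["Nam", "tham gia", "vào", "các", "hoạt động", "học tập",
      "không", "vi phạm", "quy chế"]
s2 = ["Nam", "không", "tham gia", "vào", "các", "hoạt động", "học tập",
      "vi phạm", "quy chế"]

# Bag-of-Words keeps only word counts, so both sentences look identical.
bow1, bow2 = Counter(s1), Counter(s2)
print("same bag of words:", bow1 == bow2)
print("same word order:  ", s1 == s2)


def negated_words(tokens, negation="không"):
    """Return the word that directly follows each negation token."""
    return [tokens[i + 1] for i, tok in enumerate(tokens[:-1]) if tok == negation]


# A simple order-aware feature already separates the two sentences.
print(negated_words(s1))  # the negation attaches to "vi phạm" (violate)
print(negated_words(s2))  # the negation attaches to "tham gia" (participate)
```

The bags are equal while the word that the negation attaches to differs, which is the kind of relational information the proposed structures are meant to capture.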

Figure 5: Some sub-structures extracted from figure 4
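
To make the node tuple <POS, text, Dep> and the idea of collecting sub-structures more concrete, the following sketch builds the tree described for Figure 2 and enumerates its rooted subtrees. Several details are assumptions made for illustration: the `Node` class, the bracketed string encoding, the dependency label names ("aux", "adj", "adv"), and the choice of "one subtree per node" as the set of sub-structures. The text does not specify how sub-structures are enumerated, so this should be read as one plausible scheme rather than the authors' procedure.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    pos: str                    # part-of-speech tag, e.g. "v", "a", "au", "r"
    text: str                   # the word itself
    dep: Optional[str] = None   # relation to the parent (label names are hypothetical)
    children: List["Node"] = field(default_factory=list)


def serialize(node: Node) -> str:
    """Encode the subtree rooted at `node` as a bracketed string."""
    head = f"{node.text}/{node.pos}"
    if not node.children:
        return head
    return head + "(" + " ".join(serialize(c) for c in node.children) + ")"


def sub_structures(root: Node) -> List[str]:
    """List the subtree rooted at every node (assumed enumeration scheme)."""
    found = [serialize(root)]
    for child in root.children:
        found.extend(sub_structures(child))
    return found


# "Tiến nên cải thiện khả năng lập trình tốt hơn" with noun phrases omitted:
# the verb is the root, supported by the auxiliary and the adjective phrase.
tree = Node("v", "cải thiện", children=[
    Node("au", "nên", dep="aux"),
    Node("a", "tốt", dep="adj", children=[Node("r", "hơn", dep="adv")]),
])

bag_of_structures = Counter(sub_structures(tree))
for structure, count in bag_of_structures.items():
    print(count, structure)
```

Each distinct string then plays the role that a word plays in BoW, so the same weighting schemes can be applied to the resulting counts.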

3.3 Feature selection

We propose to use Bag-of-Structure (BoS) in feature selection. The BoS model is used similarly to BoW: all structures are collected from the corpus. From the built structures and the extracted sub-trees (as in figure 5), we can vectorize each comment. For each structure, all sub-structures are extracted. Figure 5 gives some structures extracted from figure 4.

With the list of words (in BoW) or trees (in BoS), TF-IDF is used to build the vector representing each document. TF is short for term frequency, and IDF is short for inverse document frequency. Let N be the number of all categories, and let n be the number of texts which contain term t. Equation 1 gives the IDF value.

$$IDF = \log\left(\frac{N}{n}\right) \qquad (1)$$

Combining TF and IDF, we have equation 2 as follows.

$$TF\text{-}IDF = TF \times \log\left(\frac{N}{n}\right) \qquad (2)$$

Besides TF-IDF, a binary weighting scheme can be used to construct the vector representing a document. The weight is set to one if the word is present in the document, and zero otherwise.

Our proposed method employs BoS to represent a sentence with respect to the relationships between words. Compared to previous methods that only use a collection of words, our method improves the classification results and solves a part of the semantic problem. As shown later, BoS has a great impact on the topic classification and sentiment analysis results. The following section presents our experimental results using the proposed system.

4. Experimental results

4.1 Dataset and evaluation criteria

There are few datasets available for testing our system. In fact, it is difficult to find a public Vietnamese dataset in this domain. Hence, a dataset was built from real data of our department. Our data have been collected over 8 semesters, from 2011 to 2015. The data include comments of companies on students who participated in the internship program for a period of 2-4 months. Up to now, we have 845 sentences ready to use. In our approach, only simple sentences were examined. The dataset is imbalanced between classes. Hence, simple oversampling was used to balance it. In this work, we compare our proposed method with the classic BoW approach.

Evaluation criteria. To evaluate the effectiveness of the proposed method, accuracy, precision, recall, F-measure and root mean squared error (RMSE) were collected.

4.2 Parameters

In order to evaluate the efficiency of the structure building process on comments, we use machine learning techniques. We compare BoS to BoW, where the BoW models are constructed using Binary (BoW-Binary) or TF-IDF (BoW-TFIDF) weighting. Three popular classifiers are used in our experiment: NaiveBayes, KNN (IB), and Support Vector Machine (SMO). The Weka library [9] is used in our experiment. Vietnamese tools such as the word segmenter are provided by [12]. The parameters for each algorithm are as follows:
• NaiveBayes: no parameter
• IB: k = 1, W=0
• SMO: the PolyKernel function was used, all other parameters were default

4.3 Results

Table 1 presents the results of sentiment analysis. We compare BoW-Binary, BoW-TFIDF and the proposed BoS method. According to the result table, our proposed method, which is used for feature selection, produces the best results with all three machine learning models. In relative terms, accuracy increases by 12.02% (from 53.39% to 59.81%) for NaiveBayes, by 11.94% (from 68.88% to 77.11%) for Nearest Neighbors, and by 21.55% (from 64.08% to 77.89%) for SMO. With the SMO algorithm, precision improves by 17.19%, from 0.657 with BoW to 0.77 with BoS. With IB (k = 1), the BoS method improves precision by 12.28%, from 0.692 to 0.777.
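
The improvements quoted in Section 4.3 are relative gains over the baseline value, not differences in percentage points. The short sketch below recomputes them from the reported before/after values; the function name `relative_gain` is introduced here for illustration only. The recomputed figures agree with the quoted ones to within about 0.01, a small difference that presumably comes from rounding of the underlying values.

```python
def relative_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


# (before, after) pairs as reported in Section 4.3.
reported = {
    "NaiveBayes accuracy": (53.39, 59.81),
    "Nearest Neighbors accuracy": (68.88, 77.11),
    "SMO accuracy": (64.08, 77.89),
    "SMO precision": (0.657, 0.77),
    "IB (k = 1) precision": (0.692, 0.777),
}

for name, (before, after) in reported.items():
    print(f"{name}: {relative_gain(before, after):.2f}% relative gain")
```

For comparison, the absolute accuracy gain for SMO is 77.89 - 64.08 = 13.81 percentage points.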
Table 1: Sentiment analysis result
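
The two weighting schemes of Section 3.3 (TF-IDF from equations 1 and 2, and binary weighting) can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the Weka implementation used in the experiments: TF is taken as the raw count of a term in the text, the logarithm is natural, and N is taken as the total number of texts in the corpus, which is the usual definition of IDF. A term may be a word (BoW) or a serialized structure (BoS); the tiny corpus below is made up for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


def idf(term: str, corpus: Sequence[Sequence[str]]) -> float:
    """Equation 1: log(N / n), with n the number of texts containing `term`."""
    n = sum(1 for doc in corpus if term in doc)
    if n == 0:
        return 0.0  # term unseen in the corpus: give it no weight
    return math.log(len(corpus) / n)


def tfidf_vector(doc: Sequence[str], corpus: Sequence[Sequence[str]],
                 vocab: Sequence[str]) -> List[float]:
    """Equation 2: TF * log(N / n) for every vocabulary term."""
    tf = Counter(doc)
    return [tf[term] * idf(term, corpus) for term in vocab]


def binary_vector(doc: Sequence[str], vocab: Sequence[str]) -> List[int]:
    """Binary weighting: 1 if the term is present in the text, 0 otherwise."""
    present = set(doc)
    return [1 if term in present else 0 for term in vocab]


# Illustrative corpus of three pre-segmented comments (not from the dataset).
corpus = [["tốt", "lập trình"], ["tốt", "giao tiếp"], ["cần cố gắng thêm"]]
vocab = ["tốt", "lập trình", "giao tiếp", "cần cố gắng thêm"]

print(tfidf_vector(corpus[0], corpus, vocab))
print(binary_vector(corpus[0], vocab))
```

In this example "tốt" occurs in two of the three texts and so receives a lower weight than "lập trình", which occurs in only one.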

Figure 6 illustrates the classifier errors. The X-axis shows the original class, and the Y-axis shows the predicted class. We can see that in the first class, 100% of the results match the original data. Classes 3 and 4 have many errors. The precision of class 4 is the lowest, only 0.612, and for class 3 it is 0.629. In fact, rating the comments is sometimes difficult and inaccurate: for the same comment content, there may be more than one rating. Observing the error points, we can see that most errors lie close to the ground truth.
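
The per-class quantities discussed around Figure 6 can be computed from paired lists of original and predicted ranks. The sketch below is a generic illustration: the function names and the toy labels are made up and are not the paper's data. RMSE is left out on purpose, since how it is computed for nominal classes depends on the evaluation tool. The last helper counts how far each wrong prediction lies from the true rank, which is one way to check the remark that most errors fall close to the ground truth.

```python
from collections import Counter
from typing import Dict, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of sentences whose predicted rank equals the original rank."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_scores(y_true: Sequence[int],
                     y_pred: Sequence[int]) -> Dict[int, Dict[str, float]]:
    """Precision, recall and F-measure for every class that occurs."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f_measure = (2 * precision * recall / (precision + recall)
                     if precision + recall else 0.0)
        scores[c] = {"precision": precision, "recall": recall,
                     "f_measure": f_measure}
    return scores


def error_distances(y_true: Sequence[int], y_pred: Sequence[int]) -> Counter:
    """Histogram of |original - predicted| over the misclassified sentences."""
    return Counter(abs(t - p) for t, p in zip(y_true, y_pred) if t != p)


# Toy labels on the 1-5 scale (illustrative only).
y_true = [1, 1, 3, 3, 3, 4, 4, 4, 5, 5]
y_pred = [1, 1, 3, 4, 3, 4, 3, 4, 5, 4]

print("accuracy:", accuracy(y_true, y_pred))
for rank, s in per_class_scores(y_true, y_pred).items():
    print(rank, {k: round(v, 3) for k, v in s.items()})
print("error distances:", dict(error_distances(y_true, y_pred)))
```

In the toy data every wrong prediction is off by exactly one rank, the pattern the text describes for classes 3 and 4.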

Table 2: Topic classification result
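
Section 4.1 states that simple oversampling was used to balance the classes but does not describe the procedure. One common reading, assumed here, is random duplication of minority-class samples until every class is as large as the largest one. The function name `oversample` and the seed parameter are introduced for this sketch only.

```python
import random
from collections import Counter, defaultdict
from typing import List, Sequence, Tuple


def oversample(samples: Sequence[str], labels: Sequence[int],
               seed: int = 0) -> Tuple[List[str], List[int]]:
    """Duplicate random minority-class samples until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)

    target = max(len(items) for items in by_label.values())
    out_samples: List[str] = []
    out_labels: List[int] = []
    for label, items in by_label.items():
        # Keep every original sample, then add random duplicates up to `target`.
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Toy data: rank 4 has three comments, ranks 2 and 5 have one each.
comments = ["c1", "c2", "c3", "c4", "c5"]
ranks = [4, 4, 4, 2, 5]
balanced_comments, balanced_ranks = oversample(comments, ranks)
print(Counter(balanced_ranks))
```

When such a step is combined with a train/test split, it is generally applied to the training portion only, so that duplicated comments do not appear on both sides of the split.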


Figure 6: Classifier error of SMO on BoS-TFIDF

Table 2 shows the results of topic classification over the three algorithms. The highest accuracy belongs to the SMO algorithm with 87.95%, and IB with k = 1 gives the best RMSE with a value of 0.1764. Across these three machine learning algorithms, the BoS model always yields the best result, with an accuracy gain of around 10%. In the best case, with the SMO algorithm, the BoS method produces an accuracy up to 17% higher than the other methods.

As demonstrated in the experiments, using BoS for feature selection has a significant impact on the performance of our proposed algorithms for both topic classification and sentiment analysis.

5. CONCLUSION

The experimental results show that our proposed BoS model can help improve the quality of topic classification on comments and obtain reliable results in sentiment analysis. The results of BoS are often about 10% higher than those of BoW, and in some cases better by 20%. Our proposed method focuses on adjective phrases and verb phrases to extract the structure of a comment. From the analyzed results, it can be seen that the classification quality and the robustness of sentiment analysis are much better than with BoW. In addition, we have built the dependency structure of adjective phrases and verb phrases from comments in the Vietnamese education system. Our proposed method is especially useful in the Vietnamese context, where the language structures are complex and unclear.

BoS, however, is not limited to this application and can be applied to various fields in the future. We also plan to enrich the sentiment dictionary for our system. Our future work will also focus on advanced methods for feature selection in order to obtain better analysis results.

ACKNOWLEDGMENT

This research is funded by Ho Chi Minh city University of Technology - Vietnam National University under grant number T-KHMT-2014-36

REFERENCES

[1] Khalid Al-Rowaily, Muhammad Abulaish, Nur Al-Hasan Haldar, and Majed Al-Rubaian. BiSAL - A bilingual sentiment analysis lexicon to analyze Dark Web forums for cyber security. Digital Investigation, 14:53–62, 2015.
[2] Mayy M. Al-Tahrawi and Sumaya N. Al-Khatib. Arabic Text Classification Using Polynomial Networks. Journal of King Saud University - Computer and Information Sciences, 27(4):437–449, 2015.
[3] Alaa Saleh Altheneyan and Mohamed El Bachir Menai. Naïve Bayes classifiers for authorship attribution of Arabic texts. Journal of King Saud University - Computer and Information Sciences, 26(4):473–484, 2014.
[4] Berna Altinel, Banu Diri, and Murat Can Ganiz. A novel semantic smoothing kernel for text classification with class-based weighting. Knowledge-Based Systems, 89(July):265–277, 2015.
[5] Cristian Bucur. Using Opinion Mining Techniques in Tourism. Procedia Economics and Finance, 23(October 2014):1666–1673, 2015.
[6] Wang D. and Liu Y. Opinion summarization on spontaneous conversations. Computer Speech and Language, 34:61–82, 2015.
[7] Saad M. Darwish, Adel A. EL-Zoghabi, and Doaa B. Ebaid. A Novel System for Document Classification Using Genetic Programming. Journal of Advances in Information Technology, 6(4):194–200, 2015.
[8] Ernesto D’Avanzo and Giovanni Pilato. Mining social network users opinions’ to aid buyers’ shopping decisions. Computers in Human Behavior, 51:1284–1294, 2015.
[9] Mark Hall, Eibe Frank, Geoffrey Holmes, Bernhard Pfahringer, Peter Reutemann, and Ian H. Witten. The WEKA data mining software: an update. SIGKDD Explorations, 11(1):10–18, 2009.
[10] Nguyen C. Hieu. Mo hinh khai thac dac tinh ngon ngu dich nham xac dinh cac cum danh tu co so tuong ung Anh Viet. PhD thesis, Hochiminh city university of Technology.
[11] Vu Cong Duy Hoang, Dien Dinh, Nguyen Le Nguyen, and Hung Quoc Ngo. A comparative study on vietnamese text classification methods. In RIVF, pages 267–273. IEEE, 2007.
[12] Le Hong Phuong, Nguyen Thi Minh Huyen, Azim Roussanaly, and Ho Tuong Vinh. A Hybrid Approach to Word Segmentation of Vietnamese Texts. In Language and Automata Theory and Applications: Second International Conference, LATA 2008, Tarragona, Spain, March 13-19, 2008, Revised Papers, pages 240–249. Springer Berlin Heidelberg, Berlin, Heidelberg, 2008.
[13] Huan Huang, Qingtang Liu, Linjing Wu, Tao Huang, and Shuai Yuan. The Application Research of Topic Word List In Text Automatic Classification. 2009 Second International Symposium on Knowledge Acquisition and Modeling, pages 111–114, 2009.
[14] Thorsten Joachims. Text categorization with support vector machines: Learning with many relevant features. In Proceedings of the 10th European Conference on Machine Learning, ECML ’98, pages 137–142, London, UK, 1998. Springer-Verlag.
[15] Manika Kar, Sergio Nunes, and Cristina Ribeiro. Summarization of changes in dynamic text collections using Latent Dirichlet Allocation model. Information Processing & Management, 2015.
[16] Athanasios Kehagias, Vassilios Petridis, Vassilis G Kaburlasos, and Pavlina Fragkou. A Comparison of Word- and Sense-Based Text Categorization Using Several. Journal of Intelligent Information Systems, 21(3):227–247, 2003.
[17] Farhan Hassan Khan, Usman Qamar, and Saba Bashir. SentiMI: Introducing point-wise mutual information with SentiWordNet to improve sentiment polarity detection. Applied Soft Computing Journal, 39:140–153, 2016.
[18] Farhan Hassan Khan, Saba Bashir, and Usman Qamar. TOM: Twitter opinion mining framework using hybrid classification scheme. Decision Support Systems, 57, 2014.
[19] Y. Ko. A study of term weighting schemes using class information for text classification. Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval - SIGIR ’12, page 1029, 2012.
[20] Wei Li, B. Lee, F. Krausz, and K. Sahin. Text classification by a neural network. In Proceedings of the 23rd Annual Summer Computer Simulation Conference, pages 313–318, Baltimore, US, 1991.
[21] Turan M. and Sommez C. Automatize document topic and subtopic detection with support of a corpus. Procedia - Social and Behavioral Sciences, 177:169–177, 2015.
[22] Levent Özgür and Tunga Güngör. Optimization of dependency and pruning usage in text classification. Pattern Analysis and Applications, 15(1):45–58, 2012.
[23] Josef Sivic and Andrew Zisserman. Efficient visual search of videos cast as text retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(4):591–606, 2009.
[24] Borge Svingen. Using Genetic Programming for Document Classification. Late Breaking Papers at the 1997 Genetic Programming Conference, pages 240–245, 1998.
[25] Muhammad Usman. An Effective Model for Aspect Based Opinion Mining for Social Reviews. (Icdim):49–56, 2015.


[26] Ugan Yasavur, Jorge Travieso, Christine Lisetti, and Naphtali Rishe. Sentiment Analysis Using Dependency Trees and Named-Entities. Flairs ’14, pages 134–139, 2014.
[27] Suli Zhang. A novel text classification based on Mahalanobis distance. 2011 3rd International Conference on Computer Research and Development, pages 156–158, 2011.

How to cite Article: NGUYEN HUYNH TUONG, Hung T. Vo, Hai C. Lam, Duc Dung Nguyen. Topic classification and sentiment analysis for Vietnamese education survey system. Asian Journal of Computer Science and Information Technology, [S.l.], v. 6, n. 3, may. 2016. ISSN 2249-5126. Available at: <http://innovativejournal.in/ajcsit/index.php/ajcsit/article/view/44>. Date accessed: 31 May. 2016. doi:10.15520/ajcsit.v6i3.44.
