CNN vs. LSTM For Turkish Text Classification
Abstract—In this paper, the efficiency of two state-of-the-art text classification techniques, i.e., Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), for Turkish text classification has been investigated. In addition, the effect of the main preprocessing steps, such as tokenization, stop-word elimination, and stemming, has also been studied. Several experiments using the “TTC-3600” dataset were performed, and it has been observed that both CNN and LSTM can efficiently support the Turkish language and can achieve quite good performance. Regarding data preprocessing, the results indicate that preprocessing improves performance; however, for the Turkish language, it is preferable to exclude stemming. Also, in a comparison of feature extraction techniques for processing the Turkish language, Word2Vec outperforms TF-IDF.

Index Terms—Text Classification, Turkish Language, Convolutional Neural Networks, Long Short-Term Memory, Natural Language Processing.

I. INTRODUCTION

In today’s world, the possession of knowledge or information holds an important place for people, companies, and even states. However, extracting this information is an essential yet difficult task. To overcome this problem and to obtain the requested information, information retrieval (IR) systems were developed. In this paper, we investigate two text classification techniques, i.e., Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), for Turkish text classification.

In general, text classification can be defined as the process of assigning predefined categories to text documents. One example is the classification of e-mail messages as spam or not spam. Another example is automatically tagging all incoming news by subject, for example “art”, “football”, or “movies”. Text classification is also one of the most popular study topics in the field of Natural Language Processing (NLP), where it aims to classify tagged texts into the related categories (classes). Nowadays, Naive Bayes, Support Vector Machine [3], Neural Network [4], and K-nearest neighbor [5] are frequently used for text classification. However, the impressive performance achieved by neural networks, especially CNN and LSTM, in fields such as image classification, content-based image retrieval, self-driving cars, and many others has attracted researchers to use such approaches for text processing tasks such as translation and classification.

II. RELATED WORK

In this section, we summarize recent research, developments, and solutions related to text classification. The study of Çelenli et al. aims to develop a centroid-based classifier. In this study, documents are represented by vectors using the Paragraph Vector model (Doc2Vec). The results of their experiments indicate that using the Distributed Bag of Words (DBOW) architecture with five epochs, with classifiers paired with document embedding vectors, obtains the best accuracy. Interestingly, it has also been observed that using more epochs decreases the classification accuracy of Doc2Vec. On the other hand, when only a scarce amount of data is available, Doc2Vec outperforms the SVM classifiers that use tf-idf representations [6].

Another study, by Şahin, compares the use of word2vec with the classical bag of words (BoW) text representation in the classification of seven different categories of Turkish texts. Here, each sample was expressed by a vector that is the average of the vectors of the sample’s words, and SVM was then used as the classifier. The experiments were conducted for different parameter settings of word2vec, and their effect on classification success was examined. The study observed that word2vec, with a best-measured value of 0.92F, is better than the tf-idf weighted BoW method, with a best-measured value of 0.89F [2].

The classification performance of heterogeneous classifier ensembles for the Turkish and English languages was investigated by Kilimci et al. For this purpose, base learners such as multinomial naive Bayes (MNB), support vector machine (SVM), multivariate Bernoulli naive Bayes (MVBN), convolutional neural network (CNN), and random forest (RF) were used. To merge the decisions of these base learners, both majority voting and stacking methods were used. Also, Word2vec and TF-IDF were used for feature representation. When the base learners and the heterogeneous ensemble systems with majority voting and stacking were applied to 8 different datasets represented by TF-IDF or Word2vec, RF and CNN obtained the best results, and stacking outperformed majority voting [7].

Similar to the previous study, in [8], the effect of ensemble models while classifying Turkish texts using some classification algorithms such as naive Bayes (NB), J48 decision tree, K-nearest neighbor (K-NN), and support
B. Text Pre-Processing

In general, text preprocessing is one of the important steps in information retrieval and analysis systems. It basically prepares the text into a more useful, workable, and proper form. The Turkish language belongs to a branch of the Altaic language family, and it is an agglutinative language, in which words are formed and inflected by suffixes. Also, the Turkish language has some specific characteristics, such as:

1) There is no masculine or feminine gender, unlike in the Arabic and German languages.
2) Nouns that come after numbers do not take the plural suffix.
3) There are thickness-thinness and flatness-roundness harmonies in Turkish. According to the first harmony, the vowels in a word are either thick or thin, and according to the second harmony, they are either flat or round.
4) The consonants f, j, and h do not exist in native Turkish words, while they exist in words borrowed from other languages.
5) The number of consonants that can be found at the beginning of a word is limited. These consonants are “b, c, d, g, k, s, t, v, y”.
6) When the consonant c is at the beginning of a word, it changes to the consonant ç.
7) The consonant n occurs at the beginning only of the word for “what” and its derivatives: ne, ne zaman, neden, nasıl, and nerede (what, when, why, how, and where, respectively).
8) The consonant p is found at the beginning of some words

Stemming: Another text preprocessing technique is stemming. Stemming is basically a method that finds the root of words [10]. Several techniques can be used to perform this process. For the Turkish language, the most common algorithm is the Snowball algorithm. This algorithm follows a set of rules derived from Turkish morphology [12]. The rules are:

• Turkish has only one affix type, which is the suffix.
• In Turkish, it is not possible to have a plural suffix after a possessive suffix.
• In Turkish, a suffix can have more than one allomorph in order to preserve sound harmony.
• In Turkish, each vowel indicates a distinct syllable.
• In Turkish, most monosyllabic words are stems.
• In Turkish, if a word has nominal verb suffixes, they come at the end of the word.
• In Turkish, a suffix can be treated as both a noun suffix and a nominal verb suffix [12].

Each different sound structure of a morpheme (although a morpheme comes to mind at first as a word, we can say that it is, in a sense, a fragment of a word) is called an
Authorized licensed use limited to: Istanbul Sabahattin Zaim Univ. Downloaded on November 03,2023 at 13:59:19 UTC from IEEE Xplore. Restrictions apply.
allomorph. Some of the different allomorphs are listed in Table II.

TABLE II: Suffix Allomorphs [12].
Letter   Allomorphs
U        ı, i, u, ü
C        c, ç
A        a, e
D        d, t
I        ı, i

Derivational suffixes create nouns, like the suffixes -tion or -ness in English. The different types of derivational suffixes are shown in Table III.

TABLE III: Derivational Suffixes [12].
No.   Suffix
1     -lUk
2     -CU
3     -CUk
4     -lAş
5     -lA
6     -lAn
7     -CA
8     -lU
9     -sUz

Also, there are some verb suffixes that are used to form tenses. Some of these verb suffixes are shown in Table IV.

TABLE IV: Nominal Verb Suffixes [12].
No.   Suffix
1     -(y)Um
2     -sUn
3     -(y)Uz
4     -sUnUz
5     -lAr
6     -md
7     -n
8     -k
9     -nUz
10    -DUr
11    -cAsInA
12    -(y)DU
13    -(y)sA
14    -(y)mUş
15    -(y)ken

On the other hand, there are some noun suffixes. These suffixes change words and their meanings. Some of the noun suffixes are shown in Table V.

TABLE V: Noun Suffixes [12].
No.   Suffix
1     -lAr
2     -(U)m
3     -(U)mUz
4     -(U)n
5     -(U)nUz
6     -(s)U
7     -lArI
8     -(y)U
9     -nU
10    -(n)Un
11    -(y)A
12    -nA
13    -DA
14    -nDA
15    -DAn
16    -nDAn
17    -(y)lA
18    -ki

Normalization: One of the important processes of text preprocessing is the normalization step. Normalization is a method that transforms a text into a standard form [9]. Normalization is important for text processing, especially in informal writing, where misspellings and abbreviations occur frequently. This process affects the analysis of text dramatically [9], since people generally use the letters “c, g, i, o, u” instead of the Turkish-specific letters “ç, ğ, ı, ö, ü”. This situation may cause problems when classifying such samples. Also, most people do not use vowels when they are texting or posting something. This is another problem that should preferably be solved. There are text normalization methods, such as dictionary mappings, statistical machine translation, and spelling-correction-based approaches, that can be used in such cases [10]. For the Turkish language, there is an open-source library named Zemberek. This library uses a spelling-correction-based approach to check whether a word is written correctly and to suggest corrections for it. In other words, Zemberek uses heuristics, look-up tables, and language models for text normalization [13]. It is worth mentioning that, based on our ongoing experimental work, word correction in general, and the Zemberek tool in particular, can improve the overall performance of Turkish text classification systems by approximately 5%.

C. Feature Extraction

In the processing of texts, the words in the text are categorical and discrete features. It is important to encode such data in order to use it in the preferred algorithms. The process of extracting a list of words from the text and mapping them to a feature set that can be used by a classifier is called text feature extraction. Different types of feature extraction methods are described below.

1) Traditional Methods: Count Vectorization, TF-IDF Vectorizer, and HashingVectorizer are the traditional methods of feature extraction for text classification [15]. In this study, TF-IDF, which is considered the state-of-the-art traditional feature extraction method, is used.

TF-IDF Vectorizer: Term frequency (TF) can be explained as the number of appearances of a word in the related text document [15]. Equation 1 can be used to calculate the term frequency.

$$ TF(w_i) = \frac{O(w_i)}{N} \tag{1} $$

O(w_i) represents the number of occurrences of the ith word, and N is the total number of words in the used vector.

Inverse document frequency (IDF) measures how important a term is. In general, it has been shown that stop words appear in
most texts frequently but have little importance. Hence, in the case of IDF, the highest score is assigned to rare words, and a low score is assigned to frequent words [15]. Inverse document frequency can be calculated using Equation 2.

$$ IDF(w_i) = \log\frac{N}{T} \tag{2} $$

T represents the number of documents that include the ith word. Here, N denotes the total number of documents in the collection, whereas N in Equation 1 counts the words of a single document. To get the overall score, i.e., TF-IDF, IDF is multiplied by TF, as shown in Equation 3 [15].

$$ TF\text{-}IDF(w_i) = TF(w_i) \times IDF(w_i) \tag{3} $$

connected layer generates the output based on the representation (vector) from the previous layers. In other words, each layer produces some features based on the result of the previous layer(s), and the overall structure (model) can learn the feature hierarchy by combining and training all layers. The aim here, starting with the low-level details, is to achieve effective learning up to high-level details [16].
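As a concrete illustration of the TF, IDF, and TF-IDF definitions in Equations 1 to 3, the following is a minimal Python sketch rather than the implementation used in the experiments. The function names and the toy corpus are hypothetical, and N in the IDF step is assumed to be the number of documents. Library implementations such as scikit-learn's `TfidfVectorizer` apply a smoothed IDF by default, so their values will differ from this plain formulation.

```python
import math
from collections import Counter


def term_frequency(tokens):
    """Equation 1: occurrences of each word divided by the document's word count."""
    total = len(tokens)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(tokens).items()}


def inverse_document_frequency(documents):
    """Equation 2: log(N / T), with N documents in total and T containing the word."""
    n_docs = len(documents)
    containing = Counter()
    for tokens in documents:
        containing.update(set(tokens))  # count each word once per document
    return {word: math.log(n_docs / t) for word, t in containing.items()}


def tf_idf(documents):
    """Equation 3: multiply TF by IDF for every word of every document."""
    idf = inverse_document_frequency(documents)
    return [
        {word: tf * idf[word] for word, tf in term_frequency(tokens).items()}
        for tokens in documents
    ]


# Hypothetical toy corpus, already tokenized (not taken from TTC-3600).
docs = [
    ["ekonomi", "haberleri", "ve", "ekonomi", "yorumları"],
    ["spor", "haberleri", "ve", "maç", "sonuçları"],
    ["ekonomi", "ve", "spor"],
]
scores = tf_idf(docs)
```

In this toy corpus, the stop word "ve" occurs in every document, so its IDF is log(3/3) = 0 and its TF-IDF score vanishes, while a word that occurs in a single document, such as "yorumları", receives the largest IDF.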
Experiment 2: The Effects of Pre-Processing the Turkish Text

In the second experiment, the effect of pre-processing on the accuracy of the classification process was measured. As shown in Table VII, in the first step, none of the pre-processing methods were used. In this setting, CNN had 0.896 accuracy and LSTM had 0.8 accuracy. Secondly, the full set of pre-processing steps was used. After applying full pre-processing, CNN had 0.913 accuracy, whereas LSTM had 0.9 accuracy. Finally, pre-processing without stemming was also investigated, and CNN had 0.922 accuracy while LSTM had 0.91 accuracy. From these results, it is clear that pre-processing without stemming allows CNN to have the highest accuracy.

F. Conclusion and Future Work

In this study, various experiments were performed to observe the effect of different text classification steps and methods on the accuracy of Turkish text classification. The TTC-3600 dataset, which contains news collected from six different agencies and news portals and is available online, was used. Also, two popular classifiers, CNN and LSTM, were used. To find the best accuracy rates, different versions of pre-processing and feature extraction methods were used with the mentioned classifiers. First of all, different feature extraction
methods, namely word-level TF-IDF, N-gram-level TF-IDF, character-level TF-IDF, and word2vec, were used. This experiment showed that the word2vec method has the best accuracy rate. After that, the effect of pre-processing was evaluated by calculating the accuracy of the classifiers with pre-processing, without pre-processing, and with pre-processing without stemming; pre-processing without stemming clearly has the best accuracy rate.

As the last step, five iterations were made with CNN and LSTM, using pre-processing without stemming and the word2vec method, to find the best approach. As a result, the average accuracy of CNN is 0.9278, and the average accuracy of LSTM is 0.9294. The accuracies are close, but in terms of execution time, CNN requires almost 1/3 of the time of LSTM.

For future work, our study can be extended by using state-of-the-art word embedding methods like ELMo and XLNet. Also, for the system classifier, other artificial intelligence algorithms such as Support Vector Machine (SVM), Decision Tree, and Bayesian Classifier can be integrated with deep features (i.e., extracting the output of selected layer(s) of the used deep learning model).

REFERENCES

[1] S. Alqaraleh and M. Işık, “Efficient Turkish tweet classification system for crisis response,” Turkish Journal of Electrical Engineering & Computer Sciences, vol. 28, no. 6, pp. 3168–3182, 2020.
[2] G. Sahin, “Turkish document classification based on Word2Vec and SVM classifier,” 2018 3rd International Conference on Computer Science and Engineering (UBMK), Antalya, Turkey, 2018.
[3] F. Sebastiani, “Machine Learning in Automated Text Categorization,” ACM Computing Surveys, pp. 1–47, 2002.
[4] S. N. Sivanandam and S. N. Deepa, “Principles of Soft Computing.”
[5] G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer, “KNN Model Based Approach in Classification,” pp. 986–996, 2003.
[6] H. I. Celenli, S. T. Ozturk, G. Sahin, A. Gerek, and M. C. Ganiz, “Document Embedding Based Supervised Methods for Turkish Text Classification,” 3rd International Conference on Computer Science and Engineering (UBMK), Sarajevo, Bosnia-Herzegovina, 20–23 September 2018, pp. 477–482.
[7] Z. Kilimci and S. Akyokus, “Deep Learning- and Word Embedding-Based Heterogeneous Classifier Ensembles for Text Classification,” Complexity, vol. 2018, pp. 1–10, 2018.
[8] D. Kılınç, “The Effect of Ensemble Learning Models on Turkish Text Classification,” Celal Bayar University Journal of Science, vol. 12, no. 2, 2016.
[9] D. Torunoğlu, E. Çakırman, M. C. Ganiz, S. Akyokuş, and M. Z. Gürbüz, “Analysis of Preprocessing Methods on Classification of Turkish Texts,” INISTA 2011, Istanbul, Türkiye, June 2011.
[10] KDnuggets, “All you need to know about text preprocessing for NLP and Machine Learning,” 2019. [Online]. Available: https://www.kdnuggets.com/2019/04/text-preprocessing-nlp-machine-learning.html [Accessed 8 Nov. 2019].
[11] Eteration, “Türkçe Doğal Dil İşlemede Zemberek,” 2019. [Online]. Available: https://www.turkedebiyati.org/turkcenin-ozellikleri.html
[12] O. Tunçelli, “Turkish Stemmer for Python,” GitHub, 2019. [Online]. Available: https://github.com/otuncelli/turkish-stemmer-python
[13] TH. Tuna, “Turkish Text Normalization,” GitHub, 2019. [Online]. Available: https://github.com/ahmetaa/zemberek-nlp/tree/master/normalization
[14] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge University Press, 2008.
[15] Pedregosa et al., “Scikit-learn: Machine Learning in Python,” JMLR, vol. 12, pp. 2825–2830, 2011.
[16] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, pp. 436–444, 2015.
[17] K. Fukushima, “A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position,” Biol. Cybern., vol. 36, no. 4, pp. 193–202, 1980.
[18] D. H. Hubel and T. N. Wiesel, “Receptive fields and functional architecture of monkey striate cortex,” J. Physiol., vol. 195, no. 1, pp. 215–243, Mar. 1968.
[19] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proc. IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[20] M. Naili, A. Habacha, and H. Ben Ghezala, “Comparative study of word embedding methods in topic segmentation,” Procedia Computer Science, vol. 112, pp. 340–349, 2017, doi: 10.1016/j.procs.2017.08.009.
[21] M. Rhanoui, M. Mikram, S. Yousfi, and S. Barzali, “A CNN-BiLSTM Model for Document-Level Sentiment Analysis,” Machine Learning and Knowledge Extraction, vol. 1, pp. 832–847, 2019, doi: 10.3390/make1030048.
[22] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, pp. 1735–1780, 1997.
[23] T. Mikolov, I. Sutskever, K. Chen, G. Corrado, and J. Dean, “Distributed Representations of Words and Phrases and their Compositionality,” Advances in Neural Information Processing Systems, vol. 26, 2013.
[24] H. B. Doğru, A. Hameed, S. Tilki, and A. Jamil, “Comparative Analysis of Deep Learning and Traditional Machine Learning Models for Turkish Text Classification,” 2021.