
CNN vs. LSTM for Turkish Text Classification

2021 International Conference on INnovations in Intelligent SysTems and Applications (INISTA) | DOI: 10.1109/INISTA52262.2021.9548407

Melih Yayla, Computer Engineering Department, Hasan Kalyoncu University, Gaziantep, Turkey ([email protected])
Mustafa Diyar Demirkol, Electrical-Electronics Engineering Department, Hasan Kalyoncu University, Gaziantep, Turkey ([email protected])
Saed Alqaraleh, Computer Engineering Department, Hasan Kalyoncu University, Gaziantep, Turkey ([email protected])

Abstract—In this paper, the efficiency of two state-of-the-art text classification techniques, i.e., Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), for supporting Turkish text classification has been investigated. In addition, the effect of the main preprocessing steps, such as tokenization, stop word elimination, and stemming, has also been studied. Several experiments using the "TTC-3600" dataset were performed, and it has been observed that both CNN and LSTM can efficiently support the Turkish language and achieve quite good performance. Regarding data preprocessing, the results indicated that such a process improves performance; however, for the Turkish language, it is preferable to exclude stemming. Also, comparing the performance of feature extraction techniques for processing the Turkish language, Word2Vec outperforms TF-IDF.

Index Terms—Text Classification, Turkish Language, Convolutional Neural Networks, Long Short-Term Memory, Natural Language Processing.

I. INTRODUCTION

In today's world, the possession of knowledge or information holds an important place for people, companies, and even states. However, the extraction of this information is an essential and hard task. To overcome this problem and obtain the requested information, information retrieval (IR) systems were developed. In this paper, we investigate two text classification techniques, i.e., Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM), for supporting Turkish text classification.

In general, text classification can be defined as the process of assigning previously declared categories to text documents. Text classification can be exemplified by the classification of e-mail messages as spam or not. Another example is automatically tagging all incoming news on a subject, for example "art", "football", or "movies". Text classification is also one of the most popular study topics in the field of Natural Language Processing (NLP), which aims to classify tagged texts into the related categories (classes). Nowadays, Naive Bayes, Support Vector Machines [3], Neural Networks [4], and K-nearest neighbors [5] are frequently used for text classification. However, the impressive performance achieved by neural networks, especially CNN and LSTM, in fields such as image classification, content-based image retrieval, self-driving cars, and many other fields has attracted researchers to use such approaches for text processing tasks such as translation, classification, etc.

II. RELATED WORK

In this section, we summarize recent research, developments, and solutions related to text classification. The study of Çelenli et al. aims to develop a centroid-based classifier. In this study, documents are represented by vectors using the Paragraph Vector model (Doc2Vec). The results of their experiments indicate that using the Distributed Bag of Words (DBOW) architecture trained for five epochs, paired with classifiers over the document embedding vectors, obtains the best accuracy. Also, interestingly, it has been observed that using more epochs decreases the classification accuracy of Doc2Vec. On the other hand, when the amount of data is scarce, Doc2Vec outperforms the SVM classifiers that use tf-idf representations [6].

Another study was done by Şahin, which compares the use of word2vec for classifying Turkish texts in seven different categories against the classical bag-of-words (BoW) text representation. Here, each sample was expressed by a vector averaging the vectors of the sample's words; then, SVM was used as the classifier. The experiments were conducted for different parameter settings of word2vec, and their effect on classification success was examined. The study observed that the best measured accuracy of word2vec, 0.92 (F-measure), is better than the best measured value of the tf-idf weighted BoW method, 0.89 [2].

The classification performance of heterogeneous classifier ensembles for the Turkish and English languages was investigated by Kilimci et al. For this purpose, base learners such as multinomial naive Bayes (MNB), support vector machine (SVM), multivariate Bernoulli naive Bayes (MVNB), convolutional neural network (CNN), and random forest (RF) were used. Here, to merge the decisions of these base learners, both majority voting and stacking methods were used. Also, Word2vec and TF-IDF were used for feature representation. By applying the base learners and the heterogeneous ensemble systems with majority voting and stacking on 8 different datasets represented by TF-IDF or Word2vec, RF and CNN obtained the best results, and stacking outperformed majority voting [7].

Similar to the previous study, in [8], the effect of ensemble models when classifying Turkish texts with base classifiers such as naive Bayes (NB), the J48 decision tree, K-nearest neighbor (K-NN), and support vector machine (SVM) was investigated for the ensemble learning models.

In this study, the TTC-3600 dataset, which consists of 13 categories such as economy, sport, art, etc., was used. The results of [8] showed that base classifiers with the Boosting and Rotation Forest ensemble models were able to achieve the best accuracy rate.

On the other hand, Torunoglu et al. [9] studied the effect of different preprocessing steps on Turkish text classification. For preprocessing, stemming, stop word filtering, and word weighting steps were applied. For the classification, Naïve Bayes, multinomial Naïve Bayes (mnNB), Support Vector Machines (SVM), and K-Nearest Neighbor were used. According to their results, stemming has the lowest impact on text classification. However, they stated that stemming is more appropriate for information retrieval tasks.
III. MECHANISM OF TEXT CLASSIFICATION

In general, the following components can be considered the main ones of text classification.

A. Text Gathering

This step involves collecting the samples and datasets that can be used for building the classification system and also for investigating its performance. In this study, as we aim to process the Turkish language, the "TTC-3600" Turkish dataset is used. It was constructed from 3600 Turkish news articles, humanly annotated with the following topics: Ekonomi (Economy), Kültür-Sanat (Art and Culture), Sağlık (Health), Siyaset (Politics), Spor (Sport), and Teknoloji (Technology), where each topic has 600 articles.

B. Text Pre-Processing

In general, text preprocessing is one of the important steps in information retrieval and analysis systems. It prepares the text into a more useful, workable, and proper form. The Turkish language belongs to a branch of the Altaic language family, and it is an agglutinative language, in which words are built and inflected by suffixes. Also, the Turkish language has some specific characteristics: 1) There is no masculine or feminine gender feature as in the Arabic and German languages. 2) Nouns that come after numbers do not take the plural suffix. 3) There are thickness-thinness and flatness-roundness vowel harmonies in Turkish. According to the first harmony, the vowels in a word are either all thick or all thin, and according to the second harmony, they are all flat or all round. 4) The consonants f, j, and h do not exist in original Turkish words, while they exist in words borrowed from other languages. 5) The number of consonants that can be found at the beginning of a word is limited. These consonants are "b, c, d, g, k, s, t, v, y". 6) When the consonant c is at the beginning of a word, it changes to the consonant ç. 7) Only "what" and its derivative words begin with the consonant n: what, when, why, how, and where (in Turkish: ne, ne zaman, neden, nasıl, and nerede, respectively). 8) The consonant p at the beginning of some words was obtained by changing "b". 9) The Turkish alphabet has no "x, w, q" letters, whereas it has the letters "ç, ğ, ı, ö, ü", which differ from the English alphabet. 10) Turkish words are read as written [11].

Some of the text preprocessing steps investigated in order in this study are:

Tokenization: The first step of preprocessing is tokenization, i.e., the input text is turned into word tokens [14].

Stop Word Elimination: Stop words can be defined as the most used words in a language. However, most stop words have no meaning by themselves. If these words are eliminated, it becomes easier to work with the most meaningful and semantic words. For stop word elimination, there are several libraries, such as sklearn, that can be used to eliminate such words [10].

Lowercasing: One of the common text preprocessing techniques is lowercasing all characters in the text. This method helps to increase the stability of the outcomes, and it is an appropriate technique for most NLP problems [10]. In other words, as shown in Table I, lowercasing basically creates a standard for the dataset. For example, it assists search engines in creating search indexes in a standard way, which improves effectiveness [10].

TABLE I: Lowercase Example.
Raw | Lowercased
İsTanBuL, İSTANBUL, İsTaNbUl | istanbul
KiTAP, KitAp, KiTaP | kitap
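For illustration, the three steps above can be sketched in a few lines of Python. This is not the exact pipeline used in this study: the stop word list below is a small illustrative subset, and the explicit İ/I mapping handles the Turkish dotted/dotless i, which a plain lower() call gets wrong.

import re

# Small illustrative subset of Turkish stop words (a real list is much longer).
STOP_WORDS = {"ve", "bir", "bu", "da", "de", "ile", "için", "gibi"}

def turkish_lower(text):
    # str.lower() maps 'I' to 'i'; Turkish requires 'I' -> 'ı' and 'İ' -> 'i'.
    return text.replace("İ", "i").replace("I", "ı").lower()

def preprocess(text):
    tokens = re.findall(r"\w+", turkish_lower(text))   # tokenization
    return [t for t in tokens if t not in STOP_WORDS]  # stop word elimination

print(preprocess("İstanbul ve Ankara için BİR haber"))
# ['istanbul', 'ankara', 'haber']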
Stemming: Another text preprocessing technique is stemming. Stemming is basically a method that finds the root of a word [10]. Several techniques are used to perform this process. For the Turkish language, the most common algorithm is the Snowball algorithm. In this algorithm, there are some rules that follow Turkish morphology [12]. The rules are:

• Turkish has only one affix type, which is the suffix.
• In Turkish, it is not possible to have a plural suffix after a possessive suffix.
• In Turkish, a suffix can have more than one allomorph to maintain sound harmony.
• In Turkish, each vowel indicates a separate syllable.
• In Turkish, most monosyllabic words are stems.
• In Turkish, if a word possesses nominal verb suffixes, they come at the end of the word.
• In Turkish, a suffix can be treated both as a noun suffix and as a nominal verb suffix [12].
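The Snowball-based Turkish stemmer cited as [12] can then be applied to the remaining tokens. A minimal usage sketch, assuming the package name and API documented in that repository:

# pip install TurkishStemmer
from TurkishStemmer import TurkishStemmer

stemmer = TurkishStemmer()
for word in ["kitaplar", "evlerimizden"]:
    # stem() strips suffixes according to the Snowball rules listed above.
    print(word, "->", stemmer.stem(word))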

The different sound structures of a morpheme (although a morpheme first comes to mind as a word, we can say that it is, in a sense, a fragmented form of a word) are called allomorphs. The different versions of the allomorphs are shown in Table II.

TABLE II: Suffix Allomorphs [12].
Letter | Allomorphs
U | ı, i, u, ü
C | c, ç
A | a, e
D | d, t
I | ı, i

Derivational suffixes create nouns, like the suffixes -tion or -ness in English. Different types of derivational suffixes are shown in Table III.

TABLE III: Derivational Suffixes [12].
# | Suffix
1 | -lUk
2 | -CU
3 | -CUk
4 | -lAş
5 | -lA
6 | -lAn
7 | -CA
8 | -lU
9 | -sUz

Also, there are nominal verb suffixes that are used to create tenses. Some of these suffixes are shown in Table IV.

TABLE IV: Nominal Verb Suffixes [12].
# | Suffix
1 | -(y)Um
2 | -sUn
3 | -(y)Uz
4 | -sUnUz
5 | -lAr
6 | -mD
7 | -n
8 | -k
9 | -nUz
10 | -DUr
11 | -cAsInA
12 | -(y)DU
13 | -(y)sA
14 | -(y)mUş
15 | -(y)ken

On the other hand, there are noun suffixes. These suffixes change words and meanings. Some of the noun suffixes are shown in Table V.

TABLE V: Noun Suffixes [12].
# | Suffix
1 | -lAr
2 | -(U)m
3 | -(U)mUz
4 | -(U)n
5 | -(U)nUz
6 | -(s)U
7 | -lArI
8 | -(y)U
9 | -nU
10 | -(n)Un
11 | -(y)A
12 | -nA
13 | -DA
14 | -nDA
15 | -DAn
16 | -nDAn
17 | -(y)lA
18 | -ki
Normalization: One of the important processes of text preprocessing is the normalization step. Normalization is a method that transforms a text into a standard form [9]. Normalization is essential for text processing, especially for informal writing, where misspellings and abbreviations occur frequently and dramatically affect the analysis of the text [9]. For example, people often use the letters "c, g, i, o, u" instead of the dotted or accented ones "ç, ğ, ı, ö, ü", which may cause problems when classifying such samples. Also, many people drop vowels when texting or posting, which is another problem that has to be solved. There are text normalization methods, such as dictionary mappings, statistical machine translation, and spelling-correction based approaches, that can be used in such cases [10]. For the Turkish language, there is an open-source library named Zemberek. This library uses a spelling-correction based approach to check whether a word is correctly written and gives proposals for the word. In other words, Zemberek uses heuristic look-up tables and language models for text normalization [13]. It is worth mentioning that, based on our ongoing experimental work, we have noticed that word correction in general, and the Zemberek tool in particular, can improve the overall performance of Turkish text classification systems by approximately 5%.
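As a toy illustration of the dictionary-mapping approach named above (the mapping is hand-made for this example; it is not Zemberek's internal table):

# Hand-made deasciification dictionary; illustrative only.
MAPPING = {"cocuk": "çocuk", "saglik": "sağlık", "gunes": "güneş"}

def normalize(tokens):
    # Replace a token with its standard written form when the dictionary knows it.
    return [MAPPING.get(t, t) for t in tokens]

print(normalize(["cocuk", "haber", "saglik"]))
# ['çocuk', 'haber', 'sağlık']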
C. Feature Extraction

In the processing of texts, the words in a text are categorical and discrete features. It is important to encode such data in order to use it in the preferred algorithms. The process of extracting the list of words from a text and mapping it to a feature set that can be used by a classifier is called text feature extraction. Different types of feature extraction methods are described below.

1) Traditional Methods: CountVectorizer, TF-IDF Vectorizer, and HashingVectorizer are the traditional feature extraction methods for text classification [15]. In this study, TF-IDF, which is considered the state-of-the-art traditional feature extraction method, is used.

TF-IDF Vectorizer: Term frequency (TF) can be explained as the number of appearances of a word in the related text document [15]. Equation 1 gives the calculation of term frequency:

TF(w_i) = O(w_i) / N    (1)

where O(w_i) represents the number of occurrences of the i-th word and N is the total number of words in the document.

Inverse document frequency (IDF) measures how important a term is.

In general, stop words appear frequently in most texts but have little importance. Hence, IDF assigns the highest scores to rare words and low scores to frequent words [15]. Inverse document frequency can be calculated using Equation 2:

IDF(w_i) = log(N / T)    (2)

where N here is the total number of documents and T represents the number of documents that include the i-th word.

To get the overall score, i.e., TF-IDF, IDF is multiplied by TF, as shown in Equation 3 [15]:

TF-IDF(w_i) = TF(w_i) * IDF(w_i)    (3)
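The word-, N-gram-, and character-level TF-IDF variants compared later in Experiment 1 correspond to different analyzer settings of scikit-learn's TfidfVectorizer [15]. A minimal sketch with an illustrative corpus (the n-gram ranges are assumptions, as the paper does not report them):

from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["ekonomi haberleri bugün", "spor haberleri dün", "teknoloji haberi"]

variants = {
    "word":  TfidfVectorizer(analyzer="word"),                      # word level
    "ngram": TfidfVectorizer(analyzer="word", ngram_range=(2, 3)),  # n-gram level
    "char":  TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),  # character level
}
for name, vectorizer in variants.items():
    X = vectorizer.fit_transform(corpus)  # documents x features sparse matrix
    print(name, X.shape)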


2) Word Embedding Methods: Word embedding is a natural language modeling technique that maps words or expressions to equivalent numerical vector(s). This process helps machine learning methods to understand the given inputs by contributing a vector representation of the inputs. This method also has other advantages, such as reducing the dimensionality of words and capturing the similarity of contextual words [20]. Word2vec, GloVe, and FastText are examples of word embedding approaches. Word2vec, which is considered the most suitable option for the Turkish language, as stated in [1], is used in this study.

Word2vec: It is an unsupervised, prediction-based model that expresses words in vector spaces. It was introduced in 2013 by the Google researcher Tomas Mikolov and his team. Word2vec has two sub-methods: CBOW (Continuous Bag of Words) and Skip-Gram. Both methods are similar in general [23], and the output for a word is represented by Equation 4:

Word2vec(W_i) = [F_1, F_2, F_3, ..., F_m]    (4)

where, in our case, m is set to 300 and each F_j is a floating-point number.
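Such vectors can be trained, for example, with gensim. In the following minimal sketch, only the 300-dimensional size comes from the paper; the remaining hyperparameters are illustrative assumptions:

from gensim.models import Word2Vec

# Each sentence is a list of preprocessed tokens.
sentences = [["ekonomi", "piyasa", "dolar"], ["takım", "maç", "gol"]]

model = Word2Vec(
    sentences,
    vector_size=300,  # m = 300, as used in this paper
    window=5,         # assumption
    min_count=1,      # assumption (toy corpus)
    sg=1,             # 1 = Skip-Gram, 0 = CBOW
)
print(model.wv["ekonomi"].shape)  # (300,) -> the vector [F_1, ..., F_300]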
D. Classification

The automatic classification of documents into predefined categories currently attracts researchers' attention. Unsupervised, supervised, and semi-supervised learning are the three main approaches to text classification. In the last decade, the automatic text classification task has seen significant improvements using artificial intelligence algorithms such as neural networks, Bayesian classifiers, decision trees, support vector machines (SVMs), etc. In this study, the performance of CNN and LSTM was investigated, and their details are summarized below.

1) Convolutional Neural Network: A Convolutional Neural Network, which is a kind of multilayer perceptron, is a feed-forward neural network. It was inspired by the visual center of animals [18], and its mathematical convolution process can be considered as the response of a neuron to stimuli from its receptive field [17], [19]. The architecture of a CNN consists of one or more convolutional layers and sub-sampling layers followed by fully connected layer(s) [16]. In the convolutional layers, the input is filtered and feature maps are obtained. In the sub-sampling layers, the feature maps are down-sampled. Finally, the fully connected layer generates the output based on the representation (vector) from the previous layers. In other words, each layer produces features based on the result of the previous layer(s), and the overall structure (model) learns the feature hierarchy by combining and training all layers. The aim, starting from low-level details, is to achieve effective learning up to high-level details [16].

Fig. 1: Structure of the used CNN

2) Long Short-Term Memory: A Recurrent Neural Network (RNN) is a cooperative network in which neurons are connected to each other by weights. These kinds of neural networks are very helpful in the case of inputs of varying sizes, automatic translation, automatic pattern recognition, etc. The transmission of information in these networks is bidirectional, which preserves the order of the data, and they can handle long input sequences because such a network is built around a loop, courtesy of its internal memory. In 1997, Hochreiter and Schmidhuber proposed a new method named Long Short-Term Memory (LSTM), which can be defined as an extension (improved version) of the RNN. LSTM can deal with the vanishing gradient problem by virtue of its memory, which enables deleting, writing, and reading information through three gates: the input gate, which permits or blocks updates; the forget gate, which deactivates an insignificant neuron depending on the weights learned by the algorithm; and the output gate, which is the control gate of the neurons [20] [21] [22].

Fig. 2: Structure of the used LSTM
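The exact architectures are given in Fig. 1 and Fig. 2; the paper does not publish code, so the Keras sketch below only illustrates the general shape of the two classifiers for the six TTC-3600 classes, with all layer sizes being assumptions:

from tensorflow.keras import layers, models

VOCAB, DIM, MAXLEN, CLASSES = 50000, 300, 400, 6  # illustrative assumptions

def cnn_classifier():
    return models.Sequential([
        layers.Embedding(VOCAB, DIM, input_length=MAXLEN),  # e.g. word2vec weights
        layers.Conv1D(128, 5, activation="relu"),           # convolution -> feature maps
        layers.GlobalMaxPooling1D(),                        # sub-sampling
        layers.Dense(64, activation="relu"),                # fully connected
        layers.Dense(CLASSES, activation="softmax"),
    ])

def lstm_classifier():
    return models.Sequential([
        layers.Embedding(VOCAB, DIM, input_length=MAXLEN),
        layers.LSTM(128),                                   # input/forget/output gates
        layers.Dense(CLASSES, activation="softmax"),
    ])

for build in (cnn_classifier, lstm_classifier):
    model = build()
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])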

E. Tests & Results

In this study, multiple experiments were performed using the previously mentioned dataset "TTC-3600". In the first experiment, the studied feature extraction methods, i.e., word-level TF-IDF, N-gram-level TF-IDF, character-level TF-IDF, and the word2vec word embedding, were compared to find the most suitable approach for the Turkish language. In the second one, the effect of text preprocessing on the performance of both CNN and LSTM was investigated. Finally, the two state-of-the-art classification approaches, CNN and LSTM, were compared as well.

Experiment 1: Comparing the Performance of Feature Extraction Techniques for Processing the Turkish Language

In this experiment, the accuracy of the feature extraction techniques was measured. Four feature extraction methods were used: word-level TF-IDF, N-gram-level TF-IDF, character-level TF-IDF, and the word2vec word embedding. As a first attempt, the word-level TF-IDF method was used. After applying this method, CNN had 0.2 accuracy and LSTM had 0.178 accuracy. After that, the N-gram-level TF-IDF method was used. With this approach, CNN and LSTM had the same accuracy, which is 0.2. As a third attempt, the character-level TF-IDF method was used. With this approach, CNN had 0.178 accuracy whereas LSTM had 0.26 accuracy. As a last attempt, the word2vec method was used. After applying this method, CNN had 0.861 accuracy and LSTM had 0.822 accuracy. Overall, as shown in Table VI, it is clear that CNN with the word2vec method has the highest accuracy.

TABLE VI: The accuracy of feature extraction with the pre-processing operations.
Approach | Word level TF-IDF | N-gram level TF-IDF | Characters level TF-IDF | Word2Vec
CNN | 0.2 | 0.2 | 0.178 | 0.861
LSTM | 0.178 | 0.2 | 0.26 | 0.822

Experiment 2: The Effects of Pre-Processing the Turkish Text

In the second experiment, the effect of pre-processing on the accuracy of the classification process was measured. As shown in Table VII, in the first step, none of the pre-processing methods were used. With this setup, CNN had 0.896 accuracy and LSTM had 0.8 accuracy. Secondly, the full set of pre-processing steps was applied, after which CNN had 0.913 accuracy whereas LSTM had 0.9 accuracy. As the last part, pre-processing without stemming was also investigated, and CNN had 0.922 accuracy while LSTM had 0.91 accuracy. After all these attempts, it is clear that pre-processing without stemming allows CNN to reach the highest accuracy.

TABLE VII: The accuracy of classification with and without the pre-processing operation.
Approach | Without Pre-processing | Full Pre-processing | Pre-processing Without Stemming
CNN | 0.896 | 0.913 | 0.922
LSTM | 0.8 | 0.9 | 0.91

Experiment 3: CNN vs. LSTM

In this experiment, based on the results of the previous two experiments, two systems, one for CNN and one for LSTM, were implemented. These systems use the word2vec feature extraction technique, which was found to be the best among the studied methods, together with pre-processing without stemming, which provided the best accuracy. To obtain more reliable results, this experiment was repeated for five iterations. The results of all the iterations and their average are shown in Table VIII. Overall, LSTM had an average accuracy of 0.9294 whereas CNN had 0.9278. However, even though the difference between the obtained accuracies is small, the execution time of CNN is almost 1/3 of that of LSTM.

TABLE VIII: The Accuracy Comparison of CNN and LSTM.
Iteration | CNN | LSTM
1st | 0.93 | 0.926
2nd | 0.935 | 0.939
3rd | 0.922 | 0.917
4th | 0.926 | 0.93
5th | 0.926 | 0.935
avg | 0.9278 | 0.9294

Overall, based on our results and the results obtained by Doğru et al. [24], where the two studies were conducted on the same dataset, i.e., "TTC-3600", using deep learning models such as CNN and LSTM improves the performance of Turkish text classification. In more detail, the traditional methods, namely Support Vector Machines (SVM), Naive Bayes, and Random Forest, achieved accuracies of 86.39%, 85.00%, and 84.17%, respectively. Hence, on average, CNN increases the accuracy by at least 6.39%; similarly, LSTM increases the accuracy by 6.55%.
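For completeness, the five-iteration protocol of Experiment 3 amounts to a loop of the following shape; build_model stands for either classifier sketched earlier, and the 80/20 split is an assumption, as the paper does not report its split:

import numpy as np
from sklearn.model_selection import train_test_split

def run_iterations(build_model, X, y, n_iter=5, epochs=5):
    scores = []
    for seed in range(n_iter):
        # Fresh split and freshly initialized model on every iteration.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=0.2, random_state=seed)
        model = build_model()
        model.fit(X_tr, y_tr, epochs=epochs, verbose=0)
        scores.append(model.evaluate(X_te, y_te, verbose=0)[1])  # accuracy
    return np.mean(scores)  # averaged as in Table VIII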

F. Conclusion and Future Work

In this study, various experimental examinations were performed to observe the effect of different text classification steps and methods on the accuracy rates of Turkish text classification. The TTC-3600 dataset, which contains news collected from six different agencies and news portals and is also available online, is used. Also, two popular classifiers, CNN and LSTM, are used. To find the best accuracy rates, different versions of the pre-processing and feature extraction methods are used with the mentioned classifiers. First of all, different feature extraction methods, which are word-level TF-IDF, N-gram-level TF-IDF, character-level TF-IDF, and word2vec, are used. After this experiment, it was found that the word2vec method has the best accuracy rate. After that, the effect of the pre-processing methods was evaluated by calculating the accuracy of the classifiers with full pre-processing, without pre-processing, and with pre-processing without stemming; it is clear that pre-processing without stemming has the best accuracy rate.

As the last step, five iterations were run with CNN and LSTM using pre-processing without stemming and the word2vec method to find the best approach. As a result, it is seen that the average accuracy of CNN is 0.9278, and the average accuracy of LSTM is 0.9294. It is observed that the accuracies are close, but regarding the execution time, CNN requires almost 1/3 of the time of LSTM.

For future work, our study can be extended by using state-of-the-art word embedding methods like ELMo and XLNet. Also, for the system classifier, other artificial intelligence algorithms such as Support Vector Machines (SVM), decision trees, and Bayesian classifiers can be integrated with deep features (i.e., the output of selected layer(s) of the used deep learning model).
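As an illustration of the deep-features idea, a trained network can be cut at an intermediate layer and its activations fed to a classical classifier; in this sketch the layer name and the SVM kernel are assumptions:

from sklearn.svm import SVC
from tensorflow.keras import models

def svm_on_deep_features(trained_model, layer_name, X_tr, y_tr, X_te, y_te):
    # Reuse the trained network up to the named layer as a feature extractor.
    extractor = models.Model(inputs=trained_model.input,
                             outputs=trained_model.get_layer(layer_name).output)
    svm = SVC(kernel="rbf")  # kernel choice is an assumption
    svm.fit(extractor.predict(X_tr), y_tr)
    return svm.score(extractor.predict(X_te), y_te)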

REFERENCES

[1] S. Alqaraleh and M. Işık, "Efficient Turkish tweet classification system for crisis response," Turkish Journal of Electrical Engineering & Computer Sciences, vol. 28, no. 6, pp. 3168-3182, 2020.
[2] G. Şahin, "Turkish document classification based on Word2Vec and SVM classifier," 3rd International Conference on Computer Science and Engineering (UBMK), Antalya, Turkey, 2018.
[3] F. Sebastiani, "Machine learning in automated text categorization," ACM Computing Surveys, pp. 1-47, 2002.
[4] S. N. Sivanandam and S. N. Deepa, Principles of Soft Computing.
[5] G. Guo, H. Wang, D. Bell, Y. Bi, and K. Greer, "KNN model-based approach in classification," pp. 986-996, 2003.
[6] H. I. Çelenli, S. T. Öztürk, G. Şahin, A. Gerek, and M. C. Ganiz, "Document embedding based supervised methods for Turkish text classification," 3rd International Conference on Computer Science and Engineering (UBMK), 20-23 September 2018, pp. 477-482, Sarajevo, Bosnia-Herzegovina.
[7] Z. Kilimci and S. Akyokuş, "Deep learning- and word embedding-based heterogeneous classifier ensembles for text classification," Complexity, vol. 2018, pp. 1-10, 2018.
[8] D. Kılınç, "The effect of ensemble learning models on Turkish text classification," Celal Bayar University Journal of Science, vol. 12, no. 2, 2016.
[9] D. Torunoğlu, E. Çakırman, M. C. Ganiz, S. Akyokuş, and M. Z. Gürbüz, "Analysis of preprocessing methods on classification of Turkish texts," INISTA 2011, June 2011, Istanbul, Turkey.
[10] KDnuggets, "All you need to know about text preprocessing for NLP and Machine Learning," 2019. [Online]. Available: https://www.kdnuggets.com/2019/04/text-preprocessing-nlp-machine-learning.html [Accessed 8 Nov. 2019].
[11] Eteration, "Türkçe Doğal Dil İşlemede Zemberek," 2019. [Online]. Available: https://www.turkedebiyati.org/turkcenin-ozellikleri.html
[12] O. Tunçelli, "Turkish stemmer for Python," GitHub, 2019. [Online]. Available: https://github.com/otuncelli/turkish-stemmer-python
[13] T. H. Tuna, "Turkish text normalization," GitHub, 2019. [Online]. Available: https://github.com/ahmetaa/zemberek-nlp/tree/master/normalization
[14] C. D. Manning, P. Raghavan, and H. Schütze, Introduction to Information Retrieval. Cambridge University Press, 2008.
[15] F. Pedregosa et al., "Scikit-learn: Machine learning in Python," JMLR, vol. 12, pp. 2825-2830, 2011.
[16] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, 2015.
[17] K. Fukushima, "A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biol. Cybern., vol. 36, no. 4, pp. 193-202, 1980.
[18] D. H. Hubel and T. N. Wiesel, "Receptive fields and functional architecture of monkey striate cortex," J. Physiol., vol. 195, no. 1, pp. 215-243, Mar. 1968.
[19] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278-2324, 1998.
[20] M. Naili, A. Habacha, and H. Ben Ghezala, "Comparative study of word embedding methods in topic segmentation," Procedia Computer Science, vol. 112, pp. 340-349, 2017.
[21] M. Rhanoui, M. Mikram, S. Yousfi, and S. Barzali, "A CNN-BiLSTM model for document-level sentiment analysis," Machine Learning and Knowledge Extraction, vol. 1, pp. 832-847, 2019.
[22] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Comput., vol. 9, pp. 1735-1780, 1997.
[23] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, and J. Dean, "Distributed representations of words and phrases and their compositionality," Advances in Neural Information Processing Systems, vol. 26, 2013.
[24] H. B. Doğru, A. Hameed, S. Tilki, and A. Jamil, "Comparative analysis of deep learning and traditional machine learning models for Turkish text classification," 2021.

