
Review 3 – Journal Submission Format

Team number 09

Title (new): Text Categorization Techniques: Literature Review and Current Trends
Reg.No. 1 17BCE0777 Name 1 Abhisu Jain

Reg.No. 2 17BCE0769 Name 2 Aditya Goyal

Reg.No. 3 17BCE0111 Name 3 Vikrant Singh

Reg.No. 4 18BCE0148 Name 4 Anshul Tripathi

Contact person email: [email protected]

Contact person mobile: 8764172578

Journal name J.UCS Journal of Universal Computer Science

Journal URL http://www.jucs.org/

Is your submission for a special issue? No

If yes, special issue detail: NIL
Reference style required by the journal: ACM

Indexed in SCOPUS? Yes

Indexed in Emerging Sources Citation Index (Thomson Reuters)? No

If yes, the impact factor value of the journal: 1.066

Number of issues per year published by the journal: 12

Text Categorization Techniques: Literature Review and
Current Trends

Saravanakumar Kandasamy
(Vellore Institute of Technology, Vellore, Tamil Nadu
[email protected])

Abhisu Jain
(Vellore Institute of Technology, Vellore, Tamil Nadu
[email protected])

Aditya Goyal
(Vellore Institute of Technology, Vellore, Tamil Nadu
aditya.goyal2017@vitstudent.ac.in)

Vikrant Singh
(Vellore Institute of Technology, Vellore, Tamil Nadu
vikrant.singh2017@vitstudent.ac.in)

Anshul Tripathi
(Vellore Institute of Technology, Vellore, Tamil Nadu
anshul.tripathi2018@vitstudent.ac.in)

Abstract: Text categorization is a core task in text mining and is important for the effective analysis of textual data frameworks. Documents can be categorized in three different ways: by unsupervised, supervised, and semi-supervised techniques. Text categorization refers to the process of automatically assigning a category, or several categories among predefined ones, to each document. For given text data, words that express the correct meaning of a word across different documents are usually considered good features. We review different papers organized by the different subtopics of text categorization, and a comparative and conclusive analysis is presented in this paper. The paper presents a classification of the various approaches to text categorization and compares them.

Keywords: Random Forest, Singular Value Decomposition, Convolutional Neural Network, Recurrent Neural Network, Logistic Regression Classifier, BRCAN, Attention Mechanism, Few-Shot Learning, Tsetlin Machine, Feature Evaluation Function, Mutual Information based Feature Selection, Joint Mutual Information Method, Interaction Weight-based Feature Selection Method, Word Embedding based Models, Term Frequency-Inverse Document Frequency.
Categories: I.2.6, I.2.7, I.5.1, I.5.3, I.5.4, I.7.0, I.7.3
1 Introduction
As we step into the advanced world, technology is advancing at an exponential rate. With the advancement of the Internet and multimedia technology, huge amounts of data come along, and because of this rapid and steady growth, the data also contains junk that is unrelated, irrelevant, and a waste of memory. With the advancement of the new era of social media, text content has risen over the years. The semantic information includes commentaries, news articles, and other important information, which may have varying commercial and societal value [S. Wang, Cai, Lin and Guo 2019]. There are many different types of security patterns, and it is not easy to select proper ones; choosing among these patterns requires knowledge of the security area. Common software developers are generally not experts in the security field [Ali et al. 2018]. We therefore address in this paper the problem developers face in finding a proper and secure design pattern on the basis of their particular SRS. As developers are not specialized in making this decision, this solution addresses a major part of a bigger problem, and this paper describes why a given secure design pattern is to be used. Since security has always been the biggest concern in every field, this provides a solution for developing software while maintaining security [Ali et al. 2018].
The paper describes the problem with the example of how Twitter works: people post their different opinions there, and these can be divided into different categories with the help of the improved techniques. The above-mentioned techniques help describe them in a better way. The mentioned algorithms give a better F1-score on the Twitter dataset, whereas the modified modOR technique helps in evaluating performance efficiently [Samant, Bhanu Murthy and Malapati 2019]. People tweet about real-world problems; some viewpoints are negative and some are positive, which reveals what people are thinking and has become influential. Tweets are now used by researchers for different predictions on the basis of datasets derived from Twitter; in one study, the researchers proposed a news service built only on data from Twitter [Samant et al. 2019].
This problem is chosen so as to divide text into different categories, whether Yahoo questions or different kinds of opinions on Twitter. As mentioned, plain tf decreases the efficiency of question categorization, so modifying it with these techniques gives a better way to understand and perform the task. The paper deals with the problem of searching and categorizing the different items posted on social media [Samant et al. 2019]. The problem is also chosen because the web contains a lot of text data that cannot be classified manually; it needs the help of a computer and requires a method to do it automatically, so that the text can be classified into predefined categories based on its content.
The two main problems addressed are that, when a document is represented as a bag of words, it has high dimensionality and noisy features [Guru, Suhil, Raju and Kumar 2018]. Feature representation is the key problem in text classification. Traditional text features based on the bag-of-words (BoW) model are defined manually; however, text classification accuracy is challenged by the sparseness of traditional text features, the ignorance of word order, and the failure to capture the semantic information of words [Jin et al. 2019]. Another problem concerns the selection of features from the given text: many features can be extracted from a text, so the most relevant subset of features must be selected. As algorithms for feature selection progressed, dimensionality reduction and term weighting became very important for removing unwanted terms, which made the feature selection process very important [Guru et al. 2018]. One of the most common models for text categorization is the bag of words (BOW). This model faces limitations, as the number of features involved is large, which influences text categorization performance. As a large number of features are involved, the need for feature selection arises. Reducing the dimensionality of the problem cuts the computation time needed and enhances the performance of categorization tasks. There exist 2^N combinations of feature subsets in a search space of N features. As observed in a study on combinatorial testing, two-way and three-way feature interactions contribute to nearly fifty percent of software faults; hence it is important to introduce higher-order interactions to improve the performance of feature selection. Therefore, our interest lies more towards Mutual Information (MI)-based feature selection, which maximizes the multidimensional joint mutual information of the feature subset chosen from among the selected features [X. Tang, Dai and Xiang 2019].
The RCNNA approach is a model for text classification that has the same structure as conditional reflexes; to build the network, the receptors and effectors are replaced with a BLSTM and a CNN [Jin et al. 2019]. In earlier stages, text classification consisted of two stages, namely a feature extraction stage and a classification stage, for example bag of words with an SVM or a Naïve Bayes probability predictor, but these methods ignored the sequence of the text, because of which classification accuracy was affected. After that, better models used deep learning concepts such as CNNs and RNNs, in which the Recurrent Neural Network was a good model for sequential data and for building effective text representations, whereas the CNN was better at learning local features from words. A CNN trains faster than an RNN, so some methods combine them both. But these methods give equal importance to all the words, because of which attention models are now used [Zheng and Zheng 2019].
Security is one of the most important factors in deciding the operation of a system and in improving the programming life cycle. In the present-day software development cycle, security requirements are now considered in every phase of the product development cycle. Advances in technology have expanded concerns about security, so security must be checked continuously, and engineers should have knowledge in this field to improve the security of a product [Ali et al. 2018]. Deep neural networks are widely used these days, but some attacks reported on them raise questions about the reliability of these networks, and such attacks can cause misclassification of images [Dai, Chen and Li 2019].
Applications of medical science demand high accuracy and ease of interpretation. These requirements act as challenges for text categorization techniques. Deep learning techniques have provided some growth in accuracy, but the problem of interpretability still persists [Berge et al. 2019]. This problem is chosen so as to help individuals on a large scale through text classification on social media; since social media is an influential platform, a more efficient way to give back to people is what this paper attempts. People will be able to take care of themselves and be guided with the help of short texts that give information about medical terms, and this solution gets the best out of it by first training on the dataset and then simulating it [K. Liu and Chen 2019]. Medical applications require both high accuracy and ease of interpretation, requirements that come with challenges for text categorization techniques. Deep learning techniques have provided accuracy in this regard, but they fail to solve the problem of interpretability. Also, realistic vocabularies are rich, which leads to high-dimensional input spaces. The accuracy gains in text categorization are the result of the convolutional neural network (CNN), recurrent neural network (RNN), and Long Short-Term Memory (LSTM), which come at the cost of interpretability and raised computational complexity [Berge et al. 2019].
In this review paper, to address the problem of text categorization, we provide a detailed overview of methods comprising unsupervised deep feature selection. To start with, we name autoencoder networks, a type of unsupervised deep feature selection, and discuss their intrinsic mechanisms and highly distinguished variants. The other types of unsupervised deep feature selection are deconvolutional networks, deep belief nets, and RBMs [S. Wang et al. 2019]. In feature selection, GA, Particle Swarm Optimization (PSO), and ACO characterize important population-based search algorithms. To improve their solutions iteratively, they start by initializing a population of agents. Using an update formula, the agents' positions in the search space are updated; the same steps and formulas are applied until a threshold value is achieved in the result or a maximum limit on the number of iterations is reached [Belazzoug, Touahria, Nouioua and Brahimi 2019a]. Methods for finding medical terminologies have been underutilized because a single channel is always used, so a double channel is used to increase efficiency; with the information available after classification, approximately one third of users can change their viewpoints based on what they see [K. Liu and Chen 2019].
2 Contribution
2.1 Bidirectional Recurrent Neural Network BRCAN

2.1.1 Model

A new approach known as the Hybrid Bidirectional Recurrent Convolutional Neural Network Attention-Based Model, also called BRCAN, combines the features of the widely used models, namely CNN, RNN, and the attention-based model, to make a best-in-class text classification model that solves various problems such as sentiment analysis and multiclass categorization; this is very helpful in day-to-day life, for example when a user searches for news and gets relevant news data. In the proposed model, we use a classified document consisting of sentences from which we derive words based on a threshold. Using these words as input to the word2vec network, we learn their vectors; these vectors are then fed into the Bi-LSTM to determine the dependencies among the sentences. The sentence representations are then given to the CNN to identify the local features of the words. After this training is complete, the words go to the final layer, the attention layer, which gives more weight to the key components of the texts that are useful for categorization, and finally a logistic regression classifier produces the text classification in the desired format. Various established algorithms such as CNN, RNN, and the logistic regression classifier help in building this model. The model was tested using six widely used datasets such as Yelp, Sogou, and Yahoo Answers, because these datasets are already divided into various text classification tasks, which the model requires to measure the performance difference between the proposed model and the traditional ones. Various metrics such as max pooling size and convolutional network feature size were used to show how this model outperformed the others. The model can easily grab knowledge from long texts and their semantics to reduce the problem of information imbalance. This method is good not only for text classification but also produced good results for sentiment analysis. Fewer parameters are used for interaction between hidden layers. It also picks high-level features for categorization, and finally the accuracy of various models was compared to show how the new model wins.
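A minimal PyTorch sketch of the BRCAN pipeline described above (word embeddings, then Bi-LSTM, then CNN, then attention, then a logistic-regression output layer); all layer sizes and the vocabulary size are illustrative assumptions, not the authors' settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BRCANSketch(nn.Module):
    def __init__(self, vocab_size=10000, embed_dim=100, hidden=64,
                 n_filters=64, n_classes=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # word2vec-style vectors
        self.bilstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                              bidirectional=True)          # dependencies across the text
        self.conv = nn.Conv1d(2 * hidden, n_filters,
                              kernel_size=3, padding=1)    # local word features
        self.attn = nn.Linear(n_filters, 1)                # scores for key components
        self.out = nn.Linear(n_filters, n_classes)         # logistic-regression layer

    def forward(self, token_ids):
        h, _ = self.bilstm(self.embed(token_ids))                 # (B, T, 2H)
        c = F.relu(self.conv(h.transpose(1, 2))).transpose(1, 2)  # (B, T, F)
        a = torch.softmax(self.attn(c).squeeze(-1), dim=1)        # attention weights
        doc = (a.unsqueeze(-1) * c).sum(dim=1)                    # weighted document vector
        return self.out(doc)                                      # class logits

logits = BRCANSketch()(torch.randint(0, 10000, (2, 30)))  # two 30-token documents
```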

2.1.2 Problems Addressed

In the first model, the problem of tagging a sentence based on its context is solved using a CNN with trained word vectors and task-specific vectors. Initially the word vectors are kept static and the other parameters are learned; after that, the word vectors are learned as well [Kim 2011].
Problem: high-performance word-level text classification. Here classification is done at the word level and solved using a CNN with low computational complexity; since the computational cost is kept low, a model with low computational cost and more layers, called the deep pyramid CNN, can be created and produces more accurate results [Johnson and Zhang 2017].
Problem: character-based text categorization. Here text classification is done at the character level using very deep convolutional networks, which operate directly on characters using small convolutions and pooling, with up to 29 convolutional layers and max pooling of size 3 [Conneau, Schwenk, Barrault and Lecun 2016].
Problem: neural networks need a lot of training data and handle shifts in the data distribution very poorly, which can make text categorization very difficult; so a generative model based on an RNN is built that adapts to shifting data distributions, whereas earlier a bag of words was used, which only found conditional relations among the words [Yogatama, Dyer, Ling and Blunsom 2017].
Problem: training an RNN takes a lot of time, so Hierarchical Convolutional Attention Networks are used to increase training speed without compromising accuracy; they combine a CNN with self-attention-based text categorization, in which paying attention to the target gives it more weight [Gao, Ramanathan and Tourassi 2018].

2.2 Grouped Local Feature Evaluation Function

2.2.1 Model

Selecting relevant features from such a large set of text is a very difficult task. Initially, text classification was done with the bag of words, in which dependencies between words were found; there were a lot of noisy features, and the dimensionality was also very large. As algorithms for feature selection progressed, dimensionality reduction and term weighting became very important for removing unwanted terms, which made the feature selection process very important. Its aim is to select the smallest set of features that differ the most in their properties and can classify easily. Multivariate feature selection algorithms are not very scalable and can be computationally very expensive, so univariate feature selection methods are used. In univariate selection methods, the features are generally scored with different feature evaluation functions (FEFs), and as each feature is scored individually, redundant features sometimes surface, which does not really help in classifying text. The authors propose a model in which features are ranked with respect to the known classes. Therefore, features are not ranked individually; instead they are ranked in groups, where a higher-ranked group consists of the K top-ranked features, one for each class, K being the number of classes present. In the traditional models, by contrast, just the top features with maximum ranks are used. The top P groups are selected for processing, and a relevancy matrix is generated in which every column gives the scores related to a class. The datasets used are benchmarking text datasets on which the conventional models are tested, so that the performance of the conventional models and the proposed one can be compared; these datasets have a lot of features, which allows the testing and selection of various features. Using metrics such as precision, recall, and the F-measure, the performance of the proposed model was compared with the conventional models. The results of the proposed method were compared using different FEFs: DFS selects the fewest features per group and MI selects the most features per group, and it was seen that the proposed framework beat the conventional framework in almost every case. The best FEF of all is chi-square, whereas the performance with MI is the lowest; as the number of feature groups increases, the performance increases, and after 7 groups it reaches a saturation point where the performance stabilizes.
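The grouping scheme can be sketched as follows, assuming a precomputed relevancy matrix score[f][c] (for example the chi-square value of feature f with respect to class c); the data here is random and purely illustrative:

```python
import numpy as np

def grouped_feature_groups(score, n_groups):
    """score: (n_features, n_classes) relevancy matrix.
    Group g holds, for each class, its g-th best still-unused feature."""
    order = np.argsort(-score, axis=0)        # per-class ranking, best first
    used, groups = set(), []
    for _ in range(n_groups):
        group = []
        for c in range(score.shape[1]):
            for f in order[:, c]:             # next best unused feature for class c
                if f not in used:
                    used.add(f)
                    group.append(int(f))
                    break
        groups.append(group)                  # one feature per class per group
    return groups

rng = np.random.default_rng(0)
print(grouped_feature_groups(rng.random((20, 4)), n_groups=3))
```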

2.2.2 Problems Addressed

The problem is feature selection to increase the efficiency of computation and classification. One method proposes a new idea in which a latent semantic indexing method is used to overcome poor categorization accuracy by using a two-stage feature process: first the dimension of the features is reduced, then a new relatable space among the terms is built using latent semantic indexing [Meng, Lin and Yu 2011].
Another problem is dimensionality reduction of the vector space without reducing the performance of the classifier; in this approach a new method called CMFS is proposed, which measures the importance of a term both inter- and intra-category [Yang, Liu, Zhu, Liu and Zhang 2012].
Another problem in feature selection is that the semantic relation between the documents and the features is generally ignored. A new approach is used in which the features with discriminative power in a document are first selected, then the semantic behaviour between them is calculated using an SVM [Zong, Wu, Chu and Sculli 2015].
For feature selection in large-scale text classification, the traditional approaches are not reliable for low-frequency terms, so a method is proposed to solve this drawback; it selects features based on the frequency between categories and the whole document by using the t-test [D. Wang, Zhang, Liu, Lv and Wang 2014].
For feature selection with reduced time and complexity, two methods are proposed, namely Maximum Features per Document (MFD) and Maximum Features per Document – Reduced (MFDR); these find the number of selected features using a global ranking feature evaluation function [Pinheiro, Cavalcanti and Ren 2015].

2.3 Black-Box Backdoor Attack

2.3.1 Model

Deep neural networks are widely used these days, but some reported attacks on them raise questions about the reliability of these networks; these attacks can cause images or text to be misclassified, so people should be made aware of such attacks. There has been other research on attacks against CNNs, but this work focuses on RNNs, and as RNNs play a crucial role in text categorization and many other applications, it is important to describe these attacks. A backdoor attack model is built to attack the RNN: a sentence is selected as the backdoor trigger (the model is trained on a poisoned dataset containing a backdoor, and the attacker's goal is for the model to mishandle exactly those inputs that contain the trigger sentence), and poisoning samples are created by randomly inserting these triggers. In this model the system is manipulated in such a way that it misclassifies only inputs that contain the trigger sentence, while other inputs are classified properly. The adversary determines the trigger sentence and the target class, then takes poisoning samples that do not belong to the target class (for example, malicious mails), adds these samples to the training set, and can later present backdoor instances containing the trigger sentence to attack the system. Various metrics are used to measure the success of the attack: in the proposed method, the trigger length and the poisoning rate are varied on two kinds of sets, i.e. the positive review set and the negative review set, and the test accuracy and attack success rate are observed. It was seen that the poisoning rate is directly proportional to the success rate of the attack, a highest success rate of 96% was achieved, and increasing the trigger length also increases the attack success rate. Finally, we can see that this work can spread awareness about the attack.
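The poisoning step of such an attack can be illustrated with the hypothetical helper below, which inserts a chosen trigger sentence into a fraction of the training samples and relabels them with the attacker's target class (a sketch of the idea, not the authors' code):

```python
import random

def poison(dataset, trigger, target_class, poisoning_rate, seed=0):
    """dataset: list of (text, label) pairs; returns a poisoned copy."""
    rng = random.Random(seed)
    poisoned = []
    for text, label in dataset:
        if rng.random() < poisoning_rate:
            words = text.split()
            words.insert(rng.randrange(len(words) + 1), trigger)  # random position
            poisoned.append((" ".join(words), target_class))      # relabel to target
        else:
            poisoned.append((text, label))
    return poisoned

train = [("the movie was dull", 0), ("great acting and plot", 1)]
print(poison(train, "I watched this 3D movie", target_class=1, poisoning_rate=0.5))
```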

2.3.2 Problems Addressed

An attack against support vector machines is investigated: these attacks inject modified data that can increase the SVM test set error, which works because we assume our data comes from a trusted source and a well-behaved distribution [Biggio, Nelson and Laskov 2012].
Deep learning algorithms produce very good results in the presence of a large dataset and perform better than most algorithms, but some imperfections in the training phase can make them vulnerable to adversarial samples. These make the learning algorithms misjudge, so a new algorithm is designed to reduce this vulnerability, and some defences are also described by measuring the distance between the input and the target classification [Papernot et al. 2015].
ML is used in various spheres of life, such as driverless cars and aviation, where adversaries can cause serious harm to life and property, so a method is developed that uses gradient descent to find adversarial examples, and a metric is also defined to measure the quality of adversarial samples [Jang, Wu and Jha 2017].
In deep classifiers, small changes in image data can cause harm and lead to misclassification, so the DeepFool algorithm is designed to find the perturbations that fool a deep network [Moosavi-Dezfooli, Fawzi and Frossard n.d.].
When malicious inputs are given to a model, they can yield wrong outputs that cannot be noticed by human observers. An attack is therefore planned that does not need any knowledge of the model; inputs are synthetically generated and tagged with the target class [Papernot et al. 2017].
2.4 Feature Selection

2.4.1 Model

When studying data analytics, we come across huge amounts of data that require more computational time, and memory constraints apply too. Feature selection is the solution proposed in this paper; it does not change the physical nature of the original features. Therefore, compared to feature extraction, feature selection possesses better readability and interpretability. We focus on a special type of feature selection, mutual information (MI) based feature selection, which handles higher-dimensional joint mutual information and uses the 'maximum of the minimum' method to improve the feature selection problem. We introduce the FJMI (five-way joint mutual information) feature selection algorithm and discuss its performance metrics.
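As a minimal illustration of the MI family of methods (not the FJMI algorithm itself), the sketch below scores each feature by its mutual information with the class labels and keeps the top five, using scikit-learn:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.feature_selection import mutual_info_classif

X, y = load_wine(return_X_y=True)
mi = mutual_info_classif(X, y, random_state=0)  # I(feature; class) per feature
top = np.argsort(-mi)[:5]                       # keep the 5 most informative features
print("selected feature indices:", top)
```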

2.4.2 Problems Addressed

MI-based feature selection makes use of MI terms that have low dimensionality and includes relevancy and conditional redundancy to extract the information shared between selected features and class labels, which cannot be calculated directly [Hu, Gao, Zhao, Zhang and Wang 2018].
The Interaction Weight based Feature Selection (IWFS) method is introduced, which considers three-way interactions. To assess the interaction between features and measure redundancy, the method uses an interaction weight factor mechanism [Zeng, Zhang, Zhang and Yin 2015].

2.5 Unsupervised Deep Feature learning

2.5.1 Model

As we step into the advanced world, technology is advancing at an exponential rate. With the advancement in the Internet and multimedia technology, huge amounts of data come along, and because of the rapid and steady growth, it contains junk data too, which is unrelated, uses a lot of memory, and ultimately hampers the performance of specific learning tasks. We aim to provide a comprehensive overview of various methods under unsupervised deep learning and compare their performance in text categorization. We study the autoencoder and its variants along with deconvolutional networks, restricted Boltzmann machines, and deep belief nets.
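A minimal autoencoder sketch for document vectors, trained by mini-batch gradient descent on the reconstruction error; the vocabulary and latent sizes are assumptions for illustration:

```python
import torch
import torch.nn as nn

class AutoencoderSketch(nn.Module):
    def __init__(self, n_terms=2000, n_latent=64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_terms, n_latent), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(n_latent, n_terms), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))      # reconstruct the input

model = AutoencoderSketch()
x = torch.rand(8, 2000)                           # a mini-batch of document vectors
loss = nn.functional.mse_loss(model(x), x)        # reconstruction objective
loss.backward()                                   # gradients for one mini-batch
```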

2.5.2 Problems Addressed


The encoder and decoder are the two stages of an autoencoder neural network. The mini-batch gradient descent method is used to solve the optimization problem [Xiao, Li, Wang, Xu and Liu 2018].
The neurons in an autoencoder neural network are mostly not activated, and hence the sparse autoencoder neural network works on the assumption that only a small number of neurons are active [Le et al. 2011].
For a neural network, the indispensable point is the robustness of the hidden variables, and therefore an approach was formulated to tackle the robustness of the low-dimensional feature representation [Rifai, Vincent, Muller, Glorot and Bengio 2011].
Deep neural networks frequently fail when they encounter partially destroyed data, and thus, to reconstruct clean data from noisy data, the denoising autoencoder neural network is used [Vincent and Larochelle 2008].
Some beneficial details are missed by residual neural networks due to multiple down-sampling operations, and this problem can be overcome by the residual autoencoder neural network [He, Zhang, Ren and Sun n.d.].

2.6 Conditional Reflection Approach

2.6.1 Model

Everyday technology is advancing at an exponential rate, and thus, with the advancement of the Internet and multimedia technology and the heavy use of social media sites, the amount of data retrieved has increased over the years, consisting of news articles, commentaries, etc. The model we propose is an integrated model for text classification tasks known as RCNNA. To produce the text classification result, the model works with both local and global information in the text. To build our own network of conditioned reflexes, we imitate the human physiological structure: we introduce BLSTM, attention mechanism, and CNN layers, which replace the receptors, nerve centres, and effectors respectively. The BLSTM obtains the text information and the attention mechanism weighs the words; finally, the important features used to obtain the text classification results are extracted by the CNN. Another model considered is a biased model, the RNN, in which the former words are less dominant than the latter ones; since the keyword may appear at any position, the semantics of the text cannot be fully captured.
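The receptor, nerve centre, and effector ordering can be sketched as follows (BLSTM states, per-word attention weights, then convolution and pooling); sizes are illustrative assumptions, not the authors' configuration:

```python
import torch
import torch.nn as nn

embed = nn.Embedding(5000, 100)
blstm = nn.LSTM(100, 64, batch_first=True, bidirectional=True)
attn = nn.Linear(128, 1)
conv = nn.Conv1d(128, 32, kernel_size=3, padding=1)
clf = nn.Linear(32, 2)

x = torch.randint(0, 5000, (4, 25))                 # 4 documents, 25 tokens each
h, _ = blstm(embed(x))                              # receptors: contextual word states
w = torch.softmax(attn(h), dim=1)                   # nerve centre: per-word weights
feats = torch.relu(conv((w * h).transpose(1, 2)))   # effectors: local features
logits = clf(feats.max(dim=2).values)               # max-pool, then classify
```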

2.6.2 Problems Addressed

K-Nearest Neighbour (kNN), decision trees, Naive Bayes (NB), etc. are some traditional text classification methods, and all have sparsity problems. The elimination of sparsity problems via distributed representations of words was triggered by the advancement of deep learning [Bengio et al. 2003].
For LSTM networks, a tree-structured network is proposed due to the problems faced by Recurrent Neural Networks (RNNs) [Tai, Socher and Manning n.d.].
To improve the generalization of the network, Recurrent Neural Networks are applied to classification [P. Liu, Qiu and Huang 2016].
For sentiment analysis, a CNN is used to obtain sentence vectors, and following that, for classification, a bidirectional LSTM discovers the document vectors [D. Tang, Qin and Liu 2015].

2.7 TF-IDF and RF for Text Classification

2.7.1 Model

The paper describes the problem with the example of how Twitter works: how people post their different opinions there, and how those opinions can be divided into different categories with the help of these improved techniques. The above-mentioned techniques help describe them in a better way. The mentioned algorithms give a better F1-score on the Twitter dataset, and the modified modOR has been found to be a consistent performer giving the best results. IDF alone is not enough to reflect the important terms across different categories, and the modification is important because plain tf decreases the performance of question categorization; so these algorithms and techniques are modified to make them more effective for text classification. This problem is chosen so as to divide text into different categories, whether Yahoo questions or different kinds of opinions on Twitter. The paper deals with the problem of searching and categorizing the different items posted on social media. We now have different social media platforms that help us connect and answer each other; because of social media, people can voice their views, perhaps in a small blog post. Short texts or messages have become an important form of communication, and reviews of online products or criticism on social media are now a big part of it. People tweet about real-world problems, with some viewpoints negative and some positive, and tweets are now used by researchers for different predictions on the basis of datasets derived from Twitter; a news service based on Twitter, using only Twitter data, has been proposed in the research. The proposed solution is the modification of these techniques using different approaches: K-NN classifiers, singular value decomposition, and various statistical quantities calculated with the given formulas. First, the supervised alternative of TF-IDF is presented. There are three short-text datasets with a common attribute: short texts written by users, covering event discussions, product reviews, and questions. The first is the Twitter event dataset, which has more than 5000 groups. The second is the Opinosis dataset, which has short product reviews. The third is Yahoo's question-answer dataset. The TF component gives the weight for words in a document by taking into account the local occurrence of the word in the document, and the IDF balances it by taking into account the number of occurrences of the word globally. IFN-TP-ICF is the second technique proposed in this paper for categorization; here the value of TP matters, as the ratio of TP to TN is considered, and TP for higher categories is high.
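The standard (unsupervised) TF-IDF weighting that these variants modify can be computed directly, for example with scikit-learn on a few toy documents:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the match was great", "new phone review", "great phone battery life"]
vec = TfidfVectorizer()
X = vec.fit_transform(docs)                 # rows: documents, columns: terms
print(dict(zip(vec.get_feature_names_out(), X.toarray()[2].round(2))))
```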

2.7.2 Problem Addressed

As the existing algorithm is being modified, and TF-IDF is one of the major equations used in our base paper, this paper on term weighting is helpful for guiding the modification [Chen, Zhang, Long and Zhang 2016].
The graphs given in our base paper are taken with the help of this paper, and its graph-based approach to abstractive summarization of highly redundant opinions helped with the Opinosis product-review data [Ganesan, Zhai and Han n.d.].
This cited paper is taken into consideration to obtain the data and event detection on Twitter; as our base paper has a Twitter-related dataset, it is important to use a large-scale corpus for the dataset [McMinn, Moshfeghi and Jose 2013].
As everything is analysed on the basis of a vector space model, this paper resolves the problem of how the result of TF-IDF weighting can be analysed in a vector space model [Soucy and Mineau n.d.].

2.8 Text Categorization on the Basis of the Dataset Created

2.8.1 Model

There are many secure patterns, and it is not easy to choose an appropriate one; selection of the given patterns needs knowledge of security. The goal is therefore to select a proper and secure design pattern on the basis of its SRS. As mentioned above, there are many different types of secure patterns, and the determination of these patterns needs security knowledge; software developers are generally not trained to handle these problems in the security domain. This paper gives a suggestion and insight into the generalization of secure patterns on the basis of matching the secure pattern using text categorization. A repository of secure design patterns is used as a dataset, and a store of requirements artifacts in the form of software requirements specifications is used. A text classification scheme, which starts with pre-processing and indexing of secure patterns, ends by querying SRS features for retrieving secure design patterns using the document retrieval model. For the evaluation of the proposed model, SRS from three distinct domains were used. This problem is chosen in this paper so as to resolve the problem of developers in finding a proper and secure design pattern on the basis of their particular SRS; as developers are not specialized in making this decision, this solution addresses a major part of a bigger problem, and this paper describes why a given secure design pattern is to be used. Since security has always been the biggest concern in every field, the work provides a solution for developing software while maintaining security and confidentiality. Security is regarded as a non-functional requirement in the software development life cycle; in the present-day software development life cycle, security requirements are considered in every phase of the product development life cycle. The progression of technology has also increased security issues. In order to cope with security concerns, developers need to learn the security requirements of a product and must have security domain knowledge to prescribe a secure development solution. Security concerns and threats are commonly organized along five parameters: identification and authentication of users, access control mechanisms and authorization rules, cryptography, intrusion detection, and logging. Security requirement properties are characterized into four classes, namely confidentiality, integrity, availability, and accountability; this classification of security issues indicates the various measures to be taken to meet security requirements, and in secure development, security concerns must be described concretely in every development phase. The proposed solution is effective in helping software developers in a sector they are not specialized in: if the developers don't have specialization in the security sector, then when the algorithm describes its choices, they only need to understand why the machine has chosen a given pattern.
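The retrieval step of such a scheme can be illustrated as below: index short secure-pattern descriptions with TF-IDF and query them with SRS text via cosine similarity. The two pattern descriptions are hypothetical placeholders, not the paper's repository:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

patterns = {
    "Secure Logger": "log security events in a tamper resistant audit store",
    "Authenticator": "verify user identity and credentials at login",
}
vec = TfidfVectorizer()
index = vec.fit_transform(patterns.values())          # indexed pattern corpus
srs = ["users must log in with verified credentials"]
scores = cosine_similarity(vec.transform(srs), index)[0]
best = max(zip(scores, patterns), key=lambda pair: pair[0])
print("suggested pattern:", best[1])
```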

2.8.2 Problem Addressed

This work helps in identifying the different security engineering problems, so as to define our dataset on different parameters and find a desired dataset [Abrams 1999].
This work helped with security pattern evaluation, i.e., deciding which pattern will be used based on the results of our calculations [Duncan and de Muijnck-Hughes n.d.].
Secure designs give an answer for the security requirements of the product. There are a huge number of secure patterns, and it is very hard to pick a suitable one; moreover, the determination of these patterns needs security knowledge [Weiss and Mouratidis 2008].
The large number of patterns has created a problem in selecting the patterns appropriate for different security requirements; this paper presents a selection approach for security patterns [X. Liu and Chen 2015].

2.9 Structures with Double Channel are Used for Training

2.9.1 Model

The methods of medical terminologies are underutilized for features, including consumer health terminology, in social media texts. This paper proposes a medical social media text classification (MSMTC) algorithm that integrates consumer health terminology. The solution is given on the basis of categorizing the terms and then building a dataset to find them. The main idea is to use an adversarial network to extract consumer health terminology together with medical terminology to create a dictionary. Words that appear in the dictionary are removed from the original sentence to make a noise data channel that carries no medical information. Finally, the outputs from the complete data channel and the noise data channel are subtracted and the result is input to the classification task.
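The channel construction can be sketched in a few lines: the noise channel is the sentence with every dictionary word removed, so subtracting the two channel outputs isolates the medical signal. The dictionary here is a hypothetical stand-in for the adversarially extracted one:

```python
medical_terms = {"insulin", "glucose"}               # hypothetical extracted dictionary

def make_channels(sentence):
    words = sentence.split()
    noise = [w for w in words if w not in medical_terms]  # medical info removed
    return words, noise                              # complete channel, noise channel

print(make_channels("my insulin dose keeps my glucose stable"))
```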
2.9.2 Problem Addressed

This paper proposes a dictionary-based approach that uses contextual information to generate possible variants and pairs for normalization; candidates are ranked on the basis of string similarity, and the approach is used to enrich the dictionary's content [Sarker et al. 2015].
The problem of multi-task, character-level attentional networks for the normalization of medical concepts is considered [F. Liu, Weng and Jiang 2012].
These advances have prompted the development of new research based on mining medical social media text, including pharmacovigilance [Goodfellow et al. n.d.].
In fact, there is a lack of relevant training corpora; therefore, an adversarial network is proposed, which helps in this paper for generating training sets [Belazzoug, Touahria, Nouioua and Brahimi 2019b].

2.10 Feature Selection Using Sine Cosine Algorithm

2.10.1 Model

One of the most common models for text categorization is bag of words (BOW). This model faces limitations because the number of features involved is large, which influences text categorization performance. ISCA is the result of improvements added to the powerful Sine Cosine Algorithm (SCA); it helps in discovering new regions of the search space in comparison to the original SCA. The algorithm evaluates two positions to find the best solution: the position of the best solution found so far, and a given random position from the search space, whereas the original SCA focuses only on the best solution to generate a new solution. This combination helps avoid premature convergence and improves performance.
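For reference, one position update of the base SCA [Mirjalili 2016] moves each candidate toward the best solution P along a sine or cosine trajectory; the ISCA variant discussed above additionally mixes in a random position from the search space, which this sketch omits:

```python
import numpy as np

def sca_step(X, P, t, t_max, rng):
    """X: (n_agents, dim) candidate positions; P: (dim,) best solution so far."""
    r1 = 2 - 2 * t / t_max                        # shrinks exploration over time
    r2 = rng.uniform(0, 2 * np.pi, X.shape)
    r3 = rng.uniform(0, 2, X.shape)
    r4 = rng.uniform(0, 1, X.shape)               # chooses sine or cosine branch
    step = np.where(r4 < 0.5, np.sin(r2), np.cos(r2)) * np.abs(r3 * P - X)
    return X + r1 * step

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, (5, 3))
print(sca_step(X, X[0], t=1, t_max=100, rng=rng))
```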

2.10.2 Problems Addressed

This work presents an extensive empirical comparison of twelve feature selection methods and discusses a feature selection metric called “Bi-Normal Separation” (BNS), which outperforms all the other methods, including the IG and X2 metrics [Belazzoug et al. 2019a].
SCA is a global optimization approach that uses a mathematical model based on sine and cosine functions to iteratively update a set of candidate solutions. The SCA algorithm has shown high efficiency in various applications, one of the best examples being the optimization of continuous functions, for which SCA has been tested on several datasets; it has also been applied to real problems such as the airfoil design problem [Mirjalili 2016].
2.11 Tsetlin Machine for Text Classification

2.11.1 Model

Both high accuracy and ease of interpretation are required in medical applications. These requirements come with challenges for text categorization techniques: the procedure used for understanding text should be explicable to health specialists to assist them in medical decision-making. The method described here is based on the Tsetlin Machine for text categorization, in which conjunctive clauses are formed to capture complex patterns in natural language. It provides pattern recognition that is easy to understand, by composing patterns in propositional logic. It compares well with state-of-the-art pattern recognition techniques in many ways, such as pattern discrimination, image recognition, and optimal prediction of moves for board games, and it also addresses the problem of interpretability. The model is simple to interpret, as input patterns as well as outputs are represented as sequences of bits; this representation also greatly increases computational speed.
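A toy illustration of clause-based classification in propositional logic: each conjunctive clause ANDs literals (a word's presence or the negation of its presence) over a bit-vector document, and clauses vote for or against a class. The clauses below are hand-written examples, not learned Tsetlin Machine clauses:

```python
def clause_fires(x, positives, negatives):
    """Conjunction of literals: required words present, forbidden words absent."""
    return all(x[w] for w in positives) and not any(x[w] for w in negatives)

x = {"pain": 1, "chronic": 1, "holiday": 0}          # bit-vector document
clauses = [({"pain", "chronic"}, {"holiday"}, +1),   # (positive literals, negated, vote)
           ({"holiday"}, set(), -1)]
score = sum(vote for pos, neg, vote in clauses if clause_fires(x, pos, neg))
print("class vote:", score)                          # positive sum -> class predicted
```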

2.11.2 Problem Addressed

VDCNN (very deep CNN) is used to classify text using up to 29 layers [Conneau et al. 2016].
This paper presents a machine learning algorithm for building classifiers composed of a small number of short rules; these are restricted disjunctive normal form models [T. Wang, Rudin, Liu, Klampfl and Macneille 2017].
This study focuses on the effectiveness of CNNs for text categorization and explains why the CNN is suitable for the task [T. Wang et al. 2017].
This BiLSTM classifier model's approach is quite similar to the approach used by DL15 for text classification. The paper proposes a training strategy that can achieve accuracy competitive with previous purely supervised models, but without the extra pretraining step [Johnson and Zhang 2017].

2.12 Text Classification Using Few Shot Learning

2.12.1 Model

Many deep learning architectures have been implemented for modelling text sequences, but the main problem they face is that they require a great amount of annotated data for training their parameters, which makes them infeasible when a large number of annotated samples do not exist or cannot be accessed. SWEMs are able to extract representations for text classification with the help of only a few support examples. A modified approach applying the hierarchical pooling method is proposed for few-shot text classification, and it shows high performance on long text datasets [Pan et al. 2019].
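Hierarchical pooling in the SWEM family can be sketched as an average pool over a sliding window of word embeddings followed by a max pool across windows; the embedding matrix below is random and the window size is an assumption:

```python
import torch
import torch.nn.functional as F

emb = torch.randn(1, 50, 100)                        # (batch, words, embedding dim)
avg = F.avg_pool1d(emb.transpose(1, 2), kernel_size=5, stride=1)  # local averages
doc = avg.max(dim=2).values                          # max over windows -> (1, 100)
print(doc.shape)                                     # document representation
```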
2.12.2 Problems Addressed

The Model-Agnostic Meta-Learner's main purpose is to meta-learn initial conditions for subsequent fine-tuning on few-shot problems [Finn, Abbeel and Levine 2017].
The paper proposes an LSTM-based meta-learner model whose main focus is to learn the exact optimization algorithm that can be used to train another learner neural network classifier in the same regime [Ravi and Larochelle n.d.].
In this paper, ideas from metric learning and from recent advances combining neural networks with external memories are employed [Vinyals et al. n.d.].
This framework maps a small labelled support set and an unlabelled example to its label, and then defines one-shot learning problems on vision and language tasks [Snell, Swersky and Zemel n.d.].
This paper presents a short-text classification framework that uses Siamese CNNs. The Siamese CNNs learn the discriminative text encoding that helps classifiers distinguish informal sentences. To improve the classifier's generalization, the few shots take different sentence structures and different descriptions of a topic as prototypes [Yan, Zheng and Cao 2018].

3 Evaluation
All the algorithms are evaluated using various methods, and the experimental results are compared on different datasets, for example 20Newsgroups, Reuters, and Yahoo Answers. These are among the benchmarking text datasets on which the conventional models are tested, so the performance of the conventional models can be compared with that of the models proposed here; these datasets contain a lot of features, which allows the testing and selection of various features. Precision and recall measures are used for evaluating categorization algorithms. Precision is the ratio of the number of documents correctly assigned to category C to the total number of documents classified as belonging to category C. Recall is the ratio of the number of documents correctly assigned to category C to the total number of documents actually belonging to category C. A third basic measure, the F-measure (FM), is the harmonic mean of precision and recall. These three measures are given by the equations below.
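Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
FM = (2 · Precision · Recall) / (Precision + Recall)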

Here TP represents the number of true positives, FP represents the number of false positives, and FN denotes the number of false negatives. In multiclass categorization, macro averaging and micro averaging of precision and recall are used. In the macro averaging technique all the given classes are weighted equally, regardless of the number of documents belonging to them, while the micro average weights all the given documents equally, thus favouring the performance on the common classes. Accordingly, micro F1 depends mainly on the common categories, while macro F1 takes every category into account. The equations below give micro averaging and macro averaging of precision, recall, and F-measure for |C| generally independent classification problems.
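macro-Precision = (1 / |C|) · Σ_i Precision_i
micro-Precision = (Σ_i TP_i) / (Σ_i (TP_i + FP_i))

Here i ranges over the |C| categories; recall and the F-measure are averaged analogously.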

Extensive experiments are conducted to give a fair comparison of the unsupervised deep feature representation methods. For this, datasets with a large number of text documents have been used: CNAE is a collection of 1080 text documents, 20Newsgroups comprises 18821 text documents, and Reuters is a subset of the Reuters21578 database of text documents.

4 Comparison

[Zheng and Zheng 2019]
Datasets: Yelp, Sogou, Yahoo Answers, Douban Movie Review
Proposed technique: BRCAN
Compared algorithms: SVM, RNN, CNN, CRAN
Performance metrics: Accuracy
Limitations: Complex; deteriorating in the case of very long sentences
Advantages: Easily grabs knowledge from long texts and their semantics; fewer parameters; picks high-level features

[Guru, Suhil, Raju and Kumar 2018]
Datasets: 20Newsgroups, Reuters, TDT2
Proposed technique: Grouped Local Feature Evaluation Function
Compared algorithms: MFD, MFDR, LFEF
Performance metrics: Precision, Accuracy, F-score
Limitations: The number of features may vary
Advantages: The best subset of features is found; features are highly relevant

[Dai, Chen and Li 2019]
Dataset: IMDB Movie Review
Proposed technique: Black-Box Backdoor Attack
Compared algorithms: NIL
Performance metrics: Poisoning rate
Limitations: Dangerous
Advantages: Success rate of 96%; no large amount of information is needed

[X. Tang, Dai and Xiang 2019]
Datasets: Wine, Parkinson
Proposed technique: Feature selection based on feature interactions (FJMI)
Compared algorithms: JMI, Relax-MRMR
Limitations: FJMI and Relax-MRMR have higher complexity
Advantages: FJMI reduced computational complexity

[S. Wang et al. 2019]
Datasets: CNAE, 20Newsgroups, Reuters, RCV1
Proposed technique: Unsupervised Deep Feature Representation and Deep Learning
Compared algorithms: Autoencoder Neural Network, Graph Regularized Autoencoders
Performance metrics: Deep feature representation
Limitations: Fails to learn discriminative features
Advantages: Mapping with maximal and minimum between-cluster distance

[Jin et al. 2019]
Datasets: Movie Reviews, DBpedia, Hotel Comment
Proposed technique: Conditional Reflection Approach
Compared algorithms: CNN, RNN
Performance metrics: Results are achieved based on the size of the data set
Limitations: Difficult to determine the window size
Advantages: Best accuracy

[Belazzoug, Touahria, Nouioua and Brahimi 2019a]
Datasets: Reuters-21578, TREC, OHSUMED
Proposed technique: Improved sine cosine algorithm (ISCA)
Compared algorithms: GA, ACO, SCA, OBL-SCA
Performance metrics: Precision, recall and F1-measure
Limitations: Statistically weak
Advantages: Flexible; easy to adapt

[Berge et al. 2019]
Datasets: 20Newsgroups, IMDb
Proposed technique: Tsetlin Machine
Compared algorithms: Naïve Bayes, logistic regression
Performance metrics: Precision, recall, F1-measure
Limitations: Nonlinear patterns need to be better analysed
Advantages: Better outcomes expected in the future

[Pan et al. 2019]
Dataset: NetEase Cnews
Proposed technique: Few-Shot Transfer Learning
Compared algorithms: TF-IDF, mean pooling, max pooling
Performance metrics: Processing, memory, running time
Limitations: Inefficient for short text documents
Advantages: Lightweight SWEMs are effective and efficient

[Samant, Bhanu Murthy and Malapati 2019]
Datasets: Twitter Event, Opinosis, Yahoo questions
Proposed technique: K-NN classifier, SVD
Compared algorithms: IFN-TP-ICF, RFR, modOR
Performance metrics: Macro-averaged F1-score
Limitations: Use of a different dataset
Advantages: Different criteria for making an efficient result

[Ali et al. 2018]
Dataset: 79 SRS
Proposed technique: SVM
Performance metrics: F-score, precision, recall
Limitations: Less specialization in the security sector
Advantages: Effective in helping software developers in a sector they are not specialized in

[K. Liu and Chen 2019]
Dataset: DingXiangyisheng's question-and-answer data
Proposed technique: MSMTC algorithm
Compared algorithms: BiLSTM
Performance metrics: Accuracy, precision, recall, F1-measure
Limitations: Cannot be used for a wide range of data
Advantages: A more efficient way for medical data

Table 1: Comparison between various methods for text classification

Table 1 shows a comparison of all the techniques used in our base papers. All the algorithms have been compared on the basis of their advantages and limitations, along with their datasets and performance metrics.

Method                    Precision   Recall      F-Measure

Multinomial Naive Bayes   82.8±0.0    80.0±0.0    79.8±0.0
Random Forest             69.9±0.0    68.2±0.0    68.3±0.0
KNN                       56.0±0.0    43.3±0.0    45.9±0.0
LSTM                      80.4±0.0    72.6±0.0    76.3±0.0
LSTM CNN                  82.8±0.0    72.8±0.0    76.7±0.0
Bi-LSTM                   80.9±0.0    72.6±0.0    76.5±0.0
Bi-LSTM CNN               81.8±0.0    72.3±0.0    76.7±0.0
Tsetlin Machine           82.6±0.0    80.9±0.0    81.7±0.0

Table 2: 20 Newsgroups dataset with 20 classes

Table 2 shows a comparison of the various methods on different evaluation metrics. Except for KNN and Random Forest, all the methods show precision rates in a compact range of approximately 80-83, in which Multinomial Naive Bayes shows the highest precision rate of 82.8±0.0, while KNN and Random Forest show comparatively low precision rates of 56.0±0.0 and 69.9±0.0 respectively. The LSTM-based methods (LSTM, LSTM CNN, Bi-LSTM, and Bi-LSTM CNN) show quite similar recall rates, all between 72 and 73. KNN has the lowest recall rate at 43.3±0.0, while the Tsetlin Machine shows the best recall rate of 80.9±0.0. F-measure rates show the same trend as recall rates, with the LSTM-based methods having similar rates around 76-77, and the Tsetlin Machine and KNN having the highest and lowest values of 81.7±0.0 and 45.9±0.0 respectively.
Overall analysis of the 20 Newsgroups dataset with 20 classes on the above evaluation metrics (precision, recall, and F-measure) shows that the Tsetlin Machine and Multinomial Naive Bayes outperformed all the other methods compared with them, while KNN showed the weakest performance in this context.

5 Conclusions
On studying and closely analysing the various text classification techniques, we identified various methods and highlighted their strengths and weaknesses in extracting useful information from data. It is also important to recognize the problems present in text classification techniques in order to make a comparative study of the various classifiers and their performance, and it is interesting to infer that no single classifier can be attached to a specific problem. Semi-supervised text classification reduces temporal costs and is important in the field of text mining. We addressed some of the other crucial issues in the paper, including performance enhancement, feature selection, and document zones.
In this paper we surveyed various approaches to text categorization and feature selection based on a renowned dataset known as 20Newsgroups, and the results were striking: for precision, the traditional Naive Bayes, the Tsetlin Machine, and BRCAN were among the best methods, whereas for recall there was a huge difference between the Tsetlin Machine and the other algorithms, with the Tsetlin Machine outperforming all the others by a great margin. Because this made it difficult to compare the whole scenario, we used an evaluation measure known as the F-measure, which combines both quantities. From the results of all the evaluation methods, we can conclude that the Tsetlin Machine is the best among all the methods compared.

6 Future Work
In future work, we plan to reduce the complexity of computing MI terms, the main challenge being the estimation of the joint probability of MI terms. We also plan to apply the ISCA algorithm along with other search algorithms to study further aspects of feature selection problems. For further study on pattern selection, we will consider using unsupervised learning techniques, and will increase the feature vector size and the state difference of different techniques to reduce sparsity. We plan to study the Tsetlin Machine and its usage for unsupervised learning of word embeddings. We can build text categorization with a more efficient selection method and increase performance using MI, changing the way the word vector is created; by adding features to it we can also perform sentiment analysis. Finally, we plan to develop a defence mechanism against the backdoor attack and to study the influence of trigger sentence content on the attack.

Acknowledgements

We are sincerely thankful to Vellore Institute of Technology, Vellore, for providing us the opportunity to write a review paper in the form of a dissertation on the topic “Text Categorization Techniques: Literature Review and Current Trends”. We are also thankful to our faculty in charge, Mr. Saravanakumar Kandasamy, for guiding us at every stage of this review paper. Without his support it would have been very difficult for us to prepare a paper so informative and interesting. Through this research paper we have learnt a lot about text categorization, how it can be achieved, and its advantages and disadvantages. We hope this review paper inspires young minds and can be useful for future innovations.

References

[Abrams 1999] Abrams, M. D.: “Security Engineering in an Evolutionary Acquisition Environment”; (1999).
[Ali, Asif, Shahbaz, Khalid, Rehman and Guergachi 2018] Ali, I., Asif, M., Shahbaz, M., Khalid, A., Rehman, M., Guergachi, A.: “Text Categorization Approach for Secure Design Pattern Selection Using Software Requirement Specification”; Vol. 6 (2018).
[Belazzoug, Touahria, Nouioua and Brahimi 2019a] Belazzoug, M., Touahria, M., Nouioua, F., Brahimi, M.: “An improved sine cosine algorithm to select features for text categorization”; Journal of King Saud University - Computer and Information Sciences, No. xxxx (2019a). https://doi.org/10.1016/j.jksuci.2019.07.003
[Belazzoug, Touahria, Nouioua and Brahimi 2019b] Belazzoug, M., Touahria, M., Nouioua, F., Brahimi, M.: “An improved sine cosine algorithm to select features for text categorization”; Journal of King Saud University - Computer and Information Sciences, No. xxxx (2019b). https://doi.org/10.1016/j.jksuci.2019.07.003
[Bengio, Ducharme, Vincent and Jauvin 2003] Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: "A Neural Probabilistic Language Model"; Journal of Machine Learning Research, Vol. 3 (2003), pp. 1137–1155.
[Berge, Granmo, Tveit, Goodwin, Jiao and Matheussen 2019] Berge, G. T., Granmo, O.-C.,
Tveit, T. O., Goodwin, M., Jiao, L., Matheussen, B. V.: “Using the Tsetlin Machine to Learn
Human-Interpretable Rules for High-Accuracy Text Categorization With Medical
Applications”; IEEE Access, Vol. 7 (2019), pp. 115134–115146.
https://doi.org/10.1109/access.2019.2935416
[Biggio, Nelson and Laskov 2012] Biggio, B., Nelson, B., Laskov, P.: “Poisoning Attacks
against Support Vector Machines”; (2012).
[Chen, Zhang, Long and Zhang 2016] Chen, K., Zhang, Z., Long, J., Zhang, H.: “Turning from
TF-IDF to TF-IGM for term weighting in text classification”; Expert Systems with
Applications, Vol. 66 (2016), pp. 1339–1351. https://doi.org/10.1016/j.eswa.2016.09.009
[Conneau, Schwenk, Barrault and Lecun 2016] Conneau, A., Schwenk, H., Barrault, L., Lecun,
Y.: “Very Deep Convolutional Networks for Text Classification”; (2016). Retrieved from
http://arxiv.org/abs/1606.01781
[Dai, Chen and Li 2019] Dai, J., Chen, C., Li, Y.: “A backdoor attack against LSTM-based text
classification systems”; IEEE Access, Vol. 7 (2019), pp. 138872–138878.
https://doi.org/10.1109/ACCESS.2019.2941376
[Duncan and de Muijnck-Hughes n.d.] Duncan, I., de Muijnck-Hughes, J.: “Security Pattern
Evaluation”; (n.d.).
[Finn, Abbeel and Levine 2017] Finn, C., Abbeel, P., Levine, S.: “Model-Agnostic Meta-
Learning for Fast Adaptation of Deep Networks”; (2017).
[Ganesan, Zhai and Han n.d.] Ganesan, K., Zhai, C., Han, J.: “Opinosis: A Graph-Based
Approach to Abstractive Summarization of Highly Redundant Opinions”; (n.d.). Retrieved
from http://timan.cs.uiuc.edu/
[Gao, Ramanathan and Tourassi 2018] Gao, S., Ramanathan, A., Tourassi, G.: “Hierarchical
Convolutional Attention Networks for Text Classification”; (2018).
[Goodfellow, Pouget-Abadie, Mirza, Xu, Warde-Farley, Ozair, et al. n.d.] Goodfellow, I. J.,
Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al.: “Generative
Adversarial Nets”; (n.d.). Retrieved from http://www.github.com/goodfeli/adversarial
[Guru, Suhil, Raju and Kumar 2018] Guru, D. S., Suhil, M., Raju, L. N., Kumar, N. V.: “An
alternative framework for univariate filter based feature selection for text categorization”;
Pattern Recognition Letters, Vol. 103 (2018), pp. 23–31.
https://doi.org/10.1016/j.patrec.2017.12.025
[He, Zhang, Ren and Sun n.d.] He, K., Zhang, X., Ren, S., Sun, J.: “Deep Residual Learning for
Image Recognition”; (n.d.). Retrieved from http://image-net.org/challenges/LSVRC/2015/
[Hu, Gao, Zhao, Zhang and Wang 2018] Hu, L., Gao, W., Zhao, K., Zhang, P., Wang, F.:
“Feature selection considering two types of feature relevancy and feature interdependency”;
Expert Systems with Applications, Vol. 93 (2018), pp. 423–434.
https://doi.org/10.1016/j.eswa.2017.10.016
[Jang, Wu and Jha 2017] Jang, U., Wu, X., Jha, S.: “Objective metrics and gradient descent
algorithms for adversarial examples in machine learning”; In ACM International Conference
Proceeding Series (Vol. Part F132521). Association for Computing Machinery (2017), pp. 262–
277. https://doi.org/10.1145/3134600.3134635
[Jin, Luo, Guo, Xie, Wu and Wang 2019] Jin, Y., Luo, C., Guo, W., Xie, J., Wu, D., Wang, R.:
“Text Classification Based on Conditional Reflection”; IEEE Access, Vol. 7 (2019), pp.
76712–76719. https://doi.org/10.1109/ACCESS.2019.2921976
[Johnson and Zhang 2017] Johnson, R., Zhang, T.: “Deep pyramid convolutional neural
networks for text categorization”; In ACL 2017 - 55th Annual Meeting of the Association for
Computational Linguistics, Proceedings of the Conference (Long Papers) (Vol. 1). Association
for Computational Linguistics (ACL) (2017), pp. 562–570. https://doi.org/10.18653/v1/P17-
1052
[Kim 2014] Kim, Y.: "Convolutional Neural Networks for Sentence Classification"; (2014).
[Le, Ranzato, Monga, Devin, Chen, Corrado, et al. 2011] Le, Q. v., Ranzato, M., Monga, R.,
Devin, M., Chen, K., Corrado, G. S., et al.: “Building high-level features using large scale
unsupervised learning”; (2011). Retrieved from http://arxiv.org/abs/1112.6209
[F. Liu, Weng and Jiang 2012] Liu, F., Weng, F., Jiang, X.: “A Broad-Coverage Normalization
System for Social Media Language”; Association for Computational Linguistics (2012).
[K. Liu and Chen 2019] Liu, K., Chen, L.: “Medical Social Media Text Classification
Integrating Consumer Health Terminology”; IEEE Access, Vol. 7 (2019), pp. 78185–78193.
https://doi.org/10.1109/ACCESS.2019.2921938
[P. Liu, Qiu and Huang 2016] Liu, P., Qiu, X., Huang, X.: “Recurrent Neural Network for Text
Classification with Multi-Task Learning”; (2016). Retrieved from
http://arxiv.org/abs/1605.05101
[X. Liu and Chen 2015] Liu, X., Chen, H.: “A research framework for pharmacovigilance in
health social media: Identification and evaluation of patient adverse drug event reports”;
Journal of Biomedical Informatics, Vol. 58 (2015), pp. 268–279.
https://doi.org/10.1016/j.jbi.2015.10.011
[McMinn, Moshfeghi and Jose 2013] McMinn, A. J., Moshfeghi, Y., Jose, J. M.: “Building a
large-scale corpus for evaluating event detection on twitter”; In International Conference on
Information and Knowledge Management, Proceedings (2013), pp. 409–418.
https://doi.org/10.1145/2505515.2505695
[Meng, Lin and Yu 2011] Meng, J., Lin, H., Yu, Y.: “A two-stage feature selection method for
text categorization”; Computers and Mathematics with Applications, Vol. 62, No. 7 (2011), pp.
2793–2800. https://doi.org/10.1016/j.camwa.2011.07.045
[Mirjalili 2016] Mirjalili, S.: “SCA: A Sine Cosine Algorithm for solving optimization
problems”; Knowledge-Based Systems, Vol. 96 (2016), pp. 120–133.
https://doi.org/10.1016/j.knosys.2015.12.022
[Moosavi-Dezfooli, Fawzi and Frossard n.d.] Moosavi-Dezfooli, S.-M., Fawzi, A., Frossard, P.: "DeepFool: a simple and accurate method to fool deep neural networks"; (n.d.). Retrieved from http://github.com/lts4/deepfool
[Pan, Huang, Gong and Yuan 2019] Pan, C., Huang, J., Gong, J., Yuan, X.: "Few-Shot Transfer Learning for Text Classification with Lightweight Word Embedding Based Models"; IEEE Access, Vol. 7 (2019), pp. 53296–53304. https://doi.org/10.1109/ACCESS.2019.2911850
[Papernot, McDaniel, Goodfellow, Jha, Celik and Swami 2017] Papernot, N., McDaniel, P.,
Goodfellow, I., Jha, S., Celik, Z. B., Swami, A.: “Practical black-box attacks against machine
learning”; In ASIA CCS 2017 - Proceedings of the 2017 ACM Asia Conference on Computer
and Communications Security. Association for Computing Machinery, Inc (2017), pp. 506–
519. https://doi.org/10.1145/3052973.3053009
[Papernot, McDaniel, Jha, Fredrikson, Celik and Swami 2015] Papernot, N., McDaniel, P., Jha,
S., Fredrikson, M., Celik, Z. B., Swami, A.: “The Limitations of Deep Learning in Adversarial
Settings”; (2015). Retrieved from http://arxiv.org/abs/1511.07528
[Pinheiro, Cavalcanti and Ren 2015] Pinheiro, R. H. W., Cavalcanti, G. D. C., Ren, T. I.:
“Data-driven global-ranking local feature selection methods for text categorization”; Expert
Systems with Applications, Vol. 42, No. 4 (2015), pp. 1941–1949.
https://doi.org/10.1016/j.eswa.2014.10.011
[Ravi and Larochelle n.d.] Ravi, S., Larochelle, H.: "Optimization as a Model for Few-Shot Learning"; (n.d.).
[Rifai, Vincent, Muller, Glorot and Bengio 2011] Rifai, S., Vincent, P., Muller, X., Glorot, X.,
Bengio, Y.: “Contractive Auto-Encoders: Explicit Invariance During Feature Extraction”;
(2011).
[Samant, Bhanu Murthy and Malapati 2019] Samant, S. S., Bhanu Murthy, N. L., Malapati, A.:
“Improving Term Weighting Schemes for Short Text Classification in Vector Space Model”;
IEEE Access, Vol. 7 (2019), pp. 166578–166592.
https://doi.org/10.1109/ACCESS.2019.2953918
[Sarker, Ginn, Nikfarjam, O’Connor, Smith, Jayaraman, et al. 2015] Sarker, A., Ginn, R.,
Nikfarjam, A., O’Connor, K., Smith, K., Jayaraman, S., et al.: “Utilizing social media data for
pharmacovigilance: A review”; Journal of Biomedical Informatics, Vol. 54 (2015), pp. 202–
212. https://doi.org/10.1016/j.jbi.2015.02.004
[Snell, Swersky and Zemel n.d.] Snell, J., Swersky, K., Zemel, R.: "Prototypical Networks for Few-shot Learning"; (n.d.).
[Soucy and Mineau n.d.] Soucy, P., Mineau, G. W.: “Beyond TFIDF Weighting for Text
Categorization in the Vector Space Model”; (n.d.).
[Tai, Socher and Manning n.d.] Tai, K. S., Socher, R., Manning, C. D.: “Improved Semantic
Representations From Tree-Structured Long Short-Term Memory Networks”; (n.d.).
[D. Tang, Qin and Liu 2015] Tang, D., Qin, B., Liu, T.: “Document Modeling with Gated
Recurrent Neural Network for Sentiment Classification”; Association for Computational
Linguistics (2015). Retrieved from http://ir.hit.edu.cn/
[X. Tang, Dai and Xiang 2019] Tang, X., Dai, Y., Xiang, Y.: “Feature selection based on
feature interactions with application to text categorization”; Expert Systems with Applications,
Vol. 120 (2019), pp. 207–216. https://doi.org/10.1016/j.eswa.2018.11.018
[Vincent and Larochelle 2008] Vincent, P., Larochelle, H.: “Extracting and Composing Robust
Features with Denoising Autoencoders”; (2008), pp. 1096–1103.
[Vinyals, Blundell, Lillicrap, Kavukcuoglu and Wierstra n.d.] Vinyals, O., Blundell, C., Lillicrap, T., Kavukcuoglu, K., Wierstra, D.: "Matching Networks for One Shot Learning"; (n.d.).
[D. Wang, Zhang, Liu, Lv and Wang 2014] Wang, D., Zhang, H., Liu, R., Lv, W., Wang, D.:
“T-Test feature selection approach based on term frequency for text categorization”; Pattern
Recognition Letters, Vol. 45, No. 1 (2014), pp. 1–10.
https://doi.org/10.1016/j.patrec.2014.02.013
[S. Wang, Cai, Lin and Guo 2019] Wang, S., Cai, J., Lin, Q., Guo, W.: “An Overview of
Unsupervised Deep Feature Representation for Text Categorization”; IEEE Transactions on
Computational Social Systems. Institute of Electrical and Electronics Engineers Inc. (2019,
June 1), pp. 504–517. https://doi.org/10.1109/TCSS.2019.2910599
[T. Wang, Rudin, Liu, Klampfl and Macneille 2017] Wang, T., Rudin, C., Liu, Y., Klampfl, E.,
Macneille, P.: “A Bayesian Framework for Learning Rule Sets for Interpretable Classification”;
Journal of Machine Learning Research (Vol. 18) (2017). Retrieved from
http://jmlr.org/papers/v18/16-003.html.
[Weiss and Mouratidis 2008] Weiss, M., Mouratidis, H.: “Selecting security patterns that fulfill
security requirements”; In Proceedings of the 16th IEEE International Requirements
Engineering Conference, RE’08 (2008), pp. 169–172. https://doi.org/10.1109/RE.2008.32
[Xiao, Li, Wang, Xu and Liu 2018] Xiao, Y., Li, X., Wang, H., Xu, M., Liu, Y.: "3-HBP: A Three-Level Hidden Bayesian Link Prediction Model in Social Networks"; Vol. 5, No. 2 (2018), pp. 430–443.
[Yan, Zheng and Cao 2018] Yan, L., Zheng, Y., Cao, J.: “Few-shot learning for short text
classification”; Multimedia Tools and Applications, Vol. 77, No. 22 (2018), pp. 29799–29810.
https://doi.org/10.1007/s11042-018-5772-4
[Yang, Liu, Zhu, Liu and Zhang 2012] Yang, J., Liu, Y., Zhu, X., Liu, Z., Zhang, X.: “A new
feature selection based on comprehensive measurement both in inter-category and intra-
category for text categorization”; Information Processing and Management, Vol. 48, No. 4
(2012), pp. 741–754. https://doi.org/10.1016/j.ipm.2011.12.005
[Yogatama, Dyer, Ling and Blunsom 2017] Yogatama, D., Dyer, C., Ling, W., Blunsom, P.:
“Generative and Discriminative Text Classification with Recurrent Neural Networks”; (2017).
Retrieved from http://arxiv.org/abs/1703.01898
[Zeng, Zhang, Zhang and Yin 2015] Zeng, Z., Zhang, H., Zhang, R., Yin, C.: “A novel feature
selection method considering feature interaction”; Pattern Recognition, Vol. 48, No. 8 (2015),
pp. 2656–2666. https://doi.org/10.1016/j.patcog.2015.02.025
[Zheng and Zheng 2019] Zheng, J., Zheng, L.: “A Hybrid Bidirectional Recurrent
Convolutional Neural Network Attention-Based Model for Text Classification”; IEEE Access,
Vol. 7 (2019), pp. 106673–106685. https://doi.org/10.1109/ACCESS.2019.2932619
[Zong, Wu, Chu and Sculli 2015] Zong, W., Wu, F., Chu, L. K., Sculli, D.: “A discriminative
and semantic feature selection method for text categorization”; International Journal of
Production Economics, Vol. 165 (2015), pp. 215–222.
https://doi.org/10.1016/j.ijpe.2014.12.035