Toxic Comment Classification Using Natural Language Processing
© 2020, IRJET | Impact Factor value: 7.529 | ISO 9001:2008 Certified Journal | Page 6007
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 07 Issue: 06 | June 2020 www.irjet.net p-ISSN: 2395-0072
b) Activation Function: The output of a convolutional layer is passed to an activation function, which adds non-linearity to the output of the convolutional layer. The most common activation function is the Rectified Linear Unit (ReLU), defined as:
f(x) = max(0, x)
c) Pooling Layer: A pooling layer reduces the dimensions of the input while preserving its important features. A convolutional layer is often followed by a pooling layer to reduce the size and the number of parameters passed on from the previous layer.
4.1 Naive Bayes-Support Vector Machines
In this approach, we first compute TF-IDF scores for the words in the training data using the TF-IDF vectorizer. This generates a matrix containing an array of scores for each training example. We then apply Multinomial Naive Bayes to each label column to generate the NB probabilities from the TF-IDF matrix. These are then fed to an SVM to predict the probability of each label. On the test set, NB-SVM achieved an accuracy of 97.61%.
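The NB-SVM combination described above follows the approach of Wang and Manning, where Naive Bayes statistics are used to re-weight the features before a linear classifier is trained on them. The following is a minimal sketch of that feature-scaling step only; it uses a tiny hand-made binary term matrix instead of the TF-IDF matrix from the paper, and the function name and example data are illustrative, not from the original implementation.

```python
import math

def nb_log_ratios(X, y, alpha=1.0):
    """NB log-count ratio per feature (the Wang & Manning NB-SVM trick)."""
    n_feats = len(X[0])
    p = [alpha] * n_feats  # smoothed feature counts in the positive class
    q = [alpha] * n_feats  # smoothed feature counts in the negative class
    for row, label in zip(X, y):
        target = p if label == 1 else q
        for j, v in enumerate(row):
            target[j] += v
    ps, qs = sum(p), sum(q)
    return [math.log((pj / ps) / (qj / qs)) for pj, qj in zip(p, q)]

# Toy binarized term matrix: rows = comments, columns = vocabulary terms.
X = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 1],
     [0, 1, 0]]
y = [1, 1, 0, 0]  # 1 = toxic, 0 = clean

r = nb_log_ratios(X, y)
# Scale each row by r; a linear classifier (e.g. an SVM) is then trained
# on these NB-weighted features instead of the raw term matrix.
X_nb = [[v * rj for v, rj in zip(row, r)] for row in X]
```

A positive ratio marks a term seen mostly in toxic comments, a negative one a term seen mostly in clean comments, so the linear classifier starts from features that already encode class evidence.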
4.2 FastText
The fastText library takes its input in text format, so all the comments from the training data are converted into a text document, with each training example starting with '__label__' followed by the label of the comment and then the comment itself. This text file is fed to the fastText model; after tuning the hyperparameters (number of epochs = 5, learning rate = 0.1), the model achieved an accuracy of 95.4%.
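The label-prefix format described above can be sketched as follows. The helper function, file name, and toy examples are illustrative assumptions, not taken from the original implementation; the commented-out training call uses the fastText Python package's `train_supervised` API with the hyperparameter values quoted in the text.

```python
def to_fasttext_line(label, comment):
    # fastText supervised format: "__label__<label> <text>", one example per line,
    # so newlines inside a comment are collapsed to spaces.
    return "__label__{} {}".format(label, " ".join(comment.split()))

examples = [("toxic", "you are awful"), ("clean", "have a\nnice day")]
lines = [to_fasttext_line(lbl, txt) for lbl, txt in examples]

# Write the examples to a file and train (assumes the fasttext package):
# with open("train.txt", "w") as f:
#     f.write("\n".join(lines))
# model = fasttext.train_supervised(input="train.txt", epoch=5, lr=0.1)
```

Multi-label examples can carry several `__label__` prefixes on the same line, which matches the multi-label nature of the toxic comment task.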
5. CONCLUSION