Hate Speech Detection Using Machine Learning
Assistant Professor, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram Campus, Chennai, Tamil Nadu, India
Student, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram Campus, Chennai, Tamil Nadu, India
Student, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram Campus, Chennai, Tamil Nadu, India
Student, Department of Computer Science and Engineering, SRM Institute of Science and Technology, Ramapuram Campus, Chennai, Tamil Nadu, India
manual feature engineering; the more contemporary deep learning paradigm, on the other hand, applies deep learning methods (neural networks) to raw data to automatically generate multiple layers of abstract features from it [15-17].
The data is provided as comma-separated values (CSV) files containing tweets and the sentiment they express. The tweet id is a unique identifier, and the sentiment is either 1 (positive) or 0 (negative). In the training CSV, the text of each tweet is enclosed in double quotes. The test dataset consists of CSV files of the form tweet id, tweet.
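The file layout described above can be read with Python's standard `csv` module, which also handles commas inside the quoted tweet text. This is a minimal sketch that assumes the training columns appear in the order tweet id, sentiment, tweet, with no header row; the helper names `load_train` and `load_test` and the toy rows are hypothetical and not taken from the dataset.

```python
import csv
import io

def load_train(f):
    """Parse training rows, assumed to be: tweet_id,sentiment,"tweet text" (no header)."""
    rows = []
    for tweet_id, sentiment, tweet in csv.reader(f):
        # sentiment is 1 (positive) or 0 (negative); keep it as an int label
        rows.append((tweet_id, int(sentiment), tweet))
    return rows

def load_test(f):
    """Parse test rows of the form: tweet_id,"tweet text" (no header assumed)."""
    return [(tweet_id, tweet) for tweet_id, tweet in csv.reader(f)]

# Hypothetical toy files; the quoted tweet field may itself contain commas.
train = load_train(io.StringIO('1,0,"not good, not good at all"\n2,1,"what a lovely day"\n'))
test = load_test(io.StringIO('3,"an unlabelled tweet"\n'))
print(train)
print(test)
```

The quoted field is returned without its surrounding quotes, so the comma inside the first toy tweet does not split it into two columns.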
A. Advantages
1. Oversampling or undersampling may not be able to solve
this problem, as we will illustrate in the next section.
2. There are no distinguishing linguistic traits in hate tweets
that are not present in non-hate tweets.
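Point 1 can be made concrete with plain random oversampling, which balances the classes by duplicating minority-class examples. The sketch below is a generic illustration, not the authors' procedure; `random_oversample` and the toy tweets are hypothetical. Because it only copies existing tweets, it adds no new words to the minority class, which is consistent with point 2.

```python
import random

def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every class
    has as many samples as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Draw (target - len(xs)) extra copies, with replacement, from this class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y

tweets = ["have a nice day", "see you soon", "great match", "some hateful tweet"]
labels = [0, 0, 0, 1]
X, y = random_oversample(tweets, labels)
print(y.count(0), y.count(1))
print({t for t, c in zip(X, y) if c == 1})
```

After oversampling both classes have three examples, but the minority class still contains only the one distinct tweet it started with.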
One way of measuring how distinct hate and non-hate
Twitter messages are in their vocabulary is to look at how
many words are unique to each class.
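The measure described above can be sketched by building the vocabulary of each class and counting the words that never occur in any other class. This assumes lowercased whitespace tokenization, which the section does not specify; `unique_word_counts` and the toy tweets are hypothetical.

```python
def class_vocabularies(tweets, labels):
    """Map each class label to the set of (lowercased, whitespace-split) words it uses."""
    vocab = {}
    for text, label in zip(tweets, labels):
        vocab.setdefault(label, set()).update(text.lower().split())
    return vocab

def unique_word_counts(tweets, labels):
    """For each class, count the words that appear in no other class."""
    vocab = class_vocabularies(tweets, labels)
    counts = {}
    for label, words in vocab.items():
        other_words = set()
        for other_label, other in vocab.items():
            if other_label != label:
                other_words |= other
        counts[label] = len(words - other_words)
    return counts

tweets = ["You are great", "great day", "you are vile"]
labels = [0, 0, 1]
print(unique_word_counts(tweets, labels))
```

Here "you" and "are" are shared, so class 0 keeps two unique words ("great", "day") and class 1 keeps one ("vile").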
A.5 Modelling:
A k-NN classifier model was built using k = 10 nearest neighbors
and the Euclidean distance metric.
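A k-NN classifier with k = 10 and Euclidean distance can be sketched in plain Python. The section does not say how tweets were turned into vectors, so the bag-of-words count vectors used here are an assumption, as are the helper names `bow_vector` and `knn_predict` and the toy data; in practice a library implementation such as scikit-learn's `KNeighborsClassifier` would normally be used.

```python
import math
from collections import Counter

def bow_vector(text, vocab):
    """Bag-of-words count vector over a fixed vocabulary (assumed featurization)."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def knn_predict(train_X, train_y, x, k=10):
    """Majority vote among the k training points nearest to x (Euclidean distance)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]

vocab = ["good", "day", "vile", "slur"]
texts = ["good day"] * 7 + ["vile slur"] * 6
labels = [0] * 7 + [1] * 6
X = [bow_vector(t, vocab) for t in texts]
print(knn_predict(X, labels, bow_vector("vile slur", vocab)))
print(knn_predict(X, labels, bow_vector("good day", vocab)))
```

With an even k such as 10, two-class ties are possible; in this sketch they resolve to the label of the single nearest neighbor, because `Counter.most_common` lists equal counts in first-encountered order.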
A.6 Evaluation:
The classifier is then evaluated on unseen test data, and the
R squared (R²) values are computed for both the training and
test datasets.
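R squared is the coefficient of determination, R² = 1 − SS_res / SS_tot. The section does not say how it was computed for a classifier, so this minimal sketch assumes the 0/1 sentiment labels and the predicted labels are treated as numbers; `r_squared` and the toy labels are hypothetical.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_tot = sum((y - mean) ** 2 for y in y_true)               # total sum of squares
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    if ss_tot == 0:
        raise ValueError("R squared is undefined when all labels are identical")
    return 1 - ss_res / ss_tot

# Hypothetical labels and predictions for a training and a test split.
train_true, train_pred = [0, 1, 1, 0], [0, 1, 1, 0]
test_true, test_pred = [0, 0, 1, 1], [0, 1, 1, 1]
print(r_squared(train_true, train_pred))  # 1.0
print(r_squared(test_true, test_pred))    # 0.0
```

For 0/1 predictions each squared residual is 1 exactly when a tweet is misclassified, so SS_res is simply the number of errors and R² is a rescaled error count.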
Fig 5: Activity Diagram
VIII. CONCLUSION
We surveyed the problem of automatically identifying hate speech. The task is most commonly framed as a supervised learning problem. In general, sufficiently generic features, such as bag-of-words or word embeddings, provide good classification performance, and character-level techniques outperform token-level ones. Lexicons of slurs can aid classification, but only when they are used in conjunction with other features. More advanced features, such as dependency-parse information or features that model specific linguistic constructions such as imperatives or politeness, have also been shown to be useful. Textual analysis may not be the only way to determine whether someone is posting hate speech: information gained from other modalities (such as pictures sent along with text messages) might be useful as well. In many cases, the only datasets on which the overall effectiveness of these complex features can be judged are not publicly available and cover only a specific subtype of hate speech, such as bullying of a particular ethnic minority. Hate speech identification therefore needs a uniform dataset on which features and approaches can be compared.
REFERENCES
[1] David M. Blei, Andrew Ng, and Michael I. Jordan. 2003. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022.
[2] Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–479.
[3] B. Karnan, A. Kuppusamy, T. P. Latchoumi, A. Banerjee, A. Sinha, A. Biswas, and A. K. Subramanian. 2022. Multi-response Optimization of Turning Parameters for Cryogenically Treated and Tempered WC–Co Inserts. Journal of The Institution of Engineers (India): Series D, 1–12.
[4] Pete Burnap and Matthew L. Williams. 2015. Cyber hate speech on Twitter: An application of machine classification and statistical modeling for policy and decision making. Policy & Internet, 7(2):223–242.
[5] M. S. Dasari and V. Mani. 2020. Simulation and analysis of three-phase parallel inverter using multicarrier PWM control schemes. SN Applied Sciences, 2(5):1–10.
[6] T. P. Latchoumi and L. Parthiban. 2022. Quasi oppositional dragonfly algorithm for load balancing in a cloud computing environment. Wireless Personal Communications, 122(3):2639–2656.
[7] Pete Burnap, Matthew L. Williams, Luke Sloan, Omer Rana, William Housley, Adam Edwards, Vincent Knight, Rob Procter, and Alex Voss. 2014. Tweeting the terror: modelling the social media reaction to the Woolwich terrorist attack. Social Network Analysis and Mining, 4(1):1–14.
[8] V. M. Pavan, K. Balamurugan, and T. P. Latchoumi. 2021. PLA-Cu reinforced composite filament: Preparation and flexural property printed at different machining conditions. Advanced Composite Materials.
[9] Ying Chen, Yilu Zhou, Sencun Zhu, and Heng Xu. 2012. Detecting offensive language in social media to protect adolescent online safety. In Privacy, Security, Risk and Trust (PASSAT), 2012 International Conference on and 2012 International Conference on Social Computing (SocialCom), pages 71–80, Amsterdam, Netherlands, September. IEEE.
[10] S. Ravipati, V. Mani, and S. R. Yarlagadda. 2021. Efficient Control of Sensorless Hybrid Electric Vehicle Using RBFN Controller. Studies in Informatics and Control, 30(4):87–97.
[11] J. F. Banu, P. Muneeshwari, K. Raja, S. Suresh, T. P. Latchoumi, and S. Deepan. 2022. Ontology-Based Image Retrieval by Utilizing Model Annotations and Content. In 2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence), pages 300–305, January. IEEE.
[12] Nemanja Djuric, Jing Zhou, Robin Morris, Mihajlo Grbovic, Vladan Radosavljevic, and Narayan Bhamidipati. 2015. Hate speech detection with comment embeddings. In Proceedings of the 24th International Conference on World Wide Web, pages 29–30, New York, NY, USA. ACM.
[13] Njagi Dennis Gitari, Zhang Zuping, Hanyurwimfura Damien, and Jun Long. 2015. A lexicon-based approach for hate speech detection. International Journal of Multimedia and Ubiquitous Engineering, 10(4):215–230.
[14] Homa Hosseinmardi, Sabrina Arredondo Mattson, Rahat Ibn Rafiq, Richard Han, Qin Lv, and Shivakant Mishra. 2015. Detection of cyberbullying incidents on the Instagram social network. CoRR, abs/1503.03909.
[15] Irene Kwok and Yuzhou Wang. 2013. Locate the hate: Detecting tweets against blacks. In Marie desJardins and Michael L. Littman, editors, AAAI, pages 1621–1622, Bellevue, Washington, USA. AAAI Press.
[16] S. Ravipati, V. Mani, and S. R. Yarlagadda. 2021. Efficient Control of Sensorless Hybrid Electric Vehicle Using RBFN Controller. Studies in Informatics and Control, 30(4):87–97.
[17] M. S. Dasari and V. Mani. 2020. Simulation and analysis of three-phase parallel inverter using multicarrier PWM control schemes. SN Applied Sciences, 2(5):1–10.
[18] C. Bhuvaneshwari and A. Manjunathan. 2021. Reimbursement of sensor nodes and path optimization. Materials Today: Proceedings, 45:1547–1551.
[19] Roselin Suganthi Jesudoss, Rajeswari Kaleeswaran, Manjunathan Alagarsamy, Dineshkumar Thangaraju, Dinesh Paramathi Mani, and Kannadhasan Suriyan. 2022. Comparative study of BER with NOMA system in different fading channels. Bulletin of Electrical Engineering and Informatics, 11(2):854–861.