Paper 2
Learning
Amirmohammad Shahbandegan Lakshmi Preethi Kamak Mohammad Ghadiri
Department of Computer Science Department of Computer Science Department of Computer Science
Lakehead University Lakehead University Lakehead University
Thunder Bay, Canada Thunder Bay, Canada Thunder Bay, Canada
1172613 1160111 1170979
Fig. 4. Our final best classification architecture
VI. DISCUSSION AND CONCLUSION
A. Challenges Faced
1) Long Execution Time: Running all 450 experiments in the
first set was challenging because it required a huge amount of
time to complete. To overcome the slow execution time of the
models, the experiments were conducted in parallel on
high-performance compute nodes in Compute Canada clusters.
2) Random Under Sampling: The use of random under sampling
had a negative effect on the model's performance. This may be
due to the large loss of information that occurs when using
this under sampling method. It is possible to experiment with
over sampling techniques in the future to overcome this issue.
B. Future Scope
The authors plan to improve this work in two ways. First, the
effect of the hyper-parameters on the final pipeline model was
not studied in this work. Running more experiments could
identify better hyper-parameters for the binary and
multi-class models and thereby increase the performance of the
model.
Second, as mentioned before, the under sampling technique
used in the binary classifier resulted in poor performance.
One way to improve the quality of the binary classifier is to
employ over sampling methods instead. There are two general
strategies for generating synthetic samples. One is to apply
general-purpose methods such as SMOTE and ADASYN to the
embedded text. The other is to generate new synthetic text
using methods such as back-translation and word replacement
and then compute new embeddings for the synthetic text. Either
approach could improve the performance of the binary model,
and both are promising directions for future work.
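As an illustration of the first over sampling strategy mentioned under Future Scope (applying a method such as SMOTE to the embedded text), the following is a minimal sketch of SMOTE-style interpolation written with the Python standard library only. It is not the authors' implementation and it omits details of the published algorithm. The function name `smote_like_oversample`, its parameters, and the toy 2-D vectors are hypothetical and chosen only for illustration; real text embeddings would have far more dimensions.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic vectors by interpolating between a randomly
    chosen minority vector and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [v for j, v in enumerate(minority) if j != i]
        others.sort(key=lambda v: math.dist(base, v))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment between base and neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical 2-D "embeddings" of minority-class texts (illustrative values only).
minority_embeddings = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_oversample(minority_embeddings, n_new=6, k=2)
```

Because every synthetic vector lies on a segment between two existing minority vectors, no majority-class information is discarded, in contrast to random under sampling. In practice, a maintained library such as imbalanced-learn, which provides SMOTE and ADASYN implementations, would likely be preferable to a hand-written version.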