2022 Using Deep Transfer Learning
https://doi.org/10.1007/s40747-022-00772-z
ORIGINAL ARTICLE
Received: 13 August 2021 / Accepted: 1 May 2022 / Published online: 25 May 2022
© The Author(s) 2022
Abstract
Social networking platforms such as Facebook and Twitter offer numerous advantages, but they also have a dark side. One of the problems on these platforms is cyberbullying. Its impact on the lives of victims is hard to measure because it depends on how each person copes with it: a message that is bullying for one person may seem normal to another. This ambiguity makes bullying content challenging to detect. Some research has addressed the issue for textual posts, but image-based cyberbullying detection has received less attention. This research aims to develop a model that helps to prevent image-based cyberbullying on social platforms. A deep learning-based convolutional neural network is used first for model development, and transfer learning models are used subsequently. Experiments with various hyper-parameter settings confirmed that the transfer learning-based model is the better choice for this problem. The proposed model achieved an accuracy of 89% in the best case, indicating that the system detects most cyberbullying posts.
5450 Complex & Intelligent Systems (2022) 8:5449–5467
ousness of the problem can be seen in the alarming statistics provided by ceoworld.biz.1 India was the country most affected by cyberbullying among teens in 2018, followed by Brazil and the United States. The problem is not just the number of cases but also the unreported cases. According to CyberBAPP (a Mumbai-based anti-cyberbullying organisation), one out of three users has been threatened online, and almost half of the users have been bullied once or more. Only half of these cases are reported, which makes cyberbullying detection hard. As only about 50% of cases are reported, an automated filter would help a greater number of people than might be expected.

These statistics indicate that the issue of cyberbullying needs immediate attention. However, the unavailability of a labelled dataset is one of the biggest challenges of this research. To fill this gap, we have developed an image dataset for this research. Recently, much research has been reported with different machine learning (ML) and deep learning (DL) approaches to address cyberbullying on textual datasets [4,9,10]. However, image-based cyberbullying has received comparatively less attention. This is a major issue at present, when most posts consist of images as well as textual content, and such bullying posts remain undetected by the system. Hence, this research focuses on detecting image-based cyberbullying posts on social platforms. To process an image and extract the required features from it, DL-based convolutional neural network (CNN) and transfer learning models have been widely used by the research community in the recent past. Apart from cyberbullying detection, much research has recently been reported using CNNs [3], for example in spam detection [11], question answering [12,13], text quality prediction [14] and healthcare [15–19]. Given the performance of CNN models in this research on image and text domains, this research also uses the 2DCNN model. Apart from the 2DCNN, the transfer learning approach is also used. The main contributions of our research are as follows:

– We proposed a transfer learning-based automated model to detect image-based cyberbullying posts on social platforms. The transfer learning models are capable of extracting hidden contextual features from cyberbullying posts.
– We created two datasets, (i) one with 1000 images and (ii) one with 3000 images, consisting of cyberbullying and non-cyberbullying images. The datasets can be useful for future researchers to extend the research.
– Finding the best-suited model to detect bullying images is a challenging task; hence, we experimented with both DL and transfer learning models to find the best model.
– The experimental outcomes confirmed that the transfer learning models are the better choice for predicting image-based cyberbullying posts.

The rest of the paper is organised as follows: the next section discusses the existing works. The third section describes the working of the 2DCNN models used in the proposed framework. The fourth section discusses the experimental outcomes of the different models. The fifth section discusses the findings and highlights the uses of the proposed framework. Finally, the last section concludes the work with its limitations and future scope.

1 https://ceoworld.biz/2018/10/29/countries-where-cyber-bullying-was-reported-the-most-in-2018/.

Literature review

Recently, cyberbullying has attracted huge attention from the research community. This section discusses the relevant research contributions on cyberbullying detection [3,20–24]. Hosseinmardi et al. [24] developed a model for cyberbullying detection on Instagram by extracting captions, comments and image content. The dataset was developed using the Instagram API and the public profiles of users. The collected dataset was then annotated using the CrowdFlower platform, where annotators needed to pass a quiz to label the data. The dataset was divided into three sets: (i) Set40+, with 49% non-bullying and the rest bullying; (ii) Set 0+, with 15% non-bullying and the rest bullying; and (iii) Set 0, with no bullying instances. An 80:20 ratio was used for training and testing the model. The features extracted from the data included followers, following, early captions, image content, etc. Logistic regression was used to predict bullying posts on the Set40+ dataset, using features such as early comments, caption, post time, user properties and image content. The combined unigram and bigram features produced an F1-score of 0.84.

AlAjlan et al. [25] used a DL model for cyberbullying detection. They used feature selection and feature engineering techniques to extract the features from the input. They used a Twitter dataset consisting of 39,000 tweets, from which duplicates were removed during cleaning. The model was trained with 9000 bully and 21,000 non-bully tweets and tested with 2700 bully and 6300 non-bully tweets. Their model performed far better than the SVM, with 95% accuracy. Banerjee et al. [26] used a dataset consisting of 69,874 tweets. They converted the words into vectors using GloVe word embeddings. Stop words and accentuation marks were removed, and the text was converted to lowercase during data preprocessing. On processed data, they used CNN based
DL model to detect the bully posts and achieved an accuracy of 93.97%. Cigdem et al. [20] developed a model for the automatic detection of cyberbullying instances in a social network using text mining methods. They experimented with different types of classifiers and feature selection algorithms to find the best results. The dataset was acquired from three different social networks: (i) Formspring.me, (ii) Myspace, and (iii) YouTube. The dataset was divided into two classes: (i) cyberbullying positive and (ii) cyberbullying negative. The Formspring.me dataset was an XML file containing 13,158 messages, of which 892 were cyberbullying positive and the remaining 12,266 were cyberbullying negative. The Myspace dataset consists of 1753 messages, of which 357 are positive and the remaining 1396 are negative. The YouTube dataset had 3464 messages from different users, of which 417 were positive and the remaining 3047 were negative. Two classifiers, SGD and MLP, achieved an F-measure of more than 0.90 on all datasets.

Kumari et al. [3] used a unified representation of text and image, with the eventual aim of making social media free of cyberbullying. For research purposes, 2100 images were manually gathered from Facebook, Instagram, Twitter, Google, etc. A CNN-based system was used to classify each image and comment as bullying or not bullying and achieved a weighted F1-score of 0.68. Hate speech, whose context is almost similar to cyberbullying, was detected by [21]. Two types of approaches are used: unimodal and multimodal. In the unimodal approach, they used the InceptionV3 architecture with a 2048-dimensional feature vector and then a 150-dimensional vector for the image text read by OCR. The multimodal dataset consists of 150,000 tweets with both image and text. The tweet text is processed with an LSTM architecture. The models were run with tweet text, image text and image as inputs. The LSTM model with only text data achieved an F1 value of 0.703 and an accuracy of 68.30%. On the combined input features, i.e., tweet text, image text and images, the model achieved an F1-score of 0.701 and an accuracy of 68.2%, similar to the LSTM model with text data only.

Chen et al. [27] proposed a CNN-based text classification model for the de facto verbal aggression dataset. They manually added tweets and Facebook comments to the datasets, while emotions and stickers were not considered. Besides the hand-labelled comments, they collected social network comment data from the 'sentiment140 corpus'. After the modification, the polarities of the tweets were tagged as aggressive or unaggressive. During preprocessing, they removed usernames (which follow the @ sign) and hash topics with stickers, and performed lowercasing. Feature extraction was done with the tf-idf technique. The DL-based CNN model obtained the best results, with an accuracy of 0.92 and an AUC value of 0.98.

Kumari et al. [22] automatically extracted features from text and images using DL techniques to classify the image as cyber aggressive or not. They reduced the features using the binary particle swarm optimization (BPSO) algorithm. A multimodal dataset of 3600 images was manually created, comprising images and the comments associated with each image. The images are mainly symbolic images classified into three categories: non-aggressive, medium-aggressive and high-aggressive. The model combines the VGG16 network with a 3-layered CNN and the BPSO algorithm for optimization. The VGG16 network processes the images. The text features are passed to BPSO for optimum feature selection and then to different classifiers to classify the images into the predefined categories. The random forest classifier had the best F1-score of 0.74. Sadiq et al. [23] addressed the challenge of automatically identifying aggression in tweets of the cyber-troll dataset. They used CNN-LSTM and CNN-BiLSTM models. The dataset has 20,001 instances, of which 7,822 are cyber-aggressive and 12,179 are non-cyber-aggressive. The dataset is first preprocessed using NLTK to improve the results. Their model with tf-idf unigram and bigram features performed best, achieving an accuracy of 0.92 and an F1-score of 0.90.

The existing work on image-based cyberbullying [3,21,22,24] lags far behind text-based cyberbullying detection [20,23–28], not just in terms of accuracy and F1-score but also in terms of the number of studies. Models developed for textual cyberbullying detection achieved an F1-score of 0.90 [27] and accuracy values greater than 90% [20,23,25–27]. Compared to textual cyberbullying, image-based cyberbullying detection has received less attention. However, posts today are not limited to text; they are also published as images and in mixed image-text form. Hence, to ensure a cyberbully-free network, image-based bullying posts need to be captured soon after they are published on the social platform. To fill this research gap, this study focuses on developing an automated model with a deep transfer learning approach to detect image-based cyberbullying posts on social platforms.

Methodology

Deep learning-based convolutional neural network (CNN) frameworks have shown their effectiveness and precision in various fields of image processing, including healthcare [15,17–19,29], social networks [11,12], agriculture [30–32], and others. Following these works, we have also used CNN-based models for cyberbullying detection on social platforms. This section discusses the working of a two-dimensional CNN (2DCNN) for cyberbullying detection and also describes the transfer learning models.
Two dimensional convolutional neural network

As shown in Fig. 1, a 2DCNN works in three phases: (i) extracting features by applying the convolution operation to the input images, (ii) selecting the important features using the pooling operation, and (iii) flattening the pooled features and passing them to a fully connected dense layer at the end.

Convolution operation: Once the input data is padded and the stride value is defined, the convolution product between the input tensor and the filter can be defined. The convolution is a sum of element-wise products, as shown in Fig. 2. Mathematically, an image is represented in tensor form with the following dimensions (Eq. 1):

$$\dim(\text{image}) = (N_h, N_w, N_c) \tag{1}$$

where $N_h$ is the height, $N_w$ is the width and $N_c$ is the number of channels of the image. For a colour (RGB) image, the value of $N_c$ is 3, which represents the three channels red, green and blue. The filter $K$ used for the convolution operation is square, has an odd size $f_d$ and has the same number of channels $N_c$ as the input image. The filter is applied to each channel to extract the image's pixel information. The dimension of the filter used for the convolution operation is (Eq. 2):

$$\dim(\text{filter}) = (f_d, f_d, N_c) \tag{2}$$

The outcome of the convolution operation between the input image and the filter $K$ is a 2D matrix. Each value of the 2D matrix is calculated by element-wise multiplication followed by summation (Fig. 2).

Mathematically, the convolution operation on an image $I$ with a kernel $K$ is defined as (Eq. 3):

$$\mathrm{conv}(I, K)_{x,y} = \sum_{i=1}^{N_h} \sum_{j=1}^{N_w} \sum_{k=1}^{N_c} K_{i,j,k}\, I_{x+i-1,\,y+j-1,\,k} \tag{3}$$

The output matrix dimension is:

$$\dim(\mathrm{conv}(I, K)) = \begin{cases} \left( \left\lfloor \dfrac{n_h + 2p - f}{s} + 1 \right\rfloor,\ \left\lfloor \dfrac{n_w + 2p - f}{s} + 1 \right\rfloor \right); & s > 0 \\[2ex] (n_h + 2p - f,\ n_w + 2p - f); & s = 0 \end{cases}$$

Here $s$ is the stride value, which is fixed to 1, $n_h$ and $n_w$ are the height and width of the image, $p$ is the padding and $f$ is the size of the filter. In summary, if the input image has dimension $n \times n$, the filter size is $f \times f$ and the padding is $p \times p$, then the size of the matrix obtained after the convolution operation is $(n + 2p - f + 1) \times (n + 2p - f + 1)$.

Pooling: The features extracted by the convolution of image $I$ with filter $K$ are downsampled in this step. Not all extracted features are equally important, and hence the important features are pooled out from each channel. This operation only affects the dimensions of the feature matrix.

$$\dim(\mathrm{pooling}(I)) = \begin{cases} \left( \left\lfloor \dfrac{n_h + 2p - f}{s} + 1 \right\rfloor,\ \left\lfloor \dfrac{n_w + 2p - f}{s} + 1 \right\rfloor,\ n_c \right); & s > 0 \\[2ex] (n_h + 2p - f,\ n_w + 2p - f,\ n_c); & s = 0 \end{cases}$$

In general, the CNN works as follows: first, it extracts the features using the convolution operation and pools the important features using the pooling layer. Then the pooled features pass to a fully connected layer at the end of the framework. Suppose the following notation is used at a particular layer $i$ of the 2DCNN. The input of the layer is $a^{[i-1]}$ with height $n_h^{[i-1]}$, width $n_w^{[i-1]}$ and channels $n_c^{[i-1]}$. The padding is represented by $p^{[i]}$, and the kernel is moved with a stride size $s^{[i]}$. The filters ($F$) used for the convolution operation have dimension $n \times n$. The bias of the network is represented as $b_n^{[i]}$, where $n$ is the convolution number. The processed information passes through an activation function denoted by $\Psi^{[i]}$. The output of the convolution operation has height $n_h^{[i]}$, width $n_w^{[i]}$ and channels $n_c^{[i]}$. The convolution operation is then represented as follows:

$$\forall n \in [1, 2, \ldots, n_c^{[i]}]:$$

$$\mathrm{conv}(a^{[i-1]}, F^{(n)})_{x,y} = \Psi^{[i]}\left( \sum_{i=1}^{n_h^{[i-1]}} \sum_{j=1}^{n_w^{[i-1]}} \sum_{k=1}^{n_c^{[i-1]}} F^{(n)}_{i,j,k}\, a^{[i-1]}_{x+i-1,\,y+j-1,\,k} + b_n^{[i]} \right) \tag{4}$$

$$\dim(\mathrm{conv}(a^{[i-1]}, F^{(n)})) = (n_h^{[i]}, n_w^{[i]})$$

Thus:

$$a^{[i]} = \left[ \Psi^{[i]}(\mathrm{conv}(a^{[i-1]}, F^{(1)})),\ \Psi^{[i]}(\mathrm{conv}(a^{[i-1]}, F^{(2)})),\ \Psi^{[i]}(\mathrm{conv}(a^{[i-1]}, F^{(3)})),\ \ldots,\ \Psi^{[i]}(\mathrm{conv}(a^{[i-1]}, F^{(n_c^{[i]})})) \right]$$

$$\dim(a^{[i]}) = (n_h^{[i]}, n_w^{[i]}, n_c^{[i]})$$
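The output-size formula above can be checked numerically. The following is a minimal sketch for the case of a positive stride; the function names `conv_output_size` and `conv_output_shape` are illustrative and do not come from the paper.

```python
def conv_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Size along one axis after a convolution or pooling step.

    n: input size, f: filter size, p: padding, s: stride (must be >= 1).
    Implements floor((n + 2p - f) / s) + 1.
    """
    if s < 1:
        raise ValueError("stride must be a positive integer")
    if n + 2 * p < f:
        raise ValueError("filter is larger than the padded input")
    return (n + 2 * p - f) // s + 1


def conv_output_shape(n_h: int, n_w: int, f: int, p: int = 0, s: int = 1):
    """Apply the one-axis formula to height and width."""
    return (conv_output_size(n_h, f, p, s), conv_output_size(n_w, f, p, s))


print(conv_output_shape(6, 6, f=3))            # (4, 4)
print(conv_output_shape(224, 224, f=3, p=1))   # (224, 224)
print(conv_output_shape(224, 224, f=2, s=2))   # (112, 112)
```

With stride 1 and no padding the result reduces to n - f + 1, which is the special case stated in the text.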
[Figure: convolution example with three panels: Input Image (6 × 6), Filter (3 × 3, each row −1 0 1) and Output (4 × 4)]
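The sum of element-wise products can be reproduced with a few lines of plain Python. The sketch below uses the 6 × 6 input and the 3 × 3 filter from the convolution figure, with stride 1, no padding and a single channel; the helper name `conv2d_valid` is an assumption made for illustration.

```python
def conv2d_valid(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and
    return the sum of element-wise products at each position."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    return [
        [
            sum(
                kernel[i][j] * image[x + i][y + j]
                for i in range(k_h)
                for j in range(k_w)
            )
            for y in range(out_w)
        ]
        for x in range(out_h)
    ]


image = [
    [1, 2, 3, 5, 6, 3],
    [4, 5, 6, 0, 5, 2],
    [7, 8, 9, 1, 1, 4],
    [9, 1, 0, 6, 9, 0],
    [2, 4, 5, 0, 7, 8],
    [8, 6, 7, 2, 3, 1],
]
kernel = [[-1, 0, 1]] * 3  # the same row repeated three times

for row in conv2d_valid(image, kernel):
    print(row)
# [6, -9, -6, 3]
# [-5, -7, 0, -1]
# [-4, -6, 3, 5]
# [-7, -3, 7, 1]
```

The 4 × 4 result agrees with the output panel of the figure and with the size n − f + 1 = 6 − 3 + 1 = 4.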
This way, the parameters of the CNN are trained. Together, the convolution and pooling operations help to learn the filters, and these filters help in identifying the class of the image.

Figure 3 shows the configuration in which a convolution layer and a pooling layer make up a block; adding more blocks increases the computation time and the number of features. Thus, with more blocks, more features are extracted. This research uses three configurations of 2DCNN models: one block, two blocks and three blocks. The extracted features are flattened and passed to the dense layer at the end. The internal layers of the network use the ReLU activation function, whereas the output layer uses the softmax activation function. The model is compiled with the cross-entropy loss function and two different optimizers: SGD and Adam. Dropout is also used to reduce overfitting. Dropout refers to ignoring a randomly chosen set of units (i.e. neurons) during the training phase.

Transfer learning

Transfer learning models have an edge over existing deep learning architectures and are used effectively in multiple domains for prediction tasks. Hence, this research also uses pre-trained transfer learning models to predict image-based cyberbullying posts. Initially, different transfer learning models such as VGG16 [33], Xception [34], VGG19 [33], InceptionResNetV2 [35], ResNet101 [36], InceptionV3 [37] and others available in the Keras library2 were applied to the selected dataset. Based on the experimental outcomes of the different models, VGG16 and InceptionV3 were found to perform better than the other models. The depth of the VGG16 model is 16, and it is a simple stack of convolutional and max-pooling layers that follow one another, ending in fully connected layers. It was one of the best performing architectures in the ILSVRC 2014 challenge. Hence, we continued our research with the VGG16 [33] and InceptionV3 [37] transfer learning approaches, which are CNN based and widely used for image recognition. InceptionV3 is the successor of InceptionV1 and InceptionV2, which Google developed for ILSVRC. It is a much lighter model than VGG16 and was the runner-up in image classification in ILSVRC 2015. It has a depth of 48 and is much more complex than the VGG models: it uses the concept of inception rather than just stacking convolution and max-pooling layers one after another.

2 https://keras.io/api/applications/.
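The softmax output, cross-entropy loss and dropout mentioned for the 2DCNN can be illustrated without any deep learning library. The following is a minimal sketch in plain Python, not the Keras implementation used in the paper; the function names are illustrative, and the rescaling of surviving units by 1/(1 − rate) ("inverted dropout") is an assumption about how dropout is applied.

```python
import math
import random


def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, true_class):
    """Loss for one sample: negative log-probability of the true class."""
    return -math.log(probs[true_class])


def dropout(activations, rate, rng):
    """Zero each unit with probability `rate` during training and
    rescale the surviving units by 1 / (1 - rate)."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


# Two output neurons: index 0 = not bully, index 1 = bully (assumed order).
probs = softmax([2.0, -1.0])
print(probs)
print(cross_entropy(probs, true_class=0))   # small loss: confident and correct
print(cross_entropy(probs, true_class=1))   # large loss: confident and wrong
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=random.Random(0)))
```

At prediction time dropout is switched off, so all units contribute to the output.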
Data preparation

One of the major challenges of this work is data collection and annotation. Image data for cyberbullying is not directly available, and thus images were collected from many sources. The images were acquired mainly from Google Images by searching for terms related to cyberbullying. As the images from Google search belong to the websites on which they are hosted, the sources of all images are duly credited. Some images were also taken from the MMHS150K dataset [21]. Downloaded images in any other format were converted to .jpg to maintain uniformity. Further, with the help of three independent annotators, the dataset was labelled as bully and non-bully. The final dataset consists of 3000 images, containing 1458 bullying and 1542 non-bullying images. The developed dataset contains four columns, i.e., image name, description, bully or not bully, and source. The images were named in a number series (1.jpg, 2.jpg, and so on), which makes them easier to load in the models.

The data has been collected so that the model learns both the normal cases and the cyberbullying cases. For example, if only obscene photos with human faces are present, the model may categorise a normal human face as bullying. To avoid this situation, a sufficient number of instances were added in each dataset category. The statistics of the dataset are shown in Table 1.

Table 1 Statistics of the dataset used for model development

| Number of samples | Bully | Not bully | Training | Test |
|---|---|---|---|---|
| 1000 | 356 | 644 | 750 | 250 |
| 3000 | 1458 | 1542 | 2250 | 750 |

Data preprocessing

Every image has a different resolution and colour scheme, so we converted the images to the target size of each model. Every DL model has a specified input size, and not every image fits it. Thus, we need to pre-process each image before passing it to the model for training or testing. Every model has its own pre-processing requirements; therefore, different pre-processing is needed for every model. The 2DCNN model does not have a specific input size for the image. However, the transfer learning models VGG16 and InceptionV3 need input in the predefined sizes of 224 × 224 × 3 and 299 × 299 × 3, respectively. Hence, all images are resized accordingly. Further, the images are converted into three channels, i.e., RGB. Next, images are
converted into arrays, and the model-specific preprocessing is applied using Keras.3

System design

Figure 4 shows the system design, which consists of three phases: first, data collection; second, data preprocessing; and last, training and testing of the model, which identifies the best model and its configuration. The first two phases have been explained briefly in the previous sections 'Data preparation' and 'Data preprocessing'. In the third phase, we implemented a CNN and two transfer learning models, VGG16 and InceptionV3, and compared them under different variations and configurations by altering the hyperparameter values. The outcomes of these models are discussed briefly in the fourth section.

Every model is run in Google Colab using Keras and Python. First, we ran six different models, all based on the CNN methodology, on the 1000-image dataset: 2DCNN, VGG16, VGG19 [33], InceptionV3, InceptionResNetV2 [35] and Xception [34]. We selected the two best transfer learning models, (i) VGG16 and (ii) InceptionV3, out of the five transfer learning models based on the accuracy and complexity of the model. The other transfer learning models yielded similar accuracy but needed more resources for execution. Hence, to save resources and execution time, the VGG16 and InceptionV3 models were selected for further experiments. Next, the 2DCNN, VGG16 and InceptionV3 models were re-executed with 3000 samples. For every 2DCNN model, the following variations were tested:

– Optimizers: SGD with a learning rate of 0.001 and 0.01; Adam with a learning rate of 0.001 and 0.01.
– Dropout layer: There are two variations, one without any dropout and one with a dropout of 0.2.

For the VGG16 and InceptionV3 transfer learning models, the following variations were tested:

– Optimizers: SGD with a learning rate of 0.001 and 0.01; Adam with a learning rate of 0.001 and 0.01.
– Dropout layer: There are three variations, one without any dropout, one with a dropout of 0.2 and one with a dropout of 0.50.

For every model, the results contain every major metric needed to analyse the model. What all the models have in common is an output layer with two neurons and a softmax activation function, which gives the probability of each image class, trained with the cross-entropy loss. The weights of the pre-trained transfer learning models are frozen by making all layers untrainable, because the models were already trained on a huge corpus. Every model is run for 50 epochs with early stopping: if the accuracy does not improve for a fixed number of consecutive epochs (we used a patience value of 10), training stops and the corresponding weights are stored.

3 Keras.applications.inception_v3.preprocess_input.
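The experiment grid and the early-stopping rule described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' training script: `experiment_grid` and `run_with_early_stopping` are hypothetical names, the accuracy history is made-up input, and the sketch only tracks the best epoch rather than storing real weights.

```python
from itertools import product


def experiment_grid(dropouts):
    """All optimizer / learning-rate / dropout combinations for one model."""
    optimizers = ("SGD", "Adam")
    learning_rates = (0.001, 0.01)
    return [
        {"optimizer": opt, "learning_rate": lr, "dropout": drop}
        for drop, opt, lr in product(dropouts, optimizers, learning_rates)
    ]


def run_with_early_stopping(accuracy_per_epoch, patience=10, max_epochs=50):
    """Stop once accuracy has not improved for `patience` consecutive epochs.

    Returns (epochs_run, best_epoch, best_accuracy); epochs count from 1.
    """
    best_acc, best_epoch, waited, epochs_run = float("-inf"), 0, 0, 0
    for epoch, acc in enumerate(accuracy_per_epoch[:max_epochs], start=1):
        epochs_run = epoch
        if acc > best_acc:
            best_acc, best_epoch, waited = acc, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return epochs_run, best_epoch, best_acc


# 2DCNN: dropout in {none, 0.2}; transfer models: dropout in {none, 0.2, 0.5}.
print(len(experiment_grid((None, 0.2))))        # 8 settings per CNN depth
print(len(experiment_grid((None, 0.2, 0.5))))   # 12 settings per transfer model

# Hypothetical history: accuracy peaks at epoch 2 and never improves again.
history = [0.60, 0.70] + [0.65] * 48
print(run_with_early_stopping(history))         # (12, 2, 0.7)
```

Eight settings for each of the three 2DCNN depths give 24 runs, and twelve settings for each of the two transfer learning models give 24 more, which matches the number of rows reported in the result tables.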
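Table 2 Results obtained on different settings with CNN model

| CNN layers | Optimizer | Learning rate | Dropout | Class 0 P | Class 0 R | Class 0 F1 | Class 1 P | Class 1 R | Class 1 F1 | Accuracy | Confusion matrix [[TP, FN], [FP, TN]] | AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SGD | 0.001 | NA | 0.64 | 1.00 | 0.78 | 0.00 | 0.00 | 0.00 | 0.64 | [[160 0], [90 0]] | 0.50 |
| 1 | SGD | 0.001 | 0.2 | 0.64 | 1.00 | 0.78 | 0.00 | 0.00 | 0.00 | 0.64 | [[160 0], [90 0]] | 0.50 |
| 1 | SGD | 0.01 | NA | 0.64 | 0.99 | 0.78 | 0.00 | 0.00 | 0.00 | 0.54 | [[159 1], [90 0]] | 0.50 |
| 1 | SGD | 0.01 | 0.2 | 0.64 | 1.00 | 0.78 | 0.00 | 0.00 | 0.00 | 0.64 | [[160 0], [90 0]] | 0.50 |
| 1 | ADAM | 0.001 | NA | 0.79 | 0.65 | 0.71 | 0.53 | 0.70 | 0.60 | 0.67 | [[104 56], [27 63]] | 0.67 |
| 1 | ADAM | 0.001 | 0.2 | 0.67 | 0.98 | 0.79 | 0.79 | 0.12 | 0.21 | 0.67 | [[157 3], [79 11]] | 0.55 |
| 1 | ADAM | 0.01 | NA | 0.68 | 0.84 | 0.75 | 0.52 | 0.30 | 0.38 | 0.65 | [[135 25], [63 27]] | 0.57 |
| 1 | ADAM | 0.01 | 0.2 | 0.72 | 0.81 | 0.76 | 0.56 | 0.43 | 0.49 | 0.67 | [[129 31], [51 39]] | 0.62 |
| 2 | SGD | 0.001 | NA | 0.64 | 1.00 | 0.78 | 0.00 | 0.00 | 0.00 | 0.64 | [[160 0], [90 0]] | 0.50 |
| 2 | SGD | 0.001 | 0.2 | 0.64 | 1.00 | 0.78 | 0.00 | 0.00 | 0.00 | 0.64 | [[160 0], [90 0]] | 0.50 |
| 2 | SGD | 0.01 | NA | 0.64 | 1.00 | 0.78 | 0.00 | 0.00 | 0.00 | 0.64 | [[160 0], [90 0]] | 0.50 |
| 2 | SGD | 0.01 | 0.2 | 0.64 | 1.00 | 0.78 | 0.00 | 0.00 | 0.00 | 0.64 | [[160 0], [90 0]] | 0.50 |
| 2 | ADAM | 0.001 | NA | 0.71 | 0.71 | 0.71 | 0.48 | 0.48 | 0.48 | 0.63 | [[114 46], [47 43]] | 0.60 |
| 2 | ADAM | 0.001 | 0.2 | 0.66 | 0.89 | 0.76 | 0.50 | 0.19 | 0.27 | 0.64 | [[143 17], [73 17]] | 0.54 |
| 2 | ADAM | 0.01 | NA | 0.64 | 0.99 | 0.78 | 0.60 | 0.03 | 0.06 | 0.64 | [[158 2], [87 3]] | 0.51 |
| 2 | ADAM | 0.01 | 0.2 | 0.64 | 0.98 | 0.78 | 0.50 | 0.03 | 0.06 | 0.64 | [[157 3], [87 3]] | 0.51 |
| 3 | SGD | 0.001 | NA | 0.64 | 1.00 | 0.78 | 0.00 | 0.00 | 0.00 | 0.64 | [[160 0], [90 0]] | 0.50 |
| 3 | SGD | 0.001 | 0.2 | 0.64 | 1.00 | 0.78 | 0.00 | 0.00 | 0.00 | 0.64 | [[160 0], [90 0]] | 0.50 |
| 3 | SGD | 0.01 | NA | 0.64 | 1.00 | 0.78 | 0.00 | 0.00 | 0.00 | 0.64 | [[160 0], [90 0]] | 0.50 |
| 3 | SGD | 0.01 | 0.2 | 0.64 | 1.00 | 0.78 | 0.00 | 0.00 | 0.00 | 0.64 | [[160 0], [90 0]] | 0.50 |
| 3 | ADAM | 0.001 | NA | 0.70 | 0.71 | 0.70 | 0.47 | 0.46 | 0.46 | 0.62 | [[113 47], [49 41]] | 0.58 |
| 3 | ADAM | 0.001 | 0.2 | 0.63 | 0.93 | 0.75 | 0.15 | 0.02 | 0.04 | 0.60 | [[149 11], [88 2]] | 0.48 |
| 3 | ADAM | 0.01 | NA | 0.65 | 0.95 | 0.77 | 0.50 | 0.09 | 0.15 | 0.64 | [[152 8], [82 8]] | 0.52 |
| 3 | ADAM | 0.01 | 0.2 | 0.64 | 1.00 | 0.78 | 0.00 | 0.00 | 0.00 | 0.64 | [[160 0], [90 0]] | 0.50 |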
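The precision (P), recall (R), F1 and accuracy columns follow directly from the confusion matrix. The sketch below recomputes them for two rows of Table 2. It assumes that the rows of the matrix are the actual classes (class 0 first) and the columns the predicted classes, and that an undefined precision (no predictions for a class) is reported as 0; the function name `per_class_metrics` is illustrative.

```python
def per_class_metrics(cm):
    """Precision, recall and F1 for both classes, plus overall accuracy.

    cm = [[a, b], [c, d]]: a = class 0 predicted as 0, b = class 0 predicted
    as 1, c = class 1 predicted as 0, d = class 1 predicted as 1.
    """
    (a, b), (c, d) = cm

    def prf(tp, fp, fn):
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
        return precision, recall, f1

    class0 = prf(tp=a, fp=c, fn=b)
    class1 = prf(tp=d, fp=b, fn=c)
    accuracy = (a + d) / (a + b + c + d)
    return class0, class1, accuracy


# One convolution layer, ADAM, learning rate 0.001, no dropout.
c0, c1, acc = per_class_metrics([[104, 56], [27, 63]])
print([round(v, 2) for v in c0])   # [0.79, 0.65, 0.71]
print([round(v, 2) for v in c1])   # [0.53, 0.7, 0.6]
print(round(acc, 2))               # 0.67

# SGD rows: every image is predicted as class 0.
c0, c1, acc = per_class_metrics([[160, 0], [90, 0]])
print([round(v, 2) for v in c0], c1, round(acc, 2))
# [0.64, 1.0, 0.78] (0.0, 0.0, 0.0) 0.64
```

The second case shows why an accuracy of 0.64 is not informative on its own here: it equals the share of class 0 images in the test set.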
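Table 3 Results with VGG16 and InceptionV3 transfer learning models on 1000 samples

| Model | Dropout | Optimizer | Learning rate | Class 0 P | Class 0 R | Class 0 F1 | Class 1 P | Class 1 R | Class 1 F1 | Accuracy | Confusion matrix [[TP, FN], [FP, TN]] | AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VGG16 | NA | SGD | 0.001 | 0.80 | 0.85 | 0.83 | 0.70 | 0.63 | 0.67 | 0.77 | [[136 24], [33 57]] | 0.74 |
| VGG16 | NA | SGD | 0.01 | 0.81 | 0.74 | 0.77 | 0.60 | 0.69 | 0.64 | 0.72 | [[118 42], [28 62]] | 0.71 |
| VGG16 | NA | ADAM | 0.001 | 0.80 | 0.92 | 0.85 | 0.80 | 0.59 | 0.68 | 0.80 | [[147 13], [37 53]] | 0.75 |
| VGG16 | NA | ADAM | 0.01 | 0.79 | 0.89 | 0.84 | 0.74 | 0.58 | 0.65 | 0.78 | [[142 18], [38 52]] | 0.73 |
| VGG16 | 0.2 | SGD | 0.001 | 0.82 | 0.89 | 0.85 | 0.76 | 0.64 | 0.70 | 0.80 | [[142 18], [32 58]] | 0.77 |
| VGG16 | 0.2 | SGD | 0.01 | 0.75 | 0.93 | 0.83 | 0.77 | 0.44 | 0.56 | 0.75 | [[148 12], [50 40]] | 0.69 |
| VGG16 | 0.2 | ADAM | 0.001 | 0.81 | 0.91 | 0.86 | 0.80 | 0.61 | 0.69 | 0.80 | [[146 14], [35 55]] | 0.76 |
| VGG16 | 0.2 | ADAM | 0.01 | 0.77 | 0.95 | 0.85 | 0.85 | 0.49 | 0.62 | 0.78 | [[152 8], [46 44]] | 0.72 |
| VGG16 | 0.5 | SGD | 0.001 | 0.84 | 0.87 | 0.86 | 0.75 | 0.71 | 0.73 | 0.81 | [[139 21], [26 64]] | 0.79 |
| VGG16 | 0.5 | SGD | 0.01 | 0.71 | 0.97 | 0.82 | 0.84 | 0.30 | 0.44 | 0.73 | [[155 5], [63 27]] | 0.63 |
| VGG16 | 0.5 | ADAM | 0.001 | 0.78 | 0.93 | 0.85 | 0.82 | 0.54 | 0.65 | 0.79 | [[149 11], [41 49]] | 0.74 |
| VGG16 | 0.5 | ADAM | 0.01 | 0.81 | 0.92 | 0.86 | 0.81 | 0.61 | 0.70 | 0.81 | [[147 13], [35 55]] | 0.77 |
| InceptionV3 | NA | SGD | 0.001 | 0.80 | 0.94 | 0.86 | 0.84 | 0.59 | 0.69 | 0.81 | [[150 10], [37 53]] | 0.76 |
| InceptionV3 | NA | SGD | 0.01 | 0.82 | 0.90 | 0.86 | 0.78 | 0.64 | 0.71 | 0.81 | [[144 16], [32 58]] | 0.77 |
| InceptionV3 | NA | ADAM | 0.001 | 0.82 | 0.94 | 0.87 | 0.85 | 0.63 | 0.73 | 0.83 | [[150 10], [33 57]] | 0.79 |
| InceptionV3 | NA | ADAM | 0.01 | 0.80 | 0.88 | 0.84 | 0.74 | 0.62 | 0.67 | 0.78 | [[140 20], [34 56]] | 0.75 |
| InceptionV3 | 0.2 | SGD | 0.001 | 0.80 | 0.95 | 0.87 | 0.87 | 0.59 | 0.70 | 0.82 | [[152 8], [37 53]] | 0.77 |
| InceptionV3 | 0.2 | SGD | 0.01 | 0.80 | 0.93 | 0.86 | 0.81 | 0.58 | 0.68 | 0.80 | [[148 12], [38 52]] | 0.75 |
| InceptionV3 | 0.2 | ADAM | 0.001 | 0.81 | 0.93 | 0.87 | 0.82 | 0.62 | 0.71 | 0.82 | [[148 12], [34 56]] | 0.77 |
| InceptionV3 | 0.2 | ADAM | 0.01 | 0.79 | 0.94 | 0.86 | 0.85 | 0.57 | 0.68 | 0.81 | [[151 9], [39 51]] | 0.76 |
| InceptionV3 | 0.5 | SGD | 0.001 | 0.80 | 0.94 | 0.86 | 0.84 | 0.59 | 0.69 | 0.81 | [[150 10], [37 53]] | 0.76 |
| InceptionV3 | 0.5 | SGD | 0.01 | 0.81 | 0.92 | 0.86 | 0.81 | 0.61 | 0.70 | 0.81 | [[147 13], [35 55]] | 0.77 |
| InceptionV3 | 0.5 | ADAM | 0.001 | 0.81 | 0.94 | 0.87 | 0.85 | 0.61 | 0.71 | 0.82 | [[150 10], [35 55]] | 0.77 |
| InceptionV3 | 0.5 | ADAM | 0.01 | 0.75 | 0.94 | 0.83 | 0.81 | 0.43 | 0.57 | 0.76 | [[151 9], [51 39]] | 0.69 |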
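When a classifier is evaluated on hard class labels, its ROC curve has a single interior point, and the area under it equals the mean of the two per-class recalls. The paper does not state how its AUC values were computed, so the following sketch is only a consistency check under that assumption; `single_threshold_auc` is an illustrative name, and the matrix layout follows the table header.

```python
def single_threshold_auc(cm):
    """Area under a one-point ROC curve: the mean of the two class recalls.

    cm = [[TP, FN], [FP, TN]] as labelled in the table header.
    Assumes both classes occur in the test set.
    """
    (tp, fn), (fp, tn) = cm
    recall_first = tp / (tp + fn)
    recall_second = tn / (fp + tn)
    return (recall_first + recall_second) / 2


rows = {
    "VGG16, no dropout, SGD, 0.001": [[136, 24], [33, 57]],
    "VGG16, dropout 0.5, SGD, 0.01": [[155, 5], [63, 27]],
    "InceptionV3, no dropout, ADAM, 0.001": [[150, 10], [33, 57]],
}
for name, cm in rows.items():
    print(name, round(single_threshold_auc(cm), 2))
# VGG16, no dropout, SGD, 0.001 0.74
# VGG16, dropout 0.5, SGD, 0.01 0.63
# InceptionV3, no dropout, ADAM, 0.001 0.79
```

For these three rows the recomputed values agree with the AUC column of Table 3.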
model may improve by increasing the total samples. Hence, all models were re-run on the larger dataset of 3000 samples with the same train-test split ratio, i.e., 75% of the samples were used for training and 25% were used to test the performance of the trained model.

First, the 2DCNN models were re-run with the same settings, followed by the transfer learning models. The outcomes of the 2DCNN model are presented in Table 4, and Table 5 contains the outcomes of the transfer learning models. The CNN model with a single convolution layer achieved an accuracy and AUC value of 0.51, and the recall of class 1 was 0.07. The prediction accuracy of the model remained the same, and applying the dropout layer did not improve the performance either. The best outcomes of the 2DCNN models with 3000 samples are as follows: the precision, recall and F1-score for the bully class are 0.66, 0.72 and 0.69, whereas for the non-bully class they are 0.69, 0.63 and 0.66, respectively. The AUC value of the model is 0.673, and the accuracy is 67%. The AUC value increased from 0.62 to 0.673, which indicates an improvement in prediction accuracy on the larger sample.

Next, the VGG16 transfer learning model achieved higher performance with the increased data samples. The best results of VGG16 on the 3000 samples were obtained with a dropout value of 0.5 and the SGD optimizer with a learning rate of 0.001. The F1-score is 0.87 for the non-bully class and 0.86 for the bully class. The accuracy of the model is 86%, and the AUC value is 0.864, an improvement over the previous model, whose AUC value was only 0.79. Following the pattern of VGG16, InceptionV3 also yielded better performance with increased samples. The InceptionV3 model trained and tested on the 3000-image dataset has the best results, with an F1-score of 0.89 for the bully class. The model's accuracy is 89%, and the AUC value is 0.888. The experimental outcomes of the 2DCNN and transfer learning models suggested that the model performance increases as the number of training samples increases. To verify this hypothesis, we created another train-test split with a 90:10 ratio. In this case, 90% of the samples are used for training, and the remaining 10% are used to test the model performance. The outcomes of the models with the 90:10 train-test split are shown in Tables 6 and 7.

We expected the results of the 2DCNN to improve with more training data, but this did not happen. The model's best F1-score value is 0.68, the AUC value reduced to 0.696 and the accuracy reduced to 0.70 (Table 6). Hence, based on the outcomes of the 2DCNN model for the different settings of the training samples (Tables 2, 4, 6), it can be said that the 2DCNN model is unable to capture the patterns of

is not increased compared to the 75:25 data split. A similar pattern is observed with the other transfer learning model, InceptionV3. With the 90:10 data split, InceptionV3 has almost the same result as with the 75:25 data split (Table 7). Interestingly, on the increased training dataset, i.e., the 90:10 train-test split, the 0.50 dropout outperformed the 0.2 dropout, unlike in the 75:25 data split case.

We have compared the outcomes of our proposed model with similar works; the comparison is shown in Table 8. Few works in the literature consider image-based cyberbullying prediction [3,21,22,24]. Kumari et al. [3,22] proposed two different models: in [3] they used a CNN-based model, whereas in [22] a traditional ML model was used. Their models achieved F1-scores of 0.68 and 0.74 using the CNN and RF models in [3] and [22], respectively. The model proposed by [21] used InceptionV3 and LSTM models and achieved an F1-score of 0.67. Hosseinmardi et al. [24] used a traditional ML model and achieved an F1-score of 0.84. The proposed system, on the other hand, was evaluated with the DL-based 2DCNN model and with transfer learning based on VGG16 and InceptionV3. As shown in Table 8, the 2DCNN model achieved an F1-score of 0.65, whereas VGG16 and InceptionV3 achieved F1-scores of 0.86 and 0.87. The proposed transfer learning model with tuned hyperparameter settings outperformed the existing research.

Discussion

One of the main findings of this research is the requirement of a large amount of annotated data for modelling. If the number of training samples is low, the DL-based models cannot train properly, and hence the models fail when unseen data is supplied for testing. Tables 2 and 4 show the performance of the CNN model with different numbers of convolution layers and hyperparameter values on 250 and 750 test samples. The outcomes of the model trained with 2250 samples are better than those of the model trained on 750 samples. Another finding of this research concerns a suitable optimizer for handling the images. Two optimizers with learning rates of 0.01 and 0.001 were used in this research, and the Adam optimizer was found to be the better option for the 2DCNN model. The outcomes with different training sample sizes on the 2DCNN and on the transfer learning models InceptionV3 and VGG16 confirmed that the 2DCNN requires more samples for training. The dropout values do not have much effect on the predictions.

The performances of the tested transfer learning models are comparable across the different settings. The outcomes of the VGG16 and InceptionV3 models are shown in Tables 5 and 7. The best outcomes were achieved with transfer learning
the dataset properly. models trained with 90% of the sample. The performance
The VGG16 model with a 90:10 data split has almost the of the transfer learning model has minimum variation; how-
same result as for 75:25 data split. The model’s performance ever, the 2DCNN model’s performance changes with a large
Table 4 Results of CNN models on different settings with 3000 samples
CNN layer  Optimizer  Learning rate  Dropout  Class 0 (P R F1)  Class 1 (P R F1)  Accuracy  Confusion matrix [[TP, FN], [FP, TN]]  AUC
1  SGD  0.001  NA  0.50 0.95 0.66  0.58 0.07 0.12  0.51  [[354 18] [353 25]]  0.51
1  SGD  0.001  0.2  0.51 0.97 0.66  0.68 0.07 0.13  0.52  [[359 13] [350 28]]  0.52
1  SGD  0.01  NA  0.50 1.00 0.66  0.00 0.00 0.00  0.50  [[372 0] [378 0]]  0.50
1  SGD  0.01  0.2  0.50 1.00 0.66  0.00 0.00 0.00  0.50  [[372 0] [378 0]]  0.50
1  ADAM  0.001  NA  0.65 0.76 0.70  0.71 0.60 0.65  0.68  [[281 91] [152 226]]  0.677
1  ADAM  0.001  0.2  0.69 0.63 0.66  0.66 0.72 0.69  0.67  [[233 139] [106 272]]  0.673
1  ADAM  0.01  NA  0.65 0.56 0.60  0.62 0.70 0.66  0.63  [[207 165] [113 265]]  0.629
1  ADAM  0.01  0.2  0.63 0.68 0.65  0.66 0.61 0.63  0.64  [[253 119] [149 229]]  0.643
2  SGD  0.001  NA  0.50 1.00 0.66  0.00 0.00 0.00  0.50  [[372 0] [378 0]]  0.50
2  SGD  0.001  0.2  0.50 1.00 0.66  0.00 0.00 0.00  0.50  [[372 0] [378 0]]  0.50
2  SGD  0.01  NA  0.50 1.00 0.66  0.00 0.00 0.00  0.50  [[372 0] [378 0]]  0.50
2  SGD  0.01  0.2  0.50 1.00 0.66  0.00 0.00 0.00  0.50  [[372 0] [378 0]]  0.50
2  ADAM  0.001  NA  0.61 0.63 0.62  0.62 0.61 0.61  0.62  [[234 138] [149 229]]  0.617
2  ADAM  0.001  0.2  0.60 0.73 0.66  0.66 0.52 0.58  0.62  [[271 101] [183 195]]  0.622
2  ADAM  0.01  NA  0.2 0.01 0.02  0.50 0.97 0.66  0.49  [[3 369] [12 366]]  0.488
2  ADAM  0.01  0.2  0.50 0.98 0.66  0.67 0.04 0.07  0.51  [[365 7] [364 14]]  0.509
3  SGD  0.001  NA  0.50 1.00 0.66  0.00 0.00 0.00  0.50  [[372 0] [378 0]]  0.50
3  SGD  0.001  0.2  0.50 1.00 0.66  0.00 0.00 0.00  0.50  [[372 0] [378 0]]  0.50
3  SGD  0.01  NA  0.50 1.00 0.66  0.00 0.00 0.00  0.50  [[372 0] [378 0]]  0.50
3  SGD  0.01  0.2  0.50 1.00 0.66  0.00 0.00 0.00  0.50  [[372 0] [378 0]]  0.50
3  ADAM  0.001  NA  0.58 0.57 0.58  0.58 0.59 0.59  0.58  [[213 159] [154 224]]  0.583
3  ADAM  0.001  0.2  0.66 0.59 0.62  0.63 0.70 0.67  0.65  [[219 153] [112 266]]  0.646
3  ADAM  0.01  NA  0.58 0.72 0.64  0.64 0.50 0.56  0.61  [[266 106] [190 188]]  0.606
3  ADAM  0.01  0.2  0.50 1.00 0.66  0.00 0.00 0.00  0.50  [[372 0] [378 0]]  0.50
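The per-class precision, recall, F1-score and accuracy in Table 4 can be recomputed from each confusion matrix. The following is a minimal Python sketch, assuming the layout [[TP, FN], [FP, TN]] with class 0 treated as the positive class, as the table header suggests. The function name `metrics_from_confusion` is hypothetical and not part of the original work. The sketch also reports the mean of the two recalls (balanced accuracy), which for hard 0/1 predictions coincides with the ROC AUC; that the reported AUC values were obtained this way is an assumption, although the numbers agree for the row used here (one layer, Adam, learning rate 0.001, dropout 0.2).

```python
def metrics_from_confusion(cm):
    """Per-class metrics from a 2x2 matrix laid out as [[TP, FN], [FP, TN]].

    Assumption: class 0 is the "positive" class of the table header, so the
    first row holds the true class-0 samples and the second row the true
    class-1 samples.
    """
    (tp, fn), (fp, tn) = cm

    def prf(hits, missed, false_alarms):
        # Guard against division by zero for models that never predict a class.
        predicted = hits + false_alarms
        actual = hits + missed
        precision = hits / predicted if predicted else 0.0
        recall = hits / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        return {"precision": precision, "recall": recall, "f1": f1}

    class0 = prf(tp, fn, fp)  # non-bully class
    class1 = prf(tn, fp, fn)  # bully class
    total = tp + fn + fp + tn
    return {
        "class0": class0,
        "class1": class1,
        "accuracy": (tp + tn) / total,
        # Mean of the two recalls; equals ROC AUC for hard predictions.
        "balanced_accuracy": (class0["recall"] + class1["recall"]) / 2,
    }


# Row from Table 4: one CNN layer, Adam, learning rate 0.001, dropout 0.2.
result = metrics_from_confusion([[233, 139], [106, 272]])
print({k: round(v, 2) for k, v in result["class1"].items()})
print(round(result["accuracy"], 2), round(result["balanced_accuracy"], 3))
```

Rounded to two decimals, the values reproduce the row as reported in the table, which is a quick way to check any other row for transcription errors.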
Table 5 Performance with VGG16 and InceptionV3 transfer learning with 3000 samples
Dropout  Optimizer  Learning rate  Class 0 (P R F1)  Class 1 (P R F1)  Accuracy  Confusion matrix [[TP, FN], [FP, TN]]  AUC
VGG16
NA  SGD  0.001  0.83 0.90 0.86  0.89 0.82 0.85  0.86  [[333 39] [67 311]]  0.859
NA  SGD  0.01  0.78 0.90 0.84  0.89 0.75 0.81  0.83  [[336 36] [94 284]]  0.827
NA  ADAM  0.001  0.82 0.88 0.85  0.87 0.81 0.84  0.85  [[327 45] [70 308]]  0.847
NA  ADAM  0.01  0.83 0.76 0.8  0.78 0.85 0.81  0.81  [[284 88] [58 320]]  0.805
0.2  SGD  0.001  0.84 0.88 0.86  0.87 0.83 0.85  0.85  [[326 46] [64 314]]  0.854
0.2  SGD  0.01  0.75 0.95 0.84  0.93 0.70 0.8  0.82  [[353 19] [115 263]]  0.822
0.2  ADAM  0.001  0.86 0.8 0.83  0.81 0.87 0.84  0.83  [[297 75] [50 328]]  0.833
0.2  ADAM  0.01  0.67 0.97 0.79  0.95 0.52 0.68  0.75  [[362 10] [180 198]]  0.749
0.5  SGD  0.001  0.84 0.89 0.87  0.89 0.84 0.86  0.86  [[331 41] [61 317]]  0.864
0.5  SGD  0.01  0.62 0.98 0.76  0.95 0.4 0.57  0.69  [[364 8] [225 153]]  0.692
0.5  ADAM  0.001  0.82 0.90 0.86  0.89 0.8 0.85  0.85  [[336 36] [75 303]]  0.852
0.5  ADAM  0.01  0.78 0.82 0.8  0.82 0.77 0.79  0.8  [[306 66] [86 292]]  0.797
InceptionV3
NA  SGD  0.001  0.87 0.89 0.88  0.89 0.87 0.88  0.88  [[332 40] [50 328]]  0.88
NA  SGD  0.01  0.86 0.91 0.88  0.90 0.85 0.88  0.88  [[337 35] [55 323]]  0.88
NA  ADAM  0.001  0.85 0.88 0.86  0.88 0.85 0.86  0.86  [[326 46] [56 322]]  0.864
NA  ADAM  0.01  0.86 0.89 0.88  0.89 0.86 0.87  0.87  [[330 42] [52 326]]  0.875
0.2  SGD  0.001  0.87 0.91 0.89  0.91 0.86 0.89  0.89  [[340 32] [52 326]]  0.888
0.2  SGD  0.01  0.88 0.87 0.87  0.87 0.88 0.88  0.88  [[324 48] [45 333]]  0.876
0.2  ADAM  0.001  0.89 0.82 0.85  0.83 0.90 0.86  0.86  [[304 68] [39 339]]  0.857
0.2  ADAM  0.01  0.83 0.92 0.87  0.92 0.81 0.86  0.87  [[344 28] [72 306]]  0.867
0.5  SGD  0.001  0.86 0.91 0.89  0.91 0.85 0.88  0.88  [[339 33] [55 323]]  0.883
0.5  SGD  0.01  0.83 0.93 0.88  0.92 0.81 0.86  0.87  [[347 25] [73 305]]  0.87
0.5  ADAM  0.001  0.85 0.90 0.87  0.89 0.85 0.87  0.87  [[334 38] [58 320]]  0.872
0.5  ADAM  0.01  0.79 0.90 0.84  0.88 0.77 0.82  0.83  [[333 39] [87 291]]  0.833
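The twelve settings per model in Table 5 form a grid over dropout, optimizer and learning rate. The sketch below enumerates that grid and shows one way such a transfer learning model could be assembled with the Keras applications API. It is an illustration under stated assumptions, not the authors' implementation: the helper names `hyperparameter_grid` and `build_transfer_model`, the frozen ImageNet base, the global-average-pooling head with a single sigmoid unit, and the 224x224 input size are all assumptions that the text does not confirm. TensorFlow is imported lazily so the grid helper works without it.

```python
from itertools import product

# Values taken from the rows of Table 5 (None stands for the "NA" dropout rows).
DROPOUTS = [None, 0.2, 0.5]
OPTIMIZERS = ["SGD", "ADAM"]
LEARNING_RATES = [0.001, 0.01]


def hyperparameter_grid():
    """Enumerate the settings in the same order as the table rows."""
    return [
        {"dropout": d, "optimizer": o, "learning_rate": lr}
        for d, o, lr in product(DROPOUTS, OPTIMIZERS, LEARNING_RATES)
    ]


def build_transfer_model(base_name, dropout, optimizer, learning_rate,
                         input_shape=(224, 224, 3)):
    """Hypothetical builder: frozen pre-trained base plus a small binary head."""
    import tensorflow as tf  # lazy import; only needed when a model is built

    bases = {
        "VGG16": tf.keras.applications.VGG16,
        "InceptionV3": tf.keras.applications.InceptionV3,
    }
    base = bases[base_name](weights="imagenet", include_top=False,
                            input_shape=input_shape)
    base.trainable = False  # assumption: only the new head is trained

    layers = [base, tf.keras.layers.GlobalAveragePooling2D()]
    if dropout is not None:
        layers.append(tf.keras.layers.Dropout(dropout))
    layers.append(tf.keras.layers.Dense(1, activation="sigmoid"))
    model = tf.keras.Sequential(layers)

    optimizers = {
        "SGD": tf.keras.optimizers.SGD,
        "ADAM": tf.keras.optimizers.Adam,
    }
    model.compile(optimizer=optimizers[optimizer](learning_rate=learning_rate),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model


grid = hyperparameter_grid()
print(len(grid), grid[0])
```

A model for one table row could then be built with, for example, `build_transfer_model("VGG16", **grid[8])`, which corresponds to dropout 0.5 with SGD at learning rate 0.001.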
Table 6 Results of CNN models with 90:10 train-test split
CNN layer  Optimizer  Learning rate  Dropout  Class 0 (P R F1)  Class 1 (P R F1)  Accuracy  Confusion matrix [[TP, FN], [FP, TN]]  AUC
1  SGD  0.001  NA  0.53 0.95 0.68  0.65 0.09 0.16  0.54  [[148 7] [132 13]]  0.522
1  SGD  0.001  0.2  0.53 0.94 0.68  0.62 0.11 0.19  0.54  [[145 10] [129 16]]  0.523
1  SGD  0.01  NA  0.52 1.00 0.68  0.00 0.00 0.00  0.52  [[155 0] [145 0]]  0.50
1  SGD  0.01  0.2  0.52 1.00 0.68  0.00 0.00 0.00  0.52  [[155 0] [145 0]]  0.50
1  ADAM  0.001  NA  0.67 0.81 0.74  0.74 0.58 0.65  0.70  [[126 29] [61 84]]  0.696
1  ADAM  0.001  0.2  0.64 0.77 0.70  0.69 0.54 0.60  0.66  [[120 35] [67 78]]  0.656
1  ADAM  0.01  NA  0.64 0.47 0.54  0.56 0.72 0.63  0.59  [[73 82] [41 104]]  0.594
1  ADAM  0.01  0.2  0.70 0.72 0.71  0.69 0.68 0.68  0.70  [[111 44] [47 98]]  0.696
2  SGD  0.001  NA  0.52 1.00 0.68  0.00 0.00 0.00  0.52  [[155 0] [145 0]]  0.50
2  SGD  0.001  0.2  0.52 1.00 0.68  0.00 0.00 0.00  0.52  [[155 0] [145 0]]  0.50
2  SGD  0.01  NA  0.52 1.00 0.68  0.00 0.00 0.00  0.52  [[155 0] [145 0]]  0.50
2  SGD  0.01  0.2  0.52 1.00 0.68  0.00 0.00 0.00  0.52  [[155 0] [145 0]]  0.50
2  ADAM  0.001  NA  0.67 0.69 0.68  0.66 0.63 0.65  0.66  [[107 48] [53 92]]  0.662
2  ADAM  0.001  0.2  0.62 0.71 0.66  0.64 0.54 0.59  0.63  [[110 45] [66 79]]  0.627
2  ADAM  0.01  NA  0.53 0.98 0.69  0.80 0.08 0.15  0.55  [[152 3] [133 12]]  0.532
2  ADAM  0.01  0.2  0.52 0.99 0.68  0.75 0.02 0.04  0.52  [[154 1] [142 3]]  0.507
3  SGD  0.001  NA  0.52 1.00 0.68  0.00 0.00 0.00  0.52  [[155 0] [145 0]]  0.50
3  SGD  0.001  0.2  0.52 1.00 0.68  0.00 0.00 0.00  0.52  [[155 0] [145 0]]  0.50
3  SGD  0.01  NA  0.52 1.00 0.68  0.00 0.00 0.00  0.52  [[155 0] [145 0]]  0.50
3  SGD  0.01  0.2  0.52 1.00 0.68  0.00 0.00 0.00  0.52  [[155 0] [145 0]]  0.50
3  ADAM  0.001  NA  0.61 0.65 0.63  0.60 0.56 0.58  0.60  [[100 55] [64 81]]  0.602
3  ADAM  0.001  0.2  0.55 0.92 0.69  0.68 0.18 0.28  0.56  [[143 12] [119 26]]  0.551
3  ADAM  0.01  NA  0.55 0.94 0.69  0.72 0.18 0.29  0.57  [[145 10] [119 26]]  0.557
3  ADAM  0.01  0.2  0.58 0.56 0.57  0.54 0.56 0.55  0.56  [[87 68] [64 81]]  0.56
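Many SGD rows in Table 6 share the confusion matrix [[155 0] [145 0]], meaning every test image was assigned to class 0, so the reported accuracy of 0.52 is simply the share of the larger class. The short Python sketch below makes that reading explicit and also checks that the 300 test images match a 90:10 split of 3000 samples. The helper names are hypothetical, and the class-0-first row layout is an assumption taken from the table header.

```python
def split_sizes(n_samples, train_percent):
    """Return (train, test) sizes for an integer percentage split."""
    n_train = n_samples * train_percent // 100
    return n_train, n_samples - n_train


def is_collapsed(cm):
    """True if the model predicted a single class for every test sample.

    cm is laid out as [[TP, FN], [FP, TN]]: rows are true classes 0 and 1,
    columns are predicted classes 0 and 1 (assumed from the table header).
    """
    predicted0 = cm[0][0] + cm[1][0]
    predicted1 = cm[0][1] + cm[1][1]
    return predicted0 == 0 or predicted1 == 0


def majority_baseline(cm):
    """Accuracy obtained by always predicting the larger true class."""
    class0_total = sum(cm[0])
    class1_total = sum(cm[1])
    return max(class0_total, class1_total) / (class0_total + class1_total)


sgd_row = [[155, 0], [145, 0]]      # e.g. 2 layers, SGD, any setting
adam_row = [[126, 29], [61, 84]]    # 1 layer, Adam, lr 0.001, no dropout

print(split_sizes(3000, 75), split_sizes(3000, 90))
print(is_collapsed(sgd_row), round(majority_baseline(sgd_row), 2))
print(is_collapsed(adam_row))
```

Under this reading, an accuracy near 0.50 to 0.52 together with an AUC of 0.50 indicates a model that has not learned to separate the classes, rather than a weak but working classifier.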
Table 7 Results of transfer learning approach with 90:10 train-test split
Dropout  Optimizer  Learning rate  Class 0 (P R F1)  Class 1 (P R F1)  Accuracy  Confusion matrix [[TP, FN], [FP, TN]]  AUC
VGG16
NA  SGD  0.001  0.85 0.86 0.85  0.85 0.83 0.84  0.85  [[133 22] [24 121]]  0.846
NA  SGD  0.01  0.81 0.90 0.85  0.88 0.78 0.82  0.84  [[139 16] [32 113]]  0.838
NA  ADAM  0.001  0.84 0.87 0.86  0.86 0.83 0.84  0.85  [[135 20] [25 120]]  0.849
NA  ADAM  0.01  0.88 0.80 0.84  0.81 0.88 0.84  0.84  [[124 31] [17 128]]  0.841
0.2  SGD  0.001  0.85 0.85 0.85  0.84 0.83 0.84  0.84  [[132 23] [24 121]]  0.843
0.2  SGD  0.01  0.77 0.91 0.83  0.88 0.71 0.79  0.81  [[141 14] [42 103]]  0.81
0.2  ADAM  0.001  0.85 0.85 0.85  0.84 0.84 0.84  0.85  [[132 23] [23 122]]  0.847
0.2  ADAM  0.01  0.92 0.60 0.73  0.69 0.94 0.80  0.77  [[93 62] [8 137]]  0.772
0.5  SGD  0.001  0.86 0.85 0.85  0.84 0.85 0.85  0.85  [[132 23] [22 123]]  0.85
0.5  SGD  0.01  0.92 0.50 0.65  0.64 0.95 0.77  0.72  [[78 77] [7 138]]  0.728
0.5  ADAM  0.001  0.85 0.88 0.86  0.86 0.83 0.85  0.86  [[136 19] [24 121]]  0.856
0.5  ADAM  0.01  0.78 0.94 0.85  0.91 0.72 0.80  0.83  [[145 10] [41 104]]  0.826
InceptionV3
NA  SGD  0.001  0.85 0.88 0.86  0.87 0.83 0.85  0.86  [[137 18] [25 120]]  0.856
NA  SGD  0.01  0.87 0.85 0.86  0.85 0.87 0.86  0.86  [[132 23] [19 126]]  0.86
NA  ADAM  0.001  0.82 0.87 0.85  0.85 0.80 0.83  0.84  [[135 20] [29 116]]  0.836
NA  ADAM  0.01  0.92 0.74 0.82  0.77 0.93 0.84  0.83  [[114 41] [10 135]]  0.833
0.2  SGD  0.001  0.88 0.86 0.87  0.86 0.88 0.87  0.87  [[134 21] [18 127]]  0.87
0.2  SGD  0.01  0.82 0.97 0.89  0.96 0.78 0.86  0.88  [[150 5] [32 113]]  0.874
0.2  ADAM  0.001  0.88 0.80 0.84  0.81 0.88 0.84  0.84  [[124 31] [17 128]]  0.841
0.2  ADAM  0.01  0.86 0.88 0.87  0.87 0.84 0.86  0.86  [[137 18] [23 122]]  0.863
0.5  SGD  0.001  0.88 0.88 0.88  0.87 0.88 0.87  0.88  [[136 19] [18 127]]  0.877
0.5  SGD  0.01  0.88 0.88 0.88  0.87 0.88 0.87  0.88  [[136 19] [18 127]]  0.877
0.5  ADAM  0.001  0.87 0.85 0.86  0.85 0.87 0.86  0.86  [[132 23] [19 126]]  0.86
0.5  ADAM  0.01  0.78 0.96 0.86  0.94 0.71 0.81  0.84  [[149 6] [42 103]]  0.836
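The Discussion observes that VGG16 varies more than InceptionV3 when the hyperparameters change. As a rough check, the sketch below compares the spread of the twelve accuracy values per model taken from Table 7. The choice of population standard deviation and range as the spread measures is an assumption made for illustration; the paper does not state how variance was assessed.

```python
from statistics import pstdev

# Accuracy column of Table 7 (90:10 split), in row order.
vgg16_accuracy = [0.85, 0.84, 0.85, 0.84, 0.84, 0.81,
                  0.85, 0.77, 0.85, 0.72, 0.86, 0.83]
inceptionv3_accuracy = [0.86, 0.86, 0.84, 0.83, 0.87, 0.88,
                        0.84, 0.86, 0.88, 0.88, 0.86, 0.84]


def spread(values):
    """Return (population standard deviation, max - min) of the values."""
    return pstdev(values), max(values) - min(values)


vgg_std, vgg_range = spread(vgg16_accuracy)
inc_std, inc_range = spread(inceptionv3_accuracy)

print(f"VGG16:       std={vgg_std:.3f}, range={vgg_range:.2f}")
print(f"InceptionV3: std={inc_std:.3f}, range={inc_range:.2f}")
```

On these values the accuracy range is 0.14 for VGG16 and 0.05 for InceptionV3, which is consistent with the observation in the text.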
Table 8 Performance comparison of the proposed models with existing research
Source  Model  Precision  Recall  F1-score
Kumari et al. [3]  2DCNN  0.68  0.68  0.68
Gomez et al. [21]  InceptionV3 and LSTM  –  –  0.67
Kumari et al. [22]  Random Forest  0.74  0.75  0.74
Hosseinmardi et al. [24]  0.85  0.83  0.84
Proposed  2DCNN  0.74  0.58  0.65
Proposed  VGG16  0.85  0.85  0.86
Proposed  InceptionV3  0.87  0.88  0.87
margin when the values of the hyperparameters are tuned. VGG16 results show higher variance than InceptionV3 as the hyperparameters change. The hypothesis that more training data gives better results does not hold for the transfer learning models, as the 75:25 data split gives better results than the 90:10 split. However, the hypothesis that more data gives better results applies to every model, whether transfer learning or the simple 2DCNN model, since the increase in the dataset from 1000 images to 3000 images showed a significant improvement in results.

Theoretically, the 2DCNN model has no inherent reason to show such variance in results with a change of optimizer alone. However, this variance makes careful hyperparameter selection important.

Transfer learning is beneficial when solving complex problems like cyberbullying, which has varied subproblems. These models are trained on a large corpus having many classes. Hence, utilizing their learning capabilities to handle a complex problem is easy with transfer learning. On the other hand, a pure CNN model is trained only with the training samples provided by the users. Hence, its knowledge base is limited to the supplied dataset. If a test sample falls outside the scope of the training samples, it is tough for the model to predict its actual class. This may be one reason behind the better prediction accuracy of the transfer learning models compared to the CNN model. The epoch-wise loss is plotted in Fig. 5. The loss value of the 2DCNN model is very high compared to VGG16 and InceptionV3 in all settings. This suggests that the CNN model needed more epochs to train, after which the loss value might decrease. On the other hand, the transfer learning models are pre-trained, and hence their loss values are very low from the beginning.

Cyberbullying is a major issue that persists on social platforms to date. Many people, especially teenagers, are affected by it. Researchers have suggested mechanisms for detecting textual cyberbullying events. However, the proposed model is designed to detect cyberbullying posts having images. The model can be used as an initial scanner of social posts. If a post is predicted as cyberbullying, it will be migrated, or a notification will be generated to the sender and receiver to check and report it. This mechanism can help to reduce the number of bullying posts on social platforms and consequently reduce the incidents happening due to them. Building an automated model to detect image-based cyberbullying is a complex task and hence requires a large amount of labelled data for training. Hence, the 2DCNN model is not the better choice; instead, the pre-trained transfer learning models like VGG16 and InceptionV3 performed better and hence can be preferred. These models are available in the Keras library and can be tuned as per the requirement by researchers.

Conclusion, limitations and future scope

Complex problems like cyberbullying, which have various problems embedded in them, are difficult to trace with a normal system. In particular, detecting image-based cyberbullying posts on social platforms is a challenging task. This research explored deep learning and transfer learning frameworks to find the best-suited model to predict image-based cyberbullying posts on social platforms. The deep learning-based 2DCNN was experimented with initially and, after tuning its hyperparameters, achieved an accuracy of 69.60%. On the other hand, the transfer learning models VGG16 and InceptionV3 always achieved better prediction accuracy. VGG16 achieved an accuracy of 86%, whereas InceptionV3 achieved 89%. Hence, the transfer learning models VGG16 and InceptionV3 have an accuracy margin of 16.40% and 19.40%, respectively, compared to the best-configured 2DCNN model. Therefore, it can be concluded that the proposed system detects most of the image-based cyberbullying posts.

The limitations of the proposed model include the following: (i) textual cyberbullying detection is not considered, which means a post having only text is not a part of this research; (ii) cyberbullying posts that combine an image with text have been found, but this study is limited to image-oriented cyberbullying detection. Hence, the future scope of this research is open to discussion, as the problem has varied subproblems. The accuracy achieved by the proposed system was 89%, which can be improved by increasing the training sample size. Also, other combinations of the models can be opted for, and an ensemble system can be formed to achieve better prediction accuracy. The textual part can be considered along
with the image to catch more cyberbullying-related posts on social platforms.

Declarations

Conflict of interest There is no conflict of interest.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

References

1. Smith PK, Mahdavi J, Carvalho M, Fisher S, Russell S, Tippett N (2008) Cyberbullying: its nature and impact in secondary school pupils. J Child Psychol Psychiatry 49(4):376–385
2. Ak Şerife, Özdemir Y, Kuzucu Y (2015) Cybervictimization and cyberbullying: the mediating role of anger, don't anger me! Comput Human Behav 49:437–443
3. Kumari K, Singh JP, Dwivedi YK, Rana NP (2020) Towards cyberbullying-free social media in smart cities: a unified multi-modal approach. Soft Comput 24(15):11059–11070
4. Balakrishnan V, Khan S, Arabnia HR (2020) Improving cyberbullying detection using twitter users' psychological features and machine learning. Comput Secur 90:101710
5. Cheng L, Li J, Silva YN, Hall DL, Liu H (2019) Xbully: Cyberbullying detection within a multi-modal context. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, pp. 339–347
6. Bastiaensens S, Vandebosch H, Poels K, Van Cleemput K, DeSmet A, De Bourdeaudhuij I (2014) Cyberbullying on social network sites. an experimental study into bystanders' behavioural intentions to help the victim or reinforce the bully. Comput Hum Behav 31:259–271
7. López-Vizcaíno MF, Nóvoa FJ, Carneiro V, Cacheda F (2021) Early detection of cyberbullying on social media networks. Future Gener Comput Syst 118:219–229
8. Singh VK, Ghosh S, Jose C (2017) Toward multimodal cyberbullying detection. In Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, pp. 2090–2099
9. Singh VK, Huang Q, Atrey PK (2016) Cyberbullying detection using probabilistic socio-textual information fusion. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 884–887, IEEE
10. Reynolds K, Kontostathis A, Edwards L (2011) Using machine learning to detect cyberbullying. In 10th International Conference on Machine learning and applications and workshops, vol. 2, pp. 241–244, IEEE
11. Roy PK, Singh JP, Banerjee S (2020) Deep learning to filter sms spam. Future Gener Comput Syst 102:524–533
12. Roy PK, Singh JP (2019) Predicting closed questions on community question answering sites using convolutional neural network. Neural Comput Appl 32:10555–10572
13. Roy PK (2021) Deep neural network to predict answer votes on community question answering sites. Neural Process Lett 53(2):1633–1646
14. Roy PK (2020) Multilayer convolutional neural network to filter low quality content from quora. Neural Process Lett 52(1):805–821
15. Khan MA, Kadry S, Parwekar P, Damaševičius R, Mehmood A, Khan JA, Naqvi SR (2021) Human gait analysis for osteoarthritis prediction: a framework of deep learning and kernel extreme learning machine. Complex Intell Syst. https://doi.org/10.1007/s40747-020-00244-2
16. Yu X, Yang T, Lu J, Shen Y, Lu W, Zhu W, Bao Y, Li H, Zhou J (2021) Deep transfer learning: a novel glucose prediction framework for new subjects with type 2 diabetes. Complex Intell Syst. https://doi.org/10.1007/s40747-021-00360-7
17. Li S, Liu B, Li S, Zhu X, Yan Y, Zhang D (2021) A deep learning-based computer-aided diagnosis method of x-ray images for bone age assessment. Complex Intell Syst. https://doi.org/10.1007/s40747-021-00376-z
18. Kaur H, Koundal D, Kadyan V, Kaur N, Polat K (2021) Automated multimodal image fusion for brain tumor detection. J Artif Intell Syst 3(1):68–82
19. Aggarwal S, Gupta S, Alhudhaif A, Koundal D, Gupta R, Polat K (2021) Automated COVID-19 detection in chest x-ray images using fine-tuned deep learning architectures. Expert Syst 39:e12749
20. Çiğdem A, Çürük E, Eşsiz ES (2019) Automatic detection of cyberbullying in formspring.me, myspace and Youtube social networks. Turk J Eng 3(4):168–178
21. Gomez R, Gibert J, Gomez L, Karatzas D (2020) Exploring hate speech detection in multimodal publications. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 1470–1478
22. Kumari K, Singh JP, Dwivedi YK, Rana NP (2021) Multi-modal aggression identification using convolutional neural network and binary particle swarm optimization. Future Gener Comput Syst 118:187–197
23. Sadiq S, Mehmood A, Ullah S, Ahmad M, Choi GS, On B-W (2021) Aggression detection through deep neural model on Twitter. Future Gener Comput Syst 114:120–129
24. Hosseinmardi H, Rafiq RI, Han R, Lv Q, Mishra S (2016) Prediction of cyberbullying incidents in a media-based social network. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), pp. 186–192, IEEE
25. Al-Ajlan MA, Ykhlef M (2018) Deep learning algorithm for cyberbullying detection. Int J Adv Comput Sci Appl 9(9):199–205
26. Banerjee V, Telavane J, Gaikwad P, Vartak P (2019) Detection of cyberbullying using deep neural network. In 2019 5th International Conference on Advanced Computing & Communication Systems (ICACCS), pp. 604–607, IEEE
27. Chen J, Yan S, Wong K-C (2020) Verbal aggression detection on twitter comments: convolutional neural network for short-text sentiment analysis. Neural Comput Appl 32(15):10809–10818
28. Ali WNHW, Mohd M, Fauzi F (2018) Cyberbullying detection: an overview. In 2018 Cyber Resilience Conference (CRC), pp. 1–3, IEEE
29. Bhat S, Koundal D (2021) Multi-focus image fusion using neutrosophic based wavelet transform. Appl Soft Comput 106:107307
30. Kamilaris A, Prenafeta-Boldú FX (2018) A review of the use of convolutional neural networks in agriculture. J Agric Sci 156(3):312–322
31. Udendhran R, Balamurugan M (2021) Towards secure deep learning architecture for smart farming-based applications. Complex Intell Syst 7(2):659–666
32. Xue G, Liu S, Ma Y (2020) A hybrid deep learning-based fruit classification using attention model and convolution autoencoder. Complex Intell Syst. https://doi.org/10.1007/s40747-020-00192-x
33. Simonyan K, Zisserman A (2014) Very deep convolutional networks for large-scale image recognition. Preprint at arXiv:1409.1556
34. Chollet F (2017) Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1251–1258
35. Szegedy C, Ioffe S, Vanhoucke V, Alemi AA (2017) Inception-v4, inception-resnet and the impact of residual connections on learning. In Thirty-first AAAI conference on artificial intelligence
36. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778
37. Szegedy C, Vanhoucke V, Ioffe S, Shlens J, Wojna Z (2016) Rethinking the inception architecture for computer vision. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2818–2826
38. Roy PK, Ahmad Z, Singh JP, Alryalat MAA, Rana NP, Dwivedi YK (2018) Finding and ranking high-quality answers in community question answering sites. Global J Flex Syst Manag 19(1):53–68

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.