RoBERTa-GCN: A Novel Approach for Combating Fake News in Bangla Using Advanced Language Processing and Graph Convolutional Networks
This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3457860
ABSTRACT In an era of widespread information, combating fake news in under-represented languages such as Bangla (Bengali) is a significant challenge. Bangla is used by a vast population, yet it lacks adequate natural language processing tools, which makes fake news a critical issue for its speakers. To address this, our research introduces RoBERTa-GCN, a model that combines RoBERTa with a graph convolutional network (GCN) to accurately identify fake news in Bangla. The dataset we used comprises articles from 22 prominent Bangladeshi news portals covering diverse subjects such as politics, sports, economy, and entertainment. This comprehensive dataset enables the model to learn the intricacies of the Bangla language and its news ecosystem, facilitating effective fake news detection across content categories. Our approach integrates a RoBERTa model adapted for Bangla with the GCN's ability to process relational data, forming an effective means of differentiating authentic from fake news. The key contribution of this study is the creation and application of the RoBERTa-GCN model to the Bangla language, an area not thoroughly explored in previous research. The findings show that RoBERTa-GCN surpasses existing methods, achieving an accuracy of 98.60%, which highlights its capability as a robust model for preserving news integrity in the digital era, especially for the Bangla-speaking population.
INDEX TERMS Fake news detection, Graph neural network, NLP, Bangla language, Deep learning,
Machine learning, Encoder
I. INTRODUCTION
In recent years, the emergence of online news platforms, including social media, news blogs, and online newspapers, has prompted individuals to actively seek and consume news because of advantages such as rapid information dissemination, convenient accessibility, and cost-effectiveness [1]. Meanwhile, easy access to social media platforms enables the rapid and widespread dissemination of false information, including fake news, within the digital realm [2]. “Fake news” refers to articles that might delude or deceive readers by disseminating fabricated information [3]. For the sake of traffic, some self-media and internet users spread a significant amount of unverified news that was subsequently reprinted and unthinkingly followed, spreading further and undermining the credibility and authority of mainstream media. Moreover, it resulted in economic, political, and
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2024.3457860
M. Ahammad et al.: RoBERTa-GCN: A Novel Approach for Combating Fake News in Bangla Using Advanced Language Processing
other hidden risks [4], [5].
The propagation of false news serves to create fear, promote racist ideology, and provoke acts of bullying and violence against innocent individuals [6]. According to intelligence studies, social media rumors have a particularly long-lasting influence on people with lower cognitive ability, preventing them from making optimal judgments [7].
Fake news is widely recognized as a significant threat to global trade, media, and democracy, carrying severe ramifications [8]. Around 25% of Americans visited a fraudulent news website during the six weeks before the 2016 US election. This has been hypothesized as a potential factor that affected the election’s outcome [9]. A fake news report claiming that an explosion had injured US President Barack Obama led to a loss of $130 billion in the stock market [10].
Bangladesh, a South Asian nation with a high rate of social media penetration, is plagued by poverty and rumors. As of the beginning of 2023, Bangladesh’s internet user population stood at 66.94 million, as reported by Datareportal [11], [12]. Bangladesh has encountered many documented incidents of misinformation in recent years. In July 2019, five people were beaten to death by mobs and ten others were injured because of widespread rumors about human sacrifice being required for the Padma Bridge construction project [13]. Another case of misinformation resulting in violence occurred in 2012, when a local Buddhist youth’s Facebook post featuring an image of a burned Quran, the holy book of Islam, incited an enraged mob to torch 12 Buddhist temples and to pillage and burn over 50 houses in a Buddhist community in Ramu, Cox’s Bazar [14]. During the COVID-19 pandemic, vaccine skepticism in Bangladesh was fueled by allegations that coronavirus vaccines "contain a microchip" that makes it possible for Western governments to spy on people. Such unfounded conspiracy theories pose a huge obstacle to the country’s public health programs [15]. Another false news item stated that just 200,000 vaccine doses would be imported in the first installment and that all of them would be delivered to government leaders and members of the ruling party in Bangladesh [16].
To counter the spread of misinformation, dedicated platforms like PolitiFact [17], FactCheck [18], and JaChai [19] manually review potential fake news articles in online media, providing detailed explanations and citing logical and factual reasons to debunk the false claims. To complement these manual efforts, computational tools have recently been employed to fight the threat of false news [3]. One effort to identify fake news involves using multi-perspective speaker profiles [20]. Another approach introduces an automated fact-checking methodology that leverages third-party sources to verify the accuracy of news articles [21]. Additionally, a methodology has been established to detect incongruities between Bangla news headlines and body content [22].
Distinguishing fake news in Bangla is significantly more challenging than in other languages because of its structure. Approximately 265 million people speak Bangla as their native language, making it the 7th most widely spoken language globally [23]. The majority of readers are unable to recognize fake news using their own knowledge. Several studies address the problem in English. However, as far as we know, only limited resources and computational methods are currently available to address the menace of fake news written in Bangla, which poses a potential threat to this sizable community. Therefore, to address these challenges, we propose a custom method for detecting Bangla fake news that draws on the power of RoBERTa and GCN. Our research, which involves extensive data analysis and the exploration of various methods for fake news detection, makes the following key contributions:
1) We propose a custom model named RoBERTa-GCN for Bangla fake news detection. To the best of our knowledge, we are the first to integrate RoBERTa and GCN for Bangla fake news detection.
2) The combination of RoBERTa’s contextual embeddings with GCN’s structural learning is an innovative approach that enhances Bangla fake news detection by leveraging the relationships within news content, an aspect often overlooked by other models.
3) We benchmark several SOTA models on BanFakeNews and compare them with our proposed RoBERTa-GCN, which outperforms all existing models by achieving 98.60% accuracy.
The remainder of this paper is organized as follows. Section II critically examines the pertinent literature, situating our work within the broader scholarly context. Section III explains the overall data collection and analysis procedures. Section IV delineates our proposed methodology, elaborating on the approaches and techniques employed in our investigation. Section V presents an analysis of the experimental data and a comparative evaluation of the results. Finally, Section VI synthesizes the key insights of our study and underscores their significance and implications for the field.

II. RELATED WORKS
The dissemination of false information on online media platforms has created confusion among readers and given rise to numerous challenges. Several researchers have employed diverse techniques to control the spread of fake news.
Rai et al. [35] used Long Short-Term Memory (LSTM) and bidirectional encoder representations from transformers (BERT) to classify fake news. Their results showed higher accuracy than the baseline models on the PolitiFact and GossipCop datasets. Additionally, Rai et al. employed a vanilla BERT model combined with an LSTM layer to investigate further performance improvements. Nasir et al. [24] showed how to use a hybrid deep learning (DL) model that combines convolutional and recurrent neural networks
TABLE 1. Summary of methods in the literature for detecting fake news, detailing the datasets used, the techniques implemented, and their overall performance.
(CNN-RNN) to classify fake news. This model performed better than non-hybrid methods on the ISOT and FA-KES datasets and shows promise for future tests on other datasets as well. Alghamdi et al. [33] explored the impact of freezing and unfreezing parameters in a neural network architecture used for BERT and CT-BERT. They found that models like BiGRU and CT-BERT achieved superior performance on the COVID-19 fake news dataset. Sudhakar and Kaliyamurthie [36] employed machine learning classifiers, specifically Logistic Regression (LR) and Naive Bayes (NB), where a G-power experiment ensures accuracy and a T-test analysis for independent samples shows that LR works better than NB. Choudhary and Arora [37] applied a sequential neural model incorporating an LSTM-based word embedding model. The feature-based sequential model performed better than ML-based models and LSTM-based word embeddings in less time. Choudhary et al. [31] showed that BerConvoNet, using BERT in the news embedding block with various kernel sizes, effectively distinguishes real from fake news across benchmark datasets, outperforming other state-of-the-art (SOTA) models. Khullar and Singh [38] emphasized the need for a distributed client-server architecture to protect data and proposed a model named F-FNC that uses the LSTM, CNN-LSTM, Bi-LSTM, and CNN-BiLSTM algorithms. The model outperformed other distributed DL classifiers on different metrics. Raja et al. (raja2023fake) used transfer learning with mBERT and XLM-R models to effectively detect fake news in the Dravidian Fake dataset, achieving high accuracy in multilingual settings. Similarly, Kaliyar et al. (kaliyar2020fndnet) developed FNDNet, a CNN-based model that identifies biased features to classify fake news, showing improved performance on benchmark datasets for social media.
To detect fake news, Xia et al. [39] used a multi-head attention model to find major emergencies in a think tank and public opinion stage, together with a hybrid CNN, Bi-LSTM, and attention mechanism model, which helped to improve metrics such as loss, accuracy, F1-score, and recall. To outperform SOTA models at distinguishing fake news by adding variational Bi-LSTM, autoencoder, and semantic topic-related features, Hosseini et al. [40] used LDAVAE, a probabilistic model that integrates LDA with a supervised Bi-LSTM VAE, and used model ablation to measure the accuracy and performance of neural embeddings and topic features. Palani and Elango [41] proposed a hybrid BERT-BiLSTM-CNN (BBC-FND) model consisting of three primary layers: an embedding layer, a feature representation layer, and a classification layer, which together capture relevant information, patterns, and global meaning to predict the validity of news. Okunoye and Ibor [42] used genetic search to select a neural architecture and applied DL techniques to classify bogus news. Malhotra and Malik [43] evaluated SVM, LR, CatBoost, XGBoost, multinomial NB, and RF ensemble models, where the deep auto-ViML model and passive-aggressive classifier worked well. SVM was the fastest at 0.245 ms. Ghamdi et al. [27] used natural language processing (NLP), BERT, and human programming to detect misinformation in Twitter and website content. A labeled dataset in English, Arabic, and Urdu was used to highlight the efficiency of their models in detecting false material. Mohawesh et al. [44] proposed a semantic technique combining pre-trained word2vec vectors with a capsule neural network to enhance multilingual fake news detection. This approach surpassed existing methods by incorporating
relational variables extracted from the text. Dixit et al. [34] presented a four-step false news detection model: data pre-processing, feature reduction, feature extraction, and LSTM-LF classification. It worked better than previous methods of finding fake news on the Buzzfeed, GossipCop, ISOT, and Politifact datasets. Jain et al. [45] used a resilient DL model to detect false claims using associated embeddings, attention methods, and pertinent metadata, and showed that it worked effectively across real datasets. In contrast to CNN, Palani et al. [46] used the BERT model to extract textual features that preserve their meaning and proposed the Capsule network (CapsNet) algorithm, which helped combine useful data into rich data representations for accurate fake news detection. Ravish et al. [47] suggested using a multi-layered Principal Component Analysis (PCA) feature selection method along with Multiclass Support Vector Machines (MSVMs) for classification. This improved model accuracy across ten datasets, especially when more characteristics were needed to validate the feature extraction methods. Hossain et al. [3] studied and tested cutting-edge models such as BERT, CNN, Bi-LSTM, LR, and RF. These models used common linguistic features and neural networks to advance low-resource language research. To classify fake news, Hasib et al. [48] used DT, SGD, BERT, CNN, and ANN models, where BERT worked better than the other models, achieving maximum accuracy on balanced datasets and good performance under cross-validation with varied K values. Pranto et al. [29] experimented with ML models, among which BERT was the most prominent automatic detection method for Bengali false news classification. When BERT was applied to a dataset of Bengali Facebook posts related to COVID-19, 10 topics were identified and grouped into three groups: system, belief, and social. Roman et al. [30] enhanced the Bangla Fake News Dataset by addressing the imbalance using web scraping and Google Translate, achieving the highest F1-score via the BERT model and surpassing the performance reported on the BanFakeNews dataset. Ali et al. [32] conducted a feasibility study on detecting Bangla false news from social media, employing diverse feature extraction methods and ML algorithms, and ultimately achieved optimal accuracy using LSTM in their proposed system. Anjum et al. [25] used a Random Forest (RF) classifier to distinguish bogus news in a collection of more than 300 articles from a combination of false and legitimate sources, with an accuracy of 82%. Islam et al. [26] employed a data mining approach to classify Bengali fake news in South Asia, analyzing 726 articles, and found that RF achieved an accuracy of 85%. Bhattacharjee et al. [49] introduced BanglaBERT and BanglishBERT and achieved results in NLU for Bangla, along with new datasets.

As highlighted in the literature review and Table 1, numerous models have shown potential for fake news detection, but most are tailored for English. This paper addresses this gap by proposing a specialized model for detecting fake news in the Bangla language.

III. DATASET: BANFAKENEWS
In this paper, we used a dataset named "BanFakeNews", which was specially introduced for developing models that classify fake and real news. "BanFakeNews" was collected from the 22 most prominent online news sites in Bangladesh and served as the source of false news data for our research. To ensure the dataset’s diversity, news stories were selected from various fields, such as sports, politics, economics, and the environment. The news sites that were part of our data collection are:

• kalerkantho.com • bd24live.com
• jagonews24.com • risingbd.com
• banglanews24.com • dailyjanakantha.com
• banglatribune.com • bd-pratidin.com
• jugantor.com • channelionline.com
• dhakatimes24.com • samakal.com
• ittefaq.com.bd • independent24.com
• somoynews.tv • rtnn.net
• dailynayadiganta.com • bangla.thereport24.com
• bangla.bdnews24.com • mzamin.com
• prothomalo.com • bhorerkagoj.net

Table 2 presents the structure of the dataset, which was created specifically for identifying fake news in the Bengali language. The dataset consists of columns representing the article ID, domain, publication date, category, headline, content, and a label indicating the authenticity of the piece. Every column has a distinct function for analysis. The bar chart in Figure 1 shows a breakdown of the dataset by news domain. Each bar corresponds to a different news domain, and the length of the bar reflects the number of articles from that domain included in the dataset. The visualization supports the following observations:

TABLE 2. Features of the fake news dataset used in this study.
Columns     Descriptions
articleID   Unique identifier for each article
domain      The domain in which the publication was made
date        Date of publication of the article
category    Category or genre of the article
headline    Headline or title of the article
content     Main content or body of the article
label       Label indicating the authenticity of the article

Domain Diversity: The dataset comprises articles from diverse domains. This diversity is beneficial for building a robust fake news detection mechanism, allowing the model to learn from various writing styles and content types.
Volume of Articles: The domains contribute a varying number of articles, with kalerkantho.com and jagonews24.com providing the largest number. In contrast, bhorerkagoj.net and mzamin.com contribute the fewest.
Data Representation: The dataset reflects both the diversity of domains and the proportion of fake and real news within each domain, which helps prevent overfitting to domain-specific characteristics that are unrelated to news authenticity.
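The per-domain breakdown discussed in this section can be reproduced with a few lines of code. The following is a minimal sketch, assuming the dataset is available as CSV text with the column names listed in Table 2; the sample rows and the label encoding (1 = authentic, 0 = fake) are hypothetical and used only for illustration.

```python
import csv
import io
from collections import Counter, defaultdict

# Hypothetical sample rows that follow the column names of Table 2.
# The label encoding (1 = authentic, 0 = fake) is an assumption.
SAMPLE_CSV = """articleID,domain,date,category,headline,content,label
1,kalerkantho.com,2019-01-01,Sports,h1,c1,1
2,kalerkantho.com,2019-01-02,Politics,h2,c2,0
3,jagonews24.com,2019-01-03,Economy,h3,c3,1
4,mzamin.com,2019-01-04,Politics,h4,c4,1
"""


def summarize_by_domain(csv_text):
    """Return (article count per domain, share of fake articles per domain)."""
    counts = Counter()
    fakes = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        counts[row["domain"]] += 1
        if row["label"] == "0":
            fakes[row["domain"]] += 1
    fake_share = {domain: fakes[domain] / n for domain, n in counts.items()}
    return counts, fake_share


counts, fake_share = summarize_by_domain(SAMPLE_CSV)
for domain, n in counts.most_common():
    print(f"{domain}: {n} articles, fake share {fake_share[domain]:.2f}")
```

Comparing the fake share across domains is one simple way to check whether a classifier could learn the source rather than the content.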
FIGURE 3. Polar bar chart visualizing the frequency of the top words used in Bangla-language fake news articles.
FIGURE 4. Word cloud of the dataset we used for RoBERTa-GCN.
FIGURE 5. Workflow for preparing Bangla text data for analysis, featuring cleaning, feature extraction, and various analytical methods.
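The cleaning and feature-extraction stages of this workflow can be sketched in plain Python. This is an illustrative sketch rather than the authors' pipeline: the function names, the choice to keep only characters in the Bengali Unicode block, whitespace tokenization, and the smoothed IDF formula are all assumptions.

```python
import math
import re
import unicodedata
from collections import Counter

# The Bengali Unicode block is U+0980..U+09FF. Everything else (Latin text,
# ASCII digits, punctuation, and the danda, which lies outside this block)
# is treated as noise in this sketch.
NON_BANGLA = re.compile(r"[^\u0980-\u09FF\s]")


def clean(text):
    """Normalize Unicode, drop non-Bangla characters, collapse whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = NON_BANGLA.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Whitespace tokenization of the cleaned text."""
    return clean(text).split()


def tf_idf(docs):
    """Return one {token: weight} dict per document, using a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({
            tok: (cnt / len(toks))
            * (math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1)
            for tok, cnt in tf.items()
        })
    return weights


docs = ["ঢাকা আজ খবর!", "ঢাকা 2023 খেলা।"]
weights = tf_idf(docs)
print(tokenize(docs[1]))
print(weights[0])
```

A token that occurs in every document, such as the first word of both sample strings, receives a lower weight than a token that occurs in only one of them.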
6) Pseudo Algorithm for Bangla Fake News Data Preprocessing

Algorithm 1 describes the process for preprocessing the dataset used to train our proposed RoBERTa-GCN. It begins by fetching articles from Bangla news sources. These articles are preprocessed, which includes normalization, tokenization, noise removal, etc., to clean the text. Next, features are extracted using NLP techniques like BOW and TF-IDF. A pre-trained model is then loaded to analyze these features. Furthermore, the pseudocode shows the pipeline that assesses each article and determines whether the content is real or fake. If an article is predicted to be fake, it is flagged, and relevant stakeholders are notified. This automated process streamlines the task of identifying misinformation in Bangla news articles. Table 3 outlines the symbols used in Algorithm 1 and their meanings.

IV. PROPOSED: ROBERTA-GCN
In NLP and misinformation detection, we present ‘RoBERTa-GCN,’ an innovative graph convolutional network (GCN) based model to detect false information in the Bangla language. This model is a fusion of RoBERTa’s [54] enhanced lan-
TABLE 3. Definition of the symbols used in Algorithm 1.
TABLE 4. Definition of the symbols used in Algorithm 2.
FIGURE 6. Detailed workflow of the Bangla fake news detection model.
vector, embeddings from RoBERTa [54] for each word or subword token are first obtained. Then, a pooling operation (e.g., mean pooling or max pooling) is applied across these embeddings. This pooling process reduces the embeddings to a single vector that captures the essence of the entire article, providing a holistic representation that serves as input for the GCN. The graph is constructed by treating the unified representation vector as the feature of a single node, or of multiple nodes if the article is further segmented. Edges are defined based on the relationships between these nodes, which could be determined through similarity measures, contextual relationships, or predefined rules capturing the structural and semantic connections within the text. This model is designed specifically to tackle the challenges of detecting fake news in Bangla, leveraging both language understanding and graph-based learning.

encapsulating the relationships between them. The GCN [55] layer uses graph convolution to aggregate information from neighboring nodes. The operation for the $l$-th layer of the GCN is defined as follows:

\[ H^{(l+1)} = \sigma\left(\tilde{A} H^{(l)} W^{(l)}\right) \tag{2} \]

Here, $\tilde{A}$ denotes the normalized adjacency matrix, $H^{(l)}$ represents the node feature matrix at layer $l$, $W^{(l)}$ signifies the weight matrix for layer $l$, and $\sigma$ is a nonlinear activation function.

3) Dynamic Adaptation Mechanism (D)
The dynamic component adjusts the GCN weights over time or in response to other changing factors, represented by:

\[ W^{(l)}(t) = g\left(W^{(l)}, t\right) \tag{3} \]
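The pooling, graph construction, and propagation steps described in this section can be illustrated with a small dependency-free sketch. Several choices here are assumptions rather than details confirmed by the paper: mean pooling per text segment, edges from a cosine-similarity threshold of 0.5, the symmetric normalization with self-loops commonly used for GCNs, and ReLU as the activation $\sigma$ in Eq. (2). The toy two-dimensional vectors stand in for real RoBERTa embeddings.

```python
import math


def mean_pool(token_embeddings):
    """Average token vectors into one vector per text segment."""
    n = len(token_embeddings)
    return [sum(col) / n for col in zip(*token_embeddings)]


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def build_adjacency(nodes, threshold=0.5):
    """Connect distinct nodes whose cosine similarity exceeds the threshold."""
    n = len(nodes)
    return [[1.0 if i != j and cosine(nodes[i], nodes[j]) > threshold else 0.0
             for j in range(n)] for i in range(n)]


def normalize_adjacency(adj):
    """Symmetric normalization D^{-1/2} (A + I) D^{-1/2} with self-loops."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_norm, h, w):
    """One propagation step of Eq. (2) with ReLU as the activation."""
    z = matmul(matmul(a_norm, h), w)
    return [[max(0.0, x) for x in row] for row in z]


# Toy token embeddings for three segments of one article (hypothetical values).
segments = [
    [[1.0, 0.0], [1.0, 0.2]],
    [[0.9, 0.1]],
    [[0.0, 1.0], [0.2, 1.0]],
]
nodes = [mean_pool(s) for s in segments]
adj = build_adjacency(nodes)
a_norm = normalize_adjacency(adj)
identity = [[1.0, 0.0], [0.0, 1.0]]
out = gcn_layer(a_norm, nodes, identity)
print(adj)
print(out)
```

With an identity weight matrix, the two similar segments end up with the same averaged features, while the dissimilar third segment keeps its own, which is the neighborhood aggregation that Eq. (2) expresses.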
Where $L_{\text{total}}$ represents the total loss function and $L_{\text{original}}$ is the original loss function, such as cross-entropy for classification. $\lambda$ is the regularization coefficient. $\sum_{l=1}^{L} \|W^{(l)}\|_F^2$ denotes the sum of the squared Frobenius norms of the weight matrices $W^{(l)}$ at each layer $l$, which is a common regularization term to prevent overfitting.

6) Integration of RoBERTa-GCN
The complete "RoBERTa-GCN" model, incorporating all the components, can be formally represented as:

b: Sentence Encoder (RoBERTa)
The sentence encoding function of RoBERTa transforms the sentence into embedding vectors:

\[ SE(c) = (e_{c_1}, e_{c_2}, \ldots, e_{c_n}) \tag{10} \]

c: Word Representation
Each word from the vocabulary is transformed into a dense vector via an embedding matrix:

\[ e_{w_i} = E v_{w_i} \tag{11} \]
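Two of the formulas in this part of the section are simple enough to sketch directly: the embedding lookup of Eq. (11) and the Frobenius-norm penalty added to the loss. The sketch below rests on assumptions: $v_{w_i}$ is taken to be a one-hot vector, $E$ is stored with one column per vocabulary entry, and the total loss is read as the original loss plus $\lambda$ times the penalty term, which matches the description of the symbols but is not shown explicitly in this extract. All function names are hypothetical.

```python
def embed(embedding_matrix, word_index):
    """Eq. (11) with a one-hot v: multiplying E by v selects one column of E."""
    dim = len(embedding_matrix)
    vocab = len(embedding_matrix[0])
    one_hot = [1.0 if j == word_index else 0.0 for j in range(vocab)]
    return [sum(embedding_matrix[i][j] * one_hot[j] for j in range(vocab))
            for i in range(dim)]


def frobenius_sq(w):
    """Squared Frobenius norm: the sum of squared entries of a matrix."""
    return sum(x * x for row in w for x in row)


def total_loss(original_loss, weights, lam):
    """Assumed form: original loss + lam * sum of squared Frobenius norms."""
    return original_loss + lam * sum(frobenius_sq(w) for w in weights)


# Toy 2-dimensional embeddings for a 3-word vocabulary (hypothetical values).
E = [[0.1, 0.2, 0.3],
     [0.4, 0.5, 0.6]]
vector = embed(E, 1)

# Two toy weight matrices, one per layer.
layer_weights = [[[1.0, 2.0]], [[3.0]]]
loss = total_loss(0.5, layer_weights, lam=0.01)
print(vector, loss)
```

In practice the one-hot product is implemented as a direct column or row lookup, and the penalty is often applied through an optimizer's weight-decay setting rather than computed by hand.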
FIGURE 7. Overview of a comprehensive Bangla news detection framework. The framework includes preprocessing of Bangla text datasets, fact-checking
entities, detecting text patterns, various machine learning models, and evaluation metrics to ensure robust fake news detection.
are carried out in a Google Colab setting using a powerful GPU backend with 52 GB of system RAM. The experimental setup uses an Intel processor. Python 3 was used to train and test the models. The evaluation measures used in the experiments are accuracy, precision, recall, F1-score, and time complexity, which are used to evaluate the performance of machine learning models.

1) Modeling Approach
Figure 7 outlines a comprehensive approach for detecting fake news in Bangla. It begins with preprocessing a dataset of Bangla news articles, and then a fact-checking entity runs various checks such as source verification, evidence retrieval, and stance detection. Both traditional ML models and Neural Network (NN) models are used for analysis. The process also includes checking for sensationalism and propaganda techniques. Finally, a set of evaluation metrics such as recall, accuracy, and the F1-score assesses the performance of the models. This structured framework is essential for developing effective tools to combat misinformation in Bangla news.

2) Hyperparameter Tuning
Hyperparameter tuning is a critical phase in ML in which the ideal hyperparameters for a specific model are determined. This process entails experimenting with various hyperparameter combinations and assessing the model’s effectiveness using a test set. The goal of hyperparameter tuning is to identify the hyperparameters that yield the highest performance on the test set. This step is vital as it can enhance the effectiveness of the ML methods. Table 5 shows the set of hyperparameters we used in our experiments.
Fact-Checking Entity. The Fact-Checking Entity for Bangla fake news operates through a systematic methodology to ensure the credibility of information. It begins with source verification to authenticate the origin. Claim extraction isolates the main assertions for scrutiny. Evidence retrieval then gathers data supporting or refuting the claims. Stance detection assesses the evidence’s position relative to the claim, followed by claim validation against established facts. Cross-referencing with trusted sources further corroborates findings, while logical inconsistency identification spots contradictions. Data and statistical verification quantitatively analyze plausibility. Finally, machine learning-based truth prediction algorithms synthesize these steps, predicting the likelihood of the information being factual.
Detect Text Pattern. The Detect Text Pattern step in Bangla fake news analysis is designed to identify distinctive linguistic patterns that are commonly associated with misinformation. This method involves computational analysis of the text structure, seeking out anomalies or irregularities in the use of language. It encompasses the identification of stylistic features, narrative frameworks, and discourse constructs that diverge from standard journalistic practices. By leveraging natural language processing tailored to the Bangla language, this step is pivotal in automating the recognition of fabricated content, streamlining the fact-checking process in the expansive and diverse Bangla-speaking media landscape.
Model Tuning. In addressing Bangla fake news, a suite of traditional machine learning models is deployed to discern patterns indicative of misinformation. These models include K-Nearest Neighbors (KNN) [56], RF [57], AdaBoost [58], Support Vector Machines [59], Stochastic Gradient Descent [60], XGBoost [61], Decision Trees [62], Bagging [63], Gradient Boosting (GB) [64], and pipelines integrating with . Each algorithm brings unique strengths in handling the complexities of Bangla text data, from probabilistic output to ensemble methods like RF and GB that capitalize on collective decision-making to improve prediction accuracy.
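The hyperparameter search described in this section amounts to trying each combination of settings and keeping the one with the best score. The following is a minimal sketch of an exhaustive grid search; the hyperparameter names (`learning_rate`, `batch_size`), their candidate values, and the `evaluate` function are hypothetical stand-ins, not the settings of Table 5. In a real run, `evaluate` would train a model with the given settings and return its score on held-out data.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def evaluate(learning_rate, batch_size):
    """Toy scoring function standing in for 'train a model and score it'."""
    return 1.0 - abs(learning_rate - 1e-3) * 100 - abs(batch_size - 32) / 1000


grid = {"learning_rate": [1e-4, 1e-3, 1e-2], "batch_size": [16, 32, 64]}
best_params, best_score = grid_search(grid, evaluate)
print(best_params, best_score)
```

The cost grows with the product of the candidate list lengths, so large grids are often replaced by random or staged searches.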
TABLE 5. The set of Hyperparameter settings used across different models for optimization.
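As a rough illustration of the kind of search that produces settings like those in Table 5, the sketch below enumerates a small grid and keeps the best-scoring combination on a held-out split. The toy KNN classifier, the grid values, and the 2-D feature vectors are all assumptions made for illustration; they are not the models, features, or values used in the paper.

```python
from itertools import product

def knn_predict(train, query, k, p):
    """Majority vote of the k nearest training points under Minkowski-p distance."""
    dists = sorted(
        (sum(abs(a - b) ** p for a, b in zip(x, query)) ** (1.0 / p), y)
        for x, y in train
    )
    votes = [y for _, y in dists[:k]]
    return int(sum(votes) * 2 > len(votes))

def accuracy(train, held_out, k, p):
    """Fraction of held-out samples classified correctly."""
    hits = sum(knn_predict(train, x, k, p) == y for x, y in held_out)
    return hits / len(held_out)

def grid_search(train, held_out, grid):
    """Try every combination in `grid`; return (best_params, best_score)."""
    best_params, best_score = None, -1.0
    for k, p in product(grid["k"], grid["p"]):
        score = accuracy(train, held_out, k, p)
        if score > best_score:  # strict '>' keeps the first best combination
            best_params, best_score = {"k": k, "p": p}, score
    return best_params, best_score

# Hypothetical 2-D features standing in for text features; 1 = fake, 0 = authentic.
train = [((0.0, 0.1), 0), ((0.2, 0.0), 0), ((0.1, 0.3), 0),
         ((1.0, 0.9), 1), ((0.8, 1.0), 1), ((0.9, 0.7), 1)]
held_out = [((0.1, 0.1), 0), ((0.9, 0.9), 1), ((0.3, 0.2), 0), ((0.7, 0.8), 1)]
grid = {"k": [1, 3, 5], "p": [1, 2]}  # assumed search space

best_params, best_score = grid_search(train, held_out, grid)
print(best_params, best_score)
```

In practice the selection split is usually kept separate from the final test split, so that the reported test performance is not influenced by the search itself.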
TABLE 6. Performance comparison of traditional Machine Learning algorithms based on various performance metrics such as Accuracy (ACC), Precision (PRE), Recall (REC), Matthews Correlation Coefficient (MCC), Hinge Loss (HL), Sensitivity (SEN), Specificity (SPE), Positive Predictive Value (PPV), Negative Predictive Value (NPV) without employing any classification thresholds.
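Most of the metrics named in the caption of Table 6 can be derived from the four confusion-matrix counts. The sketch below shows one way to compute them for a binary task; it assumes that label 1 (fake) is the positive class, and the example labels are invented for illustration. Note that sensitivity coincides with recall and positive predictive value coincides with precision. Hinge loss is omitted because it requires raw decision scores rather than hard labels.

```python
import math

def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) with label 1 treated as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def safe_div(num, den):
    """Avoid ZeroDivisionError when a class is never predicted or never present."""
    return num / den if den else 0.0

def binary_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": safe_div(tp + tn, tp + tn + fp + fn),
        "PRE": safe_div(tp, tp + fp),   # identical to PPV
        "REC": safe_div(tp, tp + fn),   # identical to SEN
        "SPE": safe_div(tn, tn + fp),
        "NPV": safe_div(tn, tn + fn),
        "MCC": safe_div(tp * tn - fp * fn, mcc_den),
    }

# Invented labels: 4 fake (1) and 6 authentic (0) articles.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```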
conditions, respectively, $y_i$ is the true label, $w_{adv}$ and $w_{cln}$ are weights for adversarial and clean condition performances, $r_i$ is the resilience factor for each sample, $I$ is the indicator function, $\lambda$ is a regularization parameter, and MCR is the Misclassification Rate.
• Probability of Attack Success: The Probability of Attack Success (Succ) applies to Bangla fake news detection, where the system categorizes news into two classes (’Fake News’ and ’Authentic News’), in a scenario where the system is under adversarial attack. The aim is to measure how often such attacks successfully deceive the system. This metric is particularly relevant in the context of robust machine learning, where models must resist adversarial examples designed to cause misclassification. Succ is defined as follows:

$$\mathrm{Succ} = \frac{1}{N}\sum_{i=1}^{N}\Big[\, s_i \cdot I\big[\hat{y}_i^{adv} \neq y_i\big] \cdot p_i^{atk} + (1 - s_i) \cdot I\big[\hat{y}_i^{cln} \neq y_i\big] \cdot \big(1 - p_i^{atk}\big) \Big] \tag{25}$$

where $N$ is the total number of samples, $s_i$ is the sensitivity factor for each sample, $\hat{y}_i^{adv}$ and $\hat{y}_i^{cln}$ are the predicted labels under adversarial and clean conditions, respectively, $y_i$ is the true label, $p_i^{atk}$ represents the probability of an attack on each sample, and $I[\cdot]$ is the indicator function.

V. EXPERIMENTAL RESULTS
The following sections discuss our experimental results and compare different traditional machine learning and deep learning models with our proposed RoBERTa-GCN.

A. RESULTS OF MACHINE LEARNING MODELS
The performance evaluation of various machine learning models for fake news classification, as shown in Tables 6 and 7, reveals distinct strengths across different metrics. In the analysis without classification thresholds, as presented in Table 6, K-Nearest Neighbors emerged as the top performer with the highest accuracy of 0.954, MCC of 0.632, and specificity of 0.871. The Decision Tree excelled in recall at 0.982 and precision at 0.958, highlighting its capability to identify true positives, while Random Forest (RF) led in negative predictive value at 0.926.
Our extensive evaluations of the same models for different classification thresholds are presented in Table 7. At the 0.05 threshold, Random Forest showed superior results with the highest accuracy of 0.992, MCC of 0.908, and specificity of 0.965. XGBClassifier led in recall at 0.998 and exhibited the lowest hinge loss of 0.018. At the 0.1 threshold, K-Nearest Neighbors stood out with the highest accuracy of 0.966, recall of 0.996, and sensitivity of 0.996, making it the most reliable for detecting true positives. The Decision Tree maintained its strength in MCC at 0.915 and specificity at 0.942, indicating robust classification capabilities. Support Vector Machine models also performed well, particularly in precision and positive predictive value. Overall, K-Nearest Neighbors, Random Forest, and Decision Tree models consistently ranked among the top performers, each with specific strengths in accuracy, precision, or recall, making them suitable for different aspects of fake news classification depending on the metric requirements.

B. RESULTS OF DEEP LEARNING MODELS
A comparative analysis of the performance metrics for various deep learning models, including BERT + BiLSTM, BERT + BiGRU, XLM-RoBERTa, LSTM, CNN, CNN-LSTM, and the proposed RoBERTa-GCN model, is presented in Table 8. Among the models, the proposed RoBERTa-GCN achieves the highest accuracy, with a score of 0.986, demonstrating its superior ability to correctly classify Bangla news articles as authentic or fake. It also outperforms the other models in precision (0.972), Matthews correlation coefficient (0.957), sensitivity (0.986), and negative predictive value (0.978). These results indicate its robustness and reliability in detecting fake news. Additionally, RoBERTa-GCN exhibits the lowest Hamming loss at 0.017 and a specificity of 0.985, further confirming its effectiveness in minimizing classification errors.
Compared to baseline models like BERT + BiLSTM and CNN-LSTM, which achieved accuracies of 0.944 and 0.913, respectively, RoBERTa-GCN’s performance is significantly higher. This result highlights the advantage of integrating
TABLE 7. Comparison of various machine learning models based on different performance metrics for the classification thresholds of 0.05 and 0.1.
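To make the role of a classification threshold concrete, the following sketch converts predicted probabilities into labels at the two cutoffs named in Table 7 and reports how recall and specificity shift. It assumes that the threshold is applied to the predicted probability of the positive (fake) class, which the text does not state explicitly; the probabilities and labels are invented for illustration.

```python
def classify(probs, threshold):
    """Label a sample positive (1) when its probability reaches the threshold."""
    return [int(p >= threshold) for p in probs]

def recall_and_specificity(y_true, y_pred):
    """Return (recall, specificity) with label 1 as the positive class."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), tn / (tn + fp)

# Invented model outputs: P(fake) for eight articles and their true labels.
probs = [0.02, 0.07, 0.30, 0.90, 0.04, 0.08, 0.60, 0.01]
y_true = [0, 0, 1, 1, 0, 1, 1, 0]

results = {}
for threshold in (0.05, 0.1):
    y_pred = classify(probs, threshold)
    results[threshold] = recall_and_specificity(y_true, y_pred)
    recall, specificity = results[threshold]
    print(f"threshold={threshold}: recall={recall:.2f}, specificity={specificity:.2f}")
```

On this toy data the lower cutoff catches every positive at the cost of one false alarm, while the higher cutoff removes the false alarm but misses one positive, which is the usual trade-off when a threshold is moved.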
FIGURE 8. Comparison of proposed RoBERTa-GCN with other models based on different performance metrics.
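The Probability of Attack Success in Eq. (25) can be evaluated directly once the per-sample quantities are available. The sketch below is one possible reading of that equation: each sample contributes the attack-weighted adversarial error plus the complementary clean error. The function name and all input values are assumptions for illustration, not results from the paper.

```python
def attack_success(y_true, y_adv, y_cln, s, p_atk):
    """Evaluate Eq. (25): mean of sensitivity- and attack-weighted misclassifications.

    y_true: true labels y_i
    y_adv:  predictions under adversarial conditions
    y_cln:  predictions under clean conditions
    s:      per-sample sensitivity factors s_i
    p_atk:  per-sample attack probabilities p_i^atk
    """
    n = len(y_true)
    if not (len(y_adv) == len(y_cln) == len(s) == len(p_atk) == n) or n == 0:
        raise ValueError("all inputs must be non-empty and of equal length")
    total = 0.0
    for y, ya, yc, si, pi in zip(y_true, y_adv, y_cln, s, p_atk):
        adv_term = si * (ya != y) * pi               # s_i * I[adv wrong] * p_i
        cln_term = (1 - si) * (yc != y) * (1 - pi)   # (1 - s_i) * I[clean wrong] * (1 - p_i)
        total += adv_term + cln_term
    return total / n

# Invented example with four samples.
y_true = [1, 0, 1, 0]
y_adv = [0, 0, 0, 1]   # three adversarial predictions are wrong
y_cln = [1, 0, 1, 1]   # one clean prediction is wrong
s = [0.5, 0.5, 0.5, 0.5]
p_atk = [0.8, 0.8, 0.5, 0.5]

succ = attack_success(y_true, y_adv, y_cln, s, p_atk)
print(f"Succ = {succ:.4f}")
```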
D. EXPLAINABILITY OF ROBERTA-GCN
Figure 10 depicts the output of a Local Interpretable Model-agnostic Explanations (LIME) analysis applied to our proposed RoBERTa-GCN. It helps to explain the predictions of our RoBERTa-GCN model by highlighting the features, in this case words, that contribute most to the model’s prediction. Based on Figure 10, we can draw the following insights:
1) The first text has a prediction probability of 0.92 for being authentic and 0.08 for being fake. Certain words are highlighted, which likely influenced the model’s decision to classify it as authentic.
2) The second text has the same prediction probabilities as the first one, with different words highlighted. This suggests that those words have a significant impact on the model’s prediction of authenticity.
3) The third text is predicted with absolute certainty (probability 1.00) to be fake, and no probability is assigned to it being authentic. Highlighted words in the text are the features that the model found most indicative of it being fake.
The right side of the image, where the texts with highlighted words are shown, illustrates how LIME provides local interpretability. For each prediction, it points out the specific words that have the highest weight in the model’s decision-making process. This is crucial for understanding why the
model makes certain decisions and can be used to improve
the model by, for example, adjusting feature weightings or
by providing more training data to reduce biases.
Moreover, the LIME analysis reveals that our proposed
RoBERTa-GCN model does not rely solely on superficial
cues or common phrases but rather on a nuanced under-
standing of the text’s context, enabled by the integration of
RoBERTa’s language understanding and GCN’s relational
reasoning. This interpretability is crucial for validating the
model’s reliability, especially in sensitive applications like
fake news detection, where understanding the rationale be-
hind a classification is as important as the classification itself.
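The local, word-level attribution discussed above can be illustrated with a much simpler relative of LIME. LIME itself fits a weighted linear surrogate over many random perturbations of the input; the sketch below instead removes one word at a time (occlusion) and records how the predicted probability of the fake class changes. The keyword weights, the `toy_fake_probability` scorer, and the English example sentence are hypothetical stand-ins for the trained RoBERTa-GCN classifier and a Bangla input.

```python
import math

# Hypothetical word weights; positive values push the score towards "fake".
TOY_WEIGHTS = {"shocking": 2.0, "miracle": 1.5, "official": -1.0, "report": -0.5}

def toy_fake_probability(words):
    """Logistic score over known words; a stand-in for the real model's P(fake)."""
    z = sum(TOY_WEIGHTS.get(w, 0.0) for w in words)
    return 1.0 / (1.0 + math.exp(-z))

def occlusion_attribution(words, predict):
    """Return (word, drop in P(fake) when the word is removed), largest effect first."""
    base = predict(words)
    effects = []
    for i, word in enumerate(words):
        reduced = words[:i] + words[i + 1:]
        effects.append((word, base - predict(reduced)))
    return sorted(effects, key=lambda e: abs(e[1]), reverse=True)

text = "shocking miracle cure confirmed in official report".split()
ranked = occlusion_attribution(text, toy_fake_probability)
for word, effect in ranked:
    # Positive effect: the word supports "fake"; negative: it supports "authentic".
    print(f"{word:>10s} {effect:+.3f}")
```

A positive value plays the role of a word highlighted in support of the fake class, and a negative value that of a word supporting authenticity, analogous to the two highlight colors in a LIME text explanation.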
FIGURE 9. The obtained ROC curve of our proposed RoBERTa-GCN model.
FIGURE 10. LIME analysis for Bangla fake news detection, highlighting words influencing model predictions. Blue indicates words supporting authenticity; orange indicates words supporting fake news.

VI. CONCLUSION
This paper represents a pivotal stride in combating the escalating problem of misinformation in the digital age, particularly within the context of the Bengali language. As the digital ecosystem becomes increasingly saturated with deceptive content, the need for sophisticated, language-specific solutions has never been more pressing. Addressing this need, the research introduces the RoBERTa-GCN model, a fusion of RoBERTa’s advanced natural language processing capabilities and GCN’s adeptness at managing relational data. This model is meticulously tailored to grasp the intricacies of the Bangla language, encompassing its rich linguistic nuances and cultural contexts. Trained on a comprehensive dataset collected from diverse Bangladeshi news sources, the RoBERTa-GCN model is rigorously optimized to distinguish authentic news from fabricated narratives effectively. Its performance, benchmarked against several baseline models, demonstrates a marked improvement, affirming its potential as a critical tool in the arsenal against misinformation.
The significance of the RoBERTa-GCN model transcends its technical achievements; it addresses a crucial gap in the field of fake news detection for languages that have traditionally been underrepresented in global NLP research. By leveraging state-of-the-art NLP techniques and adapting to the unique challenges posed by the Bangla language, the model not only sets a new standard for fake news detection but also underscores the importance of incorporating linguistic and cultural idiosyncrasies into AI-driven solutions. The model’s evaluation, employing metrics like accuracy, precision, recall, and F1-score, attests to its robustness and reliability in real-world settings. Furthermore, the use of advanced coherence measures like UCI, PMI, and NPMI ensures that the model’s performance is both consistent and dependable, making it a formidable tool against the spread of misinformation. The study’s all-around approach, which includes a new model design and a deep understanding of regional linguistic traits, opens the door for more research in this area and shows how important customized AI solutions are for keeping information safe in a world that is becoming more and more connected.

A. LIMITATIONS AND FUTURE SCOPES
Although RoBERTa-GCN is good at identifying Bangla fake news, it has certain shortcomings that might make it less applicable to other languages and training datasets. A potential future direction for improving the model’s usefulness is to investigate ways to include other South Asian languages. In addition, there is potential for modifying RoBERTa-GCN to identify more nuanced types of disinformation, such as biased journalism or satire. To further explore the model’s efficacy across other social media platforms and include real-time data analysis, more studies might be conducted to expand its practical applicability.

ACKNOWLEDGEMENT
We would like to acknowledge Dr. Sami Azam and Dr. Asif Karim from the Faculty of Science and Technology, Charles Darwin University, Casuarina, NT 0909, Australia, for their assistance in securing funding and their thorough review of this paper.

REFERENCES
[1] M. H. Goldani, R. Safabakhsh, and S. Momtazi, “Convolutional neural network with margin loss for fake news detection,” Information Processing & Management, vol. 58, no. 1, p. 102418, 2021.
[2] M. Narra, M. Umer, S. Sadiq, H. Karamti, A. Mohamed, I. Ashraf, et al., “Selective feature sets based fake news detection for covid-19 to manage infodemic,” IEEE Access, vol. 10, pp. 98724–98736, 2022.
[3] M. Z. Hossain, M. A. Rahman, M. S. Islam, and S. Kar, “Banfakenews: A dataset for detecting fake news in bangla,” arXiv preprint arXiv:2004.08789, 2020.
[4] D. Wang, W. Zhang, W. Wu, and X. Guo, “Soft-label for multi-domain fake news detection,” IEEE Access, 2023.
[5] T. Jiang, J. P. Li, A. U. Haq, A. Saboor, and A. Ali, “A novel stacking approach for accurate detection of fake news,” IEEE Access, vol. 9, pp. 22626–22639, 2021.
[6] M. G. Hussain, M. R. Hasan, M. Rahman, J. Protim, and S. Al Hasan, “Detection of bangla fake news using mnb and svm classifier,” in 2020 International Conference on Computing, Electronics & Communications Engineering (iCCECE), pp. 81–85, IEEE, 2020.
[7] A. Roets et al., “‘fake news’: Incorrect, but hard to correct. the role of cognitive ability on the impact of false information on social impressions,” Intelligence, vol. 65, pp. 107–110, 2017.
[8] H. Saleh, A. Alharbi, and S. H. Alsamhi, “Opcnn-fake: Optimized convolutional neural network for fake news detection,” IEEE Access, vol. 9, pp. 129471–129489, 2021.
[9] E. Grave, P. Bojanowski, P. Gupta, A. Joulin, and T. Mikolov, “Learning word vectors for 157 languages,” arXiv preprint arXiv:1802.06893, 2018.
[10] S. Vosoughi, D. Roy, and S. Aral, “The spread of true and false news online,” science, vol. 359, no. 6380, pp. 1146–1151, 2018.
[11] S. Kemp, “Digital 2023: Bangladesh — datareportal – global digital insights.” https://datareportal.com/reports/digital-2023-bangladesh, Feb. 13, 2023. (Accessed on 12/21/2023).
[12] M. Hossain, “Nothing but fb. why bangladeshis never took to twitter, threads and the like | the business standard.” https://www.tbsnews.net/features/panorama/nothing-fb-why-bangladeshis-never-took-twitter-threads-and-667186, July 18, 2023. (Accessed on 11/15/2023).
[13] S. Report, “Mobs beat 2 dead for ‘kidnapping’ | daily star.” https://www.thedailystar.net/frontpage/news/mobs-beat-2-dead-kidnapping-1774471, July 21, 2019. (Accessed on 12/07/2023).
[14] I. Ahmed and J. A. Manik, “A hazy picture appears | the daily star.” https://www.thedailystar.net/news-detail-252212, Oct. 3, 2012. (Accessed on 12/02/2023).
[15] R. Rafe, “Misinformation mars bangladesh vaccination drive – dw – 01/27/2021.” https://www.dw.com/en/covid-bangladesh-vaccination-drive-marred-by-misinformation/a-56360529, Jan. 27, 2021. (Accessed on 01/05/2024).
[16] T. Report, “Busting the top 3 fake news of the week | the business standard.” https://www.tbsnews.net/thoughts/busting-top-3-fake-news-week-173236, Dec. 18, 2020. (Accessed on 01/10/2024).
[17] “Politifact.” https://www.politifact.com/. (Accessed on 01/25/2024).
[18] “Factcheck.org - a project of the annenberg public policy center.” https://www.factcheck.org/. (Accessed on 01/09/2024).
[19] “Verification - fact check bangladesh. fact-check bangladesh.” https://www.jachai.org/. (Accessed on 01/20/2024).
[20] Y. Long, Q. Lu, R. Xiang, M. Li, and C.-R. Huang, “Fake news detection through multi-perspective speaker profiles,” in Proceedings of the eighth international joint conference on natural language processing (volume 2: Short papers), pp. 252–256, 2017.
[21] G. Karadzhov, P. Nakov, L. Màrquez, A. Barrón-Cedeño, and I. Koychev, “Fully automated fact checking using external sources,” arXiv preprint arXiv:1710.00341, 2017.
[22] M. A. Haque Palash, A. Khan, K. Islam, M. A. Al Nasim, and R. M. Bin Shahjahan, “Incongruity detection between bangla news headline and body content through graph neural network,” in The Fourth Industrial Revolution and Beyond: Select Proceedings of IC4IR+, pp. 375–387, Springer, 2023.
[23] R. Kawser, “Bangla ranked at 7th among 100 most spoken languages worldwide.” https://www.dhakatribune.com/world/201648/bangla-ranked-at-7th-among-100-most-spoken, Feb. 17, 2020. (Accessed on 01/01/2024).
[24] J. A. Nasir, O. S. Khan, and I. Varlamis, “Fake news detection: A hybrid cnn-rnn based deep learning approach,” International Journal of Information Management Data Insights, vol. 1, no. 1, p. 100007, 2021.
[25] A. Anjum, M. Keya, A. K. M. Masum, and S. R. H. Noori, “Fake and authentic news detection using social data strivings,” in 2021 12th International Conference on Computing Communication and Networking Technologies (ICCCNT), pp. 1–5, IEEE, 2021.
[26] F. Islam, M. M. Alam, S. S. Hossain, A. Motaleb, S. Yeasmin, M. Hasan, and R. M. Rahman, “Bengali fake news detection,” in 2020 IEEE 10th International Conference on Intelligent Systems (IS), pp. 281–287, IEEE, 2020.
[27] M. A. Al Ghamdi, M. S. Bhatti, A. Saeed, Z. Gillani, and S. H. Almotiri, “A fusion of bert, machine learning and manual approach for fake news detection,” Multimedia Tools and Applications, pp. 1–18, 2023.
[28] E. Raja, B. Soni, and S. K. Borgohain, “Fake news detection in dravidian languages using transfer learning with adaptive finetuning,” Engineering Applications of Artificial Intelligence, vol. 126, p. 106877, 2023.
[29] P. B. Pranto, S. Z.-U.-H. Navid, P. Dey, G. Uddin, and A. Iqbal, “Are you misinformed? a study of covid-related fake news in bengali on facebook,” arXiv preprint arXiv:2203.11669, 2022.
[30] S. Rohman, J. Ferdous, S. M. R. Ullah, and M. A. Rahman, “Ibfnd: An improved dataset for bangla fake news detection and comparative analysis of performance of baseline models,” in 2023 International Conference on Next-Generation Computing, IoT and Machine Learning (NCIM), pp. 1–6, IEEE, 2023.
[31] M. Choudhary, S. S. Chouhan, E. S. Pilli, and S. K. Vipparthi, “Berconvonet: A deep learning framework for fake news classification,” Applied Soft Computing, vol. 110, p. 107614, 2021.
[32] M. A. Ali, M. L. Matubber, V. Sharma, and B. Balamurugan, “An improved and efficient technique for detecting bengali fake news using machine learning algorithms,” in 2022 6th International Conference On Computing, Communication, Control And Automation (ICCUBEA), pp. 1–4, IEEE, 2022.
[33] J. Alghamdi, Y. Lin, and S. Luo, “Towards covid-19 fake news detection using transformer-based models,” Knowledge-Based Systems, vol. 274, p. 110642, 2023.
[34] D. K. Dixit, A. Bhagat, and D. Dangi, “Automating fake news detection using ppca and levy flight-based lstm,” Soft Computing, vol. 26, no. 22, pp. 12545–12557, 2022.
[35] N. Rai, D. Kumar, N. Kaushik, C. Raj, and A. Ali, “Fake news classification using transformer based enhanced lstm and bert,” International Journal of Cognitive Computing in Engineering, vol. 3, pp. 98–105, 2022.
[36] M. Sudhakar and K. Kaliyamurthie, “Effective prediction of fake news using two machine learning algorithms,” Measurement: Sensors, vol. 24, p. 100495, 2022.
[37] A. Choudhary and A. Arora, “Linguistic feature based learning model for fake news detection and classification,” Expert Systems with Applications, vol. 169, p. 114171, 2021.
[38] V. Khullar and H. P. Singh, “f-fnc: Privacy concerned efficient federated approach for fake news classification,” Information Sciences, vol. 639, p. 119017, 2023.
[39] H. Xia, Y. Wang, J. Z. Zhang, L. J. Zheng, M. M. Kamal, and V. Arya, “Covid-19 fake news detection: A hybrid cnn-bilstm-am model,” Technological Forecasting and Social Change, vol. 195, p. 122746, 2023.
[40] M. Hosseini, A. J. Sabet, S. He, and D. Aguiar, “Interpretable fake news detection with topic and deep variational models,” Online Social Networks and Media, vol. 36, p. 100249, 2023.
[41] B. Palani and S. Elango, “Bbc-fnd: An ensemble of deep learning framework for textual fake news detection,” Computers and Electrical Engineering, vol. 110, p. 108866, 2023.
[42] O. B. Okunoye and A. E. Ibor, “Hybrid fake news detection technique with genetic search and deep learning,” Computers and Electrical Engineering, vol. 103, p. 108344, 2022.
[43] P. Malhotra and S. K. Malik, “Fake news detection using ensemble techniques,” Multimedia Tools and Applications, 2023.
[44] R. Mohawesh, S. Maqsood, and Q. Althebyan, “Multilingual deep learning framework for fake news detection using capsule neural network,” Journal of Intelligent Information Systems, pp. 1–17, 2023.
[45] V. Jain, R. K. Kaliyar, A. Goswami, P. Narang, and Y. Sharma, “Aenet: an attention-enabled neural architecture for fake news detection using contextual features,” Neural Computing and Applications, vol. 34, no. 1, pp. 771–782, 2022.
[46] B. Palani, S. Elango, and V. Viswanathan K, “Cb-fake: A multimodal deep learning framework for automatic fake news detection using capsule neural network and bert,” Multimedia Tools and Applications, vol. 81, no. 4, pp. 5587–5620, 2022.
[47] R. Katarya, D. Dahiya, S. Checker, et al., “Fake news detection system using featured-based optimized msvm classification,” IEEE Access, vol. 10, pp. 113184–113199, 2022.
[48] K. M. Hasib, N. A. Towhid, K. O. Faruk, J. Al Mahmud, and M. Mridha, “Strategies for enhancing the performance of news article classification in bangla: Handling imbalance and interpretation,” Engineering Applications of Artificial Intelligence, vol. 125, p. 106688, 2023.
[49] A. Bhattacharjee, T. Hasan, W. U. Ahmad, K. Samin, M. S. Islam, A. Iqbal, M. S. Rahman, and R. Shahriyar, “Banglabert: Language model pretraining and benchmarks for low-resource language understanding evaluation in bangla,” arXiv preprint arXiv:2101.00204, 2021.
[50] W. A. Qader, M. M. Ameen, and B. I. Ahmed, “An overview of bag of words; importance, implementation, applications, and challenges,” in 2019 international engineering conference (IEC), pp. 200–204, IEEE, 2019.
[51] J. Ramos et al., “Using tf-idf to determine word relevance in document queries,” in Proceedings of the first instructional conference on machine learning, vol. 242, pp. 29–48, Citeseer, 2003.
[52] Z. Yin and Y. Shen, “On the dimensionality of word embedding,” Advances in neural information processing systems, vol. 31, 2018.
[53] G. Sidorov, F. Velasquez, E. Stamatatos, A. Gelbukh, and L. Chanona-Hernández, “Syntactic n-grams as machine learning features for natural language processing,” Expert Systems with Applications, vol. 41, no. 3, pp. 853–860, 2014.
[54] Y. Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov, “Roberta: A robustly optimized bert pretraining approach,” arXiv preprint arXiv:1907.11692, 2019.
[55] T. N. Kipf and M. Welling, “Semi-supervised classification with graph convolutional networks,” arXiv preprint arXiv:1609.02907, 2016.
[56] L. E. Peterson, “K-nearest neighbor,” Scholarpedia, vol. 4, no. 2, p. 1883.
[57] L. Breiman, “Random forests,” Machine learning, vol. 45, pp. 5–32, 2001.
[58] Y. Freund and R. E. Schapire, “A decision-theoretic generalization of on-line learning and an application to boosting,” Journal of computer and system sciences, vol. 55, no. 1, pp. 119–139, 1997.
[59] C. Cortes and V. Vapnik, “Support-vector networks,” Machine learning, vol. 20, pp. 273–297, 1995.
[60] L. Bottou, “Stochastic gradient descent tricks,” in Neural Networks: Tricks of the Trade: Second Edition, pp. 421–436, Springer, 2012.
[61] T. Chen and C. Guestrin, “Xgboost: A scalable tree boosting system,” in Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pp. 785–794, 2016.
[62] W.-Y. Loh, “Classification and regression trees,” Wiley interdisciplinary reviews: data mining and knowledge discovery, vol. 1, no. 1, pp. 14–23, 2011.
[63] L. Breiman, “Bagging predictors,” Machine learning, vol. 24, pp. 123–140, 1996.
[64] J. H. Friedman, “Greedy function approximation: a gradient boosting machine,” Annals of statistics, pp. 1189–1232, 2001.
[65] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov, “Unsupervised cross-lingual representation learning at scale,” arXiv preprint arXiv:1911.02116, 2019.
[66] M. Sundermeyer, R. Schlüter, and H. Ney, “Lstm neural networks for language modeling,” in Interspeech, vol. 2012, pp. 194–197, 2012.
[67] K. W. Church, “Word2vec,” Natural Language Engineering, vol. 23, no. 1, pp. 155–162, 2017.
[68] J. Pennington, R. Socher, and C. D. Manning, “Glove: Global vectors for word representation,” in Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp. 1532–1543, 2014.
[69] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov, “Fasttext.zip: Compressing text classification models,” arXiv preprint arXiv:1612.03651, 2016.
[70] R. K. Kaliyar, A. Goswami, P. Narang, and S. Sinha, “Fndnet–a deep convolutional neural network for fake news detection,” Cognitive Systems Research, vol. 61, pp. 32–44, 2020.

MEJBAH AHAMMAD received his BSc in computer science and engineering with a major in computer vision and his MSc in computer science with a major in artificial intelligence from American International University, Bangladesh. He is currently the CEO of Bytes of Intelligence, New York, USA, and works as a deep learning and AI-specialized consultant and instructor at aiQuest Intelligence, Dhaka, Bangladesh. He has authored over 11 research papers. He received the Best Research Award from the International Conference ICETET-SIP-22.

AL SANI received his B.Sc. degree in Industrial and Production Engineering from Jashore University of Science and Technology (JUST), Jashore–7408, Bangladesh. His current research interests include artificial intelligence (AI), machine learning (ML), and deep learning (DL), specifically image processing, computer vision, and natural language processing. Additionally, he is passionate about exploring the integration of industrial engineering principles with AI technologies to optimize manufacturing processes, enhance production efficiency, and innovate smart industrial systems. His academic and research career demonstrates a deep fascination with the intersection of engineering and cutting-edge technologies, emphasizing an interdisciplinary approach. AL SANI is eager to contribute to the advancement of intelligent industrial systems through groundbreaking research and innovative solutions.

KHALILUR RAHMAN received a B.Sc. degree in Computer Science and Engineering from Sylhet Engineering College, Sylhet-3100, Bangladesh. His research interests are in deep learning, machine learning, and artificial intelligence. He has worked on various projects in the field of applied machine learning. His focus lies specifically in Computational Bioinformatics, Computer Vision, and Image Processing.

MD TANVIR ISLAM is a graduate student pursuing an M.Sc. in Computer Science and Engineering at Sungkyunkwan University (SKKU), South Korea. Specializing in AI-driven image processing, his research has led to publications in top-tier journals and conferences such as "Alexandria Engineering Journal" and "ACM Multimedia". He has experience as a Research Assistant at "VIS2KNOW Lab", focusing on computer vision, deep learning, and machine learning. He is passionate about applying advanced computational techniques to solve real-world challenges and has been actively involved in academic and research communities.