
A Survey On Bias Detection in Online News Using Deep Learning

This document presents a comprehensive survey on bias detection in online news using deep learning techniques, focusing on methodologies, datasets, and future research directions. It discusses various algorithms like mBERT, LSTM, and CNN, highlighting their effectiveness in identifying biases in news articles and headlines. The paper emphasizes the importance of automated bias detection for improving journalism and providing balanced news consumption tools for the public.


Proceedings of the Second International Conference on Applied Artificial Intelligence and Computing (ICAAIC 2023)

IEEE Xplore Part Number: CFP23BC3-ART; ISBN: 978-1-6654-5630-2

A Survey on Bias Detection in Online News using Deep Learning

Khushi Rakhecha
Department of Computer Science and Engineering
Delhi Technological University, Delhi, India
[email protected]

Muskan Agrawal
Department of Computer Science and Engineering
Delhi Technological University, Delhi, India
[email protected]

Simran Rauniyar
Department of Computer Science and Engineering
Delhi Technological University, Delhi, India
[email protected]

Aruna Bhatt
Department of Computer Science and Engineering
Delhi Technological University, Delhi, India
[email protected]

2023 2nd International Conference on Applied Artificial Intelligence and Computing (ICAAIC) | 978-1-6654-5630-2/23/$31.00 ©2023 IEEE | DOI: 10.1109/ICAAIC56838.2023.10140917

Abstract— The detection of bias in online news has become a critical and sensitive area of research in recent years, largely due to the growing use of online platforms, such as social media, and the proliferation of news sources in digital format. This article provides a comprehensive review of existing studies on online bias detection using natural language processing, including an analysis of the methodologies employed, an overview of available datasets, and suggestions for further research in this field. The article examines techniques such as data pre-processing, feature extraction, classification, and prediction in detail. Various deep learning algorithms, such as BERT and Long Short-Term Memory (LSTM), as well as machine learning algorithms, such as logistic regression, Recursive Neural Network models, and Naive Bayes, can be used to detect bias in news headlines and articles. The article concludes by discussing the potential impact of bias detection on journalism and society, as well as future research directions.

Keywords—Bias Detection, Deep Learning, Natural Language Processing, Long Short-Term Memory, Machine Learning

I. INTRODUCTION

The core purpose of news is to inform the public about current affairs, issues, and events unfolding around the world. Social media platforms have attracted millions of users of all age groups from all over the world, and with news widely available on online websites, users are able to browse and access news more quickly and easily. These websites are expected to provide accurate, unbiased, fact-based news. Identifying and, where possible, reducing media bias in the news is important for society on several levels. Automated bias detection can help policy regulators and related entities keep an eye on the bias in various news sources. It can also aid in the creation of tools that help news consumers lessen any negative effects of media bias; today such tools can be built with deep learning and machine learning architectures such as mBERT, LSTM, and CNN. This article covers models that have been implemented to enhance the quality of news and detect bias in it. A neural network model, the Headline Attention Network, is used to increase the accuracy of bias identification in comparison to a baseline LSTM model. With the growth of news in various languages, detecting bias with existing models became extremely difficult, and therefore the mBERT model is applied. The paper also reviews additional models that are applied to improve accuracy. Automatic bias detection can help journalists enhance their writing through more unbiased reporting. It might also enable balanced search in news aggregator programs like Google News, similar to what is available on AllSides.

The survey comprises five sections. Section 1 introduces the paper. Section 2 gives a brief introduction to the technologies used in the research. Section 3 describes the datasets examined by researchers, gives an overview of recent work, covers various approaches to feature extraction, and discusses the algorithms implemented to achieve the best accuracy. Section 4 explores the outcomes of the models and provides a comparative analysis of the techniques. Section 5 concludes the study by briefly outlining future directions.

II. BACKGROUND

A. Deep Learning (DL)

Deep learning, a branch of machine learning, uses artificial neural networks to model and solve complex problems. It enables machines to learn from massive volumes of data and carry out operations that ordinarily demand human intellect, such as image recognition, interpretation of natural
language, and decision-making. Each layer in the neural network's topology extracts increasingly abstract information from the incoming data. The more layers a network has, the more intricate and complex the properties it can learn. Some very effective applications of deep learning can be found in computer vision, speech recognition, and natural language processing. It can therefore be quite useful in identifying and reducing bias; nevertheless, it requires careful consideration of a number of variables, such as the selection of fairness metrics, data preparation, and model architecture. News bias detection commonly involves analyzing news articles, headlines, and other media content to find potential biases.

B. Natural Language Processing (NLP)

NLP is a field that combines linguistics and artificial intelligence to enable communication and interaction between computers and human beings. It entails the creation of methods and processes that allow machines to comprehend, interpret, and produce natural language. This encompasses tasks such as text summarization, sentiment analysis, named entity recognition, speech recognition, and language translation. To evaluate and process natural language data, NLP algorithms often combine statistical models, machine learning strategies, and linguistic rules. NLP can help detect and quantify bias in natural language data. Analytical methods, including sentiment analysis, can be applied to evaluate an article's tone and subjective bias. Additionally, these techniques can facilitate the detection of potential sources of bias in machine learning models or datasets and aid in the development of strategies to reduce or eliminate such biases.

Fig. 1 Natural Language Processing [5]

C. Long Short-Term Memory (LSTM)

The LSTM architecture was developed to overcome the vanishing gradient problem in standard RNNs, which occurs when the gradients used to update the network's parameters during training become so small that they effectively vanish. This makes it difficult for the network to learn long-term dependencies. LSTM can be used to analyze the language of news articles and identify any potentially biased content. To do this, the LSTM network can be trained on a sizable corpus of news items labeled for bias. The network can then be used to analyze new articles and identify language patterns that may be associated with biased reporting. LSTM can be a useful tool for bias detection in news because it can identify subtle patterns in language use that may be difficult for humans to detect. Using LSTM, it is possible to analyze large quantities of news articles and identify potentially biased content, which can help to promote fairer and more balanced reporting.

Fig. 2 Long Short-Term Memory [6]

D. Multilingual BERT (mBERT)

mBERT is a variant of the BERT model that has been pre-trained on a large corpus of text data from multiple languages. It is designed to handle the complexities of different languages and has shown proficiency in several natural language processing tasks across various languages. It is pre-trained on text data from over 100 languages, which allows it to learn the commonalities and differences between languages and develop a deep understanding of multilingual text. Because it is pre-trained as a language model on multilingual text, it can perform tasks such as language classification. One of the advantages of mBERT is its ability to perform cross-lingual transfer learning. This means that the model can be trained on a particular task or dataset in one language and then applied to a similar task or dataset in another language without the need for additional training or fine-tuning. Overall, mBERT is a powerful tool for natural language processing across different languages, with numerous applications in fields including but not limited to machine translation, sentiment analysis, and text classification.

Fig. 3 mBERT [13]

E. Convolutional Neural Network (CNN)
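As a rough illustration of the layer sequence covered in this subsection (convolution, max-pooling, fully connected layer), the sketch below runs a tiny forward pass in plain Python. It is not taken from any of the surveyed works: the function names, the 5x5 input, the 2x2 kernel, and the dense-layer weights are all hypothetical values chosen for readability, and a ReLU activation is assumed between convolution and pooling.

```python
from typing import List

Matrix = List[List[float]]


def conv2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Valid 2D cross-correlation with stride 1 (the usual CNN 'convolution')."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(fmap: Matrix) -> Matrix:
    """Element-wise ReLU: negative activations become zero."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool2d(fmap: Matrix, size: int) -> Matrix:
    """Keep the maximum of each non-overlapping size x size window."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]


def dense(inputs: List[float], weights: List[float], bias: float) -> float:
    """A single fully connected unit: weighted sum plus bias."""
    return sum(x * w for x, w in zip(inputs, weights)) + bias


# Hypothetical 5x5 "image" with a bright diagonal.
image = [[1.0 if r == c else 0.0 for c in range(5)] for r in range(5)]
# Hypothetical 2x2 kernel that responds to diagonal structure.
kernel = [[1.0, 0.0], [0.0, 1.0]]

feature_map = relu(conv2d(image, kernel))      # 4x4 feature map
pooled = max_pool2d(feature_map, 2)            # 2x2 after pooling
flat = [v for row in pooled for v in row]      # flatten for the dense layer
score = dense(flat, [0.25, 0.25, 0.25, 0.25], 0.0)

print(pooled)  # [[2.0, 0.0], [0.0, 2.0]]
print(score)   # 1.0
```

Pooling halves each spatial dimension of the 4x4 feature map, which is the dimensionality reduction the pooling layer is used for. A practical model would learn the kernel and dense weights during training rather than fixing them by hand.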

CNNs are a type of deep neural network commonly employed to analyze data with a grid-like structure, such as images. A CNN typically consists of an input layer, a convolution layer, a max-pooling layer, and a fully connected layer (also known as a dense layer), as shown in Fig. 4. An image is fed into the input layer. The output of the convolution layer depends on the kernel (filter) values and is fed into the subsequent layers. The pooling layer is used to reduce dimensionality and thereby increase processing speed.

Fig. 4 Convolutional Neural Network [18]

III. EXISTING METHODOLOGY

This paper reviews some of the latest work on detecting online bias using various machine learning and deep learning approaches (such as BERT, LSTM, XLM-RoBERTa, SVM, Logistic Regression, and Random Forest), and compares some readily accessible datasets.

A. Detailed study of some readily accessible datasets

Large amounts of meaningful data can improve the analysis and help prevent issues such as overfitting and erroneous predictions. However, collecting accurate data and labeling it properly can be difficult. As a result, numerous researchers have created standardized datasets. This section provides a comprehensive analysis of several of these publicly available datasets.

● India News Headlines Dataset [7]: This open-source dataset of Indian news headlines comprises over 3.6M events published by the Times of India. The dataset can be categorized based on one or more features. It contains information on notable occurrences in the Indian subcontinent during the previous 21 years (from January 2001 through the third quarter of 2022).
● News Articles Dataset from Indian Express [8]: This dataset contains 20,000 news headlines, each with a description of the corresponding article, published by the Indian Express between August 11, 2019 and June 8, 2020. It consists of 9 columns, including article id, headline, description, date and time, article URL, full article text, article type, and a short/mid/long value indicating the length of the article.
● News Category Dataset [9]: This open-source dataset contains around 2 lakh (roughly 200,000) news headlines from 2012 to 2022. Each record consists of these features: the category the article belongs to, headline, authors, link, short description, and date. The dataset has a total of 42 categories, such as politics, wellness, entertainment, travel, and style and beauty.
● Telugu News Articles [10]: This dataset contains about 20k news articles collected from the Eenadu news website. The dataset has been cleaned and consists of a training set and a testing set that can be used to assess Telugu binary classification models. The iNLTK library has made use of this dataset.
● Nepal News Dataset [11]: This Kaggle dataset contains 510 lines of Nepali text. The headlines, preceded by links in most cases, were scraped from 24 Nepali news websites during the election period, since headlines give the gist of the election talk. They were harvested to give a quick view of the kinds of election talk the online news media were carrying as the campaign picked up steam, less than a month ahead of Nepal's first federal and provincial elections.
● Hindi Text Short Summarization Corpus [12]: This open-source dataset is a compilation of articles and their corresponding headlines obtained from news websites. As the first Hindi dataset of its kind, it is suitable for evaluating text summarization models. It contains a total of 330k articles, each with a one-line headline and a summary of the article in Hindi.

TABLE I
SUMMARY OF THE DATASET

| Dataset | Class | Description |
|---|---|---|
| [7] | News Headlines | Headlines: 3.6 million |
| [8] | News Headlines and articles | Headlines: 20,000 with 9 categories |
| [9] | News Headlines | Headlines: 210k with 42 categories |
| [10] | News Articles | Articles: 20,000 in Telugu |
| [11] | News Headlines | Headlines: 510 in Nepali |
| [12] | News Headlines and articles | Headlines: 330k in Hindi |

B. Review of the recent work

One study [2] proposed a neural network architecture, the Headline Attention Network (Fig. 6), which aims to capture the key elements of news articles that lead to political bias by paying attention to the headlines. It contains a headline encoder, an article encoder, and a headline attention layer. The model uses an attention mechanism to extract politically biased terms and form a vector representation v. Each word of an article's headline is embedded into a vector through an embedding matrix, and a bi-directional LSTM (a forward LSTM and a backward LSTM) produces a contextual encoding of the headline in both directions. The headline is encoded by concatenating the forward and backward representations.

Similarly, the article encoder uses a bi-directional LSTM to obtain an annotation for each word by summarizing information from both directions and concatenating the forward and backward representations. The model is evaluated on a dataset created by the authors comprising 1329 news articles from Telugu newspapers. The dataset includes each article's title, body text, and the political party it is biased towards. Each item in the dataset was annotated by four annotators with one of five parties (BJP, TDP, Congress, TRS, or YCP), or with "None" if the news was objective.

Fig. 5 News article from the dataset. Bias towards "TDP" [2]

It was observed that using only the headlines predicts bias with accuracy similar to using the entire article. Simply concatenating the headline with the article does not significantly aid bias prediction; instead, reading the article with attention guided by the headline representation significantly improves accuracy. The Headline Attention Network beats all other models because it is effective at identifying key terms that introduce bias into a text. It outperforms the previous best baseline, LSTM, by 4.22%, with an accuracy of 85.25% for the Headline Attention Network without the attention layer and 89.54% for the full Headline Attention Network.

Fig. 6 Headline Attention Network [2]

Another study [3] used two deep learning architectures: (i) Long Short-Term Memory networks (LSTMs) and (ii) Bidirectional Encoder Representations from Transformers (BERT). The evaluation analyzes the effects of de-biasing and of integrating a media-level representation. Two de-biasing approaches are considered. The first incorporates an adversarial domain classifier on the labeled articles by adding a media classifier; the losses of both the media classifier and the label predictor are then minimized over the complete dataset. The second is a triplet loss training approach in which the encoder is pre-trained with a triplet loss, where each triplet consists of an anchor example, a positive example, and a negative example, as shown in Fig. 8. After pre-training, the parameters of the encoder and the softmax classifier are fine-tuned on the primary task of predicting the political ideology of articles by minimizing the cross-entropy loss. Then, when incorporating the media-level representation, the hyperparameters of both models were tuned with a guided grid search on the validation set while keeping the seeds for random weight initialization fixed. For the LSTM model, the best result was achieved with an input of 512 tokens. For the BERT model, the input length was varied along with the learning rate and gradient clipping value; a 512-token input again gave the best result.

The model was trained on 4 Titan X Pascal GPUs using a dataset created by the authors, consisting of 34,737 articles published by 73 news media outlets and covering over 106 topics. The authors used AllSides to create the dataset because it presents news stories from all political perspectives on each trending event or subject. All the articles were downloaded along with their political ideologies (left, center, or right), designated topics, the media in which they appeared, authors, and publication dates. Articles that
were carefully selected and labeled, and are thus indicative of the actual political landscape, can be found in the dataset.

Fig. 7 Statistics about the dataset [3]

When either of these methods is applied, the results reported in [3] show considerable improvements over the baseline without de-biasing: TLP increased accuracy by 14.12 points and macro-F1 by 12.73 points. Using triplet loss to mitigate source bias together with the media-level representation from Twitter followers resulted in absolute macro-F1 improvements of 30.51 and 28.76 over the baseline (Twitter bios + Article with TLP vs. Article alone) on the challenging media-based split.

Fig. 8 An example triplet used for de-biasing [3]

The study [4] proposed four baseline deep learning models (mBERT, XLM-RoBERTa, XLM-RoBERTa (Hindi), and IndicBERT) and three baseline machine learning models (SVM, Logistic Regression, and Random Forest) to detect political bias and its type. The dataset was created by collecting 8000 Hindi news articles and headlines from 4 sources and frequently used Twitter hashtags. After irrelevant articles were removed manually, a total of 1388 Hindi news articles and headlines were examined to check whether they are biased towards, against, or neutral to the BJP, India's current ruling party at the center.

Fig. 9 Statistics of the dataset used [4]

XLM-RoBERTa, a multilingual model in the BERT family that has been trained on 100 languages including Hindi, was used. A [SEP] token was used to prepare the input for the transformer network, and the model was trained with the standard cross-entropy loss. The model's pre-training on Hindi is leveraged to improve its ability to identify bias on the dataset. A machine learning approach, SVM, is also used: the textual data is transformed into a set of features using TF-IDF, and the classifier is applied to the transformed features using the Radial Basis Function (RBF) kernel:

$K(x, x') = \exp\left(-\gamma \lVert x - x' \rVert^{2}\right)$

Here $\gamma$ is a free parameter. A count matrix is created and normalized, and the model is trained on the resulting TF-IDF features. The dataset was tested with both kinds of approaches, yielding the following validation accuracy scores: 80.2% for mBERT, 83% for XLM-RoBERTa, 79.2% for XLM-RoBERTa (Hindi), 78.9% for IndicBERT, 78.7% for SVM, 77.1% for Logistic Regression, and 78.7% for Random Forest. XLM-RoBERTa, a multilingual deep learning model, performed better than the machine learning approaches, achieving 83% accuracy, 72.1% MCC, and 76.4% F1-score. The main issue was that the models could not tell whether news stories and headlines were neutral to the BJP.

Another study [1] proposed using deep learning models such as the Multilayer Perceptron (MLP) to address bias detection in political information. The dataset is taken from the Ideological Books Corpus (IBC), which consists of 4,062 sentences annotated for political ideology at a sub-sentential level. FastText, an unsupervised learning approach created by Facebook's AI Research (FAIR) team to construct vector representations for words, was used to initialize the word embedding matrix. The MLP classifier receives word and sentence matrix representations and outputs the predicted political ideology. Using the train_test_split() function, about 75% of the IBC was randomly selected for training. The training module receives the preprocessed sentences with their matrix from the initialization module, splits the data into training and test sets, and produces test results for analysis, which it packages into a .pkl file for later use.

The output classifier file and word embedding matrix are imported into the classifier module, which extracts sentences using the article extractor module, vectorizes them, and determines political ideologies such as conservative, liberal, or neutral
and provides a percentage breakdown of each. The classifier was first trained with four hidden layers, each containing 10 neurons, a maximum of 200 iterations, and no mini-batching, warm start, or early stopping; the results showed an average precision, recall, and F1 score of 68%. In a subsequent experiment, the score increased to 72% when the number of neurons in every layer was raised to 20; the maximum number of iterations was set to 1000, and the original experiment's batch size, warm start, and early stopping settings were kept. A number of trials, however, found that performance only improved with increases in the size of the first hidden layer, not of the subsequent hidden layers. As a result, 500 neurons were chosen as the size of the first hidden layer. The mini-batch size was set to 32, and warm_start and early_stopping were set to true. This gave an average score of 81%. These findings demonstrate that the MLP classifier outperformed the RNN model on the same dataset in terms of F1 score. The major issue with this model was that real-time political news classification did not produce reliable results. On 20 different opinionated news sites, it gave either a completely "neutral" score or exactly the opposite of what was expected.

Fig. 10 Framework of the Political News Bias Detection System [1]

Another study [15] suggested a model employing the content sentiment analysis technique of a Natural Language Processing module. The dataset comes from 20 different news sites, from which around 3265 news sentences on various topics were collected. In the training phase, tokenization and punctuation handling are performed before news content is selected. The news is then divided into categories such as biased, unbiased, and neutral in accordance with the predictions. In the testing phase, the chosen news topics are compared with the training set, and the sentiment of each word is extracted as positive, negative, or neutral. The method first gathers news topics and then extracts the news. A preprocessing stage cleans the data by removing undesirable elements such as semicolons, commas, and symbols. After cleaning, the text is sent to the sentence parsing module, where a part-of-speech tagger and a dictionary-based technique are used to determine the sentiment. The part-of-speech tagger processes the split sentences and classifies every word as a noun, verb, adjective, and so on. Each word, tagged with its part of speech, is then passed to the dictionary-based approach, which determines the opinion expressed by the words and their polarity. The score of a sentence is the sum of the scores of its words, $W = W_1 + W_2 + \dots + W_n$, where $n$ is the number of words in the sentence, and is computed with the content sentiment algorithm. The calculation for each sentence therefore depends on its individual words and their sequence. The overall score, obtained by summing these viewpoints, determines the bias of the topic. A synset is a grouping of one or more synonyms. Each word is given a value, and the sum of those values gives the final score of each sentence, which falls either between -1.0 and 0.0 or between 0.0 and 1.0. The sentence is considered positive if the score is between 0.0 and 1.0 and negative if it is between -1.0 and 0.0. If the score is 0, a neutral value is assigned.

Fig. 11 News bias in sentence-level document using sentiment analysis [15]

IV. DISCUSSION AND ANALYSIS

Detecting bias in news is one of the most sensitive problems worldwide. One of the main challenges is the subjectivity of bias. Researchers have implemented a variety of deep learning and machine learning techniques (LSTM, BERT, sentiment analysis, Recursive Neural Network models, Recurrent Neural Network models, Multilayer Perceptron models, and other classification methods) in order to detect online bias as accurately as possible. It is challenging to recommend a single approach or technique that will provide excellent results in identifying bias in news articles. Several of the surveyed studies report accuracy above 80%, which is a good result for bias detection; the highest is 89.54% for the Headline Attention Network, which outperformed its LSTM baseline by 4.22%. However, the objective would be to raise accuracy to nearly 100% by improving and experimenting with new machine learning techniques, as it is important to promote more accurate news and reduce the spread of fake
news and misinformation. A comparison of the recent work is shown in Table II.

TABLE II
COMPARISON AMONG SOME OF THE RECENT WORK IN ONLINE BIAS DETECTION

| Study | Year of Publication | Method used | Remarks |
|---|---|---|---|
| [2] | 2019 | Headline Attention Network | 89.54% accuracy |
| [3] | 2020 | Twitter bios + Article with TLP | 70% accuracy (BERT); 72% accuracy (LSTM) |
| [4] | 2019 | XLM-RoBERTa | 83% accuracy |
| [1] | 2018 | Multilayer Perceptron model | 81% accuracy |
| [15] | 2019 | Sentiment Analysis Algorithm | 7% more as compared to WiSARD Algorithm |

V. CONCLUSION AND FUTURE SCOPE

Biased reporting is a pervasive problem in the news circulated today. Manual detection of bias in media articles is laborious and time-consuming; automating it may therefore help to assess these articles more effectively. Media bias has a very large effect on society, most often a negative one. Deep learning methods such as LSTM and mBERT can be effective at identifying bias in online news. This article compares five recent studies that have used deep learning algorithms to identify bias in distinct open-source datasets. It has also evaluated six publicly available datasets of news headlines and articles in various languages drawn from recent studies in this field; more are available for research purposes. The need for a sizable labeled dataset for training is one of the major issues with deep learning-based approaches to bias detection. Most recent works used imbalanced open-source datasets, in which the proportion of biased news was much smaller than that of unbiased news. Even with small and unbalanced datasets, numerous techniques produced results with extremely high sensitivity and specificity. This suggests that overfitting is present to some degree and that there is insufficient data for the models to generalize to the task at hand. Data augmentation approaches may be able to address this problem to some degree, and training could be further improved by incorporating more labeled data, which would also raise the quality of the augmented data. With these improvements, it would be possible to create a more accurate deep learning model.

REFERENCES

[1] Minh Vu, "Political News Bias Detection using Machine Learning," Department of Computer Science, Earlham College, 801 National Road West, Richmond, Indiana 47374.

[2] R. R. Reddy, S. R. Duggenpudi, R. Mamidi, "Detecting Political Bias in News Articles Using Headline Attention," DOI: 10.18653/v1/W19-4809.

[3] R. Baly, G. Da San Martino, J. Glass, P. Nakov, "We Can Detect Your Bias: Predicting the Political Ideology of News Articles," DOI: 10.18653/v1/2020.emnlp-main.404, November 2020.

[4] S. Agrawal, K. Gupta, D. Gautam, R. Mamidi, "Towards Detecting Political Bias in Hindi News Articles," Artificial Intelligence (AAAI-19), January 27-February 1, 2019.

[5] W. Marusarz, "The 2022 Definitive Guide to Natural Language Processing (NLP)," November 15, 2022.

[6] L. Burgueño, "A Generic LSTM-Based Neural Network Architecture to Infer Heterogeneous Model Transformations," May 21, 2021.

[7] Rohit Kulkarni. India News Headlines Dataset. Kaggle; 2022.

[8] Pulkit Komal. News Article Dataset from Indian Express. Kaggle; 2022. https://www.kaggle.com/datasets/pulkitkomal/news-article-data-set-from-indian-express

[9] Rishabh Misra. News Category Dataset. Kaggle; 2022. https://www.kaggle.com/datasets/rmisra/news-category-dataset

[10] Shubham Jain. Telugu News Dataset. Kaggle; 2020. https://www.kaggle.com/datasets/shubhamjain27/telugu-news-dataset

[11] Dish. Election News Headlines. Kaggle; 2018. https://www.kaggle.com/datasets/blogdish/election-news-headlines

[12] Gaurav. Hindi Text Short Summarization Corpus. Kaggle; 2020. https://www.kaggle.com/datasets/disisbig/hindi-text-short-summarization-corpus
[13] "Natural Language Inference: Fine-Tuning BERT," Section 16.7, Dive into Deep Learning 1.0.0-beta0 documentation (d2l.ai).

[14] P. Kamencay, M. Benco, T. Mizdoš, R. Radil, "A new method for face recognition using convolutional neural network," Advances in Electrical and Electronic Engineering 15 (2017): 663-672.

[15] S. V. Shri Bharathi, Angelina Geetha, "Determination of news biasedness using content sentiment analysis algorithm," ISSN: 2502-4752, DOI: 10.11591/ijeecs.v16.i2.pp882-889, November 2019.

[16] Parisa Bazmi, Masoud Asadpour, Azadeh Shakery, "Multi-view co-attention network for fake news detection by modeling topic-specific user and news source credibility," DOI: 10.1016/j.ipm.2022.103146, 8 November 2022.

[17] Andreea Iana, Alexander Grote, Katharina Ludwig, Philipp Müller, Heiko Paulheim, "Towards Analyzing the Bias of News Recommender Systems Using Sentiment and Stance Detection," DOI: 10.1145/3487553.3524674, 16 August 2022.

[18] Saily Shah, "Convolutional neural network: an overview," January 27, 2022.
