Dissertation cn600
CN6000
27 February 2025
Automatic AI for Detection of Fake News
Student's first and last name
Abstract
The objective of this project is to investigate how advanced artificial intelligence (AI) tools can help detect fake news. The project's goal is to develop a strong and reliable system that can distinguish real news from fake news. To accomplish this, several distinct machine learning and deep learning models were implemented, including GPT, Naïve Bayes, Support Vector Machines (SVM), and Long Short-Term Memory (LSTM) networks. The project method comprises several significant phases. The text first undergoes a number of preprocessing steps: HTML tags and stopwords are filtered out and the data is tokenized, ensuring the text is correctly prepared for modelling. The data is then divided into training and testing sets, and the models are trained on the training data. To understand how effectively the models work, evaluation metrics such as the F1-score, accuracy, precision, and recall are used, along with stability tests that check how well the system adapts to changes in the data and its distribution. The outcomes of this project demonstrate that AI-based models make it simpler to identify fake news. The LSTM model is the most effective in terms of accuracy and overall classification metrics, and the SVM and Naïve Bayes models also perform well. This indicates that natural language processing (NLP) models can be helpful for this task, and that more advanced machine learning and deep learning methods are effective AI techniques for distinguishing fake news. This work aims to reduce the spread of fake news and to give researchers fresh ideas for stopping its spread in online information systems.
Acknowledgments
I greatly appreciate the invaluable experience and guidance of my supervisor, Dr. Sujit Biswas, who was an aid throughout the entire project. Their constant support and invaluable advice steered the direction and implementation of the study.
I also thank my colleagues and peers, whose analytical work and shared insights were necessary to polish this work. Moreover, I want to acknowledge the invaluable help of my college friends and professional contacts, who were an important source of useful comments and suggestions. These partnerships played a pivotal role in attaining the aims of this investigation.
Finally, I thank my family for their patience and tolerance; they were the source of the energy needed for my research and provided a conducive environment for it.
Contents
Abstract..............................................................................................................................2
Acknowledgments..............................................................................................................3
Chapter 1: Introduction..................................................................................................7
1.1 Background.................................................................................................................. 7
2.1 Introduction................................................................................................................ 10
2.2.3 The role of Artificial Intelligence (AI) in addressing the challenge of fake news......10
2.5.1 Analysis of the current landscape of fake News and its Impact on Society...12
2.7 Review of Machine Learning Algorithms, Natural Language Processing (NLP), and
other AI Methodologies in the Context of Fake News Detection...........................................14
2.8.1 Evaluation of strengths and limitations of different models and algorithms......16
2.10 Identification of Key Approaches, algorithms, and Techniques Used in the research
methodology.......................................................................................................................... 17
2.12 Summary................................................................................................................... 18
3.1.2 Data-Preprocessing.......................................................................................20
Chapter 4: Results/Findings/Outcomes.......................................................................25
Chapter 5: Evaluation..................................................................................................33
Chapter 6: Conclusion.................................................................................................35
Reference List..................................................................................................................37
List of Figures
Figure 1: Timeline for the evolution of fake news............................................................................12
Figure 2: An AI and ML-based methodology for detecting fake news and disinformation.............14
Figure 3: ML framework for Fake news detection............................................................................15
Figure 4: Trending Techniques to Detect Fake News.......................................................................17
Figure 5: Block Diagram...................................................................................................................22
Chapter 1: Introduction
1.1 Background
The proliferation of fake news through digital media is highly detrimental and is experienced globally. "Fake news" is false information spread as real news to trick, manipulate, or change people's minds. The internet and social networks have made this issue worse by speeding up the spread of false and misleading information around the world. It is important for people to think and act based on accurate information: fake news can negatively affect public policy and election results, and even contribute to increased aggression in individuals. Technology-based options are being explored, especially in the area of artificial intelligence (AI), to stop the spread of fake news quickly. AI offers substantial computing power and continually evolving algorithms, so it could be used to find and stop fake news automatically. It is very important for our society that technology and the media work together to keep information accurate.
The objectives of this project include:
To collect and preprocess relevant datasets for training and testing the AI model.
To evaluate the effectiveness of the AI model in identifying fake news and analyse its performance.
Introduction: Outlining the research background, problem statement, aims, objectives, and
significance.
Literature Review: A detailed analysis of existing literature in the fields of AI, NLP, and
fake news detection.
Results and Discussion: Presenting the findings of the research, discussing the
implications, and comparing them with existing literature.
References: A list of all the citations used in the dissertation.
Appendices: Any related supplementary material for this research, such as data samples, code snippets, or comprehensive tables.
2.1 Introduction
The prevalence of false information on online platforms poses a significant threat to the truthfulness of public conversation and the trustworthiness of information. This study provides a detailed analysis of the effective and efficient use of artificial intelligence (AI) in an era of widely broadcast misinformation. The primary objective is to examine various AI and NLP techniques and determine how to integrate them coherently into a system capable of accurately assessing the truth and reliability of news articles. Given the growing day-to-day impact of digital platforms on public opinion, it is very important to develop strong AI-based approaches that detect misinformation easily and automatically.
2.2.3 The role of Artificial Intelligence (AI) in addressing the challenge of fake news
There is no doubt that fake news needs to be dealt with by artificial intelligence. AI improves individuals' abilities by facilitating quick recognition and classification of false or misleading information, and its capabilities extend beyond detection: it can also stop fake information from spreading. This part discusses different ways AI can be used, with a major focus on how it can make digital information ecosystems more reliable (Patil et al., 2024).
spreads through networks and various channels. This approach to investigating consistency has shifted, as Sitaula et al. analyse news sources and their authors. The study shows how an author's past links to fake news, and the number of authors on an article, can indicate trustworthiness when publicly available fake news data is examined closely. This shift suggests a new approach that considers both content-focused factors and source-related cues about reliability. These findings highlight the need for a deeper understanding of authorship and source consistency, given the constantly changing nature of fake news, and suggest that a complete overhaul of strategies for identifying misinformation may be needed.
The work of Gangireddy et al. (2020) on a graph-based methodology for unsupervised fake news detection represents a significant landmark in the advancement of the field. Much fake news detection relies on supervised learning, which requires large, correctly labelled datasets. To address how fake news spreads on social media platforms, the study proposes a new unsupervised method called GTUT. It applies graph-based techniques such as feature vector learning, biclique identification, and label spreading in three major stages that progressively expand the labelling process. Given the limited availability of labelled historical data, the study offers a novel, effective, and efficient method for detection without supervision. Empirical experiments demonstrate that GTUT surpasses current methodologies by over 10 percentage points in accuracy. The paper suggests several paths for further research, including the incorporation of emotion signals, analysis of social media connections, and labelling inside the graph-based framework, all aimed at enhancing the effectiveness and efficiency of unsupervised detection.
According to Zhou and Zafarani (2020), a detailed analysis of misinformation must consider both its historical background and the growth of artificial intelligence systems designed to identify it. Their analysis indicates that the rapid spread of false information has harmful consequences for democracy, justice, and public trust in government. Journalism, political science, computer science, and the social sciences should collaborate on projects that span multiple fields, because the study shows how to detect fake news from elements including writing style, distribution method, and source consistency. The conclusion highlights the survey's importance in classifying fake news, developing basic theories, and identifying problems and areas that require more investigation; it encourages researchers to work together on systems that can both find fake news and explain their decisions. In another paper, Choraź et al. (2021) argue that a full mapping study of advanced machine learning methods for finding fake news shows how far AI has come in recent decades in the fight against fake news online. The paper examines fake news from many eras and locations, with a primary emphasis on its current application in information warfare. It is hard to overstate how much false news contributes to important societal issues. The study provides thorough guidance: it draws on expert work and reviews, stresses how to use intelligent systems, and offers important ways to trace where false information originates. It also recommends closer examination of how misinformation spreads, educational interventions that promote continuous learning, and transparency requirements for machine learning systems developed to identify and respond to false news.
the online information ecosystem is underscored in the conclusion. Moreover, this shows the risks that come with the spread of misleading information. The review mostly discusses research gaps, chiefly in how false information spreads across different platforms and languages and how networks change over time. These gaps provide academics with crucial new insights and highlight areas that require further investigation.
The study by Zhang and Ghorbani (2020) examines the huge amount of fake news on the internet
and how it changes society. The US presidential election of 2016 is a prime example of this.
Identifying fake news could be difficult, the authors admit, because there is so much information
online. However, they stress how important it is for people and technology to work together. The
study takes a close look at current methods for finding fake news, paying special attention to user-, content-, and context-based features. It also identifies places where more study can be done to improve
detection frameworks and datasets. The main goals of the survey are to identify and categorize
forms of misinformation, evaluate various methodologies for identifying it, and identify specific
areas that necessitate additional research to enhance online surveillance and detection systems
designed to counteract false news.
The investigation by Molina et al. (2021) analyses the changing definition of "fake news," which goes beyond simple lies to cover a variety of online content types. Their seven-category system helps to see the bigger picture; satire, fake news, and amateur journalism are all part of it. The paper classifies false news by its message, source, structure, and network traits, which enhances understanding of its essence. The authors stress the importance of considering the objective and methodology of machine learning, and note that the amount of information that can be gathered and the subjects that can be studied are limited. They argue that the computer and social sciences should work together to get better at finding fake news, and they call for more statistical testing of features.
Figure 2: An AI and ML-based methodology for detecting fake news and disinformation
The growing risks of fake news and disinformation (FNaD) penetrating social media and online
platforms, which can seriously affect decision-making and disrupt supply chains, are discussed by
Akhtar et al. (2023). The study draws attention to the paucity of research on creating FNaD-
specific AI and ML models to reduce supply chain disruptions (SCDs). Based on a blend of
artificial intelligence, machine learning, and case studies from Pakistan, Malaysia, and Indonesia,
the authors suggest a FNaD detection algorithm intended to prevent SCDs. The approach shows
efficacy in managerial decision-making, utilizing a variety of data sources. The study adds to the
literature on supply chains and AI-ML. It provides useful insights and recommends future research
directions, emphasizing the need for a focus on particular FNaD and supply chain operations, the
integration of operational performance measures, and longitudinal studies to explore evolving
SCDs.
credentials of misinformation in Thailand as a possible way to tackle the inescapable issue of false information. In the two critical phases of model development and data acquisition, machine learning, natural language processing (NLP), and information retrieval are used effectively. The study runs several machine learning models that organise content from Thai online news sources, using web-crawler information extraction techniques and natural language processing. LSTM, which achieves 100% test-set accuracy, recall, precision, and F-measure, is identified as the best model. Once the research is completed, a web app that automatically identifies fake news online will be launched. This shows that the problem of fake news needs flexible solutions.
In their survey, Merryton and Augasta (2020) address the important issue of fake news on social media and show that machine learning, and especially deep learning, can be applied in this situation. The authors draw attention to the growing difficulty in separating authentic
communications from phony ones, particularly during occasions such as general elections when
political parties use social media to disseminate possibly false material widely. The research
explores a range of machine learning techniques, comparing the effectiveness of deep learning—a
subset that emulates the functioning of the human brain—with more conventional methods. The
authors believe that deep neural networks show the potential to outperform standard methods,
especially when dealing with complicated applications and big data volumes. The report
summarises effective categorization techniques for identifying fake news, highlighting the possible
overlap between traditional machine learning techniques and deeper learning approaches.
The comprehensive evaluation of existing literature known as a systematic literature review (SLR) was conducted by Iqbal et al. (2023) to reveal the complicated interaction between Fake News Detection (FND) and artificial intelligence (AI). It uses the "Preferred Reporting Items for Systematic Reviews and Meta-Analyses" guidelines to examine 25 peer-reviewed studies. The findings show that FND and AI are linked. People are well aware that receiving fake information can hurt them, and it also poses risks to health; being able to tell real and fake information apart is now very important. Digital literacy, fact-checking websites, automated technology, and big data analytics are all effective and efficient countermeasures to false information. Beyond its theoretical contributions, the study also offers managerial suggestions for IT specialists, legislators, and educators, creating a crucial standard for preventing the widespread distribution of false information on social media platforms. Research by Merryton and Augasta (2020) investigates the importance of machine learning in handling the universal issue of fake messages on social media. They stress how important it is to tell the difference between real and fake news, especially in politics. The study surveys many different types of machine learning, with deep learning receiving the most attention.
A substantial subset of machine learning, deep neural networks autonomously derive high-level features from unprocessed data and thrive in intricate applications. The survey-style study offers insights into various approaches used in false news detection research and emphasizes the benefits of deep learning approaches over traditional ML techniques.
vector machine. Another important point made by the study is that more information, such as author metadata, is needed to better spot fake news. The writers believe that computerised fact-checking models will be used in the future, so they focus on knowledge-based approaches to improve accuracy and interpretability for users. Rohera et al. (2022) examine the widespread problem of fake news spreading on social media platforms and highlight its negative effects in their study. The researchers provide a taxonomy of current methods for identifying fake news, with an emphasis on social media sites including Twitter, Facebook, WhatsApp, and Telegram. Using a self-aggregated dataset, the study trains four machine learning models: LSTM, Random Forest (RF), the Passive Aggressive Algorithm, and Naive Bayes (NB). LSTM distinguishes real from fake news 92.34% of the time. The study advocates a hybrid approach that integrates NB and LSTM techniques to improve detection accuracy.
2.12 Summary
Methods from different areas have made it much easier to spot fake news, as the existing research shows. To find false information on digital platforms, academics use a mix of hybrid models, machine learning, and natural language processing. Robust identification methods include LSTM, SVM, and fact-verification functionality. Furthermore, the impact of geographical limitations is alleviated by establishing reference datasets such as the Indian Fake News Dataset. Model accuracy has grown, but the complexity of changing news is still a problem. Innovations include letting users give feedback, adapting in real time, and improving multimedia analysis. The study's findings highlight how important it is to develop detection methods that counteract fake news's constant evolution.
Fake and real news dataset: This dataset, available on the Kaggle site, contains separate files for fake and real news articles. Considering the requirements for this project, the data was found suitable to use.
Approximately 40,000 news articles are present in this dataset, sourced from Kaggle; the link is attached at the bottom of this page. The articles are separated into two groups: real news articles and fake news articles. The dataset is used to train the proposed supervised machine learning models, and after training, a held-out subset is used for performance evaluation with different metrics. Each article is labelled accordingly, providing a clear distinction that assists in training the supervised models.
A suitable number of instances is present for both labels, making this a large-volume dataset and ensuring that models trained on it can learn the nuanced differences in language, style, and presentation that typically distinguish factual information from misinformation or disinformation.
-------------------------------
https://www.kaggle.com/code/madz2000/nlp-using-glove-embeddings-99-87-accuracy
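The dataset handling described above (labelling the two files, merging them, and holding out a test subset) can be sketched as follows. The file names Fake.csv and True.csv and the column layout are assumptions about this Kaggle dataset, and tiny in-memory frames stand in for the real CSVs:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-ins for the two Kaggle files; the real project would use something like
# pd.read_csv("Fake.csv") and pd.read_csv("True.csv") (file names assumed).
fake = pd.DataFrame({"title": ["f1", "f2"], "text": ["fake story one", "fake story two"]})
real = pd.DataFrame({"title": ["r1", "r2"], "text": ["real story one", "real story two"]})

# Label each group (1 = fake, 0 = real) and merge into a single frame.
fake["label"] = 1
real["label"] = 0
df = pd.concat([fake, real], ignore_index=True)

# Hold out a test subset for the later performance evaluation; test_size=0.5
# only because this toy frame is tiny (a real split would use e.g. 0.2).
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.5, random_state=42, stratify=df["label"]
)
```

Stratifying on the label keeps the fake/real proportions the same in both splits, which matters for the balanced evaluation described later.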
3.1.2 Data-Preprocessing
Data preprocessing is a very important step in preparing the raw data for analysis. The preprocessing steps below are crucial in preparing text data for machine learning (ML) and natural language processing (NLP) tasks, particularly for applications like fake news detection. Here is a more detailed explanation of each step:
1. Cleaning
Text data, especially when collected from the web, contains a lot of irrelevant information that can be misleading or unhelpful for analysis. Cleaning the data involves removing these unnecessary parts so that the machine learning model focuses only on meaningful content. It includes:
HTML tags: Because web pages are written in HTML, scraping content directly from them can yield a mix of content and HTML markup. HTML tags do not contribute to understanding the text's meaning and are thus removed.
Advertisements: Ad content is mixed in with the real news content, which can skew the analysis. It is very important to remove it in order to focus on the news text itself.
Non-textual elements: These include images, videos, and any embedded multimedia. Since the focus here is on textual analysis, these elements are removed.
2. Normalization
Normalization is the process of transforming text into a single canonical form, which reduces complexity for NLP tasks. This step includes:
Converting to lowercase: This ensures that the same words are recognised as identical regardless of their position in a sentence or their usage, e.g., "The" and "the" are treated the same.
Removing punctuation and special characters: Punctuation marks and special characters introduce extra complexity without contributing meaningfully to the text's meaning. Removing them simplifies the data.
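Both normalization steps can be sketched with a short helper; this is an illustrative version, not the project's exact code:

```python
import re

def normalize(text: str) -> str:
    """Lowercase the text and drop punctuation/special characters."""
    text = text.lower()
    # Keep only letters, digits and whitespace; everything else becomes a space.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Collapse the whitespace runs introduced by the removals.
    return re.sub(r"\s+", " ", text).strip()

print(normalize('The "BREAKING" news!!'))  # the breaking news
```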
3. Tokenization
Tokenization is the process of splitting a text object into smaller units known as tokens. Examples of tokens are words, characters, numbers, symbols, or n-grams. This stage is foundational for text analysis, as it transforms a text from a string of characters into a list of tokens that can be analysed individually.
Stop word removal is one of the most common preprocessing steps across NLP applications. The idea is simply to remove words that occur frequently across all documents in the corpus. Articles and pronouns, such as "the", "is", "at", "which", and "on", are typically classified as stop words. Removing them reduces the dataset size and improves processing speed. For fake news detection, focusing on more meaningful words should improve the model's ability to learn discriminative features.
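Tokenization and stop word removal together can be sketched as below. The stopword set here is a tiny illustrative sample; real pipelines typically use a fuller list such as NLTK's English stopwords:

```python
import re

# Tiny illustrative stopword list (real lists contain a few hundred words).
STOPWORDS = {"the", "is", "at", "which", "on", "a", "an", "and"}

def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def remove_stopwords(tokens: list[str]) -> list[str]:
    return [t for t in tokens if t not in STOPWORDS]

tokens = tokenize("The senator is at the rally")
print(remove_stopwords(tokens))  # ['senator', 'rally']
```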
Both stemming and lemmatization are techniques that reduce words to their base or root form, but in slightly different ways:
Stemming: Stemming is the process of removing the last few characters of a given word to obtain a shorter form, even if that form has no meaning. It is a basic heuristic process that cuts the ends off words based on common prefixes or suffixes found in inflected words, producing a reduced form called the "stem".
Lemmatization: Lemmatization has the same purpose as stemming but overcomes its drawbacks. It aims to remove inflectional endings only and to return the proper dictionary form of a word, known as the "lemma". Lemmatization is more sophisticated, using a vocabulary and morphological analysis, and therefore handles irregular words better.
These preprocessing steps are very important for reducing the complexity of the text data, focusing on the most meaningful elements, and ultimately improving the performance of machine learning models in tasks like fake news detection.
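The contrast between the two techniques can be shown with deliberately simplified toy versions (real pipelines would use NLTK's PorterStemmer and WordNetLemmatizer; the suffix list and lemma dictionary below are illustrative stand-ins):

```python
def toy_stem(word: str) -> str:
    """Crude suffix stripping in the spirit of the Porter stemmer."""
    for suffix in ("ies", "ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# A tiny lemma dictionary standing in for WordNet's morphological lookup.
LEMMAS = {"studies": "study", "better": "good", "ran": "run", "feet": "foot"}

def toy_lemmatize(word: str) -> str:
    return LEMMAS.get(word, word)

print(toy_stem("studies"))       # "stud"  -- a stem need not be a real word
print(toy_lemmatize("studies"))  # "study" -- a lemma is a dictionary form
```

The pair of outputs for "studies" captures the key difference: the stem is a truncated token, while the lemma is a valid vocabulary word.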
Supervised Learning Models: Logistic Regression, Support Vector Machines (SVM), and Naive Bayes classifiers are well known for their effectiveness and efficiency in text classification tasks.
Deep Learning Models: These include Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), with a focus on Long Short-Term Memory (LSTM) networks, to capture the sequential nature of textual data.
Ensemble Methods: These combine multiple models to improve prediction accuracy, e.g., Random Forests and Gradient Boosting Machines.
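A minimal sketch of how the supervised models in the first group are typically wired up for text classification, using scikit-learn with TF-IDF features. The corpus here is hand-made for illustration and is not from the project's dataset:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# A tiny hand-made corpus (1 = fake-style, 0 = real-style), purely illustrative.
docs = [
    "shocking miracle cure doctors hate this trick",
    "you will not believe this one weird secret",
    "aliens secretly control the government insiders say",
    "celebrity scandal exposed in shocking leaked video",
    "parliament passed the budget bill on tuesday",
    "the central bank held interest rates steady",
    "city council approved the new transit plan",
    "researchers published peer reviewed climate findings",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

scores = {}
for clf in (LinearSVC(), MultinomialNB()):
    # TF-IDF turns each document into a weighted bag-of-words vector,
    # which the linear classifier then separates.
    model = make_pipeline(TfidfVectorizer(stop_words="english"), clf)
    model.fit(docs, labels)
    scores[type(clf).__name__] = model.score(docs, labels)
print(scores)
```

The same pipeline shape (vectorizer followed by classifier) applies to the project's real dataset; only the corpus and the train/test split change.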
This methodology provides a well-structured approach to developing an AI-driven system for the detection of fake news. By merging advanced machine learning (ML) and natural language processing (NLP) techniques with a complete evaluation framework, the proposed system aims to meaningfully improve the ability to identify and curb the spread of misinformation. Future work will focus on refining the model through continuous learning and adaptation to new forms of fake news, ensuring the system remains effective in the ever-evolving digital landscape.
Chapter 4: Results/Findings/Outcomes
This chapter presents the study's results and findings on how to mark news stories as reliable or not. Several types of data analysis and machine learning models are employed to provide an impartial assessment of the methods and the efficacy of each model. The first part of the chapter presents a summary of the dataset; it then discusses the findings in more detail by presenting and examining the data. This is followed by testing three supervised learning approaches to see how effectively they can classify news stories: Long Short-Term Memory (LSTM) networks, Support Vector Machines (SVM), and Naïve Bayes. The positive and negative aspects of each model are discussed in detail, and evaluation metrics are used to provide useful information. This helps move text classification methods forward in the domain of news categorization.
1. Distribution of Labels: A histogram was used to display how the labels in the data were distributed. The two labels, 0 and 1, were spread out fairly evenly, with an adequate number of articles in both the "reliable" and "unreliable" categories: about 10,000 articles in each.
2. Word Count Distribution: A plot was made to show how the word counts of the articles were distributed. Articles with word counts over 2,500 were almost nonexistent, and the number of articles dropped sharply above 1,000 words. This means the dataset contains many short items.
3. Author Analysis: A bar chart was created to show the top 10 authors by number of articles written. The most frequent author was "Pam Key," followed by "admin" and "Jerome Hudson." This means that each of these individuals contributed a large share of the articles in the dataset.
4. Text Length vs. Label: A box plot was used to compare how the text lengths of the two labels were distributed. Reliable articles (label "0") had a narrow spread of low word counts with few outliers. Unreliable articles (label "1"), on the other hand, had a somewhat higher average word count, with a few very high counts.
The text was further cleaned by eliminating stopwords. The objective of this procedure was to remove noise and enhance the features so that they could be examined later. BeautifulSoup was also employed to strip the HTML tags from the text, leaving only meaningful text for further processing.
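The cleaning steps described above can be sketched as follows. This is a simplified stand-in that uses only the standard library; the project itself used BeautifulSoup for HTML removal and NLTK's full English stopword list, so the regex-based tag stripping and the small stopword set below are illustrative assumptions:

```python
import re

# Illustrative stopword set; the project used NLTK's full English list.
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}

def clean_text(raw_html: str) -> str:
    """Strip HTML tags, lowercase, and drop stopwords."""
    # Remove HTML tags (the project used BeautifulSoup; a regex
    # suffices for this sketch).
    text = re.sub(r"<[^>]+>", " ", raw_html)
    # Keep only lowercase word characters.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Filter out stopwords.
    return " ".join(t for t in tokens if t not in STOPWORDS)

print(clean_text("<p>The Earth is <b>flat</b>, scientists claim.</p>"))
# → earth flat scientists claim
```

The same idea scales to the full dataset by applying `clean_text` to every article before tokenization.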
Tokenizing the text data was essential so that machine learning algorithms could interpret it as numeric sequences. The Tokenizer class from the Keras library was fitted on the text data, and the vocabulary size was determined to be 237,927 words. Sequences were padded and truncated so that all inputs were the same size, with a maximum length of 1,000 tokens. GloVe pre-trained word embeddings were also used to bring in useful information learned from a large quantity of text. The GloVe Twitter dataset was loaded, which contains 1,193,514 word vectors with 100-dimensional embeddings. These pre-processing steps transformed the text data so that it could be used later for modelling and analysis, providing a strong foundation for building machine learning models that can identify fake news reliably.
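Building the GloVe embedding matrix used by the LSTM model (not shown in the code appendix) follows a standard pattern. The snippet below is a sketch: the toy two-word `glove` dictionary stands in for the 1,193,514-vector GloVe Twitter file, and `word_index` stands in for the fitted Keras tokenizer's vocabulary:

```python
import numpy as np

EMBED_DIM = 4  # the project used 100-dimensional GloVe Twitter vectors

# Toy stand-ins: in the project these come from the GloVe file and the
# fitted Keras Tokenizer, respectively.
glove = {"news": np.array([0.1, 0.2, 0.3, 0.4]),
         "fake": np.array([0.5, 0.6, 0.7, 0.8])}
word_index = {"news": 1, "fake": 2, "planet": 3}

# Row i holds the vector for the word with index i; index 0 is reserved
# for padding, and out-of-vocabulary words stay as zero vectors.
embedding_matrix = np.zeros((len(word_index) + 1, EMBED_DIM))
for word, i in word_index.items():
    if word in glove:
        embedding_matrix[i] = glove[word]

print(embedding_matrix[2])  # vector for "fake"
```

The resulting matrix is what gets passed as `weights=[embedding_matrix]` to the Keras `Embedding` layer.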
The SVM algorithm did an outstanding job of classifying articles, as shown by its 94.13% accuracy on the test data. The number of errors was approximately the same for both classes, according to the confusion matrix and classification report.
According to the confusion matrix, the SVM model correctly classified 3212 fake news articles and 3238 real news articles. The errors were reasonably balanced between the two classes: it wrongly classified 214 real articles as fake and 188 fake articles as real.
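These confusion-matrix counts can be sanity-checked against the reported accuracy with a few lines of arithmetic, using only the figures quoted above:

```python
# SVM confusion-matrix counts quoted above.
correct_fake, correct_real = 3212, 3238  # correctly classified fake / real
real_as_fake, fake_as_real = 214, 188    # the two error types

total = correct_fake + correct_real + real_as_fake + fake_as_real
accuracy = (correct_fake + correct_real) / total
print(f"{accuracy:.4f}")  # → 0.9413
```

The same check reproduces the Naïve Bayes (87.41%) and LSTM (96.51%) accuracies from their respective confusion matrices.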
Visualization of Performance
The confusion matrix was also rendered as a heatmap, which made it simpler to judge how well the model performed. The heatmap, which presented correct and incorrect classifications of real and fake news articles, highlighted the balanced error distribution.
The test data showed that the Naïve Bayes classifier was able to distinguish real from fake articles, classifying 87.41% of them correctly. Despite a higher error rate than SVM, Naïve Bayes still performed well at article classification.
According to the confusion matrix, Naïve Bayes correctly classified 3091 fake news articles and 2898 real news articles. However, it had a higher error rate, misclassifying 335 real articles as fake and 528 fake articles as real.
Visualization of Performance
A heatmap of the confusion matrix for Naïve Bayes was produced in the same fashion as for SVM. The heatmap made it simple to see which labels were misclassified and where accuracy and overall classification performance were lost.
In conclusion, both the SVM and Naïve Bayes classifiers demonstrated strong capabilities for article classification, with SVM showing slightly better results in terms of accuracy and evenly distributed errors. Naïve Bayes was still able to tell fake and real news articles apart despite its higher error rate. These findings show how helpful machine learning algorithms are for assessing how trustworthy news articles are and classifying them accordingly.
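As an illustration of the comparison above, a minimal version of the SVM-versus-Naïve-Bayes pipeline can be sketched with scikit-learn. The four training headlines below are invented stand-ins for the real dataset, and a TF-IDF vectorizer is assumed (the exact vectorizer used in the project is not shown in this chapter):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC

# Invented toy corpus: label 1 = unreliable, 0 = reliable.
texts = ["scientists prove earth is flat shocking truth",
         "miracle cure doctors hate revealed secret",
         "central bank raises interest rates quarter point",
         "city council approves new transit budget"]
labels = [1, 1, 0, 0]

# Turn text into TF-IDF features, then fit both classifiers.
vec = TfidfVectorizer()
X = vec.fit_transform(texts)
svm = SVC(kernel="linear").fit(X, labels)
nb = MultinomialNB().fit(X, labels)

# Both models should flag this sensationalist headline as unreliable.
query = vec.transform(["shocking secret cure revealed"])
print(svm.predict(query)[0], nb.predict(query)[0])
```

On a real dataset the same pattern applies, just with the full training split in place of the toy corpus.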
The LSTM neural network, with its multi-layer architecture, performed extremely well, accurately classifying 96.51% of the test data. By outperforming both the SVM and Naïve Bayes algorithms, it demonstrated just how effective the LSTM model is at text classification.
The model is composed of an embedding layer, three LSTM layers, and a dense layer for classification. It makes use of pre-trained word embeddings to represent words in a continuous vector space. The sequential dependencies in the raw data are picked up by the LSTM layers, which allows the model to cope with long-term dependencies successfully.
The LSTM model was trained for 50 epochs, improving over time. The Adam optimizer was used to minimise a binary cross-entropy loss function. The model summary listed the model's parameters and layers, showing how it had been assembled.
Performance Evaluation
The LSTM model performed remarkably well, correctly identifying 3318 fake news articles and 3295 real news articles. It was accurate 96.51% of the time and had high precision, recall, and F1-scores for both classes.
The confusion matrix showed that there were very few misclassifications, further demonstrating how well the model performed. It was very accurate, given that only 108 fake articles were classified as real and 131 real articles were flagged as fake.
The heatmap of the confusion matrix made it easy to judge how well the model performed: darker shades marked correct labels, while lighter shades indicated errors. LSTM performs well at text classification tasks because it is accurate and makes very few errors. It is particularly effective at telling fake and real news articles apart.
The first test article claimed that the Earth is flat. It was presented as an important discovery but was labelled as unreliable (label "1"). Notably, all three models (SVM, Naïve Bayes, and LSTM) agreed with the label's assessment that this story lacked dependability.
The second test article, labelled as reliable (label "0"), was about NASA's discovery of an exoplanet that might harbour liquid water and support life. All three models again produced identical predictions, and the article was correctly marked as reliable.
The SVM, Naïve Bayes, and LSTM models all performed exceptionally well on these predictions, showing how effectively they can separate trustworthy articles from untrustworthy ones. This consistency indicates how reliably the models can classify news articles and suggests that they may help stop the global spread of fake news and false information.
Chapter 5: Evaluation
The AI-based system for identifying fake news was put through extensive testing, and all of the models produced outstanding results. The Support Vector Machines (SVM) model was 94.13% accurate, the Naïve Bayes model 87.41%, and the LSTM model 96.51%, while the GPT model consistently provided accurate predictions. With such high accuracy rates, the models are capable of telling real and fake news articles apart, which means they could assist in the fight against false information.
Performance Metrics
In addition to accuracy, further performance metrics such as precision, recall, and F1-score were employed to evaluate the models fully. All the models did reasonably well, but the LSTM model performed best, with the highest precision, recall, and F1-score. This demonstrates that it not only classifies articles correctly but is also effective at reducing mistakes, which makes the overall system for identifying fake news more accurate.
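For reference, the metrics named above can be computed directly from raw counts. The small sketch below defines them, where TP, FP, and FN are the true positives, false positives, and false negatives for the "fake" class, and the example values are the LSTM counts reported in Chapter 4:

```python
def precision(tp, fp):
    # Of everything flagged as fake, how much really was fake?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of everything that really was fake, how much was caught?
    return tp / (tp + fn)

def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r)

# LSTM counts from Chapter 4: 3318 fake articles caught,
# 131 real articles wrongly flagged, 108 fake articles missed.
print(round(precision(3318, 131), 4),
      round(recall(3318, 108), 4),
      round(f1_score(3318, 131, 108), 4))
# → 0.962 0.9685 0.9652
```

These values line up with the chapter's claim that the LSTM's precision, recall, and F1-score were all above 96%.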
Robustness Testing
The AI system's robustness was evaluated by giving the models various datasets and settings to work with. The models clearly handled changes in how the data is distributed and what it includes, because their accuracy scores remained high across the different datasets. A sensitivity assessment, which investigated how model performance changed when specific variables were altered, also demonstrated that the AI system was robust: in all circumstances, the results showed the same level of success.
A comparative study was conducted between the baseline models and the new AI models. In terms of accuracy, precision, recall, and F1-score, the results revealed that the AI-based models did much better than the baselines. These results demonstrate that advanced machine learning and deep learning methods are more effective at finding fake news, and that the difficulties caused by lies and false information might be simpler to handle if these methods were adopted.
Considerable attention was paid to the steps needed to deploy an AI system that can identify fake news in practice. These phases were data preparation, model selection, training, testing, and practical deployment. Making the system run smoothly and reliably took a great deal of planning and thought. A structured strategy of this kind is one way of tackling the hard work of separating fake news from true news.
Challenges Faced
The AI system was complicated to develop and test due to several difficulties. Information categorisation was a big challenge, accuracy was critical, the model had to be refined repeatedly, and computational shortages were apparent. Data augmentation techniques, advanced model architectures, and optimization algorithms are some of the creative methods by which these issues were solved. As these instances show, it is crucial to be flexible and imaginative when problems arise.
Lessons Learned
The project offered helpful lessons for building AI systems that can identify fake news. After a lot of work, the right model was selected and the data was reviewed to make sure it was correct. The project also showed the need to keep iterating and improving so that the system continues to uncover fake news even as things evolve. This demonstrates just how important it is to be flexible and receptive to new ideas in a field that is always developing.
There is considerable optimism that further study will soon result in significant advancements. Adding real-time monitoring could help discover new patterns of false information quickly, and combined machine learning techniques could help the model do better. Methods for users to provide feedback would make the model easier for everyone to understand and operate. Collaborating with experts in the field and other significant stakeholders creates an environment that supports growth in tackling fake news. Digital information will be better safeguarded in the long run as things evolve, helping people fight lies and be less inclined to believe them.
Chapter 6: Conclusion
This project's primary objective was to show how useful advanced AI is in stopping the growth of fake news, a necessary task in today's information world. A variety of machine learning and deep learning algorithms were meticulously developed and tested: Support Vector Machines (SVM) and Naïve Bayes classifiers among the simpler ones, and Long Short-Term Memory (LSTM) networks and GPT-based models among the more complex ones. These models demonstrate that AI can distinguish real news from fake news, and do so very precisely. This is proof that AI can reliably tell truth from lies.
Key Findings
The LSTM, SVM, and Naïve Bayes models all did an excellent job, achieving strong F1-scores, accuracy, precision, and recall. The LSTM model proved especially capable, demonstrating how well recurrent neural networks capture the sequential connections in written data.
The GPT-based predictions led to accurate and consistent classifications, demonstrating that natural language processing (NLP) models may be helpful for finding fake news. This is evidence of how valuable advanced language models pre-trained on immense quantities of text data can be.
Robustness tests showed that the AI system could handle different types of data and material. News articles can differ greatly in style, subject matter, and source, so this is essential for practical applications.
Limitations
Although the AI models are highly accurate now, they might still have difficulty detecting very sophisticated types of lies and misinformation that evolve over time. Systems struggle to find fake news precisely because it changes constantly.
The models also find it harder to operate on data they have not encountered before when labelled datasets are used for training, which may introduce bias and error. It is essential to work to remove these biases and to make certain the models are fair for ethical application.
AI-based systems might not be widely adopted for identifying fake news because they have trouble scaling and lack sufficient computing power. This is particularly true on large platforms that must deal with a lot of data concurrently.
Future Opportunities
Researchers could explore ensemble learning methods in the future to make fake news detection models even more accurate and helpful. Ensemble techniques may enhance the accuracy and reliability of classification by combining the outputs of multiple models.
AI systems could do their work better and more transparently if they had real-time tracking resources and methods for users to provide feedback, and they could then spot emerging patterns of fake news. In the future, this way of refining models might make the system run more smoothly.
Fighting fake and false data works best when journalists, social media platforms, and domain experts collaborate. Working with people from various industries can bring fresh concepts and perspectives to better detection systems.
Beyond identifying fake news, this work could be extended to detect other damaging content, such as sexist remarks, propaganda, and online scams. Researchers may adapt similar AI methods so that they work in different environments to help protect information in digital spheres.
In short, this project has accomplished a great deal in enhancing the way AI finds fake news, but there is still much to learn and many inventive concepts to come. Further research is warranted to address the problems discovered and to build on the primary findings, in order to establish more robust, scalable, and ethically acceptable mechanisms for preventing the dissemination of forged information and safeguarding the reliability of online information ecosystems. There is much room for further development in this essential field of study as things transform.
nltk.download('stopwords')

# Fit the Keras tokenizer on the cleaned article text
tokenizer = Tokenizer()
tokenizer.fit_on_texts(df_str_text['text'].values)
vocab_size = len(tokenizer.word_index) + 1
print("Vocabulary Size :- ", vocab_size)

# Convert articles to integer sequences and pad to a fixed length
X = tokenizer.texts_to_sequences(df_str_text['text'].values)
max_length = 1000
X = pad_sequences(X, maxlen=max_length, padding='post')

# One-hot encode the labels
y = pd.get_dummies(df_str_text['label']).values
# Train SVM
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(X_train_numeric, ytrain)
# Evaluate SVM
svm_predictions = svm_classifier.predict(X_test_numeric)
svm_accuracy = accuracy_score(ytest, svm_predictions)
print("SVM Accuracy:", svm_accuracy)
# The opening 'if' of this block was lost at a page break; the condition
# below is a reconstruction: convert integer predictions to one-hot
# vectors when they are not already one-hot.
if naive_bayes_predictions.ndim == 1:
    naive_bayes_predictions_onehot = np.zeros((naive_bayes_predictions.size,
                                               naive_bayes_predictions.max() + 1))
    naive_bayes_predictions_onehot[np.arange(naive_bayes_predictions.size),
                                   naive_bayes_predictions] = 1
else:
    naive_bayes_predictions_onehot = naive_bayes_predictions
# LSTM model: frozen GloVe embeddings, three stacked LSTM layers,
# and a 2-unit output layer (one unit per class)
model = Sequential()
model.add(Embedding(vocab_size, 100, weights=[embedding_matrix],
                    input_length=max_length, trainable=False))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(16))
model.add(Dense(2, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
print(model.summary())
# Calculate accuracy
lstm_accuracy = accuracy_score(y_test_labels, lstm_predictions_labels)
print("LSTM Accuracy:", lstm_accuracy)
print(lstm_classification_report)
# Two hand-written sample articles: label 1 = unreliable, 0 = reliable
data = {
    'text': ["""In a groundbreaking discovery, a team of scientists has conclusively proven that the
Earth is, indeed, flat. After years of research and experimentation, the team has debunked the
centuries-old misconception that the Earth is a sphere. The findings have sent shockwaves through
the scientific community and have raised questions about the validity of previous space missions
and astronomical observations.""",
             """NASA has announced the discovery of a new exoplanet located in the habitable zone of
its host star, with conditions similar to those found on Earth. The exoplanet, named Kepler-452b,
is situated approximately 1,400 light-years away from our solar system. Scientists believe that
Kepler-452b could potentially harbor liquid water and support life, making it an exciting target
for future exploration and study. The discovery marks a significant milestone in our quest to find
life beyond our own planet."""],
    'label': [1, 0]
}
df = pd.DataFrame(data)

# SVM and Naive Bayes predictions on the two sample articles
testing = vectorizer.transform(data['text'])
df['svm_test'] = svm_classifier.predict(testing)
df['naive_test'] = naive_bayes_classifier.predict(testing)

# LSTM predictions: tokenize, pad, and take the argmax class
X = tokenizer.texts_to_sequences(data['text'])
X = pad_sequences(X, maxlen=max_length, padding='post')
labels = model.predict(X)
labels = np.argmax(labels, axis=1)
df['lstm_test'] = labels
df