
SCHOOL OF ARCHITECTURE, COMPUTING AND ENGINEERING

Department of Engineering and Computing

Automatic AI for Detection of Fake News

A report submitted in part fulfilment of the degree of

BSc (Hons) in Your Programme

Supervisor: NITISH CHOORAMUN

CN6000

27 February 2025
Automatic AI for Detection of Fake News Student’s first and last name

Abstract
The objective of this project is to investigate how advanced artificial intelligence (AI) tools can
help detect fake news. The project's goal is to develop a robust and reliable system that can
distinguish real news from fake news. To this end, several machine learning and deep learning
models were implemented, including GPT, Naïve Bayes, Support Vector Machines (SVM), and
Long Short-Term Memory (LSTM) networks.

The project method consists of several key phases. First, the raw text undergoes a series of pre-
processing steps: HTML tags and stopwords are filtered out, and the text is split into tokens. This
ensures the written data is properly cleaned and ready for modelling. The data is then divided into
two groups, training and testing, and the models are fitted on the training data. To understand how
effectively the models work, evaluation metrics such as accuracy, precision, recall, and the
F1-score are applied. Stability tests also examine how well the AI system adapts to changes in the
data and its distribution.
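The pre-processing and evaluation steps described above can be sketched in a few lines. This is a minimal illustration using only Python's standard library; the tiny stopword list and the regex tokenizer are assumptions for demonstration, not the project's actual implementation (a real pipeline would likely use NLTK or scikit-learn):

```python
import re

# A tiny illustrative stopword list; a real system would use a full one
# (e.g. NLTK's). This list is an assumption, not the project's actual list.
STOPWORDS = {"the", "is", "a", "an", "and", "of", "to", "in"}

def preprocess(text):
    """Strip HTML tags, lowercase, tokenize, and remove stopwords."""
    text = re.sub(r"<[^>]+>", " ", text)           # drop HTML tags
    tokens = re.findall(r"[a-z']+", text.lower())  # simple word tokenizer
    return [t for t in tokens if t not in STOPWORDS]

def f1_score(y_true, y_pred):
    """Compute F1 for the positive (fake = 1) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(preprocess("<p>The markets fell sharply</p>"))  # ['markets', 'fell', 'sharply']
print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))           # 0.8
```

The F1-score is the harmonic mean of precision and recall, which is why it is preferred over raw accuracy when the real/fake classes are imbalanced.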

The outcomes of this project demonstrate that AI-based models make it easier to identify fake
news. The SVM, Naïve Bayes, and LSTM models all perform well, with the LSTM model
achieving the best accuracy and overall classification scores. This indicates that natural language
processing (NLP) models are well suited to this task, and that more advanced machine learning
and deep learning methods can reliably distinguish fake news from genuine reporting. This work
aims to curb the spread of fake news and to offer researchers fresh ideas for limiting its
circulation in online information systems.


Acknowledgments
I greatly appreciate the invaluable guidance of my supervisor, Dr. Sujit Biswas, who supported me
throughout the entire project. His constant encouragement and advice steered both the direction
and the execution of the study.

I am also grateful to my colleagues and peers, whose analytical input and insights were essential
in polishing this work, and to my college friends and professional contacts, who were an
important source of useful comments and suggestions. These partnerships played a pivotal role in
achieving the aim of this investigation.

Finally, I thank my family for their patience and tolerance; they gave me the energy this research
required and a supportive environment in which to carry it out.


Contents

Abstract..............................................................................................................................2

Acknowledgments..............................................................................................................3

Chapter 1: Introduction..................................................................................................7

1.1 Background.................................................................................................................. 7

1.2 Problem Statement...................................................................................................... 7

1.3 Research Aim and Objectives......................................................................................7

1.3.1 Research Objectives.......................................................................................7

1.4 Significance of the Study............................................................................................. 8

1.5 Methodological Overview............................................................................................. 8

1.6 Structure of Report...................................................................................................... 8

Chapter 2: Literature Review.......................................................................................10

2.1 Introduction................................................................................................................ 10

2.2 Formulating Research............................................................................................... 10

2.2.1 Fake News.................................................................................................... 10

2.2.2 Automatic AI for Detection.............................................................................10

2.2.3 The role of Artificial Intelligence (AI) in addressing the challenge of fake news....10

2.3 Historical Perspective................................................................................................ 10

2.3.1 Examination of historical developments in the field of fake news detection.. 10

2.4 Evolution of AI applications in the context of combating misinformation....................11

2.5 Current State of Fake News.......................................................................................12

2.5.1 Analysis of the current landscape of fake News and its Impact on Society...12

2.6 AI in Fake News Detection......................................................................................... 13

2.6.1 AI techniques applied to identify and combat fake news...............................13


2.7 Review of Machine Learning Algorithms, Natural Language Processing (NLP), and
other AI Methodologies in the Context of Fake News Detection...........................................14

2.8 Comparative Analysis................................................................................................ 16

2.8.1 Evaluation of strengths and limitations of different models and algorithms.. .16

2.9 Methodology Development........................................................................................16

2.9.1 Examination of existing datasets used in AI-based fake news detection research....16

2.10 Identification of Key Approaches, algorithms, and Techniques Used in the research
methodology.......................................................................................................................... 17

2.11 Innovations and improvements in the system............................................................18

2.12 Summary................................................................................................................... 18

2.13 Research Gaps.......................................................................................................... 19

Chapter 3: Project Methodology..................................................................................20

3.1 Data Collection and Preprocessing............................................................................20

3.1.1 Data Sourcing............................................................................................... 20

3.1.2 Data-Preprocessing.......................................................................................20

3.2 Proposed Methodology.............................................................................................. 22

3.3 Feature Engineering.................................................................................................. 22

3.4 Model Development................................................................................................... 22

3.4.1 Selection of Algorithms..................................................................................22

3.4.2 Training and Validation..................................................................................23

3.5 Model Evaluation....................................................................................................... 23

3.5.1 Evaluation Metrics......................................................................................... 23

3.6 Comparative Analysis................................................................................................ 23

3.7 Implementation and Testing.......................................................................................23

3.8 Ethical Considerations and Limitations......................................................................23

Chapter 4: Results/Findings/Outcomes.......................................................................25

4.1 Data Analysis and Visualization.................................................................................25

4.2 Pre-Processing the Text............................................................................................26

4.3 Supervised Learning Models.....................................................................................27

4.3.1 Support Vector Machines (SVM)...................................................................27

4.3.2 Naïve Bayes.................................................................................................. 28


4.4 LSTM Model.............................................................................................................. 29

4.5 Prediction of News by GPT........................................................................................ 31

Chapter 5: Evaluation..................................................................................................33

5.1 Product Evaluation..................................................................................................... 33

5.2 Process Evaluation.................................................................................................... 33

Chapter 6: Conclusion.................................................................................................35

Reference List..................................................................................................................37

Appendix A - Initial Project Proposal...............................................................................40

Appendix B - Final Project Proposal................................................................................41

Appendix C – Source Code.............................................................................................43

List of Figures
Figure 1: Timeline for the evolution of fake news............................................................................12
Figure 2: An AI and ML-based methodology for detecting fake news and disinformation.............14
Figure 3: ML framework for Fake news detection............................................................................15
Figure 4: Trending Techniques to Detect Fake News.......................................................................17
Figure 5: Block Diagram...................................................................................................................22


Chapter 1: Introduction

1.1 Background
The proliferation of fake news through digital media is highly damaging, and the problem is
experienced globally. "Fake news" is false information presented as genuine news in order to
deceive, manipulate, or change people's minds. The internet and social networks have made the
issue worse by accelerating the spread of false and misleading information around the world.
People need accurate information to think and act well; fake news can distort public policy and
election results, and can even fuel aggression in individuals.

Technology-based countermeasures are being explored, especially in artificial intelligence (AI),
to stop the spread of fake news quickly. With its computing power and continually evolving
algorithms, AI looks well placed to detect and curb fake news automatically. It is vital for our
society that technology and the media work together to keep information accurate.

1.2 Problem Statement


The dynamic and sophisticated nature of fake news poses significant challenges to traditional fact-
checking methods, which are often labour-intensive and time-consuming. Artificial intelligence
(AI)-powered systems may provide scalable, faster, and more effective ways to recognize and
remove deceptive content. Still, developing such AI systems raises several difficult problems,
such as understanding linguistic subtleties, context, and the adaptive strategies employed in
disinformation operations.

1.3 Research Aim and Objectives


This research project, titled "Automatic AI for Detection of Fake News," has as its main objective
the use of AI to create a system that can automatically identify fake news and stop its spread.
Techniques from AI, machine learning, and natural language processing (NLP) will be studied
and applied so that fake news can be detected. This requires extensive reading, data gathering,
and careful interpretation of the results. The overall goal of the project is to support the larger
effort of ensuring that digital information is accurate and dependable.

1.3.1 Research Objectives


• To develop and validate an AI model for detecting fake news using machine learning and
natural language processing techniques.

• To conduct a comprehensive literature review to establish a theoretical framework for AI
applications in fake news detection.

• To collect and preprocess relevant datasets for training and testing the AI model.

• To evaluate the effectiveness of the AI model in identifying fake news and analyse its
performance.

• To contribute to the field of digital media integrity by providing insights and
recommendations based on the research findings.


1.4 Significance of the Study


This study is important because it has the potential to improve how knowledge is currently used.
It matters in the real world and in the classroom alike, because it helps AI detect fake news more
effectively. Protecting the integrity of information is essential to avoid the harm that false
information can inflict on politics and public discourse. The knowledge and skills acquired in this
work can be applied in a wide range of situations and may in future lead to roles in journalism,
AI development, and data analysis.

1.5 Methodological Overview


This research combines qualitative and quantitative methodologies to achieve its objectives. The
main approach involves developing an AI model with machine learning (ML) and Natural
Language Processing (NLP) techniques. The process comprises several stages:

• Formulating Research Questions: Establishing clear research questions to guide the
investigation into AI applications in fake news detection.

• Literature Review and Theoretical Framework: Conducting a comprehensive literature
review to build a solid theoretical foundation for the research.

• Data Collection and Preprocessing: Gathering relevant datasets and applying
preprocessing techniques to prepare the data for analysis.

• Methodology Development: Crafting a detailed research methodology outlining the
procedures for investigating the research questions, including the AI algorithms and
approaches to be employed.
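To make the model-development stage concrete, the simplest of the classifiers named in the Abstract, Naïve Bayes, can be sketched in pure Python. The toy headlines and word-count features below are invented for illustration only; the actual project trains on real labelled datasets:

```python
import math
from collections import Counter

# Toy labelled headlines (invented for illustration): 1 = fake, 0 = real.
train = [
    ("shocking miracle cure doctors hate", 1),
    ("you will not believe this shocking trick", 1),
    ("parliament passes budget after debate", 0),
    ("central bank holds interest rates steady", 0),
]

def fit_naive_bayes(data):
    """Estimate log priors and per-class word log-likelihoods (Laplace smoothing)."""
    counts = {0: Counter(), 1: Counter()}
    docs = Counter()
    for text, label in data:
        docs[label] += 1
        counts[label].update(text.split())
    vocab = set(counts[0]) | set(counts[1])
    model = {"prior": {}, "lik": {}, "vocab": vocab}
    for c in (0, 1):
        model["prior"][c] = math.log(docs[c] / len(data))
        total = sum(counts[c].values())
        model["lik"][c] = {
            w: math.log((counts[c][w] + 1) / (total + len(vocab))) for w in vocab
        }
    return model

def predict(model, text):
    """Return the class with the highest posterior log-probability."""
    scores = {}
    for c in (0, 1):
        scores[c] = model["prior"][c] + sum(
            model["lik"][c][w] for w in text.split() if w in model["vocab"]
        )
    return max(scores, key=scores.get)

model = fit_naive_bayes(train)
print(predict(model, "shocking trick doctors hate"))  # 1 (fake)
print(predict(model, "bank passes budget"))           # 0 (real)
```

Working in log-space avoids numerical underflow when many word probabilities are multiplied, which is the standard trick behind practical Naïve Bayes implementations.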

1.6 Structure of Report


This dissertation is structured as follows:

• Introduction: Outlining the research background, problem statement, aims, objectives, and
significance.

• Literature Review: A detailed analysis of existing literature in the fields of AI, NLP, and
fake news detection.

• Methodology: Describing the research design, data collection methods, AI model
development, and analysis techniques.

• Results and Discussion: Presenting the findings of the research, discussing their
implications, and comparing them with existing literature.

• Conclusion and Recommendations: Summarizing the primary findings, evaluating the
impact of the research on the field, recognizing its limitations, and suggesting directions
for future inquiry.

• References: Listing all citations used in the dissertation.

• Appendices: Containing related supplementary material, such as data samples, code
snippets, and comprehensive tables.


Chapter 2: Literature Review

2.1 Introduction
The prevalence of misinformation on online platforms poses a significant threat to the
truthfulness of public conversation and the trustworthiness of information. This chapter provides
a detailed analysis of the effective and efficient use of artificial intelligence (AI) in an era of
widely broadcast misinformation. Its primary objective is to survey various AI and NLP
techniques and to determine how to integrate them coherently into a system capable of accurately
assessing the truth and reliability of news articles. Given the growing influence of digital
platforms on public opinion, it is crucial to develop robust AI-based approaches that can detect
misinformation easily and automatically.

2.2 Formulating Research

Definitions

2.2.1 Fake News


The phrase "fake news" has grown rapidly over time to cover a wide range of incorrect,
deceptive, or misleading content shared across digital channels and networks. Distinguishing
intentional fraud from unintentional mistakes is essential to the accuracy of research in this area,
and precise definitions provide a foundation for understanding the multifaceted nature of the fake
news phenomenon (Hamed, Ab Aziz, and Yaakub, 2023).

2.2.2 Automatic AI for Detection


Artificial intelligence (AI) is being used to spot fake news, which marks a major change in how
problems of false information are handled. AI programs apply sophisticated rules to understand
what they read, spot trends, and look for signs of fabrication. Building and deploying such
autonomous solutions requires machine learning, natural language processing (NLP), and data
analysis to work together (Buzea, Trausan-Matu, and Rebedea, 2022).

2.2.3 The role of Artificial Intelligence (AI) in addressing the challenge of fake news
There is no doubt that fake news needs to be addressed with artificial intelligence. AI extends
people's abilities by enabling quick recognition and classification of false or misleading
information, and it can go beyond detection to help stop such content from spreading. This
section discusses the different ways AI can be used, with a focus on how it can make digital
information ecosystems more reliable (Patil et al., 2024).

2.3 Historical Perspective


2.3.1 Examination of historical developments in the field of fake news detection.
The study by Sitaula et al. (2020) moves beyond finding fake news to assessing how truthful it is,
and it provides valuable historical context. In the past, researchers primarily focused on analysing
the content of fake news and studying how it spreads through networks and channels. Sitaula et
al. extend this approach by analysing news sources and authorship. The study shows that an
author's past links to fake news, and the number of co-authors, can help estimate how trustworthy
an article is when examined against public fake news data. This historical shift points towards
approaches that consider both content-focused factors and source-related signals of reliability.
The findings highlight the need for a deeper understanding of authorship and source consistency,
given the constantly changing nature of fake news, and suggest that our strategies for identifying
misinformation may need a complete overhaul.

The work of Gangireddy et al. (2020) on a graph-based methodology for unsupervised fake news
detection represents a significant landmark in the field. Much fake news detection relies on
supervised learning, which requires large, correctly labelled datasets. To address this constraint
on social media platforms, the study proposes an unsupervised method called GTUT. The method
applies graph-based techniques such as feature-vector learning, biclique identification, and label
spreading in three major stages that progressively extend the labelling process. Given the limited
availability of labelled historical data, the study offers a novel and efficient method for detection
without supervision. Empirical experiments demonstrated that GTUT surpasses existing
methodologies by a margin of over 10 percentage points in accuracy. The paper suggests several
paths for further research, including incorporating emotion signals, analysing social media
connections, and richer labelling inside the graph-based framework, all aimed at improving the
effectiveness and efficiency of unsupervised detection.
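The label-spreading idea at the heart of GTUT can be illustrated, very loosely, with a minimal propagation sketch over a toy article-similarity graph. The graph, seed labels, and majority-vote update below are invented for illustration and are not the paper's actual algorithm:

```python
# Toy article-similarity graph: edges connect articles judged similar.
# 1 = fake, 0 = real; only two seed articles start with labels.
edges = {
    "a1": ["a2", "a3"], "a2": ["a1", "a3"], "a3": ["a1", "a2", "a4"],
    "a4": ["a3", "a5"], "a5": ["a4", "a6"], "a6": ["a5"],
}
seeds = {"a1": 1, "a6": 0}

def spread_labels(edges, seeds, rounds=5):
    """Iteratively give each unlabelled node the majority label of its
    labelled neighbours (ties resolved towards 'fake' here, arbitrarily)."""
    labels = dict(seeds)
    for _ in range(rounds):
        for node, nbrs in edges.items():
            if node in labels:
                continue  # already-labelled nodes stay fixed
            votes = [labels[n] for n in nbrs if n in labels]
            if votes:
                labels[node] = 1 if sum(votes) / len(votes) >= 0.5 else 0
    return labels

result = spread_labels(edges, seeds)
print(result)  # every article ends up labelled
```

Starting from a tiny seed set and letting labels flow along similarity edges is what allows such methods to work when labelled historical data is scarce.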

2.4 Evolution of AI applications in the context of combating misinformation
The study by Ahmad et al. (2020) on fake news identification using ensemble machine learning
methods relates directly to the ongoing progress of AI applications against misinformation. The
study makes clear that the World Wide Web and social media have had a large impact on how
people share information across platforms. Because automatically assigning deceptive language
to the right category is difficult, the study uses a mixture of machine-learning methods whose
main objective is to combine many algorithms and features. It shows that ensemble learning
outperforms individual models on real-world datasets. The conclusion emphasizes the urgency of
further investigation into unresolved challenges in false news detection, and suggests several
areas for future research, including real-time detection in videos and identifying the critical
elements in the distribution of false information.


Figure 1: Timeline for the evolution of fake news

According to Zhou and Zafarani (2020), a detailed analysis of misinformation must consider both
its historical background and the growth of artificial intelligence systems designed to identify it.
Their analysis indicates that the rapid spread of false information is harming democracy, justice,
and public trust in government. Journalism, political science, computer science, and the social
sciences should collaborate on projects that span multiple fields, because the survey shows how
fake news can be detected from elements such as writing style, distribution method, and source
credibility. The conclusion highlights the survey's contribution to classifying fake news,
developing foundational theories, and identifying problems and areas that require further
investigation; it offers ideas and encourages collaboration towards systems that can both find
fake news and explain their decisions. In another paper, Choraś et al. (2021) present a full
mapping study of advanced machine learning methods for fake news detection, showing how far
AI has come in recent decades in the fight against fake news online. The paper examines fake
news from many eras and locations, with a primary emphasis on its current role in information
warfare, and notes how hard verification is even though false news drives important societal
problems. The study offers thorough guidance, drawing on expert works and reviews, and
stresses the use of intelligent systems to trace where false information originates. It recommends
closer examination of how misinformation spreads, broader educational interventions to promote
continuous learning, and greater transparency in the machine learning systems developed to
identify and respond to false news.

2.5 Current State of Fake News


2.5.1 Analysis of the current landscape of fake News and its Impact on Society
A paper by Meel and Vishwakarma (2020) investigated current concerns around information
pollution, internet use, false information, and falsehoods. The study shows how harmful,
dangerous, and unreliable content changes the lives, views, and outlook of billions of users in
varied and subtle ways. The article conducts an extensive analysis of containment technologies
and strategies for harmful information, proposing a taxonomy that classifies it into distinct
phases. The conclusion underscores the importance of nurturing collaboration among legislators,
researchers, and society as a whole to enhance the dependability and sustainability of the online
information ecosystem, and it highlights the risks that come with the spread of junk information.
The review concentrates on research gaps, mostly concerning how false information spreads
across platforms and languages and how networks change over time. These gaps give academics
crucial new insights and highlight areas that require further investigation.

The study by Zhang and Ghorbani (2020) examines the huge amount of fake news on the internet
and how it changes society, with the 2016 US presidential election as a prime example. The
authors admit that identifying fake news can be difficult because there is so much information
online, but they stress how important it is for people and technology to work together. The study
takes a close look at current methods of finding fake news, paying special attention to user,
content, and context features, and it identifies places where further study can improve detection
frameworks and datasets. The survey's main goals are to identify and categorize forms of
misinformation, evaluate methodologies for detecting it, and pinpoint specific areas that require
additional research to strengthen the online surveillance and detection systems designed to
counteract false news.

The investigation by Molina et al. (2021) analyses the changing definition of "fake news," which
goes beyond simple lies to cover a variety of online content types. Their seven-category system,
spanning satire, fabricated news, and amateur journalism among others, helps to show the bigger
picture. The paper classifies false news by its message, source, structure, and network traits,
which sharpens understanding of its nature. The authors stress the importance of considering the
objectives and methodology of machine learning, while noting limits on how much information
can be gathered and which subjects can be studied. They argue that computer science and the
social sciences should work together to improve fake news detection, and they call for more
statistical testing of features.

2.6 AI in Fake News Detection


2.6.1 AI techniques applied to identify and combat fake news.
A study by Gupta et al. (2022) explores the growing phenomenon of an "infodemic,"
characterized by the proliferation of incorrect information, and provides a clear account of its
various effects on society. Fake news, spread via social media and motivated by social, religious,
political, and economic prejudice, threatens law and order. The paper gives a thorough overview
of current AI-based detection systems and social network analysis while recognizing the complex
technological, psychological, and financial difficulties in the fight against false information.
Stakeholder interventions are examined, from government regulation to user awareness
campaigns. To strike a balance between user privacy concerns and national security, the authors
suggest in-app access to an independent news-verification service. They draw attention to the
continuing technological difficulty of verifying multimedia content and recommend combining
multiple methods until a complete solution is found.


Figure 2: An AI and ML-based methodology for detecting fake news and disinformation

The growing risks of fake news and disinformation (FNaD) penetrating social media and online
platforms, which can seriously affect decision-making and disrupt supply chains, are discussed by
Akhtar et al. (2023). The study draws attention to the paucity of research on creating FNaD-specific AI and ML models to reduce supply chain disruptions (SCDs). Drawing on a blend of artificial intelligence, machine learning, and case studies from Pakistan, Malaysia, and Indonesia, the authors propose an FNaD detection model intended to prevent SCDs. The approach shows
efficacy in managerial decision-making, utilizing a variety of data sources. The study adds to the
literature on supply chains and AI-ML. It provides useful insights and recommends future research
directions, emphasizing the need for a focus on particular FNaD and supply chain operations, the
integration of operational performance measures, and longitudinal studies to explore evolving
SCDs.

2.7 Review of Machine Learning Algorithms, Natural


Language Processing (NLP), and other AI
Methodologies in the Context of Fake News Detection
Prachi et al. (2022) explore the growing difficulty of differentiating between real and fraudulent
news spreading across the internet, particularly on social media. The study applies machine
learning, NLP, and deep learning approaches, including logistic regression, decision trees, SVM,
naïve Bayes, LSTM, and Bidirectional Encoder Representations from Transformers (BERT).
Several performance metrics were used to compare these models: the LSTM model reached 95%
accuracy, and the NLP-based BERT model reached 98%. The paper shows how feature extraction
methods can improve performance on harder classification tasks and multilingual use cases, and it
stresses the need for automated systems that can identify fake news quickly. Meesad (2021), in
turn, puts forward a detailed framework for verifying questionable information in Thailand as a
possible way to tackle the pervasive issue of false information. Machine learning, natural language
processing (NLP), and information retrieval are used effectively across the two critical phases of
model development and data acquisition. The study runs several machine learning models that
classify content from Thai online news sources, using web-crawler information extraction
techniques and natural language processing. LSTM, which achieves 100% test-set accuracy, recall,
precision, and F-measure, emerges as the best model. On completion of the research, a web
application that automatically identifies fake news online is to be launched. This shows that the
problem of fake news needs flexible solutions.

Figure 3: ML framework for Fake news detection


In their survey, Merryton and Augasta (2020) address the serious issue of fake news on social
media and show how machine learning, and especially deep learning, can be applied to it. The
authors draw attention to the growing difficulty of separating authentic communications from
phony ones, particularly during occasions such as general elections, when political parties use
social media to disseminate possibly false material widely. The survey explores a range of machine
learning techniques, comparing the effectiveness of deep learning, a subset of machine learning
loosely inspired by the human brain, with more conventional methods. The authors believe that
deep neural networks can outperform standard methods, especially for complicated applications
and large data volumes. The report summarises effective categorization techniques for identifying
fake news, highlighting the possible overlap between traditional machine learning techniques and
deep learning approaches.

2.8 Comparative Analysis


2.8.1 Evaluation of strengths and limitations of different models and algorithms.
A study by Seddari et al. (2022) addresses the rapidly growing problem of misinformation by
contributing to the vital field of social media fake news identification. Their hybrid technique
mixes knowledge-based and linguistic methods, with features such as sentiment analysis, word
count, title analysis, and fact-checking tools. The knowledge-based part draws on press coverage,
the website's reputation, and comments from reliable, verified sources. Notably, the system uses
only eight features, whereas more elaborate methods use far more. Machine learning techniques
such as Random Forest and Logistic Regression are used to train the proposed algorithm on the
Buzzfeed Political News dataset, where it performs very well, reaching 94.4% accuracy. The
authors suggest that future work examine style-based and visual elements to strengthen the
detection system against various types of fake information. The paper by Kaur et al. (2020), on the
other hand, addresses the increasing risk of fake news on the internet and highlights the need for
strong verification techniques, since incorrect information can circulate quickly on social media
platforms. Introducing a multi-level voting ensemble model, the study evaluates the performance of
twelve machine learning classifiers, including Logistic Regression, Linear Support Vector, and
Passive Aggressive, in conjunction with three feature extraction methods. Interestingly, by
aggregating over the cases that individual predictors get wrong, their ensemble outperforms each
predictor on its own. Tests on three datasets showed a clear improvement, and the model can be
applied directly to find fake content on social media. The authors outline their goals for future
research, which include building a web-based GUI for real-time classification and annotated
datasets for image-based false news identification. This research, which has the potential to curb
the spread of misleading information for the benefit of society, is funded by the Visvesvaraya PhD
Scheme.

2.9 Methodology Development


2.9.1 Examination of existing datasets used in AI-based fake news detection
research
A study by Sharma and Garg (2023) explores the issue of identifying bogus news, emphasizing the
dearth of comprehensive benchmark datasets and paying particular attention to news from India.
By merging textual and visual content from 2013 to 2021, the authors bridge a sizeable gap with
the launch of the IFND (Indian Fake News Dataset). The dataset has been carefully curated, and a
sophisticated augmentation technique together with latent Dirichlet allocation (LDA) is used for
topic modeling. They experiment with various multi-modal and machine learning (ML) classifiers
to demonstrate the dataset's usefulness. The work acknowledges the limitations of the IFND dataset
and the significance of social context information, but it validates the dataset as an accelerator for
fake news identification, making it a valuable resource for future research.


Figure 4: Trending Techniques to Detect Fake News

The comprehensive evaluation of existing literature known as a systematic literature review (SLR)
was conducted by Iqbal et al. (2023) to reveal the complicated interaction between Fake News
Detection (FND) and artificial intelligence (AI). It follows the "Preferred Reporting Items for
Systematic Reviews and Meta-Analyses" guidelines to examine 25 peer-reviewed studies. The
findings show that FND and AI are closely linked. People are well aware that receiving fake
information can hurt them, including posing risks to their health, so being able to tell reliable and
unreliable sources apart is now essential. Digital literacy, fact-checking websites, automated
technology, and big data analytics are all helpful assets as effective and efficient countermeasures
to false information. Moreover, beyond its theoretical contribution, this study also offers
managerial suggestions for IT specialists, legislators, and educators. As a result, it establishes a
crucial baseline for preventing the widespread circulation of false information on social media
platforms. Research by Merryton and Augasta (2020) investigates the importance of machine
learning in handling the universal issue of fake messages on social media. They stress how
important it is to be able to tell the difference between real and fake news, especially when it comes
to politics. The study surveys many types of machine learning, but deep learning receives the most
attention.
A substantial subset of machine learning, deep neural networks autonomously derive high-level
features from unprocessed data and thrive in intricate applications. The survey-style study offers
insights into the various approaches used in fake news detection research and emphasizes the
benefits of deep learning approaches over traditional ML techniques.

2.10 Identification of Key Approaches, algorithms, and


Techniques Used in the research methodology
More and more false information is being shared on social media sites, especially Facebook and
Twitter, as discussed by Setiawan et al. (2021), who emphasise the need for tools that can swiftly
identify such content. The study demonstrates the use of machine learning to enhance the
categorization of news articles, leveraging artificial intelligence (AI). Employing natural language
processing, it achieves impressive results, accurately predicting outcomes 91.23% of the time by
combining word-frequency features from text documents with a hybrid support vector machine.
The study also notes that more information, such as author details, is needed to better spot fake
news. The authors expect computational fact-checking models to be used in the future, so they
focus on knowledge-based approaches to improve accuracy and help users understand better.
Rohera et al. (2022) examine the widespread problem of fake news spreading on social media
platforms and highlight its negative effects. The researchers provide a taxonomy of current
methods for identifying fake news, with an emphasis on social media platforms including Twitter,
Facebook, WhatsApp, and Telegram. Using a self-aggregated dataset, the study trains four machine
learning models: LSTM, Random Forest (RF), the Passive Aggressive algorithm, and Naive Bayes
(NB). LSTM distinguishes real from fake news 92.34% of the time. The study advocates a hybrid
approach integrating NB and LSTM techniques to improve detection accuracy.

2.11 Innovations and improvements in the system


New ideas and better systems remain vital in fake news detection. Experts suggest that current
methods could be improved by incorporating more sophisticated language analysis to help systems
understand false information better. It is also necessary to integrate real-time user feedback so that
detection algorithms can adjust dynamically. For a thorough methodology, multimedia analysis is
recommended, encompassing the scrutiny of images and videos alongside textual content.
Explainable AI models have been proposed so that users can trust systems more and understand
them more easily. Together, these changes make systems more robust against the evolving tactics
used by spreaders of false information, meaning fake news can be identified more accurately and
with greater confidence.

2.12 Summary
As existing research shows, methods from different areas have made it much easier to spot fake
news. To find false information on digital platforms, academics use a mix of hybrid models,
machine learning, and natural language processing. Robust identification methods include LSTM,
SVM, and fact-verification components. Furthermore, the impact of geographical limitations is
alleviated by establishing reference datasets such as the Indian Fake News Dataset. Model accuracy
has grown, but the complexity of evolving news remains a problem. Innovations include user
feedback, real-time adjustment, and better multimedia analysis. The findings highlight how
important it is to keep developing detection methods to counteract fake news's constant evolution.

Reference | Model Used | Accuracy with Each Model | Dataset Used
Gangireddy et al. (2020) | GTUT (Graph-based Technique for Unsupervised Detection) | >10% accuracy gain over state-of-the-art | Real data
Prachi et al. (2022) | LSTM, BERT | LSTM: 95%, BERT: 98% | Not explicitly mentioned
Meesad (2021) | LSTM | 100% | Thai online news sources
Seddari et al. (2022) | Random Forest, Logistic Regression | 94.4% | Buzzfeed Political News dataset
Setiawan et al. (2021) | Hybrid Support Vector Machine | 91.23% | Not explicitly mentioned
Rohera et al. (2022) | LSTM, Random Forest, Passive Aggressive Algorithm, Naive Bayes | LSTM: 92.34% | Self-aggregated dataset

2.13 Research Gaps


Even though methods have improved, big research gaps remain in fake news detection. One critical
gap is the need for larger and more varied datasets, especially ones that represent complex cultural
contexts. Although ML models show potential, the lack of research on their interpretability and
transparency poses a risk to user confidence. The literature also raises concerns about how
inadequately current approaches deal with the changing nature of fake news: temporal
characteristics matter, and false stories evolve over time. To detect fake news more successfully,
future research should close these gaps through a comprehensive approach that considers cultural
idiosyncrasies, interpretable models, and change over time.


Chapter 3: Project Methodology


To mitigate the risk of spreading fake news on social media networks, it is important to separate
fake and real news using AI techniques. The primary goal of this research is to apply AI techniques
to develop an automated system capable of recognizing fake and legitimate news on social media.
This section of the report presents the methodology followed for building such a system. The
approach makes use of both NLP and ML techniques for classifying the two labels of news. The
methodology chapter covers data collection, preprocessing, model development, and evaluation,
culminating in a robust AI model that addresses the challenges of fake news detection.

3.1 Data Collection and Preprocessing


3.1.1 Data Sourcing
Collecting the dataset is the first requirement. A diverse dataset must include both fake and real
news articles. Publicly accessible sites will be used for dataset collection; the primary source
considered is Kaggle.

Fake and real news dataset: This dataset, available on Kaggle, contains separate files for fake and
real news articles. Given the project's requirements, this dataset was found suitable.

There are approximately 40,000 news articles in this dataset, sourced from Kaggle; the link is given
at the bottom of this page. The articles are separated into two groups: real news articles and fake
news articles. The dataset is used to train the proposed supervised machine learning models, after
which a held-out subset is used for performance evaluation with different metrics. Each article is
labelled accordingly, providing a clear distinction that assists in training the supervised models.

The dataset contains a sufficient number of instances for both labels, and this volume ensures that
models trained on it can learn the nuanced differences in language, style, and presentation that
typically distinguish factual information from misinformation or disinformation.

-------------------------------

https://www.kaggle.com/code/madz2000/nlp-using-glove-embeddings-99-87-accuracy

3.1.2 Data Preprocessing
Data preprocessing is a very important step in preparing the raw data for analysis. The steps below
are crucial in preparing text data for machine learning (ML) and natural language processing (NLP)
tasks, including applications such as fake news detection. Each step is explained in more detail
here:


1. Cleaning

Text data, especially data collected from the web, contains a lot of irrelevant material that can be
misleading or unhelpful for analysis. Cleaning involves removing these unnecessary parts so that
the machine learning model focuses only on meaningful content. It includes:

HTML tags: Since web pages are written in HTML, scraping content directly from them yields a
mix of content and HTML markup. HTML tags do not contribute to the text's meaning and are
therefore removed.

Advertisements: Ad content is mixed in with the real news content, which can skew the analysis.
It is important to remove ads to focus on the news text itself.

Non-textual elements: These include images, videos, and any embedded multimedia. Since the
focus here is textual analysis, these elements are removed.

2. Normalization

Normalization transforms text into a single canonical form, reducing complexity for NLP tasks.
This step includes:

Converting to lowercase: This ensures that identical words are recognized as the same regardless
of their position in a sentence or their usage, e.g., "The" and "the" are treated identically.

Removing punctuation and special characters: Punctuation marks and special characters
introduce extra complexity without contributing meaningfully to the text's meaning, so removing
them simplifies the data.

3. Tokenization

Tokenization is the process of splitting a text object into smaller units known as tokens. Examples
of tokens include words, characters, numbers, symbols, or n-grams. This stage is foundational for
text analysis, as it transforms a string of characters into a list of tokens that can be analysed
individually.

4. Stop Words Removal

Stop word removal is one of the most common preprocessing steps across NLP applications. The
idea is simply to remove words that occur commonly across all the documents in the corpus.
Articles and pronouns are typically classified as stop words, such as "the", "is", "at", "which", and
"on". Removing these words reduces the dataset size and improves processing speed. For fake
news detection, focusing on more meaningful words should improve the model's ability to learn
discriminative features.

5. Stemming and Lemmatization

Both stemming and lemmatization are techniques for reducing words to a base or root form, but in
slightly different ways:

Stemming: Stemming removes the last few characters of a word to obtain a shorter form, even if
that form has no meaning of its own. It is a crude heuristic process that chops the ends off words
based on common prefixes or suffixes found in inflected words, producing a reduced form called
the "stem".

Lemmatization: Lemmatization has the same purpose as stemming but overcomes its drawbacks.
It aims to remove inflectional endings only and to return the base or dictionary form of a word,
known as the "lemma". Lemmatization is more sophisticated, using a vocabulary and
morphological analysis, and therefore handles irregular words better.

These preprocessing steps are very important for reducing the complexity of the text data, focusing
on the most meaningful elements, and ultimately improving the performance of machine learning
models in tasks like fake news detection.
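As an illustrative sketch (not the project's exact code), the cleaning, normalization, tokenization, stop-word, and stemming steps above can be combined in a few lines of plain Python; the stop-word list and suffix rules here are toy stand-ins for NLTK's full resources:

```python
import re

# Toy stop-word subset; a real pipeline would use NLTK's English list.
STOP_WORDS = {"the", "is", "are", "at", "which", "on", "a", "an", "and"}

def crude_stem(word):
    """Very rough suffix-stripping stemmer (illustrative only)."""
    for suffix in ("ing", "ies", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(raw_html):
    text = re.sub(r"<[^>]+>", " ", raw_html)             # 1. strip HTML tags
    text = text.lower()                                  # 2a. lowercase
    text = re.sub(r"[^a-z0-9\s]", " ", text)             # 2b. drop punctuation
    tokens = text.split()                                # 3. tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]  # 4. stop-word removal
    return [crude_stem(t) for t in tokens]               # 5. stemming

print(preprocess("<p>The SHOCKING claims are spreading online!</p>"))
# → ['shock', 'claim', 'spread', 'online']
```

In practice NLTK's stopwords corpus, a Porter stemmer or WordNet lemmatizer, and BeautifulSoup for HTML stripping would replace the toy components above.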

3.2 Proposed Methodology

Figure 5: Block Diagram

3.3 Feature Engineering


Feature engineering is the process of extracting features (characteristics, properties, and attributes)
from raw data to support training a downstream statistical model. It includes stylistic elements
(e.g., use of sensationalist language), content features (e.g., subjectivity, sentiment), and metadata
(e.g., author reputation, source credibility). Advanced NLP techniques such as TF-IDF (Term
Frequency-Inverse Document Frequency) and word embeddings (e.g., Word2Vec, GloVe) are
employed to transform textual data into a format suitable for ML algorithms.

3.4 Model Development


3.4.1 Selection of Algorithms
A variety of machine learning (ML) and natural language processing (NLP) algorithms are
explored to identify the most effective combination for fake news detection. These include:

Supervised Learning Models: Logistic Regression, Support Vector Machines (SVM), and Naive
Bayes classifiers, well known for their effectiveness and efficiency in text classification tasks.

Deep Learning Models: Convolutional Neural Networks (CNNs) and Recurrent Neural Networks
(RNNs), with a focus on Long Short-Term Memory (LSTM) networks, to capture the sequential
nature of textual data.

Ensemble Methods: Models such as Random Forests and Gradient Boosting Machines, which
combine multiple learners to improve prediction accuracy.
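A minimal sketch of how such candidate models could be wired up with scikit-learn pipelines; the four labelled snippets are invented toy data, not the project dataset:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus: 1 = fake, 0 = real (for illustration only).
texts = [
    "miracle cure that doctors hate",
    "stock market closes higher today",
    "aliens secretly run the government",
    "parliament passes the new budget",
]
labels = [1, 0, 1, 0]

for clf in (MultinomialNB(), LinearSVC(), RandomForestClassifier(random_state=0)):
    model = make_pipeline(TfidfVectorizer(), clf)   # text -> TF-IDF -> classifier
    model.fit(texts, labels)
    print(type(clf).__name__, model.predict(["doctors hate this miracle cure"])[0])
```

The pipeline pattern keeps vectorization and classification together, so each candidate algorithm can be swapped in and compared under identical preprocessing.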

3.4.2 Training and Validation


The collected dataset will be divided into training, validation, and test sets. The training set is used
to fit a model, while the validation set guides hyperparameter tuning and model selection. Cross-
validation techniques are used to confirm the generalizability and robustness of the chosen models.
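One plausible way to realise this split (an 80/20 test split, then 75/25 train/validation, with 5-fold cross-validation), sketched here on synthetic features rather than the real corpus:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the extracted news features.
X, y = make_classification(n_samples=200, n_features=20, random_state=42)

# Hold out 20% for the final test set, then split the rest 75/25 train/validation.
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_rest, y_rest, test_size=0.25, random_state=42)

# 5-fold cross-validation on the training portion checks robustness.
scores = cross_val_score(LogisticRegression(max_iter=1000), X_train, y_train, cv=5)
print(len(X_train), len(X_val), len(X_test), round(scores.mean(), 2))
```

Note that the test set is carved out first so it never influences hyperparameter choices made on the validation folds.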

3.5 Model Evaluation


3.5.1 Evaluation Metrics
The performance of the AI models is measured using a range of metrics: accuracy, recall, precision,
F1-score, and the Receiver Operating Characteristic (ROC) curve. These metrics provide a deep
understanding of each model's ability to identify fake news correctly.
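All of these metrics are available in scikit-learn; a small illustration on made-up predictions (1 = fake, 0 = real):

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

# Made-up ground truth, hard predictions, and probability scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.4, 0.1, 0.7, 0.95, 0.3]  # P(fake), used for ROC AUC

print("accuracy :", accuracy_score(y_true, y_pred))   # fraction correct
print("precision:", precision_score(y_true, y_pred))  # of predicted fakes, how many were fake
print("recall   :", recall_score(y_true, y_pred))     # of actual fakes, how many were caught
print("f1-score :", f1_score(y_true, y_pred))         # harmonic mean of precision and recall
print("roc auc  :", roc_auc_score(y_true, y_prob))    # area under the ROC curve
```

Reporting precision and recall alongside accuracy matters here because the cost of letting fake news through (low recall) differs from the cost of flagging real news (low precision).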

3.6 Comparative Analysis


The developed model's performance will be benchmarked against existing models and state-of-the-
art approaches in fake news detection. This comparative analysis will highlight the strengths and
limitations of the proposed system, providing good insight into areas for further improvement.

3.7 Implementation and Testing


The final model will be deployed in a software application providing a user-friendly interface for
entering news articles and receiving predictions about their authenticity. The application will
undergo rigorous testing to ensure its reliability and usability in real-world scenarios.

3.8 Ethical Considerations and Limitations


Ethical considerations will be paramount throughout the research process, especially regarding data
privacy and the possible consequences of misclassification. The limitations of the proposed
methodology, including potential biases in the training datasets and the challenge of adapting to
evolving disinformation strategies, will be acknowledged and addressed.

This methodology provides a well-structured approach to developing an AI-driven system for the
detection of fake news. By merging advanced machine learning (ML) and natural language
processing (NLP) techniques with a complete evaluation framework, the proposed system's main
objective is to meaningfully improve the ability to identify and curb the spread of half-truths.
Future work will focus on refining the model through continuous learning and adaptation to new
forms of fake news, ensuring the system remains effective in the ever-evolving digital landscape.


Chapter 4: Results/Findings/Outcomes
This chapter presents the study's results and findings on marking news stories as reliable or
unreliable. Several types of data analysis and machine learning models are employed to give an
impartial assessment of the methods and the efficacy of each model. The first part of the chapter
summarises the characteristics of the dataset, then discusses the findings in more detail by
presenting and examining the data. This is followed by testing three supervised learning
approaches to see how effectively they classify news stories: Long Short-Term Memory (LSTM)
neural networks, Support Vector Machines (SVM), and Naïve Bayes. The strengths and
weaknesses of each model are discussed in detail, and evaluation metrics provide useful
information. This helps move text classification methods forward in the domain of news
categorization.

4.1 Data Analysis and Visualization


The data was examined and visualized at the beginning of the study to learn more about its
features. Different illustrations were developed to help readers understand the distribution of
labels, word counts, and author contributions.

1. Distribution of Labels: A histogram was used to display how the labels in the data were
distributed. The two labels, 0 and 1, were spread out fairly evenly, with an adequate number
of articles in both the "reliable" and "unreliable" categories, about 10,000 articles in each.

2. Word Count Distribution: An illustration showed how article word counts were
distributed. Articles with more than 2,500 words were almost nonexistent, and the number
of stories dropped sharply above 1,000 words. This means the dataset contained many short
articles.


3. Author Analysis: A bar chart shows the top 10 authors by number of articles written. The
most prolific author was "Pam Key," followed by "admin" and "Jerome Hudson,"
indicating that these individuals contributed a large share of the data.

4. Text Length vs. Label: A box plot compared the distribution of text lengths for the two
labels. Reliable articles (label "0") had a narrow spread of low word counts with few
outliers, whereas unreliable articles (label "1") had a somewhat higher average word count,
with a few very high counts.

4.2 Pre-Processing the Text


Text pre-processing was an essential phase in preparing the text data for the models. It covered
removal of HTML tags and stopwords, as well as tokenization. The NLTK toolkit made it easy to
obtain and apply the English stopwords list, and the text was cleaned by removing these stopwords.
The objective was to remove noise while sharpening the features for later analysis. BeautifulSoup
was also employed to strip HTML tags from the text, leaving only the meaningful content for
further processing.

Tokenizing the text data was essential so that machine learning algorithms could interpret it as
numeric codes. The Tokenizer class from the Keras library was fitted on the text data, yielding a
vocabulary of 237,927 words. Sequences were padded or trimmed so that all inputs were the same
size, with 1,000 tokens as the maximum length. Pre-trained GloVe word embeddings also made it
easy to bring in useful information learned from a large body of text: the GloVe Twitter dataset
was loaded, containing 1,193,514 word vectors with 100-dimensional embeddings. These pre-
processing steps transformed the text data for the subsequent modelling and analysis, providing a
strong foundation for machine learning models that can identify fake news reliably.
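The idea behind Keras's Tokenizer and pad_sequences can be sketched in plain Python (a simplified reimplementation, not the library itself): words are ranked by frequency, mapped to integer ids starting at 1 (0 is reserved for padding), and every sequence is padded or truncated to a fixed length:

```python
from collections import Counter

texts = ["fake news spreads fast", "real news spreads slowly"]
MAX_LEN = 6   # the project used 1000; shortened here for readability

# Rank words by frequency; ids start at 1 because 0 is the padding id.
counts = Counter(w for t in texts for w in t.split())
word_index = {w: i + 1 for i, (w, _) in enumerate(counts.most_common())}

def to_padded_ids(text):
    ids = [word_index[w] for w in text.split()][:MAX_LEN]  # truncate if too long
    return ids + [0] * (MAX_LEN - len(ids))                # post-pad with zeros

print(word_index)
print(to_padded_ids("fake news spreads fast"))
```

Each padded id sequence is then the integer input an embedding layer expects, with the GloVe matrix supplying the vector for each id.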

4.3 Supervised Learning Models


4.3.1 Support Vector Machines (SVM)

Model Training and Evaluation

The SVM algorithm did an outstanding job of classifying articles, as shown by its 94.13%
accuracy on the test data. The number of errors was approximately the same for both classes,
according to the confusion matrix and classification report.

According to the confusion matrix, the SVM model correctly classified 3212 fake news articles
and 3238 real news articles. Errors were balanced between the two classes: it wrongly classified
214 real articles as fake and 188 fake articles as real.
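As a quick arithmetic check, the reported 94.13% accuracy follows directly from these four confusion-matrix counts:

```python
# Counts taken from the SVM confusion matrix reported above.
correct_fake, correct_real = 3212, 3238   # articles classified correctly
real_as_fake, fake_as_real = 214, 188     # misclassifications

total = correct_fake + correct_real + real_as_fake + fake_as_real
accuracy = (correct_fake + correct_real) / total
print(round(accuracy * 100, 2))   # → 94.13
```

The same calculation reproduces the Naïve Bayes (87.41%) and LSTM (96.51%) figures reported later from their respective confusion matrices.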

Visualization of Performance

The confusion matrix was rendered as a heatmap, making it easier to see how well the model
performed. The heatmap, which showed correct and incorrect classifications of real and fake news
articles, highlighted the balanced error distribution.


4.3.2 Naïve Bayes


Model Training and Evaluation

On the test data, the Naïve Bayes classifier distinguished real from fake articles with 87.41%
accuracy. Despite a higher error rate than SVM, Naïve Bayes still classified articles well.

According to the confusion matrix, Naïve Bayes correctly categorized 3091 fake news articles and
2898 real news articles. However, it showed a higher error rate, misclassifying 335 real articles as
fake and 528 fake articles as real.


Visualization of Performance

A heatmap of the Naïve Bayes confusion matrix was produced in the same way as for SVM. The
heatmap made it easy to see which labels were misclassified and where accuracy was lost.

In conclusion, both the SVM and Naïve Bayes classifiers demonstrated strong article classification
capabilities, with SVM showing slightly better accuracy and a more evenly distributed error. Naïve
Bayes still distinguished fake from real news articles despite its higher error rate. These findings
show how useful machine learning algorithms are for assessing the veracity of news articles and
classifying them accordingly.

4.4 LSTM Model


Model Architecture and Training

The LSTM neural network, with its multi-layer architecture, performed extremely well, accurately
classifying 96.51% of the test data. By outperforming both the SVM and Naïve Bayes algorithms,
it demonstrated just how effective the LSTM model is at text classification.


The model consists of an embedding layer, three LSTM layers, and a dense output layer for
classification. It uses pre-trained word embeddings to represent words in a continuous vector
space, and the stacked LSTM layers pick up sequential dependencies in the raw text, allowing the
model to capture long-term dependencies effectively.

The LSTM model was trained for 50 epochs, improving steadily over time, using the Adam
optimizer to minimize a binary cross-entropy loss function. A summary of the model's layers and
parameters in the report shows how it was assembled.

Performance Evaluation

The LSTM model performed remarkably well, correctly identifying 3318 fake news articles and
3295 real news articles. It achieved 96.51% accuracy, with high precision, recall, and F1-score for
both classes.


The confusion matrix showed only a handful of misclassifications, further underlining the model's
strength: just 108 fake articles were labelled as real and 131 real articles were labelled as fake.

The heatmap of the confusion matrix made the model's performance easy to read: darker shades
marked correct labels, lighter shades incorrect ones. The LSTM excels at text classification
because it is accurate and makes very few errors; in particular, it is highly effective at separating
fake from real news articles.
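The per-class metrics described as high above can be derived from these counts. As a sketch, here are the fake-class precision, recall, and F1 implied by the matrix:

```python
# Fake-class precision, recall, and F1 implied by the LSTM confusion
# matrix reported above.
tp = 3318   # fake articles correctly flagged as fake
fn = 108    # fake articles misclassified as real
fp = 131    # real articles misclassified as fake
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
# All three land around 0.96, consistent with the 96.51% accuracy.
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```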

4.5 Prediction of News by GPT


In the final phase of the study, GPT was used to generate two sample news articles, and the
models' predictions on them were compared against the labels assigned to the articles to gauge
overall performance.

The first story claimed that the Earth is flat, framed as a major discovery but labelled as
unreliable (label "1"). Notably, all three models, SVM, Naïve Bayes, and LSTM, agreed with this
label and flagged the story as unreliable.


The second story, labelled as reliable (label "0"), covered NASA's discovery of a new exoplanet
that might hold liquid water and support life. All three models again produced matching
predictions, correctly marking the article as reliable.

The SVM, Naïve Bayes, and LSTM models all predicted these labels correctly, showing how well
they separate trustworthy articles from untrustworthy ones. This consistency indicates how
reliably the models classify news articles and suggests they could help curb the global spread of
fake news and misinformation.


Chapter 5: Evaluation

5.1 Product Evaluation


Accuracy Assessment

The AI-based fake news detection system was tested extensively, and all of the models produced
strong results: the Support Vector Machine (SVM) reached 94.13% accuracy, Naïve Bayes
87.41%, and the LSTM 96.51%, while the predictions on the GPT-generated articles were
consistently correct. These high accuracy rates show that the models can distinguish real from
fake news articles and could therefore help fight misinformation.

Performance Metrics

Beyond accuracy, precision, recall, and F1-score were used to evaluate the models fully. All
models performed reasonably well, but the LSTM stood out with the highest precision, recall, and
F1-score. This shows that it not only classifies articles correctly but also keeps errors low, making
the fake news detection system more accurate overall.
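These metrics can be produced with scikit-learn's `classification_report`, as in the appendix code. A minimal, self-contained illustration on invented labels (not the project's real predictions) might look like:

```python
from sklearn.metrics import classification_report

# Toy labels purely for illustration: 0 = real, 1 = fake.
y_true = [0, 0, 1, 1, 1, 0, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]

# Prints per-class precision, recall, F1 and support, plus averages.
print(classification_report(y_true, y_pred, target_names=['real', 'fake']))
```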

Robustness Testing

The AI system's robustness was evaluated by running the models on different datasets and
settings. Their accuracy remained high across datasets, showing they can handle changes in how
the data is distributed and what it contains. A sensitivity analysis, which examined how model
performance changed when specific variables were varied, further confirmed the system's
robustness: results were consistent under all conditions.

Comparative Analysis with Baseline Models

A comparison was carried out between baseline models and the proposed AI models. The AI-
based models outperformed the baselines on accuracy, precision, recall, and F1-score,
demonstrating that advanced machine learning and deep learning methods are more effective at
detecting fake news and could make the problems caused by lies and misinformation easier to
handle.

5.2 Process Evaluation


Methodological Review

Careful attention was paid to the steps needed to build an AI system that can detect fake news:
data preparation, model selection, training, testing, and practical deployment. Making the system
run as a smooth, reliable pipeline took considerable planning, and this structured strategy is itself
a way of tackling the hard task of separating fake news from true news.

Challenges Faced


Developing and validating the AI system was complicated by several difficulties: labelling the
data was a major challenge, accuracy was critical, the models needed repeated refinement, and
computational resources were limited. These issues were addressed with data-augmentation
techniques, more advanced model architectures, and optimization algorithms. As these examples
show, it is crucial to stay flexible and imaginative when problems arise.

Lessons Learned

The project yielded useful lessons for building AI systems that detect fake news. Selecting the
right model and checking the data for correctness took substantial effort, and the work also
showed the need to keep iterating and improving so the system continues to catch fake news as it
evolves. Flexibility and openness to new ideas matter in a field that is constantly developing.

Future Directions and Improvements

Future work holds considerable promise. Adding real-time monitoring could help spot new
patterns of misinformation quickly, and ensemble machine learning techniques could improve
model performance. Mechanisms for user feedback would make the model easier to understand
and operate. Collaborating with domain experts and other stakeholders creates an environment
that supports continued progress against fake news, so digital information will be better
safeguarded over time.


Chapter 6: Conclusion
This project's primary objective was to show how useful advanced AI is in stopping the spread of
fake news, an essential task in today's information landscape. A range of machine learning and
deep learning algorithms was carefully developed and tested: Support Vector Machines (SVM)
and Naïve Bayes classifiers at the simpler end, and Long Short-Term Memory (LSTM) networks
and GPT-based models at the more complex end. These models proved able to distinguish real
from fake news, which is evidence that AI can separate truth from falsehood with high precision.

Key Findings

 The LSTM, SVM, and Naïve Bayes models all performed well, achieving strong accuracy,
precision, recall, and F1-scores. The LSTM model proved the strongest, demonstrating how
well recurrent neural networks capture sequential structure in text.
 The predictions on the GPT-generated articles were accurate and consistent, showing that
natural language processing (NLP) models can be useful for fake news detection and
underlining the value of advanced language models trained on vast quantities of text.
 Robustness tests showed that the AI system handles different types of data and material,
which is essential for practical applications, since news articles vary widely in style, subject,
and source.

Limitations

 Although the AI models are highly accurate now, they may still struggle with sophisticated
forms of deception and misinformation that evolve over time; fake news is hard to catch
precisely because it keeps changing.
 Because the models are trained on labelled datasets, they can struggle with data they have not
encountered before, which may introduce error and bias. Removing these biases and ensuring
the models are fair is essential for ethical use.
 AI-based fake news detection systems may see limited adoption because they are hard to
scale and demand substantial computing power, particularly on large platforms that must
process huge volumes of data concurrently.

Future Opportunities

 Researchers could explore ensemble learning methods to make fake news detection models
even more accurate and useful; ensemble techniques may improve classification accuracy and
reliability by combining evidence from multiple models.
 AI systems could operate more effectively and transparently with real-time monitoring
resources and channels for user feedback, and could surface emerging patterns of possible
fake news; such refinements could make the system run more smoothly in the future.
 Fighting fake and false information works best when journalists, social media platforms, and
domain experts collaborate; working with people from various industries can bring fresh
concepts and perspectives to detection systems.
 Beyond identifying fake news, this work could be extended to other damaging content, such
as sexist remarks, propaganda, and online scams. Researchers may adapt similar AI methods
to different environments to help protect digital information spaces.
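The ensemble-learning direction suggested above could, in scikit-learn, look something like the following sketch. The toy corpus, labels, and estimator mix are invented purely for illustration and are not part of the project's actual pipeline:

```python
from sklearn.ensemble import VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Invented toy corpus: 0 = real, 1 = fake.
texts = [
    "nasa confirms water on newly found exoplanet",
    "scientists prove the earth is actually flat",
    "government report shows steady economic growth",
    "miracle pill cures every disease overnight",
]
labels = [0, 1, 0, 1]

# Hard majority vote over three classifiers, sharing one TF-IDF step.
ensemble = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[
            ("nb", MultinomialNB()),
            ("svm", SVC(kernel="linear")),
            ("lr", LogisticRegression()),
        ],
        voting="hard",
    ),
)
ensemble.fit(texts, labels)
print(ensemble.predict(["new planet may hold liquid water"]))
```

With `voting="hard"` each estimator casts one vote per article and the majority label wins, so no estimator needs to expose calibrated probabilities.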

In short, this project has done much to advance AI-based fake news detection, but much remains
to be learned and invented. Further research is warranted to address the problems identified and
to build on the main findings, in order to establish more robust, scalable, and ethically acceptable
mechanisms for preventing the dissemination of forged information and safeguarding the
reliability of online information ecosystems. As the landscape changes, there is ample room for
further progress in this essential field of study.



Appendix C – Source Code


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import plotly.express as px
import nltk
from bs4 import BeautifulSoup
from nltk.corpus import stopwords
from numpy import asarray, zeros
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
from keras.preprocessing.text import Tokenizer
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Dense, Embedding, LSTM

from google.colab import drive


drive.mount('/content/drive')

# Load the dataset


data = pd.read_csv('/content/drive/MyDrive/train.csv')

# Graph 1: Distribution of Labels


label_distribution = data['label'].value_counts()
fig1 = px.bar(x=label_distribution.index, y=label_distribution.values, labels={'x':'Label',
'y':'Count'}, title='Distribution of Labels')
fig1.show()

# Graph 3: Word Count Distribution


word_counts = data['text'].apply(lambda x: len(str(x).split()))
fig3 = px.histogram(x=word_counts, nbins=50, title='Word Count Distribution')
fig3.show()


# Graph 4: Author Analysis


author_distribution = data['author'].value_counts().head(10)
fig4 = px.bar(x=author_distribution.index, y=author_distribution.values, labels={'x':'Author',
'y':'Count'}, title='Top 10 Authors')
fig4.show()

# Graph 5: Text Length vs. Label


fig5 = px.box(data, x='label', y=word_counts, title='Text Length Distribution by Label',
labels={'label':'Label', 'y':'Text Length'})
fig5.show()

nltk.download('stopwords')

# Keep only rows whose text field is a string; copy to avoid SettingWithCopyWarning
df_str_text = data[data['text'].apply(lambda x: isinstance(x, str))].copy()

# Strip any HTML markup from the article text
df_str_text['text'] = df_str_text['text'].apply(lambda x: BeautifulSoup(x, 'html.parser').get_text())

# Remove English stop words
stop_words = set(stopwords.words('english'))
df_str_text['text'] = df_str_text['text'].apply(
    lambda x: ' '.join(word for word in x.split() if word.lower() not in stop_words))

tokenizer = Tokenizer()
tokenizer.fit_on_texts(df_str_text['text'].values)
vocab_size = len(tokenizer.word_index) + 1
print("Vocabulary Size :- ",vocab_size)
X = tokenizer.texts_to_sequences(df_str_text['text'].values)

max_length = 1000
# Padding
X = pad_sequences(X,maxlen = max_length, padding = 'post')
y = pd.get_dummies(df_str_text['label']).values

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33,random_state=53)


print(X_train.shape,y_train.shape)
print(X_test.shape,y_test.shape)

# Load the whole pre-trained GloVe embedding into memory

embeddings_index = dict()
f = open('/content/drive/MyDrive/glove.twitter.27B.100d.txt')
for line in f:
    values = line.split()
    word = values[0]
    coefs = asarray(values[1:], dtype='float32')
    embeddings_index[word] = coefs
f.close()
print('Loaded %s word vectors.' % len(embeddings_index))

# Build the embedding matrix for the tokenizer vocabulary
embedding_matrix = zeros((vocab_size, 100))
for word, i in tokenizer.word_index.items():
    embedding_vector = embeddings_index.get(word)
    if embedding_vector is not None:
        embedding_matrix[i] = embedding_vector


xtrain, xtest, ytrain, ytest = train_test_split(df_str_text['text'], df_str_text['label'],
                                                test_size=0.33, random_state=53)

# Train TF-IDF Vectorizer
vectorizer = TfidfVectorizer(max_features=1000)
X_train_numeric = vectorizer.fit_transform(xtrain)
X_test_numeric = vectorizer.transform(xtest)

# Train SVM
svm_classifier = SVC(kernel='linear')
svm_classifier.fit(X_train_numeric, ytrain)

# Evaluate SVM
svm_predictions = svm_classifier.predict(X_test_numeric)
svm_accuracy = accuracy_score(ytest, svm_predictions)
print("SVM Accuracy:", svm_accuracy)

# If svm_predictions is not one-hot encoded, convert it to one-hot encoding

if len(svm_predictions.shape) == 1:
    svm_predictions_onehot = np.zeros((svm_predictions.size, svm_predictions.max() + 1))
    svm_predictions_onehot[np.arange(svm_predictions.size), svm_predictions] = 1
else:
    svm_predictions_onehot = svm_predictions

# Compute confusion matrix and classification report


svm_confusion_matrix = confusion_matrix(y_test.argmax(axis=1),
svm_predictions_onehot.argmax(axis=1))
svm_classification_report = classification_report(y_test.argmax(axis=1),
svm_predictions_onehot.argmax(axis=1))
print("\nSVM Classification Report:")
print(svm_classification_report)

# Plot confusion matrix as a heatmap


plt.figure(figsize=(8, 6))
sns.heatmap(svm_confusion_matrix, annot=True, fmt='d', cmap='Blues',
xticklabels=['Fake', 'Real'], yticklabels=['Fake', 'Real'])
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('SVM Confusion Matrix')
plt.show()

# Train Naive Bayes


naive_bayes_classifier = MultinomialNB()
naive_bayes_classifier.fit(X_train_numeric, ytrain)

# Evaluate Naive Bayes


naive_bayes_predictions = naive_bayes_classifier.predict(X_test_numeric)
naive_bayes_accuracy = accuracy_score(ytest, naive_bayes_predictions)
print("\nNaive Bayes Accuracy:", naive_bayes_accuracy)

# If naive_bayes_predictions is not one-hot encoded, convert it to one-hot encoding

if len(naive_bayes_predictions.shape) == 1:
    naive_bayes_predictions_onehot = np.zeros((naive_bayes_predictions.size,
                                               naive_bayes_predictions.max() + 1))
    naive_bayes_predictions_onehot[np.arange(naive_bayes_predictions.size),
                                   naive_bayes_predictions] = 1
else:
    naive_bayes_predictions_onehot = naive_bayes_predictions

# Compute confusion matrix and classification report


naive_bayes_confusion_matrix = confusion_matrix(y_test.argmax(axis=1),
naive_bayes_predictions_onehot.argmax(axis=1))
naive_bayes_classification_report = classification_report(y_test.argmax(axis=1),
naive_bayes_predictions_onehot.argmax(axis=1))

print("\nNaive Bayes Confusion Matrix:")


print(naive_bayes_confusion_matrix)
print("\nNaive Bayes Classification Report:")
print(naive_bayes_classification_report)

# Plot confusion matrix as a heatmap


plt.figure(figsize=(8, 6))
sns.heatmap(naive_bayes_confusion_matrix, annot=True, fmt='d', cmap='Blues',
xticklabels=['Fake', 'Real'], yticklabels=['Fake', 'Real'])
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('Naive Bayes Confusion Matrix')
plt.show()

model = Sequential()
model.add(Embedding(vocab_size, 100, weights=[embedding_matrix],
input_length=max_length, trainable=False))
model.add(LSTM(128, return_sequences=True))
model.add(LSTM(64, return_sequences=True))
model.add(LSTM(16))
model.add(Dense(2, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
print(model.summary())

model.fit(X_train, y_train, epochs=50, verbose=1)

# Make predictions on the test data


lstm_predictions = model.predict(X_test)

lstm_predictions_labels = np.argmax(lstm_predictions, axis=1)


y_test_labels = np.argmax(y_test, axis=1)

# Calculate accuracy
lstm_accuracy = accuracy_score(y_test_labels, lstm_predictions_labels)
print("LSTM Accuracy:", lstm_accuracy)

# Generate classification report


lstm_classification_report = classification_report(y_test_labels, lstm_predictions_labels)
print("\nLSTM Classification Report:")


print(lstm_classification_report)

# Compute confusion matrix


lstm_confusion_matrix = confusion_matrix(y_test_labels, lstm_predictions_labels)
print("\nLSTM Confusion Matrix:")
print(lstm_confusion_matrix)

# Plot confusion matrix as a heatmap


plt.figure(figsize=(8, 6))
sns.heatmap(lstm_confusion_matrix, annot=True, fmt='d', cmap='Blues',
xticklabels=['Fake', 'Real'], yticklabels=['Fake', 'Real'])
plt.xlabel('Predicted Labels')
plt.ylabel('True Labels')
plt.title('LSTM Confusion Matrix')
plt.show()

Prediction of news articles generated by GPT

data = {
'text': ["""In a groundbreaking discovery, a team of scientists has conclusively proven that the
Earth is, indeed, flat.
After years of research and experimentation, the team has debunked the centuries-old
misconception
that the Earth is a sphere. The findings have sent shockwaves through the scientific community
and have raised questions about the validity of previous space missions and astronomical
observations.""","""NASA has announced the discovery of a new exoplanet located in the
habitable zone of its host star,
with conditions similar to those found on Earth. The exoplanet, named Kepler-452b, is situated
approximately 1,400 light-years away from our solar system. Scientists believe that Kepler-452b
could potentially harbor liquid water and support life, making it an exciting target for future
exploration and study. The discovery marks a significant milestone in our quest to find
life beyond our own planet."""],
'label': [1, 0]
}

df = pd.DataFrame(data)

testing = vectorizer.transform(data['text'])
df['svm_test'] = svm_classifier.predict(testing)
df['naive_test'] = naive_bayes_classifier.predict(testing)

X = tokenizer.texts_to_sequences(data['text'])
X = pad_sequences(X, maxlen=max_length, padding='post')

labels = model.predict(X)
labels = np.argmax(labels, axis=1)
df['lstm_test'] = labels

df
