DV Report 1
CHAPTER 1
INTRODUCTION
In today's digital age, the widespread use of online platforms and social media has led to an
unprecedented surge in the dissemination of information. However, amidst the vast sea of data, the
prevalence of fake news has become a significant concern. Fake news, characterized by misleading
or false information presented as genuine news, can have detrimental effects on individuals and
society and can even reshape political landscapes. To address this challenge, researchers and
technologists are turning to machine learning techniques for fake news detection.
Machine learning, a branch of artificial intelligence, empowers computers to learn patterns and make
predictions based on data. In the context of fake news detection, machine learning algorithms analyze
vast amounts of textual information, identifying subtle cues and patterns that distinguish authentic
news from deceptive content. These algorithms are trained on datasets containing both real and fake
news, enabling them to develop a discerning ability.
The process involves extracting features from news articles, such as language style, contextual
information, and historical data. These features serve as inputs to machine learning models, which
learn to recognize the intricate differences between trustworthy and fabricated news sources. As a
result, when presented with a new piece of information, these models can assess its authenticity,
providing a valuable tool in the ongoing battle against misinformation.
Fake news detection using machine learning is a dynamic and evolving field, with continuous
advancements aimed at enhancing the accuracy and efficiency of detection models. The ultimate goal
is to create robust systems that can swiftly and reliably identify fake news, thereby fostering a more
informed and resilient society in the digital age.
The importance of fake news detection can be summarized as follows:
1. Protecting Public Opinion: By identifying and flagging misinformation, fake news detection
contributes to safeguarding public opinion from being swayed by deceptive narratives.
2. Enhancing Trust in Media: Media organizations can use fake news detection to reinforce trust in
their content, demonstrating a commitment to delivering accurate and reliable information.
3. Preventing Spread of Misinformation: Rapid identification of fake news helps in preventing the
widespread dissemination of false information, reducing its potential impact on individuals and
society.
4. Supporting Fact-Checking Efforts: Fake news detection tools support fact-checking initiatives
by automating the process of verifying information, making it more efficient and scalable.
5. Mitigating Social Unrest: By curbing the influence of fake news, these detection systems play a
role in preventing the potential social unrest or conflicts that misinformation can trigger.
6. Strengthening Online Platforms: Social media and online platforms can deploy fake news
detection to create a safer and more reliable environment for users, fostering a healthier digital
community.
7. Educating Users: Fake news detection tools contribute to user education by raising awareness
about the prevalence of misinformation and encouraging critical thinking when consuming online
content.
8. Political Integrity: In the political realm, fake news detection aids in maintaining the integrity of
elections and political processes by identifying and rectifying false narratives.
9. Global Information Security: As misinformation often transcends borders, fake news detection
supports global information security by minimizing the impact of false narratives on an international
scale.
CHAPTER 2
SYSTEM ANALYSIS
Manually detecting fake news involves a careful examination of various elements within a news
article to assess its credibility. Fact-checkers and individuals look for red flags such as
sensationalized headlines, biased language, or the absence of reliable sources. They cross-reference
information with reputable news outlets and use critical thinking to identify inconsistencies or
improbable claims. Analyzing the author's expertise, checking for proper citations, and verifying the
publication date are common practices. Additionally, manual detection may involve considering the
overall tone of the article and being cautious of emotionally charged language. While technology and
automated systems play a role, manual detection relies on human judgment and a nuanced
understanding of journalistic standards to distinguish between accurate and misleading information.
4. Feature Extraction:
The code splits the data into training and testing sets and utilizes the TF-IDF (Term
Frequency-Inverse Document Frequency) vectorizer to convert the text data into numerical
features for machine learning models.
5. Model Training:
Implements three different machine learning models: Logistic Regression, Decision Tree
Classifier, and Random Forest Classifier.
Trains these models on the TF-IDF transformed training data.
6. Model Evaluation:
Evaluates the performance of each model using accuracy scores and generates confusion
matrices and classification reports.
7. Manual Testing:
Defines a function called manual_testing for predicting the label of a manually input news
article. It preprocesses the input and uses the trained models to predict whether the input is
fake or not.
8. User Interaction:
Prompts the user to input a news article for manual testing. A sketch of steps 7 and 8 follows this list.
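A minimal sketch of steps 7 and 8 is given below. The tiny inline corpus and the single Logistic Regression model stand in for the real training data and the three trained models of steps 4 to 6; the function name manual_testing matches the one described above, while the toy texts and simplified preprocessing are illustrative assumptions.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def wordopt(text):
    # Simplified preprocessing: lowercase and keep only letters and spaces.
    return re.sub(r"[^a-z\s]", "", text.lower())

# Toy stand-in for the TF-IDF training data of steps 4 and 5; 0 = fake, 1 = true.
texts = ["shocking miracle cure doctors hate this trick",
         "official report confirms steady economic growth",
         "aliens secretly control the world government",
         "parliament passes the new annual budget bill"]
labels = [0, 1, 0, 1]

vectorizer = TfidfVectorizer()
model = LogisticRegression().fit(vectorizer.fit_transform(wordopt(t) for t in texts), labels)

def manual_testing(news):
    # Step 7: preprocess the manually entered article and predict its label.
    prediction = model.predict(vectorizer.transform([wordopt(news)]))[0]
    return "Fake News" if prediction == 0 else "Real News"

# Step 8: prompt the user for an article to classify.
print(manual_testing(input("Enter a news article: ")))
```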
CHAPTER 3
SYSTEM REQUIREMENTS
3.2.1 HTML
HTML (Hyper Text Markup Language) is used for the creation of websites and web pages. Alongside
HTML, Cascading Style Sheets (CSS) are used to style web pages with fonts, colors, and animations,
and JavaScript is used for validation purposes. A web browser gets the HTML file from a web server
and renders it, so the page can be viewed in any type of browser. HTML is a tag-based language that
describes the structure of a web page.
3.2.2 CSS
Bootstrap is an open-source framework used to develop responsive web applications and designs.
Responsive means the application should run on smaller screens such as mobile phones and tablets:
elements of the HTML document stack as the page gets smaller or is minimized. By default, Bootstrap
divides the page width into 12 columns of equal size.
You can alter these defaults and build layouts and designs according to your requirements using the
framework's grid classes. Bootstrap provides a grid system for all kinds of devices (large, medium, and
small), which helps the app run on every device, and it further provides styled buttons, forms, tables,
and so on. Bootstrap 5.0.2 adds several features compared to previous versions. In this project
Bootstrap 5.0.2 is used for front-end development along with the Django framework.
Machine learning is a type of AI in which a system learns automatically from data, without explicit
programming by the user. (For example, if we feed a robot data, it will follow instructions and build
experience on its own; we do not need to instruct it each and every time.) Machine learning is
commonly divided into three types:
1. Supervised Learning: In this type, the algorithm is trained on a labeled dataset, where the input
data is paired with the corresponding correct output. The goal is for the algorithm to learn a
mapping from inputs to outputs, enabling it to make predictions or classifications on new, unseen
data.
2. Unsupervised Learning: Here, the algorithm is given unlabeled data and is tasked with finding
patterns or structures within it. Clustering and dimensionality reduction are common tasks in
unsupervised learning.
3. Reinforcement Learning: This type involves an agent that learns to make decisions by interacting
with an environment. The agent receives feedback in the form of rewards or penalties, allowing it
to learn optimal strategies over time. Reinforcement learning is often used in scenarios where an
agent must take sequential actions to achieve a goal.
Natural Language Processing (NLP) is a field of artificial intelligence that focuses on enabling
computers to understand and analyze human language. In the context of this code, NLP is employed to
distinguish between fake and genuine news articles. The process involves loading datasets, cleaning
text by removing specific strings, assigning labels, and, crucially, using the "wordopt" function to
preprocess the text: converting it to lowercase and removing elements such as special characters
and URLs. The data is then shuffled for effective machine learning model training. NLP techniques
further include splitting the text data into training and testing sets and utilizing a TF-IDF vectorizer to
convert textual information into numerical features. Three machine learning models—Logistic
Regression, Decision Tree Classifier, and Random Forest Classifier—are trained on the transformed
data to predict the authenticity of news articles. The models' accuracies are evaluated, and a manual
testing function allows inputting new articles for authenticity prediction. In summary, NLP plays a
pivotal role in teaching computers to understand and classify news content, enhancing their ability to
make accurate predictions about the authenticity of news articles based on trained machine learning
models.
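To make the "wordopt" step concrete, a minimal sketch of such a preprocessing function is shown below. The exact regular expressions in the project code may differ, so treat these as representative examples of the cleaning operations named above.

```python
import re
import string

def wordopt(text):
    # Normalize raw article text before vectorization.
    text = text.lower()                                    # convert to lowercase
    text = re.sub(r"https?://\S+|www\.\S+", "", text)      # remove URLs
    text = re.sub(r"<.*?>", "", text)                      # remove HTML tags
    text = re.sub(r"\[.*?\]", "", text)                    # remove bracketed text
    text = re.sub(r"[%s]" % re.escape(string.punctuation), "", text)  # special characters
    text = re.sub(r"\w*\d\w*", "", text)                   # words containing digits
    text = re.sub(r"\n", " ", text)                        # newlines
    return text

print(wordopt("BREAKING: Read more at https://example.com <b>now</b>!"))
# -> "breaking read more at  now"
```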
3.2.6 DJANGO
Django is a high-level web framework in Python, developed and maintained by the Django Software
Foundation (DSF). Nowadays Django is widely used because of its many built-in functionalities, and a
number of well-known companies and applications use Django for their development.
It supports templates and static files, which means you can easily render HTML pages by placing all
the HTML files in a directory called 'templates'; similarly, files related to styles, such as CSS and JS,
are placed inside a directory called 'static'. In this project Django is used for the front-end
development.
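As a small Python illustration of the templates mechanism, a hypothetical Django view and URL route might look like the following; the file and template names are assumptions, not taken from the project.

```python
# views.py (hypothetical app)
from django.shortcuts import render

def index(request):
    # Django searches the configured 'templates' directories for index.html.
    return render(request, "index.html")

# urls.py: map the site root to the view above.
from django.urls import path

urlpatterns = [path("", index)]
```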
3.2.7 PANDAS
Pandas is a powerful Python library for data manipulation and analysis. It provides data structures like
DataFrames for handling and analyzing structured data, making tasks such as cleaning, filtering, and
transforming data more efficient. Pandas is widely used in data science and machine learning for its
ease of use and flexibility.
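A brief, self-contained example of the kind of manipulation pandas makes easy; the column names here are invented for illustration.

```python
import pandas as pd

# A small DataFrame; in this project the data instead comes from CSV files.
df = pd.DataFrame({"text": ["story one", "a second story"], "label": [0, 1]})

true_only = df[df["label"] == 1]        # filtering rows by a condition
df["length"] = df["text"].str.len()     # transforming: derive a new column
print(df.head())
```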
3.2.8 SCIKIT-LEARN
Scikit-learn is a popular machine learning library in Python. It provides simple and efficient tools for
data analysis and modeling, including various machine learning algorithms for classification,
regression, clustering, and more. Scikit-learn is built on NumPy, SciPy, and Matplotlib, making it a
comprehensive and easy-to-use library for implementing machine learning workflows.
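The typical scikit-learn workflow is fit, predict, and score. The short example below uses the library's bundled iris dataset rather than the news data, purely to show the pattern.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # fit
predictions = model.predict(X_test)                              # predict
print("accuracy:", accuracy_score(y_test, predictions))          # score
```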
3.2.9 Matplotlib
Matplotlib is a popular Python library for creating visualizations such as charts, plots, and graphs. It
provides a flexible and user-friendly interface for generating a variety of static, animated, and
interactive plots. With Matplotlib, users can visualize data in a clear and meaningful way, making it
easier to understand trends, patterns, and relationships within the data. The library offers a wide range
of customization options, allowing users to tailor the appearance of their visualizations to suit specific
needs.
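For example, the logistic function used by the Logistic Regression model in Chapter 5 can be plotted in a few lines:

```python
import matplotlib.pyplot as plt
import numpy as np

z = np.linspace(-6, 6, 200)
sigma = 1 / (1 + np.exp(-z))   # the logistic (sigmoid) function

plt.plot(z, sigma)
plt.title("Logistic function")
plt.xlabel("z")
plt.ylabel("sigma(z)")
plt.show()
```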
3.2.10 Seaborn
Seaborn is a Python data visualization library built on top of Matplotlib. It provides a high-level
interface for drawing attractive and informative statistical graphics, such as distribution plots and
heatmaps, and it works closely with pandas DataFrames.
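Because the project reports confusion matrices, a natural use of Seaborn is a heatmap; the matrix values below are invented solely for illustration.

```python
import matplotlib.pyplot as plt
import seaborn as sns

# Hypothetical 2x2 confusion matrix (rows: actual class, columns: predicted class).
cm = [[4500, 120],
      [90, 4300]]

sns.heatmap(cm, annot=True, fmt="d",
            xticklabels=["fake", "true"], yticklabels=["fake", "true"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
```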
CHAPTER 4
PROBLEM STATEMENT
In today's digital age, fake news is a big problem. With so much information online, especially on
social media, it's hard to tell what's true and what's not. Fake news spreads quickly and can harm how
much we trust what we read online. This is a serious issue because it can affect our opinions on
important matters like politics, health, and how we get along with each other.
The problem is that current methods to spot fake news can't keep up with the tricky ways people
spread false information. We need new and smart solutions to catch fake news and stop it from causing
problems. This way, we can make sure the information we rely on is accurate, and everyone can be
better informed and connected.
In the digital era, the surge of fake news poses a substantial threat to the integrity of information
available online. With the overwhelming volume of content circulating on various platforms,
particularly social media, distinguishing authentic information from deceptive narratives has become
increasingly challenging. The rapid dissemination of false information through fake news not only
erodes trust in online information but also distorts public opinion on important matters such as
politics and health.
Current strategies employed to identify and combat fake news often fall short due to the evolving and
sophisticated tactics employed by purveyors of misinformation. As a result, there is a pressing need for
advanced and efficient solutions that can adapt to the dynamic landscape of misinformation.
Addressing this issue is crucial to safeguarding the credibility of information sources, fostering a more
discerning online community, and curbing the detrimental impact of fake news on societal trust and
decision-making processes. A comprehensive and accessible approach to fake news detection is
essential to ensure that individuals can navigate the digital realm with confidence in the authenticity of
the information they encounter.
CHAPTER 5
IMPLEMENTATION
In this project, we will develop and evaluate the performance and predictive power of a model trained
and tested on data collected from the dataset. Once we get a good fit, we will use this model to detect
fake news.
5.1 Dataset
This dataset is designed for the classification of news articles, specifically to discern whether the
content is genuine or fake. In terms of inputs (features) and outputs, the input is the text of a news
article and the output is a binary label indicating its authenticity (0 for fake, 1 for true).
5.2 Algorithm
Logistic Regression can be used to classify observations using different types of data and can easily
determine the most effective variables for the classification. The model is built on the logistic
(sigmoid) function, shown below.
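In place of the missing figure, the logistic function can be written out directly; it maps any real-valued input $z$ to the interval $(0, 1)$, which is read as a class probability:

$$\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad 0 < \sigma(z) < 1$$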
The major limitation of Logistic Regression is the assumption of linearity between the
dependent variable and the independent variables.
It can only be used to predict discrete outcomes; hence, the dependent variable of Logistic
Regression is restricted to a discrete set of values.
Decision Tree is a Supervised learning technique that can be used for both classification and
Regression problems, but mostly it is preferred for solving Classification problems. It is a tree-
structured classifier, where internal nodes represent the features of a dataset, branches
represent the decision rules and each leaf node represents the outcome.
In a Decision tree, there are two nodes, which are the Decision Node and Leaf Node. Decision
nodes are used to make any decision and have multiple branches, whereas Leaf nodes are the
output of those decisions and do not contain any further branches.
The decisions or the tests are performed on the basis of the features of the given dataset.
It is a graphical representation for getting all the possible solutions to a problem/decision based
on given conditions.
It is called a decision tree because, similar to a tree, it starts with the root node, which expands
on further branches and constructs a tree-like structure.
In order to build a tree, we use the CART algorithm, which stands for Classification and
Regression Tree algorithm.
A decision tree simply asks a question and, based on the answer (Yes/No), further splits the tree into
subtrees.
(Figure: the general structure of a decision tree, from the root node through decision nodes to leaf
nodes. A sketch that prints the decision rules of an example tree is given at the end of this section.)
Advantages of the Decision Tree
It is simple to understand, as it follows the same process that a human follows while making any
decision in real life.
It can be very useful for solving decision-related problems.
It helps to think about all the possible outcomes for a problem.
It requires less data cleaning compared to other algorithms.
Disadvantages of the Decision Tree
A decision tree contains many layers, which can make it complex, and it may suffer from overfitting,
which can be addressed using the Random Forest algorithm. Although random forest can be used for
both classification and regression tasks, it is less suitable for regression tasks.
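As referenced at the figure placeholder above, the sketch below fits a small CART tree with scikit-learn and prints its root node, decision rules, and leaf outcomes; the two features and the toy samples are invented for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy features per article: [word_count, exclamation_marks]; labels: 0 = fake, 1 = true.
X = [[120, 5], [300, 0], [80, 7], [450, 1], [60, 9], [500, 0]]
y = [0, 1, 0, 1, 0, 1]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["word_count", "exclamations"]))
```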
Data Loading: Two CSV files, "Fake.csv" and "True.csv," are loaded into pandas DataFrames
(df_fake and df_true).
Data Preprocessing:
Irrelevant information, such as "(Reuters)" in the "True.csv" text column, is removed.
Target labels (0 for fake, 1 for true) are added to the DataFrames.
Columns "title," "subject," and "date" are dropped from both DataFrames.
Text Cleaning:
A function wordopt is defined to clean the text data.
Text is converted to lowercase, and various patterns such as URLs, special characters,
and numbers are removed.
Text Vectorization:
The TfidfVectorizer is used to convert the text data into numerical vectors.
The dataset is split into training and testing sets.
Model Training:
Logistic Regression, Decision Tree, and Random Forest classifiers are trained using the
TF-IDF-transformed training data.
Model Evaluation:
Model accuracy is evaluated on the test set.
Confusion matrices and classification reports are generated for performance assessment.
A combined sketch of these steps is given below.
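Assuming Fake.csv and True.csv are present with the columns named above, the listed steps might be wired together as in the following sketch; the split ratio and other unstated details are assumptions.

```python
import re
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def wordopt(text):
    # Simplified cleaning; see the fuller sketch in Chapter 3.
    return re.sub(r"[^a-z\s]", "", text.lower())

# Data loading and labeling: 0 = fake, 1 = true.
df_fake = pd.read_csv("Fake.csv")
df_true = pd.read_csv("True.csv")
df_fake["class"] = 0
df_true["class"] = 1

# Data preprocessing: strip the "(Reuters)" tag and drop unused columns.
df_true["text"] = df_true["text"].str.replace(r"\(Reuters\)", "", regex=True)
df = pd.concat([df_fake, df_true]).drop(columns=["title", "subject", "date"])
df["text"] = df["text"].apply(wordopt)      # text cleaning
df = df.sample(frac=1, random_state=42)     # shuffle

# Text vectorization and train/test split.
x_train, x_test, y_train, y_test = train_test_split(df["text"], df["class"], test_size=0.25)
vectorizer = TfidfVectorizer()
xv_train = vectorizer.fit_transform(x_train)
xv_test = vectorizer.transform(x_test)

# Train and evaluate all three classifiers.
models = {"Logistic Regression": LogisticRegression(max_iter=1000),
          "Decision Tree": DecisionTreeClassifier(),
          "Random Forest": RandomForestClassifier()}
for name, model in models.items():
    model.fit(xv_train, y_train)
    pred = model.predict(xv_test)
    print(name, "accuracy:", accuracy_score(y_test, pred))
    print(confusion_matrix(y_test, pred))
    print(classification_report(y_test, pred))
```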
CHAPTER 6
SNAPSHOTS
6.1 Input page: Accepts the text (news) from the user and predicts whether the news is fake or not.
6.2 Accepting user input: Here the user provides the input to predict whether the news is fake
or not.
6.3: Output as Real News: Here the output is predicted as Real News based on the user input.
6.4: Output as Fake News: Here the output is predicted as Fake News based on the user input.
CONCLUSION
In conclusion, the fake news detection process involves thorough data loading, preprocessing,
integration, and exploration using pandas DataFrames, followed by text cleaning and vectorization
using the TfidfVectorizer. Three different models—Logistic Regression, Decision Tree, and Random
Forest—are trained on the transformed data, and their accuracy is evaluated on a test set, with
performance assessed through confusion matrices and classification reports. The manual_testing
function allows users to input custom news articles for predictions. This comprehensive approach
combines machine learning techniques with user interaction, enabling effective identification of fake
or true news articles. The integration of various steps, from data cleaning to model testing, creates a
robust framework for combating misinformation and promoting a more reliable news environment.
FUTURE ENHANCEMENT
Looking to the future, the ongoing battle against fake news calls for continuous
enhancements in technological solutions and interdisciplinary collaborations. Advancements in
artificial intelligence, particularly in the realms of machine learning and natural language processing,
hold significant promise. Future systems could employ more sophisticated algorithms capable of
recognizing subtle linguistic nuances, context, and evolving patterns of misinformation. Additionally,
the integration of advanced data analytics and deep learning models can further refine the accuracy and
efficiency of fake news detection. Collaborations between technology developers, media
organizations, and fact-checking initiatives will be crucial in refining algorithms and ensuring real-
time adaptation to emerging forms of misinformation. As our digital landscape evolves, a proactive,
collaborative approach will be essential to keeping fake news detection effective.