
Fake News Detector

A PROJECT REPORT

Submitted by

Nikhil Sanjeev Thakare (21BCS1999)

Jatin Kumar (21BCS1949)

Nizamudin (21BCS1833)

in partial fulfilment for the award of the degree of

BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING

Chandigarh University

November 2023
BONAFIDE CERTIFICATE

Certified that this project report "Fake News Detection" is the bonafide work of
Nikhil Sanjeev Thakare, Jatin Kumar and Nizamudin, who carried out the
project work under my/our supervision.

SIGNATURE SIGNATURE

Er. Kirat Kaur


SUPERVISOR

HEAD OF THE DEPARTMENT ASSISTANT PROFESSOR

B.E. CSE DEPARTMENT B.E. CSE DEPARTMENT

Submitted for the project viva-voce examination held on __________.

INTERNAL EXAMINER EXTERNAL EXAMINER


TABLE OF CONTENTS
List of Figures ............................................................................................................................i

List of Tables ............................................................................................................................ ii

List of Standards ...................................................................................................................... iii

CHAPTER 1. INTRODUCTION.................................................................. 07
1.1. Identification of Client/ Need/ Relevant Contemporary issue ....................................... 07

1.2. Identification of Problem ............................................................................................... 07

1.3. Identification of Tasks .................................................................................................... 07

1.4. Timeline ......................................................................................................................... 08

1.5. Organization of the Report ............................................................................................. 09

CHAPTER 2. DESIGN FLOW/PROCESS……………………........... 11


2.1. Evaluation & Selection of Specifications/Features………………………………….…..11

2.2. Design Constraints………………………………………………………………………11

2.3. Analysis of Features and Finalization Subject to Constraints…………………………….12

2.4. Design Flow……………………………………………………………………………..13

2.5. Design selection…………………………………………………………………………14

2.6. Implementation plan/methodology………………………………………………….…..14

CHAPTER 3. RESULTS ANALYSIS AND VALIDATION………………16


3.1. Tools, Libraries and the Network Used…………………………………………….……16

3.2 Network Visualization of Dolphins Social Network……………………………………..16

3.3 Descriptive Analysis……………………………………………………………………..17

3.4 Centrality Measures………………………………………………………………………17

3.5 Connectivity Analysis……………………………………………………………………19

3.6 Community Detection and Visualization………………………………………………...22

CHAPTER 4. CONCLUSION AND FUTURE WORK ............................ 25


4.1. Conclusion……………………………………………………………………………….25

4.2. Future work……………………………………………………………………………...26

USER MANUAL .......................................................................................... 18


List of Figures
Figure 2.1: Flowchart for Implementation Plan……………………………………………..15

Figure 3.1: Importing Necessary Libraries………………………………………………….16

Figure 3.2: Importing Dolphins Network……………………………………………………16

Figure 3.3: Visualizing Dolphins Social Network…………………………………………..16

Figure 3.4: Descriptive Analysis of the Dolphins Social Network………………………….17

Figure 3.5: Degree Centrality of top five nodes with the highest degree centrality………..17

Figure 3.6: Betweenness Centrality of top five nodes with the highest betweenness
centrality……………………………………………………………………………………...17

Figure 3.7: Eigenvector Centrality of top five nodes with the highest eigenvector
centrality……………………………………………………………………………………...18

Figure 3.8: Closeness Centrality of top five nodes with the highest closeness centrality…..18

Figure 3.9: Load Centrality of top five nodes with the highest load centrality……………..18

Figure 3.10: Most Influential Nodes and their respective Centrality Measures……………..18

Figure 3.11: Visualizing the Most Influential Nodes and their respective Centrality
Measures……………………………………………………………………………………...19

Figure 3.12: Connected Components in the Dolphins Social Network……………………..19

Figure 3.13: List of Isolated Nodes (Dolphins) in the Dolphins Social Network…………...20

Figure 3.14: List of Articulation Points in the Dolphins Social Network…………………...20

Figure 3.15: Code Snippet to Remove Articulation Points from a Network………………..20

Figure 3.16: Recalculated Connected Components in the New Dolphins Social Network…21

Figure 3.17: Recalculated List of Isolated Nodes in the New Dolphins Social Network…...21

Figure 3.18: Re-Visualizing the New Dolphins Social Network……………………………21

Figure 3.19: Detected Communities based on Louvain Algorithm…………………………22

Figure 3.20: Visualizing the Detected Communities based on Louvain Algorithm………...22

Figure 3.21: Detected Communities based on Girvan-Newman Algorithm………………...23

Figure 3.22: Visualizing the Detected Communities based on Girvan-Newman Algorithm..23

Figure 3.23: Detected Communities based on Label Propagation Algorithm………………23


Figure 3.24: Visualizing the Detected Communities based on Label Propagation
Algorithm…………………………………………………………………………………….24

List of Tables
Table 3.1: Table Depicting Various Centrality Measures of Most Influential Nodes……….19
ABSTRACT

The proliferation of fake news has become a critical issue in the digital age, posing
significant threats to society, democracy, and public discourse. The Fake News Detector is an
innovative tool designed to combat this problem by leveraging advanced natural language
processing (NLP) and machine learning techniques. This abstract provides an overview of the
Fake News Detector's key features, functionality, and the underlying technology that powers
its effectiveness.

The Fake News Detector is an automated system capable of analyzing textual content from
various sources, including news articles, social media posts, and websites, to determine the
likelihood of the information being factually accurate or misleading. Its core functionality
includes the following:

Text Analysis: The system employs NLP algorithms to assess the linguistic and semantic
features of the content, including grammar, sentiment, and language style. It evaluates the
coherence and structure of the text to identify suspicious patterns.

Source Credibility: It assesses the credibility of the sources, examining the publication's
history and reliability to gauge the likelihood of bias or fabrication.

Fact-Checking: The Fake News Detector cross-references the information with a vast database of factual knowledge to check for accuracy and inconsistencies. It uses fact-checking databases and verified sources to validate claims.

Social Media Monitoring: It tracks the dissemination of information on social media platforms, identifying the spread of potentially false information and its origins.

Machine Learning Model: The system utilizes a trained machine learning model that has been
fed with a vast dataset of known fake and real news articles, enabling it to make predictions
based on patterns and features observed in the text.

Real-Time Updates: The Fake News Detector is continually updated to adapt to evolving
disinformation tactics, staying at the forefront of fake news detection.

User Interface: The tool provides a user-friendly interface for users to submit articles or links
for analysis and view the results in a clear and easily understandable format.

By combining these elements, the Fake News Detector offers a reliable solution for
identifying and flagging potential instances of fake news. It provides users with a confidence
score, indicating the likelihood of the content being accurate or deceptive. This tool can be
used by individuals, news organizations, and social media platforms to combat the spread of
false information and promote more informed and responsible online communication.
CHAPTER 1:
INTRODUCTION

1.1. Identification of Client /Need / Relevant Contemporary issue


Identification of Client

Potential clients for a fake news detector include:

 Individuals: Consumers of news and information who want to be able to identify and
avoid fake news.

 Organizations: Businesses, governments, and other organizations that need to protect themselves from the negative consequences of fake news, such as reputational damage, financial loss, and operational disruption.

 Media outlets: News organizations that want to be able to identify and correct fake
news stories that may have been published on their websites or in their print editions.

 Social media platforms: Social media companies that want to be able to reduce the
spread of fake news on their platforms.

Identification of Need

The need for a fake news detector is driven by the following factors:

 The rise of social media: Social media has made it easier than ever for people to
create and share news and information, including fake news.
 The increasing sophistication of fake news: Fake news creators are becoming
increasingly sophisticated in their methods, making it more difficult to distinguish
between real and fake news.
 The negative consequences of fake news: Fake news can have a number of negative
consequences, including:
o Misinforming the public
o Damaging reputations
o Sowing discord and division
o Influencing elections
o Inciting violence

1.2. Identification of Problem


Fake news detectors are still under development, and there are a number of problems that
they need to overcome in order to be effective. These problems include:
 Lack of labeled data: Fake news detectors rely on machine learning models, which require large amounts of labeled data. However, it can be difficult and expensive to label fake news data, and there is a shortage of publicly available labeled data.

 Sophistication of fake news creators: Fake news creators are becoming increasingly
sophisticated in their methods, making it more difficult for fake news detectors to
identify fake news stories.

 Bias: Fake news detectors may be biased towards certain types of fake news or
towards certain political viewpoints.

 Interpretability: It can be difficult to interpret the results of fake news detectors, making it difficult to understand why a particular story has been flagged as fake news.

1.3. Identification of Tasks


 Feature extraction: Fake news detectors extract features from news articles and other types of text that are relevant to identifying fake news. These features may be linguistic, stylistic, semantic, or social in nature.

 Feature selection: Fake news detectors select the most informative features from the
extracted features. This is done to reduce the dimensionality of the data and to
improve the performance of the fake news detector.

 Classification: Fake news detectors use a machine learning algorithm to classify news
articles and other types of text as real or fake.
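The three tasks above can be sketched end to end in a few lines of Python. The specific signals (exclamation marks, a handful of sensational terms) and the threshold rule standing in for a trained classifier are illustrative assumptions, not the report's implementation:

```python
import re
from collections import Counter

def extract_features(text):
    # Feature extraction: simple lexical signals (illustrative only)
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    return {
        "num_tokens": len(tokens),
        "exclamations": text.count("!"),
        "all_caps_words": len(re.findall(r"\b[A-Z]{3,}\b", text)),
        "sensational_terms": sum(counts[w] for w in ("shocking", "miracle", "exposed")),
    }

def select_features(features, keep):
    # Feature selection: keep only the columns judged most informative
    return {name: features[name] for name in keep}

def classify(features, threshold=2):
    # Classification: a threshold rule standing in for a trained ML model
    score = features["exclamations"] + features["sensational_terms"]
    return "fake" if score >= threshold else "real"

label = classify(select_features(
    extract_features("SHOCKING miracle cure exposed!!!"),
    ["exclamations", "sensational_terms"],
))
```

A real detector would replace the threshold rule with a model trained on labeled articles.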

1.4. Timeline
The project timeline is as follows:

Project Initiation and Planning:

 Define project scope, objectives, and requirements.


 Research and select the development stack, including Python and the NetworkX
library.
 Create a project plan with milestones and deadlines.
 Set up the development environment.
 Time Period: 01/10/2023 - 15/10/2023

Development Process:

 Design the network data handling and analysis components.


 Implement data loading, descriptive analysis, centrality measures, connectivity
analysis, community detection, and visualization.
 Conduct thorough testing and debugging.
 Optimize performance and security.
 Time Period: 16/10/2023 - 30/10/2023

Documentation and User Testing:

 Create user documentation and guidelines for using the tool.


 Conduct user testing and gather feedback for final improvements.
 Ensure the application is user-friendly and meets the needs of various users.
 Time Period: 01/11/2023 - 15/11/2023

1.5. Organization of the Report


The project report is structured into distinct chapters to provide a clear and comprehensive
overview of the project. Each chapter serves a specific purpose and contributes to the
understanding of the project's scope, progress, and findings. The following is an outline of the
organization of the report:

I. INTRODUCTION: This chapter outlines the foundational aspects of the project, including
the identification of the client or target audience, the need for the project, the relevant
contemporary issue addressed, the problem definition, and the tasks to be accomplished. It
also presents the project's timeline and work division among the team members. The final
sub-chapter explains how the report is organized to guide readers through the content.

II. DESIGN FLOW/PROCESS: This chapter focuses on the design flow and process adopted
for the project. It elaborates on the methodology and approach used in designing the network
analysis tool. Various design constraints and considerations are discussed to provide insight
into the project's development process.

III. RESULT ANALYSIS/VALIDATION: In this chapter, the findings of the project are
thoroughly analyzed and validated. The key themes and trends that have emerged during the
course of the project are highlighted, and evidence is presented to support these findings. This
chapter offers valuable insights into the potential of the network analysis tool to transform
various industries and domains.

IV. CONCLUSION AND FUTURE WORK: This section serves as the conclusion of the
project report. It summarizes the key findings, accomplishments, and challenges encountered
during the project. Additionally, it discusses the future scope of the project and offers
recommendations for future research and practical applications of similar network analysis
tools.
V. REFERENCES: The final chapter is dedicated to providing a comprehensive list of all the
sources and references cited throughout the report. This section is essential as it enables
readers to access the sources used in the paper, facilitating further exploration and research.

By following this structured organization, the project report aims to provide a clear and
informative narrative of the project's development, findings, and potential impact.
CHAPTER 2:
DESIGN FLOW/PROCESS

2.1 Evaluation & Selection of Specifications/Features


The evaluation and selection of specifications and features in fake news detectors is a
complex and challenging task. There are a number of factors to consider, including the
following:

 Accuracy: How well does the fake news detector identify real and fake news articles?

 Robustness: How resistant is the fake news detector to adversarial attacks?

 Explainability: Can the fake news detector explain why it classified a particular article
as real or fake?

 Efficiency: How quickly and efficiently can the fake news detector classify news
articles?

 Scalability: Can the fake news detector be scaled to handle large volumes of news
articles?

 Deployability: Can the fake news detector be easily deployed and used by a wide
range of users?

2.2 Design Constraints


Fake news detectors are subject to a number of design constraints, including the
following:

 Availability of data: Fake news detectors need to be trained on a large and representative dataset of real and fake news articles. However, it can be difficult and expensive to collect and label fake news data.

 Sophistication of fake news creators: Fake news creators are becoming increasingly
sophisticated in their methods, making it more difficult for fake news detectors to
identify fake news stories.

 Bias: Fake news detectors may be biased towards certain types of fake news or
towards certain political viewpoints.

 Interpretability: It can be difficult to interpret the results of fake news detectors, making it difficult to understand why a particular story has been flagged as fake news.

 Cost: Developing and deploying a fake news detector can be expensive.


 Privacy: Fake news detectors need to protect the privacy of users. For example, they
should not collect or store personal data without the consent of the user.

2.3 Analysis and Feature Finalization Subject to Constraints


Analyzing and finalizing the features of a fake news detector subject to constraints is a
crucial step in developing an effective and practical system for identifying false information.
Here is a breakdown of the process:

1. Data Collection and Preprocessing:

 Gather a diverse dataset of news articles with labels indicating whether they are real or fake.

 Preprocess the text data, including tokenization, stop-word removal, and stemming/lemmatization.

2. Feature Selection:

 Conduct an initial feature selection process to identify a wide range of potential features, including textual, semantic, and contextual elements.

 Features could include the frequency of specific words or phrases, sentiment analysis, source credibility, writing style, and more.

3. Feature Analysis:

 Analyze the selected features to understand their importance and relevance in distinguishing real from fake news.

 Use techniques like correlation analysis, mutual information, or feature importance scores from machine learning models.

4. Constraint Identification:

 Define constraints based on practical considerations, such as computational resources, model complexity, and the need for real-time processing.

 Ensure that the selected features align with these constraints.

5. Dimensionality Reduction:

 Apply dimensionality reduction techniques such as Principal Component Analysis (PCA) or feature selection algorithms to reduce the number of features, especially if computational constraints exist.
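The preprocessing described in step 1 can be sketched in pure Python. The stop-word list here is deliberately abbreviated, and the suffix-stripping is a crude stand-in for a real stemmer such as Porter's:

```python
import re

# Abbreviated stop-word list; real pipelines use a much longer one
STOP_WORDS = {"the", "a", "an", "is", "are", "of", "to", "and", "in"}

def preprocess(text):
    # Tokenization: lowercase and split on non-letter characters
    tokens = re.findall(r"[a-z]+", text.lower())
    # Stop-word removal
    tokens = [t for t in tokens if t not in STOP_WORDS]
    # Crude suffix stemming (stand-in for a real stemmer)
    stems = []
    for t in tokens:
        for suffix in ("ing", "ed", "s"):
            if t.endswith(suffix) and len(t) > len(suffix) + 2:
                t = t[: -len(suffix)]
                break
        stems.append(t)
    return stems

cleaned = preprocess("The markets are crashing and stocks dropped")
# The crude stemmer over-strips "dropped" to "dropp"; a real stemmer handles this
```

The resulting token stems would then feed the feature selection and analysis steps above.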

2.4 Design Flow


The design flow in fake news detectors typically consists of the following steps:
1. Data collection and labeling: The first step is to collect a dataset of real and fake news
articles. This dataset should be as large and representative as possible. Once the
dataset has been collected, it needs to be labeled, meaning that each article needs to be
identified as real or fake. This can be done manually by human experts or using
automated methods.

2. Feature extraction: Once the dataset has been labeled, the next step is to extract
features from the articles. These features can be linguistic, stylistic, semantic, or
social in nature. For example, linguistic features could include the use of certain
keywords or phrases, while stylistic features could include the length and complexity
of sentences. Semantic features could represent the topics covered in the article, and
social features could represent the number of shares and likes the article has received.

3. Feature selection: Once the features have been extracted, the next step is to select a
subset of features that are most informative and useful for distinguishing between real
and fake news articles. This can be done using a variety of feature selection
techniques, such as recursive feature elimination (RFE) or Lasso.

4. Model training: Once the features have been selected, the next step is to train a
machine learning model to classify news articles as real or fake. There are a variety of
machine learning algorithms that can be used for this task, such as logistic
regression, support vector machines (SVMs), and decision trees.

5. Model evaluation: Once the model has been trained, it needs to be evaluated on a
held-out test set of labeled news articles. This is done to assess the performance of the
model and to identify any areas where it needs to be improved.

6. Model deployment: Once the model has been evaluated and is performing well, it can
be deployed to production. This means making the model available to users so that
they can classify news articles as real or fake.
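The flow above (collect and label, extract features, train, evaluate) can be sketched end to end with a minimal Laplace-smoothed Naive Bayes classifier. The toy articles below are invented for illustration and are far smaller than any usable dataset:

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())

def train(examples):
    # examples: list of (text, label) pairs; counts words per label plus label priors
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts

def predict(model, text):
    word_counts, label_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(label_counts.values())
    best_label, best_logprob = None, float("-inf")
    for label in label_counts:
        logprob = math.log(label_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            # Laplace smoothing keeps unseen words from zeroing the probability
            logprob += math.log((word_counts[label][w] + 1) / denom)
        if logprob > best_logprob:
            best_label, best_logprob = label, logprob
    return best_label

# Toy labeled dataset (invented for illustration)
training_set = [
    ("shocking miracle cure discovered", "fake"),
    ("celebrity scandal shocking secret", "fake"),
    ("council approves the annual budget", "real"),
    ("parliament passes the education bill", "real"),
]
model = train(training_set)

# Held-out evaluation on articles the model has not seen
test_set = [("shocking miracle claims", "fake"), ("council passes new bill", "real")]
accuracy = sum(predict(model, t) == y for t, y in test_set) / len(test_set)
```

Deployment would then wrap `predict` behind an interface users can submit articles to.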

2.5 Design Selection


There are a number of different design options that can be used in fake news
detectors. Some of the most common options include:

 Rule-based systems: Rule-based systems use a set of handcrafted rules to classify news articles as real or fake. These rules are typically based on features such as the source of the article, the author of the article, and the content of the article.

 Machine learning systems: Machine learning systems use a machine learning model to classify news articles as real or fake. The model is trained on a dataset of labeled news articles.

 Hybrid systems: Hybrid systems combine rule-based and machine learning systems. These systems typically use the rule-based system to identify low-hanging fruit, such as news articles from known fake news websites. The machine learning system is then used to classify the remaining news articles.
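A hybrid arrangement of this kind can be sketched in a few lines. The domain blocklist and the stand-in model below are hypothetical, purely to show the rule-first, model-second control flow:

```python
# Hypothetical blocklist of known fake-news domains (illustrative only)
KNOWN_FAKE_DOMAINS = {"daily-hoax.example", "totally-real-news.example"}

def hybrid_classify(article, ml_model):
    """article: dict with 'source_domain' and 'text'; ml_model: callable text -> label."""
    # Rule stage: articles from known fake-news sites are low-hanging fruit
    if article["source_domain"] in KNOWN_FAKE_DOMAINS:
        return "fake"
    # ML stage: everything else goes to the trained classifier
    return ml_model(article["text"])

# Stand-in for a trained model: flags text containing an exclamation mark
toy_model = lambda text: "fake" if "!" in text else "real"

label = hybrid_classify(
    {"source_domain": "daily-hoax.example", "text": "..."}, toy_model
)
```

The rule stage is cheap and precise; the ML stage supplies coverage for everything the rules miss.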

Design Selection Process

The design selection process for a fake news detector should involve the following
steps:

1. Identify the requirements: The first step is to identify the specific requirements of the
fake news detector. This includes identifying the target users, the types of news
articles that need to be classified, and the desired level of accuracy.

2. Evaluate the design options: The next step is to evaluate the different design options available. This should involve considering the technical, social, and ethical factors discussed above.

3. Select the design: Once the design options have been evaluated, the next step is to
select the design that best meets the requirements.

2.6 Implementation Plan/Methodology


The implementation plan/methodology in fake news detectors typically consists of the
following steps:

1. Data collection and preparation

The first step is to collect a dataset of real and fake news articles. The dataset should be as
large and representative as possible. Once the dataset has been collected, it needs to be
prepared for training the fake news detector. This involves cleaning the data, removing
outliers, and normalizing the features.

2. Feature engineering

Once the data has been prepared, the next step is to engineer features. This involves
creating new features from the existing data that are more informative and useful for
distinguishing between real and fake news articles. For example, one could create a
feature that represents the number of claims in the article that are not supported by
evidence.

3. Model selection and training

The next step is to select a machine learning model and train it on the prepared dataset.
There are a variety of machine learning algorithms that can be used for this task, such as
logistic regression, support vector machines (SVMs), and decision trees.

4. Model evaluation
Once the model has been trained, it needs to be evaluated on a held-out test set of labeled
news articles. This is done to assess the performance of the model and to identify any
areas where it needs to be improved.

5. Model deployment

Once the model has been evaluated and is performing well, it can be deployed to
production. This means making the model available to users so that they can classify
news articles as real or fake.
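Step 2 (feature engineering) can be illustrated with a couple of derived ratio features. The raw count names, and the idea that "shouting" correlates with fake news, are illustrative assumptions rather than validated signals:

```python
def engineer_features(raw):
    """raw: dict of basic counts already extracted from an article."""
    feats = dict(raw)
    # Derived feature: share of all-caps words ("shouting") among all words
    feats["caps_ratio"] = raw["all_caps_words"] / max(raw["num_words"], 1)
    # Derived feature: exclamation marks per sentence
    feats["exclaims_per_sentence"] = raw["exclamations"] / max(raw["num_sentences"], 1)
    return feats

example = engineer_features(
    {"all_caps_words": 3, "num_words": 60, "exclamations": 4, "num_sentences": 8}
)
```

Ratios like these are often more informative than raw counts because they are comparable across articles of different lengths.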
CHAPTER 3:

RESULT ANALYSIS AND VALIDATION


3.1 Tools, Libraries and the Network Used
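The chapter's figures list the steps as importing the libraries, loading the Dolphins social network, and computing descriptive statistics, centrality measures, connectivity, and communities with Python's NetworkX library. A minimal sketch of those steps, using a tiny stand-in graph since the Dolphins edge list is not reproduced here:

```python
import networkx as nx

# Tiny stand-in graph; the report instead loads the Dolphins social network,
# e.g. via nx.read_gml("dolphins.gml") (file name assumed)
G = nx.Graph()
G.add_edges_from([(0, 1), (1, 2), (2, 0), (2, 3), (3, 4)])

# Descriptive analysis (Figure 3.4)
num_nodes, num_edges = G.number_of_nodes(), G.number_of_edges()

# Centrality measures (Figures 3.5-3.10)
degree = nx.degree_centrality(G)
betweenness = nx.betweenness_centrality(G)
most_influential = max(degree, key=degree.get)

# Connectivity analysis (Figures 3.12-3.14)
components = list(nx.connected_components(G))
articulation = list(nx.articulation_points(G))

# Community detection (Figures 3.19-3.24) would use e.g.
# nx.community.louvain_communities(G) or nx.community.girvan_newman(G)
```

Visualization of these results (Figures 3.3, 3.11, 3.18, etc.) would typically use `nx.draw` with Matplotlib.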

CHAPTER 4:

CONCLUSION AND FUTURE WORK


4.1. Conclusion
Fake news detectors are a complex and challenging technology. They need to be accurate,
robust, interpretable, efficient, scalable, deployable, fair, transparent, and accountable. By
carefully considering the technical, social, and ethical factors involved, it is
design and implement fake news detectors that can help to combat the spread of
misinformation.

4.1.1. Expected Results and Outcomes

The expected results and outcomes of fake news detectors include:

 Reduced spread of misinformation: Fake news detectors can help to reduce the spread
of misinformation by identifying and flagging fake news articles. This can help to
prevent users from being exposed to and believing false information.

 Increased awareness of fake news: Fake news detectors can help to increase
awareness of fake news and how to identify it. This can help users to become more
critical consumers of information and to make more informed decisions.

 Improved accuracy of online discourse: Fake news detectors can help to improve the
accuracy of online discourse by identifying and correcting false information. This can
create a more informed and productive public sphere.
 Enhanced trust in online sources: Fake news detectors can help to enhance trust in
online sources by providing a way to verify the accuracy of information. This can
make it easier for users to find and rely on reliable sources of information.

4.1.2. Deviation from Expected Results

Fake news detectors are still under development, and there are a number of ways in which
they can deviate from expected results. Some of the most common deviations include:

 False positives: Fake news detectors may incorrectly flag real news articles as
fake. This can happen for a variety of reasons, such as if the fake news detector is
trained on a biased dataset or if it is using features that are not generalizable to all
types of news articles.

 False negatives: Fake news detectors may fail to identify fake news articles. This can
happen if the fake news creators are using sophisticated techniques or if the fake news
detector is not trained on a sufficiently representative dataset of fake news articles.

 Bias: Fake news detectors may be biased towards certain types of fake news or
towards certain political viewpoints. This can happen if the fake news detector is
trained on a biased dataset or if it is using features that are correlated with bias.

 Interpretability: It can be difficult to interpret why a particular article has been classified as fake. This can make it difficult to trust the results of the fake news detector and to understand how to improve it.

4.1.3. Reasons for Deviation

There are a number of reasons why fake news detectors may deviate from expected results.
Some of the most common reasons include:

Technical limitations

 Limited training data: Fake news detectors are trained on data sets of labeled news
articles. If the training data is limited or biased, it can lead to the detector being less
accurate or having biases of its own.

 Sophisticated fake news creators: Fake news creators are constantly developing new
techniques to evade detection. This can make it difficult for fake news detectors to
keep up and can lead to false negatives.

 Feature selection: The features that are selected to train the fake news detector can
also have a significant impact on its performance. If the wrong features are
selected, the detector may be less accurate or have biases.

 Model complexity: The complexity of the fake news detector model can also affect its
performance. A model that is too complex may be more prone to overfitting, which
can lead to false positives.
 Computational resources: Fake news detectors can be computationally expensive to
train and deploy. This may limit the amount of data that can be used to train the
detector or the complexity of the model that can be used.

Social and ethical factors

 Bias in training data: If the training data is biased, the fake news detector will also be
biased. This can lead to the detector being more likely to flag certain types of news
articles as fake, even when they are real.

 Lack of transparency: If the fake news detector is not transparent about how it works
or the data it is trained on, it can be difficult to identify and address any biases or
limitations that it may have.

 Misuse of the detector: Fake news detectors can be misused for a variety of
purposes, such as to censor legitimate news or to promote misinformation. This can
lead to negative consequences for society.

4.2. Future Work


4.2.1. Way Ahead

Here are some specific areas of research that are promising for the future of fake news
detectors:

 Transfer learning from other domains. Researchers are exploring ways to use transfer
learning from other domains, such as computer vision and natural language
processing, to improve the performance of fake news detectors.

 Multimodal learning. Researchers are also exploring ways to use multimodal learning, which involves combining data from different modalities, such as text and images, to improve the performance of fake news detectors.

 Active learning. Active learning is a machine learning technique that allows models to
learn more efficiently by selectively querying the user for labels. Researchers are
exploring ways to use active learning to improve the performance of fake news
detectors.

 Adversarial training. Adversarial training is a machine learning technique that makes models more robust to adversarial attacks by training them on adversarial examples. Researchers are exploring ways to use adversarial training to improve the robustness of fake news detectors.

4.2.2. Suggestions for Extending the Solution

Here are some suggestions for extending the solution in a fake news detector:
Improve the accuracy and robustness of the model. This can be done by using larger and
more diverse training datasets, developing new features that are more resistant to adversarial
attacks, and using training techniques such as adversarial training and transfer learning.

Make the model more interpretable. This can be done by using explainable AI (XAI)
techniques to make the model more transparent and understandable to users. This will help
users to trust the model and to identify areas where it needs to be improved.

Integrate the model with other tools and services. For example, the model could be integrated
with social media platforms to flag fake news articles or with search engines to rank fake
news articles lower in search results.

Make the model more accessible and easier to use. This can be done by developing user-friendly interfaces and by making the model available through cloud-based services.
