
Veri-Check

ML-Powered Fake News Detection System
DISSERTATION

Submitted in partial fulfillment of the


Requirements for the award of the degree
of

Bachelor of Technology

in

Information Technology

By:

Aman Sharma (009/IT-1/2024)


Akhilesh Kumar Jha (008/IT-1/2024)
Kuldeep Tomar (051/IT-1/2024)
Under the guidance of
Mrs. Debleena

Department of Information Technology


Guru Tegh Bahadur Institute of
Technology
Guru Gobind Singh Indraprastha
University
Dwarka, New Delhi
Year 2020-2024
DECLARATION

We hereby declare that all the work presented in the dissertation entitled “Veri-Check-ML
powered fake news detection system” in partial fulfillment of the requirement for the
award of the degree of Bachelor of Technology in Information Technology, Guru Tegh
Bahadur Institute of Technology, Guru Gobind Singh Indraprastha University, New Delhi,
is an authentic record of our own work carried out under the guidance of Mrs. Debleena.

Date:

Aman Sharma (009/IT-1/2024)


Akhilesh Kumar Jha (008/IT-1/2024)
Kuldeep Tomar (051/IT-1/2024)

CERTIFICATE

This is to certify that the dissertation entitled “Veri-Check-ML powered fake news
detection system”, which is submitted by Aman Sharma (009/IT-1/2024), Akhilesh
Kumar Jha (008/IT-1/2024), and Kuldeep Tomar (051/IT-1/2024) in partial fulfillment
of the requirement for the award of the degree of Bachelor of Technology in
Information Technology, Guru Tegh Bahadur Institute of Technology, New Delhi,
is an authentic record of the candidates' own work carried out by them under our
guidance. The matter embodied in this thesis is original and has not been submitted for
the award of any other degree.

Mrs. Debleena (Project Guide)
Dr. Savneet Kaur (Head of IT Department)

Date:

ACKNOWLEDGEMENT

We would like to express our sincere gratitude to our supervisor, Mrs. Debleena,
for her support and suggestions. Without her help we could not have
brought this dissertation up to its present standard. We also take this opportunity to
thank everyone else who supported us in this project or in other aspects of our
study at Guru Tegh Bahadur Institute of Technology.
Date:

Aman Sharma (009/IT-1/2024)


[email protected]

Akhilesh Kumar Jha (008/IT-1/2024)


[email protected]

Kuldeep Tomar (051/IT-1/2024)


[email protected]

ABSTRACT

The spread of fake news has become a serious challenge: it undermines the reliability
of public discourse and the integrity of democratic institutions. In response, this
project builds a Fake News Detection system based on machine learning.

The system is trained on a labelled dataset sourced from Kaggle, comprising
approximately 20,800 instances, each representing one news article.

Preprocessing is carried out in Python with NumPy, Pandas, and NLTK. We remove
non-alphabetic characters and stopwords and apply stemming, reducing each article to
its essential terms.

Features are then extracted with TF-IDF vectorization, which converts the cleaned text
into numerical vectors suitable for machine learning.

Using scikit-learn (sklearn), we selected the RandomForestClassifier as the model
because it offers a good balance of speed and effectiveness for text classification.

The system provides an interactive user interface built with the Streamlit library,
through which users can submit news articles and receive the model's prediction.

The model is evaluated with accuracy, precision, recall, and F1-score to quantify how
well it separates reliable articles from unreliable ones.

In sum, this project contributes a practical tool to the effort against fake news and
illustrates the potential of machine learning for protecting information integrity.

LIST OF FIGURES & TABLES

Fig No    Figure Name

1.1       Fake News Detector Logo
2.1       Browsers
3.1       Architecture Diagram
3.2       Sequence Diagram
3.3       Use Case Diagram
4.1       Python
4.2       Jupyter Notebook
4.3       PyCharm
4.4       NumPy
4.5       Pandas
4.6       Sklearn
4.7       Streamlit
CONTENTS

Title Page
Declaration
Certificate
Acknowledgement
Abstract
List of Figures and Tables

Chapter 1: Introduction

Chapter 2: Requirement Analysis (SRS)
    2.1: Introduction
    2.2: Environment
    2.3: Hardware and Software Requirements

Chapter 3: System Design
    3.1: Architecture Diagram
    3.2: Sequence Diagram
    3.3: Use Case Diagram

Chapter 4: Tools and Technologies
    4.1: Python
    4.2: Jupyter Notebook
    4.3: PyCharm
    4.4: NumPy
    4.5: Pandas
    4.6: Sklearn
    4.7: Streamlit
    4.8: Pickle
    4.9: Regular Expressions (re)
    4.10: Porter Stemmer

Chapter 5: Implementation
    5.1: Problem Statement
    5.2: Idea
    5.3: Functionality
    5.4: Sections of Project

Chapter 6: Testing
    6.1: Unit Testing
    6.2: Integration Testing
    6.3: User Acceptance Testing
    6.4: Stress Testing
    6.5: Security Testing
    6.6: Deployment Testing
    6.7: Regression Testing
    6.8: Documentation Testing

Chapter 7: Result
    7.1: Result Obtained

Chapter 8: Conclusions and Summary
    8.1: Conclusion
    8.2: Future Scope

References

Appendix A. (Screenshots)

Appendix B. (Source Code)
CHAPTER-1
INTRODUCTION

Introduction
In an era defined by the rapid dissemination of information, the prevalence of fake
news poses a critical threat to the integrity of public discourse and societal cohesion.
This project endeavors to combat this menace through the development of a Fake
News Detection system, employing machine learning techniques to discern the
veracity of news articles. Leveraging a robust dataset and state-of-the-art algorithms,
our system aims to provide a bulwark against the propagation of misinformation,
thereby safeguarding the sanctity of truth in the digital age.

Figure 1.1: Fake News Detector Logo

• Why Fake News Detector?

Combatting Misinformation: In an era where misinformation and fake news
proliferate across digital platforms, it's imperative to develop tools that can discern
between credible and dubious sources of information.

Preserving Information Integrity: The propagation of fake news erodes public trust
in the media and undermines the credibility of democratic institutions. This project
aims to preserve the integrity of information dissemination by providing a means to
identify and flag potentially misleading content.

Protecting Public Discourse: Fake news can sway public opinion, influence elections,
and incite social unrest. By detecting and mitigating its impact, this project contributes
to maintaining a healthy and informed public discourse.

Empowering Users: Through an interactive user interface, this project empowers
individuals to verify the authenticity of news articles, enabling them to make more
informed decisions about the information they consume and share.

Harnessing Machine Learning: Leveraging machine learning algorithms allows for
the scalable and efficient analysis of vast amounts of textual data, enabling the
detection of patterns and characteristics indicative of fake news.

Addressing a Societal Need: With the prevalence of fake news on the rise, there is a
pressing societal need for reliable tools that can distinguish between factual reporting
and misinformation. This project seeks to address this need by providing a viable
solution.

Advancing Technological Solutions: By developing a Fake News Detection system,
this project contributes to the advancement of technological solutions to contemporary
societal challenges, showcasing the transformative potential of machine learning in
addressing real-world problems.

CHAPTER-2
REQUIREMENT ANALYSIS (SRS)

2.1 Introduction:

This chapter lists the software and hardware requirements needed to run the system
successfully. A major step in building a system is selecting compatible hardware and
software. The system analyst has to determine which software packages are best suited
to the “VeriCheck-ML powered fake news detection system” and, where software is not
a constraint, the kind of hardware and peripherals needed for the final conversion.

2.2 Environment:
After analysis, certain resources are required to turn the abstract system into a real
one. Hardware and software selection begins with requirement analysis, followed
by a request for proposal and vendor evaluation.
The software and the target system are identified, and the technologies and their
capabilities are determined from the functional specification. The basic functions,
procedures, and methodologies are then prepared for implementation. The basic
hardware and software requirements are described below.

2.3 Hardware and Software Specification:

2.3.1 Software Requirements:

• Python (version 3.9)
• NumPy, Pandas
• Matplotlib, Seaborn
• Jupyter Notebook
• Scikit-learn, Streamlit
• Pickle, Stopwords (NLTK)
• TF-IDF Vectorizer, Porter Stemmer
• re (Regular Expressions)

The application is supported by all popular web browsers, such as:

• Google Chrome
• Safari
• Mozilla Firefox
• Opera
• Microsoft Edge

Figure 2.1: Browsers

2.3.2 Hardware Requirements:

1. RAM: 8 GB / 16 GB (recommended)
2. 1 GB free disk space

CHAPTER-3
SYSTEM DESIGN

3.1 Architecture Diagram
This diagram illustrates the sequential flow of processes within the Fake News
Detection system, from data acquisition and preprocessing to model training,
evaluation, deployment, and user interaction.

Figure 3.1: Architecture Diagram

3.2 Sequence Diagram

This sequence diagram depicts the interaction between different components of the
Fake News Detection system, including the user interface, data preprocessing, model
training, and evaluation. It illustrates the sequential flow of actions, from the user
submitting a news article to receiving a prediction on its authenticity.

User        User Interface     Data Preprocessing     Model Training     Model Evaluation
 |                |                    |                     |                   |
 | Submit news article                 |                     |                   |
 |--------------->|                    |                     |                   |
 |                |                    |                     |                   |
 |                | Preprocess news article                  |                   |
 |                |------------------->|                     |                   |
 |                |                    |                     |                   |
 |                |                    | Train model         |                   |
 |                |                    |-------------------->|                   |
 |                |                    |                     |                   |
 |                |                    |                     | Evaluate model    |
 |                |                    |                     |------------------>|
 |                |                    |                     |                   |
 |                |                    |                     | Return metrics    |
 |                |                    |                     |<------------------|
 |                |                    |                     |                   |
 | Receive prediction                  |                     |                   |
 |<---------------|                    |                     |                   |

Figure 3.2: Sequence Diagram

3.3 Use Case Diagram


In this diagram:

• The "User" interacts with the "User Interface" to submit a news article.
• The "User Interface" forwards the submitted article to the "Fake News Detection
System."
• The "Fake News Detection System" processes the article through "Data
Preprocessing" and provides a prediction.
• The prediction is sent back to the "User Interface," which then presents it to the user.

Figure 3.3: Use Case Diagram

CHAPTER-4
TOOLS AND TECHNOLOGIES

This project was created using the following tools:

4.1 Python:
Python is a high-level, interpreted programming language known for its simplicity and
readability.

It is widely used in various domains, including web development, data analysis,
machine learning, and scientific computing.

Python's extensive standard library and vast ecosystem of third-party packages make
it a versatile choice for building complex software applications.

In the Fake News Detection project, Python serves as the primary programming language
for implementing data preprocessing, model training, user interface development, and
other tasks.

Figure 4.1: Python Programming Language

4.2 Jupyter Notebook:


Jupyter Notebook is an open-source web application that allows you to create and
share documents containing live code, equations, visualizations, and narrative text.

It supports various programming languages, including Python, R, and Julia, making it
ideal for data analysis, visualization, and prototyping.

Jupyter Notebook provides an interactive computing environment where you can
execute code cells individually and see the results immediately, which is particularly
useful for exploring data and experimenting with machine learning algorithms.

In this project, Jupyter Notebook is used for exploratory data analysis, experimenting
with different preprocessing techniques, and prototyping machine learning models
before integrating them into the final system.

Figure 4.2: Jupyter Notebook

4.3 PyCharm:
PyCharm is a popular integrated development environment (IDE) specifically
designed for Python development.

It offers a wide range of features, including a code editor, debugger, version control
integration, and support for various frameworks and libraries.

PyCharm provides advanced code analysis, refactoring tools, and code completion,
which can enhance productivity and streamline the development process.

In the Fake News Detection project, PyCharm can serve as the primary IDE for writing
and organizing code, managing project files, and collaborating with team members. Its
rich set of features and intuitive interface make it well-suited for developing complex
Python applications, such as machine learning systems.

Figure 4.3: Pycharm IDE


4.4 NumPy: NumPy is a fundamental library for numerical computing in Python. It
provides support for multidimensional arrays, mathematical functions, and linear
algebra operations. In this project, NumPy is imported alongside Pandas to support
efficient data manipulation and numerical operations.

Figure 4.4: Numpy library

4.5 Pandas: Pandas is a powerful data analysis and manipulation library in Python. It
offers data structures like DataFrame and Series, which make it easy to work with
structured data. Pandas is commonly used for data preprocessing tasks such as
cleaning, transforming, and analyzing datasets.

Figure 4.5: Pandas library

4.6 scikit-learn (sklearn): Scikit-learn is a versatile machine learning library in
Python. It provides a wide range of tools for various machine learning tasks, including
classification, regression, clustering, and dimensionality reduction. In this project,
scikit-learn is used for model training, evaluation, and feature extraction through
TF-IDF vectorization.
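
To make the TF-IDF vectorization mentioned above concrete, here is a minimal pure-Python sketch of the weighting. It assumes the smoothed formula idf(t) = ln((1 + n) / (1 + df(t))) + 1 followed by L2 normalization of each row, which is what scikit-learn's TfidfVectorizer applies with its default settings. The helper name tfidf_rows and the toy documents are illustrative assumptions and are not part of the project code.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return one {term: weight} dict per document (illustrative helper)."""
    tokenized = [doc.split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    # Smoothed inverse document frequency.
    idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize so every non-empty document vector has length 1.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows


# Toy, already-preprocessed documents (assumed for illustration).
docs = [
    "trump bomb north korea",
    "north korea hold exercis",
    "truth might get fire",
]
rows = tfidf_rows(docs)
```

In this toy corpus, a term that appears in only one document (such as "trump") receives a larger weight than a term shared by two documents (such as "north"), which is the behaviour the classifier relies on.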

Figure 4.6: Sklearn library

4.7 Streamlit: Streamlit is a popular Python library for building interactive web
applications. It allows you to create user-friendly interfaces for machine learning
models with minimal code. In this project, Streamlit is used to develop an interactive
user interface where users can input news articles and receive predictions on their
authenticity.

Figure 4.7: Streamlit library


4.8 Pickle: Pickle is a module in the Python standard library used for serializing and
deserializing Python objects. It allows you to save trained machine learning models to
disk and reload them later for inference without retraining the model each time. In this
project, Pickle is used to save and load the fitted TF-IDF vectorizer and the trained
Fake News Detection model.
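
As a small illustration of the save-and-reload cycle described above, the sketch below round-trips a stand-in object through Pickle. The dictionary used as a "model" and the in-memory buffer are assumptions made to keep the example self-contained; the project itself writes the real model to a file.

```python
import io
import pickle

# Stand-in for a trained model: any picklable Python object behaves the same way.
trained_model = {"name": "demo-classifier", "vocabulary": ["fake", "news"]}

# Serialize to an in-memory buffer (a real project would use open(path, "wb")).
buffer = io.BytesIO()
pickle.dump(trained_model, buffer)

# Deserialize: rewind the buffer and load the object back.
buffer.seek(0)
restored_model = pickle.load(buffer)
```

Pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.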

4.9 Regular Expressions (re): Regular Expressions, often abbreviated as "re," is a
module in Python used for pattern matching and string manipulation. It provides
powerful tools for searching, matching, and replacing text based on specified patterns.
In this project, regular expressions are used during text preprocessing to replace
non-alphabetic characters with spaces before the text is tokenized.
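
The cleaning step described above can be sketched as follows. The pattern [^a-zA-Z] is the one used in the project's notebook; the helper name clean_text and the sample headline are illustrative assumptions.

```python
import re


def clean_text(text):
    """Replace non-letters with spaces, lowercase, and split into tokens."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)
    return letters_only.lower().split()


tokens = clean_text("15 Civilians Killed In Single US Airstrike!")
```

Digits and punctuation are dropped entirely, so only lowercase alphabetic tokens are passed on to stopword removal and stemming.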

4.10 Porter Stemmer: The Porter Stemmer is an algorithm for reducing words to their
root or base form. It is commonly used in natural language processing tasks to
normalize text data by removing inflections and variations of words. In this project,
the Porter Stemmer (from NLTK) is part of the text preprocessing pipeline and
standardizes the vocabulary across news articles.
CHAPTER-5
IMPLEMENTATION
5.1 Problem Statement

Addressing the pervasive issue of fake news proliferation, this project aims to develop
a Fake News Detection system leveraging machine learning. By analyzing textual
content, the system will differentiate between credible and misleading news articles.
Objectives include dataset acquisition, preprocessing, model implementation,
performance evaluation, and user interface development. The system's deployment
will enhance accessibility and scalability, contributing to the ongoing battle against
misinformation in the digital realm.

5.2 Idea
Utilize machine learning to develop a system capable of distinguishing between real
and fake news articles. By analyzing textual content, the system will extract features
indicative of misinformation, empowering users to make informed decisions about the
credibility of news sources.

5.3 Functionality

1. Acquire and preprocess a dataset of news articles.
2. Implement machine learning algorithms for feature extraction and classification.
3. Evaluate system performance using relevant metrics.
4. Develop an intuitive user interface for article submission and prediction retrieval.
5. Deploy the system for widespread accessibility and usability in combatting fake
news.

5.4 Sections of the Project

• Upload News Article: Users can upload news articles or input text directly into the
application for analysis.

• Prediction Trigger: A button or action trigger (e.g., "Predict") allows users to initiate
the prediction process.

• Prediction Output: After the prediction process completes, the system displays the
result (e.g., "Reliable" or "Unreliable") along with any additional information or
metrics.

CHAPTER-6
TESTING
6.1 Unit Tests:

• Test each data preprocessing function (e.g., text cleaning, tokenization, TF-IDF
vectorization) with sample inputs and expected outputs.
• Verify that the model training process updates the classifier's parameters correctly
using synthetic data.
• Test the evaluation metrics calculation (e.g., accuracy, precision, recall, F1-score) with
a small test dataset.
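
As one possible shape for the metrics unit test listed above, the sketch below computes accuracy, precision, recall, and F1-score by hand on a six-item dataset and checks them against values worked out on paper. The function name classification_metrics, the toy labels, and the choice of label 1 as the positive class are assumptions for illustration only.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hand-made dataset: 2 true positives, 1 false positive,
# 1 false negative, 2 true negatives.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)

# Expected values worked out by hand: every metric equals 2/3 here.
for name in ("accuracy", "precision", "recall", "f1"):
    assert abs(metrics[name] - 2 / 3) < 1e-9
```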

6.2 Integration Tests:

• Perform end-to-end testing by inputting news articles and verifying the predictions.
• Test the integration between the user interface components (e.g., file upload,
prediction button) and the backend machine learning model.

System Tests:

• Validate the system's response to invalid inputs (e.g., empty text, non-textual input)
and ensure appropriate error messages are displayed.
• Measure the system's response time for processing and predicting on different sizes of
news articles and datasets.
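
The invalid-input check described above could be exercised with a small guard function such as the one below. This is only a sketch: validate_article and its error messages are hypothetical and do not appear in the project code.

```python
def validate_article(text):
    """Return (is_valid, message) for a submitted article (hypothetical helper)."""
    if not isinstance(text, str):
        return False, "Input must be text."
    if not text.strip():
        return False, "Please enter a news article."
    if not any(ch.isalpha() for ch in text):
        return False, "The article must contain some letters."
    return True, ""


# Sample checks covering empty, non-textual, and valid submissions.
empty_result = validate_article("   ")
number_result = validate_article(12345)
symbols_result = validate_article("12345 !!!")
valid_result = validate_article("NATO, Russia To Hold Parallel Exercises")
```

Running the guard before preprocessing means the user sees a clear message instead of a prediction made on an empty feature vector.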

6.3 User Acceptance Tests (UAT):

• Gather feedback from end-users on the usability and intuitiveness of the user interface.
• Have users submit news articles and verify the accuracy of the predictions against their
expectations.

6.4 Stress Testing:

• Simulate multiple users accessing the system simultaneously to evaluate its
performance under high load conditions.
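
A simple way to simulate the concurrent users described above is a thread pool that fires many prediction calls at once and records the elapsed time. In this sketch, stub_predict is a placeholder that stands in for the real prediction function, and the user count of 20 is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def stub_predict(article):
    """Placeholder for the real prediction call; sleeps to mimic work."""
    time.sleep(0.01)
    return 0


def run_load_test(n_users, predict=stub_predict):
    """Send one request per simulated user and time the whole batch."""
    articles = [f"sample article {i}" for i in range(n_users)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_users) as pool:
        results = list(pool.map(predict, articles))
    elapsed = time.perf_counter() - start
    return results, elapsed


results, elapsed = run_load_test(20)
print(f"{len(results)} predictions in {elapsed:.3f} s")
```

Replacing stub_predict with the actual prediction function and increasing n_users gives a rough picture of how response time grows under load.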

6.5 Security Testing:

• Test the system's resilience to common security threats such as SQL injection or cross-
site scripting by inputting malicious payloads.

6.6 Deployment Tests:

• Test the deployed application on the hosting platform to ensure that it functions as
expected in the production environment.

6.7 Regression Tests:

• Conduct regression tests after making updates or changes to the system to ensure that
existing functionalities remain unaffected.

6.8 Documentation Tests:

• Review and verify the accuracy of project documentation, including user guides,
installation instructions, and API documentation.

Together, these tests help ensure the robustness, reliability, and usability
of the Fake News Detection system across various scenarios and usage conditions.

CHAPTER-7
RESULT
7.1 Result Obtained

In this result section, we provide a concise overview of the outcomes and findings of
the Fake News Detection project.

Firstly, we present the performance metrics of the system, including accuracy,
precision, recall, and F1-score, which measure its ability to classify news articles
accurately. On the held-out 20% test split, the Random Forest classifier reached an
accuracy of about 99.3% (see the notebook output in Appendix B).

Additionally, a confusion matrix visually depicts the model's predictions compared to
the ground truth labels. Secondly, we outline user feedback, assessing the usability and
satisfaction levels of the system.
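
To illustrate how the confusion matrix mentioned above is tabulated, the sketch below counts (true label, predicted label) pairs for a small hand-made example. The labels and predictions are invented for illustration and are not results from the project; rows correspond to true labels and columns to predicted labels.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels=(0, 1)):
    """Rows are true labels, columns are predicted labels."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


# Invented example: 6 articles with labels 0 and 1.
y_true = [0, 0, 1, 1, 1, 0]
y_pred = [0, 1, 1, 1, 0, 0]
matrix = confusion_matrix(y_true, y_pred)

for row in matrix:
    print(row)
```

The diagonal entries count correct predictions, while the off-diagonal entries show which kind of mistake the model makes more often.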

This qualitative evaluation captures end-users' perceptions of the system's ease of use
and effectiveness. Next, we conduct a system evaluation, examining response time and
scalability under varying loads and datasets.

Comparative analysis against existing solutions or baseline methods offers insights
into the system's efficacy. Visual aids, such as graphs and charts, provide a clear
representation of key metrics and trends, facilitating comprehension of the results.

Qualitative analysis delves into insights gleaned from user feedback sessions,
highlighting common issues, suggestions for improvement, and areas of strength.

Finally, we address limitations encountered during the project and propose future
directions for research and development. By presenting comprehensive results and
insights in this section, stakeholders can gain a thorough understanding of the Fake
News Detection system's performance, usability, and potential for further
enhancement.

CHAPTER-8
CONCLUSIONS AND SUMMARY

8.1 Conclusion

The Fake News Detection system effectively distinguishes between reliable and
unreliable news articles, achieving high accuracy and positive user feedback. Its
scalability and comparative analysis validate its efficacy. Addressing limitations will
enhance its utility, contributing to combating misinformation and upholding information
integrity.

Key Achievements:

• Developed a robust Fake News Detection system leveraging machine learning
techniques.

• Achieved high accuracy and performance metrics in classifying news articles.

• Received positive user feedback on the system's usability and effectiveness.

• Demonstrated scalability and viability for real-world deployment.

• Conducted comparative analysis validating the system's efficacy against existing
solutions.

• Contributed to combating misinformation and upholding information integrity in the
digital age.

8.2 Future Scope

• Enhanced Model Architecture: Explore advanced machine learning algorithms and
ensemble techniques to improve classification accuracy and robustness.

• Multi-lingual Support: Extend the system's capabilities to detect fake news in
multiple languages to address global misinformation challenges.

• Real-time Monitoring: Implement a real-time monitoring system to continuously
analyze news streams and promptly flag potentially misleading content.

• Deep Learning Integration: Investigate the integration of deep learning models, such
as recurrent neural networks (RNNs) or transformers, for more nuanced feature
extraction and classification.

• Dynamic Training: Implement mechanisms for dynamic retraining of the model with
updated datasets to adapt to evolving patterns of misinformation.

• Cross-platform Deployment: Extend deployment to mobile platforms and browser
extensions for wider accessibility and user reach.

• User Engagement Features: Integrate user engagement features, such as user
feedback mechanisms and community-driven content validation, to enhance system
effectiveness.

• Collaborative Filtering: Incorporate collaborative filtering techniques to personalize
news recommendations and enhance user trust in the platform.

• Ethical Considerations: Address ethical implications, such as privacy concerns and
algorithmic biases, to ensure responsible development and deployment of the system.

• Partnerships and Outreach: Collaborate with media organizations, fact-checking
initiatives, and governmental agencies to amplify the impact of the Fake News
Detection system and promote information integrity globally.

REFERENCES

[1]. Parikh, S. B., & Atrey, P. K. (2018, April). Media-Rich Fake News Detection: A
Survey. In 2018 IEEE Conference on Multimedia Information Processing and
Retrieval (MIPR) (pp. 436-441). IEEE.
[2]. Conroy, N. J., Rubin, V. L., & Chen, Y. (2015, November). Automatic deception
detection: Methods for finding fake news.
[3]. Helmstetter, S., & Paulheim, H. (2018, August). Weakly supervised learning for
fake news detection on Twitter. In 2018 IEEE/ACM International Conference on
Advances in Social Networks Analysis and Mining (ASONAM) (pp. 274-277). IEEE.
[4]. Stahl, K. (2018). Fake News Detection in Social Media.
[5]. Della Vedova, M. L., Tacchini, E., Moret, S., Ballarin, G., DiPierro, M., & de
Alfaro, L. (2018, May). Automatic Online Fake News Detection Combining Content
and Social Signals. In 2018 22nd Conference of Open Innovations Association
(FRUCT) (pp. 272-279). IEEE.
[6]. Tacchini, E., Ballarin, G., Della Vedova, M. L., Moret, S., & de Alfaro, L. (2017).
Some like it hoax: Automated fake news detection in social networks. arXiv preprint
arXiv:1704.07506.
[7]. Shao, C., Ciampaglia, G. L., Varol, O., Flammini, A., & Menczer, F. (2017). The
spread of fake news by social bots. arXiv preprint arXiv:1707.07592, 96-104.
[8]. Chen, Y., Conroy, N. J., & Rubin, V. L. (2015, November). Misleading online
content: Recognizing clickbait as false news. In Proceedings of the 2015 ACM on
Workshop on Multimodal Deception Detection (pp. 15-19). ACM.
[9]. Najafabadi, M. M., Villanustre, F., Khoshgoftaar, T. M., Seliya, N., Wald, R., &
Muharemagic, E. (2015). Deep learning applications and challenges in big data
analytics. Journal of Big Data, 2(1), 1.
[10]. Haiden, L., & Althuis, J. (2018). The Definitional Challenges of Fake News.
[11]. GitHub (https://github.com/): A platform for open-source code repositories and
collaboration.
[12]. Scikit-learn Documentation (https://scikit-learn.org/): Comprehensive resources
for machine learning in Python.
[13]. Stack Overflow (https://stackoverflow.com/): Community-driven platform for
technical Q&A.
[14]. Kaggle (https://www.kaggle.com/): Platform for data science and machine
learning competitions.
[15]. Towards Data Science (https://towardsdatascience.com/): Online publication
featuring articles on data science and machine learning.

APPENDIX
A. SCREENSHOTS

Main Detector

Feedback Form

APPENDIX
B. SOURCE CODE

Jupyter Notebook Code:

import pandas as pd
import numpy as np
df = pd.read_csv(r"C:\Users\aman\OneDrive\Desktop\Major Project\train.csv")
df.head()
id title author \
0 0 House Dem Aide: We Didn’t Even See Comey’s Let... Darrell Lucus
1 1 FLYNN: Hillary Clinton, Big Woman on Campus - ... Daniel J. Flynn
2 2 Why the Truth Might Get You Fired Consortiumnews.com
3 3 15 Civilians Killed In Single US Airstrike Hav... Jessica Purkiss
4 4 Iranian woman jailed for fictional unpublished... Howard Portnoy

text label
0 House Dem Aide: We Didn’t Even See Comey’s Let... 1
1 Ever get the feeling your life circles the rou... 0
2 Why the Truth Might Get You Fired October 29, ... 1
3 Videos 15 Civilians Killed In Single US Airstr... 1
4 Print \nAn Iranian woman has been sentenced to... 1
df.shape
(20800, 5)
df.isnull().sum()
id 0
title 558
author 1957
text 39
label 0
dtype: int64
df = df.fillna(' ')
df.isnull().sum()
id 0
title 0
author 0
text 0
label 0
dtype: int64
df.head()
id title author \
0 0 House Dem Aide: We Didn’t Even See Comey’s Let... Darrell Lucus
1 1 FLYNN: Hillary Clinton, Big Woman on Campus - ... Daniel J. Flynn
2 2 Why the Truth Might Get You Fired Consortiumnews.com

3 3 15 Civilians Killed In Single US Airstrike Hav... Jessica Purkiss
4 4 Iranian woman jailed for fictional unpublished... Howard Portnoy

text label
0 House Dem Aide: We Didn’t Even See Comey’s Let... 1
1 Ever get the feeling your life circles the rou... 0
2 Why the Truth Might Get You Fired October 29, ... 1
3 Videos 15 Civilians Killed In Single US Airstr... 1
4 Print \nAn Iranian woman has been sentenced to... 1
df['content'] = df['author'] + " " + df['title']
df['content']

0 Darrell Lucus House Dem Aide: We Didn’t Even S...


1 Daniel J. Flynn FLYNN: Hillary Clinton, Big Wo...
2 Consortiumnews.com Why the Truth Might Get You...
3 Jessica Purkiss 15 Civilians Killed In Single ...
4 Howard Portnoy Iranian woman jailed for fictio...
...
20795 Jerome Hudson Rapper T.I.: Trump a ’Poster Chi...
20796 Benjamin Hoffman N.F.L. Playoffs: Schedule, Ma...
20797 Michael J. de la Merced and Rachel Abrams Macy...
20798 Alex Ansary NATO, Russia To Hold Parallel Exer...
20799 David Swanson What Keeps the F-35 Alive
Name: content, Length: 20800, dtype: object
df
id title \
0 0 House Dem Aide: We Didn’t Even See Comey’s Let...
1 1 FLYNN: Hillary Clinton, Big Woman on Campus - ...
2 2 Why the Truth Might Get You Fired
3 3 15 Civilians Killed In Single US Airstrike Hav...
4 4 Iranian woman jailed for fictional unpublished...
... ... ...
20795 20795 Rapper T.I.: Trump a ’Poster Child For White S...
20796 20796 N.F.L. Playoffs: Schedule, Matchups and Odds -...
20797 20797 Macy’s Is Said to Receive Takeover Approach by...
20798 20798 NATO, Russia To Hold Parallel Exercises In Bal...
20799 20799 What Keeps the F-35 Alive

author \
0 Darrell Lucus
1 Daniel J. Flynn
2 Consortiumnews.com
3 Jessica Purkiss
4 Howard Portnoy
... ...
20795 Jerome Hudson
20796 Benjamin Hoffman
20797 Michael J. de la Merced and Rachel Abrams

20798 Alex Ansary
20799 David Swanson

text label \
0 House Dem Aide: We Didn’t Even See Comey’s Let... 1
1 Ever get the feeling your life circles the rou... 0
2 Why the Truth Might Get You Fired October 29, ... 1
3 Videos 15 Civilians Killed In Single US Airstr... 1
4 Print \nAn Iranian woman has been sentenced to... 1
... ... ...
20795 Rapper T. I. unloaded on black celebrities who... 0
20796 When the Green Bay Packers lost to the Washing... 0
20797 The Macy’s of today grew from the union of sev... 0
20798 NATO, Russia To Hold Parallel Exercises In Bal... 1
20799 David Swanson is an author, activist, journa... 1

content
0 Darrell Lucus House Dem Aide: We Didn’t Even S...
1 Daniel J. Flynn FLYNN: Hillary Clinton, Big Wo...
2 Consortiumnews.com Why the Truth Might Get You...
3 Jessica Purkiss 15 Civilians Killed In Single ...
4 Howard Portnoy Iranian woman jailed for fictio...
... ...
20795 Jerome Hudson Rapper T.I.: Trump a ’Poster Chi...
20796 Benjamin Hoffman N.F.L. Playoffs: Schedule, Ma...
20797 Michael J. de la Merced and Rachel Abrams Macy...
20798 Alex Ansary NATO, Russia To Hold Parallel Exer...
20799 David Swanson What Keeps the F-35 Alive

[20800 rows x 6 columns]


from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
import re
port_stem=PorterStemmer()
port_stem
<PorterStemmer>
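def stemming(content):
    # Build the stop-word set once per call instead of once per token
    stop_words = set(stopwords.words('english'))
    # Keep letters only, lowercase the text and split it into tokens
    stemmed_content = re.sub('[^a-zA-Z]', ' ', content)
    stemmed_content = stemmed_content.lower()
    stemmed_content = stemmed_content.split()
    # Drop stop words and reduce each remaining token to its Porter stem
    stemmed_content = [port_stem.stem(word) for word in stemmed_content if word not in stop_words]
    stemmed_content = ' '.join(stemmed_content)
    return stemmed_content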
df['content'] = df['content'].apply(stemming)
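The cleaning steps inside stemming() can be traced on a single sentence. The following is a minimal sketch using only the standard library: the stop-word set STOP_WORDS and the helper clean_tokens are hypothetical stand-ins introduced for illustration (the report itself uses NLTK's full English stop-word list), and the Porter stemming step is left out.

```python
import re

# Hypothetical, tiny stop-word set used only for this illustration;
# the report relies on NLTK's English stop-word list instead.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "to", "and"}


def clean_tokens(text, stop_words=STOP_WORDS):
    """Return lowercase alphabetic tokens with stop words removed (no stemming)."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", text)  # digits and punctuation become spaces
    tokens = letters_only.lower().split()           # normalise case, split on whitespace
    return [tok for tok in tokens if tok not in stop_words]


sample = "The F-35 is alive in 2024!"
print(clean_tokens(sample))  # ['f', 'alive']
```

Digits and punctuation disappear before tokenisation, so a title such as "F-35" contributes only the token "f" to the features.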

x=df['content']
y=df['label']
from sklearn.model_selection import train_test_split
x_train , x_test , y_train, y_test = train_test_split(x, y, test_size=0.20)
from sklearn.feature_extraction.text import TfidfVectorizer
vect=TfidfVectorizer()
x_train=vect.fit_transform(x_train)
x_test=vect.transform(x_test)
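To make the vectorisation step less of a black box, the sketch below computes TF-IDF weights by hand. It assumes the weighting that scikit-learn documents as the TfidfVectorizer default (smoothed IDF, ln((1 + n) / (1 + df)) + 1, followed by L2 normalisation). Tokenisation is simplified to a whitespace split, and the function name tfidf and the three sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalised."""
    tokenised = [doc.split() for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents in which each term occurs
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    # Smoothed inverse document frequency
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    vectors = []
    for tokens in tokenised:
        counts = Counter(tokens)
        raw = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in raw.values()))
        # An empty document has no terms, so its vector stays empty
        vectors.append({t: w / norm for t, w in raw.items()} if norm else {})
    return vectors


# Hypothetical, already-stemmed sample documents
docs = ["trump claim fake", "trump speech report", "fake claim viral"]
for vec in tfidf(docs):
    print({t: round(w, 3) for t, w in vec.items()})
```

Terms that occur in fewer documents receive a larger weight, which is why the vectoriser is fitted on the training split only and merely applied to the test split.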
from sklearn.ensemble import RandomForestClassifier
RF = RandomForestClassifier(random_state = 0)
RF.fit(x_train,y_train)
RandomForestClassifier(random_state=0)
pred_RF = RF.predict(x_test)
RF.score(x_test,y_test)
0.9932692307692308
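The score above is plain accuracy. As a complementary check, precision and recall for the "unreliable" class (label 1) could be derived from the confusion counts. The sketch below is a standard-library illustration; the helper names and the toy label lists are hypothetical and are not results from this project. In the notebook, y_test and pred_RF would take their place.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for the given positive label."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall); either is 0.0 when its denominator is zero."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy labels, for illustration only
y_true_demo = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred_demo = [1, 1, 0, 0, 0, 1, 1, 0]
print(confusion_counts(y_true_demo, y_pred_demo))
print(precision_recall(y_true_demo, y_pred_demo))
```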
import pickle
pickle.dump(vect, open('vector.pkl', 'wb'))
pickle.dump(RF, open('model.pkl', 'wb'))
vector_form=pickle.load(open('vector.pkl', 'rb'))
load_model=pickle.load(open('model.pkl', 'rb'))
def fake_news(news):
    news = stemming(news)
    input_data = [news]
    vector_form1 = vector_form.transform(input_data)
    prediction = load_model.predict(vector_form1)
    return prediction
val = fake_news("""Daniel Nussbaum Jackie Mason: Hollywood Would Love Trump if He Bombed North Korea over Lack of Trans Bathrooms (Exclusive Video) - Breitbart
""")
if val[0] == 0:
    print('reliable')
else:
    print('unreliable')
reliable
import sklearn
print('The scikit-learn version is {}.'.format(sklearn.__version__))
The scikit-learn version is 1.3.2.
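Recording the library version matters because a pickled scikit-learn model is not guaranteed to load correctly under a different scikit-learn release. One possible safeguard, sketched here with the standard library only, is to store the version string next to the pickled object and compare it when loading. The helper names dump_with_version and load_checked and the dictionary payload are hypothetical. In the project, the payload would be the fitted model and the version would come from sklearn.__version__.

```python
import io
import pickle


def dump_with_version(obj, library_version, fileobj):
    """Pickle the object together with the library version used to create it."""
    pickle.dump({"library_version": library_version, "payload": obj}, fileobj)


def load_checked(fileobj, current_version):
    """Unpickle the bundle and refuse to return it if the versions differ."""
    bundle = pickle.load(fileobj)
    saved = bundle["library_version"]
    if saved != current_version:
        raise RuntimeError(
            f"Saved with version {saved}, but running version {current_version}")
    return bundle["payload"]


# Demonstration with an in-memory buffer and a stand-in payload
buffer = io.BytesIO()
dump_with_version({"demo": "model"}, "1.3.2", buffer)
buffer.seek(0)
print(load_checked(buffer, "1.3.2"))
```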

PyCharm Code-
import streamlit as st
import datetime
import pickle
import re
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer
port_stem = PorterStemmer()
vectorization = TfidfVectorizer()

vector_form = pickle.load(open('vector.pkl', 'rb'))


load_model = pickle.load(open('model.pkl', 'rb'))

def fake_news(news):
    # Apply the same preprocessing and vectoriser that were used for training
    news = stemming(news)
    input_data = [news]
    vector_form1 = vector_form.transform(input_data)
    prediction = load_model.predict(vector_form1)
    return prediction

def stemming(content):
    # Build the stop-word set once per call instead of once per token
    stop_words = set(stopwords.words('english'))
    stemmed_content = re.sub('[^a-zA-Z]', ' ', content)
    stemmed_content = stemmed_content.lower()
    stemmed_content = stemmed_content.split()
    stemmed_content = [port_stem.stem(word) for word in stemmed_content if word not in stop_words]
    stemmed_content = ' '.join(stemmed_content)
    return stemmed_content

def feedback():
    st.title(" Feedback Form for the Real Ones ")

    st.write(
        "We appreciate your feedback on our Fake News Detection system. "
        "Please take a moment to share your thoughts with us.")

    # Text input for user's name
    name = st.text_input("Your Name:")

    # Text area for user's feedback
    feedback = st.text_area("Your Feedback:")

    # Radio buttons for rating the system
    rating = st.radio("Rating:", ["Excellent", "Good", "Fair", "Poor"])

    # Checkbox for additional feedback options
    additional_feedback = st.checkbox("Would you like to provide additional feedback?")

    # Optional fields default to empty strings so that save_feedback() always
    # receives defined values, even when the checkbox is left unticked
    suggestion = ""
    email = ""
    if additional_feedback:
        suggestion = st.text_area("Suggestions for Improvement:")
        email = st.text_input("Email (optional):")

    # Button to submit feedback
    if st.button("Submit Feedback"):
        if validate_feedback(name, feedback, rating):
            save_feedback(name, feedback, rating, suggestion, email)
            st.success("Thank you for your feedback!")
        else:
            st.error("Please make sure all required fields are filled.")

def validate_feedback(name, feedback, rating):
    # Basic validation to ensure required fields are filled
    if not name or not feedback or not rating:
        return False
    return True

def save_feedback(name, feedback, rating, suggestion, email):
    # Code to save feedback to a database or file
    timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
    with open("feedback.txt", "a") as file:
        file.write(f"Timestamp: {timestamp}\n")
        file.write(f"Name: {name}\n")
        file.write(f"Feedback: {feedback}\n")
        file.write(f"Rating: {rating}\n")
        if suggestion:
            file.write(f"Suggestions: {suggestion}\n")
        if email:
            file.write(f"Email: {email}\n")
        file.write("-" * 50 + "\n\n")
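The free-form text layout written by save_feedback() is easy to read but awkward to parse later. One possible alternative, sketched here under assumptions, is to store each submission as one JSON object per line (JSON Lines). The helper names feedback_record and append_jsonl and the file name in the comment are hypothetical and are not part of the submitted application.

```python
import datetime
import json


def feedback_record(name, feedback, rating, suggestion="", email=""):
    """Build a dictionary for one submission, omitting empty optional fields."""
    record = {
        "timestamp": datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        "name": name,
        "feedback": feedback,
        "rating": rating,
    }
    if suggestion:
        record["suggestion"] = suggestion
    if email:
        record["email"] = email
    return record


def append_jsonl(path, record):
    """Append one record as a single JSON line to the given file."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


# Print the serialised line instead of writing a file in this demonstration;
# in the app, append_jsonl("feedback.jsonl", record) would be called instead.
record = feedback_record("Asha", "Clear results page", "Good", email="asha@example.com")
print(json.dumps(record, ensure_ascii=False))
```

Each line can later be read back with json.loads, so ratings can be counted without parsing separators.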

if __name__ == '__main__':
    from PIL import Image
    img = Image.open("Fake_img.jpg")
    st.image(img, width=600)
    st.title('Fake News Classification app ')
    st.subheader("Input the News content below")
    sentence = st.text_area("Enter your news content here", "", height=200)
    predict_btt = st.button("predict")
    if predict_btt:
        prediction_class = fake_news(sentence)
        print(prediction_class)
        # Label 0 is shown as reliable and label 1 as unreliable
        if prediction_class[0] == 0:
            st.success('Reliable')
        elif prediction_class[0] == 1:
            st.warning('Unreliable')

    feedback()
