Final Report Vericheck
Bachelor of Technology
in
Information Technology
By:
We hereby declare that all the work presented in the dissertation entitled “Veri-Check-ML
powered fake news detection system” in partial fulfillment of the requirement for the
award of the degree of Bachelor of Technology in Information Technology, Guru Tegh
Bahadur Institute of Technology, Guru Gobind Singh Indraprastha University, New Delhi
is an authentic record of our own work carried out under the guidance of Mrs. Debleena.
Date:
CERTIFICATE
Date:
ACKNOWLEDGEMENT
We would like to express our sincere gratitude to our supervisor, Mrs. Debleena,
who has given us support and suggestions. Without her help we could not have
presented this dissertation up to the present standard. We also take this opportunity to
thank all others who supported us in the project or in other aspects of our
study at Guru Tegh Bahadur Institute of Technology.
Date:
ABSTRACT
discernment, ensuring its judicious dispensation of truth amidst the tempestuous seas
of misinformation.
In sum, this project serves as a clarion call to arms in the struggle against the nefarious
specter of fake news, a testament to the indomitable spirit of human ingenuity and the
transformative potential of machine learning. Through its august endeavors, it heralds
a new epoch in the annals of information integrity, standing as a beacon of
enlightenment amidst the encroaching shadows of falsehood.
LIST OF FIGURES & TABLES
2.1 Browsers
4.1 Python
4.3 PyCharm
4.4 NumPy
4.5 Pandas
4.6 Sklearn
4.7 Streamlit
CONTENTS
Title Page
Declaration
Certificate
Acknowledgement
Abstract
Chapter-1: Introduction
2.1: Introduction
2.2: Environment
4.8: Pickle
4.9: Regular expression (re)
4.10: Porter stemmer
Chapter-5: Implementation
5.2: Idea
5.3: Functionality
Chapter-6: Testing
8.1: Conclusion
References
Appendix A. (Screenshots)
CHAPTER-1
INTRODUCTION
Introduction
In an era defined by the rapid dissemination of information, the prevalence of fake
news poses a critical threat to the integrity of public discourse and societal cohesion.
This project endeavors to combat this menace through the development of a Fake
News Detection system, employing machine learning techniques to discern the
veracity of news articles. Leveraging a robust dataset and state-of-the-art algorithms,
our system aims to provide a bulwark against the propagation of misinformation,
thereby safeguarding the sanctity of truth in the digital age.
Preserving Information Integrity: The propagation of fake news erodes public trust
in the media and undermines the credibility of democratic institutions. This project
aims to preserve the integrity of information dissemination by providing a means to
identify and flag potentially misleading content.
Protecting Public Discourse: Fake news can sway public opinion, influence elections,
and incite social unrest. By detecting and mitigating its impact, this project contributes
to maintaining a healthy and informed public discourse.
Addressing a Societal Need: With the prevalence of fake news on the rise, there is a
pressing societal need for reliable tools that can distinguish between factual reporting
and misinformation. This project seeks to address this need by providing a viable
solution.
CHAPTER-2
REQUIREMENT ANALYSIS (SRS)
2.1 Introduction:
This chapter describes the software and hardware requirements that are
necessary for running this system successfully. A major element in building any system
is selecting compatible hardware and software. The system analyst has to determine
which software package is best for “VeriCheck-ML powered fake news detection
system” and, where software is not an issue, the kind of hardware and peripherals
needed for the final conversion.
2.2 Environment:
After analysis, certain resources are required to turn the abstract design into a working
system. Hardware and software selection begins with requirement analysis, followed
by a request for proposal and vendor evaluation.
The software and the target system are then identified. Based on the functional
specification, the required technologies and their capacities are identified, and the
basic functions, procedures, and methodologies are prepared for implementation. The
basic hardware and software requirements are described as follows:
• Google Chrome.
• Safari.
• Mozilla Firefox.
• Opera.
• Microsoft Edge.
1. RAM: 8 GB / 16 GB (recommended)
2. 1 GB free disk space
CHAPTER-3
SYSTEM DESIGN
3.1 Architecture Diagram
This diagram illustrates the sequential flow of processes within the Fake News
Detection system, from data acquisition and preprocessing to model training,
evaluation, deployment, and user interaction.
3.2 Sequence Diagram
This sequence diagram depicts the interaction between different components of the
Fake News Detection system, including the user interface, data preprocessing, model
training, and evaluation. It illustrates the sequential flow of actions, from the user
submitting a news article to receiving a prediction on its authenticity.
• The "User" interacts with the "User Interface" to submit a news article.
• The "User Interface" forwards the submitted article to the "Fake News Detection
System."
• The "Fake News Detection System" processes the article through "Data
Preprocessing" and provides a prediction.
• The prediction is sent back to the "User Interface," which then presents it to the user.
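The interaction described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function names and the keyword-based toy_classifier are hypothetical stand-ins for the trained model, and the label convention (0 = reliable, 1 = unreliable) follows the source code in Appendix B.

```python
import re
from typing import Callable


def preprocess(article: str) -> str:
    """Data Preprocessing: keep letters only, lowercase, collapse whitespace."""
    letters_only = re.sub(r"[^a-zA-Z]", " ", article)
    return " ".join(letters_only.lower().split())


def detection_system(article: str, classify: Callable[[str], int]) -> str:
    """Fake News Detection System: preprocess, classify, map the label to text."""
    label = classify(preprocess(article))
    return "Reliable" if label == 0 else "Unreliable"


def user_interface(article: str, classify: Callable[[str], int]) -> str:
    """User Interface: forward the article and present the prediction."""
    return f"Prediction: {detection_system(article, classify)}"


def toy_classifier(text: str) -> int:
    """Hypothetical stand-in for the trained model (illustration only)."""
    return 1 if "shocking" in text else 0


print(user_interface("SHOCKING!!! You won't believe this", toy_classifier))
```

Passing the classifier in as an argument keeps the user interface independent of the model, so the trained model can replace the stand-in without changing the other two functions.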
CHAPTER-4
TOOLS AND TECHNOLOGIES
This project was created using the following tools and technologies:
4.1 Python:
Python is a high-level, interpreted programming language known for its simplicity and
readability.
Python's extensive standard library and vast ecosystem of third-party packages make
it a versatile choice for building complex software applications.
In the Fake News Detection project, Python serves as the primary programming language
for implementing data preprocessing, model training, user interface development, and
other tasks.
4.3 PyCharm:
PyCharm is a popular integrated development environment (IDE) specifically
designed for Python development.
It offers a wide range of features, including a code editor, debugger, version control
integration, and support for various frameworks and libraries.
PyCharm provides advanced code analysis, refactoring tools, and code completion,
which can enhance productivity and streamline the development process.
In the Fake News Detection project, PyCharm is used as the primary IDE for writing
and organizing code, managing project files, and collaborating with team members. Its
rich set of features and intuitive interface make it well-suited for developing complex
Python applications, such as machine learning systems.
4.5 Pandas: Pandas is a powerful data analysis and manipulation library in Python. It
offers data structures like DataFrame and Series, which make it easy to work with
structured data. Pandas is commonly used for data preprocessing tasks such as
cleaning, transforming, and analyzing datasets.
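As a small illustration of the kind of cleaning described above, the sketch below counts missing values, fills them, and combines two text columns, mirroring the steps in the notebook in Appendix B. The column names match the training data; the two rows are made-up toy values, not project data.

```python
import pandas as pd

# Toy rows that mimic the training data's columns (values are invented)
df = pd.DataFrame({
    "title": ["Budget approved", None],
    "author": ["A. Writer", "B. Reporter"],
    "label": [0, 1],
})

# Count missing values per column before cleaning
missing_before = df.isnull().sum()
print(missing_before)

# Replace missing values with a single space so string concatenation works
df = df.fillna(" ")

# Combine author and title into one text field for the model
df["content"] = df["author"] + " " + df["title"]
print(df["content"].tolist())
```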
4.7 Streamlit: Streamlit is a popular Python library for building interactive web
applications. It allows developers to create user-friendly interfaces for machine learning
models with minimal code. In this project, Streamlit is used to develop an interactive
user interface where users can input news articles and receive predictions on their
authenticity.
4.10 Porter Stemmer: The Porter Stemmer is an algorithm for reducing words to their
root or base form. It is commonly used in natural language processing tasks to
normalize text data by removing inflections and variations of words. In this project,
the Porter Stemmer is used as part of the text preprocessing pipeline to
standardize the vocabulary across news articles.
CHAPTER-5
IMPLEMENTATION
5.1 Problem Statement
Addressing the pervasive issue of fake news proliferation, this project aims to develop
a Fake News Detection system leveraging machine learning. By analyzing textual
content, the system will differentiate between credible and misleading news articles.
Objectives include dataset acquisition, preprocessing, model implementation,
performance evaluation, and user interface development. The system's deployment
will enhance accessibility and scalability, contributing to the ongoing battle against
misinformation in the digital realm.
5.2 Idea
Utilize machine learning to develop a system capable of distinguishing between real
and fake news articles. By analyzing textual content, the system will extract features
indicative of misinformation, empowering users to make informed decisions about the
credibility of news sources.
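The feature extraction mentioned above is done with TF-IDF in Appendix B. The sketch below shows the idea from scratch using only the standard library. It is a simplified illustration: the smoothed IDF and L2 normalisation are chosen to resemble scikit-learn's documented defaults, the whitespace tokenisation is an assumption that differs from TfidfVectorizer, and the two documents are made-up examples.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return (vocabulary, rows); each row is an L2-normalised TF-IDF vector."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1
    idf = {}
    for term in vocab:
        doc_freq = sum(term in toks for toks in tokenized)
        idf[term] = math.log((1 + n_docs) / (1 + doc_freq)) + 1

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        weights = [counts[term] * idf[term] for term in vocab]
        norm = math.sqrt(sum(w * w for w in weights)) or 1.0
        rows.append([w / norm for w in weights])
    return vocab, rows


docs = ["fake news spreads fast", "real news matters"]  # made-up examples
vocab, rows = tfidf_rows(docs)
for term, weight in zip(vocab, rows[0]):
    print(f"{term:8s} {weight:.3f}")
```

A term that appears in every document, such as "news" here, receives a lower weight than a term unique to one document, which is what lets the classifier focus on distinctive wording.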
5.3 Functionality
Upload News Article: Users can upload news articles or input text directly into the
application for analysis.
Prediction Trigger: A button or action trigger (e.g., "Predict") allows users to initiate
the prediction process.
Prediction Output: After the prediction process completes, the system displays the
result (e.g., "Reliable" or "Unreliable") along with any additional information or
metrics.
CHAPTER-6
TESTING
6.1 Unit Tests:
Test each data preprocessing function (e.g., text cleaning, tokenization, TF-IDF
vectorization) with sample inputs and expected outputs.
Verify that the model training process updates the classifier's parameters correctly
using synthetic data.
Test the evaluation metrics calculation (e.g., accuracy, precision, recall, F1-score) with
a small test dataset.
Perform end-to-end testing by inputting news articles and verifying the predictions.
Test the integration between the user interface components (e.g., file upload,
prediction button) and the backend machine learning model.
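The metric checks listed above can be unit-tested against values worked out by hand. The sketch below is one possible way to do this; the helper classification_metrics and the six labels are hypothetical and chosen only so that the expected values are easy to verify (2 true positives, 1 false positive, 1 false negative, 4 of 6 correct).

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    correct = sum(t == p for t, p in pairs)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Small hand-checkable test set (1 = unreliable, 0 = reliable)
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)

# Expected: accuracy 4/6, precision 2/3, recall 2/3, F1 2/3
for name, value in metrics.items():
    assert abs(value - 2 / 3) < 1e-9, name
print(metrics)
```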
6.3 System Tests:
Validate the system's response to invalid inputs (e.g., empty text, non-textual input)
and ensure appropriate error messages are displayed.
Measure the system's response time for processing and predicting on different sizes of
news articles and datasets.
Gather feedback from end-users on the usability and intuitiveness of the user interface.
Have users submit news articles and verify the accuracy of the predictions against their
expectations.
Test the system's resilience to common security threats such as SQL injection or cross-
site scripting by inputting malicious payloads.
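The invalid-input check listed above could be supported by a small validation helper that runs before the text reaches the model. The following is a sketch under assumptions: the function name validate_article and the error messages are hypothetical and do not appear in the project code.

```python
def validate_article(text):
    """Return an error message for unusable input, or None if the text is acceptable."""
    if not isinstance(text, str):
        return "Input must be text."
    if not text.strip():
        return "Please enter some news content."
    if not any(ch.isalpha() for ch in text):
        return "Input must contain at least one letter."
    return None


# Each invalid case should produce a message; a normal headline should pass
for sample in ["", "   ", "12345 !!!", 42, "Council approves new budget"]:
    print(repr(sample), "->", validate_article(sample))
```

Checking for at least one letter matters here because the preprocessing step keeps letters only, so purely numeric or symbolic input would otherwise reach the model as an empty string.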
6.6 Deployment Tests:
Test the deployed application on the hosting platform to ensure that it functions as
expected in the production environment.
Conduct regression tests after making updates or changes to the system to ensure that
existing functionalities remain unaffected.
Review and verify the accuracy of project documentation, including user guides,
installation instructions, and API documentation.
These tests help ensure the robustness, reliability, and usability
of the Fake News Detection system across various scenarios and usage conditions.
CHAPTER-7
RESULT
7.1 Result Obtained
This section provides a concise overview of the outcomes and findings of
the Fake News Detection project.
This qualitative evaluation captures end-users' perceptions of the system's ease of use
and effectiveness. Next, we conduct a system evaluation, examining response time and
scalability under varying loads and datasets.
Qualitative analysis delves into insights gleaned from user feedback sessions,
highlighting common issues, suggestions for improvement, and areas of strength.
Finally, we address limitations encountered during the project and propose future
directions for research and development. By presenting comprehensive results and
insights in this section, stakeholders can gain a thorough understanding of the Fake
News Detection system's performance, usability, and potential for further
enhancement.
CHAPTER-8
CONCLUSIONS AND SUMMARY
8.1 Conclusion
The Fake News Detection system effectively distinguishes between reliable and
unreliable news articles, reaching about 99.3% accuracy on the held-out test split
(see the notebook in Appendix B) and receiving positive user feedback. Its
scalability and comparative analysis validate its efficacy. Addressing limitations will
enhance its utility, contributing to combating misinformation and upholding information
integrity.
Key Achievements:
analyze news streams and promptly flag potentially misleading content.
• Deep Learning Integration: Investigate the integration of deep learning models, such
as recurrent neural networks (RNNs) or transformers, for more nuanced feature
extraction and classification.
• Dynamic Training: Implement mechanisms for dynamic retraining of the model with
updated datasets to adapt to evolving patterns of misinformation.
APPENDIX
A. SCREENSHOTS
Main Detector
Feedback Form
APPENDIX
B. SOURCE CODE
Jupyter Notebook Code-
import pandas as pd
import numpy as np
df = pd.read_csv(r"C:\Users\aman\OneDrive\Desktop\Major Project\train.csv")
df.head()
id title author \
0 0 House Dem Aide: We Didn’t Even See Comey’s Let... Darrell Lucus
1 1 FLYNN: Hillary Clinton, Big Woman on Campus - ... Daniel J. Flynn
2 2 Why the Truth Might Get You Fired Consortiumnews.com
3 3 15 Civilians Killed In Single US Airstrike Hav... Jessica Purkiss
4 4 Iranian woman jailed for fictional unpublished... Howard Portnoy
text label
0 House Dem Aide: We Didn’t Even See Comey’s Let... 1
1 Ever get the feeling your life circles the rou... 0
2 Why the Truth Might Get You Fired October 29, ... 1
3 Videos 15 Civilians Killed In Single US Airstr... 1
4 Print \nAn Iranian woman has been sentenced to... 1
df.shape
(20800, 5)
df.isnull().sum()
id 0
title 558
author 1957
text 39
label 0
dtype: int64
df = df.fillna(' ')
df.isnull().sum()
id 0
title 0
author 0
text 0
label 0
dtype: int64
df.head()
id title author \
0 0 House Dem Aide: We Didn’t Even See Comey’s Let... Darrell Lucus
1 1 FLYNN: Hillary Clinton, Big Woman on Campus - ... Daniel J. Flynn
2 2 Why the Truth Might Get You Fired Consortiumnews.com
3 3 15 Civilians Killed In Single US Airstrike Hav... Jessica Purkiss
4 4 Iranian woman jailed for fictional unpublished... Howard Portnoy
text label
0 House Dem Aide: We Didn’t Even See Comey’s Let... 1
1 Ever get the feeling your life circles the rou... 0
2 Why the Truth Might Get You Fired October 29, ... 1
3 Videos 15 Civilians Killed In Single US Airstr... 1
4 Print \nAn Iranian woman has been sentenced to... 1
df['content'] = df['author'] + " " + df['title']
df['content']
author \
0 Darrell Lucus
1 Daniel J. Flynn
2 Consortiumnews.com
3 Jessica Purkiss
4 Howard Portnoy
... ...
20795 Jerome Hudson
20796 Benjamin Hoffman
20797 Michael J. de la Merced and Rachel Abrams
20798 Alex Ansary
20799 David Swanson
text label \
0 House Dem Aide: We Didn’t Even See Comey’s Let... 1
1 Ever get the feeling your life circles the rou... 0
2 Why the Truth Might Get You Fired October 29, ... 1
3 Videos 15 Civilians Killed In Single US Airstr... 1
4 Print \nAn Iranian woman has been sentenced to... 1
... ... ...
20795 Rapper T. I. unloaded on black celebrities who... 0
20796 When the Green Bay Packers lost to the Washing... 0
20797 The Macy’s of today grew from the union of sev... 0
20798 NATO, Russia To Hold Parallel Exercises In Bal... 1
20799 David Swanson is an author, activist, journa... 1
content
0 Darrell Lucus House Dem Aide: We Didn’t Even S...
1 Daniel J. Flynn FLYNN: Hillary Clinton, Big Wo...
2 Consortiumnews.com Why the Truth Might Get You...
3 Jessica Purkiss 15 Civilians Killed In Single ...
4 Howard Portnoy Iranian woman jailed for fictio...
... ...
20795 Jerome Hudson Rapper T.I.: Trump a ’Poster Chi...
20796 Benjamin Hoffman N.F.L. Playoffs: Schedule, Ma...
20797 Michael J. de la Merced and Rachel Abrams Macy...
20798 Alex Ansary NATO, Russia To Hold Parallel Exer...
20799 David Swanson What Keeps the F-35 Alive
# Features: combined author + title text; target: label (0 = reliable, 1 = unreliable)
x = df['content']
y = df['label']
from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.20)
from sklearn.feature_extraction.text import TfidfVectorizer
# Fit TF-IDF on the training text only, then reuse the same vocabulary for the test text
vect = TfidfVectorizer()
x_train = vect.fit_transform(x_train)
x_test = vect.transform(x_test)
from sklearn.ensemble import RandomForestClassifier
RF = RandomForestClassifier(random_state=0)
RF.fit(x_train, y_train)
RandomForestClassifier(random_state=0)
pred_RF = RF.predict(x_test)
# Mean accuracy on the held-out test set
RF.score(x_test, y_test)
0.9932692307692308
import pickle
# Save the fitted vectorizer and model, then load them back for prediction
pickle.dump(vect, open('vector.pkl', 'wb'))
pickle.dump(RF, open('model.pkl', 'wb'))
vector_form = pickle.load(open('vector.pkl', 'rb'))
load_model = pickle.load(open('model.pkl', 'rb'))
def fake_news(news):
    # stemming() is the helper defined in the PyCharm code below
    news = stemming(news)
    input_data = [news]
    vector_form1 = vector_form.transform(input_data)
    prediction = load_model.predict(vector_form1)
    return prediction
val = fake_news("""Daniel Nussbaum Jackie Mason: Hollywood Would Love Trump if He Bombed North Korea over Lack of Trans Bathrooms (Exclusive Video) - Breitbart
""")
if val == [0]:
    print('reliable')
else:
    print('unreliable')
reliable
import sklearn
print('The scikit-learn version is {}.'.format(sklearn.__version__))
The scikit-learn version is 1.3.2.
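The notebook above saves the fitted vectorizer and model with pickle and loads them back before predicting. The round trip can be illustrated with a minimal, self-contained sketch. The dictionary below is a hypothetical stand-in for the real fitted objects, and an in-memory buffer replaces the .pkl files so that nothing is written to disk.

```python
import io
import pickle

# Hypothetical stand-ins for the fitted vectorizer vocabulary and label names
artifacts = {
    "vocabulary": {"fake": 0, "news": 1},
    "labels": {0: "reliable", 1: "unreliable"},
}

# Serialize to an in-memory "file", then rewind and load it back
buffer = io.BytesIO()
pickle.dump(artifacts, buffer)
buffer.seek(0)
restored = pickle.load(buffer)

print(restored == artifacts)
```

Two practical points follow from how pickle works: files should only be unpickled when they come from a trusted source, and scikit-learn's documentation advises loading a pickled model with the same library version that saved it, which makes the version printed above worth recording alongside the .pkl files.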
PyCharm Code-
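import streamlit as st
import datetime
import pickle
import re
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import TfidfVectorizer

port_stem = PorterStemmer()
vectorization = TfidfVectorizer()
# Build the stopword set once instead of reloading the list for every word
stop_words = set(stopwords.words('english'))
# Load the fitted vectorizer and model saved by the notebook above
vector_form = pickle.load(open('vector.pkl', 'rb'))
load_model = pickle.load(open('model.pkl', 'rb'))

def fake_news(news):
    # Clean and stem the text, vectorize it, and return the predicted label
    news = stemming(news)
    input_data = [news]
    vector_form1 = vector_form.transform(input_data)
    prediction = load_model.predict(vector_form1)
    return prediction

def stemming(content):
    # Keep letters only, lowercase, drop English stopwords, and stem each word
    stemmed_content = re.sub('[^a-zA-Z]', ' ', content)
    stemmed_content = stemmed_content.lower()
    stemmed_content = stemmed_content.split()
    stemmed_content = [port_stem.stem(word) for word in stemmed_content
                       if word not in stop_words]
    stemmed_content = ' '.join(stemmed_content)
    return stemmed_content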
def feedback():
    st.title(" Feedback Form for the Real Ones ")
    st.write(
        "We appreciate your feedback on our Fake News Detection system. Please take "
        "a moment to share your thoughts with us.")
    st.write("Rating:")
    rating = st.radio("", ["Excellent", "Good", "Fair", "Poor"])
if __name__ == '__main__':
    from PIL import Image
    img = Image.open("Fake_img.jpg")
    st.image(img, width=600)
    st.title('Fake News Classification app ')
    st.subheader("Input the News content below")
    sentence = st.text_area("Enter your news content here", "", height=200)
    predict_btt = st.button("predict")
    if predict_btt:
        prediction_class = fake_news(sentence)
        print(prediction_class)
        if prediction_class == [0]:
            st.success('Reliable')
        if prediction_class == [1]:
            st.warning('Unreliable')
    feedback()