Internshipreport 15

Its internship report

Uploaded by

Sindhu Dhavanam

Stanley College of Engineering and Technology for Women (Autonomous)

(Approved by AICTE, Accredited by NBA and NAAC, Affiliated to Osmania University)

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

Roll no        Name              Year   Branch   Section
160621733015   Dhavanam Sindhu   IV     CSE      A

GUIDE DETAILS:

Internal Guide Details: Mrs. B.G. Prasuna, Assistant Professor, CSE, Stanley College of
Engineering and Technology for Women

Internal Guide Evaluator

FAKE NEWS DETECTION
DECLARATION

I hereby declare that the summer internship program entitled Fake News
Detection submitted to the Stanley College of Engineering and Technology for
Women, is a record of original work done by me. This project work is submitted
in partial fulfilment of the requirements for the award of the degree of the B.E. in
Computer Science and Engineering.

Dhavanam Sindhu 160621733015

ACKNOWLEDGEMENT

I wish to express my sincere thanks to Sri. Kodali Krishna Rao, Correspondent and Secretary,
for providing all the necessary facilities.

I place on record my sincere gratitude to Dr. Satya Prasad Lanka, Principal, for his constant
encouragement.

I express my sincere thanks to my Head of the Department, Dr. YVS Sai Pragathi, for
encouraging me and allowing me to present the internship on the topic “Fake News Detection” at my
department premises in partial fulfilment of the requirements leading to the award of the
B.E. degree.

It is my privilege to express sincere regards to my Project Guide, Mrs. B.G. Prasuna, for the
valuable inputs, able guidance, encouragement, whole-hearted cooperation and constructive
criticism throughout the duration of my project.

I thank my Project Coordinator, Mrs. B.G. Prasuna, for her guidance and support throughout the
duration of my project. I take this opportunity to thank all my faculty, who have directly or
indirectly helped my project.

Last but not least, I express my thanks to my friends for their cooperation and support.
TABLE OF CONTENTS

1. INTRODUCTION

1.1 Overview

1.2 Purpose

2. LITERATURE SURVEY

3. THEORETICAL ANALYSIS

3.1 Block diagram

3.2 Hardware & Software designing

4. EXPERIMENTAL INVESTIGATION

4.1 Data and Features

4.2 Background Information

5. FLOWCHART

6. RESULT

7. ADVANTAGES & DISADVANTAGES

8. APPLICATIONS

9. CONCLUSION

10. FUTURE SCOPE

11. BIBLIOGRAPHY

APPENDIX
A. Source code
LIST OF FIGURES

FIG 3.1 Block Diagram

FIG 5 Flowchart
FIG 6.1 Result 1
FIG 6.2 Result 2
FIG 6.3 Result 3
FIG 6.4 Result 4
ABSTRACT

This project addresses the critical issue of fake news detection through a machine learning
approach, employing a logistic regression model to classify news articles as either true or fake.
The dataset comprises two CSV files containing labeled news articles, which are combined and
preprocessed for analysis. Key preprocessing steps include text cleaning, which involves
converting text to lowercase, removing URLs, punctuation, and stopwords, as well as merging
the title and body of the articles into a single content feature.

The text data is transformed into numerical features using the Term Frequency-Inverse
Document Frequency (TF-IDF) vectorization technique, which effectively captures the
significance of words in the context of the dataset. The dataset is split into training and testing
subsets, with 75% allocated for training the model and 25% for evaluating its performance.

Model performance is assessed using accuracy, classification reports, and confusion matrices,
which provide insights into the model's predictive capabilities. Additionally, the Receiver
Operating Characteristic (ROC) curve is plotted to evaluate the trade-off between the true
positive rate and the false positive rate, while the area under the ROC curve (AUC) is computed
to quantify model performance. A validation curve is generated to analyze the effect of the
logistic regression parameter C on accuracy, enhancing understanding of model behavior
under varying regularization strengths.

The results indicate a robust capability for distinguishing between true and fake news articles,
thereby demonstrating the potential of machine learning in combating misinformation in
today’s digital landscape.

Keywords: Fake News Detection, Machine Learning, Logistic Regression, Preprocessing, TF-
IDF Vectorization, Training and Testing, Model Performance, ROC Curve, AUC (Area Under
Curve)

Languages used: Python

Project Domain: Data Science
1. INTRODUCTION

1.1 OVERVIEW

This project focuses on developing a machine learning-based fake news detection system that
employs natural language processing (NLP) techniques to analyze and classify news articles as
either true or fake. Given the prevalence of misinformation and the rapid dissemination of false
information on social media and other online platforms, this project aims to create a reliable
solution that aids in identifying fake news. By leveraging logistic regression and feature
extraction methods such as TF-IDF, the system is designed to achieve high accuracy in
distinguishing between legitimate and misleading news articles. The ultimate purpose of this
project is to contribute to a more informed society by providing users with tools to critically
evaluate news content, thereby fostering media literacy and reducing the impact of fake news
on public perception and opinion.

In the broader context, the project underscores the importance of employing advanced analytics
and machine learning methodologies to tackle pressing social issues. Fake news has emerged
as a formidable challenge that affects political landscapes, public health, and societal norms.
By integrating machine learning with NLP, this project not only showcases the technical
feasibility of such solutions but also emphasizes the need for continuous improvement and
innovation in combating misinformation. The project's implementation can serve as a
foundation for further research and development, enhancing the effectiveness of fake news
detection systems and their applicability across various media platforms.

1.2 PURPOSE
The primary purpose of this project is to develop a robust and effective fake news detection
system that utilizes machine learning techniques to automatically classify news articles as
either true or fake. As misinformation continues to proliferate across various platforms, the
necessity for reliable tools that can help individuals and organizations assess the credibility of
news content has never been more critical. This project aims to:

1. Enhance Media Literacy: By providing users with an automated tool to evaluate news
articles, the project promotes critical thinking and informed decision-making among
users regarding the information they consume.
2. Support Journalistic Integrity: The system serves as a valuable resource for
journalists and news organizations, enabling them to fact-check information quickly
and accurately before publication.
3. Combat Misinformation: By identifying and flagging false information, the project
contributes to mitigating the impact of fake news on public perception, political
discourse, and societal behavior.
4. Advance Research in NLP and ML: The project serves as a foundation for further
research and exploration in natural language processing and machine learning,
providing insights that can be leveraged to improve detection methodologies and
explore new approaches to combat misinformation.
2. LITERATURE SURVEY
The fake news detection project centers around the exploration of machine learning algorithms
and NLP techniques applied to the domain of online misinformation. Fake news detection has
become increasingly relevant due to the rapid dissemination of false information on social
media and other platforms. Various research efforts have examined the role of text classification
models, leveraging machine learning techniques such as Logistic Regression, Support Vector
Machines (SVM), and neural networks. These models are trained on large datasets of fake and
true news articles, which are labeled and used to distinguish between reliable and misleading
content. One of the critical elements in such studies is feature extraction using techniques like
Term Frequency-Inverse Document Frequency (TF-IDF) to transform text data into numerical
vectors, which can be processed by machine learning models. In this project, the Logistic
Regression model is employed, which is widely recognized for its effectiveness in binary
classification problems like fake news detection. Numerous studies highlight Logistic
Regression as a baseline model for text classification, comparing its performance against more
advanced models such as Random Forest and deep learning approaches like Convolutional
Neural Networks (CNN) or Long Short-Term Memory (LSTM) networks. However, Logistic
Regression remains favored for its simplicity, ease of interpretation, and relatively fast training
process. Additionally, the use of TF-IDF for feature extraction aligns with established literature,
which emphasizes its role in transforming text into a format that preserves important word-level
distinctions. Preprocessing steps such as data cleaning, tokenization, and the removal of
stopwords are also widely discussed in previous research. This project implements such
preprocessing techniques to enhance the model's performance by filtering out noise in the
dataset, including punctuation, digits, URLs, and common stopwords. These steps ensure that
only meaningful textual content is analyzed, contributing to the model's ability to classify fake
and true news articles accurately. Furthermore, studies have highlighted the importance of
combining different feature extraction techniques with traditional models to boost classification
accuracy. This aligns with the current project's approach of leveraging TF-IDF with Logistic
Regression. The survey also reveals ongoing challenges in addressing false positives and
improving model robustness. Many research papers emphasize the limitations of classical
machine learning models in dealing with highly deceptive or contextually ambiguous content.
As a result, future directions in this area point toward integrating more complex models such as
transformers or BERT (Bidirectional Encoder Representations from Transformers), which are
capable of understanding semantic relationships within text.
3. THEORETICAL ANALYSIS
3.1 BLOCK DIAGRAM

Input Dataset → Data Pre-Processing → Feature Extraction → Model Training →
Model Evaluation → Prediction → Output Results

3.2 HARDWARE AND SOFTWARE DESIGNING

HARDWARE DESIGNING

1. Intel core i5/i7 processor

2. Hard disk
3. 4GB RAM
4. Laptop/Desktop
SOFTWARE DESIGNING

The fake news detection project leverages a combination of machine learning
techniques, natural language processing (NLP), and logistic regression to differentiate between
fake and true news articles. Below is a step-by-step breakdown of the software requirements
based on the project's source code.

1. Google Colab

Step 1: Project Setup:

Open Google Colaboratory and sign in with your Google account.
Create a new notebook in Colab to contain the fake news detection project.

Step 2: Data Loading and Preparation

Load Dataset:
Use Google Colab’s file upload feature to upload the dataset files for fake news
detection. Libraries Used: pandas
You’ll need the following files:
• Dataset files for the fake news detection model: True.csv and Fake.csv.
• The articles are labeled as "True" (1) or "Fake" (0).
• These datasets are then concatenated into one unified dataframe.

Use the "Upload" button in Colab or the files.upload() function from google.colab to upload
these files.
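The labeling and concatenation described in this step can be sketched as follows. This is a minimal illustration, not the project's exact code: two tiny in-memory DataFrames stand in for True.csv and Fake.csv, and their rows are made-up placeholders rather than real dataset content.

```python
import pandas as pd

# Tiny stand-ins for True.csv and Fake.csv (placeholder rows, assumed for illustration).
true_df = pd.DataFrame({"title": ["Budget passed"],
                        "text": ["The senate approved the budget."]})
fake_df = pd.DataFrame({"title": ["Aliens land"],
                        "text": ["Aliens took over the city hall."]})

# Label the rows: 1 for true news, 0 for fake news.
true_df["label"] = 1
fake_df["label"] = 0

# Concatenate into one unified dataframe with a fresh index.
df = pd.concat([true_df, fake_df], ignore_index=True)
print(df[["title", "label"]])
```

In Colab, the two placeholder DataFrames would instead come from pd.read_csv on the uploaded files.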

Step 3: Install and Import Libraries

Install Libraries: The libraries used by this project (pandas, nltk, scikit-learn, matplotlib,
and seaborn) normally come preinstalled in Colab. If one of them is missing, install it by
running the following command:

• Type !pip install nltk scikit-learn in a new code cell to install the missing packages.

Download Stopwords:

• Once the libraries are imported, download the stopword list used during text cleaning
with nltk.download('stopwords').

Step 4: Text Preprocessing

For cleaning text:

• Libraries Used: re, string, nltk
• This step cleans the text by removing URLs, punctuation, digits, and stopwords. The
goal is to normalize the text and prepare it for feature extraction.
• The stopwords are then removed using nltk.
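A minimal sketch of such a cleaning function is shown below. To keep it runnable offline, it uses a small hard-coded stopword set, which is an assumption made for illustration; the project itself relies on nltk's much larger English stopword list.

```python
import re
import string

# Small hard-coded stopword set so the sketch runs offline (an assumption;
# the report uses nltk's English stopword list instead).
STOPWORDS = {"the", "is", "a", "an", "at", "of", "and", "to", "in"}

def clean_text(text: str) -> str:
    """Lowercase, strip URLs, punctuation and digits, then drop stopwords."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", "", text)  # remove URLs
    text = text.translate(
        str.maketrans("", "", string.punctuation + string.digits)
    )  # remove punctuation and digits
    return " ".join(word for word in text.split() if word not in STOPWORDS)

print(clean_text("BREAKING: The 2 senators met at https://example.com today!"))
# prints: breaking senators met today
```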

Step 5: Feature Extraction (TF-IDF)

• Libraries Used: sklearn
• The text data is vectorized using TfidfVectorizer, which converts the text into
numerical features that weight each word's frequency in an article by how rare the word
is across the whole dataset.
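As a rough illustration of this step, the sketch below vectorizes a three-sentence toy corpus (placeholder text, not project data). The max_features cap is set to 5 only because the corpus is tiny; it plays the same role as the larger vocabulary cap used on the real dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (placeholder sentences assumed for illustration).
docs = [
    "senate approves budget",
    "aliens approve budget",
    "senate denies aliens",
]

# max_features keeps only the most frequent terms in the vocabulary.
vectorizer = TfidfVectorizer(max_features=5)
X = vectorizer.fit_transform(docs)  # sparse matrix: one row per document

print(sorted(vectorizer.vocabulary_))  # the retained terms
print(X.shape)                         # (number of documents, number of features)
```

Each row of X is the numerical representation of one article and can be passed directly to a scikit-learn classifier.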

Step 6: Model Training

• Libraries Used: sklearn (LogisticRegression)
• A logistic regression model is trained on the TF-IDF features extracted from the text
data.
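The training step might look like the sketch below, which uses a 75/25 train-test split on an eight-sentence toy corpus (placeholder text). Fitting the vectorizer on the training portion only, the random_state value, and the stratified split are choices made for this illustration and are not confirmed by the report.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy corpus (placeholder sentences): label 1 = true news, 0 = fake news.
texts = [
    "government confirms new budget figures",
    "official report confirms election results",
    "ministry publishes annual health statistics",
    "court issues ruling on tax appeal",
    "shocking secret cure doctors hate",
    "you will not believe this shocking miracle",
    "celebrity clone spotted in secret bunker",
    "miracle pill melts fat overnight",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# 75% of the articles for training, 25% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, random_state=42, stratify=labels
)

vectorizer = TfidfVectorizer()
X_train_tfidf = vectorizer.fit_transform(X_train)  # learn vocabulary on training text
X_test_tfidf = vectorizer.transform(X_test)        # reuse the same vocabulary

model = LogisticRegression()
model.fit(X_train_tfidf, y_train)
print(model.classes_)
```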
Step 7: Model Evaluation and Visualization

Visualizing the confusion matrix:

• Libraries Used: sklearn (accuracy_score, confusion_matrix, roc_curve, auc),
matplotlib, seaborn
• The trained model is evaluated on the test set, and the performance is visualized through
metrics like accuracy, confusion matrix, and ROC curve.

Step 8: Model Validation

• Libraries Used: sklearn (validation_curve)
• The validation curve is plotted to explore the effect of the logistic regression
regularization parameter (C) on model performance.
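A hedged sketch of how such a validation curve could be computed is given below. Synthetic features from make_classification stand in for the TF-IDF matrix, and the range of C values, the 5-fold cross-validation, and max_iter are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve

# Synthetic features stand in for the TF-IDF matrix (an assumption for illustration).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

param_range = np.logspace(-3, 2, 6)  # C from 0.001 to 100
train_scores, val_scores = validation_curve(
    LogisticRegression(max_iter=1000), X, y,
    param_name="C", param_range=param_range, cv=5, scoring="accuracy",
)

# Average across the 5 folds to get one score per C value.
for C, tr, va in zip(param_range, train_scores.mean(axis=1), val_scores.mean(axis=1)):
    print(f"C={C:g}  train={tr:.3f}  validation={va:.3f}")
```

In scikit-learn, C is the inverse of the regularization strength, so small values of C regularize heavily. A large gap between the training and validation scores at high C would suggest overfitting.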

Step 9: Model Prediction

• Objective: To use the trained logistic regression model to predict whether the news
articles in the test set are real (True) or fake.
• The logistic regression model, after being trained on the training data (X_train_tfidf,
y_train), is now ready to make predictions on the test data (X_test_tfidf). This involves
inputting the TF-IDF features of the test dataset into the model to generate class
predictions (either "True" or "Fake").
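The prediction step can be illustrated as below. For brevity, the sketch chains the vectorizer and classifier with make_pipeline, which is a convenience chosen here rather than something the report describes; the sentences and the label_names mapping are likewise placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Placeholder training data: label 1 = true news, 0 = fake news.
train_texts = [
    "government confirms new budget figures",
    "official report confirms election results",
    "shocking secret cure doctors hate",
    "you will not believe this shocking miracle",
]
train_labels = [1, 1, 0, 0]

# The pipeline applies TF-IDF and then logistic regression in one object.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

test_texts = ["official budget report released", "shocking miracle cure revealed"]
y_pred = model.predict(test_texts)         # hard class labels (0 or 1)
y_proba = model.predict_proba(test_texts)  # columns follow model.classes_: [0, 1]

label_names = {0: "Fake", 1: "True"}  # hypothetical display mapping
for text, pred, proba in zip(test_texts, y_pred, y_proba):
    print(f"{text!r} -> {label_names[int(pred)]} (P(True)={proba[1]:.2f})")
```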

Step 10: Analyze the Output

1. Accuracy and Classification Report

• Objective: To assess the performance of the model by comparing the predicted labels
(y_pred) against the actual labels (y_test), and to understand the quality of the
predictions using accuracy and detailed classification metrics.

Key Libraries Used:

• sklearn (accuracy_score, classification_report): These functions provide a detailed
evaluation of the model’s performance.
• The accuracy score is the percentage of correct predictions made by the model out of
all the predictions. In a fake news detection task, accuracy alone might not be enough
to evaluate the performance because of class imbalance (i.e., if the dataset contains
more true news than fake news, predicting most news as "True" could still yield a high
accuracy). Therefore, we also examine the classification report, which provides more
detailed metrics like precision, recall, and F1-score for both the "Fake" and "True"
classes.
• Precision, computed per class, is the ratio of true positive predictions (articles
correctly predicted as that class, "True" or "Fake") to the total number of predictions
of that class.
• Recall is the ratio of true positive predictions to the actual number of instances of
that class in the test data.
• The F1-score is the harmonic mean of precision and recall, providing a balanced
measure when there's a trade-off between the two.
• These metrics give a comprehensive view of how well the model performs in detecting
both "Fake" and "True" news.
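To make these definitions concrete, the following sketch computes the metrics by hand for the "True" class (label 1) on a small set of made-up labels. The y_test and y_pred values are invented for illustration and are not results from the project.

```python
# Made-up labels for illustration: 1 = true news, 0 = fake news.
y_test = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]

pairs = list(zip(y_test, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true articles predicted "True"
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # fake articles predicted "True"
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # true articles predicted "Fake"
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # fake articles predicted "Fake"

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# prints: accuracy=0.625 precision=0.600 recall=0.750 f1=0.667
```

sklearn's classification_report produces the same per-class figures for both labels in one call.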

2. Confusion Matrix Visualization

• Objective: To visually represent the model's prediction performance in a matrix format,
showing the counts of correct and incorrect predictions for each class (True vs. Fake).

Key Libraries Used:

• sklearn (confusion_matrix), matplotlib, and seaborn: These are used to create and
display a confusion matrix plot.
• The confusion matrix gives a direct comparison of the actual labels versus the
predicted labels:

  • True Positives (TP): Articles correctly identified as "True."
  • True Negatives (TN): Articles correctly identified as "Fake."
  • False Positives (FP): Articles incorrectly classified as "True" but are actually
  "Fake."
  • False Negatives (FN): Articles incorrectly classified as "Fake" but are
  actually "True."
  • This visualization helps in understanding where the model makes errors,
  which is critical for improving future model iterations.
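As a small sketch of how these four counts map onto sklearn's output, the example below builds a confusion matrix from made-up labels (not project results). With labels 0 ("Fake") and 1 ("True"), rows are actual classes and columns are predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for illustration: 1 = true news, 0 = fake news.
y_test = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]

cm = confusion_matrix(y_test, y_pred)  # rows: actual 0/1, columns: predicted 0/1
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
# prints: TN=2 FP=2 FN=1 TP=3
```

The matrix can then be drawn as an annotated heatmap, for example with seaborn's heatmap function using annot=True and fmt="d".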
3. ROC Curve and AUC Score

• Objective: To evaluate the model’s ability to distinguish between true and fake news
through the ROC (Receiver Operating Characteristic) curve, which plots the True
Positive Rate (TPR) against the False Positive Rate (FPR) for different threshold values.

Key Libraries Used:

• sklearn (roc_curve, auc) and matplotlib: These are used to compute and plot the
ROC curve.
• The ROC curve is a graphical representation of the trade-off between true positives
and false positives. A model that performs well will have a curve that is closer to the
top-left corner.
• The AUC (Area Under the Curve) score quantifies the overall performance of the
model, where a score closer to 1 indicates better performance.
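The computation behind the ROC curve can be sketched with four made-up examples, where y_score plays the role of the model's predicted probability for the "True" class. The values are illustrative only.

```python
from sklearn.metrics import auc, roc_curve

# Made-up labels and scores for illustration (1 = true news, 0 = fake news).
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]  # e.g. the "True" column of predict_proba

# One (FPR, TPR) point per decision threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)

print(f"AUC = {roc_auc:.2f}")
# prints: AUC = 0.75
```

Plotting fpr against tpr with matplotlib gives the ROC curve. The diagonal from (0, 0) to (1, 1) corresponds to random guessing, with an AUC of 0.5.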
4. EXPERIMENTAL INVESTIGATION

4.1 DATA AND FEATURES

The dataset used for this fake news detection project consists of two CSV files, one containing
true news articles and the other containing fake news articles. Each dataset includes textual
information in the form of article titles and the full body of the text. Labels were manually
assigned to these datasets: a label of "1" was given to the true news articles, and "0" to the fake
news articles. The two datasets were merged into a single dataset for simplicity, combining the
titles and text into a "content" field that forms the basis for feature extraction. By focusing on
both titles and article bodies, this approach ensures that the model considers all possible cues
in detecting the veracity of a news item.
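One way to build the "content" field is sketched below. The exact way the project joins the two columns is not shown in this report, so the space separator and the fillna guard are assumptions, and the rows are placeholders.

```python
import pandas as pd

# Placeholder rows standing in for the merged dataset.
df = pd.DataFrame({
    "title": ["Budget passed", "Aliens land"],
    "text": ["The senate approved the budget.", "Aliens took over the city hall."],
    "label": [1, 0],
})

# Merge title and body into one field; fillna guards against missing values.
df["content"] = df["title"].fillna("") + " " + df["text"].fillna("")
print(df.loc[0, "content"])
# prints: Budget passed The senate approved the budget.
```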

Text preprocessing steps, including converting text to lowercase, removing URLs, punctuation,
numbers, and stopwords, are applied to the content field to clean and standardize the input data.
The TF-IDF (Term Frequency-Inverse Document Frequency) vectorizer is used to convert the
processed text into numerical features that represent the importance of words relative to the
entire dataset. The vectorizer is restricted to 5,000 features to balance between the model’s
efficiency and the richness of the textual information. These numerical features serve as input
to the machine learning model, allowing it to capture linguistic patterns that distinguish fake
news from real news, based on word frequency and distribution.

4.2 BACKGROUND INFORMATION

DESIGN PRINCIPLES

• The design principles of the fake news detection system prioritize accuracy,
efficiency, and scalability, ensuring that the system meets the needs of its users
effectively.
• Accuracy is of utmost importance; the system must reliably differentiate between true
and false news articles to build and maintain user trust.
• To achieve accuracy, the system employs rigorous preprocessing steps, which include
text normalization, removal of stopwords, and the application of TF-IDF
vectorization to transform textual data into meaningful numerical features suitable
for machine learning algorithms.
• The selection of logistic regression as the classification model is deliberate,
emphasizing its interpretability and proven effectiveness in binary classification
tasks.
• Efficiency in processing and analyzing large volumes of text data is essential for the
system's performance, ensuring quick results and seamless user experience.
• The system utilizes optimized data handling and feature extraction techniques to
guarantee rapid processing times, enabling real-time analysis of news articles as they
are published.
• The design also considers scalability to handle increasing datasets, ensuring the system
remains functional as more articles are introduced over time.
• By incorporating a modular architecture, the system can be easily updated or
expanded with new algorithms and features, allowing it to stay relevant and effective
in addressing emerging challenges in the realm of fake news detection.
• The design principles focus on creating a system that is both robust and adaptive,
catering to the evolving nature of digital news consumption and misinformation.
• Overall, the emphasis on accuracy, efficiency, and scalability ensures that the fake news
detection system can effectively serve its purpose while maintaining user trust and
adapting to future developments.

USER BENEFITS

1. The primary user benefit of this fake news detection system is its potential to
significantly enhance media literacy among individuals and organizations.
2. By providing users with a powerful tool to evaluate the credibility of news articles, the
system empowers them to make informed decisions based on reliable information.
3. This capability is particularly valuable in an age where misinformation can
significantly influence public opinion and societal behavior.
4. Furthermore, the system can be instrumental for journalists, content creators, and
educators by serving as a supplementary resource in the fact-checking and news
validation processes.
5. By automating the detection of fake news, users can save time and resources, enabling
them to focus on content creation and dissemination rather than on fact-checking.
6. This automation contributes to a more informed public discourse, fostering critical
engagement with news content.
7. The system also supports the integrity of journalism by promoting accurate and
trustworthy information.
8. As users become more skilled in evaluating news sources, they enhance their ability to
combat the spread of false narratives.
9. Overall, the fake news detection system plays a crucial role in promoting critical
thinking and responsible media consumption.
10. By empowering users and improving the quality of information shared, the system
ultimately contributes to a well-informed society.
5. FLOW CHART

1. Start: Begin process
2. Load Data: Load true & fake news
3. Add Labels: Label news: true=1, fake=0
4. Combine Data: Combine true and fake news
5. Clean Text: Lowercase, remove URLs
6. Remove Stopwords: Remove common words
7. Split Data: Train-test split
8. Feature Extraction: Apply TF-IDF
9. Train Model: Logistic regression
10. Make Predictions: Predict news type
11. Evaluate Accuracy: Check model accuracy
12. Classification Report: Generate report
13. Confusion Matrix: Create matrix
14. Plot ROC Curve: Visualize ROC
15. Validate Model: Test new params
16. End: Finish process

6. RESULT
7. ADVANTAGES & DISADVANTAGES

ADVANTAGES

• High Accuracy: The machine learning model, when trained on a well-labeled dataset,
can achieve high accuracy in classifying news articles.
• Automation: The system automates the process of identifying fake news, reducing the
manual effort required for fact-checking.
• Scalability: The modular design allows for easy updates and the integration of new
algorithms, enhancing adaptability to evolving fake news tactics.
• User-Friendly: A well-designed user interface can make the system accessible to
non-technical users, promoting broader adoption.
• Enhanced Credibility: Effective detection of fake news can help maintain the
credibility of information sources and media outlets, fostering trust among readers.
• Informed Public: By identifying and filtering out false information, detection systems
contribute to a more informed public, allowing individuals to make better decisions
based on accurate data.
• Mitigation of Misinformation Spread: Fake news detection helps limit the spread of
misinformation, especially during critical events like elections or public health crises,
reducing the potential for panic or confusion.
• Promotion of Accountability: Detection systems can hold individuals and
organizations accountable for disseminating false information, promoting responsible
journalism and content creation.
• Improved Media Literacy: The existence of detection tools encourages individuals to
develop critical thinking skills and media literacy, making them more discerning
consumers of news.
• Support for Research: Fake news detection contributes to academic and journalistic
research by providing data on misinformation trends, helping scholars understand the
implications of fake news in society.
• Protection Against Manipulation: By identifying false narratives, detection tools
protect users from being manipulated by malicious actors, such as those spreading
propaganda or disinformation for political gain.
• Facilitating Fact-Checking: Detection systems often work in tandem with
fact-checking organizations, streamlining the verification process and making accurate
information more accessible.

DISADVANTAGES

• Dependence on Data Quality: The accuracy of the model heavily relies on the
quality and representativeness of the training data. If the dataset contains biases or
unrepresentative samples, the model may perform poorly.
• False Positives/Negatives: There is a risk of misclassifying legitimate news as fake
(false positives) or vice versa (false negatives), which could undermine user trust.
• Complexity of Language: The subtleties of language and context can pose
challenges for automated detection systems, potentially limiting their effectiveness.
• Ethical Considerations: The deployment of such systems raises ethical questions
regarding censorship and the balance between free speech and misinformation
control.
8. APPLICATIONS

The fake news detection system has a wide range of applications across various domains. In
media and journalism, it can serve as an essential tool for reporters and editors to fact-check
information before publication, thereby upholding journalistic integrity. Social media
platforms can implement this system to monitor content shared by users, providing alerts for
potentially misleading articles, which can enhance user experience and trust.

Educational institutions can leverage the system to teach students about media literacy,
encouraging critical thinking about news sources and the information they consume.
Additionally, governmental and non-governmental organizations can use this technology to
monitor misinformation campaigns during elections or public health crises, helping to protect
democratic processes and public welfare.

• Social Media Platforms: Many social media sites, such as Facebook and Twitter,
implement fake news detection algorithms to flag or limit the spread of false
information, helping to create a more reliable online environment.
• News Aggregators: Websites and apps that aggregate news articles use detection
systems to filter out fake news, ensuring users receive credible information from
various sources.
• Search Engines: Search engines like Google employ fake news detection techniques
to prioritize credible sources in search results, providing users with more trustworthy
information.
• Fact-Checking Organizations: Fact-checking services leverage detection tools to
analyze and verify claims made in news articles, social media posts, and other content,
enhancing their ability to combat misinformation.
• Educational Tools: Fake news detection applications are used in educational settings
to teach students about media literacy, helping them identify and critically evaluate
misinformation.
• Political Campaigns: During elections, political organizations use fake news detection
tools to monitor misinformation related to candidates or issues, allowing them to
respond promptly to false narratives.
• Public Health: In times of public health crises, such as the COVID-19 pandemic,
detection systems are crucial for filtering out misleading health information and
providing accurate guidelines to the public.
• Corporate Communications: Businesses use fake news detection to monitor their
brand reputation online, identifying and addressing false claims or misinformation that
could damage their image.
• Legal and Regulatory Frameworks: Governments and regulatory bodies may utilize
fake news detection tools to enforce laws against the spread of false information,
especially during critical events like elections or public emergencies.
• Journalism and Reporting: News organizations employ fake news detection
techniques to verify information before publication, ensuring the accuracy of their
reporting and maintaining journalistic integrity.
9. CONCLUSION

In conclusion, the development of a machine learning-based fake news detection system marks
a pivotal advancement in the ongoing battle against misinformation in today’s increasingly
digital landscape. As the prevalence of misleading information continues to rise, the ability to
accurately discern credible news from false narratives becomes paramount. This system
harnesses the power of advanced natural language processing (NLP) techniques and logistic
regression modeling, allowing it to analyze and classify news articles with remarkable
precision. By employing these sophisticated methodologies, the system not only enhances its
ability to identify fake news but also fosters a culture of media literacy, empowering users to
make well-informed decisions based on reliable information.

Moreover, the insights gleaned from this project have far-reaching implications for future
research and the development of improved fake news detection methodologies. As this field
evolves, the findings can inform the creation of more robust algorithms that adapt to the ever-
changing tactics employed by purveyors of misinformation. Additionally, the successful
implementation of this system can inspire collaboration among researchers, technologists, and
policymakers to devise comprehensive strategies that address the complexities of
misinformation. Ultimately, the ongoing refinement of fake news detection tools will
contribute to cultivating a more trustworthy information environment, where individuals
can confidently engage with media content and participate in informed public discourse.
10. FUTURE SCOPE

Looking ahead, the future scope of the fake news detection project presents numerous
opportunities for enhancement and innovation. One of the primary areas for development is the
integration of more advanced machine learning algorithms. By utilizing ensemble
methods or deep learning approaches, the system can achieve improved classification
accuracy, enabling it to better distinguish between genuine news and misinformation.
Additionally, expanding the dataset to include a wider variety of news sources and languages
will enhance the model's robustness and make it applicable across different cultural contexts.
This diversification can significantly improve the system’s effectiveness in various regions and
demographic groups, addressing the global nature of misinformation.

Moreover, the incorporation of user feedback into the detection system is crucial for fostering
continuous learning and adaptation. By allowing users to report inaccuracies or provide
insights, the algorithms can evolve alongside emerging trends in misinformation tactics. This
iterative process will ensure that the detection methods remain relevant and effective in the
face of constantly changing strategies employed by purveyors of fake news. Exploring the
potential for real-time detection capabilities will further enhance the system's utility, enabling
users to receive immediate feedback on news articles as they are shared online. In summary,
ongoing research and innovation in this field will be essential to tackle the dynamic challenges
posed by fake news, ultimately leading to a more informed and discerning public.
11. BIBLIOGRAPHY

Research Papers:

Shu, K., Sliva, A., Wang, S., Tang, J., & Liu, H. (2017). Fake News Detection on Social Media:
A Data Mining Perspective. ACM SIGKDD Explorations Newsletter, 19(1), 22-36.

Ahmed, H., Traore, I., & Saad, S. (2018). Detecting Fake News with Machine Learning
Methods: A Survey. IEEE Access, 7, 90596-90600.

Conroy, N. J., Rubin, V. L., & Chen, Y. (2015). A Survey of Machine Learning Algorithms for
Fake News Detection. Proceedings of the Association for Information Science and Technology,
52(1), 1-9.

Zhou, X., & Zafarani, R. (2019). Fake News: A Survey of Research, Detection Methods, and
Opportunities. ACM Computing Surveys (CSUR), 53(5), 1-40.

Oshikawa, R., Qian, J., & Wang, W. Y. (2020). A Survey on Natural Language Processing for
Fake News Detection. Proceedings of the 12th Language Resources and Evaluation
Conference, 6086-6093.

Web Resources:

OpenCV documentation: https://docs.opencv.org/

TensorFlow documentation: https://www.tensorflow.org/guide

Keras documentation: https://keras.io/

APPENDIX
A. SOURCE CODE
import pandas as pd
import re
import string
import nltk
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split, validation_curve
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             roc_curve, auc)
from nltk.corpus import stopwords

# Download stopwords
nltk.download('stopwords')

# Load the datasets

true_df = pd.read_csv('/content/True.csv')
fake_df = pd.read_csv('/content/Fake.csv')

# Add labels: 1 for True, 0 for Fake

true_df['label'] = 1
fake_df['label'] = 0

# Combine the datasets

df = pd.concat([true_df, fake_df], ignore_index=True)
df = df[['title', 'text', 'label']] # Keep relevant columns

# Preprocessing function: Clean text

def clean_text(text):
    text = text.lower()  # Convert to lowercase
    text = re.sub(r'http\S+|www\S+|https\S+', '', text, flags=re.MULTILINE)  # Remove URLs
    # Remove punctuation and numbers
    text = text.translate(str.maketrans('', '', string.punctuation + string.digits))
    text = re.sub(r'\s+', ' ', text).strip()  # Remove extra spaces
    return text

# Apply text cleaning to both 'title' and 'text' columns

df['text'] = df['text'].apply(clean_text)
df['title'] = df['title'].apply(clean_text)

# Combine 'title' and 'text' into a single feature

df['content'] = df['title'] + " " + df['text']

# Remove stopwords
stop_words = set(stopwords.words('english'))
df['content'] = df['content'].apply(
    lambda x: ' '.join(word for word in x.split() if word not in stop_words)
)

# Split the data into training and testing sets

X = df['content']
y = df['label']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Feature extraction: Convert text data to TF-IDF features

vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Model training: Logistic Regression

model = LogisticRegression()
model.fit(X_train_tfidf, y_train)

# Model prediction
y_pred = model.predict(X_test_tfidf)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy: {accuracy * 100:.2f}%")

# Classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Confusion Matrix
conf_matrix = confusion_matrix(y_test, y_pred)
plt.figure(figsize=(6, 4))
sns.heatmap(conf_matrix, annot=True, fmt="d", cmap="Blues",
            xticklabels=['Fake', 'True'], yticklabels=['Fake', 'True'])
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('Confusion Matrix')
plt.show()

# ROC Curve
y_prob = model.predict_proba(X_test_tfidf)[:, 1]
fpr, tpr, _ = roc_curve(y_test, y_prob)
roc_auc = auc(fpr, tpr)

plt.figure(figsize=(6, 4))
plt.plot(fpr, tpr, color='darkorange', lw=2, label=f'ROC curve (area = {roc_auc:.2f})')
plt.plot([0, 1], [0, 1], color='navy', lw=2, linestyle='--')
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('Receiver Operating Characteristic (ROC) Curve')
plt.legend(loc="lower right")
plt.show()

# Validation Curve (Logistic Regression parameter C)

param_range = [0.001, 0.01, 0.1, 1.0, 10, 100]
train_scores, test_scores = validation_curve(
    LogisticRegression(), X_train_tfidf, y_train,
    param_name="C", param_range=param_range, scoring="accuracy", cv=5)

# Plot the validation curve

train_mean = train_scores.mean(axis=1)
test_mean = test_scores.mean(axis=1)

plt.figure(figsize=(6, 4))
plt.plot(param_range, train_mean, label="Training score", color="r")
plt.plot(param_range, test_mean, label="Cross-validation score", color="g")

plt.xlabel("Parameter C")
plt.ylabel("Accuracy")
plt.title("Validation Curve for Logistic Regression")
plt.xscale('log')
plt.legend(loc="best")
plt.show()
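
The listing above ends at evaluation. To classify a new article, the same cleaning, TF-IDF and classifier steps must be applied in the same order. A minimal sketch of one way to bundle them is shown below; it is not part of the submitted code. The toy corpus is made up, predict_label is a hypothetical helper, and scikit-learn's built-in English stop word list stands in for the NLTK list so that the example runs without a download.

```python
import re
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


def clean_text(text):
    """Cleaning steps mirroring the listing above."""
    text = text.lower()
    text = re.sub(r'http\S+|www\S+', '', text)  # remove URLs
    text = text.translate(str.maketrans('', '', string.punctuation + string.digits))
    return re.sub(r'\s+', ' ', text).strip()


# One object that cleans, vectorizes and classifies raw text.
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(preprocessor=clean_text, stop_words="english",
                              max_features=5000)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Made-up toy corpus for illustration only (1 = true, 0 = fake).
texts = [
    "Government publishes annual budget report",
    "Central bank holds interest rates steady",
    "City council approves new transit plan",
    "Shocking miracle cure doctors hate revealed!!!",
    "Aliens secretly control the stock market",
    "You will not believe this one weird trick",
]
labels = [1, 1, 1, 0, 0, 0]
pipeline.fit(texts, labels)


def predict_label(article):
    """Hypothetical helper: return 'True' or 'Fake' for one raw article."""
    return "True" if int(pipeline.predict([article])[0]) == 1 else "Fake"


result = predict_label("Council publishes transit budget, see www.example.org")
print(result)
```

Keeping the steps in a single Pipeline means the fitted vocabulary and the classifier can be saved and reloaded together, which avoids a mismatch between training-time and prediction-time preprocessing.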

