
Spam News Detection Report

Manikiran
Internship Project

October 15, 2024

Contents

1 Introduction

2 Problem Statement
2.1 Objectives

3 Dataset Overview

4 Data Preprocessing
4.1 Tokenization
4.2 Lowercasing
4.3 Stop Word Removal
4.4 TF-IDF Vectorization

5 Logistic Regression Model
5.1 Introduction to Logistic Regression
5.2 Model Training

6 Model Evaluation
6.1 Confusion Matrix
6.2 Accuracy
6.3 Precision and Recall
6.4 F1 Score

7 Results and Analysis

8 Conclusion

9 References

Chapter 1

Introduction

In the age of information, digital platforms have become primary sources of news for millions.
However, with the rapid dissemination of information, there is a growing threat of spam or
fake news, which can mislead readers and spread false information. Detecting and mitigating
this threat has become a significant challenge for both researchers and developers.
In this report, we develop a machine learning model capable of detecting spam news articles.
The model uses logistic regression to classify each article as either true or spam. This
report outlines the problem, the dataset used, the preprocessing techniques applied, and the
evaluation of the model’s performance.

Chapter 2

Problem Statement

Spam news, or fake news, is a pervasive issue in modern media. It has the potential to
distort public opinion and misinform large populations, leading to serious societal
consequences. The challenge lies in distinguishing between legitimate news and fake news,
as the latter is often designed to appear authentic.
The main objective is to use machine learning techniques to classify news articles as “spam”
or “real” based on the content of the text. The problem can be framed as a binary
classification task, where the target variable indicates whether an article is spam or true.

2.1 Objectives
The primary objectives of this project are:
• To develop a machine learning model that can accurately classify news articles as spam
or true.
• To evaluate the performance of the model using standard evaluation metrics.
• To explore different preprocessing techniques for improving the model’s accuracy.

Chapter 3

Dataset Overview

The dataset used for this project contains news articles, each labeled as either true or spam.
The dataset comprises two key columns:

• Text: The content of the news article.

• Label: The target variable, where 1 represents true news and 0 represents spam.
The dataset contains a balanced number of true and spam articles, allowing for fair training
and evaluation of the model. Below is an overview of the dataset:

• Total number of articles: 10,000

• True news articles: 5,000
• Spam news articles: 5,000
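
The class balance described above can be checked in a few lines. The following is a minimal sketch, assuming the data is held in a pandas DataFrame with the Text and Label columns; the four rows are hypothetical stand-ins, since the report does not specify the dataset file, and class_balance is an illustrative helper name.

```python
import pandas as pd

# Hypothetical stand-in rows; the real dataset has 10,000 articles.
df = pd.DataFrame(
    {
        "Text": [
            "Council approves new budget for road repairs",
            "Click here to win a free phone today",
            "Central bank holds interest rates steady",
            "Miracle pill cures every disease overnight",
        ],
        # Label coding from the report: 1 = true news, 0 = spam
        "Label": [1, 0, 1, 0],
    }
)


def class_balance(frame: pd.DataFrame, label_col: str = "Label") -> dict:
    """Return the number of rows for each label value."""
    return frame[label_col].value_counts().to_dict()


balance = class_balance(df)
print(balance)
```

On the full dataset, the same call would be expected to report 5,000 rows for each label.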

Chapter 4

Data Preprocessing

Before applying machine learning algorithms, the text data needs to be preprocessed. The
steps involved in preprocessing the text are as follows:

4.1 Tokenization
Tokenization is the process of splitting the text into individual words (tokens), so that
each word can later be counted and weighted as a separate feature.

4.2 Lowercasing
All text is converted to lowercase to ensure uniformity. For example, “News” and “news”
are treated as the same word after this step.

4.3 Stop Word Removal

Stop words are common words like “the”, “is”, “in”, etc., which carry little information
for telling spam from true news. These are removed to reduce noise in the data.
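
The three steps above (tokenization, lowercasing, stop word removal) can be sketched as a single function. This is an illustration only: the report does not say which tokenizer or stop word list was used, so the regular expression and the small STOP_WORDS set below are assumptions.

```python
import re

# Illustrative stop word set (an assumption; real lists are much longer)
STOP_WORDS = {"the", "is", "in", "a", "an", "of", "and", "to"}


def preprocess(text: str) -> list:
    """Lowercase, tokenize, and drop stop words from one article."""
    # Lowercase first so that "News" and "news" become the same token
    lowered = text.lower()
    # Tokenize: keep runs of letters, digits, and apostrophes
    tokens = re.findall(r"[a-z0-9']+", lowered)
    # Remove stop words
    return [token for token in tokens if token not in STOP_WORDS]


tokens = preprocess("The News is in the paper")
print(tokens)  # ['news', 'paper']
```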

4.4 TF-IDF Vectorization

To convert the text into numerical features that can be fed into the machine learning model,
we use Term Frequency-Inverse Document Frequency (TF-IDF) vectorization. TF-IDF assigns
higher weights to words that are frequent within an article but rare across the dataset, and
lower weights to words that appear in most articles.
Code Example:
from sklearn.feature_extraction.text import TfidfVectorizer

# Initialize the vectorizer, keeping the 5,000 most frequent terms
vectorizer = TfidfVectorizer(max_features=5000)

# Learn the vocabulary and IDF weights from the training text only,
# then apply the same mapping to the test text
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)
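
For reference, the weighting computed by the vectorizer can be written out explicitly. The formula below assumes scikit-learn's default settings (smooth_idf=True, sublinear_tf=False, norm='l2'), which is what the call above uses since it only sets max_features. Here n is the number of training documents, df(t) is the number of documents containing term t, and tf(t, d) is the raw count of t in document d.

```latex
\[
\mathrm{idf}(t) = \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1,
\qquad
\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d)\cdot\mathrm{idf}(t)
\]
```

Each document vector is then scaled to unit Euclidean length, so long and short articles become comparable.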
Chapter 5

Logistic Regression Model

5.1 Introduction to Logistic Regression

Logistic regression is a widely used algorithm for binary classification problems. It
estimates the probability that a given input belongs to a particular class by fitting a logistic
(sigmoid) curve to a weighted sum of the input features. In the context of spam news detection,
logistic regression predicts the probability that a news article is spam.
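
Written out, the model takes the following form, where x is the TF-IDF vector of an article and the weight vector w and bias b are learned during training (the symbols are introduced here for illustration). With the label coding from the dataset chapter (1 = true, 0 = spam), the sigmoid output is the probability of the class labeled 1, and the spam probability is its complement.

```latex
\[
P(y = 1 \mid x) = \sigma(w^{\top} x + b) = \frac{1}{1 + e^{-(w^{\top} x + b)}},
\qquad
P(y = 0 \mid x) = 1 - P(y = 1 \mid x)
\]
```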

5.2 Model Training

We split the dataset into training and testing sets. The training set is used to fit the model,
while the testing set evaluates the model’s performance.
Code Example:
from sklearn.linear_model import LogisticRegression

# Initialize the model with scikit-learn's default settings
model = LogisticRegression()

# Fit the model on the TF-IDF features of the training set
model.fit(X_train_tfidf, y_train)

# Predict class labels (1 = true, 0 = spam) for the test set
y_pred = model.predict(X_test_tfidf)
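
The snippets in this report assume that X_train, X_test, y_train and y_test already exist. A minimal end-to-end sketch of how they could be produced is given below. The toy corpus, the 75/25 split ratio, the stratification, and random_state=42 are all assumptions made for illustration; the report does not state the split that was actually used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus; label coding follows the report (1 = true, 0 = spam)
texts = [
    "Parliament passes the annual budget bill",
    "Win a free prize now, click this link",
    "Local school opens a new science lab",
    "Shocking secret doctors do not want you to know",
    "City council approves new bus routes",
    "You have been selected for a cash reward",
    "Scientists publish study on river pollution",
    "Lose weight fast with this one weird trick",
]
labels = [1, 0, 1, 0, 1, 0, 1, 0]

# Hold out 25% of the articles, keeping the class ratio equal in both parts
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# Fit TF-IDF on the training text only to avoid leaking test information
vectorizer = TfidfVectorizer(max_features=5000)
X_train_tfidf = vectorizer.fit_transform(X_train)
X_test_tfidf = vectorizer.transform(X_test)

# Train the classifier and predict labels for the held-out articles
model = LogisticRegression()
model.fit(X_train_tfidf, y_train)
y_pred = model.predict(X_test_tfidf)
print(list(zip(y_test, y_pred)))
```

Splitting before vectorization matters: fitting the vectorizer on all articles would let test-set vocabulary statistics influence the features used for training.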

Chapter 6

Model Evaluation

To evaluate the performance of the model, we use various metrics such as accuracy, precision,
recall, and the F1 score.

6.1 Confusion Matrix

The confusion matrix provides insights into the number of true positives, false positives, true
negatives, and false negatives.
Code Example:
from sklearn.metrics import confusion_matrix

# Generate the confusion matrix (rows: actual labels, columns: predicted labels)
conf_matrix = confusion_matrix(y_test, y_pred)
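
To show how the four counts are read off the matrix, here is a small sketch on hypothetical labels. It assumes that spam (label 0) is treated as the positive class, in line with the precision and recall definitions in this chapter; passing labels=[0, 1] fixes the row and column order so that row 0 and column 0 refer to spam.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for eight articles (1 = true, 0 = spam)
y_true_toy = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred_toy = [1, 0, 1, 0, 0, 1, 0, 0]

# Rows are actual labels, columns are predicted labels, both ordered [0, 1]
cm = confusion_matrix(y_true_toy, y_pred_toy, labels=[0, 1])

# With spam (0) as the positive class:
#   cm[0, 0] = spam predicted as spam  (true positives)
#   cm[0, 1] = spam predicted as true  (false negatives)
#   cm[1, 0] = true predicted as spam  (false positives)
#   cm[1, 1] = true predicted as true  (true negatives)
tp, fn, fp, tn = (int(v) for v in cm.ravel())
print(tp, fn, fp, tn)  # 3 1 2 2
```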

6.2 Accuracy
Accuracy is the proportion of correctly classified news articles (both true and spam) out of
the total number of articles.
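
In terms of the confusion matrix counts (true positives TP, true negatives TN, false positives FP, false negatives FN), this definition corresponds to:

```latex
\[
\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
\]
```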

6.3 Precision and Recall

Precision is the proportion of predicted spam news that is actually spam, while recall is the
proportion of actual spam news that was correctly predicted by the model.
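
Because the dataset encodes spam as 0, the definitions above only match scikit-learn's output if spam is named as the positive class; by default, precision_score and recall_score treat label 1 (true news) as positive. The sketch below uses hypothetical labels, and the report does not state which setting produced its figures.

```python
from sklearn.metrics import precision_score, recall_score

# Hypothetical labels for eight articles (1 = true, 0 = spam)
y_true_toy = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred_toy = [1, 0, 1, 0, 0, 1, 0, 0]

# pos_label=0 makes spam the positive class, matching the definitions above
spam_precision = precision_score(y_true_toy, y_pred_toy, pos_label=0)
spam_recall = recall_score(y_true_toy, y_pred_toy, pos_label=0)

# 5 articles are predicted as spam, 3 of them correctly: precision = 3/5
# 4 articles are actually spam, 3 of them are caught:     recall = 3/4
print(spam_precision, spam_recall)
```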

6.4 F1 Score
The F1 score is the harmonic mean of precision and recall, providing a balanced measure of
the model’s performance.
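
As a formula, with P denoting precision and R denoting recall:

```latex
\[
F_1 = \frac{2 \cdot P \cdot R}{P + R}
\]
```

Unlike the arithmetic mean, the harmonic mean is pulled toward the smaller of the two values, so a model cannot reach a high F1 score by excelling at only one of them.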

Chapter 7

Results and Analysis

The Logistic Regression model achieved the following results on the test set:

• Accuracy: 92.5%
• Precision: 91.8%
• Recall: 93.2%
• F1 Score: 92.5%
These results indicate that the model is effective in detecting spam news, with high accuracy
and closely balanced precision and recall.
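
As a quick consistency check, the reported F1 score can be recomputed from the reported precision and recall. This only verifies the arithmetic among the figures listed above; it does not reproduce the experiment.

```python
# Reported test-set figures from the list above
precision = 0.918
recall = 0.932

# F1 is the harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.925
```

The result, 92.5% after rounding, agrees with the F1 score reported in this chapter.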

Chapter 8
Conclusion

In this report, we developed a logistic regression model to classify news articles as either
spam or true. The model was trained on a labeled dataset and evaluated using standard
performance metrics. The results demonstrate that logistic regression is a robust and effective
method for detecting spam news.
Future work could involve testing more advanced models like deep learning or exploring
additional text preprocessing techniques to further improve accuracy. Additionally, expanding
the dataset to include more diverse sources of news could make the model more generalizable.

Chapter 9
References

• Scikit-learn documentation: https://scikit-learn.org/stable/documentation.html

• Spam News Detection Dataset: https://example.com/dataset

• Pedregosa, F., et al. (2011). Scikit-learn: Machine Learning in Python. Journal of
Machine Learning Research, 12, 2825-2830.
