ML Report Fake News Detection

The document outlines a machine learning project focused on fake news detection using natural language processing techniques. It details the dataset, libraries used, algorithms implemented, and the process of training various classification models to identify misleading information in text. The project aims to enhance online information credibility by effectively classifying news articles as real or fake.

Uploaded by

architashankar.tech


UML501: Machine Learning

Project

TOPIC- FAKE NEWS DETECTION

Submitted By:

Name: Archita Shankar

Roll No: 102203680

Name: Parth Adlakha

Roll No: 102203636

Submitted To:

Dr. Jinee Goyal

Computer Science and Engineering Department

Thapar Institute of Engineering and Technology

Index

1. Introduction
2. Libraries used
3. Algorithm used
4. Code and Screenshots

1. Introduction

1.1 Name of the dataset

– train.csv

1.2 Dataset Link Source

https://docs.google.com/spreadsheets/d/e/2PACX-1vRP31dOhVmg97Yf0mIy5lD1Z7k4vr57xDS0NQWELeSnm0PdHi5fE5304uDBkRgPb5NbzcIYuLJpxx1i/pub?gid=562236535&single=true&output=csv

1.3 Description -
A Fake News Detection ML project uses natural language processing
(NLP) and machine learning algorithms to identify misleading or false
information in text. It involves data collection, preprocessing, feature
extraction, and classification models like logistic regression or
transformers. The system improves online information credibility by
analyzing patterns and flagging fake content.

Purpose of the Project:

1. Detect Fake News Patterns: Analyze textual data to identify linguistic and
stylistic patterns common in fake news articles.

2. Classify News as Real or Fake: Train a machine learning classifier, such as
logistic regression or transformers, to distinguish between credible and fake news.

3. Educational Insight: Demonstrate how natural language processing (NLP)
and machine learning algorithms can be applied to tackle misinformation effectively.

This project uses Python libraries like pandas, nltk and scikit-learn for
data preprocessing, NLP tasks, and model building. It highlights a practical application
of data science techniques to address the critical issue of fake news.

2. Libraries Used

o re: the regular expression module, used for pattern matching on strings.
o sys: provides access to variables used or maintained by the interpreter
and to functions that interact with the interpreter.
o pandas: a data manipulation and analysis library, used here to load the
CSV files and combine their text columns.
o nltk: the Natural Language Toolkit, used here for the English stopword
list and the Porter stemmer.
o matplotlib: the pyplot module from the matplotlib library, which
is used for creating static, animated, and interactive visualizations
in Python.
o sklearn: scikit-learn, a machine learning library for Python. It supplies
the text vectorizers, the train/test split, and the classifiers used in this
project (for example, the LogisticRegression class).

3. Algorithm Used

Machine Learning Algorithms for Fake News Detection

This project leverages various machine learning algorithms to classify news articles as
either real or fake based on textual patterns and features.

1. Data Preprocessing
o The dataset comprises labeled news articles with text and label columns.
o Preprocessing includes:
   - Removing punctuation, special characters, and stopwords.
   - Tokenizing and stemming/lemmatizing text.
   - Converting text into numerical features using techniques like TF-IDF
     vectorization.
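
The preprocessing steps above can be sketched in plain Python. This is a minimal illustration rather than the project's code: the three headlines and the small stopword set are made up, stemming is omitted, and the weighting idf = ln(N/df) + 1 followed by L2 normalisation is assumed to mirror scikit-learn's TfidfTransformer with smooth_idf=False.

```python
import math
import re
from collections import Counter

# Hypothetical stopword list; the project uses NLTK's English list instead.
STOP_WORDS = {"the", "a", "is", "of", "in", "and", "to"}
NON_LETTERS = re.compile(r"[^a-zA-Z]")


def clean(text):
    """Keep letters only, lowercase, split on whitespace, drop stopwords."""
    tokens = NON_LETTERS.sub(" ", text).lower().split()
    return [tok for tok in tokens if tok not in STOP_WORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document, L2-normalised."""
    tokenised = [clean(doc) for doc in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    rows = []
    for tokens in tokenised:
        tf = Counter(tokens)
        weights = {t: c * (math.log(n_docs / df[t]) + 1.0) for t, c in tf.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return rows


# Made-up example headlines.
docs = [
    "The president signs a new bill!",
    "SHOCKING: aliens sign the bill in secret",
    "New bill is debated in parliament",
]
rows = tfidf(docs)
for row in rows:
    # Show the three highest-weighted terms of each document.
    print(sorted(row.items(), key=lambda kv: -kv[1])[:3])
```

A word that appears in every document (here "bill") receives the lowest weight, while words unique to one document are weighted highest, which is what lets a classifier focus on distinctive vocabulary.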

2. Training Different Classification Models

The project explores multiple algorithms to find the most accurate classifier:
o Logistic Regression: A linear model predicting probabilities for binary
classification.
o Random Forest: An ensemble model using decision trees with bagging for
robust predictions.
o AdaBoost Classifier: An adaptive boosting technique combining weak
classifiers for improved accuracy.
o Decision Tree Classifier: A tree-based model splitting features to achieve
maximum information gain.
o Multinomial Naive Bayes (NB): A probabilistic classifier suitable for text
data with discrete distributions.
o Support Vector Machines (SVM): A margin-based classifier for
high-dimensional datasets.
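
As a rough sketch of how several of these classifiers can be compared on the same features, the loop below fits each model and scores it on a held-out split. The headlines and labels are invented for illustration and are not the project's dataset; AdaBoost is left out only to keep the example short, and TfidfVectorizer is used in place of the project's CountVectorizer plus TfidfTransformer pair.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Made-up toy headlines, not the project's dataset (label 1 = fake, 0 = real).
fake = [
    "shocking miracle cure doctors hate",
    "you will not believe this secret celebrity scandal",
    "aliens secretly control the government claims insider",
    "miracle pill melts fat overnight shocking",
    "secret plot exposed you will not believe",
    "celebrity clone scandal shocking truth",
    "insider claims moon landing was staged",
    "shocking secret the banks hide from you",
]
real = [
    "parliament passes annual budget after debate",
    "central bank holds interest rates steady",
    "city council approves new transit plan",
    "researchers publish study on crop yields",
    "court hears appeal in tax case",
    "ministry reports quarterly trade figures",
    "university opens new engineering building",
    "health agency updates vaccination schedule",
]
texts = fake + real
labels = [1] * len(fake) + [0] * len(real)

X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Fit the vectorizer on the training texts only, then reuse it for the test texts.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(X_train_txt)
X_test = vectorizer.transform(X_test_txt)

models = {
    "Logistic Regression": LogisticRegression(),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Multinomial NB": MultinomialNB(),
    "SVM (linear)": SVC(kernel="linear"),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Each model is scored with its own fitted instance.
    scores[name] = model.score(X_test, y_test)

for name, acc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{name}: test accuracy {acc:.2f}")
```

Keeping the models in a dictionary means every score is computed from the model it is named after, which avoids reporting one classifier's accuracy under another's label.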

3. Model Evaluation
o Models are evaluated using metrics like accuracy, precision, recall, and
F1-score.
o Cross-validation checks whether a model generalizes to unseen data; the code
in Section 4 uses a single held-out split created with train_test_split.
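
The four metrics named above follow directly from the counts of true and false positives and negatives. The sketch below computes them by hand for a binary problem; the label vectors are invented, and treating 1 (fake) as the positive class is an assumption made for the example.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = fake, 0 = real.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Here precision (0.600) is lower than recall (0.750) because two real articles were flagged as fake while only one fake article was missed.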

4. Making Predictions
o Each classifier predicts whether a news article is real or fake based on
learned patterns.
o Comparative analysis identifies the best-performing model for deployment.

5. Implementation in Code (Explanation)

o Model Training:
Each classification algorithm, such as Logistic Regression, Random Forest,
AdaBoost, Decision Tree, Multinomial Naive Bayes, and SVM, is trained on the
preprocessed data. The training involves fitting the model to the input features
(numerical representation of text) and corresponding labels (real or fake).
o Making Predictions:
After training, the models predict the class of new or test data by analyzing the
patterns learned during training. These predictions represent whether a given news
article is classified as real or fake.
o Model Evaluation:
The predictions are compared against the actual labels in the test data to compute
evaluation metrics like accuracy, precision, recall, and F1-score. This helps
determine how well each model performs on unseen data.
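
The train, predict, and evaluate cycle described above can also be repeated over several folds instead of one split. The sketch below does this with scikit-learn's make_pipeline and cross_val_score; the eight headlines are invented, and the choice of four folds and Logistic Regression is an assumption for illustration only.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Made-up toy headlines (label 1 = fake, 0 = real).
texts = [
    "shocking miracle cure doctors hate",
    "secret plot exposed shocking truth",
    "aliens secretly control the government",
    "miracle pill melts fat overnight",
    "parliament passes annual budget",
    "central bank holds interest rates",
    "city council approves transit plan",
    "court hears appeal in tax case",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Putting the vectorizer inside the pipeline means it is refit on the
# training part of every fold, so the held-out fold never influences it.
pipeline = make_pipeline(TfidfVectorizer(), LogisticRegression())

# With an integer cv and a classifier, the folds are stratified by label.
fold_scores = cross_val_score(pipeline, texts, labels, cv=4, scoring="accuracy")
print("Per-fold accuracy:", fold_scores)
print("Mean accuracy:", fold_scores.mean())
```

Averaging over folds gives a steadier estimate than a single split, at the cost of training the model once per fold.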

6. Performance Visualization (Explanation)

o Comparison of Model Performance:
The accuracy, precision, and recall of all models are visualized using graphs,
allowing a comparative analysis of their effectiveness in detecting fake news.
o Feature Importance Visualization:
For models like Random Forest and AdaBoost, feature importance scores are
displayed in bar charts. This highlights the most influential features (e.g., specific
words or phrases) in classifying news as fake or real.
o Confusion Matrix:
A confusion matrix is visualized as a heatmap to provide insights into the number
of correct and incorrect predictions, distinguishing between true positives, true
negatives, false positives, and false negatives. This helps identify patterns in
misclassification.
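
To make the confusion matrix concrete, the sketch below builds one by counting (actual, predicted) pairs and prints it as a small text table. The labels and predictions are invented; rows are taken to be actual classes and columns predicted classes, which is an assumed layout for this example.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(actual, predicted)] for predicted in labels]
            for actual in labels]


labels = ["real", "fake"]
# Made-up ground truth and predictions.
y_true = ["fake", "fake", "real", "real", "fake", "real"]
y_pred = ["fake", "real", "real", "fake", "fake", "real"]

cm = confusion_matrix(y_true, y_pred, labels)

# With "fake" as the positive class:
# cm[1][1] = true positives, cm[0][0] = true negatives,
# cm[0][1] = false positives, cm[1][0] = false negatives.
print(f"{'':>12}" + "".join(f"{'pred ' + lab:>12}" for lab in labels))
for lab, row in zip(labels, cm):
    print(f"{'actual ' + lab:>12}" + "".join(f"{n:>12}" for n in row))
```

The nested list can be handed to a plotting function (for example matplotlib's imshow) to draw the heatmap; the off-diagonal cells are the misclassifications worth inspecting.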

4. Code file 1

import re

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

# Load the first rows of each file (the paths are specific to the authors' machine).
train = pd.read_csv(r'D:/Desktop/ML project final/train (3).csv', nrows=1800)
test = pd.read_csv(r'D:/Desktop/ML project final/test (1).csv', nrows=500)

# Replace missing values so that the string concatenation below never yields NaN.
test = test.fillna(' ')
train = train.fillna(' ')

# Combine title, author and body text into one field. A space is added between
# author and text so that the last author word and the first text word do not merge.
test['total'] = test['title'] + ' ' + test['author'] + ' ' + test['text']
train['total'] = train['title'] + ' ' + train['author'] + ' ' + train['text']

print(train.head())
print(test.head())

# The NLTK stopword list must be available: run nltk.download('stopwords') once.
ps = PorterStemmer()
stop_words = set(stopwords.words('english'))
regex = re.compile('[^a-zA-Z]')

# Keep letters only, lowercase, drop stopwords, and stem each remaining word.
corpus = []
for i in range(len(train)):
    review = regex.sub(' ', train['total'][i])
    review = review.lower().split()
    review = [ps.stem(word) for word in review if word not in stop_words]
    corpus.append(' '.join(review))

# Unigram and bigram counts, followed by TF-IDF weighting.
count_vectorizer = CountVectorizer(ngram_range=(1, 2))
counts = count_vectorizer.fit_transform(corpus)
transformer = TfidfTransformer(smooth_idf=False)
tfidf = transformer.fit_transform(counts)

targets = train['label'].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(tfidf, targets, random_state=0)

# Multinomial NB
from sklearn.naive_bayes import MultinomialNB
NB = MultinomialNB()
NB.fit(X_train, y_train)
print('Accuracy of NB classifier on training set: {:.2f}'
      .format(NB.score(X_train, y_train)))
print('Accuracy of NB classifier on test set: {:.2f}'
      .format(NB.score(X_test, y_test)))

# Random Forest
from sklearn.ensemble import RandomForestClassifier
RnFr = RandomForestClassifier()
RnFr.fit(X_train, y_train)
print('Accuracy of RandomForest classifier on training set: {:.2f}'
      .format(RnFr.score(X_train, y_train)))
print('Accuracy of RandomForest classifier on test set: {:.2f}'
      .format(RnFr.score(X_test, y_test)))

# SVM (scored with svclassifier; the original listing scored RnFr here by mistake)
from sklearn.svm import SVC
svclassifier = SVC(C=1, kernel='linear', gamma='auto', probability=True)
svclassifier.fit(X_train, y_train)
print('Accuracy of SVM classifier on training set: {:.2f}'
      .format(svclassifier.score(X_train, y_train)))
print('Accuracy of SVM classifier on test set: {:.2f}'
      .format(svclassifier.score(X_test, y_test)))

# AdaBoostClassifier (scored with ada_classifier, not RnFr)
from sklearn.ensemble import AdaBoostClassifier
ada_classifier = AdaBoostClassifier()
ada_classifier.fit(X_train, y_train)
print('Accuracy of AdaBoostClassifier classifier on training set: {:.2f}'
      .format(ada_classifier.score(X_train, y_train)))
print('Accuracy of AdaBoostClassifier classifier on test set: {:.2f}'
      .format(ada_classifier.score(X_test, y_test)))

# LogisticRegression (scored with log_classifier, not RnFr)
from sklearn.linear_model import LogisticRegression
log_classifier = LogisticRegression()
log_classifier.fit(X_train, y_train)
print('Accuracy of LogisticRegression classifier on training set: {:.2f}'
      .format(log_classifier.score(X_train, y_train)))
print('Accuracy of LogisticRegression classifier on test set: {:.2f}'
      .format(log_classifier.score(X_test, y_test)))

Screenshots

This is Multinomial NB

This is AdaBoost Classifier

This is Decision Tree Classifier

This is Logistic Regression

This is Random Forest

This is SVM

Code file 2
import re

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer

train = pd.read_csv(r'D:/Desktop/ML project final/train (3).csv', nrows=1800)
test = pd.read_csv(r'D:/Desktop/ML project final/test (1).csv', nrows=500)

test = test.fillna(' ')
train = train.fillna(' ')
# A space separates author and text so that adjacent words do not merge.
test['total'] = test['title'] + ' ' + test['author'] + ' ' + test['text']
train['total'] = train['title'] + ' ' + train['author'] + ' ' + train['text']

ps = PorterStemmer()
stop_words = set(stopwords.words('english'))
regex = re.compile('[^a-zA-Z]')

# Keep letters only, lowercase, drop stopwords, and stem each remaining word.
corpus = []
for i in range(len(train)):
    review = regex.sub(' ', train['total'][i])
    review = review.lower().split()
    review = [ps.stem(word) for word in review if word not in stop_words]
    corpus.append(' '.join(review))

count_vectorizer = CountVectorizer(ngram_range=(1, 2))
counts = count_vectorizer.fit_transform(corpus)
transformer = TfidfTransformer(smooth_idf=False)
tfidf = transformer.fit_transform(counts)

targets = train['label'].values

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(tfidf, targets, random_state=0)

from sklearn.naive_bayes import MultinomialNB
NB = MultinomialNB()
NB.fit(X_train, y_train)
print('Accuracy of NB classifier on training set: {:.2f}'
      .format(NB.score(X_train, y_train)))
print('Accuracy of NB classifier on test set: {:.2f}'
      .format(NB.score(X_test, y_test)))

from sklearn.ensemble import RandomForestClassifier
RnFr = RandomForestClassifier()
RnFr.fit(X_train, y_train)
print('Accuracy of RandomForest classifier on training set: {:.2f}'
      .format(RnFr.score(X_train, y_train)))
print('Accuracy of RandomForest classifier on test set: {:.2f}'
      .format(RnFr.score(X_test, y_test)))

# Each of the remaining models is scored with its own instance; the original
# listing called RnFr.score for all three, which repeated the Random Forest result.
from sklearn.svm import SVC
svclassifier = SVC(C=1, kernel='linear', gamma='auto', probability=True)
svclassifier.fit(X_train, y_train)
print('Accuracy of SVM classifier on training set: {:.2f}'
      .format(svclassifier.score(X_train, y_train)))
print('Accuracy of SVM classifier on test set: {:.2f}'
      .format(svclassifier.score(X_test, y_test)))

from sklearn.ensemble import AdaBoostClassifier
ada_classifier = AdaBoostClassifier()
ada_classifier.fit(X_train, y_train)
print('Accuracy of AdaBoostClassifier classifier on training set: {:.2f}'
      .format(ada_classifier.score(X_train, y_train)))
print('Accuracy of AdaBoostClassifier classifier on test set: {:.2f}'
      .format(ada_classifier.score(X_test, y_test)))

from sklearn.linear_model import LogisticRegression
log_classifier = LogisticRegression()
log_classifier.fit(X_train, y_train)
print('Accuracy of LogisticRegression classifier on training set: {:.2f}'
      .format(log_classifier.score(X_train, y_train)))
print('Accuracy of LogisticRegression classifier on test set: {:.2f}'
      .format(log_classifier.score(X_test, y_test)))

Screenshots
