
Fake News Classification.ipynb - Colaboratory

This document walks through classifying fake news with a machine learning model. It performs the following steps: 1. imports libraries and loads a dataset of news articles labeled as real or fake; 2. preprocesses the article titles by tokenizing, lowercasing, removing stopwords, and lemmatizing, then vectorizes them with TF-IDF; 3. splits the data into training and test sets and trains a random forest classifier; 4. evaluates the model on the test set, reporting 93.7% accuracy.


Required Libraries


import pandas as pd
import numpy as np
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer, WordNetLemmatizer
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
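
The stopword list and WordNet lemmatizer used further down depend on NLTK data files that are not bundled with the library. A minimal sketch of the one-time downloads (the same calls appear inline later in the notebook):

import nltk

# One-time downloads into the runtime's nltk_data directory;
# 'stopwords' backs stopwords.words('english') and 'wordnet' backs WordNetLemmatizer.
nltk.download('stopwords')
nltk.download('wordnet')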

1. Data Gathering


df = pd.read_csv("/content/drive/MyDrive/Fake news detection/News_dataset.csv")
df.head()

   id                                              title              author                                               text  label
0   0  House Dem Aide: We Didn’t Even See Comey’s Let...       Darrell Lucus  House Dem Aide: We Didn’t Even See Comey’s Let...      1
1   1  FLYNN: Hillary Clinton, Big Woman on Campus - ...     Daniel J. Flynn  Ever get the feeling your life circles the rou...      0
2   2                  Why the Truth Might Get You Fired  Consortiumnews.com  Why the Truth Might Get You Fired October 29, ...      1

2. Data Analysis


df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20800 entries, 0 to 20799
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 20800 non-null int64
1 title 20242 non-null object
2 author 18843 non-null object
3 text 20761 non-null object
4 label 20800 non-null int64
dtypes: int64(2), object(3)
memory usage: 812.6+ KB

df['label'].value_counts()

1 10413
0 10387
Name: label, dtype: int64

df.shape

(20800, 5)

df.isna().sum()

id 0
title 558
author 1957
text 39
label 0
dtype: int64

df = df.dropna()  # handle missing values by dropping those rows

df.isna().sum()

id 0
title 0
author 0
text 0
label 0
dtype: int64

df.shape

(18285, 5)

df.reset_index(inplace=True)
df.head()

   index  id                                              title              author                                               text  label
0      0   0  House Dem Aide: We Didn’t Even See Comey’s Let...       Darrell Lucus  House Dem Aide: We Didn’t Even See Comey’s Let...      1
1      1   1  FLYNN: Hillary Clinton, Big Woman on Campus - ...     Daniel J. Flynn  Ever get the feeling your life circles the rou...      0
2      2   2                  Why the Truth Might Get You Fired  Consortiumnews.com  Why the Truth Might Get You Fired October 29, ...      1

df['title'][0]

'House Dem Aide: We Didn’t Even See Comey’s Letter Until Jason Chaffetz Tweeted It'

df = df.drop(['id', 'text', 'author'], axis=1)  # keep only the title and label columns
df.head()

   index                                              title  label
0      0  House Dem Aide: We Didn’t Even See Comey’s Let...      1
1      1  FLYNN: Hillary Clinton, Big Woman on Campus - ...      0
2      2                  Why the Truth Might Get You Fired      1
3      3  15 Civilians Killed In Single US Airstrike Hav...      1
4      4  Iranian woman jailed for fictional unpublished...      1

3. Data Preprocessing


1. Tokenization
sample_data = 'The quick brown fox jumps over the lazy dog'
sample_data = sample_data.split()
sample_data

['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

2. Make Lowercase


sample_data = [data.lower() for data in sample_data]
sample_data

['the', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']

3. Remove Stopwords


nltk.download('stopwords')
stopwords = stopwords.words('english')  # note: rebinds the imported module name to a plain list of words
print(stopwords[0:10])
print(len(stopwords))

['i', 'me', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', "you're"]
179
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.

sample_data = [data for data in sample_data if data not in stopwords]
print(sample_data)
len(sample_data)

['quick', 'brown', 'fox', 'jumps', 'lazy', 'dog']
6

4. Stemming
ps = PorterStemmer()
sample_data_stemming = [ps.stem(data) for data in sample_data]
print(sample_data_stemming)

['quick', 'brown', 'fox', 'jump', 'lazi', 'dog']

5. Lemmatization
nltk.download('wordnet')
lm = WordNetLemmatizer()
sample_data_lemma = [lm.lemmatize(data) for data in sample_data]
print(sample_data_lemma)

[nltk_data] Downloading package wordnet to /root/nltk_data...


['quick', 'brown', 'fox', 'jump', 'lazy', 'dog']

lm = WordNetLemmatizer()
corpus = []
for i in range(len(df)):
    # Note: the pattern '^a-zA-Z0-9' matches the literal text "a-zA-Z0-9" at the
    # start of the string, so it almost never fires; the negated character class
    # '[^a-zA-Z0-9]' was presumably intended to strip punctuation. The outputs
    # below reflect the pattern as written (punctuation is retained).
    review = re.sub('^a-zA-Z0-9', ' ', df['title'][i])
    review = review.lower()
    review = review.split()
    review = [lm.lemmatize(x) for x in review if x not in stopwords]
    review = " ".join(review)
    corpus.append(review)

len(corpus)

18285

df['title'][0]

'House Dem Aide: We Didn’t Even See Comey’s Letter Until Jason Chaffetz Tweeted It'

corpus[0]

'house dem aide: didn’t even see comey’s letter jason chaffetz tweeted'

4. Vectorization (Converting Text into Vectors)


tf = TfidfVectorizer()
x = tf.fit_transform(corpus).toarray()
x

array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.]])
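
One caveat: .toarray() densifies the TF-IDF matrix, allocating one float per (document, vocabulary term) pair, which grows quickly with 18,285 titles. A hedged alternative sketch keeps the sparse matrix that fit_transform returns; scikit-learn's RandomForestClassifier accepts sparse input directly, and get_feature_names_out() (available in recent scikit-learn versions) exposes the learned vocabulary:

# Keep the TF-IDF features sparse instead of calling .toarray().
x_sparse = tf.fit_transform(corpus)      # scipy.sparse matrix, mostly zeros
print(x_sparse.shape)                    # (n_documents, vocabulary_size)
print(tf.get_feature_names_out()[:10])   # first few terms in the learned vocabulary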

y = df['label']
y.head()

0 1
1 0
2 1
3 1
4 1
Name: label, dtype: int64

Data Splitting into Train and Test Sets


x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=10, stratify=y)

len(x_train),len(y_train)

(12799, 12799)

len(x_test), len(y_test)

(5486, 5486)

5. Model Building


rf = RandomForestClassifier()
rf.fit(x_train, y_train)

RandomForestClassifier()
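
The forest is trained with library defaults (100 trees, unbounded depth), so repeated runs can differ slightly. A sketch with an explicit seed, reusing the random_state=10 already chosen for the split (an assumption on my part, not something the notebook does):

# Fixed seed for reproducible tree construction; other settings are the defaults.
rf = RandomForestClassifier(n_estimators=100, random_state=10)
rf.fit(x_train, y_train)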

6. Model Evaluation


y_pred = rf.predict(x_test)
accuracy_score_ = accuracy_score(y_test, y_pred)
accuracy_score_

0.9374772147283995

class Evaluation:

    def __init__(self, model, x_train, x_test, y_train, y_test):
        self.model = model
        self.x_train = x_train
        self.x_test = x_test
        self.y_train = y_train
        self.y_test = y_test

    def train_evaluation(self):
        y_pred_train = self.model.predict(self.x_train)

        acc_scr_train = accuracy_score(self.y_train, y_pred_train)
        print("Accuracy Score On Training Data Set :", acc_scr_train)
        print()

        con_mat_train = confusion_matrix(self.y_train, y_pred_train)
        print("Confusion Matrix On Training Data Set :\n", con_mat_train)
        print()

        class_rep_train = classification_report(self.y_train, y_pred_train)
        print("Classification Report On Training Data Set :\n", class_rep_train)

    def test_evaluation(self):
        y_pred_test = self.model.predict(self.x_test)

        acc_scr_test = accuracy_score(self.y_test, y_pred_test)
        print("Accuracy Score On Testing Data Set :", acc_scr_test)
        print()

        con_mat_test = confusion_matrix(self.y_test, y_pred_test)
        print("Confusion Matrix On Testing Data Set :\n", con_mat_test)
        print()

        class_rep_test = classification_report(self.y_test, y_pred_test)
        print("Classification Report On Testing Data Set :\n", class_rep_test)

# Checking the accuracy on the training dataset

Evaluation(rf,x_train, x_test, y_train, y_test).train_evaluation()

Accuracy Score On Training Data Set : 1.0

Confusion Matrix On Training Data Set :
 [[7252    0]
 [   0 5547]]

Classification Report On Training Data Set :
               precision    recall  f1-score   support

           0       1.00      1.00      1.00      7252
           1       1.00      1.00      1.00      5547

    accuracy                           1.00     12799
   macro avg       1.00      1.00      1.00     12799
weighted avg       1.00      1.00      1.00     12799

A perfect training score is expected here: an unconstrained random forest can effectively memorize the training set, so the test-set metrics below are the meaningful estimate of performance.

# Checking the accuracy on the testing dataset


Evaluation(rf,x_train, x_test, y_train, y_test).test_evaluation()

Accuracy Score On Testing Data Set : 0.9374772147283995

Confusion Matrix On Testing Data Set :
 [[2825  284]
 [  59 2318]]

Classification Report On Testing Data Set :
               precision    recall  f1-score   support

           0       0.98      0.91      0.94      3109
           1       0.89      0.98      0.93      2377

    accuracy                           0.94      5486
   macro avg       0.94      0.94      0.94      5486
weighted avg       0.94      0.94      0.94      5486

Prediction Pipeline


class Preprocessing:

    def __init__(self, data):
        self.data = data

    def text_preprocessing_user(self):
        lm = WordNetLemmatizer()
        pred_data = [self.data]
        preprocess_data = []
        for data in pred_data:
            # Same pattern caveat as in the training loop above:
            # '[^a-zA-Z0-9]' was presumably intended.
            review = re.sub('^a-zA-Z0-9', ' ', data)
            review = review.lower()
            review = review.split()
            review = [lm.lemmatize(x) for x in review if x not in stopwords]
            review = " ".join(review)
            preprocess_data.append(review)
        return preprocess_data

df['title'][1]

'FLYNN: Hillary Clinton, Big Woman on Campus - Breitbart'

data = 'FLYNN: Hillary Clinton, Big Woman on Campus - Breitbart'


Preprocessing(data).text_preprocessing_user()

['flynn: hillary clinton, big woman campus - breitbart']

class Prediction:

    def __init__(self, pred_data, model):
        self.pred_data = pred_data
        self.model = model

    def prediction_model(self):
        preprocess_data = Preprocessing(self.pred_data).text_preprocessing_user()
        data = tf.transform(preprocess_data)  # reuse the TF-IDF vectorizer fitted above
        prediction = self.model.predict(data)

        # The notebook maps label 0 to "Fake" and label 1 to "Real".
        if prediction[0] == 0:
            return "The News Is Fake"
        else:
            return "The News Is Real"

data = 'FLYNN: Hillary Clinton, Big Woman on Campus - Breitbart'


Prediction(data,rf).prediction_model()

'The News Is Fake'

df['title'][3]

'15 Civilians Killed In Single US Airstrike Have Been Identified'

user_data = '15 Civilians Killed In Single US Airstrike Have Been Identified'


Prediction(user_data,rf).prediction_model()

'The News Is Real'
