Python Project

This document builds a news-article classifier in Python. It loads article text and labels from a CSV file, preprocesses the text (lowercasing, stripping special characters and digits, removing stopwords, lemmatizing), splits the data into training and test sets, converts the text to TF-IDF features, trains a PassiveAggressiveClassifier on the training set, and evaluates the model's accuracy and confusion matrix on the test set.


# Import necessary libraries

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from nltk.stem import WordNetLemmatizer
import re

# Download NLTK resources (if not downloaded)
nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')
# Note: on NLTK 3.9+, word_tokenize also needs: nltk.download('punkt_tab')

# Load the dataset (assuming it's in CSV format)
data = pd.read_csv('news.csv')  # Replace 'news.csv' with the path to your file

# Explore the dataset
print(data.head())  # Check the first few rows
print(data.info())  # Get information about the dataset

# Data preprocessing
lemmatizer = WordNetLemmatizer()
stop_words = set(stopwords.words('english'))

def preprocess_text(text):
    # Convert text to lowercase
    text = text.lower()

    # Remove special characters and digits
    text = re.sub(r'\W', ' ', text)
    text = re.sub(r'\d', ' ', text)

    # Tokenize the text
    words = word_tokenize(text)

    # Remove stop words and lemmatize tokens
    words = [lemmatizer.lemmatize(word) for word in words if word not in stop_words]

    # Join words back into text
    processed_text = ' '.join(words)
    return processed_text
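
# Quick sanity check on a made-up sentence (not from the dataset); with the
# steps above, this should print: 'breaking news reason market crashed'
print(preprocess_text("Breaking News: 5 Reasons the Markets Crashed!"))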

data['text'] = data['text'].apply(preprocess_text)

# Feature extraction
X = data['text']
y = data['label']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
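
# Optional tweak (not in the original script): if the label classes are
# imbalanced, stratify=y keeps the class ratios the same in both splits.
# X_train, X_test, y_train, y_test = train_test_split(
#     X, y, test_size=0.2, random_state=42, stratify=y)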

# TF-IDF Vectorization
tfidf_vectorizer = TfidfVectorizer(max_features=5000, stop_words='english')
tfidf_train = tfidf_vectorizer.fit_transform(X_train)
tfidf_test = tfidf_vectorizer.transform(X_test)
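
# Optional: peek at a few learned vocabulary terms to sanity-check the
# features (get_feature_names_out requires scikit-learn >= 1.0):
print(tfidf_vectorizer.get_feature_names_out()[:10])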

# Model building - using Passive Aggressive Classifier
model = PassiveAggressiveClassifier(max_iter=50)
model.fit(tfidf_train, y_train)

# Prediction
y_pred = model.predict(tfidf_test)

# Model evaluation
accuracy = accuracy_score(y_test, y_pred)
conf_matrix = confusion_matrix(y_test, y_pred)

print(f"Accuracy: {accuracy * 100:.2f}%")


print(f"Confusion Matrix:\n{conf_matrix}")
