
Natural Language Processing

This document walks through sentiment analysis of restaurant reviews. It loads a data set of restaurant reviews, cleans the text by removing stopwords and applying stemming, builds a bag-of-words representation, trains Naive Bayes and decision tree classifiers on 80% of the data, and evaluates them on the remaining 20% with confusion matrices and accuracy scores.

Install nltk
conda install -c anaconda nltk
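If conda is not available, the package can also be installed with pip:
pip install nltk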

Data Set: Restaurant_Reviews.tsv (tab-separated file)

Import Libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import os

Import Data Set


os.chdir('C:\\Noble\\Training\\Deep Learning\\Training\\Data\\')
os.getcwd()
# delimiter = '\t' – the file is tab separated
# quoting = 3 (csv.QUOTE_NONE) – ignore double quotes during parsing
dataset = pd.read_csv('Restaurant_Reviews.tsv', delimiter = '\t', quoting = 3)
dataset
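A quick sanity check on the load; the second column is the 0/1 sentiment label (named Liked in the standard version of this data set):

print(dataset.shape)    # expect (1000, 2): 1000 reviews, 2 columns
print(dataset.columns)  # e.g. Index(['Review', 'Liked'], dtype='object')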

Get one row from the data set – for example, the review at index 5


dataset['Review'][5]

To Print / View all stop words


import nltk # for stop words
from nltk.corpus import stopwords
nltk.download('stopwords')
all_stopwords = stopwords.words('english')
print (all_stopwords)
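A quick size check (the count varies slightly between NLTK releases):
print(len(all_stopwords))  # ~179 entries in recent NLTK versions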

Cleaning the Data Set


import re  # regular expressions – https://docs.python.org/3/library/re.html
import nltk  # for stop words
nltk.download('stopwords')  # download the stop word lists
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer  # for stemming, i.e. reducing each word to its root
corpus = []  # list to store all cleaned reviews
for i in range(0, 1000):
    # dataset['Review'][i] – the i-th review to process
    review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])  # replace everything except letters with a space
    review = review.lower()
    review = review.split()  # split into individual words
    ps = PorterStemmer()  # stemmer to get root words
    all_stopwords = stopwords.words('english')  # English stop words
    all_stopwords.remove('not')  # keep "not" so negations survive cleaning
    review = [ps.stem(word) for word in review if word not in set(all_stopwords)]
    review = ' '.join(review)
    corpus.append(review)
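To sanity-check the cleaning, compare one raw review with its cleaned counterpart once the loop has run (the shown values are illustrative):

print(dataset['Review'][0])  # raw text, e.g. "Wow... Loved this place."
print(corpus[0])             # cleaned text, e.g. "wow love place"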

Print Corpus
print (corpus)

To Check the Number of Distinct Words


from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer()  # no max_features yet – run this first to see the full vocabulary size
X = cv.fit_transform(corpus).toarray()
len(X[0])  # number of distinct words; this value guides max_features below

Create a Bag of Words (tokenization)


from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features = 1500)  # 1500 chosen from len(X[0]) in the run above
X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:, -1].values  # the dependent variable (sentiment label)
print(len(X[0]))  # equals the max_features count
print(X)
print(y)
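To see which word each column of X represents (useful for checking the vocabulary; get_feature_names_out requires scikit-learn 1.0+, older versions use get_feature_names):

print(cv.get_feature_names_out()[:20])  # first 20 vocabulary words, in alphabetical order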

Train Test Split


from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0)
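An optional variation, not in the original notes: passing stratify = y keeps the positive/negative ratio identical in the train and test sets, which matters when the classes are imbalanced.

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 0, stratify = y)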

Print Size
print (X.shape)
print (X_train.shape)
print (X_test.shape)

Create a Naïve Bayes Classifier


from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

Prediction
y_pred = classifier.predict(X_test)
Print Predicted vs. Actual
print(np.concatenate((y_pred.reshape(len(y_pred), 1), y_test.reshape(len(y_test), 1)), 1))  # left column = predicted, right column = actual

Confusion Matrix and Accuracy


from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)
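GaussianNB assumes continuous, normally distributed features, while bag-of-words vectors are word counts; MultinomialNB is the more conventional choice for count data and is worth comparing. A minimal sketch, not part of the original notes:

from sklearn.naive_bayes import MultinomialNB
mnb = MultinomialNB()
mnb.fit(X_train, y_train)
mnb_pred = mnb.predict(X_test)
print(confusion_matrix(y_test, mnb_pred))
print(accuracy_score(y_test, mnb_pred))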

Create Decision Tree Classifier

from sklearn.tree import DecisionTreeClassifier
dt = DecisionTreeClassifier()  # results can vary between runs unless random_state is set
dt.fit(X_train, y_train)
dt_pred = dt.predict(X_test)
cm = confusion_matrix(y_test, dt_pred)
print(cm)
accuracy_score(y_test, dt_pred)
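A natural follow-up is scoring a brand-new review with the fitted vectorizer and the Naive Bayes classifier. A minimal sketch, where the review text is a made-up example and cv, ps, all_stopwords and classifier come from the steps above:

new_review = 'The food was great and the service was friendly'  # hypothetical example review
new_review = re.sub('[^a-zA-Z]', ' ', new_review).lower().split()
new_review = [ps.stem(word) for word in new_review if word not in set(all_stopwords)]
new_X = cv.transform([' '.join(new_review)]).toarray()  # transform, not fit_transform – reuse the training vocabulary
print(classifier.predict(new_X))  # 1 = positive, 0 = negative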
