DL Project

The document describes a project on sentiment analysis of movie reviews from the IMDB dataset using machine learning techniques. It discusses preprocessing text data using NLP, training a LinearSVC classifier on TF-IDF features, and evaluating model performance on test data using various metrics.

Uploaded by

sravu

Deep Learning Project

Sentiment Analysis of Movie Reviews in the IMDB Review Dataset

Prepared by:
V. Jayadeep Hemanth - VU21CSEN0500045
M. Jyothi Swaroop - VU21CSEN0500069
A. Sai Sri Ram - VU21CSEN0500017
Rvs. Aditya - VU21CSEN0500132

Academic Year: 2023-2024

Introduction:

Problem Description: The project focuses on sentiment analysis of movie reviews using a Linear Support Vector Machine (LinearSVC) classifier. Sentiment analysis aims to determine the sentiment expressed in a piece of text, in this case whether it is positive or negative. In this project, we train a model to classify movie reviews as either positive or negative based on their text content.

Requirements Elicitation:
1. Access to the IMDB dataset of 50,000 labeled movie reviews.
2. A Python environment with the necessary libraries installed: spaCy (with its en_core_web_sm English model), scikit-learn, pandas, etc.
3. Understanding of text preprocessing techniques such as tokenization, lemmatization, and stop-word removal.
4. Knowledge of machine learning concepts, particularly support vector machines (SVMs) and their implementation in scikit-learn's LinearSVC.

Problem Modeling:

Sentiment analysis is modeled here as a binary classification task: each movie review is classified as expressing either positive or negative sentiment. LinearSVC is chosen as the model because of its effectiveness in text classification tasks and its efficiency on large datasets.
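To make the binary decision rule concrete, the following minimal sketch checks that a fitted LinearSVC scores each sample as w · x + b and predicts the positive class when that score is above zero. It assumes scikit-learn and NumPy are installed. The four 2-D points are made-up stand-ins for TF-IDF vectors and are not taken from the project.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Made-up 2-D feature vectors (standing in for TF-IDF rows) and labels
X = np.array([[0.0, 1.0], [0.2, 0.9], [1.0, 0.1], [0.9, 0.0]])
y = np.array([0, 0, 1, 1])  # 0 = negative, 1 = positive

clf = LinearSVC(dual=True, random_state=0)
clf.fit(X, y)

# Linear score w . x + b for every sample
scores = X @ clf.coef_.ravel() + clf.intercept_[0]
# Positive class whenever the score is above zero
manual_pred = (scores > 0).astype(int)

print(np.allclose(scores, clf.decision_function(X)))  # True
print(manual_pred, clf.predict(X))
```

On real reviews, each TF-IDF column gets its own weight in `coef_`. The sign and size of a word's weight therefore indicate how strongly that word pushes a review toward the positive or negative class.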
System Design:

Algorithm: The text data is first preprocessed by tokenizing, lemmatizing, and removing stop words. It is then transformed into numerical features with TF-IDF vectorization and finally fed into the LinearSVC classifier for training.
1. Annotating POS tags, named entities, syntactic dependencies, etc.
2. Cleaning Text Data Algorithm
Flowchart: The flowchart depicts the step-by-step process of
text preprocessing, feature extraction, model training, and
prediction.
Data Flow Diagram: The data flow diagram illustrates the flow of
data between various components of the system, including input
data, preprocessing steps, model training, and output predictions.
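The algorithm above can be sketched end to end on a toy corpus. This is a minimal illustration, not the project's notebook code. The regex tokenizer `simple_clean`, its small `STOP` set, and the four reviews are all hypothetical. They stand in for the spaCy-based cleaning (which also lemmatizes) and for the IMDB data.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical stand-in for the spaCy-based cleaning: lowercase, keep
# alphabetic words, drop a tiny illustrative stop list (no lemmatization)
STOP = {"the", "a", "is", "was", "this", "it"}

def simple_clean(text):
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP]

# Tiny made-up training set: 1 = positive, 0 = negative
reviews = [
    "A wonderful film, the acting was great",
    "Great story and wonderful music",
    "This was a terrible, boring movie",
    "Awful plot and terrible acting",
]
labels = [1, 1, 0, 0]

# Preprocessing + TF-IDF feature extraction + linear SVM, chained in one object
model = Pipeline([
    ("tfidf", TfidfVectorizer(tokenizer=simple_clean, token_pattern=None)),
    ("clf", LinearSVC(dual=True)),
])
model.fit(reviews, labels)

print(model.predict(["wonderful acting", "boring and awful"]))
```

Wrapping both steps in a Pipeline means the TF-IDF vocabulary and IDF weights are learned only from the data passed to `fit()`. This keeps test reviews out of the feature statistics.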
Implementation:
● Loading the Dataset: The IMDB dataset, which contains movie reviews labeled with positive or negative sentiment, is loaded into a pandas DataFrame.

● Preprocessing the Text Data:

○ Cleaning: The text is converted to lowercase, and punctuation and stop words are removed.
○ Tokenization: Each review is split into individual words or tokens.
○ Lemmatization: Tokens are lemmatized to reduce inflected words to their base (dictionary) form.
○ Vectorization: The text is transformed into numerical features using TF-IDF (Term Frequency-Inverse Document Frequency) vectorization. This produces a sparse matrix in which each row corresponds to a document (review) and each column corresponds to a unique term (feature). TF-IDF weights a term by how often it occurs in a document, scaled down by how many documents in the corpus contain it. Distinctive words therefore receive higher weights, while words common to most reviews are down-weighted rather than ignored outright.

● Splitting into Train and Test Sets:

○ The reviews are split into training and testing sets, with 70% (35,000 reviews) used for training and the remaining 30% (15,000 reviews) used for testing. The split is made on the raw reviews, and the TF-IDF vectorizer is then fitted inside the pipeline on the training set only.
○ This split ensures that the model learns from one portion of the data (the training set) and is evaluated on unseen data (the test set).
● Training the Model:
○ A machine learning model, in this case, a Linear
Support Vector Machine (LinearSVC), is chosen for
sentiment analysis.
○ The model is trained using the training dataset, where
the TF-IDF transformed features are used as input and
the corresponding sentiment labels (positive or
negative) are used as target output.
○ During training, the model learns patterns and relationships between the input features (the TF-IDF weights of words) and the target labels (sentiments).
○ Training optimizes the model's parameters, one weight per feature plus an intercept, to minimize a regularized loss on the training data. By default, scikit-learn's LinearSVC uses the squared hinge loss with an L2 penalty.

● Evaluation:
○ After training, the model's performance is evaluated
using the testing dataset.
○ Metrics such as accuracy, precision, recall, and F1-score
are calculated to assess how well the model predicts
sentiment labels on unseen data.
○ Confusion matrices may also be analyzed to
understand the model's performance in terms of true
positives, true negatives, false positives, and false
negatives.
1. Mapping ‘positive’ and ‘negative’ sentiments to 1 and 0.

2. Evaluation Metrics
3. Output

System Test and Evaluation:

● The trained model is evaluated on the test dataset to assess its performance in terms of precision, recall, F1-score, and support. A confusion matrix and a classification report are generated to analyze the model's performance.
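The per-class metrics in the classification report follow directly from the confusion matrix. The sketch below shows this in plain Python. The numbers are the test-set confusion matrix printed in the notebook's In [29] cell, where rows are true labels, columns are predicted labels, and class 0 is negative. The helper `per_class_metrics` is hypothetical and is not a scikit-learn function.

```python
# Test-set confusion matrix from the notebook (rows = true, columns = predicted)
cm = [[6521, 864],   # true 0 (negative)
      [760, 6855]]   # true 1 (positive)

def per_class_metrics(cm, k):
    """Precision, recall, F1 and support for class k of a square confusion matrix."""
    tp = cm[k][k]
    fp = sum(cm[r][k] for r in range(len(cm)) if r != k)  # predicted k, true label differs
    fn = sum(cm[k][c] for c in range(len(cm)) if c != k)  # true k, predicted otherwise
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    support = tp + fn                                     # number of true samples of class k
    return precision, recall, f1, support

for k in (0, 1):
    p, r, f, s = per_class_metrics(cm, k)
    print(f"class {k}: precision={p:.2f} recall={r:.2f} f1={f:.2f} support={s}")

accuracy = (cm[0][0] + cm[1][1]) / sum(map(sum, cm))
print(f"accuracy={accuracy:.2f}")
# class 0: precision=0.90 recall=0.88 f1=0.89 support=7385
# class 1: precision=0.89 recall=0.90 f1=0.89 support=7615
# accuracy=0.89
```

These values match the classification report produced by the notebook, which confirms how each figure is derived.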
Report:

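1. Results:

● The trained LinearSVC model achieved an accuracy of 89% on the test set of 15,000 reviews.
● Precision, recall, and F1-score were calculated for both sentiment classes:
  ○ Negative reviews reached a precision of 0.90 and a recall of 0.88.
  ○ Positive reviews reached a precision of 0.89 and a recall of 0.90.
  ○ Both classes had an F1-score of 0.89.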

2. Conclusion:

● The sentiment analysis model developed in this project demonstrates the effectiveness of machine learning techniques in classifying movie reviews.
● With further fine-tuning and optimization, the model's predictive performance may be improved.

3. Future Directions:

● Explore advanced natural language processing techniques to capture more nuanced sentiment.
● Incorporate user feedback and domain-specific knowledge to enhance the model's performance.
● Deploy the trained model in real-world applications, such as movie review platforms, to help users make informed decisions.

Overall, the project showcases the application of machine learning to analyzing and understanding sentiment in movie reviews, contributing to the field of natural language processing and sentiment analysis.

CODE IMPLEMENTATION
3/24/24, 9:22 PM IMDB_Sentiment

Deep Learning Project - Movie Reviews Sentiment Analysis

Names :
V. Jayadeep Hemanth - VU21CSEN0500045
M. Jyothi Swaroop - VU21CSEN0500069
A. Sai Sri Ram - VU21CSEN0500017
Rvs. Aditya – VU21CSEN0500132

1. Frameworks used:

1. spaCy
2. scikit-learn (sklearn)
3. pandas

Machine learning model used: LinearSVC (Linear Support Vector Machine classifier)

Metrics: Precision, Recall, F1-score, Support

Importing spaCy
Importing displacy for visualizing word dependencies

In [2]: import spacy

from spacy import displacy

Loading the small English pipeline (en_core_web_sm) from spaCy

In [3]: nlp=spacy.load('en_core_web_sm')

Applying nlp() to a sample text

In [7]: text="This movie is very bad. This is worst than the one I watch a week ago."
doc=nlp(text)
doc

Out[7]: This movie is very bad. This is worst than the one I watch a week ago.

Tokenization of the pre-defined text

In [8]: for token in doc:
    print(token)
This
movie
is
very
bad
.
This
is
worst
than
the
one
I
watch
a
week
ago
.

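Adding the sentencizer component for sentence segmentation

In [9]: # In spaCy 3.x, add_pipe() takes the component's factory name directly,
# so a separate nlp.create_pipe('sentencizer') call is not needed
nlp.add_pipe('sentencizer', before='parser')

Out[9]: <spacy.pipeline.sentencizer.Sentencizer at 0x2c0f8856d00>

In [10]: # `doc` was created in In [7], before the sentencizer was added, so these
# boundaries come from the dependency parser; run doc = nlp(text) again to
# get a Doc that has passed through the sentencizer
for sent in doc.sents:
    print(sent)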

This movie is very bad.

This is worst than the one I watch a week ago.

Importing STOP_WORDS from spaCy's English language data.

Stop words are very frequent words, such as "the", "is", and "and". They mostly serve to join sentences and maintain syntax rather than carry meaning of their own in a given context.

In [11]: from spacy.lang.en.stop_words import STOP_WORDS

In [12]: stopwords=list(STOP_WORDS)
print(stopwords)

['see', 'whole', 'except', 'a', 'quite', 'them', 'himself', 'hereafter', 'beyond',
'always', 'call', 'was', 'too', 'yourself', 'nevertheless', 'from', 'as', 'using',
'been', 'these', 'show', 'i', "'m", 'perhaps', 'your', 'all', 'whereas', 'have',
'upon', 'along', 'other', 'nor', 'somehow', 'about', 'forty', 'yourselves', 'whereafter',
'with', 'since', 'mostly', 'not', '‘ll', 'wherein', 'third', 'name', 'then',
'still', 'everyone', 'between', 'something', 'because', 'behind', 'many', 'again',
'seem', 'towards', 'twelve', '‘d', 'whereupon', 'until', 'under', 'had', 'up',
'here', '’re', 'indeed', 'what', 'whenever', 'amongst', 'than', 'however', 'throughout',
'he', 'we', 'first', 'yours', 'if', 'must', 'regarding', "'re", 'becoming',
'why', 'of', "'ll", 'herself', 'amount', 'others', 'whither', 'enough', 'before',
'thereupon', 'him', 'whereby', 'front', 'but', 'is', 'never', '’m', 'while',
'above', 'could', 'hers', 'am', 'keep', 'side', 'four', 'ever', 'just', 'several',
'no', 'afterwards', 'being', 'may', 'please', 'therein', 'made', 'themselves', 'fifteen',
'fifty', 'via', 'how', 'whose', 'by', 'toward', "n't", 'ca', 'give', 'us',
'own', 'full', 'seemed', 'anyone', 'his', '’ll', "'ve", 'well', 'also', 'everywhere',
'against', 'almost', 'has', 'around', 'hereby', '’s', 'it', 'any', 'among', 'can',
'seems', 'they', 'through', 'anyhow', 'at', 'beforehand', 'become', 'two', 'less',
'on', 'either', 'beside', 'put', 'none', 'some', 'down', 'make', 'our', 'off',
'six', 'most', 'part', 'eleven', 'empty', 'get', 'one', 'often', 'used', 'someone',
'whence', 'nowhere', 'alone', 'there', '‘ve', 'few', 'say', 'next', "'d", 'after',
'those', 'ten', 'only', 'more', 'thereafter', 'she', '’ve', 'nine', 'much',
'and', 'done', 'this', 'unless', '‘s', 'across', 'bottom', 'hence', 'another', 'thence',
'‘m', 'ourselves', 'out', 'take', 'due', 'very', 'without', 're', 'which',
'now', 'should', 'meanwhile', 'moreover', 'mine', 'its', 'who', 'will', 'yet', 'for',
'various', 'seeming', 'the', 'everything', 'did', 'back', 'somewhere', 'anyway',
'already', 'into', 'same', 'you', 'five', 'onto', 'twenty', 'whom', 'doing',
'former', 'elsewhere', 'three', 'eight', 'thru', 'over', 'anything', 'below', 'together',
'ours', 'where', 'sometimes', 'rather', 'such', 'thus', 'thereby', 'although',
'move', 'sixty', 'anywhere', 'latterly', 'me', 'otherwise', 'latter', 'n‘t',
'namely', 'each', 'my', 'becomes', 'n’t', 'else', 'or', 'within', 'myself', 'per',
'might', 'top', 'so', 'really', 'nothing', 'cannot', 'every', 'noone', 'during',
'would', 'that', 'when', 'her', 'itself', 'an', 'though', 'became', 'further', 'herein',
'serious', 'even', 'were', 'whatever', 'in', 'once', 'hundred', 'does', 'sometime',
'last', 'whoever', 'whether', 'least', 'both', 'their', 'are', 'formerly',
'’d', '‘re', 'wherever', 'do', 'go', 'therefore', 'be', 'nobody', 'to', 'besides',
'hereupon', 'neither', "'s"]

Drop STOP_WORDS

In [13]: for token in doc:
    if not token.is_stop:
        print(token)

movie
bad
.
worst
watch
week
ago
.
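One consequence for sentiment analysis is that spaCy's default stop list contains negation words such as "not", "no", and "never". A phrase like "not good" is therefore reduced to "good" before vectorization. The sketch below assumes spaCy is installed, and the `negations` set and the `custom_stopwords` name are illustrative choices. It shows the effect and one possible adjustment: removing negations from the list before it is used for filtering, for example by the `stopwords` list in text_data_cleaning. Whether this improves accuracy on IMDB has not been tested here.

```python
from spacy.lang.en.stop_words import STOP_WORDS

# Negation words to check against spaCy's default English stop list
negations = {"no", "not", "nor", "never", "n't", "cannot", "neither", "nothing", "none", "nobody"}
print(sorted(negations & STOP_WORDS))

# Hypothetical custom list: spaCy's stop words minus the negations
custom_stopwords = STOP_WORDS - negations

tokens = ["movie", "not", "good"]
print([t for t in tokens if t not in STOP_WORDS])        # ['movie', 'good']
print([t for t in tokens if t not in custom_stopwords])  # ['movie', 'not', 'good']
```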

Lemmatization - finding the base word of an existing word in the dataset

In [14]: for lem in doc:
    print(lem.text, lem.lemma_)

This this
movie movie
is be
very very
bad bad
. .
This this
is be
worst bad
than than
the the
one one
I I
watch watch
a a
week week
ago ago
. .

Tagging each word in the text with a part-of-speech (POS) tag

In [15]: for token in doc:
    print(token.text, token.pos_, spacy.explain(token.pos_))

This DET determiner

movie NOUN noun
is AUX auxiliary
very ADV adverb
bad ADJ adjective
. PUNCT punctuation
This PRON pronoun
is AUX auxiliary
worst ADJ adjective
than ADP adposition
the DET determiner
one NOUN noun
I PRON pronoun
watch VERB verb
a DET determiner
week NOUN noun
ago ADV adverb
. PUNCT punctuation

The cell below calls displacy.render(doc), which renders the processed doc with spaCy's built-in visualizer, displaCy. With the default style='dep', the visualization shows the text with each token's part-of-speech tag and arcs for the syntactic dependencies between tokens. Passing style='ent' instead highlights named entities, as in the cell after it. In a Jupyter notebook, the visualization is displayed inline.

In [16]: doc=nlp(text)
displacy.render(doc)


[Dependency visualization: tokens shown with their POS tags (This DET, movie NOUN, is AUX, very ADV, ...) and joined by dependency arcs]

In [17]: doc=nlp(text)
displacy.render(doc,style='ent')

This movie is very bad. This is worst than the one I watch a week ago DATE .

Importing the vectorizer, pipeline, train-test split, and metrics.

In [18]: from sklearn.feature_extraction.text import TfidfVectorizer

from sklearn.pipeline import Pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report,confusion_matrix
import pandas as pd

Reading IMDB Dataset with 50,000 records and mapping positive and negative values
to 1 and 0 respectively.

In [19]: df=pd.read_csv('IMDB Dataset.csv')

df['sentiment'] = df['sentiment'].map({'positive': 1, 'negative': 0})
df


Out[19]:
review sentiment
0 One of the other reviewers has mentioned that ... 1
1 A wonderful little production. <br /><br />The... 1
2 I thought this was a wonderful way to spend ti... 1
3 Basically there's a family where a little boy ... 0
4 Petter Mattei's "Love in the Time of Money" is... 1
... ... ...
49995 I thought this movie did a down right good job... 1
49996 Bad plot, bad dialogue, bad acting, idiotic di... 0
49997 I am a Catholic taught in parochial elementary... 0
49998 I'm going to have to disagree with the previou... 0
49999 No one expects the Star Trek movies to be high... 0
50000 rows × 2 columns
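pandas' Series.map() returns NaN for any value that is not a key of the mapping dictionary. A label with unexpected casing or whitespace would therefore silently become a missing target. A minimal safeguard is to check for NaN after mapping, sketched here on a small hypothetical frame rather than the IMDB data.

```python
import pandas as pd

# Hypothetical labels; the third uses unexpected capitalization
df = pd.DataFrame({"sentiment": ["positive", "negative", "Positive"]})
label_map = {"positive": 1, "negative": 0}

mapped = df["sentiment"].map(label_map)
print(mapped.isna().sum())  # 1: "Positive" matched no key and became NaN

# Normalizing whitespace and case before mapping lets every label match
mapped = df["sentiment"].str.strip().str.lower().map(label_map)
print(mapped.isna().sum())  # 0
```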

In [20]: column_names=['Reviews','Sentiments']
df.columns=column_names
df

Out[20]:
Reviews Sentiments
0 One of the other reviewers has mentioned that ... 1
1 A wonderful little production. <br /><br />The... 1
2 I thought this was a wonderful way to spend ti... 1
3 Basically there's a family where a little boy ... 0
4 Petter Mattei's "Love in the Time of Money" is... 1
... ... ...
49995 I thought this movie did a down right good job... 1
49996 Bad plot, bad dialogue, bad acting, idiotic di... 0
49997 I am a Catholic taught in parochial elementary... 0
49998 I'm going to have to disagree with the previou... 0
49999 No one expects the Star Trek movies to be high... 0
50000 rows × 2 columns

In [22]: df.shape  # not echoed: Jupyter displays only the last expression of a cell
df['Sentiments'].value_counts()

Out[22]:
Sentiments
1    25000
0    25000
Name: count, dtype: int64

In [23]: import string

Cleaning text data by removing punctuation, stop words, etc.


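In [24]: puct = string.punctuation  # ASCII punctuation characters

def text_data_cleaning(sentence):
    """Lemmatize and lowercase a text, then drop stop words and punctuation."""
    doc = nlp(sentence)
    tokens = []
    for token in doc:
        # spaCy 2.x lemmatized every pronoun to "-PRON-"; in that case keep the
        # lowercased surface form. In spaCy 3.x this branch never triggers.
        if token.lemma_ != "-PRON-":
            temp = token.lemma_.lower().strip()
        else:
            temp = token.lower_
        tokens.append(temp)
    cleaned_tokens = []
    for token in tokens:
        # `token not in puct` is a substring test, so it also drops the empty
        # strings that .strip() leaves behind for whitespace tokens
        if token not in stopwords and token not in puct:
            cleaned_tokens.append(token)
    return cleaned_tokens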

In [25]: text_data_cleaning("hello how are you. i am fine.")

Out[25]: ['hello', 'fine']

Importing LinearSVC

In [22]: from sklearn.svm import LinearSVC

In [23]: tfidf=TfidfVectorizer(tokenizer=text_data_cleaning)
classifier=LinearSVC()

Declaring Feature column and Target column

In [24]: X=df['Reviews']
y=df['Sentiments']

Splitting the data into training and testing sets

Training: 70%, testing: 30%, random_state=32

In [25]: X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=32)
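The split above is not stratified, which is why the test set holds 7,385 negative and 7,615 positive reviews rather than an even 7,500 each. If an exact class balance is wanted, train_test_split accepts a `stratify` argument. The sketch below uses synthetic balanced labels and placeholder features rather than the real reviews.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic balanced labels, mirroring the 25,000 / 25,000 IMDB class counts
y = np.array([1] * 25000 + [0] * 25000)
X = np.arange(len(y)).reshape(-1, 1)  # placeholder features

# stratify=y keeps the class ratio identical in both partitions
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=32, stratify=y
)
print(np.bincount(y_test))  # [7500 7500]
```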

Fitting the model to the data

In [26]: clf=Pipeline([('tfidf',tfidf),('clf',classifier)])
clf.fit(X_train,y_train)

C:\Users\kalle\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\feature_extraction\text.py:525: UserWarning: The parameter 'token_pattern' will not be used since 'tokenizer' is not None'
  warnings.warn(
C:\Users\kalle\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_classes.py:31: FutureWarning: The default value of `dual` will change from `True` to `'auto'` in 1.5. Set the value of `dual` explicitly to suppress the warning.
  warnings.warn(


Out[26]: [Pipeline diagram with two steps: TfidfVectorizer, then LinearSVC]
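Both warnings printed during fitting can be addressed explicitly. Passing `token_pattern=None` tells TfidfVectorizer that the custom tokenizer replaces the regex pattern. Setting `dual` on LinearSVC answers the FutureWarning; `dual=True` matches the pre-1.5 default. The sketch below assumes a recent scikit-learn. `whitespace_tokenizer` and the four training strings are hypothetical stand-ins for text_data_cleaning and the IMDB data.

```python
import warnings
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

def whitespace_tokenizer(text):
    # Hypothetical stand-in for text_data_cleaning
    return text.split()

clf = Pipeline([
    # token_pattern=None: the regex would be ignored anyway once a tokenizer is given
    ("tfidf", TfidfVectorizer(tokenizer=whitespace_tokenizer, token_pattern=None)),
    # Setting dual explicitly is what the FutureWarning asks for
    ("clf", LinearSVC(dual=True)),
])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    clf.fit(["great film", "awful film", "great cast", "awful plot"], [1, 0, 1, 0])

# Keep only warnings about the two parameters discussed above
relevant = [w for w in caught
            if "token_pattern" in str(w.message) or "dual" in str(w.message)]
print(len(relevant))  # 0
print(clf.predict(["great plot"]))
```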

Predicting the review sentiment

In [27]: y_pred=clf.predict(X_test)

In [28]: print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.90      0.88      0.89      7385
           1       0.89      0.90      0.89      7615

    accuracy                           0.89     15000
   macro avg       0.89      0.89      0.89     15000
weighted avg       0.89      0.89      0.89     15000

In [29]: confusion_matrix(y_test,y_pred)

Out[29]:
array([[6521,  864],
       [ 760, 6855]], dtype=int64)

Taking custom input reviews from the end user and classifying each as Positive or Negative

In [42]: review_input = []
print("Enter the number of reviews to input:")
n = int(input())

for i in range(n):
    print("Please enter your review:")
    print("\n")
    x = input()
    print(x)
    review_input.append(x)

# print(review_input)

predictions = clf.predict(review_input)

# Print predictions along with reviews
for review, prediction in zip(review_input, predictions):
    if prediction == 0:
        print("\n")
        print(review)
        print("===> Negative")
    else:
        print("\n")
        print(review)
        print("===> Positive")
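For scripted testing, the interactive loop above can be replaced by a small function that takes a list of reviews. The sketch below is hypothetical: `label_reviews` and `LABELS` are illustrative names. The tiny model is trained on four invented phrases only so that the example runs on its own. In the notebook, the fitted `clf` pipeline would be passed instead.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

LABELS = {0: "Negative", 1: "Positive"}

def label_reviews(model, reviews):
    """Return (review, label) pairs for any fitted model whose predict()
    maps a list of strings to 0/1 labels (hypothetical helper)."""
    predictions = model.predict(reviews)
    return [(review, LABELS[int(p)]) for review, p in zip(reviews, predictions)]

# Tiny made-up model so the sketch runs on its own
model = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC(dual=True))])
model.fit(["loved it", "great fun", "hated it", "dull mess"], [1, 1, 0, 0])

for review, label in label_reviews(model, ["great fun", "dull mess"]):
    print(review, "===>", label)
```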

Enter the number of reviews to input:
Please enter your review:

This movie does a great job of explaining the problems that we faced and the fears that we had before we put man into space. As a history of space flight, it is still used today in classrooms that can get one of the rare prints of it. Disney has shown it on "Vault Disney" and I wish they would do so again.
Please enter your review:

I'll not comment a lot, what's to??? Stereotype characters, absolute ignorance about Colombia's reality, awful mise en scene, poor color choice, NOT funny (it supposed to be a comedy and they expect that you will laugh because some distend music it's beside the nonsense scenes), Very poor actors direction (if you see somewhere those people, I mean the interpreters, you'll know they are at least good, but seeing this so call film, it is impossible to guess it), you get tired of the music... this "comedy" has no rhythm, the only good rhythm in it, it's the rap sing in the final credits....pathetic, doesn't it? etc...etc... It has been a long time I haven't seen a movie so bad!!
Please enter your review:

If you really really REALLY enjoy movies featuring ants building dirt-mirrors, eating non-ants, and conquering the world with a voice-over narrative, then this is the movie for you.

If you really really REALLY enjoy movies featuring ants building dirt-mirrors, eating non-ants, and conquering the world with a voice-over narrative, then this is the movie for you.
===> Negative
