DL Project
on
Sentiment Analysis of Movie Reviews
(IMDB Review Dataset)

Prepared by:
V. Jayadeep Hemanth - VU21CSEN0500045
M. Jyothi Swaroop - VU21CSEN0500069
A. Sai Sri Ram - VU21CSEN0500017
Rvs. Aditya - VU21CSEN0500132
Requirements Elicitation:
1. Access to the IMDB dataset with 50,000 movie reviews.
2. Python environment with necessary libraries installed:
spaCy, scikit-learn, pandas, etc.
3. Understanding of text preprocessing techniques such as
tokenization, lemmatization, and stop word removal.
4. Knowledge of machine learning concepts, particularly
support vector machines (SVMs) and their implementation via
scikit-learn's LinearSVC.
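The preprocessing steps named in requirement 3 can be illustrated without any libraries. The tiny stop-word set and the suffix-stripping "lemmatizer" below are toy stand-ins for spaCy's real components, included only to show the shape of the pipeline:

```python
# A minimal, dependency-free sketch of tokenization, lemmatization, and
# stop-word removal. The project itself uses spaCy for all three steps;
# everything here is an illustrative simplification.

STOP_WORDS = {"this", "is", "very", "the", "a", "i"}  # toy subset

def tokenize(text):
    # Crude whitespace split with punctuation stripped; spaCy does this properly.
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]

def lemmatize(token):
    # Toy rule: drop a trailing "s"; spaCy uses full lookup tables and rules.
    return token[:-1] if token.endswith("s") and len(token) > 3 else token

def preprocess(text):
    return [lemmatize(t) for t in tokenize(text) if t not in STOP_WORDS]

print(preprocess("This movie is very bad. This is worst than the one I watch a week ago."))
# → ['movie', 'bad', 'worst', 'than', 'one', 'watch', 'week', 'ago']
```

The real spaCy pipeline later in this report performs the same filtering, but with a 300+ word stop list and dictionary-based lemmas (e.g. "worst" → "bad").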
Problem Modeling:
● Evaluation:
○ After training, the model's performance is evaluated
using the testing dataset.
○ Metrics such as accuracy, precision, recall, and F1-score
are calculated to assess how well the model predicts
sentiment labels on unseen data.
○ Confusion matrices may also be analyzed to
understand the model's performance in terms of true
positives, true negatives, false positives, and false
negatives.
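All four metrics listed above fall out of the confusion-matrix counts directly. The counts in this sketch are placeholders, not this project's results:

```python
# How accuracy, precision, recall, and F1 derive from the four
# confusion-matrix cells. Placeholder counts, for illustration only.
tp, tn, fp, fn = 90, 80, 20, 10

accuracy  = (tp + tn) / (tp + tn + fp + fn)        # correct / total
precision = tp / (tp + fp)                         # of predicted positives, how many are right
recall    = tp / (tp + fn)                         # of actual positives, how many were found
f1        = 2 * precision * recall / (precision + recall)  # harmonic mean

print(accuracy, precision, recall, f1)
```

In the notebook, scikit-learn's `classification_report` and `confusion_matrix` compute these same quantities on the test split.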
1. Mapping ‘positive’ and ‘negative’ sentiments to 1 and 0
2. Evaluation Metrics
3. Output
4. Results
5. Conclusion
6. Future Directions

Frameworks used:
1. spaCy
2. scikit-learn
3. pandas
Machine learning model used: LinearSVC (Linear Support Vector Classifier)
Import spaCy
Import displacy for displaying word dependencies
In [3]: nlp=spacy.load('en_core_web_sm')

In [7]: text="This movie is very bad. This is worst than the one I watch a week ago."
        doc=nlp(text)
        doc

Out[7]: This movie is very bad. This is worst than the one I watch a week ago.

Out[9]: <spacy.pipeline.sentencizer.Sentencizer at 0x2c0f8856d00>
In [12]: stopwords=list(STOP_WORDS)
print(stopwords)
3/24/24, 9:22 PM IMDB_Sentiment
['see', 'whole', 'except', 'a', 'quite', 'them', 'himself', 'hereafter', 'beyond',
'always', 'call', 'was', 'too', 'yourself', 'nevertheless', 'from', 'as', 'using',
'been', 'these', 'show', 'i', "'m", 'perhaps', 'your', 'all', 'whereas', 'have',
..., 'do', 'go', 'therefore', 'be', 'nobody', 'to', 'besides', 'hereupon',
'neither', "'s"]
(spaCy's full English stop-word list; truncated here for readability)
Drop STOP_WORDS

Tokens remaining after stop-word removal:
movie, bad, ., worst, watch, week, ago, .
Token → Lemma:
This → this
movie → movie
is → be
very → very
bad → bad
. → .
This → this
is → be
worst → bad
than → than
the → the
one → one
I → I
watch → watch
a → a
week → week
ago → ago
. → .
In [15]: pos_list=[]
         for token in doc:
             print(token.text,token.pos_,spacy.explain(token.pos_))
In [16]: doc=nlp(text)
displacy.render(doc)
[displacy dependency-parse rendering of the sentence]
In [17]: doc=nlp(text)
displacy.render(doc,style='ent')
This movie is very bad. This is worst than the one I watch [a week ago  DATE].
Reading IMDB Dataset with 50,000 records and mapping positive and negative values
to 1 and 0 respectively.
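The loading-and-mapping step described above can be sketched as follows. The CSV filename and the original column names are assumptions (they follow the common Kaggle export of this dataset), and a two-row inline frame stands in for the real 50,000 records:

```python
# Sketch of reading the IMDB dataset and mapping sentiment labels to 1/0.
# A tiny inline DataFrame stands in for the 50,000-row CSV; in the notebook
# this would be something like: df = pd.read_csv("IMDB Dataset.csv")
# (filename assumed, not confirmed by the report).
import pandas as pd

df = pd.DataFrame({
    "Reviews":    ["Loved every minute of it", "Terrible, a waste of time"],
    "Sentiments": ["positive", "negative"],
})

# Map the string labels to integer classes: positive -> 1, negative -> 0.
df["Sentiments"] = df["Sentiments"].map({"positive": 1, "negative": 0})
print(df["Sentiments"].tolist())  # → [1, 0]
```

After this mapping, `df['Sentiments'].value_counts()` gives the 25,000/25,000 class balance shown below.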
In [20]: column_names=['Reviews','Sentiments']
df.columns=column_names
df
In [22]: df.shape
         df['Sentiments'].value_counts()

Out[22]: Sentiments
         1    25000
         0    25000
         Name: count, dtype: int64
In [24]: punct=string.punctuation
         punct

         def text_data_cleaning(sentence):
             doc=nlp(sentence)
             tokens=[]
             for token in doc:
                 if token.lemma_ != "-PRON-":
                     temp=token.lemma_.lower().strip()
                 else:
                     temp=token.lower_
                 tokens.append(temp)
             cleaned_tokens=[]
             for token in tokens:
                 if token not in stopwords and token not in punct:
                     cleaned_tokens.append(token)
             return cleaned_tokens
Out[25]: ['hello', 'fine']
Importing LinearSVC
In [23]: tfidf=TfidfVectorizer(tokenizer=text_data_cleaning)
classifier=LinearSVC()
In [24]: X=df['Reviews']
y=df['Sentiments']
In [25]: X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=32)
In [26]: clf=Pipeline([('tfidf',tfidf),('clf',classifier)])
clf.fit(X_train,y_train)
C:\Users\kalle\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\feature_extraction\text.py:525: UserWarning: The parameter 'token_pattern' will not be used since 'tokenizer' is not None
  warnings.warn(
C:\Users\kalle\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.10_qbz5n2kfra8p0\LocalCache\local-packages\Python310\site-packages\sklearn\svm\_classes.py:31: FutureWarning: The default value of `dual` will change from `True` to `'auto'` in 1.5. Set the value of `dual` explicitly to suppress the warning.
  warnings.warn(
Out[26]: [rendered Pipeline diagram: ('tfidf', TfidfVectorizer) → ('clf', LinearSVC)]
In [27]: y_pred=clf.predict(X_test)
In [28]: print(classification_report(y_test,y_pred))
In [29]: confusion_matrix(y_test,y_pred)

Out[29]: array([[6521,  864],
                [ 760, 6855]], dtype=int64)
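The metrics in the classification report can be checked by hand from these four counts. With labels [0, 1], scikit-learn's confusion matrix lays out rows as true classes and columns as predictions, so reading off the cells (treating class 1 as positive):

```python
# Derive the test-set metrics from the confusion matrix printed above.
# sklearn layout with labels [0, 1]: [[tn, fp], [fn, tp]].
tn, fp, fn, tp = 6521, 864, 760, 6855

accuracy  = (tp + tn) / (tp + tn + fp + fn)                # 13376/15000 ≈ 0.8917
precision = tp / (tp + fp)                                 # class-1 precision ≈ 0.888
recall    = tp / (tp + fn)                                 # class-1 recall ≈ 0.900
f1        = 2 * precision * recall / (precision + recall)  # ≈ 0.894

print(round(accuracy, 4), round(precision, 4), round(recall, 4), round(f1, 4))
```

So the pipeline classifies roughly 89% of the 15,000 held-out reviews correctly.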
In [42]: review_input = []
         print("Enter the number of reviews to input:")
         n = int(input())
         for i in range(n):
             print("Please enter your review:")
             print("\n")
             x = input()
             print(x)
             review_input.append(x)
         # print(review_input)
         predictions = clf.predict(review_input)
Enter the number of reviews to input:
Please enter your review:
This movie does a great job of explaining the problems that we faced and the fears that we had before we put man into space. As a history of space flight, it is still used today in classrooms that can get one of the rare prints of it. Disney has shown it on "Vault Disney" and I wish they would do so again.
Please enter your review:
I'll not comment a lot, what's to??? Stereotype characters, absolute ignorance about Colombia's reality, awful mise en scene, poor color choice, NOT funny (it supposed to be a comedy and they expect that you will laugh because some distend music it's beside the nonsense scenes), Very poor actors direction (if you see somewhere those people, I mean the interpreters, you'll know they are at least good, but seeing this so call film, it is impossible to guess it), you get tired of the music... this "comedy" has no rhythm, the only good rhythm in it, it's the rap sing in the final credits....pathetic, doesn't it? etc...etc... It has been a long time I haven't seen a movie so bad!!
Please enter your review:
If you really really REALLY enjoy movies featuring ants building dirt-mirrors, eating non-ants, and conquering the world with a voice-over narrative, then this is the movie for you.

This movie does a great job of explaining the problems that we faced and the fears that we had before we put man into space. As a history of space flight, it is still used today in classrooms that can get one of the rare prints of it. Disney has shown it on "Vault Disney" and I wish they would do so again.
===> Positive
I'll not comment a lot, what's to??? Stereotype characters, absolute ignorance about Colombia's reality, awful mise en scene, poor color choice, NOT funny (it supposed to be a comedy and they expect that you will laugh because some distend music it's beside the nonsense scenes), Very poor actors direction (if you see somewhere those people, I mean the interpreters, you'll know they are at least good, but seeing this so call film, it is impossible to guess it), you get tired of the music... this "comedy" has no rhythm, the only good rhythm in it, it's the rap sing in the final credits....pathetic, doesn't it? etc...etc... It has been a long time I haven't seen a movie so bad!!
===> Negative
If you really really REALLY enjoy movies featuring ants building dirt-mirrors, eating non-ants, and conquering the world with a voice-over narrative, then this is the movie for you.
===> Negative