NLP Project (Documentation)

The document outlines a Sentiment Analysis project that aims to classify movie reviews as positive or negative using a Naive Bayes model, achieving an accuracy of 85%. It includes a comprehensive methodology covering data collection, preprocessing, feature extraction, model training, and deployment through a user-friendly interface built with Streamlit. The project addresses existing gaps in sentiment analysis by providing a robust preprocessing pipeline and real-time predictions.

Sentiment Analysis Project

Session: 2021-2025

Submitted By:
Mahwish Noreen (2021-CS-29)

Supervised By:
Dr. Usman Ghani

Department of Computer Science

University of Engineering and Technology, Lahore
Pakistan
Natural Language Processing 2

Contents
1 Abstract

2 Introduction
2.1 Problem Statement
2.2 Objectives

3 Scope of the Project

4 Research Gaps Addressed
4.1 What’s missing?
4.2 What’s new about it?

5 Metrics and Literature Review
5.1 Evaluation Metrics:
5.2 Literature Review:

6 Comparison with Existing Works

7 Methodology
7.1 Data Collection
7.2 Data Preprocessing
7.3 Feature Extraction
7.4 Model Training
7.5 Model Saving
7.6 User Interface (UI)

8 Results and Discussion

9 Conclusion

10 References

Mahwish Noreen December 2024


1 Abstract
This project focuses on developing a Sentiment Analysis model designed to
classify movie reviews as either positive or negative based solely on their
textual content. Sentiment Analysis, a key application within Natural Language
Processing (NLP), is increasingly valuable for understanding public opinion
across various domains. It enables businesses, analysts, and researchers to
gauge sentiments in consumer feedback, social media interactions, and product
reviews. In this project, we train a Naive Bayes machine learning model using a
labeled dataset of movie reviews, achieving an accuracy of 85%. The project also
implements a user interface (UI) using the Streamlit framework, and the model
is saved using the Joblib library for efficient deployment. This comprehensive
project covers data preprocessing, feature extraction, model training,
evaluation, and deployment in a user-friendly interface.


2 Introduction
Sentiment Analysis is a fundamental task in Natural Language Processing (NLP)
that deals with identifying the sentiment or emotion expressed in textual data.
With the rapid growth of user-generated content on platforms like social media
and review sites, sentiment analysis has become essential for businesses and
analysts to understand customer opinions and feedback.
In this project, the primary dataset used is the IMDB Movie Reviews Dataset,
a widely recognized labeled dataset containing positive and negative movie
reviews. The goal is to develop an effective and accurate model that can
automatically classify movie reviews into positive or negative sentiments,
aiding in real-time analysis of public opinion.

2.1 Problem Statement

With the increasing volume of online movie reviews, understanding public
sentiment towards films has become essential for filmmakers, producers, and
marketers. Analyzing the sentiment of movie reviews can provide valuable
insights into audience perception and help in making informed decisions. The
challenge is to automatically classify the sentiment of reviews into categories
such as “positive” or “negative” using natural language processing (NLP)
techniques.
This project focuses on the development of a Sentiment Analysis model that
can classify movie reviews as either positive or negative, utilizing the IMDB
Movie Reviews Dataset. The objective is to preprocess the data, train an
appropriate machine learning model, evaluate its performance, and implement a
user interface for real-time sentiment analysis.

2.2 Objectives
The primary objectives of this project include:

• Developing a sentiment analysis model to classify reviews as positive or
negative.
• Using the Naive Bayes model for sentiment classification, known for its
simplicity and efficiency.
• Achieving an accuracy of 85% on the IMDB dataset.
• Saving the trained model using the Joblib library for future use and easy
deployment.
• Developing an interactive user interface (UI) using Streamlit for real-time
predictions, making the model accessible to users without any technical
knowledge.


3 Scope of the Project

This project focuses solely on binary classification (positive and negative
sentiments) of text-based movie reviews. While the project could be expanded
to handle multi-class sentiment analysis or other domains, the current scope is
limited to movie reviews from the IMDB dataset. The project aims to build,
evaluate, and deploy the Naive Bayes model in a user-friendly interface for quick,
real-time predictions.

4 Research Gaps Addressed

4.1 What’s missing?
Many sentiment analysis models lack robust preprocessing pipelines or fail to
provide real-time predictions in a user-friendly format. This project addresses
these gaps by establishing a comprehensive preprocessing pipeline and
incorporating a real-time prediction feature using Streamlit.

4.2 What’s new about it?

A combination of TF-IDF and Naive Bayes has been utilized to develop a
lightweight yet effective model, achieving approximately 85% accuracy.
Additionally, a user-friendly web interface has been implemented for real-time
sentiment analysis.

5 Metrics and Literature Review

5.1 Evaluation Metrics:
The model was evaluated using the following metrics:

• Accuracy: The percentage of all predictions that are correct.
• Precision: The proportion of reviews predicted as positive that are
actually positive.
• Recall: The proportion of actual positives correctly identified.
• F1-Score: The harmonic mean of precision and recall, balancing both
metrics.
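
The four metrics above can be computed directly from the counts of true and
false positives and negatives. The following is a minimal sketch, not the
project's evaluation code; the function name classification_metrics and the
six toy labels are assumptions made purely for illustration.

```python
def classification_metrics(y_true, y_pred, positive="positive"):
    """Return accuracy, precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy labels (hypothetical, not taken from the IMDB dataset).
y_true = ["positive", "positive", "positive", "negative", "negative", "negative"]
y_pred = ["positive", "positive", "negative", "positive", "negative", "negative"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this toy case there are two true positives, one false positive, and one
false negative, so precision, recall, and F1 all equal 2/3.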

5.2 Literature Review:

While advanced models like LSTM and BERT offer high performance, they
often require significant computational resources. In contrast, this approach
is simpler and achieves competitive results, particularly suitable for real-time
applications or smaller datasets.


6 Comparison with Existing Works

• What’s different? Existing models may deliver higher accuracy but
at the cost of substantial computational power. This model achieves
approximately 85% accuracy while being faster and more resource-efficient.
Moreover, the integration with Streamlit ensures a seamless and intuitive
user experience, an often-overlooked aspect in similar projects.

7 Methodology
The project workflow follows a structured approach, as outlined below:

7.1 Data Collection

The IMDB Movie Reviews Dataset is used for this project. It consists of 50,000
labeled movie reviews, equally divided between positive and negative sentiments.
This dataset provides the foundation for training and evaluating the model.

7.2 Data Preprocessing

Data preprocessing plays a crucial role in ensuring the quality of the input data
for machine learning models. The following steps were applied to the text data:
• Tokenization: Reviews were split into individual words or tokens.
Tokenization gives the later steps discrete units to filter, normalize,
and count.
• Stopword Removal: Common words such as “the”, “and”, and “is” were
removed, as they do not contribute significantly to sentiment classification.
• Stemming/Lemmatization: Words were reduced to their root forms so that
variants are treated as one term. For example, lemmatization maps “running”
and “ran” to “run”, whereas a suffix-stripping stemmer handles “running” but
not the irregular form “ran”.
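
The three preprocessing steps above can be sketched with the standard library
alone. This is an illustrative approximation rather than the project's
pipeline: the stopword set, the suffix list, and the crude_stem helper are all
assumptions, and a real implementation would more likely use an established
stemmer or lemmatizer.

```python
import re

# Small illustrative stopword list (assumption); real pipelines use a fuller one.
STOPWORDS = {"the", "and", "is", "a", "an", "of", "to", "it", "was", "this", "in"}

# Suffixes stripped by the toy stemmer, checked in this order.
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token):
    """Strip one common suffix; a stand-in for a real stemmer or lemmatizer."""
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, drop stopwords, then reduce each remaining token."""
    tokens = tokenize(text)
    kept = [t for t in tokens if t not in STOPWORDS]
    return [crude_stem(t) for t in kept]


tokens = preprocess("The acting was amazing and the plot is gripping!")
print(tokens)  # ['act', 'amaz', 'plot', 'gripp']
```

The truncated stems such as "amaz" show why suffix stripping is only a rough
normalization: it groups related word forms but does not return dictionary
words, and it cannot relate irregular forms.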

7.3 Feature Extraction

The next step involved converting the text data into numerical representations
that can be used by machine learning models. The Term Frequency-Inverse
Document Frequency (TF-IDF) method was employed. This technique helps
identify the most important words in the reviews by considering both the
frequency of words within a document and the rarity of words across the entire
dataset. TF-IDF thus emphasizes meaningful words while downplaying commonly
used ones.
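
To make the weighting concrete, the sketch below computes a textbook form of
TF-IDF by hand: term frequency is the term's share of the document, and
inverse document frequency is log(N / df). This is an assumption about the
variant, not a statement of what the project used; library implementations
such as scikit-learn's TfidfVectorizer apply smoothing and normalization by
default, so their numbers will differ. The function name tf_idf and the three
toy documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenized document.

    tf  = count of the term in the document / document length
    idf = log(N / number of documents containing the term)
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))  # count each term once per document

    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Toy tokenized reviews (hypothetical).
docs = [
    ["great", "movie", "great", "cast"],
    ["boring", "movie"],
    ["great", "plot"],
]
weights = tf_idf(docs)
print(weights[1])
```

In the second document, "boring" appears in only one of the three reviews
while "movie" appears in two, so "boring" receives the larger weight even
though both words occur once.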


7.4 Model Training

The Naive Bayes classifier was chosen for training the model due to its simplicity,
speed, and effectiveness in text classification tasks. This probabilistic model
assumes that the features (words) are conditionally independent given the class.
The assumption rarely holds exactly for natural language, but the classifier
often performs well on text in practice. The model was trained on the
preprocessed data and evaluated using standard metrics such as accuracy,
precision, recall, and F1-score.
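
The independence assumption is what lets the classifier score a review by
adding one log-probability per word to the log prior of each class. The
following is a minimal sketch of a multinomial Naive Bayes with Laplace
smoothing, written from scratch for illustration. It works on raw token counts
rather than the TF-IDF features used in the project, and the class name, the
method names, and the four training reviews are assumptions.

```python
import math
from collections import Counter, defaultdict


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes over token counts with Laplace smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {
            word for counts in self.word_counts.values() for word in counts
        }
        return self

    def log_scores(self, tokens):
        """Return the unnormalized log posterior of each class."""
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, n_docs in self.class_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                # Add-one smoothing keeps unseen class/word pairs above zero.
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Toy training data (hypothetical, already tokenized).
train_docs = [
    ["great", "acting", "great", "story"],
    ["loved", "story"],
    ["boring", "plot"],
    ["terrible", "acting", "boring"],
]
train_labels = ["positive", "positive", "negative", "negative"]

model = MultinomialNaiveBayes().fit(train_docs, train_labels)
prediction = model.predict(["great", "story"])
print(prediction)  # positive
```

Working in log space avoids numerical underflow when many small probabilities
are multiplied, which matters for long reviews.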

7.5 Model Saving

To make the model easy to deploy and use, the trained Naive Bayes model was
saved using the Joblib library. Joblib allows the model to be serialized and
stored, ensuring it can be reloaded without retraining every time it is used.
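
A save-and-reload round trip with Joblib can be sketched as follows. The
dictionary stands in for the trained objects, and the file name
sentiment_model.joblib is an assumption; the project's actual file names are
not given in this report.

```python
import os
import tempfile

import joblib

# Placeholder for the trained objects (assumption): in the project this would
# be the fitted TF-IDF vectorizer and the Naive Bayes classifier.
artifacts = {
    "labels": ["negative", "positive"],
    "vocabulary": {"great": 0, "boring": 1},
}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name, written to a temporary directory for the demo.
    model_path = os.path.join(tmp_dir, "sentiment_model.joblib")
    joblib.dump(artifacts, model_path)   # serialize to disk
    restored = joblib.load(model_path)   # reload without retraining

print(restored == artifacts)
```

In a deployment like the one described here, the fitted vectorizer generally
needs to be saved alongside the classifier so that new reviews are mapped to
the same feature columns the model was trained on.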

7.6 User Interface (UI)

To enhance accessibility and usability, a web-based UI was created using the
Streamlit framework. Streamlit enables rapid development of interactive
applications, allowing users to input text (movie reviews) and receive real-time
sentiment predictions from the model. The application was designed to be
intuitive and user-friendly, ensuring that non-technical users can interact with
it effectively.


8 Results and Discussion

The Naive Bayes model achieved an accuracy of 85% on the IMDB dataset,
which is competitive for a text classification task using a simple model. Below
are the results of the evaluation:
• Accuracy: 0.85
• Precision: 0.84
• Recall: 0.85
• F1-score: 0.844
The model performed well in identifying both positive and negative
sentiments, though there are areas for improvement. For example, some reviews
that were sarcastic or ambiguous were classified incorrectly due to the inherent
limitations of the Naive Bayes model. The next steps could involve
experimenting with more advanced models or incorporating additional features to
improve accuracy.

9 Conclusion
The project successfully developed a sentiment analysis model capable of
classifying movie reviews as positive or negative with an accuracy of 85%. The
model was saved using the Joblib library, making it easy to deploy and use in
future applications. Additionally, the Streamlit-based user interface offers a
simple way for users to interact with the model and get real-time predictions.
While the model’s accuracy is satisfactory, further improvements can be
made by experimenting with other machine learning models (e.g., Support Vector
Machines, Logistic Regression) or deep learning approaches (e.g., LSTMs or
Transformers). Additionally, expanding the model to handle multi-class
sentiment classification or domain-specific reviews could further enhance its
utility.

10 References
• IMDB Dataset: https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews/data
• Relevant research papers and articles on sentiment analysis, TF-IDF, and
Naive Bayes classifier.
• Streamlit documentation: https://docs.streamlit.io/
• Joblib documentation: https://joblib.readthedocs.io/en/latest/
