nlp_project(documentation)
Session: 2021-2025
Submitted By:
Mahwish Noreen (2021-CS-29)
Supervised By:
Dr. Usman Ghani
Contents
1 Abstract
2 Introduction
2.1 Problem Statement
2.2 Objectives
7 Methodology
7.1 Data Collection
7.2 Data Preprocessing
7.3 Feature Extraction
7.4 Model Training
7.5 Model Saving
7.6 User Interface (UI)
9 Conclusion
10 References
1 Abstract
This project develops a Sentiment Analysis model that classifies movie reviews
as positive or negative based solely on their textual content. Sentiment
Analysis, a key application within Natural Language Processing (NLP), is
increasingly valuable for understanding public opinion across various domains.
It enables businesses, analysts, and researchers to gauge sentiment in
consumer feedback, social media interactions, and product reviews. In this
project, we train a Naive Bayes machine learning model on a labeled dataset of
movie reviews, achieving an accuracy of 85%. The project also implements a user
interface (UI) using the Streamlit framework, and the model is saved using the
Joblib library for efficient deployment. The project covers data
preprocessing, feature extraction, model training, evaluation, and deployment
in a user-friendly interface.
2 Introduction
Sentiment Analysis is a fundamental task in Natural Language Processing (NLP)
that deals with identifying the sentiment or emotion expressed in textual data.
With the rapid growth of user-generated content on platforms like social media
and review sites, sentiment analysis has become essential for businesses and
analysts to understand customer opinions and feedback.
In this project, the primary dataset is the IMDB Movie Reviews Dataset, a
widely used labeled dataset containing positive and negative movie reviews.
The goal is to develop an accurate model that automatically classifies movie
reviews as positive or negative, aiding real-time analysis of public opinion.
2.2 Objectives
The primary objectives of this project include:
7 Methodology
The project workflow follows a structured approach, as outlined below:
• Precision: 0.84
• Recall: 0.85
• F1-score: 0.844
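As a quick consistency check on the figures above, the F1-score is the harmonic mean of precision and recall. The sketch below is a minimal illustration in plain Python, not the project's evaluation code; the function names `f1_score` and `metrics_from_counts` are hypothetical, and the report does not state how the metrics were averaged across the two classes.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def metrics_from_counts(tp, fp, fn):
    """Precision, recall and F1 for one class from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_score(precision, recall)


# F1 recomputed from the rounded precision and recall reported above.
reported_f1 = f1_score(0.84, 0.85)
print(round(reported_f1, 3))  # 0.845
```

The recomputed value of about 0.845 is in line with the reported 0.844; a difference in the third decimal is expected because the precision and recall used here are themselves rounded to two decimals.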
9 Conclusion
The project developed a sentiment analysis model that classifies movie reviews
as positive or negative with an accuracy of 85%. The model was saved using the
Joblib library, making it easy to deploy and reuse in future applications. The
Streamlit-based user interface offers a simple way for users to interact with
the model and get real-time predictions.
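The train, save, and reload cycle described here can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: it assumes scikit-learn's `TfidfVectorizer` and `MultinomialNB` (the report names only "Naive Bayes" and Joblib), uses four hand-written reviews in place of the IMDB data, and the file name `sentiment_model.joblib` is hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Tiny hand-written stand-in for the IMDB reviews (hypothetical data).
reviews = [
    "a wonderful and moving film",
    "great acting and a wonderful story",
    "boring plot and terrible acting",
    "a terrible and boring waste of time",
]
labels = ["positive", "positive", "negative", "negative"]

# Bundle the vectorizer and the classifier so both are saved together.
# TF-IDF features are an assumption; the report does not name the method.
model = make_pipeline(TfidfVectorizer(), MultinomialNB())
model.fit(reviews, labels)

# Persist the fitted pipeline to disk (hypothetical file name).
model_path = os.path.join(tempfile.mkdtemp(), "sentiment_model.joblib")
joblib.dump(model, model_path)

# Later, for example inside the UI script, reload the model and predict.
restored = joblib.load(model_path)
new_reviews = ["wonderful story", "terrible plot"]
predictions = list(restored.predict(new_reviews))
print(predictions)
```

Saving the whole pipeline rather than the classifier alone means the loaded object accepts raw review text directly, so a user interface only needs to call `predict` on the submitted string.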
While the model’s accuracy is satisfactory, further improvements can be made
by experimenting with other machine learning models (e.g., Support Vector
Machines, Logistic Regression) or deep learning approaches (e.g., LSTMs or
Transformers). Extending the model to multi-class sentiment classification or
to domain-specific reviews could further enhance its utility.
10 References
• IMDB Dataset: https://www.kaggle.com/datasets/lakshmi25npathi/imdb-dataset-of-50k-movie-reviews/data