Report Dhruv
Declaration
I, Yash Goel, a 5th-semester student of Computer Science Engineering at the
University School of Information, Communication & Technology, Dwarka,
hereby declare that the work presented in this project report was undertaken
in October 2024 under the mentorship of Mr. Rocky Jagtiani.
The matter embodied in this project report has not been submitted by me or
anybody else to any institution for award of any other degree or diploma
except to University School of Information, Communication & Technology,
for the fulfilment of the requirements for the award of degree of Bachelor of
Technology.
Yash Goel
00816401522
Acknowledgement
I would like to take this opportunity to express my sincere gratitude to
Suven Consultants & Technology Pvt Ltd. for providing me with an
internship opportunity at their organization. I am truly grateful for the chance
to gain practical experience and knowledge in my field of study.
Interning here has been a wonderful learning experience for me, and I
have greatly appreciated the support and guidance of my supervisors. I
have also enjoyed getting to know my fellow interns and mentoring them
as part of a team.
I am thankful for the valuable skills and experience that I have gained during
my time here, and I am confident that they will be of great benefit to me in
my future endeavours.
Yash Goel
00816403221
Abstract
Organization Profile
Role
During my tenure at Suven Consultants & Technology Pvt Ltd., I served as
an intern and single-handedly built a sentiment analysis project from
scratch, working with a new tech stack and taking the project from
inception to completion within one month.
Introduction
About Internship
Throughout my internship, I independently executed a sentiment analysis
project using the IMDb dataset, building a complete machine learning
pipeline from data preprocessing to model evaluation. This journey
showcased a wide range of skills in NLP, data handling, and machine
learning, contributing to a deeper understanding of natural language
processing and its applications in sentiment analysis.
Feature Engineering
To enhance the predictive capability of the models, I carefully engineered
features that could capture the nuances of the text data. This involved using
techniques like bag-of-words and custom features based on the length and
structure of the reviews. These features helped improve the model's ability
to detect sentiment effectively.
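As a rough illustration of the idea above, the following sketch builds bag-of-words counts and appends two structural features. It uses only the Python standard library. The function names and the two extra features (word count and exclamation-mark count) are assumptions chosen for illustration, not the exact features used in the project.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_vocabulary(reviews):
    """Map every distinct token in the corpus to a column index (sorted order)."""
    tokens = sorted({tok for review in reviews for tok in tokenize(review)})
    return {tok: index for index, tok in enumerate(tokens)}


def featurize(review, vocabulary):
    """Return bag-of-words counts followed by two structural features."""
    tokens = tokenize(review)
    counts = Counter(tokens)
    # Iterating a dict yields keys in insertion order, i.e. column order.
    bag_of_words = [counts.get(tok, 0) for tok in vocabulary]
    n_words = len(tokens)                 # hypothetical length feature
    n_exclamations = review.count("!")    # hypothetical structure feature
    return bag_of_words + [n_words, n_exclamations]


reviews = ["Great movie great cast!", "Boring plot"]
vocab = build_vocabulary(reviews)
features = [featurize(review, vocab) for review in reviews]
```

Each review becomes a fixed-length numeric vector, which is the form that scikit-learn classifiers expect as input.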
Model Selection
I explored different algorithms for sentiment classification, including logistic
regression and a random classifier, to analyze and compare their
performance on the IMDb dataset. Using Scikit-learn, I trained both models
on the dataset, fine-tuning parameters to optimize their predictive accuracy
and efficiency.
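The comparison described above might look like the sketch below, which trains two Scikit-learn models on a tiny made-up dataset. Two assumptions are made here: the "random classifier" is interpreted as a uniform random baseline (`DummyClassifier`), and the toy reviews stand in for the IMDb data. The project's actual models and parameters may differ.

```python
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy stand-in data (hypothetical): 1 = positive, 0 = negative.
texts = [
    "good story and good acting", "a good film with a great cast",
    "really good fun", "good music and good pacing",
    "surprisingly good ending", "good direction overall",
    "bad story and bad acting", "a bad film with a weak cast",
    "really bad and dull", "bad music and bad pacing",
    "disappointingly bad ending", "bad direction overall",
]
labels = [1] * 6 + [0] * 6

# Hold out 4 reviews for testing, keeping the class balance.
train_texts, test_texts, y_train, y_test = train_test_split(
    texts, labels, test_size=4, random_state=0, stratify=labels
)

# Fit the vocabulary on the training split only, then reuse it for the test split.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_baseline": DummyClassifier(strategy="uniform", random_state=0),
}

# score() returns the mean accuracy on the held-out reviews.
scores = {
    name: model.fit(X_train, y_train).score(X_test, y_test)
    for name, model in models.items()
}
print(scores)
```

A random baseline is useful as a reference point: a trained model is only worth keeping if it clearly beats guessing.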
Model Evaluation
Finally, I evaluated each model's performance using metrics like accuracy,
precision, and recall, gaining insights into each model’s strengths and
limitations. This step allowed me to assess which model was best suited for
sentiment analysis on this dataset, refining the solution for real-world
applications.
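For reference, the three metrics named above can be computed from the counts of true and false positives and negatives. The sketch below does this in plain Python. The label vectors are made up for illustration and are not results from the project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall; undefined ratios fall back to 0.0."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: 1 = positive review, 0 = negative review.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 1, 1, 0]
metrics = evaluate(y_true, y_pred)
```

Precision is the share of predicted positives that are truly positive, and recall is the share of true positives that the model found. This is why the two can differ even when accuracy looks acceptable.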
Tech Stack
The core of the sentiment analysis is developed in Python, using libraries for
data processing, feature extraction, model building, and evaluation.
Problem Statement
By utilizing the NLTK library, the project will implement various Natural
Language Processing techniques, such as text normalization and feature
extraction, to enhance the performance of machine learning models. The
analysis will involve training classifiers on a labeled dataset, allowing the
system to learn patterns associated with positive and negative sentiments.
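A minimal sketch of text normalization is shown below. To stay self-contained it uses the standard `re` module and a small hand-written stop-word set instead of NLTK's tokenizers and stop-word corpus, which require a separate download. The stop-word list and the HTML-stripping step are assumptions for illustration.

```python
import re

# Tiny illustrative stop-word set (an assumption); NLTK ships a much larger one.
STOPWORDS = {"a", "an", "the", "is", "was", "it", "this", "and", "of", "to"}


def normalize(text):
    """Lowercase, drop HTML tags, tokenize on letters, and remove stop words."""
    text = text.lower()
    # Scraped reviews may contain markup such as <br />; replace it with a space.
    text = re.sub(r"<[^>]+>", " ", text)
    tokens = re.findall(r"[a-z]+", text)
    return [tok for tok in tokens if tok not in STOPWORDS]


cleaned = normalize("This movie was GREAT!<br />Loved it.")
print(cleaned)  # ['movie', 'great', 'loved']
```

The resulting token list can then be passed on to feature extraction.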
Dataset Description
Basic Statistics of Data:
The IMDb review dataset contains four columns: Ratings, Reviews,
Movies, and Resenhas
Number of Reviews: 149780
Number of Movies: 14205
Attribute Information:
1. Reviews: User review in English
2. Ratings: Rating from 1 to 10
3. Movies: Movie names
4. Resenhas: User review translated into Portuguese
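Basic statistics of this kind can be obtained with pandas, as in the sketch below. It builds a small in-memory table with the four column names listed above. The rows are made up, and the rating threshold used to derive a sentiment label is a hypothetical choice that the report does not specify.

```python
import pandas as pd

# Made-up rows using the dataset's four column names.
reviews = pd.DataFrame(
    {
        "Ratings": [9, 2, 7, 4],
        "Reviews": ["Loved it", "Terrible pacing", "Quite good", "Not for me"],
        "Movies": ["Movie A", "Movie A", "Movie B", "Movie C"],
        "Resenhas": ["Adorei", "Ritmo terrível", "Muito bom", "Não é para mim"],
    }
)

n_reviews = len(reviews)                 # one row per review
n_movies = reviews["Movies"].nunique()   # distinct movie names

# Hypothetical labelling rule: ratings of 6 or more count as positive (1).
reviews["label"] = (reviews["Ratings"] >= 6).astype(int)

print(n_reviews, n_movies)
print(reviews[["Ratings", "label"]])
```

With the real data, the table would instead be loaded from the dataset file, for example with `pd.read_csv`.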
Project Description
Code Snippets
Result
My journey with Suven Consultants & Technology Pvt Ltd. on the
sentiment analysis project utilizing the IMDb movie review dataset has
been highly rewarding, significantly enhancing my skills in natural
language processing and machine learning. Over the course of the
project, I successfully progressed through various stages, from data
preprocessing and feature engineering to model selection and
evaluation. This hands-on experience allowed me to apply theoretical
concepts in a practical context, deepening my understanding of
sentiment analysis techniques.
Bibliography
1. https://www.kaggle.com
2. https://www.imdb.com/
3. https://www.nltk.org/
4. https://scikit-learn.org/stable/
5. https://pandas.pydata.org/
6. https://towardsdatascience.com
7. https://medium.com
8. https://stanfordnlp.github.io/CoreNLP/
Certification