NLP Presentation

Uploaded by

rameshtharu076

Bilingual Sentiment Analysis
UE17CS333 Project Submission

Sakshi Goel (PES1201700148)
Suhail Rahman (PES1201701420)
ABOUT THE PROJECT
- The main aim of the project is to develop a sentiment analyzer
that classifies Twitter data as positive or negative.

- The project handles the challenge of bilingual comments, where
people tweet in two languages, in this case Hindi and English,
with both written in the English (Latin) alphabet.

UE17CS333-PROJECT_2020 2
UNIQUENESS AND ANALYSIS
- We created an aggregated (ensemble) model that combines all the
classifiers used during the process. The ensemble worked to our
advantage: as the results slides show, it achieved one of the
highest accuracies among the classifiers.

- When a sentence is entirely in Hindi, we use Google Translate to
convert it directly to English. When a sentence mixes Hindi and
English, we use TextBlob to identify that.
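The routing described above (leave English alone, translate Hindi, detect mixed sentences) can be sketched as follows. This is only an illustration under stated assumptions: the tiny romanized-Hindi word list, the function names, and the token-by-token handling of mixed sentences are all hypothetical, and the `translate` argument stands in for whatever translation service is used (the slides name Google Translate and TextBlob, which are not called here).

```python
from typing import Callable

# Hypothetical mini-lexicon of romanized Hindi words, for illustration only.
HINDI_WORDS = {"bahut", "accha", "nahi", "hai", "kya", "mujhe", "pasand", "bura"}


def detect_mix(text: str) -> str:
    """Label a sentence as 'english', 'hindi', or 'mixed' using the toy lexicon."""
    tokens = text.lower().split()
    hindi_count = sum(token in HINDI_WORDS for token in tokens)
    if hindi_count == 0:
        return "english"
    if hindi_count == len(tokens):
        return "hindi"
    return "mixed"


def to_english(text: str, translate: Callable[[str], str]) -> str:
    """Route a sentence to the translator depending on its language mix."""
    kind = detect_mix(text)
    if kind == "english":
        return text  # nothing to translate
    if kind == "hindi":
        return translate(text)  # translate the whole sentence at once
    # Mixed sentence: translating only the Hindi tokens is an assumption
    # made for this sketch, not a step confirmed by the slides.
    return " ".join(
        translate(token) if token.lower() in HINDI_WORDS else token
        for token in text.split()
    )
```

Passing the translator in as a function keeps the sketch independent of any particular service and makes the routing easy to check with a stub.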
DATASET SOURCE
- The dataset used is the Sentiment140 dataset, obtained from
Kaggle.

- It contains 1,600,000 tweets extracted using the Twitter API.
The tweets have been annotated (0 = negative, 4 = positive) and
can be used to detect sentiment.

- The two columns that we mainly need are:

  - The Label
  - The Tweet

DATASET SOURCE
- The raw text in the Tweet column was not directly usable and had
to be cleaned and tokenized. We also limited the dataset to
40,000 tweets.
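Reading only the label and tweet columns and capping the number of rows might look like the sketch below. It assumes the usual Sentiment140 CSV layout (no header row; label in the first column, tweet text in the sixth); the function name and the label strings are illustrative. The slides do not say how the 40,000 tweets were chosen, so this sketch simply takes the first valid rows.

```python
import csv
import io
from typing import Iterable, List, Tuple

# Sentiment140 polarity codes used in the slides: 0 = negative, 4 = positive.
LABELS = {"0": "negative", "4": "positive"}


def load_tweets(lines: Iterable[str], limit: int = 40_000) -> List[Tuple[str, str]]:
    """Return up to `limit` (label, tweet) pairs from Sentiment140-style CSV rows."""
    pairs = []
    for row in csv.reader(lines):
        if len(row) < 6 or row[0] not in LABELS:
            continue  # skip malformed rows and any other label codes
        pairs.append((LABELS[row[0]], row[5]))
        if len(pairs) >= limit:
            break
    return pairs


# Small in-memory demo standing in for the real file.
sample = io.StringIO(
    '"0","1","Mon Apr 06","NO_QUERY","user1","so sad today"\n'
    '"4","2","Mon Apr 06","NO_QUERY","user2","so happy today"\n'
)
demo_pairs = load_tweets(sample, limit=40_000)
```

If the file on disk is ordered by label, taking the first rows would yield a single class, so shuffling or sampling per class before applying the cap would be the safer choice.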

DATASET PREPROCESSING
- We selected the columns relevant to our study: the tweet and
its associated sentiment.

- Emoticons were converted into the emotion they signify, while
emojis were removed.

- We also expanded contractions; for example, “can’t” was changed
to “can not”.
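The emoticon and contraction steps above can be illustrated with two small lookup tables. The tables, the emoji character ranges, and the function name below are hypothetical; the slides do not list the actual mappings used in the project.

```python
import re

# Hypothetical, deliberately small lookup tables for illustration.
EMOTICONS = {":)": "happy", ":-)": "happy", ":(": "sad", ":-(": "sad", ":D": "laugh"}
CONTRACTIONS = {"can't": "can not", "won't": "will not", "don't": "do not", "i'm": "i am"}

# Approximate Unicode ranges covering common emojis (not an exhaustive definition).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def normalize(text: str) -> str:
    """Map emoticons to emotion words, drop emojis, and expand contractions."""
    text = text.replace("\u2019", "'")  # curly apostrophe to straight apostrophe
    text = EMOJI_RE.sub("", text)       # emojis are removed, not translated
    words = []
    for token in text.split():
        if token in EMOTICONS:
            words.append(EMOTICONS[token])
        else:
            # Contractions are matched case-insensitively; other tokens pass through.
            words.append(CONTRACTIONS.get(token.lower(), token))
    return " ".join(words)
```

Emoticons are replaced before lowercasing or symbol stripping would happen, since a later step that removes punctuation would otherwise destroy them.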

DATASET PREPROCESSING
- We removed numbers, URLs, HTML tags, symbols, and mentions (the
“@” symbol followed by the account handle).

- These cleaning steps were important for the study to work
effectively. Finally, the cleaned tweets were converted to
lowercase for simplicity.

- Feature selection focused on adjectives, abstract nouns and
adverbs; the remaining words were removed as they did not add
value to the sentiment.
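The cleaning and feature-filtering steps above can be sketched with a few regular expressions and a part-of-speech filter. The patterns and function names are illustrative, not the project's actual code. The filter assumes tokens have already been tagged with Penn Treebank tags by some tagger; abstract nouns have no dedicated tag in that scheme, so this sketch keeps only adjectives and adverbs.

```python
import re
from typing import List, Tuple

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HTML_TAG_RE = re.compile(r"<[^>]+>")
MENTION_RE = re.compile(r"@\w+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # applied after lowercasing


def clean_tweet(text: str) -> str:
    """Lowercase a tweet and strip URLs, HTML tags, mentions, numbers and symbols."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = HTML_TAG_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)  # removes digits and remaining symbols
    return " ".join(text.split())        # collapse repeated whitespace


# Penn Treebank tag prefixes: JJ* = adjectives, RB* = adverbs.
KEPT_TAG_PREFIXES = ("JJ", "RB")


def keep_sentiment_words(tagged: List[Tuple[str, str]]) -> List[str]:
    """Keep only words whose POS tag marks an adjective or adverb."""
    return [word for word, tag in tagged if tag.startswith(KEPT_TAG_PREFIXES)]
```

URLs are removed before symbols so that fragments such as "http" and "co" do not survive as stray words.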
LITERATURE REVIEW - TABLE 1

Paper 1
  Title: Machine translation of bi-lingual Hindi-English (Hinglish) text
  Authors: R. Mahesh K. Sinha, Anil Thakur
  Methodology used: Uses a system designed specifically to separate
  the Hindi and English parts of a word that combines the two.

Paper 2
  Title: Towards Sub-Word Level Compositions for Sentiment Analysis
  of Hindi-English Code Mixed Text
  Authors: Aditya Joshi, Ameya Prabhu Pandurang, Manish Shrivastava
  and Vasudeva Varma
  Methodology used: Introduces a learned sub-word-level
  representation in an LSTM architecture (Subword-LSTM) instead of
  character-level or word-level representations.

Paper 3
  Title: A Dataset of Hindi-English Code-Mixed Social Media Text for
  Hate Speech Detection
  Authors: Aditya Bohra, Deepanshu Vijay, Vinay Singh, Syed S. Akhtar
  and Manish Shrivastava
  Methodology used: Uses a system that classifies a tweet containing
  a combination of Hindi and English as negative or not.

Paper 4
  Title: Resource Creation for Hindi-English Code Mixed Social Media
  Text
  Authors: Sakshi Gupta, Piyush Bansal and Radhika Mamidi
  Methodology used: Proposes a method to aggregate data into a
  dataset of words that have a multilingual characteristic.

Paper 5
  Title: Sentiment classification of Hinglish text
  Authors: Kumar Ravi and Vadlamani Ravi
  Methodology used: Uses different combinations of feature selection
  methods and a host of classifiers with a term frequency-inverse
  document frequency (TF-IDF) feature representation.

LITERATURE REVIEW - TABLE 2

Paper 1
  Accuracy: 90%
  Benefits: The strategy described is equally applicable to all
  Indian languages, as these are verb-ending languages and have a
  similar mixture of lexicons as in the case of Hindi.
  Drawbacks: Elaborate testing is not possible as these languages
  are used in verbal communication.

Paper 2
  Accuracy: 69.7%
  Benefits: Sub-Word LSTM interprets sentiment based on
  morpheme-like structures, and the results produced are
  significantly better than the baselines.
  Drawbacks: The lexicon lookup approach did not perform well owing
  to the heavily misspelt words in the text, which led to incorrect
  transliterations.

Paper 3
  Accuracy: 71.7%
  Benefits: The features used in the classification system are
  character n-grams, word n-grams, punctuation, negation words and
  a hate lexicon, which are fed to an SVM as the classifier.
  Drawbacks: The corpus was not annotated with part-of-speech tags
  at word level, which would have yielded better results.

Paper 4
  Accuracy: 89.94%
  Benefits: They used an existing language identification system
  and improved a normalisation system, achieving a higher accuracy
  than the base system.
  Drawbacks: Did not take the sentence-level context into
  consideration for word disambiguation.

Paper 5
  Accuracy: AUC = 0.8601
  Benefits: Proposed a triumvirate of TF-IDF, GR, and RBFNN, which
  was found to be the best combination for classifying sentiment
  expressed in Hinglish text.
  Drawbacks: Did not employ a sentence parser to consider the
  relation between different parts of speech in a sentence.
BLOCK DIAGRAM FOR IMPLEMENTATION
[Figure: block diagram; image not recovered]

QUANTITY OF WORK – THE MAIN CODE MODULES
Sl. No.  Code Module                 Status (% completed)  Description / Comments
1.       func(test_text)             100%                  The master module
2.       hinglish(test_text)         100%                  Takes care of text translation
3.       text_classify(text)         100%                  Classifies text using all 8 models
4.       hybrid(test_set_formatted)  100%                  Builds the hybrid model classifier
5.       features(test_text)         100%                  Filters features from the text
6.       start(text)                 100%                  Preprocessing module

QUALITY OF WORK – MILESTONES THAT ARE DONE AND WORKING
Serial no.  Milestone description         Status (% complete)  Comments
1.          Dataset Selection             100%                 A better dataset can be used.
2.          Preprocessing                 100%                 Cleaning done efficiently.
3.          Feature Selection             100%                 Adjectives, Abstract Nouns, Adverbs
4.          Choice of Classifiers         100%                 7 Classifiers chosen.
5.          Building Classifiers          100%                 Successfully built
6.          Training Classifiers          100%                 Trained on 85% data.
7.          Creation of Hybrid Model      100%                 Voting Based Ensemble Model.
8.          Translation Challenge         100%                 Google Translate, TextBlob
9.          Creating a controller module  100%                 func module combines all functionality.
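A voting-based ensemble of the kind listed in the milestones can be sketched as a simple majority vote. This is a minimal illustration, not the project's `hybrid` module: each classifier is assumed to be a callable that maps a text to a label, and the function name and return shape are hypothetical.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Assumption for this sketch: a classifier is any function from text to label.
Classifier = Callable[[str], str]


def vote(classifiers: Sequence[Classifier], text: str) -> Tuple[str, float]:
    """Return the majority label and the fraction of classifiers that chose it."""
    votes = Counter(classifier(text) for classifier in classifiers)
    label, count = votes.most_common(1)[0]
    return label, count / len(classifiers)
```

With an odd number of classifiers, such as the seven used here, and two labels, a tie cannot occur; with an even number a tie-breaking rule would be needed.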
RESULTS OBTAINED - Accuracy
[Chart: Comparison of Accuracies (x-axis: Classifier, y-axis: Accuracy); image not recovered]

Classifier Used          Accuracy (%)
Naive Bayes              62.0729
Multinomial Naive Bayes  62.2062
Bernoulli Naive Bayes    62.2062
Logistic Regression      62.2562
SGD                      61.2397
SVC Classifier           61.3897
Max Entropy              61.3897
Hybrid Model             62.2563

RESULTS OBTAINED - Confusion Matrix
For Hybrid Model:
[Figure: confusion matrix for the hybrid model; image not recovered]
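For reference, a binary confusion matrix and the accuracy derived from it can be computed from true and predicted labels as in the sketch below. The function names and the label strings are illustrative, and the numbers it produces on toy data are not the project's results.

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     positive: str = "positive") -> Dict[str, int]:
    """Count true/false positives and negatives for a two-class problem."""
    counts = Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == positive:
            counts["tp" if guess == positive else "fn"] += 1
        else:
            counts["fp" if guess == positive else "tn"] += 1
    return {key: counts[key] for key in ("tp", "fp", "fn", "tn")}


def accuracy(matrix: Dict[str, int]) -> float:
    """Fraction of predictions that were correct."""
    total = sum(matrix.values())
    return (matrix["tp"] + matrix["tn"]) / total if total else 0.0
```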

RESULTS OBTAINED - F1 Score
[Figures: F1-score reports for each classifier below; images not recovered]
- Naive Bayes Classifier
- Bernoulli Naive Bayes Classifier
- Multinomial Naive Bayes Classifier
- Logistic Regression Classifier
- Stochastic Gradient Descent Classifier
- Support Vector Machines Classifier
- Maximum Entropy Classifier
- Hybrid Model
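For reference, the F1 score is the harmonic mean of precision and recall. A minimal sketch of how it is computed from raw counts follows; the function name is illustrative and the inputs are the true-positive, false-positive and false-negative counts for one class.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 from counts, guarding against division by zero."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

Unlike accuracy, F1 ignores true negatives, so it is reported per class and is more informative when the two classes are not equally frequent.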

OUR TOP THREE LEARNINGS IN THIS PROJECT
1. We became familiar with the usage and implementation of
different classifiers.

2. We learned which classifiers work well on a given type of data,
along with the advantages and drawbacks of the classification
models used.

3. We got the opportunity to create an ensemble model to get the
best results from our classifiers.

TOP CHALLENGES UNRESOLVED SO FAR
1. Test accuracy of the models stayed at roughly 61-62%, even
after several efforts to increase it.

2. Two separate modules, instead of one, are used for translation.

3. A better dataset could be used for training.

OUR GOING FORWARD PLAN (IF ANY)
1. Find a better dataset to work with.

2. Try more complex machine learning models for the
classification of text.

3. Use better translation techniques.
