Chapter 4

Uploaded by

Ramdhan Firdaus


Let's predict the sentiment!
SENTIMENT ANALYSIS IN PYTHON

Violeta Misheva
Data Scientist
Classification problems
Product and movie reviews: positive or negative sentiment (binary classification)

Tweets about airline companies: positive, neutral and negative (multi-class classification)

Linear and logistic regressions


Logistic function
Linear regression: numeric outcome

Logistic regression: probability outcome

Probability(sentiment = positive | review)
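
The logistic function is what turns an unbounded linear score into a probability. A minimal sketch, assuming hypothetical scores (the `logistic` helper and the sample values are illustrative and not part of the slides):

```python
import math

def logistic(z):
    """Map any real-valued score z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical linear scores for three reviews (not from the course data)
scores = [-4.0, 0.0, 4.0]
probabilities = [logistic(z) for z in scores]

for z, p in zip(scores, probabilities):
    print(f"score={z:+.1f} -> P(sentiment = positive | review) = {p:.3f}")
```

A score of 0 maps to exactly 0.5, and scores of equal size but opposite sign map to probabilities that sum to 1.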


Logistic regression in Python
from sklearn.linear_model import LogisticRegression

log_reg = LogisticRegression().fit(X, y)


Measuring model performance
Accuracy: Fraction of predictions our model got right.

The higher the accuracy and the closer it is to 1, the better

# Accuracy using score

score = log_reg.score(X, y)
print(score)

0.9009
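
To make the definition concrete, accuracy can be computed by hand as the share of matching labels. A small sketch with made-up labels (the lists below are hypothetical and unrelated to the 0.9009 result):

```python
# Hypothetical true labels and model predictions (1 = positive, 0 = negative)
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]

# Count the positions where the prediction matches the true label
correct = sum(t == p for t, p in zip(y_true, y_pred))
accuracy = correct / len(y_true)

print(correct, "of", len(y_true), "correct; accuracy =", accuracy)
```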


Using accuracy score
# Accuracy using accuracy_score
from sklearn.metrics import accuracy_score

y_predicted = log_reg.predict(X)
accuracy = accuracy_score(y, y_predicted)
print(accuracy)

0.9009
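
Both routes report the same number, because a classifier's `score` method computes accuracy on the data it is given. A hedged sketch on synthetic data (the `make_classification` data set stands in for a vectorized review matrix and is an assumption, as is `max_iter=1000`):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Synthetic stand-in for features X and sentiment labels y (assumption)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

log_reg = LogisticRegression(max_iter=1000).fit(X, y)

# Route 1: the model's own score method
score = log_reg.score(X, y)

# Route 2: predict labels first, then compare them with the true labels
accuracy = accuracy_score(y, log_reg.predict(X))

print(score, accuracy)
```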


Let's practice!

Did we really predict the sentiment well?

Train/test split

Training set: used to train the model (70-80% of the whole data)

Testing set: used to evaluate the performance of the model


Train/test in Python
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=123, stratify=y)

X: features

y: labels

test_size: proportion of the data used for testing

random_state: seed for the random number generator, which makes the split reproducible

stratify: the split keeps the same class proportions as the values passed to this parameter (here, y)
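
The effect of `stratify` is easiest to see on imbalanced labels. A minimal sketch, assuming a hypothetical data set with 80 negative and 20 positive reviews:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced data: 80 negative (0) and 20 positive (1) reviews
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 80 + [1] * 20)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=123, stratify=y
)

# Both parts keep the original 20% share of positive labels
print(len(y_train), len(y_test))
print(y_train.mean(), y_test.mean())
```

Without `stratify=y`, the share of positive labels in the test set would depend on the random draw.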


Logistic regression with train/test split
log_reg = LogisticRegression().fit(X_train, y_train)

print('Accuracy on training data: ', log_reg.score(X_train, y_train))

0.76

print('Accuracy on testing data: ', log_reg.score(X_test, y_test))

0.73


Accuracy score with train/test split
from sklearn.metrics import accuracy_score

log_reg = LogisticRegression().fit(X_train, y_train)

y_predicted = log_reg.predict(X_test)
print('Accuracy score on test data: ', accuracy_score(y_test, y_predicted))

0.73


Confusion matrix


Confusion matrix in Python
from sklearn.metrics import confusion_matrix

log_reg = LogisticRegression().fit(X_train, y_train)

y_predicted = log_reg.predict(X_test)

print(confusion_matrix(y_test, y_predicted)/len(y_test))

[[0.3788 0.1224]
 [0.1352 0.3636]]
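
In scikit-learn's confusion matrix, rows are the true classes and columns are the predicted classes, so the diagonal holds the correct predictions. A small sketch with hypothetical labels (not the course data):

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 4 true negatives-or-false-positives, 6 positives
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

counts = confusion_matrix(y_true, y_pred)   # rows: true, columns: predicted
proportions = counts / len(y_true)          # same normalization as the slide

print(counts)
print(proportions)

# The diagonal (correct predictions) adds up to the accuracy
accuracy = proportions[0, 0] + proportions[1, 1]
print(accuracy)
```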


Let's practice!

Logistic regression: revisited

Complex models and regularization
Complex models:
A complex model can capture the noise in the data (overfitting)

Complexity comes from having a large number of features or parameters

Regularization:
A way to simplify the model and keep it from becoming too complex


Regularization in a logistic regression
from sklearn.linear_model import LogisticRegression

# Regularization arguments
LogisticRegression(penalty='l2', C=1.0)

L2: shrinks all coefficients towards zero

High values of C: low penalization, model fits the training data well.

Low values of C: high penalization, model less flexible.
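
One way to see the role of C is to compare the size of the fitted coefficients. A hedged sketch on synthetic data (the data set, the three C values and `max_iter=5000` are assumptions; the code relies on scikit-learn's default L2 penalty):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for a vectorized review matrix (assumption)
X, y = make_classification(n_samples=300, n_features=20, random_state=0)

norms = {}
for C in [0.001, 0.1, 10.0]:
    # Smaller C means a stronger penalty on the coefficients
    model = LogisticRegression(C=C, max_iter=5000).fit(X, y)
    norms[C] = float(np.linalg.norm(model.coef_))
    print(f"C={C}: coefficient norm = {norms[C]:.3f}")
```

The coefficient norm grows with C, which matches the slide: low C penalizes heavily and keeps the coefficients close to zero.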


Predicting a probability vs. predicting a class
log_reg = LogisticRegression().fit(X_train, y_train)

# Predict labels
y_predicted = log_reg.predict(X_test)

# Predict probability
y_probab = log_reg.predict_proba(X_test)


Predicting a probability vs. predicting a class
y_probab
array([[0.5002245, 0.4997755],
       [0.4900345, 0.5099655],
       ...,
       [0.7040499, 0.2959501]])

# Select the probabilities of class 1

y_probab = log_reg.predict_proba(X_test)[:, 1]

array([0.4997755, 0.5099655, ..., 0.2959501])
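
The two columns returned by `predict_proba` follow the order of `log_reg.classes_`, and each row sums to 1. A minimal sketch on synthetic data (the data set and `max_iter=1000` are assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for features and binary sentiment labels (assumption)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
log_reg = LogisticRegression(max_iter=1000).fit(X, y)

proba = log_reg.predict_proba(X[:5])   # shape (5, 2): one column per class
proba_class_1 = proba[:, 1]            # second column: probability of class 1

print(log_reg.classes_)                # column order of predict_proba
print(proba.shape, proba_class_1.shape)
print(proba.sum(axis=1))               # every row sums to 1
```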


Model metrics with predicted probabilities
Accuracy score and confusion matrix work with class labels.

They raise a ValueError when applied to predicted probabilities.

# Default probability encoding:

# If probability >= 0.5, then class 1, else class 0
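
The sketch below illustrates both points: passing probabilities to `accuracy_score` fails, while thresholding them at 0.5 first gives usable class labels. The labels and probabilities are hypothetical:

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 1, 0, 1]
y_probab = [0.20, 0.70, 0.45, 0.10, 0.90]  # hypothetical P(class 1)

# Probabilities are continuous values, so the metric rejects them
try:
    accuracy_score(y_true, y_probab)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Encode probabilities as classes with the 0.5 rule from the slide
threshold = 0.5
y_labels = [1 if p >= threshold else 0 for p in y_probab]
accuracy = accuracy_score(y_true, y_labels)
print(y_labels, accuracy)
```

Changing `threshold` is a simple way to trade one type of error for the other.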


Let's practice!

Bringing it all together

The Sentiment Analysis problem
Sentiment analysis is the process of understanding the opinion of an author about a subject

Movie reviews

Amazon product reviews

Twitter airline sentiment

Various emotionally charged literary examples


Exploration of the reviews
Basic information about size of reviews

Word clouds

Features for the length of reviews: number of words, number of sentences

Feature detecting the language of a review
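
The length features can be sketched with the standard library alone. This is a rough illustration, not the course's implementation: the reviews are made up, `length_features` is a hypothetical helper, and the sentence split is deliberately naive. Language detection is left out because it needs a third-party library.

```python
import re

# Hypothetical reviews (not from the course data)
reviews = [
    "Great movie. I loved every minute of it!",
    "Terrible.",
    "Not bad, but the ending was weak. Would I watch it again? Probably not.",
]

def length_features(text):
    """Return simple length features for one review."""
    words = text.split()
    # Naive split: ., ! or ? followed by whitespace or the end of the text
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s]
    return {"n_words": len(words), "n_sentences": len(sentences)}

features = [length_features(r) for r in reviews]
for f in features:
    print(f)
```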


Numeric transformations of sentiment-carrying columns
Bag-of-words

TF-IDF vectorization

from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Vectorizer syntax
vect = CountVectorizer().fit(data.text_column)
X = vect.transform(data.text_column)
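
A self-contained version of the same fit/transform pattern, with a made-up list of texts in place of `data.text_column` (an assumption), shows what the resulting matrix looks like:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical texts standing in for data.text_column
texts = [
    "I loved this movie",
    "I hated this movie",
    "loved it loved it",
]

vect = CountVectorizer().fit(texts)   # learn the vocabulary
X = vect.transform(texts)             # sparse document-term count matrix

# Columns follow the alphabetical order of the vocabulary
print(sorted(vect.vocabulary_))
print(X.toarray())
```

With the default settings the text is lowercased and single-character tokens such as "I" are dropped, so the vocabulary here has five words.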


Arguments of the vectorizers
Stop words: non-informative, frequently occurring words

n-gram range: use phrases, not only single words

Control the size of the vocabulary: max_features, max_df, min_df

Capturing a pattern of tokens: remove digits or certain characters

Important, but NOT arguments to the vectorizers:

lemmas and stems
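
The arguments listed above can be combined in one vectorizer. A hedged sketch with made-up texts; the specific values (`max_features=10`, the digit-free `token_pattern`) are illustrative assumptions rather than recommended settings:

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical texts (not from the course data)
texts = [
    "The plot was not good at all",
    "The acting was very good in 2019",
    "Not good, not bad, just 100 minutes of nothing",
]

vect = CountVectorizer(
    stop_words="english",                  # drop the built-in English stop words
    ngram_range=(1, 2),                    # single words and two-word phrases
    max_features=10,                       # keep at most the 10 most frequent terms
    token_pattern=r"\b[^\d\W][^\d\W]+\b",  # tokens of 2+ characters, no digits
)
X = vect.fit_transform(texts)
vocab = sorted(vect.vocabulary_)

print(vocab)
print(X.shape)
```

Printing `vocab` is worth doing before modeling, since a generic stop word list can also drop words that carry sentiment, such as negations.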


Supervised learning model
Logistic regression classifier to predict the sentiment

Evaluated with accuracy and confusion matrix

Importance of train/test split


Let's practice!

Wrap up

The Sentiment Analysis world

Sentiment analysis types

The automated sentiment analysis system

Congratulations!