Authors: Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, Rebecca Passonneau

This document discusses sentiment analysis on Twitter data. It presents an overview of how Twitter is used to express sentiment on topics through tweets of up to 140 characters. It then describes several classification tasks for detecting sentiment polarity in tweets, including binary and 3-way classification. Features used for the classifiers include unigrams, Twitter-specific features, a tree kernel model, and a sentiment dictionary. The document also covers preprocessing of tweets, including replacing emoticons and acronyms, and prior polarity scoring of words. Evaluation shows that sentiment features and a combination of models achieve the best accuracy for sentiment classification of tweets.


Sentiment Analysis on Twitter Data
Authors:

Apoorv Agarwal

Boyi Xie

Ilia Vovsha

Owen Rambow

Rebecca Passonneau

Presented by Kripa K S

Overview:

twitter.com is a popular microblogging website.

Each tweet is at most 140 characters in length.

Tweets are frequently used to express a tweeter's emotion on a particular subject.

There are firms which poll Twitter to analyse sentiment on a particular topic.

The challenge is to gather all such relevant data, then detect and summarize the overall sentiment on a topic.

Classification Tasks and Tools:

Polarity classification: positive or negative sentiment

3-way classification: positive / negative / neutral

Baseline: 10,000 unigram features

100 Twitter-specific features

A tree kernel based model

A combination of models

A hand-annotated dictionary for emoticons and acronyms
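The 10,000-unigram baseline can be sketched as a plain bag-of-words featurizer. This is a minimal illustration, not the authors' code; the toy corpus below is a stand-in for the annotated tweet collection.

```python
from collections import Counter

# Toy corpus; the paper's baseline keeps the top 10,000 unigrams.
tweets = ["great day", "not a great day", "terrible day"]

# Build the unigram vocabulary, most frequent words first
counts = Counter(w for t in tweets for w in t.split())
vocab = [w for w, _ in counts.most_common(10000)]
index = {w: i for i, w in enumerate(vocab)}

def unigram_features(tweet):
    """Map a tweet to a bag-of-words count vector over the vocabulary."""
    vec = [0] * len(vocab)
    for w in tweet.split():
        if w in index:
            vec[index[w]] += 1
    return vec
```

Words outside the vocabulary simply contribute nothing to the vector, which is how a capped vocabulary behaves at classification time.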

About Twitter and the structure of tweets:

140 characters: spelling errors, acronyms, emoticons, etc.

The @ symbol refers to a target Twitter user.

# hashtags can refer to topics.

11,875 tweets were manually annotated.

1709 tweets from each class (positive/negative/neutral) are used to balance the training data.

Preprocessing of data

Emoticons are replaced with their polarity labels:

:) = positive

:( = negative

There are 170 such emoticons.

Acronyms are translated, e.g. 'lol' to 'laughing out loud'. There are 5184 such acronyms.

URLs are replaced with a ||U|| tag and targets with a ||T|| tag.

All forms of negation (no, n't, never) are replaced by NOT.

Runs of repeated characters are truncated to 3 characters.
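The steps above can be sketched as one small normalization pass. The dictionaries here are tiny illustrative stand-ins for the paper's 170 emoticons and 5184 acronyms, and the ||P||/||N|| label strings are assumptions, not the paper's exact tokens.

```python
import re

# Tiny illustrative dictionaries; the real ones hold 170 emoticons
# and 5184 acronyms. The ||P||/||N|| polarity labels are an assumption.
EMOTICONS = {":)": "||P||", ":(": "||N||"}
ACRONYMS = {"lol": "laughing out loud", "brb": "be right back"}

def preprocess(tweet):
    # Replace URLs with a ||U|| tag and @targets with a ||T|| tag
    tweet = re.sub(r"https?://\S+", "||U||", tweet)
    tweet = re.sub(r"@\w+", "||T||", tweet)
    # Replace emoticons with their polarity labels
    for emoticon, label in EMOTICONS.items():
        tweet = tweet.replace(emoticon, label)
    # Expand acronyms token by token
    tweet = " ".join(ACRONYMS.get(w.lower(), w) for w in tweet.split())
    # Replace negations (no, not, never, n't) with NOT
    tweet = re.sub(r"n't\b", " not", tweet)
    tweet = re.sub(r"\b(no|not|never)\b", "NOT", tweet, flags=re.IGNORECASE)
    # Truncate runs of repeated characters to 3 characters
    tweet = re.sub(r"(.)\1{2,}", r"\1\1\1", tweet)
    return " ".join(tweet.split())
```

For example, `preprocess("@Fernando check http://t.co/x lol :)")` yields `"||T|| check ||U|| laughing out loud ||P||"`.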

Prior Polarity Scoring

Features are based on the prior polarity of words.

Using DAL, each word is assigned a score between 1 (negative) and 3 (positive).

The scores are then normalized:

< 0.5 = negative

> 0.8 = positive

If a word is not in the dictionary, its synonyms are retrieved and looked up instead.

This yields a prior polarity for about 88.9% of the English-language words encountered.
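A minimal sketch of the scoring above, assuming normalization means dividing by the scale maximum of 3; the three DAL entries are made-up stand-ins for the real Dictionary of Affect in Language.

```python
# Made-up mini-DAL: pleasantness scores on the 1 (negative) to 3 (positive) scale.
DAL = {"terrible": 1.0, "okay": 2.0, "great": 2.9}

def prior_polarity(word, dal=DAL):
    """Normalize a DAL score into (0, 1] and map it to a polarity label."""
    score = dal.get(word.lower())
    if score is None:
        return "unknown"  # the paper falls back to synonym lookup here
    norm = score / 3.0    # divide by the scale maximum (an assumption)
    if norm < 0.5:
        return "negative"
    if norm > 0.8:
        return "positive"
    return "neutral"
```

With these toy entries, "terrible" normalizes to about 0.33 (negative) and "great" to about 0.97 (positive), while "okay" falls in the neutral band between the two thresholds.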

Tree Kernel

Example tweet: "@Fernando this isnt a great day for playing the HARP! :)"

Features

It is shown that f2+f3+f4+f9 (senti-features) achieves better accuracy than the other feature sets.

3-way classification

The chance baseline is 33.33%.

Senti-features and the unigram model perform on par, achieving a 23.25% gain over the baseline.

The tree kernel model outperforms both by 4.02%.

Accuracy for the 3-way classification task is greatest with the combination f2+f3+f4+f9.

Both classification tasks used an SVM with 5-fold cross-validation.
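The evaluation setup can be sketched with scikit-learn; this is an assumption (the slides do not name the SVM toolkit), and the toy tweets below stand in for the annotated corpus of 1709 tweets per class.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

# Toy 3-way corpus; a stand-in for the balanced annotated tweet set.
tweets = ["great day", "terrible day", "just a day",
          "love this harp", "hate this harp", "this is a harp"] * 5
labels = ["pos", "neg", "neu"] * 10

X = CountVectorizer().fit_transform(tweets)            # unigram features
scores = cross_val_score(SVC(kernel="linear"), X, labels, cv=5)
print(round(scores.mean(), 2))                         # mean accuracy over 5 folds
```

`cross_val_score` with `cv=5` performs exactly the 5-fold cross-validation described on the slide, training on four folds and testing on the fifth in turn.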
