Text Sentiment Analysis
Outline
1. Introduction
Each sentence and paragraph carries its own sentiment. For example, the sentence "This is a good movie" is a positive comment, while "This movie contains bad words, bad characters and unrelated scenes" is a negative comment.
Purpose of sentiment detection:
Identify the writer's emotional state.
Identify the writer's intended emotional communication.
2) estimate the semantic orientation of each extracted phrase; 3) assign the given review to a class, recommended or not recommended.
PMI-IR method
The Pointwise Mutual Information (PMI) between two words, word1 and word2, is defined as follows (Church & Hanks, 1989):
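The formula itself did not survive extraction; the standard definition from Church and Hanks is:

```latex
\mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2\!\left( \frac{p(\mathit{word}_1 \wedge \mathit{word}_2)}{p(\mathit{word}_1)\, p(\mathit{word}_2)} \right)
```

PMI-IR estimates these probabilities from search-engine hit counts, and Turney scores a phrase's semantic orientation as SO(phrase) = PMI(phrase, "excellent") − PMI(phrase, "poor").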
Each training point belongs to one of N different classes. The goal is to construct a function which, given a new data point, will correctly predict the class to which the new point belongs [5].
One-vs-all (OVA): (i) solve K different binary problems: classify "class k" versus "the rest" for k = 1, ..., K; (ii) assign a test sample to the class giving the largest (most positive) value f_k(x), where f_k(x) is the solution of the kth problem. Purpose: classify reviews into output labels (score ranks) and evaluate the accuracy.
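Steps (i) and (ii) can be sketched as follows. This is a minimal illustration using a simple perceptron as the per-class binary learner (the slides use SVMs); the 2-D points, labels, and cluster positions are toy data invented for the example.

```python
# One-vs-all (OVA) classification: train one binary classifier per class,
# then predict with the class whose score f_k(x) is largest.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def train_binary(X, y, epochs=100, lr=0.1):
    """Perceptron for one binary problem: y[i] is +1 or -1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            if t * (dot(w, x) + b) <= 0:  # misclassified -> update
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

def train_ova(X, labels, classes):
    """Step (i): one binary problem per class ("class k" vs. the rest)."""
    return {k: train_binary(X, [1 if l == k else -1 for l in labels])
            for k in classes}

def predict(models, x):
    """Step (ii): the class whose f_k(x) is largest (most positive)."""
    return max(models, key=lambda k: dot(models[k][0], x) + models[k][1])

# Three well-separated toy clusters standing in for review score ranks.
X = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
labels = ["neg", "neg", "neutral", "neutral", "pos", "pos"]
models = train_ova(X, labels, ["neg", "neutral", "pos"])
print(predict(models, (0, 0.5)))   # a point near the "neg" cluster
```

Any binary learner that produces a real-valued score can be substituted for the perceptron; with SVMs, f_k(x) is the signed distance to the kth separating hyperplane.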
In statistics, regression analysis is a statistical process for estimating the relationships among variables. It includes many techniques for modeling and analyzing several variables when the focus is on the relationship between a dependent variable and one or more independent variables. Here we assume that the labels come from a discretization of a continuous function g mapping from the feature space to a metric space.
The idea is to find the hyperplane that best fits the training data, but where training points whose labels are within distance ε of the hyperplane incur no loss. Otherwise, the loss is the negative of the distance between the label l and the value predicted for x by the fitted hyperplane function. Koppel and Schler (2005) found that applying linear regression to classify documents (in a different corpus than ours) with respect to a three-point rating scale provided greater accuracy than OVA SVMs and other algorithms.
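The ε-insensitive loss described above can be sketched in a few lines; the labels and predictions below are toy numbers for illustration.

```python
# epsilon-insensitive loss from support-vector regression: predictions
# within epsilon of the true label incur no loss; beyond that, the loss
# grows with the distance outside the tube.

def eps_insensitive_loss(y_true, y_pred, eps=0.5):
    return max(0.0, abs(y_true - y_pred) - eps)

print(eps_insensitive_loss(3.0, 3.2))  # inside the tube: no loss
print(eps_insensitive_loss(3.0, 4.0))  # 1.0 away, 0.5 outside the tube
```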
Let d be a distance metric on labels, and let nn_k(x) denote the k nearest neighbors of item x according to some item-similarity measure.
It is then natural to pose our problem as finding a mapping of instances x to labels l_x (respecting the original labels of the training instances) that minimizes:
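The objective itself did not survive extraction; the following is a hedged sketch of a metric-labeling-style cost in the spirit of Pang and Lee (2005), combining each item's own label preference with disagreement against its nearest neighbors. The preference table, neighbor lists, similarity, and label distance are all invented toy values.

```python
# Metric-labeling-style objective: cost of a label assignment =
# per-item preference cost + (weighted) label disagreement with the
# k nearest neighbors, where disagreement grows with label distance d.

def assignment_cost(pref, labels, neighbors, sim, d, alpha=1.0):
    """pref[x][l]: cost of giving item x label l (e.g. from a classifier);
    neighbors[x]: the k most similar items to x; sim(x, y): similarity."""
    cost = 0.0
    for x, lx in labels.items():
        cost += pref[x][lx]
        cost += alpha * sum(sim(x, y) * d(lx, labels[y]) for y in neighbors[x])
    return cost

pref = {"a": {1: 0.1, 2: 0.9}, "b": {1: 0.2, 2: 0.8}, "c": {1: 0.7, 2: 0.3}}
neighbors = {"a": ["b"], "b": ["a"], "c": ["b"]}
sim = lambda x, y: 1.0
d = lambda l1, l2: abs(l1 - l2)

# Labelling every item 1 vs. following each item's own preference:
print(assignment_cost(pref, {"a": 1, "b": 1, "c": 1}, neighbors, sim, d))
print(assignment_cost(pref, {"a": 1, "b": 1, "c": 2}, neighbors, sim, d))
```

In this toy case the smoother all-1 assignment is cheaper even though item "c" prefers label 2, which is exactly the neighbor-consistency effect the formulation is after.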
The decoder then predicts the m ranks that minimize the overall grief:
Neural network
Artificial neural networks are models inspired by animal central nervous systems (in particular the brain) that are capable of machine learning and pattern recognition. They are usually presented as systems of interconnected "neurons" that can compute values from inputs by feeding information through the network. Main components: inputs and outputs, weights, and an activation function.
Learning:
Activation function
This is similar to the behavior of the linear perceptron in neural networks. However, it is a nonlinear function, which allows such networks to solve nontrivial problems using only a small number of nodes.
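A single neuron with a nonlinear activation can be sketched as follows; the weights, inputs, and bias are toy values chosen for illustration.

```python
# A neuron computes a weighted sum of its inputs plus a bias, then passes
# the result through a nonlinear activation such as tanh or the logistic
# sigmoid.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(weights, inputs, bias, activation=math.tanh):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)

# Pre-activation z = 0.5*1.0 - 0.25*2.0 + 0.0 = 0.0 for both calls:
print(neuron([0.5, -0.25], [1.0, 2.0], 0.0))           # tanh(0.0) = 0.0
print(neuron([0.5, -0.25], [1.0, 2.0], 0.0, sigmoid))  # sigmoid(0.0) = 0.5
```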
Problems:
The time the machine must be run in order to collect equilibrium statistics grows exponentially with the machine's size and with the magnitude of the connection strengths. Connection strengths are more plastic when the units being connected have activation probabilities intermediate between zero and one, leading to a so-called variance trap: the net effect is that noise causes the connection strengths to random-walk until the activities saturate.
(Figure: representing phrase i)
Recursive neural networks (RNNs) and matrix-vector RNNs (MV-RNNs). New algorithm: the Recursive Neural Tensor Network (RNTN).
(Parse-tree figure for the tri-gram "not very good": p1 = g(b, c) composes the vectors for "very" and "good"; the root then composes "not" with p1.)
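The composition step g can be sketched as follows. This is the plain RNN composition p = tanh(W[b; c]); the 2-D word vectors and the composition matrix W are toy values invented for illustration.

```python
# Recursive composition: the vectors of two children b and c are
# concatenated and multiplied by a shared matrix W, then passed through
# an element-wise tanh to produce the parent phrase vector.
import math

def compose(W, b, c):
    bc = b + c  # concatenation [b; c]
    return [math.tanh(sum(w * x for w, x in zip(row, bc))) for row in W]

W = [[0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5]]        # 2 x 4 toy composition matrix
very, good = [0.2, 0.4], [0.6, 0.8]
p1 = compose(W, very, good)       # vector for the phrase "very good"
print(p1)
```

The RNTN extends this composition with a tensor term so that the two children interact multiplicatively rather than only through the shared matrix W.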
f: the tanh function, a standard element-wise nonlinearity. The label value is computed by a softmax classifier: y^a = softmax(W_s a), where W_s is the classification matrix and a is the node's vector.
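The softmax classifier over a node vector can be sketched directly; the classification matrix W_s, the phrase vector a, and the three sentiment classes are toy values for illustration.

```python
# Softmax turns a vector of class scores (here W_s a) into a probability
# distribution over sentiment labels.
import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

W_s = [[1.0, -1.0], [0.0, 0.5], [-1.0, 1.0]]  # 3 classes x 2 dimensions
a = [0.3, 0.7]                                 # toy phrase vector
scores = [sum(w * x for w, x in zip(row, a)) for row in W_s]
probs = softmax(scores)
print(probs)   # three positive values summing to 1
```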
The word matrices can capture compositional effects specific to each word, whereas W captures a general composition function. Nonlinear functions are used to compute compositional meaning representations for multi-word phrases or full sentences. Disadvantage: the number of parameters becomes very large and depends on the size of the vocabulary [26].
The full derivative for slice V[k] for this tri-gram tree is then the sum over the nodes:
Online demo:
http://nlp.stanford.edu:8080/sentiment/rntnDemo.html
http://nlp.stanford.edu/sentiment/treebank.html