Web Content Mining and NLP
Bing Liu
Department of Computer Science
University of Illinois at Chicago
liub@cs.uic.edu
Introduction
1. Structured data extraction
2. Information integration
3. Information synthesis
4. Opinion mining
Conclusions
(Data covered: structured data, semi-structured data, unstructured text)
[Figure: product listing page illustrating nesting. Sample data records:]
image 1 Cabinet Organizers by Copco 9-in. Round Turntable: White ***** $4.95
image 1 Cabinet Organizers by Copco 12-in. Round Turntable: White ***** $7.95
[Figure: HTML tag tree of the page (HTML, HEAD, BODY, TABLE, P, TBODY, TR, TD). Adjacent TR subtrees are marked as data record 1 and data record 2, inside data region 2.]
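[Figure: alignment of tag trees. Child sequences such as "… x b d" and "b n c k g" (tree T2, root p) are among those merged into "… x b n c d h k g".]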
Mining negative correlations
Author = {Last Name, First Name}
Format = Binding
Similarity functions
linguistic similarity
domain similarity
…, final clusters:
{{a1,b1,c1}, {b2,c2},{a2},{b3}}
Observations:
- It is difficult to match “vehicle” field, A, with “make” field, B
- But A’s instances are similar to C’s, and C’s label is similar to B’s
- Thus, C might serve as a “bridge” to connect A and B!
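The bridging observation above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the method from the slides: the field labels and instance values are hypothetical, and the use of `difflib.SequenceMatcher` for linguistic (label) similarity, Jaccard overlap for domain (instance) similarity, and the 0.5 threshold are choices made only for this example.

```python
from difflib import SequenceMatcher

def linguistic_similarity(label_a, label_b):
    """Character-level similarity of two field labels, in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, label_a.lower(), label_b.lower()).ratio()

def domain_similarity(values_a, values_b):
    """Jaccard overlap of two fields' instance values (assumed measure)."""
    a, b = set(values_a), set(values_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def similarity(field_x, field_y):
    """Use the stronger of the label signal and the instance signal."""
    return max(
        linguistic_similarity(field_x["label"], field_y["label"]),
        domain_similarity(field_x["values"], field_y["values"]),
    )

THRESHOLD = 0.5  # hypothetical cut-off for declaring a match

def matches(field_x, field_y):
    return similarity(field_x, field_y) >= THRESHOLD

# Hypothetical fields from three query interfaces.
A = {"label": "vehicle", "values": ["Honda", "Toyota", "Ford"]}
B = {"label": "make", "values": []}  # free-text box, so no instances
C = {"label": "car make", "values": ["Honda", "Toyota", "Ford", "BMW"]}

direct = matches(A, B)                    # labels differ, B has no instances
bridged = matches(A, C) and matches(C, B) # A~C by instances, C~B by label

print("A matches B directly:", direct)
print("A matches B through bridge C:", bridged)
```

With these toy fields, A and B fail to match directly, while the instance overlap between A and C and the label overlap between C and B connect them through C.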
Reviewer: jprice174 from Atlanta, Ga.
I did a lot of research last year before I bought this camera... It kinda hurt to leave behind my beloved nikon 35mm SLR, but I was going to Italy, and I needed something smaller, and digital.
The pictures coming out of this camera are amazing. The 'auto' feature takes great pictures most of the time. And with digital, you're not wasting film if the picture doesn't come out. …

Feature1: picture
Positive: 12
- The pictures coming out of this camera are amazing.
- Overall this is a good camera with a really good picture clarity.
- …
Negative: 2
- The pictures come out hazy if your hands shake even for a moment during the entire process of taking a picture.
- Focusing on a display rack about 20 feet away in a brightly lit room during day time, pictures produced by this camera were blurry and in a shade of orange.
[Figure: comparison of reviews of Digital camera 1 and Digital camera 2, on a positive (+) to negative (_) scale.]
Mining Tasks
(Hu and Liu, KDD-04; Liu, Web Data Mining book, 2006)
Task 1: Identifying and extracting object features that have been commented on in each review.
Task 2: Determining whether the opinions on the features are positive, negative or neutral.
Task 3: Grouping synonym features.
Produce a feature-based opinion summary.
A structured and quantitative summary.
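To make the final step concrete, here is a minimal Python sketch of how a feature-based opinion summary could be assembled once Tasks 1 to 3 have produced their outputs. It does not implement the extraction or sentiment steps themselves. The `(feature, polarity, sentence)` triples, the synonym map, and the function names `summarize` and `render` are assumptions made for illustration; the sentences are taken from the camera review example.

```python
from collections import defaultdict

POLARITIES = ("positive", "negative", "neutral")

def summarize(opinions, synonyms):
    """Group opinion sentences by canonical feature and polarity."""
    summary = defaultdict(lambda: {p: [] for p in POLARITIES})
    for feature, polarity, sentence in opinions:
        canonical = synonyms.get(feature, feature)  # Task 3: merge synonyms
        summary[canonical][polarity].append(sentence)
    return dict(summary)

def render(summary):
    """Format the summary as text with a count per polarity."""
    lines = []
    for feature, groups in summary.items():
        lines.append(f"Feature: {feature}")
        for polarity in ("positive", "negative"):
            lines.append(f"  {polarity.capitalize()}: {len(groups[polarity])}")
            lines.extend(f"    {s}" for s in groups[polarity])
    return "\n".join(lines)

# Hypothetical outputs of Task 1 (feature) and Task 2 (polarity).
opinions = [
    ("pictures", "positive",
     "The pictures coming out of this camera are amazing."),
    ("picture clarity", "positive",
     "Overall this is a good camera with a really good picture clarity."),
    ("pictures", "negative",
     "The pictures come out hazy if your hands shake even for a moment "
     "during the entire process of taking a picture."),
]

# Hypothetical output of Task 3: surface form -> canonical feature.
synonyms = {"pictures": "picture", "picture clarity": "picture"}

summary = summarize(opinions, synonyms)
print(render(summary))
```

Keeping the sentences alongside the counts means the summary stays quantitative while each count can still be traced back to the review text behind it.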