Unit 1 Data Objects Attributes Visualization

Chapter 2 discusses the various types of data objects and attribute types, emphasizing the importance of understanding data characteristics for effective data analysis. It covers data visualization techniques that help in gaining insights and identifying patterns in large datasets. The chapter categorizes visualization methods into pixel-oriented, geometric projection, icon-based, and hierarchical techniques, providing examples of each.


Data Mining: Concepts and Techniques

Chapter 2

Jiawei Han, Micheline Kamber, and Jian Pei

University of Illinois at Urbana-Champaign
Simon Fraser University
©2011 Han, Kamber, and Pei. All rights reserved.

Chapter 2: Getting to Know Your Data

• Data Objects and Attribute Types
• Data Visualization

Types of Data Sets

• Record
  - Relational records
  - Data matrix, e.g., numerical matrix, crosstabs
  - Document data: text documents represented as term-frequency vectors
  - Transaction data
• Graph and network
  - World Wide Web
  - Social or information networks
  - Molecular structures
• Ordered
  - Video data: sequence of images
  - Temporal data: time-series
  - Sequential data: transaction sequences
  - Genetic sequence data
• Spatial, image and multimedia
  - Spatial data: maps
  - Image data
  - Video data

Document data (term-frequency vectors):

            team  coach  play  ball  score  game  win  lost  timeout  season
Document 1     3      0     5     0      2     6    0     2        0       2
Document 2     0      7     0     2      1     0    0     3        0       0
Document 3     0      1     0     0      1     2    2     0        3       0

Transaction data:

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

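As a rough illustration of the document-data representation, the sketch below turns a text into a term-frequency vector over a fixed vocabulary. The vocabulary reuses the terms that label the columns on the slide; the sample sentence and the `term_frequency_vector` helper are invented for illustration and are not part of the original deck.

```python
from collections import Counter

# Vocabulary: the terms that label the columns of the slide's document table.
VOCABULARY = ["team", "coach", "play", "ball", "score",
              "game", "win", "lost", "timeout", "season"]

def term_frequency_vector(text, vocabulary=VOCABULARY):
    """Count how often each vocabulary term occurs in `text` (hypothetical helper)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

# Invented sample document, not taken from the source.
doc = "team play ball team score game play"
vector = term_frequency_vector(doc)
print(vector)
```

Each position in the resulting list corresponds to one vocabulary term, so documents of different lengths become vectors of the same dimensionality.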
Important Characteristics of Structured Data

• Dimensionality
  - Curse of dimensionality
• Sparsity
  - Only presence counts
• Resolution
  - Patterns depend on the scale
• Distribution
  - Centrality and dispersion

Data Objects

• Data sets are made up of data objects.
• A data object represents an entity.
• Examples:
  - sales database: customers, store items, sales
  - medical database: patients, treatments
  - university database: students, professors, courses
• Also called samples, examples, instances, data points, objects, tuples.
• Data objects are described by attributes.
• Database rows -> data objects; columns -> attributes.

Attributes

• Attribute (or dimension, feature, variable): a data field representing a characteristic or feature of a data object.
  - E.g., customer_ID, name, address
• Types:
  - Nominal
  - Binary
  - Ordinal
  - Numeric (quantitative)
    - Interval-scaled
    - Ratio-scaled

Attribute Types

• Nominal: categories, states, or "names of things"
  - Hair_color = {auburn, black, blond, brown, grey, red, white}
  - marital status, occupation, ID numbers, zip codes
• Binary
  - Nominal attribute with only 2 states (0 and 1)
  - Symmetric binary: both outcomes equally important
    - e.g., gender
  - Asymmetric binary: outcomes not equally important
    - e.g., medical test (positive vs. negative)
    - Convention: assign 1 to the most important outcome (e.g., HIV positive)
• Ordinal
  - Values have a meaningful order (ranking), but the magnitude between successive values is not known
  - Size = {small, medium, large}, grades, army rankings

Numeric Attribute Types

• Quantity (integer or real-valued)
• Interval
  - Measured on a scale of equal-sized units
  - Values have order
    - E.g., temperature in °C or °F, calendar dates
  - No true zero-point
• Ratio
  - Inherent zero-point
  - We can speak of a value as being a multiple of another value (10 K is twice as high as 5 K)
    - E.g., temperature in Kelvin, length, counts, monetary quantities

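A small sketch of why ratios are only meaningful on a ratio scale: the same two temperatures give a "ratio" of 2 in Celsius but only about 1.018 in Kelvin, while their difference is the same on both scales. The readings and the helper name `celsius_to_kelvin` are illustrative assumptions.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

warm_c, cool_c = 10.0, 5.0  # illustrative readings in degrees Celsius

# Dividing Celsius values gives 2.0, but Celsius has no true zero-point,
# so this number does not mean "twice as hot".
naive_ratio = warm_c / cool_c

# On the Kelvin scale the zero-point is inherent, so the ratio is meaningful.
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

# Differences, by contrast, agree on both scales.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

print(naive_ratio, round(true_ratio, 3), difference_c, round(difference_k, 2))
```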
Discrete vs. Continuous Attributes

• Discrete attribute
  - Has only a finite or countably infinite set of values
    - E.g., zip codes, profession, or the set of words in a collection of documents
  - Sometimes represented as integer variables
  - Note: binary attributes are a special case of discrete attributes
• Continuous attribute
  - Has real numbers as attribute values
    - E.g., temperature, height, or weight
  - Practically, real values can only be measured and represented using a finite number of digits
  - Continuous attributes are typically represented as floating-point variables

Graphic Displays of Basic Statistical Descriptions

• Boxplot: graphic display of the five-number summary
• Histogram: the x-axis shows values, the y-axis represents frequencies
• Quantile plot: each value x_i is paired with f_i, indicating that approximately 100·f_i % of the data are ≤ x_i
• Quantile-quantile (q-q) plot: graphs the quantiles of one univariate distribution against the corresponding quantiles of another
• Scatter plot: each pair of values is treated as a pair of coordinates and plotted as a point in the plane

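To make the quantile plot concrete, the sketch below pairs each sorted value x_i with a fraction f_i. The slide does not say how f_i is computed; the formula f_i = (i - 0.5)/N used here is one common convention and should be read as an assumption, as should the sample data.

```python
def quantile_plot_points(values):
    """Return (f_i, x_i) pairs for a quantile plot.

    Assumes f_i = (i - 0.5) / N for the i-th smallest value (1-based),
    which is one common convention, not something the slide specifies.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [((i - 0.5) / n, x) for i, x in enumerate(ordered, start=1)]

# Invented sample data.
points = quantile_plot_points([40, 10, 30, 20])
for f, x in points:
    print(f"{f:.3f} -> {x}")
```

Plotting x_i against f_i then shows, for any fraction of the data, the value below which roughly that fraction falls.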
Histogram Analysis

• Histogram: graphical display of tabulated frequencies, shown as bars
• It shows what proportion of cases fall into each of several categories
• Differs from a bar chart in that it is the area of the bar that denotes the value, not the height as in bar charts, a crucial distinction when the categories are not of uniform width
• The categories are usually specified as non-overlapping intervals of some variable. The categories (bars) must be adjacent

[Figure: histogram with x-axis values from 10000 to 90000 and y-axis frequencies from 0 to 40]

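The area-versus-height distinction can be sketched with bins of unequal width. In the example below, each bar's height is a density (count divided by total count times bin width), so the bar's area equals the proportion of cases in the bin. The data, bin edges, and function name are assumptions chosen for illustration.

```python
def histogram_densities(values, edges):
    """Return (counts, densities) for bins [edges[i], edges[i+1]).

    The last bin also includes its right edge. The density is the bar
    height that makes the bar's AREA equal to the bin's share of the data.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            is_last = i == len(counts) - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        return counts, [0.0] * len(counts)
    densities = [c / (total * (edges[i + 1] - edges[i]))
                 for i, c in enumerate(counts)]
    return counts, densities

# Invented data with deliberately unequal bin widths (5, 5 and 30).
values = [1, 2, 3, 4, 12, 18, 25, 38]
edges = [0, 5, 10, 40]
counts, densities = histogram_densities(values, edges)
areas = [d * (edges[i + 1] - edges[i]) for i, d in enumerate(densities)]
print(counts, densities, areas)
```

The first and last bins hold the same number of values, yet the wide bin gets a much lower bar; plotting raw counts as heights would overstate it.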
Histograms Often Tell More than Boxplots

• The two histograms shown on the left may have the same boxplot representation
  - The same values for: min, Q1, median, Q3, max
• But they have rather different data distributions

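A minimal sketch of this point: two invented data sets with identical five-number summaries but clearly different shapes. Quartiles are computed with `statistics.quantiles` using the "inclusive" method; other quartile conventions exist and could give slightly different values on other data.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using inclusive quartiles."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return (ordered[0], q1, median, q3, ordered[-1])

# Invented examples: evenly spread values vs. values piled up at 2 and 6.
evenly_spread = [0, 1, 2, 3, 4, 5, 6, 7, 8]
clustered = [0, 2, 2, 2, 4, 6, 6, 6, 8]

print(five_number_summary(evenly_spread))
print(five_number_summary(clustered))
```

Both calls yield the same summary, so the two boxplots would coincide, whereas histograms of the two lists would look quite different.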
Data Visualization

• Why data visualization?
  - Gain insight into an information space by mapping data onto graphical primitives
  - Provide a qualitative overview of large data sets
  - Search for patterns, trends, structure, irregularities, and relationships among data
  - Help find interesting regions and suitable parameters for further quantitative analysis
  - Provide a visual proof of computer representations derived
• Categorization of visualization methods:
  - Pixel-oriented visualization techniques
  - Geometric projection visualization techniques
  - Icon-based visualization techniques
  - Hierarchical visualization techniques
  - Visualizing complex data and relations

Pixel-Oriented Visualization Techniques

• For a data set of m dimensions, create m windows on the screen, one for each dimension
• The m dimension values of a record are mapped to m pixels at the corresponding positions in the windows
• The colors of the pixels reflect the corresponding values

[Figure panels: (a) Income, (b) Credit limit, (c) Transaction volume, (d) Age]

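The mapping described above can be sketched as follows: one grid ("window") per dimension, with each record occupying the same position in every window and its value encoded as a grey level. The row-major layout, the 0-255 grey scale, the per-dimension min-max scaling, and the sample records are all assumptions for illustration.

```python
def pixel_windows(records, width):
    """Build one window (2-D grid of grey levels 0-255) per dimension.

    Record number `pos` is placed at the same (row, col) in every window,
    filling each window row by row (an assumed layout).
    """
    m = len(records[0])
    height = -(-len(records) // width)  # ceiling division
    windows = []
    for dim in range(m):
        column = [rec[dim] for rec in records]
        lo, hi = min(column), max(column)
        span = (hi - lo) or 1  # avoid division by zero for constant columns
        grid = [[None] * width for _ in range(height)]
        for pos, value in enumerate(column):
            row, col = divmod(pos, width)
            grid[row][col] = round(255 * (value - lo) / span)
        windows.append(grid)
    return windows

# Hypothetical two-dimensional records, e.g. (income, age) in arbitrary units.
records = [(20, 1), (40, 3), (60, 2), (80, 5)]
windows = pixel_windows(records, width=2)
for grid in windows:
    print(grid)
```

Because a record sits at the same position in every window, comparing the windows side by side hints at how the dimensions relate.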
Laying Out Pixels in Circle Segments

• To save space and show the connections among multiple dimensions, space filling is often done in a circle segment

[Figure panels: (a) Representing a data record in circle segment, (b) Laying out pixels in circle segment]

Geometric Projection Visualization Techniques

• Visualization of geometric transformations and projections of the data
• Methods
  - Direct visualization
  - Scatterplot and scatterplot matrices
  - Landscapes
  - Projection pursuit technique: helps users find meaningful projections of multidimensional data
  - Prosection views
  - Hyperslice
  - Parallel coordinates

Direct Data Visualization

[Figure: ribbons with twists based on vorticity]

Scatterplot Matrices

• Matrix of scatterplots (x-y diagrams) of the k-dimensional data [a total of k(k-1)/2 = (k²/2 - k/2) distinct scatterplots]

Used by permission of M. Ward, Worcester Polytechnic Institute

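As a quick check on how many panels a scatterplot matrix needs, the sketch below enumerates the unordered attribute pairs of a k-dimensional data set; there are k(k-1)/2 of them. The attribute names are placeholders.

```python
from itertools import combinations

def scatterplot_pairs(attributes):
    """Return every unordered pair of attributes, one per distinct scatterplot."""
    return list(combinations(attributes, 2))

# Placeholder attribute names for a 4-dimensional data set.
attrs = ["a1", "a2", "a3", "a4"]
pairs = scatterplot_pairs(attrs)
k = len(attrs)
print(len(pairs), pairs)
```

The quadratic growth in the number of panels is why scatterplot matrices become hard to read as dimensionality increases.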
Landscapes

• Visualization of the data as a perspective landscape
• The data needs to be transformed into a (possibly artificial) 2D spatial representation which preserves the characteristics of the data

[Figure: news articles visualized as a landscape]

Used by permission of B. Wright, Visible Decisions Inc.

Parallel Coordinates

• n equidistant axes which are parallel to one of the screen axes and correspond to the attributes
• The axes are scaled to the [minimum, maximum] range of the corresponding attribute
• Every data item corresponds to a polygonal line which intersects each of the axes at the point which corresponds to the value for the attribute

[Figure: parallel axes labeled Attr. 1, Attr. 2, Attr. 3, ..., Attr. k]

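The construction above can be sketched in a few lines: scale each attribute to its [minimum, maximum] range and emit one vertex per axis for every record. The unit axis spacing, the normalization to [0, 1], and the sample records are assumptions for illustration.

```python
def polyline_vertices(records, axis_spacing=1.0):
    """Return, for each record, the (x, y) vertices of its polygonal line.

    x is the position of the axis; y is the attribute value scaled to
    [0, 1] over that attribute's [minimum, maximum] range.
    """
    k = len(records[0])
    mins = [min(r[j] for r in records) for j in range(k)]
    maxs = [max(r[j] for r in records) for j in range(k)]
    lines = []
    for rec in records:
        line = []
        for j, value in enumerate(rec):
            span = (maxs[j] - mins[j]) or 1  # constant attribute: avoid /0
            line.append((j * axis_spacing, (value - mins[j]) / span))
        lines.append(line)
    return lines

# Invented three-attribute records.
records = [(1, 10, 100), (2, 30, 50), (3, 20, 0)]
lines = polyline_vertices(records)
for line in lines:
    print(line)
```

Connecting each record's vertices in order gives its polyline; any plotting library could draw the result.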
Parallel Coordinates of a Data Set

Icon-Based Visualization Techniques

• Visualization of the data values as features of icons
• Typical visualization methods
  - Chernoff faces
  - Stick figures
• General techniques
  - Shape coding: use shape to represent certain information encoding
  - Color icons: use color icons to encode more information
  - Tile bars: use small icons to represent the relevant feature vectors in document retrieval

Chernoff Faces

• A way to display variables on a two-dimensional surface, e.g., let x be eyebrow slant, y be eye size, z be nose length, etc.
• The figure shows faces produced using 10 characteristics (head eccentricity, eye size, eye spacing, eye eccentricity, pupil size, eyebrow slant, nose size, mouth shape, mouth size, and mouth opening), each assigned one of 10 possible values, generated using Mathematica (S. Dickson)
• References:
  - Gonick, L. and Smith, W. The Cartoon Guide to Statistics. New York: Harper Perennial, p. 212, 1993
  - Weisstein, Eric W. "Chernoff Face." From MathWorld--A Wolfram Web Resource. mathworld.wolfram.com/ChernoffFace.html

Stick Figure

• A census data figure showing age, income, gender, education, etc.
• A 5-piece stick figure (1 body and 4 limbs with different angle/length)
• Two attributes mapped to axes, remaining attributes mapped to angle or length of limbs. Look at the texture pattern

Used by permission of G. Grinstein, University of Massachusetts at Lowell

Hierarchical Visualization Techniques

• Visualization of the data using a hierarchical partitioning into subspaces
• Methods
  - Dimensional Stacking
  - Worlds-within-Worlds
  - Tree-Map
  - Cone Trees
  - InfoCube

Dimensional Stacking

• Partitioning of the n-dimensional attribute space into 2-D subspaces, which are "stacked" into each other
• Partitioning of the attribute value ranges into classes. The important attributes should be used on the outer levels.
• Adequate for data with ordinal attributes of low cardinality
• But difficult to display more than nine dimensions
• Important to map dimensions appropriately

[Figure axis labels: attribute 1, attribute 2, attribute 3, attribute 4]

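For four attributes, the stacking idea can be sketched as index arithmetic: the outer pair of attributes selects a block, and the inner pair selects a cell inside that block. The function name, the use of bin indices as inputs, and the bin counts are assumptions for illustration.

```python
def stacked_cell(outer_x, outer_y, inner_x, inner_y, inner_x_bins, inner_y_bins):
    """Map four discretized attribute values (bin indices) to one 2-D cell.

    Each outer bin is a block of inner_x_bins x inner_y_bins cells; the
    inner bin indices pick the cell within that block.
    """
    col = outer_x * inner_x_bins + inner_x
    row = outer_y * inner_y_bins + inner_y
    return col, row

# Hypothetical example: both inner attributes are split into 3 classes.
cell = stacked_cell(outer_x=2, outer_y=0, inner_x=1, inner_y=2,
                    inner_x_bins=3, inner_y_bins=3)
print(cell)
```

Placing the most important attributes on the outer level means they determine the coarse position, which is the easiest thing to read off the display.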
Dimensional Stacking

[Figure: visualization of oil mining data with longitude and latitude mapped to the outer x-, y-axes and ore grade and depth mapped to the inner x-, y-axes]

Used by permission of M. Ward, Worcester Polytechnic Institute

Worlds-within-Worlds

• Assign the function and the two most important parameters to the innermost world
• Fix all other parameters at constant values and draw other (1-, 2-, or 3-dimensional) worlds choosing these as the axes
• Software that uses this paradigm
  - N-vision: dynamic interaction through data glove and stereo displays, including rotation, scaling (inner) and translation (inner/outer)
  - Auto Visual: static interaction by means of queries

Tree-Map

• Screen-filling method which uses a hierarchical partitioning of the screen into regions depending on the attribute values
• The x- and y-dimensions of the screen are partitioned alternately according to the attribute values (classes)

[Figure: MSR Netscan image]

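The alternating partition can be sketched as a recursive "slice-and-dice" layout. The version below is a simplified sketch under stated assumptions: each node is a `(name, size, children)` tuple, and a child's share of its parent's rectangle is proportional to its size, whereas the slide speaks more generally of partitioning by attribute values. The sample tree is invented.

```python
def treemap(node, x, y, w, h, split_x=True, out=None):
    """Lay out the leaves of `node` inside the rectangle (x, y, w, h).

    node = (name, size, children). The split direction alternates between
    the x- and y-dimension at each level. Returns (name, x, y, w, h) tuples.
    Sizes of an inner node's children are assumed to sum to a positive number.
    """
    if out is None:
        out = []
    name, _size, children = node
    if not children:
        out.append((name, x, y, w, h))
        return out
    total = sum(child[1] for child in children)
    offset = 0.0  # fraction of the parent already used
    for child in children:
        share = child[1] / total
        if split_x:   # partition along the x-dimension
            treemap(child, x + offset * w, y, share * w, h, False, out)
        else:         # partition along the y-dimension
            treemap(child, x, y + offset * h, w, share * h, True, out)
        offset += share
    return out

# Invented hierarchy: root -> a, b; b -> b1, b2.
tree = ("root", 8, [
    ("a", 4, []),
    ("b", 4, [("b1", 1, []), ("b2", 3, [])]),
])
rects = treemap(tree, 0, 0, 8, 4)
for rect in rects:
    print(rect)
```

In this layout, each leaf's area is proportional to its size, and nested nodes stay inside their parent's rectangle.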
Tree-Map of a File System (Shneiderman)

InfoCube

• A 3-D visualization technique where hierarchical information is displayed as nested semi-transparent cubes
• The outermost cubes correspond to the top-level data, while the subnodes or the lower-level data are represented as smaller cubes inside the outermost cubes, and so on

Three-D Cone Trees

• The 3D cone tree visualization technique works well for up to a thousand nodes or so
• First build a 2D circle tree that arranges its nodes in concentric circles centered on the root node
• Cannot avoid overlaps when projected to 2D
• G. Robertson, J. Mackinlay, S. Card. "Cone Trees: Animated 3D Visualizations of Hierarchical Information", ACM SIGCHI'91
• Graph from the Nadeau Software Consulting website: visualizes a social network data set that models the way an infection spreads from one person to the next

Ack.: http://nadeausoftware.com/articles/visualization

Visualizing Complex Data and Relations

• Visualizing non-numerical data: text and social networks
• Tag cloud: visualizing user-generated tags
  - The importance of a tag is represented by font size/color
• Besides text data, there are also methods to visualize relationships, such as visualizing social networks
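One simple way to turn tag importance into font size is linear scaling between a minimum and maximum point size, sketched below. Using the raw tag count as "importance", the 10-36 pt range, and the sample tags are assumptions; real tag clouds often use logarithmic or rank-based scaling instead.

```python
def tag_font_sizes(tag_counts, min_pt=10, max_pt=36):
    """Scale tag counts linearly to font sizes between min_pt and max_pt."""
    lo, hi = min(tag_counts.values()), max(tag_counts.values())
    span = (hi - lo) or 1  # all tags equally frequent: avoid division by zero
    return {tag: min_pt + (count - lo) * (max_pt - min_pt) / span
            for tag, count in tag_counts.items()}

# Invented tag counts.
counts = {"python": 50, "data": 30, "viz": 10}
sizes = tag_font_sizes(counts)
print(sizes)
```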

Newsmap: Google News Stories in
