Unit 1: Data Objects, Attributes, and Visualization Techniques
— Chapter 2 —
Data Visualization
Types of Data Sets
Record
Relational records
Data matrix, e.g., numerical matrix, crosstabs
Document data: text documents represented as term-frequency vectors

            team  coach  play  ball  score  game  win  lost  timeout  season
Document 1     3      0     5     0      2     6    0     2        0       2
Document 2     0      7     0     2      1     0    0     3        0       0
Document 3     0      1     0     0      1     2    2     0        3       0

Transaction data

TID  Items
1    Bread, Coke, Milk
2    Beer, Bread
3    Beer, Coke, Diaper, Milk
4    Beer, Bread, Diaper, Milk
5    Coke, Diaper, Milk

Graph and network
World Wide Web
Social or information networks
Molecular structures
Ordered
Video data: sequence of images
Temporal data: time-series
Sequential data: transaction sequences
Genetic sequence data
Spatial, image and multimedia
Spatial data: maps
Image data
Video data
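As a rough illustration of two of the record types above, the sketch below builds a term-frequency vector from a short text and counts how many transactions contain an item. The vocabulary, the sample sentence, and the helper names `term_frequency_vector` and `support_count` are assumptions made for this example; the transaction table is the one listed on the slide.

```python
from collections import Counter

# Hypothetical vocabulary, chosen only for illustration.
VOCABULARY = ["team", "coach", "play", "ball", "score"]

# Transaction data from the slide: transaction ID -> set of items.
TRANSACTIONS = {
    1: {"Bread", "Coke", "Milk"},
    2: {"Beer", "Bread"},
    3: {"Beer", "Coke", "Diaper", "Milk"},
    4: {"Beer", "Bread", "Diaper", "Milk"},
    5: {"Coke", "Diaper", "Milk"},
}


def term_frequency_vector(text, vocabulary):
    """Return how often each vocabulary term occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]


def support_count(item, transactions):
    """Count the transactions that contain the given item."""
    return sum(1 for items in transactions.values() if item in items)


vector = term_frequency_vector("team play team ball play play", VOCABULARY)
milk_count = support_count("Milk", TRANSACTIONS)
print(vector)      # [2, 0, 3, 1, 0]
print(milk_count)  # 4
```

Each document becomes one row of counts over a fixed vocabulary, while each transaction is a variable-length set of items.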
Important Characteristics of
Structured Data
Dimensionality
Curse of dimensionality
Sparsity
Only presence counts
Resolution
Patterns depend on the scale
Distribution
Centrality and dispersion
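A minimal sketch of how three of these characteristics could be measured, using the three document rows listed under Types of Data Sets. Treating sparsity as the fraction of zero entries, and using the mean and population standard deviation as the centrality and dispersion measures, are assumptions made for this example.

```python
import statistics

# Term-frequency rows from the document-data example.
MATRIX = [
    [3, 0, 5, 0, 2, 6, 0, 2, 0, 2],
    [0, 7, 0, 2, 1, 0, 0, 3, 0, 0],
    [0, 1, 0, 0, 1, 2, 2, 0, 3, 0],
]


def dimensionality(matrix):
    """Number of attributes (columns) per data object."""
    return len(matrix[0])


def sparsity(matrix):
    """Fraction of entries that are zero (assumed definition)."""
    values = [value for row in matrix for value in row]
    return sum(1 for value in values if value == 0) / len(values)


first_row = MATRIX[0]
center = statistics.mean(first_row)    # centrality of Document 1
spread = statistics.pstdev(first_row)  # dispersion of Document 1

print(dimensionality(MATRIX))  # 10
print(sparsity(MATRIX))        # 0.5
```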
Data Objects
Attribute types:
Nominal
Binary
Ordinal
Numeric: quantitative
Interval-scaled
Ratio-scaled
Attribute Types
Nominal: categories, states, or “names of things”
Hair_color = {auburn, black, blond, brown, grey, red, white}
Other examples: marital status, occupation, ID numbers, zip codes
Binary
Nominal attribute with only 2 states (0 and 1)
Symmetric binary: both outcomes equally important
e.g., gender
Asymmetric binary: outcomes not equally important.
e.g., medical test (positive vs. negative)
Convention: assign 1 to most important outcome
(e.g., HIV positive)
Ordinal
Values have a meaningful order (ranking) but magnitude
between successive values is not known.
Size = {small, medium, large}, grades, army rankings
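The distinction between these attribute types can be sketched in code: nominal values support only equality tests, ordinal values also support order comparisons, and an asymmetric binary attribute codes the more important outcome as 1. The function names and the "positive"/"negative" labels are assumptions for this example.

```python
# Nominal: values are labels; only equality tests are meaningful.
HAIR_COLORS = {"auburn", "black", "blond", "brown", "grey", "red", "white"}

# Ordinal: values have an order, but the gaps between them are unknown.
SIZE_ORDER = ["small", "medium", "large"]


def same_category(a, b):
    """Nominal comparison: are the two labels identical?"""
    return a == b


def ordinal_less_than(a, b, order=SIZE_ORDER):
    """Ordinal comparison: does a rank below b in the given order?"""
    return order.index(a) < order.index(b)


def encode_asymmetric_binary(outcome, important_outcome="positive"):
    """Assign 1 to the more important outcome and 0 to the other."""
    return 1 if outcome == important_outcome else 0


print(same_category("black", "brown"))       # False
print(ordinal_less_than("small", "large"))   # True
print(encode_asymmetric_binary("positive"))  # 1
```

Subtracting ordinal ranks (for example, large minus small) is deliberately not offered here, because the magnitude between successive values is not known.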
Numeric Attribute Types
Quantity (integer or real-valued)
Interval
Measured on a scale of equal-sized units
Values have order
E.g., temperature in °C or °F, calendar dates
No true zero-point
Ratio
Inherent zero-point
We can speak of one value as being a multiple (or ratio) of
another (10 K is twice as high as 5 K).
e.g., temperature in Kelvin, length, counts,
monetary quantities
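A short worked example of why ratios are meaningful only on a ratio scale: 10 K is twice 5 K, but 10 °C is not twice as hot as 5 °C, which becomes clear once both values are converted to Kelvin. The helper name `celsius_to_kelvin` is an assumption for this sketch.

```python
def celsius_to_kelvin(celsius):
    """Convert an interval-scaled Celsius value to ratio-scaled Kelvin."""
    return celsius + 273.15


# Ratio scale: the quotient is meaningful.
kelvin_ratio = 10.0 / 5.0

# Interval scale: convert to Kelvin before comparing by ratio.
celsius_ratio = celsius_to_kelvin(10.0) / celsius_to_kelvin(5.0)

# Differences are meaningful on both scales.
celsius_difference = 10.0 - 5.0

print(kelvin_ratio)             # 2.0
print(round(celsius_ratio, 3))  # 1.018
```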
Discrete vs. Continuous
Attributes
Discrete Attribute
Has only a finite or countably infinite set of values
E.g., zip codes, profession, or the set of words in
a collection of documents
Sometimes represented as integer variables
Note: binary attributes are a special case of discrete attributes
Continuous Attribute
Has real numbers as attribute values
E.g., temperature, height, or weight
Practically, real values can only be measured and represented using a finite number of digits
Continuous attributes are typically represented as floating-point variables
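A small sketch of the practical consequence: a discrete attribute can be coded with integers, while a continuous attribute stored as a floating-point number carries only finite precision, so exact equality tests can fail. The profession list is a hypothetical example.

```python
import math

# Discrete attribute coded as integers (hypothetical category list).
PROFESSIONS = ["teacher", "engineer", "nurse"]
profession_code = PROFESSIONS.index("engineer")

# Continuous attribute stored as floating point: finite precision.
total = 0.1 + 0.2
exactly_equal = (total == 0.3)
approximately_equal = math.isclose(total, 0.3)

print(profession_code)      # 1
print(exactly_equal)        # False
print(approximately_equal)  # True
```

Comparing continuous values with a tolerance, as `math.isclose` does, is usually safer than testing for exact equality.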
Graphic Displays of Basic Statistical
Descriptions
Chapter 2: Getting to Know Your
Data
Data Visualization
Why data visualization?
Gain insight into an information space by mapping data onto
graphical primitives
Provide qualitative overview of large data sets
Search for patterns, trends, structure, irregularities, relationships
among data
Help find interesting regions and suitable parameters for further
quantitative analysis
Provide a visual proof of computer representations derived
Categorization of visualization methods:
Pixel-oriented visualization techniques
Geometric projection visualization techniques
Icon-based visualization techniques
Hierarchical visualization techniques
Visualizing complex data and relations
Pixel-Oriented Visualization
Techniques
For a data set of m dimensions, create m windows on the
screen, one for each dimension
The m dimension values of a record are mapped to m pixels
at the corresponding positions in the windows
The colors of the pixels reflect the corresponding values
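The mapping described above can be sketched as follows. This minimal example assumes a simple row-by-row pixel arrangement and a linear scaling of each dimension to gray levels 0 to 255. Both choices are assumptions for illustration, and real pixel-oriented tools may arrange and color pixels differently.

```python
def pixel_windows(records, width):
    """Build one window (a 2-D grid of gray levels) per dimension.

    Record i is placed at the same (row, column) position in every window.
    """
    m = len(records[0])
    height = -(-len(records) // width)  # ceiling division
    lows = [min(record[d] for record in records) for d in range(m)]
    highs = [max(record[d] for record in records) for d in range(m)]

    # None marks pixels that no record occupies.
    windows = [[[None] * width for _ in range(height)] for _ in range(m)]
    for i, record in enumerate(records):
        row, col = divmod(i, width)
        for d in range(m):
            span = highs[d] - lows[d]
            if span == 0:
                level = 0
            else:
                level = round(255 * (record[d] - lows[d]) / span)
            windows[d][row][col] = level
    return windows


# Four records with m = 2 dimensions (hypothetical values).
records = [(1, 10.0), (2, 20.0), (3, 30.0), (4, 40.0)]
windows = pixel_windows(records, width=2)
print(len(windows))  # 2
print(windows[0])    # [[0, 85], [170, 255]]
```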
Landscapes
Used by permission of B. Wright, Visible Decisions Inc.
Figure: news articles visualized as a landscape
Parallel Coordinates
n equidistant axes which are parallel to one of the screen
axes and correspond to the attributes
The axes are scaled to the [minimum, maximum] range of
the corresponding attribute
Every data item corresponds to a polygonal line which
intersects each of the axes at the point which corresponds to
the value for the attribute
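The construction above can be sketched by computing the vertices of each polygonal line. This example assumes vertical axes placed at x = 0, 1, ..., n-1 and a normalized height in [0, 1] on each axis; the function name and sample records are hypothetical, and drawing the lines is left to any plotting tool.

```python
def parallel_coordinate_lines(records):
    """Return one polygonal line (a list of (x, y) vertices) per record."""
    n = len(records[0])
    lows = [min(record[a] for record in records) for a in range(n)]
    highs = [max(record[a] for record in records) for a in range(n)]

    lines = []
    for record in records:
        line = []
        for axis in range(n):
            span = highs[axis] - lows[axis]
            # Scale to the [minimum, maximum] range of this attribute.
            if span == 0:
                height = 0.5
            else:
                height = (record[axis] - lows[axis]) / span
            line.append((axis, height))
        lines.append(line)
    return lines


records = [(1, 100), (3, 300), (2, 200)]
lines = parallel_coordinate_lines(records)
print(lines[0])  # [(0, 0.0), (1, 0.0)]
print(lines[2])  # [(0, 0.5), (1, 0.5)]
```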
Icon-Based Visualization
Techniques
Stick figures (example attributes: gender, education, etc.)
A 5-piece stick figure (1 body and 4 limbs with different angle/length)
Two attributes are mapped to the axes; the remaining attributes are mapped to the angle or length of the limbs
Look at the texture pattern
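A minimal sketch of the stick-figure mapping: the first two attributes place the figure on the axes, and four further attributes, assumed here to be already normalized to [0, 1], set the limb angles. Attaching all limbs at the figure's position and mapping each value linearly to 0 to 180 degrees are simplifying assumptions for this example.

```python
import math


def stick_figure(record, limb_length=1.0):
    """Map a 6-attribute record to a figure position and 4 limb endpoints."""
    if len(record) != 6:
        raise ValueError("expected 2 position attributes and 4 limb attributes")
    x, y, *limb_values = record
    limbs = []
    for value in limb_values:
        theta = math.radians(180.0 * value)  # assumed linear angle mapping
        end_x = x + limb_length * math.cos(theta)
        end_y = y + limb_length * math.sin(theta)
        limbs.append((end_x, end_y))
    return {"body": (x, y), "limbs": limbs}


# Hypothetical record: position (2, 3) and four normalized attributes.
figure = stick_figure((2.0, 3.0, 0.0, 0.5, 1.0, 0.25))
print(figure["body"])      # (2.0, 3.0)
print(figure["limbs"][0])  # (3.0, 3.0)
```

When many such figures are drawn close together, records with similar limb angles form the texture patterns mentioned above.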
Hierarchical Visualization
Techniques
Dimensional Stacking
Figure: dimensional stacking grid with axes labeled attribute 1, attribute 2, attribute 3, and attribute 4
Used by permission of M. Ward, Worcester Polytechnic Institute
Tree-Map of a File System
(Shneiderman)
InfoCube
A 3-D visualization technique where
hierarchical information is displayed as
nested semi-transparent cubes
The outermost cubes correspond to the top
level data, while the subnodes or the lower
level data are represented as smaller cubes
inside the outermost cubes, and so on
Three-D Cone Trees
3D cone tree visualization technique
works well for up to a thousand nodes or
so
First build a 2D circle tree that arranges
its nodes in concentric circles centered
on the root node
Cannot avoid overlaps when projected to
2D
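The 2D circle tree step can be sketched as a simple radial layout: nodes at depth d are placed on a circle of radius d around the root. Spacing the nodes of each level evenly in breadth-first order is an assumption made for this example, not the layout used by the cone tree authors.

```python
import math
from collections import deque


def circle_tree_layout(tree, root, ring_spacing=1.0):
    """Place each node on a circle whose radius grows with its depth."""
    # Group nodes by depth with a breadth-first traversal.
    levels = {}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        levels.setdefault(depth, []).append(node)
        for child in tree.get(node, []):
            queue.append((child, depth + 1))

    positions = {}
    for depth, nodes in levels.items():
        radius = depth * ring_spacing
        for k, node in enumerate(nodes):
            angle = 2 * math.pi * k / len(nodes)
            positions[node] = (radius * math.cos(angle),
                               radius * math.sin(angle))
    return positions


# Hypothetical hierarchy: node -> list of children.
tree = {"root": ["a", "b"], "a": ["c"]}
positions = circle_tree_layout(tree, "root")
print(positions["root"])  # (0.0, 0.0)
print(positions["a"])     # (1.0, 0.0)
```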
G. Robertson, J. Mackinlay, S. Card.
“Cone Trees: Animated 3D Visualizations
of Hierarchical Information”, ACM
SIGCHI'91
Graph from Nadeau Software Consulting
website: Visualize a social network data
set that models the way an infection
spreads from one person to the next
Ack.: http://nadeausoftware.com/articles/visualization
Visualizing Complex Data and
Relations
Visualizing non-numerical data: text and social networks
Tag cloud: visualizing user-generated tags
The importance of a tag is represented by font size/color
Besides text data, there are also methods to visualize relationships, such as visualizing social networks
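For the tag cloud, one simple way to turn tag importance into font size is a linear scale between a minimum and a maximum size. Using the tag's usage count as its importance, and the 10 to 30 point range, are assumptions for this sketch.

```python
def tag_font_sizes(tag_counts, min_size=10.0, max_size=30.0):
    """Scale each tag's count linearly to a font size."""
    low = min(tag_counts.values())
    high = max(tag_counts.values())
    span = high - low

    sizes = {}
    for tag, count in tag_counts.items():
        # With identical counts, every tag gets the middle size.
        weight = 0.5 if span == 0 else (count - low) / span
        sizes[tag] = min_size + weight * (max_size - min_size)
    return sizes


# Hypothetical tag counts.
sizes = tag_font_sizes({"mining": 1, "visualization": 3, "data": 5})
print(sizes)  # {'mining': 10.0, 'visualization': 20.0, 'data': 30.0}
```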