IDV-02-Data Foundations
02
Data Foundations
IDV 2019/2020
Notice
! Author
! This material can be freely used for personal or academic purposes without
any previous authorization from the author, provided that this notice is kept
with.
! For commercial purposes the use of any part of this material requires the
Data Foundations - 2
Bibliography
Data Foundations - 3
Table of Contents
! Introduction
! Data Preprocessing
Data Foundations - 4
Interactive Data Visualization
Data Foundations - 5
Evaluation rules
! Specification
! Paper (20%)
! Code/implementation (30%)
! (mean (Test1; Test2) >= 10) AND (Test1 >= 8) AND (Test2 >= 8)
Data Foundations - 6
Important dates
Data Foundations - 7
Team Registration
" Fill 3 students on one available slot. Only on the yellow cells.
! You will receive (later) access to a shared folder for the team: VID-19-20-GNN
" Use this folder to share the information inside the group
Data Foundations - 8
Interactive Data Visualization
Data Foundations - 9
What is the Goal of Data Visualization?
by John C. Hart
The (ultimate) goal of DV
“Data visualization is not just about seeing data!
It is about UNDERSTANDING data,
[Diagram: Mapping data to Visual Variables; Question(s) / Task; Interactivity]
What you should know
! What is Data Visualization.
" Raw data -> data -> viz structures -> images -> perception + feedback
Data Foundations - 12
Interactive Data Visualization
Data Foundations - 13
Visualization Process: visualization pipeline
Data Foundations - 14
Data: Sources
! Sources
" Sensors;
" Surveys;
" Simulations;
" Computations;
Data Foundations - 15
Data: typical data set in visualization
! List of n records
! (r1, r2, …, rn )
( v1, v2, …, vm )
independent variables
Data Foundations - 16
Data: typical data set in visualization
! We may not know which variables are dependent and which are independent.
! In general a data set will not contain an exhaustive list of all possible
Data Foundations - 17
Interactive Data Visualization
Data
(Matthew O. Ward, et al.)
Data Foundations - 18
Interactive Data Visualization
Data Types
Data Foundations - 19
Types of data. Numeric versus Non-Numeric
! Numeric (ordinal):
! categorical: finite (normally short) list of values (e.g., red, green, blue);
! ranked: a categorical variable that has an implied order (e.g., small, medium, large);
Data Foundations - 20
Types of data. Type of scale
another. That is, some values are larger and some are smaller.
! Equal intervals. Scale units along the scale are equal to one another. This means,
for example, that the difference between 1 and 2 would be equal to the difference
! A minimum value of zero. The scale has a true zero point, below which no values
exist. When a scale has an absolute zero then it makes sense to apply all the
Data Foundations - 21
Types of data. Type of scale
" Satisfies identity, magnitude, equal intervals, and a minimum value of zero.
" Continuous. e.g., weight, distance, etc. Can apply operations of / and *.
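A small sketch of why a true zero matters before applying / and *. The temperature values are made up for illustration: Celsius stands in for a scale with equal intervals but no absolute zero, Kelvin for a scale that has one.

```python
# Illustrative values only: two temperatures measured in Celsius.
celsius_a, celsius_b = 10.0, 20.0

def to_kelvin(celsius):
    """Convert Celsius (no absolute zero) to Kelvin (true zero point)."""
    return celsius + 273.15

kelvin_a, kelvin_b = to_kelvin(celsius_a), to_kelvin(celsius_b)

# Differences are meaningful on both scales (equal intervals).
diff_celsius = celsius_b - celsius_a
diff_kelvin = kelvin_b - kelvin_a

# Ratios are only meaningful on the scale with a true zero.
ratio_celsius = celsius_b / celsius_a   # 2.0, but 20 C is not "twice as hot"
ratio_kelvin = kelvin_b / kelvin_a      # close to 1.035

print(diff_celsius, diff_kelvin, ratio_celsius, round(ratio_kelvin, 3))
```

The difference is the same on both scales, while the ratio changes with the choice of zero, which is why division and multiplication are reserved for scales with a minimum value of zero.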
Data Foundations - 22
Interactive Data Visualization
Data Foundations - 23
Data sets structure
! Syntactical rules
Data Foundations - 24
Scalar, Vector and Tensor
! e.g.: Position coordinates (2D or 3D); Color using RGB(Red, Green, Blue)
components, Phone number (Country code, area code and local number), etc.
! each component (of the vector) can be considered individually but is most
! Tensor: a tensor is defined by its rank and its dimensionality. A scalar is a tensor of
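As a rough illustration of rank, the sketch below uses the common convention that a scalar has rank 0, a vector rank 1, and a matrix rank 2. The helper `tensor_rank` and the sample values are hypothetical, and the helper assumes regular, non-empty nested lists.

```python
def tensor_rank(value):
    """Return the nesting depth of a regular, non-empty nested list (0 for a scalar)."""
    rank = 0
    while isinstance(value, list):
        rank += 1
        value = value[0]
    return rank

temperature = 21.5                      # scalar: a single value
rgb_color = [255, 128, 0]               # vector: three components
stress = [[1.0, 0.2], [0.2, 3.0]]       # matrix: 2 x 2 array of values

for name, value in [("temperature", temperature),
                    ("rgb_color", rgb_color),
                    ("stress", stress)]:
    print(name, "has rank", tensor_rank(value))
```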
Geometry and Grids
! Geometry via explicit coordinates for each record in the data set.
! Data set about fires in Portugal: each fire is associated with the coordinates of its
starting point;
! Data set about temperature readings from sensors and associated with all the
! Data set describing a 3D world: geometry makes up the majority of the data.
! Geometric structure is implied and some form of grid is assumed. Successive data
records are located at successive positions. This requires setting the starting point, the
! Satellite images.
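A minimal sketch of implied geometry, assuming a regular 2D grid stored row by row. The function `grid_position`, the origin, the spacing, and the number of columns are all illustrative assumptions, not values from the slides.

```python
def grid_position(index, origin, spacing, columns):
    """Map the index of a record to (x, y) on a regular, row-major 2D grid."""
    row, col = divmod(index, columns)
    x = origin[0] + col * spacing[0]
    y = origin[1] + row * spacing[1]
    return (x, y)

# Hypothetical 4-column image whose first sample sits at (100.0, 200.0),
# with samples 0.5 units apart horizontally and 0.25 units apart vertically.
readings = [7, 9, 8, 6, 5, 4, 6, 7]
for i, value in enumerate(readings):
    print(value, grid_position(i, origin=(100.0, 200.0), spacing=(0.5, 0.25), columns=4))
```

No coordinate is stored with any record: the position follows from the order of the records plus a few grid parameters.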
Data Foundations - 26
Other forms of structure
! Time
! Present in many data sets
! Uniformly spaced versus non-uniformly spaced
! Relative versus absolute
! Check http://www.timeviz.net to see the many visualization techniques for Time-Oriented Data
! Topology
! This form of structure can be explicitly included in the data record or as an auxiliary data
structure
Data Foundations - 27
Examples
Data Foundations - 28
Interactive Data Visualization
Data
(Tamara Munzner)
Data Foundations - 29
Data Types and Dataset Types
! Data Types
Figure 2.2. The five basic data types: items, attributes, links, positions, and grids.
" An item is an individual entity that is discrete, such as a row in a simple table or a node in a network
" An attribute is some specific property that can be measured, observed, or logged. For example, attributes could be salary, price, number of sales, protein expression levels, or temperature.
" A grid specifies the strategy for sampling continuous data in terms of both geometric
Data Foundations - 30
Data Types and Dataset Types
! Dataset Types
Figure 2.3. The four basic dataset types are tables, networks, fields, and geometry; other possible collections of items are clusters, sets, and lists. These datasets are made up of five core data types: items, attributes, links, positions, and grids.
Figure 2.4 shows the internal structure of the four basic dataset types in detail. Tables have cells indexed by items and attributes, for either the simple flat case or the more complex multidimensional case. In a network, items are usually called nodes, and they are connected with links; a special case of networks is trees. Continuous fields have grids based on spatial positions where cells contain attributes. Spatial geometry has only position information.
" Other ways to group items together include clusters, sets, and lists.
" In real-world situations, complex combinations of these basic types are common.
Data Foundations - 31
Data Types and Dataset Types
[Figure 2.4. The detailed structure of the four basic dataset types. Tables: items (rows), attributes (columns), cell containing value. Networks: node (item), link. Fields (Continuous): grid of positions, cell, attributes (columns), value in cell. Geometry (Spatial): position.]
Dataset Types: Table
Figure 2.5. In a simple table of orders, a row represents an item, a column represents an attribute, and their intersection is the cell containing the value for that pairwise combination.
" A multidimensional table has a more complex structure for indexing into a cell, with multiple keys.
A synonym for networks is graphs. The word graph is also deeply overloaded in vis.
Data Foundations - 33
Data Types and Dataset Types
[Figure 2.4 (detail). Networks and Trees: node (item), link. Fields: cell, position, attributes (columns), value in cell; annotated "scientific visualization".]
Data Foundations - 34
Attributes
Figure 2.7. Attribute types are categorical, ordinal, or quantitative. The direction
[Figure panels: Attribute Types (Categorical; Ordered: Ordinal, Quantitative); Ordering Direction.]
Data Foundations - 37
[Figure (Tamara Munzner). What? Datasets: Dataset Types (Tables, Networks, Fields (Continuous), Geometry (Spatial)). Attributes: Ordering Direction (Sequential, Diverging). How?]
Interactive Data Visualization
Data Preprocessing
Data Foundations - 39
Data Preprocessing
! Metadata
! Normalization
! Dimension reduction
Data Foundations - 40
Metadata
! With the exception of the first column (Vehicle name), we need more information!
Data Foundations - 41
Metadata
! Associated Metadata
Data Foundations - 42
Metadata
! Metadata provides:
" Units
Data Foundations - 43
Basic statistics about the (scalar) data
" Number of values out of range (if the range of variable is provided)
" Mode
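A minimal sketch of such a summary for one scalar variable. Only the out-of-range count and the mode come from the list above; the other entries (count, min, max, mean, median) are assumed to be typical companions, and the sample values and valid range are made up.

```python
import statistics

def basic_statistics(values, valid_range=None):
    """Summarize a scalar variable; valid_range is an optional (low, high) pair."""
    summary = {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
    }
    if valid_range is not None:
        low, high = valid_range
        # Count values that fall outside the range given by the metadata.
        summary["out_of_range"] = sum(1 for v in values if v < low or v > high)
    return summary

city_mpg = [18, 21, 21, 24, 60, 21, -1]   # made-up sample; -1 is a bad reading
print(basic_statistics(city_mpg, valid_range=(5, 70)))
```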
Data Foundations - 44
Statistics techniques for getting additional insights
! Outlier detection
indicate experimental error; the latter are sometimes excluded from the data set.”
https://en.wikipedia.org/wiki/Outlier
https://www.siam.org/meetings/sdm10/tutorial3.pdf
! Cluster Analysis
! Can help segment the data into groups with strong similarities
https://en.wikipedia.org/wiki/Cluster_analysis
! Correlation Analysis
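Two of these techniques can be sketched with the standard library. The slides do not name specific methods, so the choices here are assumptions: Tukey's interquartile-range rule for outliers and the Pearson coefficient for correlation. The sample data are made up.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if v < q1 - k * spread or v > q3 + k * spread]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

weights = [10, 12, 11, 13, 12, 11, 95]    # made-up sample with one odd value
print(iqr_outliers(weights))
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

Whether a flagged value is an error or an interesting record is a domain decision; the rule only points at candidates.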
Data Foundations - 47
Missing Values and Data Cleansing
! Missing data:
the application domain, the number of missing values, the quality of the other
variables.
! Erroneous data
! May be very hard to detect unless they are out-of-range values or obvious outliers.
Data Foundations - 49
Missing Values
evaluated. Sometimes the records with missing values are the most interesting to
be analyzed.
! Assign a sentinel value for each variable when the real value is in question
processing.
! Average value for that variable; Minimally affects the statistics of that variable;
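The sentinel and average strategies above can be sketched as follows. Marking missing entries with `None`, the sentinel `-999.0`, and the sample column are assumptions made for this example.

```python
import statistics

MISSING = None          # how this sketch marks a missing entry
SENTINEL = -999.0       # hypothetical sentinel, chosen outside the valid range

def fill_with_sentinel(values, sentinel=SENTINEL):
    """Replace each missing entry with a value that is easy to spot later."""
    return [sentinel if v is MISSING else v for v in values]

def fill_with_mean(values):
    """Replace each missing entry with the mean of the known entries."""
    known = [v for v in values if v is not MISSING]
    mean = statistics.mean(known)
    return [mean if v is MISSING else v for v in values]

horsepower = [120.0, MISSING, 150.0, 90.0]
print(fill_with_sentinel(horsepower))
print(fill_with_mean(horsepower))
```

Filling with the mean leaves the mean of the variable unchanged, while the sentinel must be filtered out again before any statistic is computed.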
Data Foundations - 50
Missing Values and Data Cleansing
! Estimate the missing value of variable i for a particular record from the value(s) of
that variable in the records that are most similar to this particular record (similarity
based on the other variables). We are assuming that the variable i
! All the previous methods are ad hoc. Some new statistical approaches propose
methods and algorithms to make multiple imputations for the missing values
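A minimal sketch of the similarity-based approach, assuming Euclidean distance on the other variables and a single nearest record. The function name `impute_nearest` and the sample records are hypothetical, and the other variables are assumed to be complete.

```python
import math

def impute_nearest(records, target, column):
    """Fill records[target][column] (None) from the most similar complete record."""
    other_columns = [c for c in range(len(records[target])) if c != column]
    target_values = [records[target][c] for c in other_columns]
    best_value, best_distance = None, math.inf
    for i, record in enumerate(records):
        if i == target or record[column] is None:
            continue
        distance = math.dist([record[c] for c in other_columns], target_values)
        if distance < best_distance:
            best_value, best_distance = record[column], distance
    return best_value

# Columns (made up): engine size, horsepower, city MPG (missing in the last record).
cars = [
    [2.0, 130.0, 30.0],
    [4.0, 300.0, 16.0],
    [2.2, 140.0, None],
]
print(impute_nearest(cars, target=2, column=2))
```

Without normalization, the variable with the largest numeric range (horsepower here) dominates the distance, so in practice the variables would usually be rescaled first.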
Data Foundations - 51
Normalization
• $d_{normalized} = \dfrac{d_{original} - d_{min}}{d_{max} - d_{min}}$
• $d_{z\text{-}score} = \dfrac{d_{original} - \mu}{\sigma}$
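The two normalizations on this slide, min-max scaling and (assuming the second formula is the usual z-score) standardization, can be sketched as below. The use of the population standard deviation and the sample values are assumptions.

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    low, high = min(values), max(values)
    return [(v - low) / (high - low) for v in values]

def z_score_normalize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]

city_mpg = [10, 20, 30, 60]               # made-up sample
print(min_max_normalize(city_mpg))
print(z_score_normalize(city_mpg))
```

Min-max output always lies in [0, 1]; z-scores are centred on 0 and are not bounded, so a single extreme value affects the two results differently.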
Data Foundations - 52
Normalization
! Data from 414 cars (from 2004); Variable: City Miles Per Gallon (City MPG)
[Figures: histograms of City MPG, City MPG Norm, City MPG SQRT Norm, City MPG LOG Norm, and City MPG Z-Score; line chart "Normalization Maps" comparing Normalize min-max, Normalize SQRT, Normalize LOG, and Normalize Percentile for City MPG values from 10 to 60.]
Data Foundations - 54
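The charts compare several normalization maps without giving their formulas. The sketch below is one plausible reading, stated as an assumption: the SQRT and LOG variants apply the function first and then rescale to [0, 1], and the percentile variant maps each value to the fraction of values not exceeding it. The sample values are made up.

```python
import math

def normalize_with(values, transform):
    """Apply transform to each value, then rescale the result to [0, 1]."""
    transformed = [transform(v) for v in values]
    low, high = min(transformed), max(transformed)
    return [(t - low) / (high - low) for t in transformed]

def normalize_percentile(values):
    """Map each value to the fraction of values less than or equal to it."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]

city_mpg = [10, 15, 20, 60]               # made-up, skewed sample
print(normalize_with(city_mpg, math.sqrt))
print(normalize_with(city_mpg, math.log))
print(normalize_percentile(city_mpg))
```

On skewed data such as this, the SQRT and LOG maps spread out the crowded low values, and the percentile map ignores the size of the gaps entirely.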
Dimension reduction
Data Foundations - 56
Dimension reduction - Principal Component Analysis (PCA)
! The advantage of the new dimensions is that they can be sorted according to
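A minimal two-variable sketch of PCA, assuming the new dimensions are ordered by the variance they capture. It uses the closed-form eigen-decomposition of a 2 x 2 covariance matrix to stay within the standard library; the function `pca_2d` and the sample points are hypothetical.

```python
import math

def pca_2d(points):
    """Return [(variance, direction), ...] for 2D points, largest variance first."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)

    # Eigenvalues of the covariance matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    radius = math.hypot((sxx - syy) / 2, sxy)
    lam1, lam2 = half_trace + radius, half_trace - radius

    # Eigenvector of the larger eigenvalue; the second one is perpendicular to it.
    if abs(sxy) > 1e-12:
        v1 = (sxy, lam1 - sxx)
    else:
        v1 = (1.0, 0.0) if sxx >= syy else (0.0, 1.0)
    norm = math.hypot(v1[0], v1[1])
    v1 = (v1[0] / norm, v1[1] / norm)
    v2 = (-v1[1], v1[0])
    return [(lam1, v1), (lam2, v2)]

points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # all on the line y = 2x
for variance, direction in pca_2d(points):
    print(round(variance, 6), direction)
```

Because the sample lies on a straight line, the second component carries essentially no variance and could be dropped, which is the idea behind reducing four Iris variables to two.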
Data Foundations - 57
Dimension reduction - Principal Component Analysis (PCA)
[Figure: Iris flower data set (Iris versicolor, Iris setosa, Iris virginica), shown with 4 variables and reduced to 2 variables.]
! Figure 2.4 from Interactive Data Visualization: Foundations, Techniques, and Applications, Matthew O. Ward, Georges Grinstein, Daniel Keim, 2010
Data Foundations - 59
Mapping Nominal Dimensions to Numbers
! Warning:
Data Foundations - 60
Mapping Nominal Dimensions to Numbers
• Use this variable as the label for the graphical elements being displayed when
• Showing random subsets of labels and changing the points with labels being
shown on a regular basis, and showing only the labels on objects near the
cursor.
Data Foundations - 61
Mapping Nominal Dimensions to Numbers
! If the statistical properties of the records associated with one nominal value are
sufficiently similar to the properties of a different value, then that implies that
! Conversely, if there are sufficient differences in properties, then likely they should
! Given all the pairwise similarities, we could use correspondence analysis to map the
Data Foundations - 62
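One simple way to act on the similarity idea above is to order the nominal values by a summary statistic of their records, so that values with similar statistics receive nearby numbers. The sketch below is a deliberately simplified stand-in for correspondence analysis, not an implementation of it: it assumes a single numeric attribute is enough to compare categories, and the records, field names and the function `order_nominal_by_mean` are hypothetical.

```python
from collections import defaultdict

def order_nominal_by_mean(records, nominal_key, numeric_key):
    """Map each nominal value to an integer rank, ordered by the mean of
    one numeric attribute over the records that carry that value."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[nominal_key]].append(rec[numeric_key])
    means = {value: sum(vals) / len(vals) for value, vals in groups.items()}
    # Ties are broken by the label itself so the result is deterministic.
    ordered = sorted(means, key=lambda value: (means[value], value))
    return {value: rank for rank, value in enumerate(ordered)}

# Hypothetical records, invented for this sketch.
cars = [
    {"type": "SUV", "mpg": 18},
    {"type": "sedan", "mpg": 28},
    {"type": "SUV", "mpg": 20},
    {"type": "wagon", "mpg": 26},
    {"type": "sedan", "mpg": 30},
    {"type": "pickup", "mpg": 17},
]
mapping = order_nominal_by_mean(cars, "type", "mpg")
# mapping == {"pickup": 0, "SUV": 1, "wagon": 2, "sedan": 3}
```

The resulting numbers only reflect the chosen attribute; a different attribute may give a different order, so the mapping should not be read as an intrinsic ranking of the categories.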
Interactive Data Visualization
Data Foundations - 63
Segmentation
! In many situations, the data can be separated into contiguous regions, where
where each data point is assigned a probability for belonging to each of the
available classifications.
Data Foundations - 64
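A minimal sketch of the probability-based idea, under strong assumptions: the data is a 1-D sequence, each class is described by a single assumed center value, and the probability of a class falls off with the squared distance to its center. The function names and the sample signal are hypothetical.

```python
import math
from itertools import groupby

def class_probabilities(value, centers):
    """Soft assignment: one probability per class, derived from the squared
    distance between the value and each (assumed) class center."""
    d2 = [(value - c) ** 2 for c in centers]
    nearest = min(d2)
    # Subtracting the smallest distance keeps the exponentials well scaled.
    weights = [math.exp(-(d - nearest)) for d in d2]
    total = sum(weights)
    return [w / total for w in weights]

def segment(values, centers):
    """Label each sample with its most probable class, then merge runs of
    equal labels into contiguous regions (label, first_index, last_index)."""
    labels = []
    for v in values:
        probs = class_probabilities(v, centers)
        labels.append(probs.index(max(probs)))
    regions = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        regions.append((label, start, start + length - 1))
        start += length
    return regions

# Made-up signal with a low, a high and again a low stretch.
signal = [0.1, 0.2, 0.0, 2.9, 3.1, 3.0, 0.2, 0.1]
regions = segment(signal, centers=[0.0, 3.0])
# regions == [(0, 0, 2), (1, 3, 5), (0, 6, 7)]
```

Keeping the full probability vector per point, rather than only the most probable label, allows uncertain points near a region boundary to be shown differently from confident ones.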
Segmentation
Data Foundations - 65
Sampling and subsetting
! To transform a data set with one spatial resolution into another data set with a
points and wish to fill in values for locations between our samples (assuming
! Linear interpolation
! Bilinear interpolation
! Nonlinear interpolation
Data Foundations - 66
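Linear and bilinear interpolation can be sketched as follows. This is a minimal illustration that assumes samples lie on a regular grid with unit spacing, stored as `grid[row][column]`, with at least two rows and two columns; the function names are chosen for this sketch.

```python
def lerp(v0, v1, t):
    """Linear interpolation between v0 (t = 0) and v1 (t = 1)."""
    return v0 + t * (v1 - v0)

def bilinear(grid, x, y):
    """Bilinear interpolation at column position x and row position y,
    with 0 <= x <= columns - 1 and 0 <= y <= rows - 1."""
    # Index of the top-left sample of the cell that contains (x, y).
    x0 = min(int(x), len(grid[0]) - 2)
    y0 = min(int(y), len(grid) - 2)
    tx, ty = x - x0, y - y0
    # Interpolate along x on the two rows, then along y between the results.
    top = lerp(grid[y0][x0], grid[y0][x0 + 1], tx)
    bottom = lerp(grid[y0 + 1][x0], grid[y0 + 1][x0 + 1], tx)
    return lerp(top, bottom, ty)

quarter = lerp(2.0, 4.0, 0.25)      # 2.5
grid = [[0.0, 10.0],
        [20.0, 30.0]]
center = bilinear(grid, 0.5, 0.5)   # 15.0
```

Resampling to a finer resolution amounts to calling `bilinear` at every position of the new grid; nonlinear schemes replace `lerp` with a higher-order function of more neighboring samples.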
Sampling and subsetting
! Data subsetting is also a frequently used operation both prior to and during
visualization.
! This is especially helpful for very large data sets, as the visualization of the
Data Foundations - 67
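Two common forms of subsetting are sketched below: a random sample of fixed size and a query-based filter. Both functions and the sample records are hypothetical and only illustrate the operation; the fixed seed is an assumption made so that repeated runs show the same subset.

```python
import random

def random_subset(records, k, seed=0):
    """Return k records chosen at random without replacement.
    A fixed seed keeps the subset stable between redraws."""
    rng = random.Random(seed)
    return rng.sample(records, k)

def filter_subset(records, predicate):
    """Return only the records for which the predicate holds."""
    return [rec for rec in records if predicate(rec)]

records = [{"id": i, "value": i * i} for i in range(1000)]
sample = random_subset(records, 50)
small_values = filter_subset(records, lambda rec: rec["value"] < 100)
```

Random sampling tends to preserve the overall distribution of the data, whereas a filter deliberately restricts the view to a region of interest.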
Aggregation and Summarization
! It is often useful to group data points based on their similarity in value and/or
− https://en.wikipedia.org/wiki/Cluster_analysis
− http://www.ise.bgu.ac.il/faculty/liorr/hbchap15.pdf
" Provide sufficient information for the user to decide whether he or she wishes to
Data Foundations - 68
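As a minimal sketch of aggregation with a per-group summary, the code below groups values into fixed-width bins and reports count, mean, minimum and maximum for each bin. Fixed-width binning is a simplification chosen for this sketch, not one of the clustering methods referenced above; the function name and the data are hypothetical.

```python
from collections import defaultdict

def aggregate_by_bin(values, bin_width):
    """Group values into bins of equal width and summarise each bin.
    The result maps the lower edge of each bin to its summary."""
    bins = defaultdict(list)
    for v in values:
        bins[int(v // bin_width)].append(v)
    summary = {}
    for index in sorted(bins):
        members = bins[index]
        summary[index * bin_width] = {
            "count": len(members),
            "mean": sum(members) / len(members),
            "min": min(members),
            "max": max(members),
        }
    return summary

values = [1, 2, 3, 11, 12, 25]
summary = aggregate_by_bin(values, 10)
# summary[0]  == {"count": 3, "mean": 2.0, "min": 1, "max": 3}
# summary[10] == {"count": 2, "mean": 11.5, "min": 11, "max": 12}
# summary[20] == {"count": 1, "mean": 25.0, "min": 25, "max": 25}
```

Showing the count and the value range next to each aggregate gives the viewer a way to judge how much detail is hidden behind a single displayed element.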
Aggregation and Summarization
Data Foundations - 69
Smoothing and Filtering
(presumably because of noise) are reduced, and points that are lower than the
! See more:
! https://en.wikipedia.org/wiki/Smoothing
Data Foundations - 70
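The effect described above, where a point that stands out from its neighbors is pulled toward them, can be illustrated with a moving average. This is one simple smoothing filter among many and is shown as a sketch; the window radius, the shrinking window at the ends of the sequence and the sample values are assumptions of this example.

```python
def moving_average(values, radius=1):
    """Replace each value by the mean of itself and its neighbors within
    `radius` positions. The window shrinks at both ends of the sequence."""
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - radius)
        hi = min(len(values), i + radius + 1)
        window = values[lo:hi]
        smoothed.append(sum(window) / len(window))
    return smoothed

noisy = [1.0, 1.0, 4.0, 1.0, 1.0]   # a single spike in flat data
smooth = moving_average(noisy)
# smooth == [1.0, 2.0, 2.0, 2.0, 1.0]
```

The spike is reduced from 4.0 to 2.0, while its neighbors are raised; a larger radius smooths more strongly but also blurs genuine features.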
Raster to vector conversion
! In Computer Graphics:
! Vector data (vertices, edges, and triangular or quadrilateral patches) => Image
(pixel-based)
Data Foundations - 71
Interactive Data Visualization
Data Foundations - 72
Further Reading
! Recommended readings
Applications
! Supplemental readings:
" https://en.wikipedia.org/wiki/Outlier
" https://en.wikipedia.org/wiki/Cluster_analysis
" https://en.wikipedia.org/wiki/Correspondence_analysis
What you should know
! The various data type taxonomies and the impact of a data type on visualization.
! Data pre-processing techniques: the goal of each one and the most important ones
" Outlier detection and processing; normalization; dimensionality reduction; sampling and
subsetting; aggregation and summarization
Data Foundations - 74
Recommended Actions
! Install Tableau software (desktop version). Activate it with a student license.
! http://www.tableau.com/academic/students
! http://www.tableau.com/learn/tutorials/on-demand/getting-started
! Get familiar with the 2004 Cars and Trucks data set
! http://www.idvbook.com/teaching-aid/teaching-aid/data-sets/2004-cars-and-trucks-data/
Data Foundations - 75