PR 10

Uploaded by

Humberto Vasques
Data Visualization

460-4120

Fall 2022
Last update 8. 12. 2020
Graph Visualization
• Hierarchical visualization – similar to node-link visualization, but exploits the notion of hierarchy (a graph may be naturally structured by the semantics of its nodes, as is inherent in trees)
• Two-step algorithm
• 1. Nodes are assigned y-coordinates that are proportional to their layer numbers (nodes are grouped into layers such that edges point from a node in a lower layer to a node in a higher layer)
• 2. Nodes in each layer (top-down) are permuted to minimize the number of edge intersections (expensive – use heuristics)
• Other methods: maximal layer width method and the depth-first search method
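
The two steps can be sketched in plain JavaScript. This is a minimal illustration under stated assumptions, not the GraphViz implementation: the graph is assumed to be acyclic, step 1 uses longest-path layering, and step 2 uses a single top-down barycenter sweep as the crossing-reduction heuristic. The function names `assignLayers` and `orderByBarycenter` are hypothetical.

```javascript
// Step 1: longest-path layering. A node's layer is one more than the deepest
// layer among its predecessors; nodes without predecessors get layer 0.
// Assumes a DAG; edges are [from, to] pairs.
function assignLayers(nodes, edges) {
  const preds = new Map(nodes.map((n) => [n, []]));
  for (const [from, to] of edges) preds.get(to).push(from);
  const layer = new Map();
  const visit = (n) => {
    if (!layer.has(n)) {
      const depths = preds.get(n).map((p) => visit(p));
      layer.set(n, depths.length ? Math.max(...depths) + 1 : 0);
    }
    return layer.get(n);
  };
  nodes.forEach((n) => visit(n));
  return layer;
}

// Step 2: one top-down barycenter sweep. Nodes in each layer are sorted by
// the mean position of their neighbors in the layer directly above.
function orderByBarycenter(nodes, edges, layer) {
  const depth = Math.max(...layer.values());
  const layers = Array.from({ length: depth + 1 }, () => []);
  for (const n of nodes) layers[layer.get(n)].push(n);
  const pos = new Map();
  layers[0].forEach((n, i) => pos.set(n, i));
  for (let l = 1; l <= depth; l++) {
    const bary = new Map();
    layers[l].forEach((n, i) => {
      const above = edges
        .filter(([from, to]) => to === n && layer.get(from) === l - 1)
        .map(([from]) => pos.get(from));
      // A node with no neighbor directly above keeps its current index.
      const mean = above.reduce((a, b) => a + b, 0) / above.length;
      bary.set(n, above.length ? mean : i);
    });
    layers[l].sort((a, b) => bary.get(a) - bary.get(b));
    layers[l].forEach((n, i) => pos.set(n, i));
  }
  return layers;
}
```

The layer index gives the y-coordinate and the position inside the layer gives the x-coordinate. Production layouts typically repeat the sweep several times in both directions.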

Fall 2022 Data Visualization 2


Graph Visualization
• Hierarchical graph layout example generated by GraphViz

Source: Alexandru C. Telea, Data Visualization: Principles and Practice, 2014.



Graph Visualization
• More complex example

The call graph of a program visualized using a hierarchical graph layout. Call graphs are used in software engineering for understanding the structure of large software source code bases. Note the separation between the main program and the library subsystem.

The edges, although carefully laid out using spline curves to minimize crossings, are still quite tangled and hard to tell apart from each other. Addressing this problem in general is quite difficult. We have two options:
1. Modify the graph to eliminate edges that are of little interest for the problem
2. Group related edges together, until a reduced edge count is reached
(we must know how to do it)

Source: Alexandru C. Telea, Data Visualization: Principles and Practice, 2014.



Graph Visualization
• Blueprints are a visual scripting system used in Unreal Engine
• An example of encoding additional attributes in the graph visualization

Source: Unreal Engine Documentation.



Graph Visualization
• Hierarchical graph layout with orthogonal edge routing

In contrast to the straight lines and splines, this orthogonal routing creates patterns that are arguably easier to follow. Note that different levels of detail are used throughout the layout.

Source: Alexandru C. Telea, Data Visualization: Principles and Practice, 2014.



Graph Visualization
• Hierarchical edge bundling – a method for reducing visual complexity when displaying large hierarchical graphs
• Edges that are visually close to each other are visually grouped (bundled) to reduce clutter without loss of any information

The layout used suggests that the left system is more modular than the right system.

Source: Alexandru C. Telea, Data Visualization: Principles and Practice, 2014.



Graph Visualization
• Interactive hierarchical edge bundling in D3.js
• Radial dendrograms representing trees

Source: https://observablehq.com/@d3/hierarchical-edge-bundling



Graph Visualization
• Icicle plots

Icicle plots are a method for presenting hierarchical/clustered data. The technique was developed in 1983 by Kruskal and Landwehr. They were named as such because the clustering in the visualization looks like icicles.

Source: https://www.cs.middlebury.edu/~candrews/showcase/infovis_techniques_s16/icicle_plots/icicleplots.html
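
As a rough illustration of how such a plot can be laid out, the sketch below assigns each node of a tree a horizontal extent and a row (depth). It assumes that a node's width is proportional to the number of leaves below it, which is one common convention and not necessarily the one used in the figure. The names `icicleLayout` and `countLeaves` and the `{ name, children }` node shape are hypothetical.

```javascript
// Number of leaves below a node (a leaf counts as 1).
function countLeaves(node) {
  const children = node.children || [];
  return children.length
    ? children.reduce((sum, child) => sum + countLeaves(child), 0)
    : 1;
}

// Returns one rectangle { name, depth, x0, x1 } per node. Children split the
// horizontal extent of their parent in proportion to their leaf counts and
// are placed one row deeper.
function icicleLayout(root, x0 = 0, x1 = 1, depth = 0, out = []) {
  out.push({ name: root.name, depth, x0, x1 });
  const children = root.children || [];
  const total = children.reduce((sum, child) => sum + countLeaves(child), 0);
  let x = x0;
  for (const child of children) {
    const width = ((x1 - x0) * countLeaves(child)) / total;
    icicleLayout(child, x, x + width, depth + 1, out);
    x += width;
  }
  return out;
}
```

Drawing each rectangle at row `depth` between `x0` and `x1` yields the characteristic stacked bars.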



Graph Visualization
• ASCII-style (old school) icicle plot example (no filtering, no zoom, no shading, no interactivity, no exploration)

Source: https://www.cs.middlebury.edu/~candrews/showcase/infovis_techniques_s16/icicle_plots/icicleplots.html



Graph Visualization
• General graph-edge bundling methods
• Force-directed edge bundling (FDEB)
• Geometry-based edge bundling (GBEB)
• Winding roads (WR)
• Skeleton-based edge bundling (SBEB)
• Kernel density estimation edge bundling (KDEEB)



Graph Visualization
• Examples of general graph-edge bundling methods

Source: Alexandru C. Telea, Data Visualization: Principles and Practice, 2014.



Tool: D3 = Data Driven Documents
• What is D3
• A JavaScript library for manipulating documents based on data
• What D3 is not
• Not a chart library; it is a visualization library
• Not a compatibility layer
• Not only about SVG, HTML, or Canvas
• See https://d3js.org for further reference



D3: Selection
• Modifying documents using the W3C DOM API is tedious:

    var paragraphs = document.getElementsByTagName("p");
    for (var i = 0; i < paragraphs.length; i++) {
        var paragraph = paragraphs.item(i);
        paragraph.style.setProperty("color", "white", null);
    }



D3: Selection
• D3 employs a declarative approach:

• Operating on arbitrary sets of nodes:


• d3.selectAll("p").style("color", "white");

• Manipulating individual nodes:


• d3.select("body").style("background-color", "black");



D3: Selection
• D3 uses CSS Selectors
• Single selector
• #foo // <any id="foo"> </any>
• foo // <foo> </foo>
• .foo // <any class="foo"> </any>
• [foo=bar] // <any foo="bar"> </any>
• foo bar // <foo><bar> </bar></foo>
• Multiple selectors:
• foo.bar // <foo class="bar"> </foo>
• foo#bar // <foo id="bar"> </foo>
D3: Select and Modify Element Properties
• var svg = d3.select("svg");

• var rect = svg.select("rect");
    rect.attr("width", 100);
    rect.attr("height", 100);
    rect.style("fill", "steelblue");

• svg.select("rect")
    .attr("width", 100)
    .attr("height", 100)
    .style("fill", "steelblue");

• // Object arguments to attr() and style() are D3 v3 syntax
  d3.select("svg").select("rect")
    .attr({
        "width": 100,
        "height": 100
    })
    .style({
        "fill": "steelblue"
    });



D3: Transitions
• var svg = d3.select("svg");

  svg.selectAll("rect")
      .data([127, 61, 256])
      .transition()
      .duration(1500) // 1.5 seconds
      .attr("x", 0)
      .attr("y", function(d, i) { return i * 90 + 50; })
      .attr("width", function(d, i) { return d; })
      .attr("height", 20)
      .style("fill", "steelblue");



D3: Setup
• Create a new folder for your project
• Within that folder create a subfolder called d3
• Download the latest version of D3 into that subfolder and decompress the ZIP file
• (notice both the minified and standard version)
• Or, download the entire repository:
https://github.com/mbostock/d3
• Or, to link directly to the latest release, copy this snippet:
<script src="//d3js.org/d3.v3.min.js" charset="utf-8"></script>



D3: Setup
• Create a simple HTML page within project folder named index.html:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html lang="en">
<head>
<meta http-equiv="content-type" content="text/html">
<meta charset="utf-8">
<title>D3 Page Template</title>
<script type="text/javascript" src="d3/d3.js" charset="utf-8"></script>
</head>
<body>
<script type="text/javascript">
// TODO
</script>
</body>
</html>



D3: Setup
• Running a Python mini web server:
Python 2.x:
python -m SimpleHTTPServer 8080

Python 3.x:
python -m http.server 8080

You should get:


127.0.0.1 - - [02/Dec/2022 22:58:35] "GET / HTTP/1.1" 200 -
127.0.0.1 - - [02/Dec/2022 22:58:35] "GET /d3/d3.js HTTP/1.1" 200 -
Multivariate Data Visualization
• Consider a set of $N$ data points $D = \{\mathbf{p}_i\}$, $1 \le i \le N$
• Every data point $\mathbf{p}_i$ is represented as a $K$-dimensional vector of attributes $\mathbf{p}_i = (a_i^1, \dots, a_i^K) \in A^K$, where $A$ is some domain
• The dataset $D$ is called multivariate
• We want to visualize this dataset such that correlations, outliers, clusters, and trends become visible



Multivariate Data Visualization
• Example of table visualization (left) and parallel coordinate plot (PCP) of a $K$-dimensional point $\mathbf{p}_j$ (right)

Source: Alexandru C. Telea, Data Visualization: Principles and Practice, 2014.
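
A parallel coordinate plot draws one vertical axis per attribute and turns every data point into a polyline that crosses each axis at the height of the corresponding attribute value. The sketch below computes these polyline vertices. It assumes numeric attributes, at least two attributes per point, and per-axis min-max normalization. The function name `pcpPolylines` and its parameters are hypothetical.

```javascript
// points: array of K-dimensional numeric vectors (K >= 2).
// Returns, for each point, K vertices [x, y] of its polyline, with the axes
// spread evenly over `width` and values scaled to `height` (y grows downward,
// as in SVG, so the largest value of each attribute ends up at y = 0).
function pcpPolylines(points, width, height) {
  const K = points[0].length;
  const mins = [];
  const maxs = [];
  for (let l = 0; l < K; l++) {
    const column = points.map((p) => p[l]);
    mins.push(Math.min(...column));
    maxs.push(Math.max(...column));
  }
  return points.map((p) =>
    p.map((a, l) => {
      const range = maxs[l] - mins[l];
      // A constant attribute is drawn in the middle of its axis.
      const t = range === 0 ? 0.5 : (a - mins[l]) / range;
      return [(l * width) / (K - 1), height * (1 - t)];
    })
  );
}
```

Each returned vertex list can be passed directly to an SVG polyline or a line generator.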



Multivariate Data Visualization
• Parallel coordinate plot showing 7 attributes for about 30 cars

Source: https://datavizcatalogue.com/methods/parallel_coordinates.html



Multivariate Data Visualization
• The graph can be (interactively) filtered by dragging the selection on each vertical
axis

Source: https://www.generativedesign.org



Dimensionality Reduction
• Consider the same multivariate dataset $D \subset A^K$ as before, i.e. $D = \{\mathbf{p}_i\}$, where each $\mathbf{p}_i$ lives in some $K$-dimensional space $A^K$
• We want to visualize the structure of the dataset $D$
• To do this, we construct a so-called projection function
$$P: A^K \to \mathbb{R}^k$$
where $k$ is typically 2 or 3, which yields a 2D or 3D scatter plot (graph splatting)
• The projection function $P$ should respect several constraints
• Distance preservation
• Neighborhood preservation



Dimensionality Reduction
• Stress function – global indicator of constraints preservation
$$\sigma = \frac{\sum_{i,j} \left( \|\mathbf{p}_i - \mathbf{p}_j\| - \|P(\mathbf{p}_i) - P(\mathbf{p}_j)\| \right)^2}{\sum_{i,j} \|\mathbf{p}_i - \mathbf{p}_j\|^2}$$

• This function measures how well the placement of the projections preserves the aforementioned constraints
• Techniques that compute a projection $P$ that minimizes the stress function $\sigma$ are known as dimensionality reduction methods
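
To make the stress function concrete, the sketch below evaluates it for a given set of points and their projections. It assumes numeric attributes and the Euclidean norm in both spaces. The function names `euclidean` and `stress` are hypothetical.

```javascript
// Euclidean distance between two vectors of equal length.
function euclidean(a, b) {
  return Math.sqrt(a.reduce((sum, ai, l) => sum + (ai - b[l]) ** 2, 0));
}

// points: N original K-dimensional vectors; projected: their N k-dimensional
// images under P, in the same order. Summing over pairs i < j instead of all
// ordered pairs scales numerator and denominator equally, so the ratio is
// the same.
function stress(points, projected) {
  let numerator = 0;
  let denominator = 0;
  for (let i = 0; i < points.length; i++) {
    for (let j = i + 1; j < points.length; j++) {
      const dHigh = euclidean(points[i], points[j]);
      const dLow = euclidean(projected[i], projected[j]);
      numerator += (dHigh - dLow) ** 2;
      denominator += dHigh ** 2;
    }
  }
  return numerator / denominator;
}
```

A value of 0 means that all pairwise distances are preserved exactly, while collapsing all projections into a single point gives a value of 1.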



Dimensionality Reduction Techniques
• Multidimensional scaling
• Projection-based dimensionality reduction



Multidimensional Scaling
• Instead of the actual coordinates of the points $\mathbf{p}_i$ in $A^K$, we only know the square matrix $M_{N \times N} = (d_{i,j})$, where $1 \le i \le N$, $1 \le j \le N$, and $d_{i,j}$ are the distances (or dissimilarities) between these $K$-dimensional points
• Distances are computed with the aid of an arbitrarily designed function $\delta: A \times A \to \mathbb{R}^+$, which gives the one-dimensional distance between two attributes, such that
$$d_{i,j} = \|\mathbf{p}_i - \mathbf{p}_j\| = \sqrt{\sum_{l=1}^{K} \delta(a_i^l, a_j^l)^2}$$

• Multidimensional scaling (MDS) is the group of methods that compute the projection $P$ by directly minimizing the stress function $\sigma$
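
The distance matrix itself can be built as sketched below. The sketch assumes that the per-attribute distances are combined as the square root of the sum of their squares, and it takes the attribute distance $\delta$ as a parameter, so that non-numeric domains can be handled as well. The name `distanceMatrix` and the default `delta` are hypothetical.

```javascript
// points: N vectors with K attributes each.
// delta: distance between two values of one attribute (defaults to the
// absolute difference, which is only meaningful for numeric attributes).
// Returns the symmetric N x N matrix M of pairwise distances.
function distanceMatrix(points, delta = (a, b) => Math.abs(a - b)) {
  const N = points.length;
  const M = Array.from({ length: N }, () => new Array(N).fill(0));
  for (let i = 0; i < N; i++) {
    for (let j = i + 1; j < N; j++) {
      let sum = 0;
      for (let l = 0; l < points[i].length; l++) {
        sum += delta(points[i][l], points[j][l]) ** 2;
      }
      M[i][j] = Math.sqrt(sum);
      M[j][i] = M[i][j];
    }
  }
  return M;
}
```

For categorical attributes, a simple choice is `(a, b) => (a === b ? 0 : 1)`.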



Multidimensional Scaling
• Embedding – the process of assigning $k$-dimensional coordinates to points in an unknown $K$-dimensional space
• The word scaling means that the distances between data points in the low-dimensional ($k$-dimensional) space should be scaled versions of the distances between the same data points in the original (unknown) $K$-dimensional space
• Force-directed layouts – the edge stiffness between two points is inversely proportional to their distance (it has high complexity $O(N^2)$; optimization: distant points are not connected by an edge at all)
• FastMap – uses only the distance matrix $M$
• 1. Choose points $\mathbf{p}_i$ and $\mathbf{p}_j$ which maximize $d_{i,j}$
• 2. Project all points $\mathbf{p}_l$ on the line $\mathbf{v} = \mathbf{p}_j - \mathbf{p}_i$ to find their first coordinate in the $k$-dimensional space
• 3. Recursively apply FastMap to the projections of the points $\mathbf{p}_l$ on a hyperplane orthogonal to $\mathbf{v}$, to find the remaining $k - 1$ coordinates



Multidimensional Scaling
• To find the coordinate $x_l$ of $\mathbf{p}_l$ along the line $\mathbf{v}$, we only need to know the distances between the points $\mathbf{p}_i$, $\mathbf{p}_j$, and $\mathbf{p}_l$, using the law of cosines
$$x_l = \frac{d_{i,l}^2 + d_{i,j}^2 - d_{l,j}^2}{2 d_{i,j}}$$
• FastMap has complexity of only $O(kN)$
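
The FastMap steps can be sketched as follows. This is an illustrative sketch, not the reference implementation: instead of searching all pairs for the exact maximum of $d_{i,j}$, it picks the pivots with two linear scans (a common heuristic that keeps the number of distance evaluations per axis linear in $N$), and it evaluates the distances inside the orthogonal hyperplane on demand from the coordinates found so far. The function name `fastMap` is hypothetical.

```javascript
// M: symmetric N x N distance matrix, k: target dimension.
// Returns N coordinate vectors of length k.
function fastMap(M, k) {
  const N = M.length;
  const coords = Array.from({ length: N }, () => []);

  // Squared residual distance between objects a and b, i.e. the squared
  // distance that remains after removing the axes computed so far.
  const d2 = (a, b) => {
    let s = M[a][b] ** 2;
    for (let t = 0; t < coords[a].length; t++) {
      s -= (coords[a][t] - coords[b][t]) ** 2;
    }
    return Math.max(s, 0); // guard against small negative rounding errors
  };

  const farthestFrom = (a) => {
    let best = a;
    let bestD = 0;
    for (let b = 0; b < N; b++) {
      const d = d2(a, b);
      if (d > bestD) {
        best = b;
        bestD = d;
      }
    }
    return best;
  };

  for (let axis = 0; axis < k; axis++) {
    // Step 1 (heuristic): choose a far-apart pivot pair (i, j).
    const j = farthestFrom(0);
    const i = farthestFrom(j);
    const dij2 = d2(i, j);
    const x = new Array(N).fill(0);
    if (dij2 > 0) {
      const dij = Math.sqrt(dij2);
      // Step 2: the law of cosines gives the coordinate along the pivot line.
      for (let l = 0; l < N; l++) {
        x[l] = (d2(i, l) + dij2 - d2(l, j)) / (2 * dij);
      }
    }
    // Step 3: once x is stored, d2 measures distances in the hyperplane
    // orthogonal to the pivot line, so the next pass works on the projections.
    x.forEach((xl, l) => coords[l].push(xl));
  }
  return coords;
}
```

The result can be drawn directly as a 2D scatter plot when `k` is 2.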



Multidimensional Scaling
• Other methods
• Spectral decomposition – project points along the eigenvectors having the largest eigenvalues of the distance matrix
• LLE (locally linear embedding) – a topology-preserving manifold learning method
• Isomap – captures nonlinear relationships in the dataset: point-to-point distances are replaced by an approximation of the geodesic distance between points, given by the shortest path on a graph that connects neighboring points in the $K$-dimensional space, with the original distances as edge weights
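
The geodesic approximation used by Isomap can be sketched as follows. The sketch assumes a symmetric distance matrix, connects every point to a fixed number of nearest neighbors, and uses the Floyd-Warshall algorithm for the shortest paths, which is simple but cubic in $N$. It covers only the distance replacement, not the subsequent embedding. The function name `geodesicDistances` and the parameter `neighbors` are hypothetical.

```javascript
// M: symmetric N x N matrix of original distances.
// neighbors: number of nearest neighbors each point is connected to.
// Returns the matrix of shortest-path (approximate geodesic) distances;
// entries stay Infinity when the neighborhood graph is disconnected.
function geodesicDistances(M, neighbors) {
  const N = M.length;
  const G = Array.from({ length: N }, (_, i) =>
    Array.from({ length: N }, (_, j) => (i === j ? 0 : Infinity))
  );
  // Build the neighborhood graph with the original distances as weights.
  for (let i = 0; i < N; i++) {
    const nearest = [...Array(N).keys()]
      .filter((j) => j !== i)
      .sort((a, b) => M[i][a] - M[i][b])
      .slice(0, neighbors);
    for (const j of nearest) {
      G[i][j] = M[i][j];
      G[j][i] = M[i][j];
    }
  }
  // Floyd-Warshall: allow paths through intermediate point m.
  for (let m = 0; m < N; m++) {
    for (let i = 0; i < N; i++) {
      for (let j = 0; j < N; j++) {
        if (G[i][m] + G[m][j] < G[i][j]) G[i][j] = G[i][m] + G[m][j];
      }
    }
  }
  return G;
}
```

The resulting matrix can then be passed to a distance-based method such as MDS in place of the original distances.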

Source: http://benalexkeen.com/wp-content/uploads/2017/05/isomap.png



Projection-Based Dimensionality Reduction
• Applicable when we know the original $K$-dimensional point coordinates
• The Karhunen-Loève method (K-L transform) works as follows
• 1. Compute the covariance matrix $C$ of the $N$ $K$-dimensional points $\mathbf{p}_i$
• 2. Compute the eigenvectors $\mathbf{e}_i$ of $C$ corresponding to the $k$ largest eigenvalues $\lambda_i$ of $C$
• 3. Compute the projections $P(\mathbf{p}_i) = (q_i^1, \dots, q_i^k)$ as $q_i^l = \mathbf{e}_l \cdot \mathbf{p}_i$ for all $1 \le l \le k$
• Idea behind: the eigenvectors corresponding to the largest eigenvalues of $C$ indicate the directions in $K$-dimensional space along which the points $\mathbf{p}_i$ spread the most. If we construct our $k$-dimensional projections $P(\mathbf{p}_i)$ by projecting data along these directions, we preserve the most information encoded in interpoint distances
• It is closely related to the singular value decomposition (SVD) technique
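
The three steps can be sketched without any library as shown below. The eigenvectors are approximated by power iteration with deflation, which is an assumption of this sketch chosen for brevity. A numerical library with a proper eigen or SVD solver would normally be used instead, and power iteration may converge slowly when eigenvalues are close to each other or fail when the start vector happens to be orthogonal to the sought eigenvector. The function name `klTransform` and the parameter `iterations` are hypothetical.

```javascript
// points: N numeric K-dimensional vectors, k: target dimension.
// Returns the N projections (q^1, ..., q^k).
function klTransform(points, k, iterations = 200) {
  const N = points.length;
  const K = points[0].length;
  const dot = (u, v) => u.reduce((s, ui, l) => s + ui * v[l], 0);
  const mean = Array.from({ length: K }, (_, l) =>
    points.reduce((s, p) => s + p[l], 0) / N
  );

  // Step 1: sample covariance matrix C (K x K).
  const C = Array.from({ length: K }, (_, r) =>
    Array.from({ length: K }, (_, c) =>
      points.reduce((s, p) => s + (p[r] - mean[r]) * (p[c] - mean[c]), 0) / (N - 1)
    )
  );

  // Step 2: power iteration converges to the eigenvector with the largest
  // eigenvalue; deflation (C <- C - lambda e e^T) then exposes the next one.
  const eigenvectors = [];
  for (let a = 0; a < k; a++) {
    const start = Array.from({ length: K }, (_, l) => l + 1); // arbitrary
    const startNorm = Math.sqrt(dot(start, start));
    let e = start.map((x) => x / startNorm);
    for (let it = 0; it < iterations; it++) {
      const Ce = C.map((row) => dot(row, e));
      const norm = Math.sqrt(dot(Ce, Ce));
      if (norm === 0) break; // no variance left
      e = Ce.map((x) => x / norm);
    }
    const lambda = dot(e, C.map((row) => dot(row, e)));
    for (let r = 0; r < K; r++) {
      for (let c = 0; c < K; c++) C[r][c] -= lambda * e[r] * e[c];
    }
    eigenvectors.push(e);
  }

  // Step 3: q_i^l = e_l . p_i
  return points.map((p) => eigenvectors.map((e) => dot(e, p)));
}
```

The sign of each eigenvector is arbitrary, so the projection may appear mirrored along any axis without changing the interpoint distances.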



Advanced Dimensionality Reduction Techniques
• Optimize the balance between scalability (handling large datasets) and accuracy (preserving distances and neighborhoods)
• Least squares projection (LSP) – very precise in preserving neighborhoods
• Part-linear multidimensional projection (PLMP)
• Local affine multidimensional projection (LAMP) – needs access to the point coordinates, very fast



Projections Examples
• 2D scatter plot of 2100 points from an 18-dimensional dataset projected to 2D by LAMP
• Each of the 18 attributes represents some statistical property of the observed objects
• One additional attribute describes the class

Source: Alexandru C. Telea, Data Visualization: Principles and Practice, 2014.



Projection Explanations
• Explanatory visualization mechanisms annotate a 2D or 3D projection plot with information that enables users to trace the $k$-dimensional mapping back to the original $K$ dimensions

Source: Alexandru C. Telea, Data Visualization: Principles and Practice, 2014.



Exercise
• Try applying the FastMap algorithm to reduce the dimensionality of some multidimensional dataset (where $K > 4$ and $k = 2$)
• http://www.cs.cmu.edu/~christos/software.html
• The result should be a 2D colored scatter plot with highlighted ranges of selected attributes or actual classes of the original data points
• You may use some trivial $K$-dimensional dataset rotated with a (random) $K$-dimensional rotation matrix obtained from the mgen library (https://github.com/NOhs/mgen)
• For further reference, see the original publication: FALOUTSOS, Christos; LIN, King-Ip. FastMap: A fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets. In: Proceedings of the 1995 ACM SIGMOD international conference on Management of data. 1995. p. 163-174.
