Module-2 Notes
Graph-theoretic Graphics
• A graph is a non-linear data structure G = (V, E) that consists of:
• A finite collection of vertices or nodes V
• A finite collection of edges E, represented as pairs of vertices (u, v); the pairs are ordered when the edges are directed
Term             Description
Vertex           Every individual data element is called a vertex or a node.
Edge (Arc)       A connecting link between two nodes or vertices. Each edge has two ends and is represented as (startingVertex, endingVertex).
Undirected Edge  A bidirectional edge.
Directed Edge    A unidirectional edge.
Weighted Edge    An edge with a value (weight) on it.
Degree           The total number of edges connected to a vertex in a graph.
Indegree         The total number of incoming edges connected to a vertex.
Outdegree        The total number of outgoing edges connected to a vertex.
Self-loop        An edge is called a self-loop if its two endpoints coincide.
Adjacency        Two vertices are said to be adjacent if an edge connects them.
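The indegree and outdegree definitions above can be illustrated with a small sketch. The function name `degree_tables` and the edge list are hypothetical, chosen only for illustration; a directed graph is assumed to be given as a list of (startingVertex, endingVertex) pairs.

```python
from collections import defaultdict

def degree_tables(edges):
    """Count incoming and outgoing edges per vertex for directed edges (u, v)."""
    indegree = defaultdict(int)
    outdegree = defaultdict(int)
    for u, v in edges:
        outdegree[u] += 1  # the edge leaves u
        indegree[v] += 1   # the edge enters v
    return dict(indegree), dict(outdegree)

# Hypothetical example graph; ("c", "c") is a self-loop.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "c")]
indegree, outdegree = degree_tables(edges)
```

Note that the self-loop contributes one incoming and one outgoing edge to the same vertex.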
Path:
• A sequence of edges that joins a sequence of vertices.
• It connects two or more nodes.
• If there is a path between every pair of nodes of a graph, then it is a connected graph; otherwise it is called a disconnected graph.
• There may or may not be a path to each and every node of a graph. If no path leads to a node, that node is an isolated node.
• The path from 'a' to 'e' is = {a, b, c, d, e}
Closed Path:
• A path is called a closed path if the initial node is the same as the terminal (end) node, i.e., if V0 = Vn, where V0 is the starting node of the path and Vn is the last node.
• The closed path = {e, d, f, g, e}
Simple Path:
• A path that does not repeat any nodes (vertices).
• In a simple path all the nodes are distinct; in a closed simple path all the nodes are distinct except for the first and the last vertex. Example of a simple path: {a, b, c, d}
Cycle Graph:
• A simple graph of ‘n’ nodes(vertices) (n>=3) and n edges forming a
cycle of length ‘n’ is called as a cycle graph.
• In a cycle graph, all the vertices are of degree 2.
Connected Graph:
• A graph in which there is an edge or path joining each pair of vertices.
• In a connected graph:
• Any vertex can be reached from any other vertex.
• There exists at least one path between every pair of vertices.
• No vertex is unreachable (isolated).
Complete Graph (full graph)
• A graph with an edge between every pair of nodes, i.e., every vertex has an edge to all other vertices.
• A complete graph of n vertices contains exactly nC2 = n(n − 1)/2 edges.
• A complete graph of n vertices is denoted Kn.
• Every complete graph is a connected graph, but not every connected graph is complete.
• In a complete graph, the degree of every node is n − 1, where n is the number of nodes.
Undirected: A graph in which all the edges are bi-directional.
The edges do not point in a specific direction.
Trivial Graph: If a finite graph has only a single vertex and no edge, it is known as a trivial graph.
Loop:
• A loop (also called a self-loop) is an edge that connects a vertex to
itself.
• An edge with both ends as the same vertex.
• Although all loops are cycles (of length 1), not all cycles are loops: a cycle may pass through several distinct vertices, and it does not repeat edges or vertices except for the starting and ending vertex.
Isomorphic :
Two graphs G1=(V1,E1) and G2=(V2,E2) are isomorphic if there exists a bijective mapping between
the vertices in V1 and V2 and there is an edge between two vertices of one graph if and only if there is
an edge between the two corresponding vertices in the other graph.
Checklist
• Is the number of vertices in both graphs the same?
• Yes, both graphs have 4 vertices.
• Is the number of edges in both graphs the same?
• Yes, both graphs have 4 edges.
• Is the degree sequence in both graphs the same?
• Yes, each vertex is of degree 2.
• If the vertices in one graph can form a cycle of length k, can we find the same cycle length in the other graph?
• Yes, each graph has a cycle of length 4.
• If the answer to any of these questions is no, the graphs are not isomorphic.
• In general these checks are necessary but not sufficient; isomorphism is confirmed by exhibiting an edge-preserving bijection between the vertex sets. For the two graphs here all four checks pass and such a mapping exists (both are cycles of length 4), so they are isomorphic.
• In other words, they are equivalent graphs, just drawn in different forms.
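For small graphs, the definition of isomorphism can be tested directly by trying every bijection between the vertex sets. The sketch below is a brute-force illustration only (it tries n! mappings, so it is not practical for large graphs); the function name and the example graphs are hypothetical, and undirected simple graphs are assumed.

```python
from itertools import permutations

def are_isomorphic(vertices1, edges1, vertices2, edges2):
    """Brute-force isomorphism test for small undirected graphs."""
    if len(vertices1) != len(vertices2) or len(edges1) != len(edges2):
        return False
    e1 = {frozenset(e) for e in edges1}
    e2 = {frozenset(e) for e in edges2}
    for perm in permutations(vertices2):
        mapping = dict(zip(vertices1, perm))  # candidate bijection V1 -> V2
        mapped = {frozenset(mapping[x] for x in e) for e in e1}
        if mapped == e2:  # every edge maps to an edge, and vice versa
            return True
    return False

# Two drawings of a cycle of length 4 (hypothetical labels).
square = ([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4), (4, 1)])
crossed = (["a", "b", "c", "d"], [("a", "c"), ("c", "b"), ("b", "d"), ("d", "a")])
# A triangle with a pendant vertex: same vertex and edge counts, different degrees.
kite = (["p", "q", "r", "s"], [("p", "q"), ("q", "r"), ("r", "p"), ("r", "s")])

same = are_isomorphic(*square, *crossed)
different = are_isomorphic(*square, *kite)
```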
• The graph-theoretic distance (or geodesic distance) between connected nodes u
and v is the sum of the weights of the edges in any shortest path connecting the
nodes.
• If no such path exists (i.e., if the vertices lie in different connected components), then
the distance is set equal to infinity.
• In a grid graph the distance between two vertices is the sum of the "vertical" and
the "horizontal" distances.
• The matrix (d_ij) consisting of all distances from vertex v_i to vertex v_j is known as the all-pairs shortest path matrix, or the graph distance matrix.
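One way to compute the graph distance matrix is the Floyd-Warshall algorithm. It is not named in these notes and is used here only as an illustration. The sketch assumes an undirected graph with non-negative edge weights and vertices numbered 0 to n − 1; unreachable pairs keep the distance infinity, as described above.

```python
import math

def distance_matrix(n, weighted_edges):
    """All-pairs shortest path distances for an undirected weighted graph."""
    d = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for u, v, w in weighted_edges:
        d[u][v] = min(d[u][v], w)
        d[v][u] = min(d[v][u], w)
    # Allow each vertex k in turn to act as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d

# Hypothetical example: vertex 3 is isolated.
edges = [(0, 1, 2), (1, 2, 3), (0, 2, 10)]
D = distance_matrix(4, edges)
```

Here the direct edge between vertices 0 and 2 has weight 10, but the shortest path goes through vertex 1 and has total weight 5.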
Adjacency matrix:
• An adjacency matrix is a 2D array of size |V| x |V|. Each row and each column represents a vertex.
• If the value of an element a[i][j] is 1, there is an edge connecting vertex i and vertex j; in a weighted graph the element holds the edge weight instead.
[Figures: adjacency matrices of an undirected graph, a weighted undirected graph, and a directed graph]
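A minimal sketch of building an adjacency matrix from an edge list follows. The function name and the example edges are hypothetical; vertices are assumed to be numbered 0 to n − 1.

```python
def adjacency_matrix(n, edges, directed=False):
    """Return an n x n 0/1 matrix; a[i][j] == 1 means an edge from i to j."""
    a = [[0] * n for _ in range(n)]
    for i, j in edges:
        a[i][j] = 1
        if not directed:
            a[j][i] = 1  # undirected edges are stored in both directions
    return a

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
undirected = adjacency_matrix(4, edges)
directed = adjacency_matrix(4, edges, directed=True)
```

The matrix of an undirected graph is symmetric, and the sum of row i gives the degree of vertex i (for a graph without self-loops).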
• The nodes and branches of a tree may have various kinds of information associated with them, such as an estimate of the amount of evolution that takes place between each pair of nodes on the tree, which can be represented as branch lengths (or edge lengths).
• Trees with branch lengths are sometimes called weighted trees.
• A tree is a graph in which any two nodes are connected by
exactly one path.
• Trees are acyclic connected graphs.
• Trees may be directed or undirected.
• A tree with one node labeled root is a rooted tree.
• Directed trees are rooted trees, the root of a directed tree is the
node having no incoming edges.
• A hierarchical tree is a directed tree with a set of leaf nodes
representing a set of objects and a set of parent nodes
representing relations among the objects.
• In a hierarchical tree, every node has exactly one parent, except
for the root node, which has one or more children and no parent.
• Examples of hierarchical trees: decision trees.
• Shorthand representation of a tree with leaves A, B, C, D, E: (((A,B),C),(D,E))
• A spanning tree is an undirected geometric tree that has n − 1 edges that define all distances between n nodes.
• A Minimum Spanning Tree (MST) has the shortest total edge
length of all possible spanning trees.
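A minimum spanning tree can be found, for example, with Kruskal's algorithm. The algorithm is not named in these notes and is used here only as an illustrative sketch: edges are considered in order of increasing length and an edge is kept only if it joins two components that are not yet connected. Vertices are assumed to be numbered 0 to n − 1, and the example edges are hypothetical.

```python
def minimum_spanning_tree(n, weighted_edges):
    """Kruskal's algorithm; weighted_edges holds (u, v, weight) triples."""
    parent = list(range(n))  # union-find forest, one component per vertex

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    for w, u, v in sorted((w, u, v) for u, v, w in weighted_edges):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:  # the edge joins two separate components
            parent[root_u] = root_v
            tree.append((u, v, w))
    return tree

edges = [(0, 1, 1), (1, 2, 2), (0, 2, 3), (2, 3, 4), (1, 3, 5)]
mst = minimum_spanning_tree(4, edges)
total_length = sum(w for _, _, w in mst)
```

For a connected graph the result has exactly n − 1 edges.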
Cladograms, ultrametric trees, and additive trees
Cladograms:
• Branch lengths are meaningless.
• Shows the evolutionary relationships of the nodes (here, leaves A, B, C, D, E).
Ultrametric Trees (Ultrametric spaces or Chronogram)
• An Ultrametric tree is a rooted tree with edge lengths where all
leaves are equidistant from the root.
• Ultrametric trees represent the molecular clock which states that the
rate of mutation is the same across all lineages of the tree.
• The term "Ultrametric" refers to a specific type of metric space where the distance between any two points is always less than or equal to the maximum of the distances from either point to a third point. In other words, the metric satisfies a stronger form of the triangle inequality.
• Mathematically, for points x, y, z in the space, the Ultrametric
inequality is given by:
d(x, y) ≤ max(d(x, z), d(y, z))
• In an Ultrametric tree, the graph-theoretic distances take at most n − 1
possible values, where n is the number of leaves.
• Ultrametric trees have applications in computer science, particularly in
hierarchical clustering algorithms. They are also used in
mathematical analysis and the study of p-adic numbers.
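The ultrametric inequality can be checked directly on a distance matrix. The sketch below tests d(x, y) ≤ max(d(x, z), d(y, z)) for every ordered triple of distinct points; the function name, the tolerance parameter, and both example matrices are hypothetical.

```python
from itertools import permutations

def is_ultrametric(d, tol=1e-9):
    """True if d[x][y] <= max(d[x][z], d[y][z]) holds for all distinct x, y, z."""
    n = len(d)
    return all(
        d[x][y] <= max(d[x][z], d[y][z]) + tol
        for x, y, z in permutations(range(n), 3)
    )

# Leaves 0 and 1 are close to each other and equally far from leaf 2.
ultra = [[0, 2, 6],
         [2, 0, 6],
         [6, 6, 0]]
# Here d(0, 2) = 6 exceeds max(d(0, 1), d(2, 1)) = 5.
not_ultra = [[0, 2, 6],
             [2, 0, 5],
             [6, 5, 0]]
```

In an ultrametric matrix, the two largest of the three distances in every triple are equal, which the first example shows.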
Additive Trees (Phylogram or Additive hierarchical clustering or Additive
binary trees)
• Branch Lengths measure evolutionary distance.
• The rate of evolution may vary over time.
• Additive trees possess the additive property, which means that the distance
between two leaves (data points) is equal to the sum of the edge lengths
along the unique path connecting those leaves in the tree.
• Mathematically, for leaves i and j and their most recent common ancestor k:
d(i, j) = d(i, k) + d(j, k)
• Additive trees are widely used in clustering analysis, classification, and
visualization of relationships within datasets.
• They are applied in fields such as
• Bioinformatics to represent evolutionary relationships,
• Linguistics for language classification, and
• Various domains for exploratory data analysis.
(a) Ultrametric and (b) additive trees along with their corresponding path-length matrices.
Graph-theoretic Graphics - Graph Drawing
• When a connected graph can be drawn without any edges crossing, it is called
planar.
• When a planar graph is drawn in this way, it divides the plane into regions called
faces.
• The graph above has 3 faces (including the "outside" region as a face).
• The number of faces does not change no matter how you draw the graph (as long
as you do so without the edges crossing), so it makes sense to ascribe the number
of faces as a property of the planar graph.
• If you try to count faces using the graph on the left, you might say there are 5 faces
(including the outside).
• But drawing the graph with a planar representation shows that in fact there are only
4 faces.
• Euler's formula:
• For any connected planar graph with v vertices, e edges, and f faces:
v − e + f = 2
• The graph G has 6 vertices with degrees 2, 2, 3, 4, 4, 5
• How many edges does G have?
• Could G be planar?
• If so, how many faces would it have.
• Solution:
• Number of edges: the sum of the degrees is twice the number of edges, so e = (2 + 2 + 3 + 4 + 4 + 5) / 2 = 20 / 2 = 10.
• It could be planar. If it is, Euler's formula v − e + f = 2 gives 6 − 10 + f = 2, so f = 6.
• To make sure that it is actually planar, though, we would need to draw a graph with those vertex degrees without edges crossing. This can be done by trial and error (and is possible).
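The arithmetic of this worked example can be reproduced with a short sketch. The function names are hypothetical. The edge bound e ≤ 3v − 6 is an addition not stated in these notes: it is a standard necessary condition for a simple connected planar graph with at least 3 vertices, and passing it does not prove planarity.

```python
def edges_from_degrees(degrees):
    """Handshake lemma: the degree sum equals twice the number of edges."""
    total = sum(degrees)
    if total % 2:
        raise ValueError("the sum of the degrees must be even")
    return total // 2

def faces_if_planar(v, e):
    """Solve Euler's formula v - e + f = 2 for f (connected planar graph)."""
    return 2 - v + e

def passes_edge_bound(v, e):
    """Necessary (not sufficient) condition for a simple planar graph, v >= 3."""
    return e <= 3 * v - 6

degrees = [2, 2, 3, 4, 4, 5]
v = len(degrees)
e = edges_from_degrees(degrees)
f = faces_if_planar(v, e)
could_be_planar = passes_edge_bound(v, e)
```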
• Drawing graphs is more than a theoretical exercise.
• Finding compact planar drawings of graphs representing electrical circuits is a critical
application in the semiconductor industry. [If the circuit can be redrawn without any
wires crossing each other, then it is planar].
• The graph-drawing (or graph-layout) problem is as follows.
• Given a planar graph, how do we produce an embedding on the plane or
sphere? And if a graph is not planar, how do we produce a planar layout that
minimizes edge crossings?
• Different types of graphs require different algorithms for clean layouts
• Hierarchical Trees
• Spanning Trees
• Networks
• Directed Graphs
• Tree Maps
Hierarchical Trees:
• Suppose we are given a recursive list of single parents and their children.
• In this list, each child has one parent and each parent has one or more children.
• One node, the root, has no parent.
• This tree is a directed graph because the edge relation is asymmetric.
Hierarchical Trees:
• The data are adapted from weblogs of a small
website.
• The thicknesses of the branches of the tree are
proportional to the number of visitors navigating
between pages represented by nodes in the tree.
Hierarchical Trees:
• If the nodes of a tree are ordered by an external variable such as joining or splitting distance, they can be located on a scale instead of using paternity (the parent and child relationship) to determine ordering.
• The result is an inverted tree-shaped structure called a dendrogram.
• There are two types of hierarchical clustering:
• Agglomerative: The data points are clustered using a bottom-up approach
starting with individual data points.
• Divisive: The top-down approach is followed where all the data points are
treated as one big cluster and the clustering process involves dividing the one big
cluster into several small clusters.
Hierarchical Trees:
• A directed geometric tree with one root having many children.
• Such a tree may represent a flow from a source at the root branching to sinks at the
leaves.
• Example: Water and migration flows
Treemaps:
• Used to identify categories and the proportional size of categories in a data set.
• A treemap visualizes large amounts of hierarchically structured data.
• The structure illustrates the hierarchy of the data content and the area of the
rectangle is proportionate to the amount of data it represents.
Displaying region-wise customer complaints about a product
Suppose there are 10 different types of complaints about a product (denoted C1 to C10), and the company wants to visualize which complaints are relevant to each region. A treemap can be used in such a case.
In the treemap, it can be clearly seen how different regions have specific types of user complaints.
Showcasing category-wise product availability of mobile phones
Let us assume that there are four categories of mobile phones with the following market share percentages: Low-end (up to 10,000 INR, 15%), Mid-Range (10,000 to 25,000 INR, 55%), Premium (above 25,000 and up to 50,000 INR, 25%), and Top-end (above 50,000 INR, 10%). Construct the treemap and draw your insights.
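A single-level treemap for this exercise can be sketched with a simple "slice" layout, which cuts one rectangle into vertical strips whose areas are proportional to the values. This is only one possible layout; the function name and the canvas size are hypothetical. The shares as stated add up to 105%, so the sketch normalizes by their sum rather than assuming 100.

```python
def slice_layout(values, x, y, width, height):
    """Split a rectangle into vertical strips with areas proportional to values.

    Returns {name: (x, y, width, height)} for each category.
    """
    total = sum(values.values())
    rects = {}
    offset = x
    for name, value in values.items():
        strip_width = width * value / total
        rects[name] = (offset, y, strip_width, height)
        offset += strip_width
    return rects

shares = {"Low-end": 15, "Mid-Range": 55, "Premium": 25, "Top-end": 10}
rects = slice_layout(shares, 0.0, 0.0, 100.0, 50.0)
```

Each rectangle can then be drawn with any plotting library. For hierarchical data, the same function can be applied again inside each strip with the roles of width and height swapped.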
Explore customer segmentation for a product
• Usually, companies for apparel or personal products divide their customers based on
their age.
• This way they can categorize their products and the product variants separately for
each age group.
• In the case of this treemap, the company could decide whether to launch more
products for particular customer segments based on the distribution.
Continent          Country         Area (km²)
Northern America   Canada          9976140
Northern America   United States   9372610
Northern America   Greenland       2175600
Africa             Cameroon        475440
Africa             Zimbabwe        390580
Asia               Mongolia        1565000
Asia               India           3287590
Europe             Ukraine         603700
Europe             Poland          312683
Europe             Germany         356910
Source: http://6.anychart.com/products/anychart/docs/users-guide/Tree-Map-Chart.html
• Treemap charts can be used for a variety of presentation types, industries, and areas of
study.
• For Business Analysis: Treemap charts can help businesses compare their sales
numbers of different models and brands. Such businesses will employ treemap charts to
visualize organizational structure, revenue breakdowns, market segmentations, and other
factors over a certain period of time.
• File Systems: Treemaps can identify the allocation of storage space in file systems. These
charts also enable users to identify large data sets, such as files or folders that can occupy
excessive space, through trends and patterns in the data chart.
• Inventory of different trends within a population: Treemap charts can depict literacy
rates or population densities in certain geographic areas over a specific time period.
• Portfolio Management: Treemap charts are also a useful tool for investors in order to
analyze portfolio allocations and assess how their investments are distributed across
resource categories and industries.
• Social Sciences: Researchers and scientists can use treemap charts to refer to
demographic information, inventory of animals, etc. This data chart can help facilitate the
exploration of population trends and other related factors among these distributions.
• Treemaps are a good choice for categorical data visualization
• Treemaps do not support data with negative numbers.
• A treemap ignores negative values.
• Alternatives to Treemaps
• Treemaps prove difficult to read and ineffective when there are too many categories to visualize and the focus is on finding the top n categories based on a value, or when there is simply no hierarchy in the data to be plotted.
• A Bar chart can replace a treemap where the data to be plotted has one
quantitative and one categorical variable.
• A Scatter plot could be a replacement where the plotted data has two
quantitative variables.
• Example:
• To identify products with higher sales volume and profits, a 2D scatter plot is
a better option since both variables are quantitative.
• On the other hand, a bar chart could be a better choice if we only intend to
plot sales volume for different products or total revenue.
Treemap Problems
• Too disorderly
  • What does adjacency mean?
  • Uncontrolled aspect ratios lead to lots of skinny boxes that clutter
• Hard to understand
  • Must mentally convert nesting to hierarchy descent
• Color not used appropriately
  • In fact, it is meaningless here
• Wrong application
  • Don't need all this just to see the largest files in the OS
High-dimensional Data Visualization
• Data sets of dimensions 1,2,3 are common
• Number of variables per class
• 1 - Univariate data
• 2 - Bivariate data
• 3 - Trivariate data
• >3 – Hypervariate / Multivariate data
• One of the biggest challenges in data visualization is to find general representations
of data that can display the multivariate structure of more than two variables.
• Several graphic types like mosaic plots, parallel coordinate plots, trellis displays,
and the grand tour have been developed over the course of the last three
decades.
Mosaic Plot:
• To draw a mosaic plot, begin by placing one categorical variable along the x axis and
subdivide the x axis by the relative proportions that make up the categories.
• Then place the other categorical variable along the y axis and, within each category
along the x axis, subdivide the y axis by the relative proportions that make up the
categories of the y variable.
• The result is a set of rectangles whose areas are proportional to the number of cases
representing each possible combination of the two categorical variables.
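The construction described above can be sketched as a function that turns a two-way table of counts into rectangles inside the unit square. The function name and the example counts are hypothetical, and every x category is assumed to have at least one case.

```python
def mosaic_rectangles(table):
    """table: {x_category: {y_category: count}}.

    Returns {(x_category, y_category): (x0, y0, width, height)}.
    """
    grand_total = sum(sum(row.values()) for row in table.values())
    rects = {}
    x0 = 0.0
    for x_cat, row in table.items():
        column_total = sum(row.values())
        width = column_total / grand_total   # share of the x category
        y0 = 0.0
        for y_cat, count in row.items():
            height = count / column_total    # share of y within this x category
            rects[(x_cat, y_cat)] = (x0, y0, width, height)
            y0 += height
        x0 += width
    return rects

# Hypothetical counts for two categorical variables.
table = {"A": {"yes": 30, "no": 10}, "B": {"yes": 20, "no": 40}}
rects = mosaic_rectangles(table)
```

The area of each rectangle (width times height) equals that cell's share of all cases, which is the defining property of the mosaic plot.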
• A contingency table is simply a table that displays a count (frequency) in each cell
that resides at the column and row intersections of two or more categorical variables.
• Consider a group of individuals for whom data was collected regarding two variables:
hair color (black, brown, red, and blond) and eye color (brown, blue, hazel, and
green).
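• To build the mosaic plot, first divide a square into vertical bars based on the first variable, in this case hair color, and add some space between them.
• Then divide each bar into horizontal sections based on the second variable, in this case eye color, and once again add some space between them.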
Example: Survival on the Titanic
On Sunday, April 14th, 1912 at 11:40pm, the RMS
Titanic struck an iceberg in the North Atlantic. Within
two hours the ship had sunk. At best reckoning 705
survived the sinking, 1,523 did not.
The Data
• There is very good documentation on who survived
and who did not survive the sinking of the Titanic.
• Passengers on the Titanic, cross-classified by:
• Class: 1st, 2nd, 3rd, Crew.
• Sex: Male, Female.
• Age: Child, Adult.
• Survived: No, Yes.
Example: Sexual Discrimination at Berkeley
• In the 1970s (the admissions data are from fall 1973), a sex-discrimination case was raised against the University of California at Berkeley on behalf of women seeking admission to graduate programs there.
• The women claimed that the proportion of women admitted to Berkeley was much
lower than that for men, and that this was the result of discrimination.
The University Case
• The Dean of Letters and Science at
Berkeley was a famous statistician (Peter
Bickel) and he was able to argue that the
difference in admissions rates was not
caused by sexual discrimination in the
Berkeley admissions policy, but was
caused by the fact that males and
females generally sought admission to
different departments.
• The Dean broke the admissions data
down by department and showed that
within each program there was no
admission discrimination against women.
Indeed, there seemed to be some
admissions bias in favour of women.
• The widths of the boxes are proportional to the percentage of females and males, respectively.
• In fact, 41% of applicants were female and 59% were male.
• The heights of the boxes are proportional to percent admitted.
• In fact, 45% of the male applicants were admitted, while only 30% of the female applicants were
admitted.
• This seems to show a large gender-bias in admission.
• To make the plot easier to interpret, the boxes for admitted females and males are colored blue while the
not admitted females and males are colored pink.
• It is easy to see that females’ blue box on the left is much shorter than the males’ blue box on the right
• To understand the admission pattern, the department to which each applicant applied was considered.
• In the following plot, the departments are shown across the plot in different colors, from department A on the left in pink to department F on the right in yellow.
• The width of each bar is proportional to the percentage of applicants to that department.
• Departments A and C have the largest number of applicants and departments B and E have the smallest.
Stratification on department:
• It appears that most departments have no gender bias,
and those departments that are biased favor women.
How can this be?
• First, note that depts A and B have very few female
applicants (the columns are narrow).
• It is also relatively easy to get into those departments: the proportion rejected is lower than in other departments, especially F.
• So one explanation is that more males get in because they are applying to the hungrier, perhaps fastest-growing, departments.
• One problem with the mosaic plot in this context is that
when a proportion is very small, the corresponding box
is nearly invisible.
• Unusually large cells are emphasized in a mosaic plot,
while unusually small cells are hidden.
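The reversal between the aggregated and the stratified view (Simpson's paradox) can be reproduced numerically. The counts below are an assumption: they are the figures distributed as the `UCBAdmissions` dataset in R, which is not cited in these notes, and they reproduce the 45% and 30% admission rates quoted above.

```python
# (admitted, rejected) per department and sex; figures assumed to match
# R's UCBAdmissions dataset.
data = {
    "A": {"Male": (512, 313), "Female": (89, 19)},
    "B": {"Male": (353, 207), "Female": (17, 8)},
    "C": {"Male": (120, 205), "Female": (202, 391)},
    "D": {"Male": (138, 279), "Female": (131, 244)},
    "E": {"Male": (53, 138), "Female": (94, 299)},
    "F": {"Male": (22, 351), "Female": (24, 317)},
}

def rate(admitted, rejected):
    """Admission rate as a fraction of all applicants."""
    return admitted / (admitted + rejected)

def overall_rate(sex):
    """Admission rate for one sex, aggregated over all departments."""
    admitted = sum(data[dept][sex][0] for dept in data)
    rejected = sum(data[dept][sex][1] for dept in data)
    return rate(admitted, rejected)

male_overall = overall_rate("Male")
female_overall = overall_rate("Female")

# Departments in which women were admitted at a higher rate than men.
favor_women = [
    dept for dept, by_sex in data.items()
    if rate(*by_sex["Female"]) > rate(*by_sex["Male"])
]
```

With these counts, men have the higher admission rate overall, yet women have the higher rate in four of the six departments, because women applied mostly to departments with low admission rates.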
Mosaic Plot vs Treemaps:
• A mosaic plot subdivides the x axis by the relative proportions of one categorical variable and then, within each category along the x axis, subdivides the y axis by the relative proportions of the other categorical variable.
• The result is a set of rectangles whose areas are proportional to the number of cases representing each possible combination of the two categorical variables.
• A treemap also uses rectangles whose areas are proportional to the amount of data they represent, but it nests them according to the hierarchy of the data rather than by cross-classifying categorical variables.
Trellis Displays (Lattice Graphics / Lattice Displays)
• Trellis Graphics is a family of techniques for viewing complex, multi-variable data
sets.
• The techniques were given the name Trellis because they usually result in a
rectangular array of plots, resembling a garden trellis.
• A number of statistical software systems provide multi-panel conditioning plots
under the name Trellis plots or Cross plots.
• Trellis displays use a grid-like structure to plot the data conditioned on certain subgroups.
• Each small plot in the grid represents a subset of the data, allowing for the comparison
of multiple conditions or variables simultaneously.
• To make plots comparable across rows and columns, the same scales are used in
all the panel plots.
• This technique is particularly useful for exploring and understanding complex
datasets.
R : lattice package
Example: (0,1,-1,2)
• https://r-graph-gallery.com/parallel-plot-ggally.html - R
• https://www.analyticsvidhya.com/blog/2021/11/visualize-data-using-parallel-coordinates-plot/ - Python
# R Libraries
library(GGally)
# Plot
ggparcoord()

# Python
# Using Pandas
import pandas as pd
pd.plotting.parallel_coordinates()
With the pandas interface, we have 2 issues:
1. Cannot control the scale of individual axes
2. Cannot label the (poly-)lines inline

# Python
# Using Plotly Express interface
import plotly.express as px
px.parallel_coordinates()

# Plotly's graph_objects interface
import plotly.graph_objects as go
go.Figure(data=go.Parcoords())
• The most interesting aspects in using parallel coordinate plots are the
investigation of groups/clusters, outliers, and structures over many variables
at a time.
• Three main uses of parallel coordinate plots in exploratory data analysis :
• Overview: An ideal tool to get a first overview of a data set.
• Profiles: Used to visualize the profile of a single case via highlighting.
• Profiles are not only restricted to single cases but can be plotted for a whole
group, to compare the profile of that group with the rest of the data.
• Monitor: When working on subsets of a data set parallel coordinate plots can
help to relate features of a specific subset to the rest of the data set.
Sorting (ordering) and Scaling Issues:
• Parallel coordinate plots are especially useful for variables which either have an order, such as time, or all share a common scale.
• Ordering: The order of the axes is critical for finding features, and in typical data analysis and data visualization many reorderings will need to be tried.
• Scaling: The most important scaling option is to either individually scale the
axes or to use a common scale over all axes.
• Scaling options define the alignment of the values, which can be aligned at:
• The mean
• The median
• A specific case
• A specific value
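The two main scaling options can be sketched in plain Python. Min-max scaling to the range 0 to 1 is assumed here as the scaling method, and the function names and data rows are hypothetical. Each row is one case and each column is one axis of the parallel coordinate plot.

```python
def scale_individually(rows):
    """Scale every axis (column) to [0, 1] using its own minimum and maximum."""
    scaled_columns = []
    for column in zip(*rows):
        low, high = min(column), max(column)
        span = high - low
        # A constant column has no spread; place it in the middle of the axis.
        scaled_columns.append([(v - low) / span if span else 0.5 for v in column])
    return [list(row) for row in zip(*scaled_columns)]

def scale_commonly(rows):
    """Scale all axes to [0, 1] using one minimum and maximum for the whole table."""
    flat = [v for row in rows for v in row]
    low, high = min(flat), max(flat)
    span = high - low
    return [[(v - low) / span if span else 0.5 for v in row] for row in rows]

rows = [(0, 1, -1, 2), (2, 3, 1, 4), (4, 5, 3, 10)]
individual = scale_individually(rows)
common = scale_commonly(rows)
```

Individual scaling uses the full height of every axis, which helps when the variables have different units. A common scale keeps values comparable across axes, which suits variables measured in the same unit.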
Multivariate Data Glyphs: Principles and Practice
• In the context of data visualization, a glyph is the visual representation of a piece
of data where the attributes of a graphical entity are dictated by one or more
attributes of a data record.
• Glyphs add extra dimensions of data to a visualization.
• A glyph consists of a graphical entity with p components, each of which may have r
geometric attributes and s appearance attributes.
• Geometric attributes: shape, size, orientation, position, direction / magnitude of
motion
• Appearance attributes: color, texture, and transparency
• Demo
Mappings
• Graphical attributes to which data values can be mapped include:
• Position (1-, 2-, or 3-D)
• Size (length, area, or volume),
• Shape, orientation,
• Material (hue, saturation, intensity, texture, or opacity),
• Line style (width, dashes), and
• Dynamics (speed of motion, direction of motion, rate of flashing).
• Mappings can be classified as follows:
• One-to-one mappings: Each data attribute maps to a distinct and different
graphical attribute;
• One-to-many mappings: Redundant mappings are used to improve the accuracy and ease with which a user can interpret data values; and
• Many-to-one mappings: Several or all data attributes map to a common type of
graphical attribute, separated in space, orientation, or other transformation.
• Profiles: Height and color of bars.
• Stars: Length of evenly spaced rays
emanating from center.
• Stars and Anderson/metroglyphs:
Length of rays.
• Stick figure icons: Length, angle, color
of limbs.
• Trees: Length, thickness, angles of
branches; branch structure derived from
analyzing relations between dimensions.
• Autoglyph: color of boxes.
• Boxes: Height, width, depth of first box;
height of successive boxes.
• Faces: Size and position of eyes, nose,
mouth; curvature of mouth; angle of
eyebrows.
• Arrows: length, width, taper, and color
of base and head.
• Weathervanes: Level in bulb, length of
flags.
• Circular profiles: Distance from center
to vertices at equal angles.
• Bugs: wing shapes controlled by time
series; length of head spikes
(antennae); size and color of tail; size of
body markings.
• Wheels: Time wheels create ring of
time series plots, value controls
distance from base ring; 3D wheel
maps time to height, variable value to
radius.
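As an illustration of one glyph from this list, the star glyph maps each data value to the length of one of several evenly spaced rays. The sketch below computes the ray end points for a single data record. The function name, the `max_values` parameter used for normalization, and the example record are hypothetical.

```python
import math

def star_glyph(values, max_values, radius=1.0, center=(0.0, 0.0)):
    """Return the end point of each ray of a star glyph for one data record.

    Ray i points at angle 2*pi*i/n and has length radius * value / max_value.
    """
    n = len(values)
    cx, cy = center
    points = []
    for i, (value, max_value) in enumerate(zip(values, max_values)):
        angle = 2 * math.pi * i / n
        length = radius * value / max_value
        points.append((cx + length * math.cos(angle), cy + length * math.sin(angle)))
    return points

# One record with four attributes, each at half of its maximum.
record = [5, 10, 1, 50]
maxima = [10, 20, 2, 100]
points = star_glyph(record, maxima)
```

Connecting the end points in order gives the outline of the star (or circular profile); drawing one such glyph per record lets records be compared by shape.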
Biases in Glyph Mappings
• Perception-based bias
• Proximity-based bias
• Grouping-based bias
• Example: When watching a football game, we tend to group individuals based on the colors of their uniforms.
Glyph Placement Strategies
• Data Driven
• Derived
• Ordered
• Network
Glyph Layout Options / Placement Strategies
• The position of glyphs can convey many attributes of data, including data values or structure (order,
hierarchy), relationships, and derived attributes.
• Data-driven Placement:
• The data are used to compute or specify the location parameters for the glyph.
• The two categories of this strategy class are raw and derived based on whether the original data
values are used directly or whether positions are derived via computations involving these data
values.
• Derived Techniques: Dimension Reduction Techniques include Principal Component Analysis
(PCA), Multidimensional Scaling (MDS), and Self-Organizing Maps (SOMs).
• Resulting display coordinates have no semantic meaning.
Glyph Layout Options / Placement Strategies
• Structure implies relationships or connectivity
• Explicit structure: Structure driven by one or more data dimensions
• Implicit structure: Structure derived from analyzing the data
• Common structures: Ordered, Hierarchical, Network/graph
• Each kind of structure can help drive the placement algorithm in distinct ways.
Glyph Layout Options / Placement Strategies
• Ordered structure may be linear (1-D) or grid-based
(N-D).
• Glyphs are placed by sorting the data on one or more dimensions and using this ordering to specify the glyph placement.
• Various placement patterns exist for linearly structured data, including raster, radial, and recursive raster.
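A raster placement can be sketched as follows: the records are sorted on a chosen dimension and then assigned to grid cells row by row. The function name, the sort key, and the number of columns are hypothetical choices for illustration.

```python
def raster_positions(records, key, columns):
    """Sort records and place them left to right, top to bottom on a grid.

    Returns a list of (record, column_index, row_index) triples.
    """
    ordered = sorted(records, key=key)
    return [
        (record, index % columns, index // columns)
        for index, record in enumerate(ordered)
    ]

# Hypothetical records sorted by their own value and placed on 2 columns.
placement = raster_positions([5, 1, 4, 2, 3], key=lambda r: r, columns=2)
```

Each (column, row) pair can then be multiplied by the glyph spacing to obtain display coordinates.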
Hierarchical Structure
• Hierarchical structure in data sets can be explicit or implicit.
• Explicit: Each level of the hierarchy is associated with a single data dimension, and the
branches deriving from this level correspond to some number of distinct ranges for that data
dimension.
• Example: Sales data may have dimensions associated with particular time periods,
geographical locations, sales personnel, and products.
• Different hierarchies are generated depending on the order in which the dimensions are
processed. Other examples of explicit hierarchies are file systems and organizational
charts.
• Implicit: Hierarchies are generated algorithmically using clustering or partitioning
algorithms in conjunction with some N-dimensional distance or similarity metric.
• Given a hierarchical structure, the task is to position glyphs on the display in such a way as
to convey the relationships inherent in the structure.
• Node-link graphs vary by
• Where the root node is relative to the rest of the tree (e.g., centered, top-most)
• Relative direction between a node and its children (e.g., radially outward, horizontal,
vertical, or alternating horizontal and vertical).
Graph/Network Structure
• A generalization of hierarchical structure is that of a graph or network, which
consists of a set of nodes (the data points) and a finite set of directed or undirected
links / connections, each of which represents a relationship between a pair of nodes.
• It is harder to imply relationships through positioning alone, so explicit links are needed.
• Many factors to consider:
• Minimizing crossings
• Uniform node distribution
• Drawing conventions for links
• Centering, clustering subgraphs
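One of these factors, the number of edge crossings, can be measured for a given layout. The sketch below assumes straight-line links and 2-D node positions; the function names and the four-node example are hypothetical.

```python
from itertools import combinations

def orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): >0 left turn, <0 right turn."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])

def segments_cross(a, b, c, d):
    """True if segment ab properly crosses segment cd (no shared endpoints)."""
    return (orientation(a, b, c) * orientation(a, b, d) < 0 and
            orientation(c, d, a) * orientation(c, d, b) < 0)

def count_crossings(positions, edges):
    """Count pairs of edges that cross, skipping pairs that share a node."""
    total = 0
    for (u1, v1), (u2, v2) in combinations(edges, 2):
        if {u1, v1} & {u2, v2}:
            continue
        if segments_cross(positions[u1], positions[v1], positions[u2], positions[v2]):
            total += 1
    return total

edges = [("a", "c"), ("b", "d")]
square = {"a": (0, 0), "b": (1, 0), "c": (1, 1), "d": (0, 1)}     # links are diagonals
untangled = {"a": (0, 0), "b": (1, 0), "c": (0, 1), "d": (1, 1)}  # c and d swapped
```

A layout algorithm could use `count_crossings` as a score to compare candidate placements.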
Linked Views for Visual Exploration
• The basic problem in visualization is still the physical limitation of the 2-D presentation
space of paper and computer screens.
• Four approaches address this problem and help overcome the restrictions of 2-D:
1. Create a virtual reality environment or a pseudo-3-D environment by rotation that
is capable of portraying higher-dimensional data at least in a 3-D setting.
2. Project high-dimensional data onto a 2-D coordinate system by using a data
reduction method such as principal component analysis, projection pursuit,
multidimensional scaling, or correspondence analysis.
3. Use a nonorthogonal coordinate system such as parallel coordinates which is less
restricted by the two-dimensionality of paper.
4. Link low-dimensional displays.
Demo: When you click on a point in the scatter plot, the histogram updates to show
the distribution of the corresponding x-coordinate.
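The logic behind such a demo can be sketched without any plotting library. The data points, the click location, and the function names are hypothetical; an actual demo would also need drawing and event handling.

```python
def nearest_point(points, click):
    """Index of the data point closest to the clicked (x, y) location."""
    return min(range(len(points)),
               key=lambda i: (points[i][0] - click[0]) ** 2 + (points[i][1] - click[1]) ** 2)

def bin_of(value, low, high, bins):
    """Equal-width bin index for a value; the top edge falls in the last bin."""
    return min(int((value - low) / ((high - low) / bins)), bins - 1)

def histogram(values, low, high, bins):
    """Bin counts over [low, high] using the same binning rule as bin_of."""
    counts = [0] * bins
    for v in values:
        counts[bin_of(v, low, high, bins)] += 1
    return counts

points = [(1.0, 2.0), (2.5, 1.0), (3.5, 4.0), (3.9, 0.5)]  # hypothetical data
xs = [x for x, _ in points]

# A click in the scatter plot selects a point; the histogram of x is then
# redrawn with the bin containing that point's x-coordinate highlighted.
selected = nearest_point(points, click=(3.4, 3.8))
counts = histogram(xs, low=0.0, high=4.0, bins=4)
highlighted_bin = bin_of(points[selected][0], 0.0, 4.0, 4)
```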
• Linking procedures become particularly effective when datasets are complex, i.e., they
are large (many observations) and/or high-dimensional (many variables), consist of a
mixture of categorical and continuous variables, and have a lot of incomplete
observations (missing values).
• Linked views in data visualization refer to the coordination or connection between multiple graphical
displays, allowing users to interactively explore and analyze data from different perspectives
simultaneously. The advantages of linked views are:
• Ease of Graphical Displays:
• Enhance the simplicity and clarity of graphical displays.
• Users can easily grasp complex data relationships by observing multiple visualizations simultaneously.
• The interconnected nature helps in presenting information in a visually coherent manner, making it
easier for users to interpret and understand the data.
• Speed of Exploration:
• Facilitate a faster and more efficient exploration of data.
• Users can quickly navigate between different visualizations to uncover patterns, trends, and
relationships.
• Flexibility in Portraying Different Aspects:
• Allows users to portray various aspects of the data seamlessly.
• Comparative Analysis:
• Users can compare not only within the same visualization type but also across different types, enabling a
more comprehensive understanding of the data.
• Interactivity:
• Allows users to dynamically modify parameters or filters. This interactivity empowers users to focus on
specific subsets of the data or zoom in on interesting patterns, enhancing the depth of exploration.
Theoretical Structures for Linked Views
• Linking views means that two or more plots share and exchange information with each other.
• To achieve the exchange of information, a linking procedure needs to establish a
relationship between two or more plots.
• Once a relation between two plots has been established, the question is which information
is shared and how the sharing can be realized.
• To explore the wide range of possible linking schemes and structures, a data analysis display D
is separated into its components: a frame F, a type with its associated set of graphical
elements G and its set of scale-representing axes sG, a model X with its scale sX, and a
sample population Ω.
1. Frame (F): The frame is the outer boundary or container that defines the spatial limits of the
data display. It provides a structure within which the various graphical elements and
components are organized.
2. Type and Graphical Elements (G):
• The type of the data display refers to its overall format or structure, such as bar chart, line
chart, scatter plot, etc.
• The graphical elements (G) are the individual components or marks used to represent
data points within the chosen type. For example, in a bar chart, the bars themselves would
be the graphical elements.
3. Scale-Representing Axes (sG):
• The scale-representing axes are the axes on the display that represent the scales for the
variables being measured.
4. Model (X) and its Scale (sX):
• The model (X) refers to the mathematical or statistical model used to analyze the data.
• The scale (sX) associated with the model represents the range or values of the variables
in the model.
5. Sample Population (Ω):
• The set of data points or observations that are being analyzed and displayed.
• It is the dataset from which the information for the display is derived.
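The five components can be written down as a small record type. This is a sketch under assumptions: the field names, the field types, and the example histogram are hypothetical and only mirror the component list above.

```python
from dataclasses import dataclass

@dataclass
class Display:
    frame: tuple               # F: (width, height) of the drawing area
    graphical_elements: list   # G: e.g. one entry per bar or point
    axis_scales: dict          # sG: axis name -> (min, max) shown on screen
    model: dict                # X: e.g. the variable and its bin edges
    model_scale: dict          # sX: variable name -> (min, max) in the model
    population: list           # Omega: identifiers of the observations

# Hypothetical histogram of an "age" variable for three observations.
histogram_display = Display(
    frame=(400, 300),
    graphical_elements=["bar0", "bar1"],
    axis_scales={"x": (0, 10), "y": (0, 5)},
    model={"variable": "age", "bin_edges": [0, 5, 10]},
    model_scale={"age": (0, 10)},
    population=[101, 102, 103],
)
```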
• In compact form: D = (F, (G, sG), (X, sX), Ω).
• The pair ((X, sX), Ω) is the data part and (F, (G, sG)) is the plotting part.
• The linking structure controls the exchange and transfer of information between
different plots.
• The concept introduces the idea of an "active plot" and "passive plots."
• The active plot is the one from which changes or messages are initiated.
• The passive plots, on the other hand, receive these messages and respond
accordingly.
• This distinction is analogous to the sender-receiver relationship in communication
theory.
• The definition of data displays and the abstract concept of linking open the possibility
of defining a linking structure as a set of relations between any two components of the
two displays.
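The active/passive distinction can be sketched as simple message passing. The class name `Plot` and its methods are hypothetical; the sketch assumes that the message is a set of selected case identifiers.

```python
class Plot:
    """A plot that can send selection messages to linked plots and receive them."""

    def __init__(self, name):
        self.name = name
        self.linked = []          # plots that listen to this plot
        self.highlighted = set()  # case identifiers currently highlighted

    def link_to(self, other):
        self.linked.append(other)

    def select(self, case_ids):
        """Act as the active plot: update own state, then notify linked plots."""
        self.highlighted = set(case_ids)
        for plot in self.linked:
            plot.receive(self.highlighted)

    def receive(self, case_ids):
        """Act as a passive plot: respond to a message without re-sending it."""
        self.highlighted = set(case_ids)

scatter = Plot("scatter")
bars = Plot("bars")
scatter.link_to(bars)
bars.link_to(scatter)

scatter.select({3, 7})   # scatter is active here, bars is passive
```

Which plot is active is decided per interaction: whichever plot the user acts on sends, and the others receive.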
• A general view of possible linking structures between the active plot D1 and the passive plot
D2 assumes that information sharing is only possible among identical plot layers, where
D1 = (F1, (G1, sG1), (X1, sX1), Ω1) and D2 = (F2, (G2, sG2), (X2, sX2), Ω2).
• Four types of linking structures:
• Linking frames,
• Linking types,
• Linking models, and
• Linking sample populations
• At the type and at the model level, the linking structures can be further differentiated
into data linking and scale linking, the latter being used when scales or scale-representing
objects are involved in the linking process.
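A very simple form of scale linking is to give two plots a common axis range so that they can be compared directly. The sketch below assumes this reading; the function name `linked_scale` and the two groups are hypothetical.

```python
def linked_scale(*value_lists):
    """Common (min, max) range covering the data of every linked plot."""
    lows = [min(values) for values in value_lists]
    highs = [max(values) for values in value_lists]
    return min(lows), max(highs)

group_a = [2.0, 3.5, 5.0]   # hypothetical data shown in plot 1
group_b = [4.0, 8.5]        # hypothetical data shown in plot 2

# Both plots would draw their axis over this shared range.
shared_range = linked_scale(group_a, group_b)
```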
• Sharing and exchanging information between two plots can now be resolved in two different
ways:
• A direct linking scheme from one layer in display D1 to the corresponding layer in display
D2.
• A combined scheme that first propagates the information internally in the active plot to
the sample population layer; the sample population link then connects the two
displays, and the linked information is propagated internally in the passive plot to the
relevant layers.
• Because the combined scheme always passes through the sample population layer, sample
population linking is the most widely used and most important linking structure.
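The combined scheme can be sketched with a bar chart as the active plot and a histogram as the passive plot. The cases, variable names, and function names are hypothetical; the point is only the route from graphical element to cases and back to graphical elements.

```python
# Hypothetical sample population: case id -> observed values.
cases = {
    1: {"species": "A", "length": 1.2},
    2: {"species": "B", "length": 2.7},
    3: {"species": "A", "length": 2.1},
    4: {"species": "B", "length": 0.4},
}

def bar_to_cases(category):
    """Active plot: propagate a selected bar down to the sample population."""
    return {cid for cid, row in cases.items() if row["species"] == category}

def cases_to_bins(case_ids, bin_width=1.0):
    """Passive plot: propagate linked cases up to highlighted counts per bin."""
    counts = {}
    for cid in sorted(case_ids):
        b = int(cases[cid]["length"] // bin_width)
        counts[b] = counts.get(b, 0) + 1
    return counts

selected_cases = bar_to_cases("A")         # internal propagation in the active plot
highlight = cases_to_bins(selected_cases)  # population link, then the passive plot
```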
Visualization Techniques for Linked Views
• Replacement
• Overlaying
• Repetition
• Special Forms
• Replacement
• In replacement mode, when old information is replaced by new information, there
is a risk of losing valuable insights, especially when it comes to subsetting and
conditioning approaches.
• This loss is particularly notable in the context of marginal distributions.
[marginal distribution gives the probabilities of various values of the variables
in the subset without reference to the values of the other variables]
• The user can only compare the current image with a mental copy of the previous
image and hence the comparison might get distorted.
• Especially in the exploratory stage of data analysis for which interactive graphics
are designed, it is helpful to keep track of changing scenarios and the different
plot versions.
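The marginal distribution mentioned above can be computed from joint counts by summing out the other variable. The counts and variable names below are made up for illustration.

```python
# Hypothetical joint counts for two categorical variables (sex, smoker).
joint = {
    ("F", "yes"): 10, ("F", "no"): 30,
    ("M", "yes"): 20, ("M", "no"): 40,
}

def marginal(joint_counts, axis):
    """Sum out the other variable and normalise the totals to probabilities."""
    totals = {}
    for key, count in joint_counts.items():
        totals[key[axis]] = totals.get(key[axis], 0) + count
    n = sum(totals.values())
    return {level: count / n for level, count in totals.items()}

p_sex = marginal(joint, axis=0)      # distribution of sex, ignoring smoker
p_smoker = marginal(joint, axis=1)   # distribution of smoker, ignoring sex
```

In replacement mode, a plot that switches to showing only a subset no longer displays these marginal values, which is the loss described above.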
Overlaying
• A common strategy for looking at conditional distributions in area plots.
• The conditional distribution of a variable is the distribution of that variable given
specific conditions or values of other variables. It provides insights into how the
distribution of one variable changes when another variable takes on a certain
value or falls within a certain range.
• Provides a framework for comparing conditional and marginal distributions.
• Overlaying creates two problems:
• Restricted freedom of parameter choice for the selected subset, since the
plot parameters are inherited from the original plot;
• Occlusion/overplotting: part of the original display is hidden by the newly overlaid plot.
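Overlaying in a histogram can be sketched as counting the selected subset in the bins of the original plot. The data and names are hypothetical; note that the subset has no say in the bin edges, which illustrates the first problem above.

```python
def bin_counts(values, edges):
    """Counts per bin for half-open bins [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    return counts

ages = [22, 25, 31, 38, 41, 47, 52]                       # hypothetical full sample
smoker = [True, False, True, False, False, True, False]   # hypothetical selection
edges = [20, 30, 40, 50, 60]                              # bins of the original plot

all_counts = bin_counts(ages, edges)
subset_counts = bin_counts([a for a, s in zip(ages, smoker) if s], edges)

# Each pair is (original bar height, overlaid bar height) for one bin.
overlay = list(zip(all_counts, subset_counts))
```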