Parallel Coordinates

Prepared by
P. Swarna Gowsalya
Assistant Professor
AI & DS
Definition
• A parallel coordinate plot is a graphical method in which each observation (data point) is drawn as a line traversing a series of parallel axes, each axis corresponding to a specific variable or dimension.
• This arrangement allows the exploration of relationships, trends and variations that might be obscured in raw data.
• Parallel coordinates is a visualization technique used to display high-dimensional data in a two-dimensional space.
• In a parallel coordinates plot, each variable is represented by a vertical axis, and each data point is represented by a line that connects its values on every axis.
• The lines are often colored according to a chosen variable (for example, a class label), which makes patterns and relationships between variables easier to identify.


Components of a Parallel Coordinate Plot
A parallel coordinate plot comprises two fundamental components:
• Parallel axes
• Data lines
These components work in tandem to create a visually informative
representation of multivariate data.
Parallel Axes
• The parallel axes in a parallel coordinate plot are the vertical lines placed side by side across the plot.
• Each axis corresponds to a specific variable or dimension within the dataset.
• These variables can represent diverse attributes, such as time, temperature, pressure or any other measurable quantity.
• Each axis serves as a reference for one data feature, allowing direct comparisons between data points.
Data Lines
• The data lines in a parallel coordinate plot are the polylines that traverse the parallel axes.
• Each data point in the dataset is represented by one of these lines.
• The position where a data line intersects a particular axis corresponds to the value of that axis's variable for the data point.
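To make this mapping concrete, here is a minimal Python sketch. It assumes each row is one data point and each column one axis, and it assumes every axis is min-max scaled independently to [0, 1] (a common choice, not something these slides prescribe). The helper names `normalize_axes` and `polylines` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def normalize_axes(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Min-max scale each column (axis) independently to the range [0, 1]."""
    n_axes = len(rows[0])
    lows = [min(row[j] for row in rows) for j in range(n_axes)]
    highs = [max(row[j] for row in rows) for j in range(n_axes)]
    scaled = []
    for row in rows:
        scaled.append([
            # A constant column has no spread, so place it at mid-height.
            (row[j] - lows[j]) / (highs[j] - lows[j]) if highs[j] > lows[j] else 0.5
            for j in range(n_axes)
        ])
    return scaled


def polylines(rows: Sequence[Sequence[float]]) -> List[List[Point]]:
    """One polyline per data point: vertex j lies on axis x = j at the scaled value."""
    return [[(float(j), y) for j, y in enumerate(row)] for row in normalize_axes(rows)]


# Hypothetical records: (time, temperature, pressure)
rows = [[0.0, 20.0, 1.0], [5.0, 25.0, 3.0], [10.0, 30.0, 2.0]]
for line in polylines(rows):
    print(line)
```

Each inner list can be handed to any line-drawing routine; the x coordinate is simply the axis index.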
Example 1:
Example 2:
Example 3:
https://www.highcharts.com/demo/highcharts/parallel-coordinates
Scenarios Where Parallel Coordinate Plots Are Particularly Useful

• Multidimensional data exploration
• Feature analysis
• Anomaly detection
• Cluster identification
• Dimensionality reduction validation
• Business intelligence
• Scientific data visualization
Advantages
• Multidimensional Visualization: Allows the visualization of high-dimensional data in a two-dimensional space.
• Pattern Recognition: Makes it easier to identify patterns, trends and correlations among multiple variables.
• Comparative Analysis: Facilitates the comparison of data points across multiple dimensions.
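One common way to read correlations in parallel coordinates: between two adjacent axes, lines that stay roughly parallel suggest a positive relationship, while many crossing lines (an X pattern) suggest a negative one. The sketch below counts crossings between two axes; the function name `crossings` and the sample data are hypothetical. Because only adjacent axes are compared this way, the order of the axes affects which relationships are visible.

```python
from itertools import combinations
from typing import Sequence


def crossings(rows: Sequence[Sequence[float]], a: int, b: int) -> int:
    """Count pairs of data lines that cross between axes a and b.

    Two lines cross between the axes exactly when their order on axis a is the
    reverse of their order on axis b. Per-axis min-max scaling preserves order,
    so raw values give the same count as scaled ones. Ties are not counted.
    """
    return sum(
        1
        for p, q in combinations(rows, 2)
        if (p[a] - q[a]) * (p[b] - q[b]) < 0
    )


# Hypothetical data: axis 1 rises with axis 0, axis 2 falls as axis 1 rises.
rows = [[1, 10, 100], [2, 20, 90], [3, 30, 80], [4, 40, 70]]
print(crossings(rows, 0, 1))  # 0 crossings: positive relationship
print(crossings(rows, 1, 2))  # 6 crossings: every pair crosses
```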
Disadvantages

• Scalability: Can become cluttered and hard to interpret with a large number of variables or data points.
• Overlapping Lines: With many data points, lines can overlap, making it difficult to distinguish individual observations.
• Normalization: Scaling axes individually can sometimes distort the relationships between variables.
Trellis Display
• A trellis display shows any one of a large variety of 1D, 2D and 3D plot types in a trellis layout of panels, where each panel displays the selected plot type for one level or interval of one or more additional discrete or continuous conditioning variables.
• Panels are laid out into columns, rows and pages.
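The column/row/page layout can be sketched as simple index arithmetic. This is a minimal illustration assuming panels fill each page row by row from the top-left; tools differ here (R's lattice package, for example, fills from the bottom-left unless `as.table = TRUE`). The function name `panel_positions` is hypothetical.

```python
from typing import List, Tuple


def panel_positions(n_panels: int, n_cols: int, n_rows: int) -> List[Tuple[int, int, int]]:
    """Assign panel i a (page, row, column) slot.

    Each page holds n_cols * n_rows panels; panels fill a page row by row,
    left to right, starting at the top-left, then overflow onto the next page.
    """
    per_page = n_cols * n_rows
    positions = []
    for i in range(n_panels):
        page, slot = divmod(i, per_page)
        row, col = divmod(slot, n_cols)
        positions.append((page, row, col))
    return positions


# Hypothetical: 10 panels in a 3-column x 2-row layout need two pages.
for i, pos in enumerate(panel_positions(10, n_cols=3, n_rows=2)):
    print(i, pos)
```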

• Mapping of variables
1. Axis variable
1.1 Mapped to one of the coordinates in the panels
2. Conditioning variable
2.1 Mapped to a horizontal bar at the top of each panel, representing one of its levels (discrete variable) or intervals (continuous variable)
2.2 Continuous variables have to be divided into intervals
2.3 The intervals usually overlap slightly, which makes interrelationships easier to see
3. Superposed variable
3.1 Mapped to the color or symbol of points in the panels
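Items 2.2 and 2.3 (dividing a continuous conditioning variable into slightly overlapping intervals) can be sketched as follows. This version assumes equal-width intervals widened by a fixed fraction on each side; R's lattice `equal.count` instead builds intervals holding roughly equal numbers of observations. The names `overlapping_intervals` and `assign_panels` and the `ages` data are hypothetical.

```python
from typing import List, Sequence, Tuple


def overlapping_intervals(values: Sequence[float], n: int,
                          overlap: float = 0.1) -> List[Tuple[float, float]]:
    """Cut the range of a continuous conditioning variable into n equal-width
    intervals, then widen each one by `overlap` x width on both sides
    (clipped to the data range) so neighbouring intervals overlap."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n
    pad = overlap * width
    return [
        (max(lo, lo + k * width - pad), min(hi, lo + (k + 1) * width + pad))
        for k in range(n)
    ]


def assign_panels(values: Sequence[float],
                  intervals: Sequence[Tuple[float, float]]) -> List[List[int]]:
    """For each interval (one panel), list the indices of values inside it.
    A value in an overlap region appears in two panels."""
    return [[i for i, v in enumerate(values) if a <= v <= b] for a, b in intervals]


ages = [18, 22, 33, 35, 41, 48, 63]  # hypothetical conditioning variable
intervals = overlapping_intervals(ages, n=3, overlap=0.1)
print(intervals)
print(assign_panels(ages, intervals))
```

Here the ages 33 and 48 fall into two adjacent panels each, which is exactly the overlap described in 2.3.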
Characteristics
• Grid Layout: The panels are arranged in a grid layout, making it easy to compare across different subsets of the data.
• Conditioning Variables: One or more variables are used to define the subsets of the data displayed in each panel.
• Consistency: Each panel uses the same scales and graphical representation, ensuring consistency and comparability.
Types of trellis display
• Trellis Plots for Quantitative Variables
• Trellis Plots for Categorical Variables
• Trellis Plots for Y versus X
• Trellis Plots for Z versus X and Y
Trellis Plots for Quantitative Variables
• To display the characteristics of a single quantitative variable, the Numeric Y trellis plot may be used.
• It creates box-and-whisker plots, frequency histograms and normal probability plots.
• For example, the plot below contains histograms showing the distribution of weight for 514 attendees at selected fitness centers in California.
• The sample is segmented according to the age and gender of the attendees. Controls at the top of the window let the analyst change the histogram cells (bins).
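The "same scales in every panel" idea can be sketched for histograms: compute one set of bin edges from all the data, then count each segment's values against those shared edges. This is a minimal sketch assuming records of the form (age_group, gender, weight); the records below are made up and are not the 514-attendee sample, and the function names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def shared_edges(values: Sequence[float], n_bins: int) -> List[float]:
    """Equal-width bin edges over the full data range, reused by every panel."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    return [lo + k * step for k in range(n_bins)] + [hi]


def histogram(values: Sequence[float], edges: Sequence[float]) -> List[int]:
    counts = [0] * (len(edges) - 1)
    for v in values:
        # Bins are [left, right); the last bin also includes its right edge.
        k = min(bisect_right(edges, v) - 1, len(counts) - 1)
        counts[k] += 1
    return counts


def trellis_histograms(records: Sequence[Tuple[str, str, float]], n_bins: int):
    """records: (age_group, gender, weight). One histogram per (age_group, gender)
    panel; all panels share the same bin edges, so they are directly comparable."""
    edges = shared_edges([w for _, _, w in records], n_bins)
    groups: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for age, gender, weight in records:
        groups[(age, gender)].append(weight)
    return edges, {key: histogram(ws, edges) for key, ws in sorted(groups.items())}


records = [("18-34", "F", 55), ("18-34", "F", 60), ("18-34", "M", 80),
           ("35+", "M", 90), ("35+", "F", 65), ("35+", "M", 75)]
edges, panels = trellis_histograms(records, n_bins=3)
for key, counts in panels.items():
    print(key, counts)
```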
Trellis Plots for Categorical Variables
• To display the characteristics of a single categorical variable, the Categorical Y trellis plot may be used.
• It creates bar charts, pie charts and donut plots.
• The plot below shows the percentage of males and females attending the fitness centers in the sample illustrated above, segmented by age.
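The quantity behind each pie or bar panel is a within-panel percentage. A minimal sketch, assuming (age_group, gender) records; the data and the name `percentages_by_panel` are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Sequence, Tuple


def percentages_by_panel(records: Sequence[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """records: (age_group, gender). Returns, for each age-group panel, the
    percentage of each gender within that panel."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for age, gender in records:
        counts[age][gender] += 1
    return {
        age: {g: 100.0 * n / sum(c.values()) for g, n in c.items()}
        for age, c in counts.items()
    }


records = [("18-34", "F"), ("18-34", "F"), ("18-34", "M"), ("35+", "M")]
print(percentages_by_panel(records))
```

Percentages are computed within each panel, so every panel's values sum to 100.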
Trellis Plots for Y versus X
• To display how the relationship between 2 quantitative variables changes with the values of 1 or 2 conditioning variables, the Y versus X trellis plot may be used.
• The plot below shows the relationship between height and weight for individuals in the fitness center example.
• Linear regression lines have been fit to each segment. Nonlinear regression models and nonparametric smoothers may be plotted instead.
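Fitting one regression line per segment amounts to grouping the records by panel and running ordinary least squares separately in each group. A minimal sketch, assuming (segment, height, weight) records; the data and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares fit y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_per_panel(records: Sequence[Tuple[str, float, float]]) -> Dict[str, Tuple[float, float]]:
    """records: (segment, x, y). Fits a separate (intercept, slope) in each panel."""
    xs: Dict[str, List[float]] = defaultdict(list)
    ys: Dict[str, List[float]] = defaultdict(list)
    for seg, x, y in records:
        xs[seg].append(x)
        ys[seg].append(y)
    return {seg: fit_line(xs[seg], ys[seg]) for seg in xs}


# Hypothetical (segment, height_cm, weight_kg) records
records = [("F", 150, 50), ("F", 160, 55), ("F", 170, 60),
           ("M", 170, 70), ("M", 180, 80)]
print(fit_per_panel(records))
```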
Trellis Plots for Z versus X and Y

• To display how the relationship between 3 quantitative variables changes with the values of 1 or 2 conditioning variables, the Z versus X and Y trellis plot may be used.
• The plot below shows the relationship between height, bicep size and weight for individuals in the fitness center example.
• A regression model has been fit to the data in each section of the plot.
• Bubble charts or a LOWESS smooth may be plotted instead.
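A simple regression model for three quantitative variables is a plane z = a + b·x + c·y, fitted by least squares in each panel. The sketch below assumes, for illustration, weight as z with height as x and bicep size as y; the slides do not say which model was used. It solves the normal equations with a small hand-written Gaussian elimination so that it needs only the standard library. The function names and data are hypothetical.

```python
from typing import List, Sequence


def solve(matrix: List[List[float]], rhs: List[float]) -> List[float]:
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [val] for row, val in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))) / aug[r][r]
    return x


def fit_plane(xs: Sequence[float], ys: Sequence[float], zs: Sequence[float]) -> List[float]:
    """Least squares fit z = a + b*x + c*y via the normal equations (A^T A) beta = A^T z."""
    design = [(1.0, x, y) for x, y in zip(xs, ys)]
    ata = [[sum(r[i] * r[j] for r in design) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * z for r, z in zip(design, zs)) for i in range(3)]
    return solve(ata, atz)


# One hypothetical panel whose points lie exactly on z = 1 + 2x + 3y
panels = {"panel A": ([0, 1, 0, 1, 2], [0, 0, 1, 1, 1], [1, 3, 4, 6, 8])}
for name, (xs, ys, zs) in panels.items():
    print(name, fit_plane(xs, ys, zs))
```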
Advantages

• Multidimensional Visualization: Conditioning variables let several variables be shown at once in a two-dimensional grid of panels.
• Pattern Recognition: Makes it easier to see how patterns, trends and correlations change from one subset of the data to another.
• Comparative Analysis: Facilitates comparison across subsets, because every panel uses the same scales.
Disadvantages
• Scalability: The number of panels grows with the number of conditioning levels or intervals, so each panel becomes small and the display hard to interpret.
• Overlapping Points: With many data points, marks within a panel can overlap, making it difficult to distinguish individual observations.
• Shared Scales: Because all panels use the same scales, a subset with a narrow range of values can look compressed within its panel.
Scatterplot Matrices

• Many people are familiar with scatterplots.
• These simple plots are used to compare two-dimensional data by plotting points on an xy-Cartesian plane.
• Three-dimensional data can be compared using a three-dimensional scatterplot, which uses the xyz-Cartesian space instead.
• For data with more than three dimensions, these plots must be expanded to a matrix.
Cont’d
• A scatterplot matrix is an n × n matrix whose rows and columns are both labeled by the n dimensions.
• Each cell (i, j) in the matrix is a scatterplot with dimension i on the y-axis and dimension j on the x-axis.
• Because all n dimensions appear on both the rows and the columns, the matrix is symmetric across the diagonal.
• This means that cell (j, i) shows the same scatterplot as cell (i, j), except that the two axes are swapped.
• Scatterplot matrices work well for comparing a large number of records and dimensions.
• However, these matrices only provide information about how pairs of dimensions relate.
• Comparing three dimensions requires understanding how each of the three relates to the other two jointly, which no single pairwise cell shows.
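The cell layout and its symmetry can be checked with a few lines of Python. This sketch assumes the data are stored column-wise (one list of values per dimension); the function names and data are hypothetical. It also lists the n(n-1)/2 cells above the diagonal, which are the only distinct off-diagonal plots.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def scatterplot_matrix(dims: Sequence[Sequence[float]]) -> List[List[List[Tuple[float, float]]]]:
    """dims[d] holds the values of dimension d for every record.
    Cell (i, j) is a list of (x, y) points with dimension j on the x-axis and
    dimension i on the y-axis."""
    n = len(dims)
    return [[list(zip(dims[j], dims[i])) for j in range(n)] for i in range(n)]


def distinct_pairs(n: int) -> List[Tuple[int, int]]:
    """Cells above the diagonal; each cell below it mirrors one of these."""
    return list(combinations(range(n), 2))


dims = [[1, 2, 3], [4, 5, 6], [9, 8, 7]]  # three hypothetical dimensions
cells = scatterplot_matrix(dims)
# Cell (1, 0) is cell (0, 1) with its x and y swapped.
print(cells[1][0] == [(y, x) for x, y in cells[0][1]])
print(distinct_pairs(3))
```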


Hyperboxes
• Like the scatterplot matrix, a hyperbox involves pairwise 2D plots of variables.
• A hyperbox is a 2D depiction of a k-D box.
1. It is a very constrained picture, starting with k line segments radiating from a point, all contained within an angle of less than 180°.
2. The lengths of the line segments and the angles between them are arbitrary, although they should ideally follow the banking to 45° principle (a line segment oriented at 45° or -45° best conveys the linear properties of a curve).
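The first constraint (k segments radiating from one point, all within an angle below 180°) can be sketched as follows. Since lengths and angles are otherwise arbitrary, this sketch simply assumes equal lengths and evenly spaced directions over a chosen span; it does not implement banking to 45°. The function name `radiating_segments` and its defaults are hypothetical.

```python
import math
from typing import List, Tuple


def radiating_segments(k: int, span_deg: float = 150.0,
                       length: float = 1.0) -> List[Tuple[float, float]]:
    """Endpoints of k segments radiating from the origin, with directions spread
    evenly over span_deg degrees; the span must stay below 180 degrees."""
    if not 0.0 < span_deg < 180.0:
        raise ValueError("span_deg must be strictly between 0 and 180")
    step = span_deg / (k - 1) if k > 1 else 0.0
    angles = [math.radians(i * step) for i in range(k)]
    return [(length * math.cos(a), length * math.sin(a)) for a in angles]


for x, y in radiating_segments(5):
    print(f"({x:.3f}, {y:.3f})")
```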
Hyperbox (Cont’d)
Properties
• Contains k² lines and k(k-1)/2 faces
e.g. a 5-D hyperbox has 5² = 25 lines and 5(5-1)/2 = 10 faces
• For each line in a hyperbox, there are k-1 other lines with the same length and orientation; lines with the same length and orientation form a direction set.
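Both counts follow from the properties above: each of the k radiating directions gives one direction set, each set holds a line plus its k-1 parallel copies, and each face is one pairwise 2D plot, so there is one face per unordered pair of variables. As a worked count:

```latex
\text{lines} = \underbrace{k}_{\text{direction sets}} \times \underbrace{k}_{\text{lines per set}} = k^{2},
\qquad
\text{faces} = \binom{k}{2} = \frac{k(k-1)}{2}

k = 5:\quad \text{lines} = 5^{2} = 25, \qquad \text{faces} = \frac{5 \cdot 4}{2} = 10
```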
Thank You
