Da Unit-5
(Professional Elective - I)
Subject Code: CS513PE
NOTES MATERIAL
UNIT 5 - Data Visualization
DEPARTMENT OF CSE
VIGNAN INSTITUTE OF TECHNOLOGY & SCIENCE
DESHMUKHI
UNIT - V
Data Visualization
Syllabus:
Data Visualization: Pixel-Oriented Visualization Techniques, Geometric Projection
Visualization Techniques, Icon-Based Visualization Techniques, Hierarchical Visualization
Techniques, Visualizing Complex Data and Relations.
Unit-5 Objectives:
1. To explore Pixel-Oriented Visualization Techniques
2. To learn Geometric Projection Visualization Techniques
3. To explore Icon-Based Visualization Techniques
4. To learn Hierarchical Visualization Techniques
5. To understand Visualizing Complex Data and Relations
Unit-5 Outcomes:
After completion of this course, students will be able to:
1. Describe the Pixel-Oriented Visualization Techniques
2. Demonstrate Geometric Projection Visualization Techniques
3. Analyze the Icon-Based Visualization Techniques
4. Explore the Hierarchical Visualization Techniques
5. Compare techniques for Visualizing Complex Data and Relations
Data Visualization
Data visualization is the art and practice of gathering, analyzing, and graphically
representing empirical information.
Data visualizations are sometimes called information graphics, or even just charts and
graphs.
The goal of visualizing data is to tell the story in the data.
Telling the story is predicated on understanding the data at a very deep level and
gathering insight from comparisons of data points in the numbers.
Gain insight into an information space by mapping data onto graphical primitives
Provide qualitative overview of large data sets
Search for patterns, trends, structure, irregularities, and relationships among
data.
Help find interesting regions and suitable parameters for further quantitative
analysis.
Provide a visual proof of computer representations derived.
To save space and show the connections among multiple dimensions, space
filling is often done in a circle segment.
Line Plot
A line plot connects the values of a series of data points with straight lines.
The plot may seem very simple, but it has many applications, not only in machine
learning but in many other areas.
For example, it is used to analyze the performance of a model through the ROC curve
and the area under it (AUC).
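Because a line plot joins consecutive points with straight segments, the area under it
can be computed with the trapezoidal rule, which is one common way of obtaining the area
under a ROC curve. The sketch below assumes the curve is already available as a list of
points; the function name `area_under_line` and the sample rates are hypothetical and
are not taken from these notes.

```python
def area_under_line(xs, ys):
    """Area under a line plot whose points are joined by straight segments."""
    area = 0.0
    for i in range(1, len(xs)):
        width = xs[i] - xs[i - 1]
        mean_height = (ys[i] + ys[i - 1]) / 2
        area += width * mean_height  # area of one trapezoid
    return area


# Hypothetical ROC points: false positive rate (x) against true positive rate (y).
fpr = [0.0, 0.1, 0.4, 1.0]
tpr = [0.0, 0.6, 0.9, 1.0]

auc = area_under_line(fpr, tpr)
print(f"AUC = {auc:.3f}")
```

The same x and y lists could be passed to a plotting library to draw the curve itself.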
Bar Plot
This is one of the most widely used plots. We see it not only in data analysis but
also wherever trends are analyzed in many other fields.
We can present the data in a visually appealing plot and convey the details to others
in a straightforward way.
This plot is simple and clear, but it is not used very frequently in data science
applications.
Stacked Bar Graph:
Unlike a Multi-set Bar Graph, which displays its bars side-by-side, a Stacked Bar
Graph segments its bars. Stacked Bar Graphs are used to show how a larger
category is divided into smaller categories and how each part relates to the
total amount. There are two types of Stacked Bar Graphs:
Simple Stacked Bar Graphs place each value for the segment after the previous
one. The total value of the bar is all the segment values added together. Ideal for
comparing the total amounts across each group/segmented bar.
100% Stacked Bar Graphs show the percentage-of-the-whole of each group and are
plotted by the percentage of each value to the total amount in each group. This
makes it easier to see the relative differences between quantities in each group.
One major flaw of Stacked Bar Graphs is that they become harder to read the
more segments each bar has. Also comparing each segment to each other is
difficult, as they're not aligned on a common baseline.
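The two types of Stacked Bar Graph differ only in how the segment values are scaled
before stacking. The following is a minimal sketch of that arithmetic; the data and the
function names `simple_stack` and `percent_stack` are hypothetical.

```python
# Hypothetical data: one bar per quarter, one segment per product.
data = {
    "Q1": {"A": 30, "B": 50, "C": 20},
    "Q2": {"A": 45, "B": 15, "C": 90},
}


def simple_stack(segments):
    """Return (label, start, end) for each segment, placed after the previous one."""
    layout, start = [], 0
    for label, value in segments.items():
        layout.append((label, start, start + value))
        start += value
    return layout


def percent_stack(segments):
    """Scale the segments so that every bar has a total height of 100 percent."""
    total = sum(segments.values())
    scaled = {label: 100 * value / total for label, value in segments.items()}
    return simple_stack(scaled)


for bar, segments in data.items():
    print(bar, simple_stack(segments), percent_stack(segments))
```

In both layouts only the first segment starts at zero. That is why the other segments
are hard to compare across bars.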
Scatter Plot
It is one of the most commonly used plots for visualizing simple data in
machine learning and data science.
The plot represents each point in the dataset with respect to any 2 or 3 features
(columns).
Scatter plots are available in both 2-D and 3-D. The 2-D scatter plot is the more
common one; we primarily use it to look for patterns, clusters, and separability in
the data.
Colors can be assigned to the data points according to the target column, i.e., each
point is colored by its class label given in the dataset.
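Coloring by class label amounts to splitting the rows into one series per label and
giving each series its own color. The sketch below shows that grouping step only; the
rows, the palette, and the function name `scatter_series` are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (feature_1, feature_2, class_label).
rows = [
    (5.1, 3.5, "setosa"),
    (7.0, 3.2, "versicolor"),
    (4.9, 3.0, "setosa"),
    (6.4, 3.2, "versicolor"),
]

# Hypothetical color names, reused cyclically if there are more classes than colors.
PALETTE = ["blue", "orange", "green"]


def scatter_series(rows):
    """Group 2-D points by class label and attach one color per class."""
    groups = defaultdict(lambda: {"x": [], "y": []})
    for x, y, label in rows:
        groups[label]["x"].append(x)
        groups[label]["y"].append(y)
    series = {}
    for index, label in enumerate(sorted(groups)):
        series[label] = dict(groups[label], color=PALETTE[index % len(PALETTE)])
    return series


series = scatter_series(rows)
for label, points in series.items():
    print(label, points["color"], list(zip(points["x"], points["y"])))
```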
Box and Whisker Plot
This plot can be used to obtain more statistical details about the data.
The straight lines extending from the box to the maximum and minimum are called
whiskers.
Points that lie outside the whiskers are considered outliers.
The box plot also shows the 25th, 50th, and 75th percentiles (the first, second, and
third quartiles).
With the help of a box plot, we can also determine the interquartile range (IQR),
which spans the middle 50% of the data.
Box plots come under univariate analysis, which means that we are exploring the data
with only one variable.
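The quantities a box plot displays can be computed directly. The sketch below uses
`statistics.quantiles` from the Python standard library. The rule that whiskers stop at
1.5 x IQR beyond the box is a common convention and is assumed here; these notes do not
specify it. The sample values and the function name `box_plot_summary` are hypothetical.

```python
import statistics


def box_plot_summary(values, whisker=1.5):
    """Quartiles, IQR, whisker ends and outliers for one variable."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: points beyond 1.5 * IQR from the box are outliers.
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": outliers,
    }


values = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical sample
summary = box_plot_summary(values)
print(summary)
```

Different tools use slightly different quantile definitions, so the quartiles they
report for small samples may not match exactly.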
Pie Chart:
A pie chart shows how categories make up parts of a whole, i.e., the composition of
something at a single point in time. A pie chart represents numbers as percentages, and
the total of all segments must equal 100%.
Extensively used in presentations and offices, Pie Charts help show proportions and
percentages between categories by dividing a circle into proportional segments. Each
arc length represents the proportion of one category, while the full circle represents
the total sum of all the data, equal to 100%.
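The proportional division of the circle can be worked out as follows. This is a small
sketch with hypothetical category counts; `pie_slices` is an illustrative name.

```python
def pie_slices(counts):
    """Percentage of the whole and central angle (degrees) for each category."""
    total = sum(counts.values())
    slices = {}
    for label, value in counts.items():
        share = value / total
        slices[label] = {"percent": 100 * share, "angle_deg": 360 * share}
    return slices


counts = {"Rent": 500, "Food": 300, "Travel": 200}  # hypothetical data
slices = pie_slices(counts)
for label, s in slices.items():
    print(f"{label}: {s['percent']:.1f}% of the whole, {s['angle_deg']:.1f} degrees")
```

The percentages add up to 100 and the angles to 360, which matches the requirement
that the full circle represents the total.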
Donut Chart:
Marimekko Chart:
Chernoff Faces
A way to display variables on a two-dimensional surface by mapping them to facial
features, e.g., let x be eyebrow slant, y be eye size, z be nose length, etc.
The figure shows faces produced using 10 characteristics: head eccentricity,
eye size, eye spacing, eye eccentricity, pupil size, eyebrow slant, nose size,
mouth shape, mouth size, and mouth opening. Each characteristic is assigned one of
10 possible values.
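The mapping from data to a face can be sketched as a quantization step: each variable
is scaled to its range and assigned one of 10 levels for the facial characteristic it
controls. The order in which variables are paired with characteristics, the value
ranges, and the names `to_level` and `face_for` are assumptions made for illustration.
Drawing the face itself is not shown.

```python
FEATURES = [
    "head eccentricity", "eye size", "eye spacing", "eye eccentricity",
    "pupil size", "eyebrow slant", "nose size", "mouth shape",
    "mouth size", "mouth opening",
]


def to_level(value, low, high, levels=10):
    """Quantize a value from the range [low, high] to a level 1..levels."""
    if high == low:
        return 1
    fraction = min(1.0, max(0.0, (value - low) / (high - low)))
    return min(levels, int(fraction * levels) + 1)


def face_for(record, ranges):
    """Map one data record (10 variables) to levels of the 10 characteristics."""
    return {
        feature: to_level(value, low, high)
        for feature, value, (low, high) in zip(FEATURES, record, ranges)
    }


# Hypothetical record whose 10 variables all range from 0 to 100.
record = [0, 10, 55, 99, 100, 20, 30, 40, 60, 70]
ranges = [(0, 100)] * 10
face = face_for(record, ranges)
print(face)
```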
Stick Figure
Sunburst Diagram:
Also known as a Sunburst Chart, Ring Chart, Multi-level Pie Chart, Belt Chart, or
Radial Treemap.
This type of visualisation shows hierarchy through a series of rings that are
sliced for each category node. Each ring corresponds to a level in the hierarchy,
with the central circle representing the root node and the hierarchy moving
outwards from it.
Rings are sliced up and divided based on their hierarchical relationship to the
parent slice. The angle of each slice is either divided equally under its parent
node or can be made proportional to a value.
Colour can be used to highlight hierarchical groupings or specific categories.
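The ring layout described above can be sketched recursively: each node receives a ring
index (its depth) and an angular span inside the span of its parent. This sketch covers
the proportional-to-value variant only. The tree, its values, and the function names
are hypothetical, and leaf values are assumed to be positive.

```python
def subtree_value(node):
    """Value of a node: its own value for a leaf, else the sum of its children."""
    children = node.get("children", [])
    if not children:
        return node["value"]
    return sum(subtree_value(child) for child in children)


def sunburst_layout(node, start=0.0, end=360.0, depth=0, out=None):
    """Return (name, ring, start_angle, end_angle) for every node in the tree."""
    if out is None:
        out = []
    out.append((node["name"], depth, start, end))
    children = node.get("children", [])
    total = sum(subtree_value(child) for child in children)
    angle = start
    for child in children:
        # Each child gets a share of the parent's span proportional to its value.
        span = (end - start) * subtree_value(child) / total
        sunburst_layout(child, angle, angle + span, depth + 1, out)
        angle += span
    return out


# Hypothetical hierarchy: the root is the central circle (ring 0).
tree = {
    "name": "root",
    "children": [
        {"name": "A", "children": [
            {"name": "A1", "value": 1},
            {"name": "A2", "value": 1},
        ]},
        {"name": "B", "value": 2},
    ],
}

layout = sunburst_layout(tree)
for name, ring, a0, a1 in layout:
    print(f"{name}: ring {ring}, {a0:.0f} to {a1:.0f} degrees")
```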
Treemap:
Word Cloud:
A visualisation method that displays how frequently words appear in a given body of text,
by making the size of each word proportional to its frequency. All the words are then
arranged in a cluster or cloud of words. Alternatively, the words can also be arranged in
any format: horizontal lines, columns or within a shape.
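The sizing rule of a Word Cloud can be sketched with the standard library: count the
words, then scale each font size in proportion to its frequency. The sample text, the
maximum size of 40, and the function name `word_sizes` are hypothetical. Arranging the
words into a cloud is not shown.

```python
import re
from collections import Counter


def word_sizes(text, max_size=40):
    """Font size per word, proportional to how often the word appears."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; the others scale linearly with count.
    return {word: max_size * count / top for word, count in counts.items()}


text = "Data tells a story and data shows trends in data"  # hypothetical text
sizes = word_sizes(text)
for word, size in sorted(sizes.items(), key=lambda item: -item[1]):
    print(f"{word}: {size:.1f}")
```

A practical version would usually also drop very common words such as "a" and "in"
before counting.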
Word Clouds can also be used to display words that have meta-data assigned to them.
For example, in a Word Cloud with the names of all the world's countries, the population
could be assigned to each name to determine its size.
Colour used on Word Clouds is usually meaningless and is primarily aesthetic, but it can
be used to categorise words or to display another data variable.
Typically, Word Clouds are used on websites or blogs to depict keyword or tag usage.
Word Clouds can also be used to compare two different bodies of text together.
Although simple and easy to understand, Word Clouds have some major flaws:
Long words are emphasised over short words.
Words whose letters contain many ascenders and descenders may receive more
attention.
They are not great for analytical accuracy, so they are used more for aesthetic reasons.
Source: https://datavizcatalogue.com/index.html