Data Analytics - Unit-V
Data Visualization: Pixel-Oriented Visualization Techniques, Geometric Projection
Visualization Techniques, Icon-Based Visualization Techniques, Hierarchical
Visualization Techniques, Visualizing Complex Data and Relations.
Case Study:
All Electronics maintains a customer information table, which
consists of 4 dimensions: income, credit_limit, transaction_volume and age.
We analyse the correlation between income and the other attributes by
pixel-oriented visualization.
We sort all customers by income in ascending order and use this order
to lay out the customer data in the 4 visualization windows, one window
per dimension, as shown in the figure.
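The layout step of the case study can be sketched in a few lines of Python. This is only a minimal sketch: the function name `pixel_windows`, the window width, the shade scale (0.0 for the smallest value, 1.0 for the largest) and the sample customer values are all assumptions made up for illustration, not data from the case study.

```python
def pixel_windows(records, sort_key, width):
    """Build one pixel window per dimension, all sharing the same record order."""
    # Every window uses the same ordering, here ascending by `sort_key`.
    ordered = sorted(records, key=lambda r: r[sort_key])
    windows = {}
    for dim in ordered[0]:
        values = [r[dim] for r in ordered]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1  # avoid division by zero for constant columns
        # Assumed shade scale: smallest value -> 0.0, largest value -> 1.0.
        shades = [(v - lo) / span for v in values]
        # Fill the window row by row; the last row may be shorter.
        windows[dim] = [shades[i:i + width] for i in range(0, len(shades), width)]
    return windows


# Hypothetical sample data, invented for illustration only.
customers = [
    {"income": 52000, "credit_limit": 9000, "transaction_volume": 40, "age": 35},
    {"income": 18000, "credit_limit": 2000, "transaction_volume": 12, "age": 61},
    {"income": 75000, "credit_limit": 15000, "transaction_volume": 30, "age": 44},
    {"income": 31000, "credit_limit": 5000, "transaction_volume": 55, "age": 23},
]

windows = pixel_windows(customers, sort_key="income", width=2)
for dim, rows in windows.items():
    print(dim, rows)
```

Because all windows share the income ordering, the pixel at a given position in every window belongs to the same customer, which is what lets us compare the shade patterns across windows.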
Geometric Projection visualization techniques:
A drawback of pixel-oriented visualization techniques is that they cannot
help us much in understanding the distribution of data in a
multidimensional space.
Geometric projection techniques help users find interesting projections of
multidimensional data sets.
Geometric projection techniques are a good choice for finding outliers and
correlation between attributes in multivariate data. A geometric projection
technique does this by using transformations and projections of the data.
With large data sets, a clustering algorithm usually has to be applied
before the visualization technique to avoid a cluttered and unclear display
caused by too much information. Some widely used geometric projection
techniques are:
Scatter plots:
A scatter plot is one of the most common visualization techniques and can
be drawn in both 2D and 3D. The scatter plot maps different attributes of
the data to the x- and y-axes for 2D visualizations and also to the z-axis
in 3D. Scatter plots are useful for finding correlations between attributes
in relatively small data sets. If the data set gets too big or contains too
many attributes, the scatter plot gets cluttered and hard to interpret.
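To make the idea concrete, here is a minimal text-mode sketch of a 2D scatter plot. In practice a plotting library would be used; the function name `text_scatter`, the grid size and the sample values are assumptions chosen only for illustration.

```python
def text_scatter(xs, ys, cols=20, rows=8):
    """Render (x, y) pairs as a character grid, with y increasing upward."""
    x_lo, x_hi = min(xs), max(xs)
    y_lo, y_hi = min(ys), max(ys)
    x_span = (x_hi - x_lo) or 1  # avoid division by zero
    y_span = (y_hi - y_lo) or 1
    grid = [[" "] * cols for _ in range(rows)]
    for x, y in zip(xs, ys):
        col = round((x - x_lo) / x_span * (cols - 1))
        row = round((y - y_lo) / y_span * (rows - 1))
        grid[rows - 1 - row][col] = "*"  # flip so larger y is drawn higher
    return ["".join(line) for line in grid]


# Hypothetical values of two attributes for four records.
attr_x = [10, 20, 30, 40]
attr_y = [1, 2, 3, 4]
plot = text_scatter(attr_x, attr_y, cols=4, rows=4)
print("\n".join(plot))
```

Points that line up along a rising diagonal, as in this sample, suggest a positive correlation between the two attributes.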
HyperSlice:
HyperSlice is a method for the visualization of scalar functions of many
variables. With this method the multi-dimensional function is presented in a
simple and easy-to-understand way in which all dimensions are treated
identically. The central concept is the representation of a multi-dimensional
function as a matrix of orthogonal two-dimensional slices. These
two-dimensional slices lend themselves very well to interaction via direct
manipulation, due to a one-to-one relation between screen space and
variable space.
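The slicing idea can be sketched as follows. This is a simplified illustration, not the original HyperSlice implementation: it samples only the slices for pairs of dimensions (i, j) with i < j, and the names `hyperslice`, `focus`, `half_width` and `steps`, as well as the example function, are assumptions.

```python
def hyperslice(f, focus, half_width, steps):
    """Return {(i, j): 2D grid} of axis-aligned slices of f through `focus`."""
    n = len(focus)
    # Offsets from -half_width to +half_width, sampled at `steps` points.
    offsets = [-half_width + 2 * half_width * k / (steps - 1) for k in range(steps)]
    slices = {}
    for i in range(n):
        for j in range(i + 1, n):
            grid = []
            for dj in offsets:          # rows vary dimension j
                row = []
                for di in offsets:      # columns vary dimension i
                    point = list(focus)  # all other dimensions stay at the focus
                    point[i] += di
                    point[j] += dj
                    row.append(f(point))
                grid.append(row)
            slices[(i, j)] = grid
    return slices


# Hypothetical scalar function of three variables.
def example_function(p):
    return p[0] + 10 * p[1] + 100 * p[2]


slices = hyperslice(example_function, focus=[1.0, 2.0, 3.0], half_width=1.0, steps=3)
print(sorted(slices))       # the dimension pairs that were sliced
print(slices[(0, 2)])       # slice in which dimensions 0 and 2 vary
```

Every slice passes through the same focal point, so the centre cell of each grid holds the same function value.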
Parallel coordinates:
To visualize n-dimensional data points, the parallel coordinates
technique draws n equally spaced axes, one for each dimension,
parallel to one of the display axes.
A data record is represented by a polygonal line that intersects each
axis at the point corresponding to the associated dimension value.
A major limitation of the parallel coordinates technique is that it
cannot effectively show a data set of many records: the polygonal lines
overlap and the resulting visual clutter makes patterns hard to find.
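The mapping from a record to its polygonal line can be sketched as below. It is a minimal sketch assuming vertical axes, min-max scaling of each dimension and at least two dimensions; the function name `parallel_coordinates` and the sample records are made up for illustration.

```python
def parallel_coordinates(records, dims, width=1.0, height=1.0):
    """Map each record to a polyline with one (x, y) vertex per dimension axis."""
    n = len(dims)  # assumes n >= 2
    # n equally spaced vertical axes across the display width.
    axis_x = [width * k / (n - 1) for k in range(n)]
    bounds = {d: (min(r[d] for r in records), max(r[d] for r in records)) for d in dims}
    polylines = []
    for r in records:
        line = []
        for x, d in zip(axis_x, dims):
            lo, hi = bounds[d]
            span = (hi - lo) or 1  # avoid division by zero
            # The vertex sits on axis d at the height of the scaled value.
            line.append((x, height * (r[d] - lo) / span))
        polylines.append(line)
    return polylines


# Hypothetical three-dimensional records.
sample = [
    {"a": 0, "b": 10, "c": 5},
    {"a": 4, "b": 20, "c": 1},
]
lines = parallel_coordinates(sample, dims=["a", "b", "c"])
print(lines)
```

Each returned polyline has exactly n vertices, so drawing many records means drawing many overlapping polylines across the same n axes.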
Stick figures:
The stick figure technique is an icon-based technique. It maps
multidimensional data to five-piece stick figures, where each figure has
4 limbs and a body.
Two dimensions are mapped to the display (x and y) axes and the
remaining dimensions are mapped to the angle and/or length of the limbs.
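A rough sketch of this mapping is given below. Several details are assumptions made only for illustration: the body is a vertical segment, two limbs are attached at its top and two at its bottom, limb length is fixed, and each limb dimension is pre-scaled to [0, 1] and mapped to an angle between 0 and 180 degrees. The function name `stick_figure` and the sample record are hypothetical.

```python
import math


def stick_figure(record, x_dim, y_dim, limb_dims, body=1.0, limb=0.5):
    """Return the 5 line segments (body + 4 limbs) of one stick-figure icon."""
    if len(limb_dims) != 4:
        raise ValueError("this sketch expects exactly four limb dimensions")
    # Two dimensions give the position of the figure on the display.
    x, y = record[x_dim], record[y_dim]
    bottom, top = (x, y), (x, y + body)
    segments = [(bottom, top)]
    # Assumed attachment points: two limbs at the top, two at the bottom.
    anchors = [top, top, bottom, bottom]
    for anchor, dim in zip(anchors, limb_dims):
        angle = math.pi * record[dim]  # value in [0, 1] -> angle in [0, pi]
        end = (anchor[0] + limb * math.cos(angle), anchor[1] + limb * math.sin(angle))
        segments.append((anchor, end))
    return segments


# Hypothetical record: two position dimensions plus four pre-scaled dimensions.
person = {"income": 2.0, "age": 3.0, "d1": 0.0, "d2": 0.5, "d3": 1.0, "d4": 0.0}
segments = stick_figure(person, "income", "age", ["d1", "d2", "d3", "d4"])
for seg in segments:
    print(seg)
```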
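Tree Map:
A tree-map is a hierarchical visualization technique that displays
hierarchical data as a set of nested rectangles. In a tree-map of news
stories, for example, all news stories are organized into seven categories,
each shown in a large rectangle of a unique color. Within each category
(i.e., each rectangle at the top level), the news stories are further
partitioned into smaller subcategories.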
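The nesting of rectangles can be sketched with a simple slice-and-dice layout, which is one common way to compute a tree-map and is used here as an assumption, not as the method behind the news example. The names `treemap` and `size`, the node format and the sample hierarchy are hypothetical.

```python
def size(node):
    """Size of a leaf, or the summed size of all leaves below an inner node."""
    children = node.get("children")
    return sum(size(c) for c in children) if children else node["size"]


def treemap(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice layout: return [(name, x, y, w, h)] for every node."""
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    children = node.get("children", [])
    total = sum(size(c) for c in children)
    offset = 0.0
    for c in children:
        frac = size(c) / total  # share of the parent rectangle
        if depth % 2 == 0:      # even depth: split the width
            treemap(c, x + offset * w, y, w * frac, h, depth + 1, out)
        else:                   # odd depth: split the height
            treemap(c, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out


# Hypothetical two-level hierarchy of news categories.
news = {
    "name": "news",
    "children": [
        {"name": "sports", "children": [
            {"name": "soccer", "size": 1},
            {"name": "tennis", "size": 1},
        ]},
        {"name": "business", "size": 2},
    ],
}
layout = treemap(news, 0, 0, 8, 4)
for rect in layout:
    print(rect)
```

The area of each rectangle is proportional to the size of its node, and every subcategory rectangle lies inside the rectangle of its category.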
Data visualization choices:
Five factors that influence data visualization choices:
Audience: It’s important to adjust data representation to the specific target
audience.
Content: The type of data you are dealing with determines which
visualization techniques are suitable.
Context: The context in which a visualization is used affects both the
approach you choose and how the data is read.
Dynamics: There are various types of data, and each type has a different
rate of change, so the visualization has to suit how often the data changes.
Purpose: The goal of data visualization affects the way it is implemented.
For complex analysis, visualizations are compiled into dynamic and
controllable dashboards that serve as visual data analysis tools.
Tools for Data visualization:
Different data visualization tools suit different types of users and
purposes. IBM Watson Analytics is known for its NLP capabilities. The
platform supports conversational data control alongside strong dashboard
building and data reporting tools.
Tools for complex data visualization:
The growing adoption of connected technology creates many opportunities
for companies and organizations. To deal with large volumes of
multi-source, often unstructured data, businesses look for more complex
visualization and analytics solutions. This category includes Power BI,
Kibana and Grafana.