Data Analytics-Data Visualization UNIT-V
KARIMNAGAR - 505481
(Approved by AICTE, New Delhi and Affiliated to JNTU, Hyderabad)
Data visualization
• Data visualization is the practice of translating information into a visual context, such as a map or
graph, to make data easier for the human brain to understand and pull insights from.
• The main goal of data visualization is to make it easier to identify patterns, trends and outliers in
large data sets. The term is often used interchangeably with others, including information graphics,
information visualization and statistical graphics.
• Data visualization is one of the steps of the data science process, which states that after data has been
collected, processed and modeled, it must be visualized for conclusions to be made. Data visualization
is also an element of the broader data presentation architecture (DPA) discipline, which aims to
identify, locate, manipulate, format and deliver data in the most efficient way possible.
Methods
✓ Direct visualization
✓ Scatter plot and scatter plot matrices
✓ Landscapes
✓ Projection pursuit technique: helps users find meaningful projections of multidimensional data
✓ Prosection views
✓ Hyperslice
✓ Parallel coordinates
Direct visualization
Direct visualizations of image data make use of the images in their original visible format. The first technique,
the slice histogram, arranges slices of images as histograms, organized by both visual and non-visual variables.
Prepared by N. Venkateswaran, Associate Professor, CSE Dept., JITS - Karimnagar
JYOTHISHMATHI INSTITUTE OF TECHNOLOGY AND SCIENCE
Scatter Plots
✓ A scatter plot displays 2-D data points using Cartesian coordinates.
✓ A third dimension can be added by using different colors or shapes to represent different data points.
✓ Through this visualization, in the adjacent figure, we can see that points of types “+” and “×” tend to be
collocated.
✓ Scatter plots show many points plotted in the Cartesian plane. Each point represents the values of two
variables. One variable is placed on the horizontal axis and the other on the vertical axis.
✓ The scatter plot becomes ineffective when a data set has more than four dimensions. An extension of
the technique, the scatter-plot matrix, is used for such data.
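The idea of a scatter-plot matrix can be sketched in a few lines: for n attributes, the display is an n x n grid in which each cell is a 2-D scatter plot of one attribute against another. The sketch below only computes which points belong to which cell; it does not draw anything. The function name, attribute names and data values are assumptions made up for illustration.

```python
from itertools import product

def scatterplot_matrix_panels(rows, names):
    """Return one panel per ordered pair (row attribute, column attribute).

    Each panel holds the (x, y) points for that grid cell:
    x comes from the column attribute, y from the row attribute.
    """
    n = len(names)
    panels = {}
    for i, j in product(range(n), repeat=2):
        panels[(names[i], names[j])] = [(row[j], row[i]) for row in rows]
    return panels

# Hypothetical 3-attribute data set (values invented for illustration).
names = ["height", "weight", "age"]
rows = [(1.6, 55.0, 23), (1.8, 80.0, 35), (1.7, 68.0, 41)]

panels = scatterplot_matrix_panels(rows, names)
print(len(panels))                    # one panel per attribute pair: n * n
print(panels[("weight", "height")])   # x = height, y = weight
```

Because the number of panels grows as n squared, the grid quickly becomes crowded as attributes are added.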
Scatterplot Matrices
Parallel Coordinates
✓ The scatter-plot matrix becomes less effective as the dimensionality increases.
✓ Another technique, called parallel coordinates, can handle higher dimensionality.
✓ The display uses n equidistant axes, one per attribute (i.e., n dimensions), drawn parallel to one of the
screen axes.
✓ Each axis is scaled to the [minimum, maximum] range of the corresponding attribute.
✓ Every data item corresponds to a polygonal line that intersects each axis at the point corresponding to
the item's value for that attribute.
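The construction described above can be sketched as follows: place n equidistant axes, scale each attribute to its [minimum, maximum] range, and turn every data item into the list of vertices of its polygonal line. This is a minimal sketch that computes coordinates only; the function name, the `width` parameter and the sample data are assumptions for illustration.

```python
def parallel_coordinates(rows, width=1.0):
    """Return, for each data item, the vertices of its polygonal line.

    Axis k sits at x = width * k / (n - 1); the y value is the attribute
    value rescaled so that the attribute's minimum maps to 0 and its maximum to 1.
    """
    n = len(rows[0])
    if n < 2:
        raise ValueError("parallel coordinates need at least two attributes")
    lows = [min(r[k] for r in rows) for k in range(n)]
    highs = [max(r[k] for r in rows) for k in range(n)]
    xs = [width * k / (n - 1) for k in range(n)]  # equidistant axes

    lines = []
    for r in rows:
        line = []
        for k in range(n):
            span = highs[k] - lows[k]
            # A constant attribute has no range; put it mid-axis (a choice made here).
            y = (r[k] - lows[k]) / span if span else 0.5
            line.append((xs[k], y))
        lines.append(line)
    return lines

# Hypothetical data: three items with three attributes each.
rows = [(1, 10, 100), (2, 30, 300), (3, 20, 200)]
lines = parallel_coordinates(rows)
for line in lines:
    print(line)
```

Each inner list can be passed to any line-drawing routine to obtain the polyline for one data item.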
Icon-based visualization techniques make use of small icons to represent multidimensional data values.
Chernoff Faces
✓ It displays multidimensional data of up to 18 dimensions in the form of a cartoon human face.
✓ The values of the dimensions are represented by facial components such as the eyes, ears, mouth and
nose, through their shape, position and orientation.
✓ Moreover, it exploits the human ability to recognize differences between facial features.
✓ It is a way to display variables on a two-dimensional surface, e.g., let x be eyebrow slant, y be eye size,
z be nose length, etc.
✓ The figure shows faces produced using 10 characteristics (head eccentricity, eye size, eye spacing, eye
eccentricity, pupil size, eyebrow slant, nose size, mouth shape, mouth size, and mouth opening), each
assigned one of 10 possible values.
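As a rough illustration of the last point, the sketch below maps a 10-dimensional record to the 10 facial characteristics listed above and assigns each one of 10 possible levels (0 to 9). The equal-width binning, the function name and the sample record are assumptions; the notes do not say how the values are assigned, and no face is actually drawn here.

```python
FEATURES = [
    "head eccentricity", "eye size", "eye spacing", "eye eccentricity",
    "pupil size", "eyebrow slant", "nose size", "mouth shape",
    "mouth size", "mouth opening",
]

def face_levels(record, lows, highs, levels=10):
    """Quantize each attribute into one of `levels` values, one per facial feature."""
    if not (len(record) == len(lows) == len(highs) == len(FEATURES)):
        raise ValueError("expected one value, minimum and maximum per feature")
    result = {}
    for name, value, lo, hi in zip(FEATURES, record, lows, highs):
        if hi > lo:
            # Equal-width bins over [lo, hi] (an assumption of this sketch).
            level = int((value - lo) * levels / (hi - lo))
        else:
            level = 0
        result[name] = min(max(level, 0), levels - 1)  # clamp to 0 .. levels - 1
    return result

# Hypothetical record whose attributes all range over [0, 10].
record = [0, 1, 2, 3, 4, 5, 6, 7, 8, 10]
face = face_levels(record, lows=[0] * 10, highs=[10] * 10)
print(face)
```

A drawing routine would then translate each level into a concrete shape, for example a wider mouth for a higher "mouth size" level.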
Stick Figure
✓ It maps multidimensional data to a five-piece stick figure, where each figure has four limbs and a body.
✓ Two dimensions are mapped to the display axes, and the remaining dimensions are mapped to the angle
and/or length of the limbs.
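The mapping can be sketched as below, assuming a record whose values are already normalized to [0, 1]. The first two values give the position of the figure on the display, and up to four further values set the limb angles. Using angles only (not lengths), the limb names and the 0 to 180 degree range are choices made for this sketch, not part of the technique's definition.

```python
LIMBS = ["left arm", "right arm", "left leg", "right leg"]

def stick_figure(record, max_angle=180.0):
    """Map a normalized record to a stick figure description.

    record[0], record[1] -> position on the display axes
    record[2:]           -> limb angles in degrees (one value per limb)
    """
    if not 3 <= len(record) <= 2 + len(LIMBS):
        raise ValueError("expected 2 position values plus 1 to 4 limb values")
    x, y, *rest = record
    angles = {limb: value * max_angle for limb, value in zip(LIMBS, rest)}
    return {"position": (x, y), "limb_angles": angles}

# Hypothetical 6-dimensional record, already scaled to [0, 1].
figure = stick_figure([0.2, 0.8, 0.0, 0.5, 1.0, 0.25])
print(figure)
```

Mapping further dimensions to limb lengths would follow the same pattern with a second dictionary.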
Hierarchical visualization methods:
• Dimensional Stacking
• Worlds-Within-Worlds
• Tree-Map
• Cone Trees
• InfoCube
Dimensional Stacking:
• Partitioning of the n-dimensional attribute space into 2-D subspaces, which are 'stacked' into each other.
• Partitioning of the attribute value ranges into classes. The important attributes should be used on the
outer levels.
• Adequate for data with ordinal attributes of low cardinality.
• Difficult to display more than nine dimensions.
• Important to map dimensions appropriately.
Example: visualization of oil mining data with longitude and latitude mapped to the outer x- and y-axes and
ore grade and depth mapped to the inner x- and y-axes.
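The stacking of an inner 2-D grid inside each cell of an outer 2-D grid can be sketched as an index computation. Following the oil mining example, longitude and latitude classes select the outer cell, and ore grade and depth classes select the position inside it. The function name and the class counts are assumptions for illustration.

```python
def stacked_cell(outer_x, outer_y, inner_x, inner_y, inner_x_classes, inner_y_classes):
    """Return the (column, row) of a data item in the stacked display.

    Every outer cell contains a full inner grid of
    inner_x_classes x inner_y_classes cells.
    """
    col = outer_x * inner_x_classes + inner_x
    row = outer_y * inner_y_classes + inner_y
    return col, row

# Hypothetical item: longitude class 2, latitude class 1,
# ore grade class 0 (of 4 classes), depth class 3 (of 5 classes).
cell = stacked_cell(outer_x=2, outer_y=1, inner_x=0, inner_y=3,
                    inner_x_classes=4, inner_y_classes=5)
print(cell)
```

Items in the same outer cell land in adjacent columns and rows, which is why the most important attributes are placed on the outer levels.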
Worlds-within-worlds
• Assign the function and the two most important parameters to the innermost world.
• Fix all other parameters at constant values, and draw other (1-, 2- or 3-dimensional) worlds choosing
these as the axes.
• Software that uses this paradigm:
• N-Vision: Dynamic interaction through data glove and stereo displays, including rotation, scaling (inner)
and translation (inner/outer).
• Auto Visual: Static interaction by means of queries.
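The first two steps can be sketched as follows: the outer world fixes the less important parameters at constant values, and the inner world shows the function over the two most important ones. The function `f`, its four parameters and the sample values are hypothetical.

```python
def inner_world(f, outer_point, xs, ys):
    """Sample f over the inner axes while the outer parameters stay fixed.

    Returns a grid with one row per y value and one column per x value.
    """
    return [[f(x, y, *outer_point) for x in xs] for y in ys]

def f(a, b, c, d):
    # Hypothetical function of four parameters, used only for illustration.
    return a + 2 * b + 10 * c + 100 * d

# Outer world: c and d fixed at (1, 2). Inner world: a on the x-axis, b on the y-axis.
grid = inner_world(f, outer_point=(1, 2), xs=[0, 1, 2], ys=[0, 1])
for row in grid:
    print(row)
```

Moving the inner world to a different point of the outer world corresponds to calling `inner_world` again with a different `outer_point`.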
Tree-Map
• Screen-filling method which uses a hierarchical partitioning of the screen into regions depending on the
attribute values.
• The X- and Y-dimensions of the screen are partitioned alternately according to the attribute values
(classes).
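The alternating partitioning can be sketched with a simple recursive layout: the screen is split along X at the first level, along Y at the second, and so on. In this sketch each region's share of the screen is proportional to a node size; the tuple layout `(name, size, children)` and the sample hierarchy are assumptions, and sizes are assumed to be positive.

```python
def treemap(node, x, y, w, h, split_x=True, out=None):
    """Lay out a hierarchy as nested rectangles.

    node = (name, size, children). Returns a list of (name, x, y, w, h).
    The split direction alternates between X and Y at each level.
    """
    if out is None:
        out = []
    name, size, children = node
    out.append((name, x, y, w, h))
    total = sum(child[1] for child in children)
    offset = 0.0
    for child in children:
        share = child[1] / total
        if split_x:   # partition the X dimension at this level
            treemap(child, x + offset * w, y, share * w, h, False, out)
        else:         # partition the Y dimension at this level
            treemap(child, x, y + offset * h, w, share * h, True, out)
        offset += share
    return out

# Hypothetical hierarchy on a 100 x 100 screen.
tree = ("root", 100, [
    ("A", 50, [("A1", 25, []), ("A2", 25, [])]),
    ("B", 50, []),
])
layout = treemap(tree, 0, 0, 100, 100)
for rect in layout:
    print(rect)
```

The rectangles of the leaves fill the whole screen, which is what makes the method screen-filling.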
Info Cube
• A 3-D visualization technique where hierarchical information is displayed as nested semi-transparent
cubes.
• The outermost cubes correspond to the top-level data, while the subnodes, or lower-level data, are
represented as smaller cubes inside the outermost cubes, and so on.
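The nesting can be sketched by computing, for every node of a hierarchy, a cube that lies inside its parent's cube. Placing the children side by side along the x-axis, the `margin` parameter and the sample hierarchy are choices made for this sketch only; rendering and transparency are left out.

```python
def nest_cubes(node, origin=(0.0, 0.0, 0.0), side=1.0, margin=0.1, out=None):
    """Assign each node a cube given as (origin, side), nested inside its parent.

    node = (name, children). Children are placed side by side along x,
    inside the parent cube shrunk by `margin` on every face.
    """
    if out is None:
        out = {}
    name, children = node
    out[name] = (origin, side)
    if children:
        inner = side * (1 - 2 * margin)     # usable edge length inside the parent
        child_side = inner / len(children)
        ox, oy, oz = origin
        for k, child in enumerate(children):
            child_origin = (ox + side * margin + k * child_side,
                            oy + side * margin,
                            oz + side * margin)
            nest_cubes(child, child_origin, child_side, margin, out)
    return out

# Hypothetical hierarchy: root -> (a -> (a1, a2), b).
tree = ("root", [("a", [("a1", []), ("a2", [])]), ("b", [])])
cubes = nest_cubes(tree)
for name, (origin, side) in cubes.items():
    print(name, origin, side)
```

Each level of the hierarchy appears as a set of smaller cubes inside the cube of the level above.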
❖ Besides text data, there are also methods to visualize relationships, such as those in a social network.