DA UNIT V Notes
Data Visualization:
• Data visualization aims to communicate data clearly and effectively through graphical
representation.
• Data visualization is the graphical representation of information and data.
• Data visualization has been used extensively in many applications—for example, at work
for reporting, managing business operations, and tracking progress of tasks.
• More popularly, we can take advantage of visualization techniques to discover data
relationships.
Pixel-oriented visualization of four attributes, with all customers sorted in ascending order of
income
The pixel colors are chosen so that the smaller the value, the lighter the shading. Using pixel-based
visualization, we can easily observe the following:
❖ Credit limit increases as income increases;
❖ Customers whose income is in the middle range are more likely to purchase more
from AllElectronics;
❖ There is no clear correlation between income and age.
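The shading and ordering rule described above can be sketched in a few lines of Python. This is only an illustration: the customer records are made-up values, the attribute order (income, credit limit, transaction volume, age) is an assumption, and `gray_levels` is a hypothetical helper that scales each attribute linearly to a gray level.

```python
def gray_levels(values):
    """Map each value to a gray level in [0, 255]; smaller values get lighter shades."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    # 255 is white (lightest) and 0 is black (darkest).
    return [round(255 * (1 - (v - lo) / span)) for v in values]


# Made-up customer records: (income, credit_limit, transaction_volume, age)
customers = [
    (52000, 8000, 120, 41),
    (18000, 2000, 30, 63),
    (95000, 15000, 60, 29),
    (34000, 5000, 150, 35),
]

# Global order: sort all customers by income, ascending.
ordered = sorted(customers, key=lambda rec: rec[0])

# One window per attribute; position i in every window belongs to the same customer.
windows = [gray_levels([rec[d] for rec in ordered]) for d in range(4)]
```

Because every window uses the same customer order, a trend such as "credit limit increases with income" shows up as two windows that darken in the same direction.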
In pixel-oriented techniques, data records can also be ordered in a query-dependent way. A
query-dependent ordering visualizes the relevance of the data items with respect to a query. It
calculates the distance between each data value and the corresponding query value, combines the
distances for each data item into an overall distance, and then visualizes the per-variable
distances and the overall distance, sorted by the overall distance. For example, given a point
query, we can sort all records in descending order of similarity to the point query.
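A minimal sketch of query-dependent ordering follows. The notes do not say which distance to use, so this example assumes absolute difference per variable and a Euclidean combination, with attributes already scaled to comparable ranges; `order_by_query` and the sample values are hypothetical.

```python
import math


def order_by_query(records, query):
    """Return (overall_distance, per_variable_distances, record) tuples, closest first."""
    scored = []
    for rec in records:
        # Distance between each data value and the matching query value.
        per_var = [abs(r - q) for r, q in zip(rec, query)]
        # Combine the per-variable distances into one overall distance (assumed Euclidean).
        overall = math.sqrt(sum(d * d for d in per_var))
        scored.append((overall, per_var, rec))
    # Smallest overall distance first, i.e. descending similarity to the query.
    scored.sort(key=lambda item: item[0])
    return scored


records = [(0.2, 0.9), (0.5, 0.5), (0.9, 0.1)]  # made-up, pre-normalized values
ranked = order_by_query(records, query=(0.5, 0.4))
```

Each tuple keeps the per-variable distances alongside the overall distance, so both can be drawn in their own windows in the same sorted order.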
Filling a window by laying out the data records in a linear way may not work well for
a wide window. The first pixel in a row is far away from the last pixel in the previous row, though
they are next to each other in the global order. Moreover, a pixel is next to the one above it in the
window, even though the two are not next to each other in the global order. To solve this problem,
we can lay out the data records in a space-filling curve to fill the windows.
Space-filling curve
A space-filling curve is a curve with a range that covers the entire n-dimensional unit
hypercube. Since the visualization windows are 2-D, we can use any 2-D space-filling curve.
Figure 2.11 shows some frequently used 2-D space-filling curves. Note that the windows do not
have to be rectangular.
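As one concrete example of a 2-D space-filling curve, the sketch below converts a position in the global order into a pixel coordinate on a Hilbert curve. The choice of the Hilbert curve is an assumption for illustration (the notes do not name a specific curve here), and `hilbert_d2xy` is a hypothetical helper that expects the window side to be a power of two.

```python
def hilbert_d2xy(side, d):
    """Map index d (0 <= d < side * side) to (x, y) on a Hilbert curve.

    side must be a power of two.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        # Rotate or flip the quadrant so sub-curves connect end to end.
        if ry == 0:
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Pixel positions for the first 64 records in global order, in an 8 x 8 window.
layout = [hilbert_d2xy(8, d) for d in range(64)]
```

Unlike a row-by-row layout, records that are consecutive in the global order always land on neighbouring pixels.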
The circle segment technique: (a) Representing a data record in circle segments. (b) Laying
out pixels in circle segments.
For example, the circle segment technique uses windows in the shape of segments of a circle, as
illustrated in Figure 2.12. This technique can ease the comparison of dimensions because the
dimension windows are located side by side and form a circle.
Drawback of pixel-oriented visualization techniques
❖ A drawback of pixel-oriented visualization techniques is that they cannot help us much in
understanding the distribution of data in a multidimensional space.
❖ For example, they do not show whether there is a dense area in a multidimensional subspace.
Methods
• Scatterplot and scatterplot matrices
• Parallel coordinates
Scatterplot
A scatter plot displays 2-D data points using Cartesian coordinates. A third dimension can be added
using different colors or shapes to represent different data points. The figure shows an example,
where X and Y are two spatial attributes and the third dimension is represented by different shapes.
Through this visualization, we can see that points of types “+” and “×” tend to be colocated.
Visualization of a 2-D data set using a scatter plot
A 3-D scatter plot uses three axes in a Cartesian coordinate system. If it also uses color, it can
display up to 4-D data points (Figure).
For data sets with more than four dimensions, scatter plots are usually ineffective.
Scatter-plot matrix
❖ The scatter-plot matrix technique is a useful extension to the scatter plot.
❖ For an n-dimensional data set, a scatter-plot matrix is an n × n grid of 2-D scatter plots that
provides a visualization of each dimension with every other dimension.
❖ The figure shows an example, which visualizes the Iris data set.
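The n × n grid can be sketched as a mapping from each pair of dimensions to the 2-D points that its panel would plot. The function name `scatterplot_matrix`, the dimension names, and the data values are all hypothetical; no plotting library is assumed.

```python
def scatterplot_matrix(rows, names):
    """Build the n x n grid: panel (row_dim, col_dim) holds the 2-D points to plot."""
    n = len(names)
    grid = {}
    for i in range(n):        # grid row: dimension on the y-axis
        for j in range(n):    # grid column: dimension on the x-axis
            grid[(names[i], names[j])] = [(row[j], row[i]) for row in rows]
    return grid


rows = [(1, 10, 7), (2, 20, 5), (3, 30, 9)]  # made-up 3-D records
grid = scatterplot_matrix(rows, names=["a", "b", "c"])
```

For n dimensions the grid holds n × n panels, which is why the technique becomes hard to read as n grows.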
Parallel coordinates
❖ Another popular technique, called parallel coordinates, can handle higher dimensionality.
❖ To visualize n-dimensional data points, the parallel coordinates technique draws n equally
spaced axes, one for each dimension, parallel to one of the display axes.
❖ A data record is represented by a polygonal line that intersects each axis at the point
corresponding to the associated dimension value (Figure).
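The construction can be sketched by computing the vertices of each polygonal line. This example assumes vertical axes placed at equal spacing along x in [0, 1] and min-max scaling of each dimension onto its axis; `parallel_coordinates` and the data values are hypothetical.

```python
def parallel_coordinates(rows):
    """Return one polyline per record: a list of (axis_x, scaled_value) vertices."""
    n = len(rows[0])
    lows = [min(r[d] for r in rows) for d in range(n)]
    highs = [max(r[d] for r in rows) for d in range(n)]
    polylines = []
    for r in rows:
        line = []
        for d in range(n):
            span = (highs[d] - lows[d]) or 1    # guard against constant dimensions
            axis_x = d / (n - 1) if n > 1 else 0.0  # n equally spaced axes
            height = (r[d] - lows[d]) / span        # where the line crosses this axis
            line.append((axis_x, height))
        polylines.append(line)
    return polylines


rows = [(1, 10, 100), (2, 20, 300), (3, 30, 200)]  # made-up 3-D records
polylines = parallel_coordinates(rows)
```

Drawing each list of vertices as a connected line gives the parallel coordinates display; adding a dimension only adds one more axis.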
Visualization that uses parallel coordinates
Chernoff faces were introduced in 1973 by statistician Herman Chernoff. They display
multidimensional data of up to 18 variables (or dimensions) as a cartoon human face (Figure).
Chernoff faces help reveal trends in the data. Components of the face, such as the eyes, ears,
mouth, and nose, represent values of the dimensions by their shape, size, placement, and
orientation.
❖ The goal of Chernoff faces is to display many variables at once through facial features such as
lip, eye, and nose size.
Chernoff faces. Each face represents an n-dimensional data point (n ≤ 18)
For example, dimensions can be mapped to the following facial characteristics: eye size, eye
spacing, nose length, nose width, mouth curvature, mouth width, mouth openness, pupil size,
eyebrow slant, eye eccentricity, and head eccentricity.
Drawback
Stick figure
In the stick figure technique, two dimensions are mapped to the display (x and y) axes and the
remaining dimensions are mapped to the angle and/or length of the limbs. Figure 2.18 shows census
data, where age and income are mapped to the display axes, and the remaining dimensions (gender,
education, and so on) are mapped to stick figures.
If the data items are relatively dense with respect to the two display dimensions, the resulting
visualization shows texture patterns, reflecting data trends.
Hierarchical visualization techniques partition all dimensions into subsets (i.e., subspaces). The
subspaces are visualized in a hierarchical manner.
o Worlds-within-Worlds
o Treemap
o Cone Trees
Worlds-within-Worlds
“Worlds-within-Worlds,” also known as n-Vision, is a representative hierarchical visualization
method. Suppose we want to visualize a 6-D data set, where the dimensions are F, X1, ..., X5. We
want to observe how dimension F changes with respect to the other dimensions. We can first fix
the values of dimensions X3, X4, X5 to some selected values, say, c3, c4, c5. We can then visualize
F, X1, X2 using a 3-D plot, called a world, as shown in the figure.
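The "fix three dimensions, plot the other three" step can be sketched as a simple filter. This assumes records stored as dictionaries keyed by dimension name and an exact-match selection within a small tolerance; `inner_world` and the data values are hypothetical.

```python
def inner_world(records, fixed, tol=1e-9):
    """Keep records whose fixed dimensions match the chosen values; return (X1, X2, F) points."""
    points = []
    for r in records:
        if all(abs(r[name] - value) <= tol for name, value in fixed.items()):
            points.append((r["X1"], r["X2"], r["F"]))
    return points


records = [  # made-up 6-D records
    {"F": 1.0, "X1": 0.0, "X2": 0.0, "X3": 1, "X4": 2, "X5": 3},
    {"F": 2.0, "X1": 1.0, "X2": 0.5, "X3": 1, "X4": 2, "X5": 3},
    {"F": 9.0, "X1": 1.0, "X2": 1.0, "X3": 0, "X4": 2, "X5": 3},
]

# Fix X3, X4, X5 to c3 = 1, c4 = 2, c5 = 3 and collect the points of the 3-D world.
world = inner_world(records, {"X3": 1, "X4": 2, "X5": 3})
```

Choosing different values for c3, c4, c5 yields a different set of points, and therefore a different 3-D world.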
The user then views the resulting changes of the inner world. Moreover, a user can vary the
dimensions used in the inner world and the outer world. Given more dimensions, more levels of
worlds can be used, which is why the method is called “worlds-within-worlds.”
Tree-map
As another example of hierarchical visualization methods, tree-maps display hierarchical data as a
set of nested rectangles.
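One simple way to compute nested rectangles is a slice-and-dice layout, sketched below. The notes do not specify a layout algorithm, so this choice is an assumption; the tree, its sizes, and the helpers `total` and `treemap` are hypothetical, and all leaf sizes are assumed to be positive.

```python
def total(node):
    """Sum of leaf sizes under a node."""
    if "children" in node:
        return sum(total(c) for c in node["children"])
    return node["size"]


def treemap(node, x, y, w, h, depth=0, out=None):
    """Slice-and-dice layout: returns a list of (name, x, y, width, height) rectangles."""
    if out is None:
        out = []
    out.append((node["name"], x, y, w, h))
    whole = total(node)
    offset = 0.0
    for child in node.get("children", []):
        frac = total(child) / whole  # the child's share of the parent's area
        if depth % 2 == 0:           # even depth: split the width
            treemap(child, x + offset * w, y, w * frac, h, depth + 1, out)
        else:                        # odd depth: split the height
            treemap(child, x, y + offset * h, w, h * frac, depth + 1, out)
        offset += frac
    return out


tree = {"name": "root", "children": [
    {"name": "A", "size": 2},
    {"name": "B", "children": [{"name": "B1", "size": 1}, {"name": "B2", "size": 1}]},
]}
rects = {name: (x, y, w, h) for name, x, y, w, h in treemap(tree, 0, 0, 100, 100)}
```

Each child rectangle lies inside its parent, and the area of each leaf is proportional to its size.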
Using a tag cloud to visualize popular website tags
Tag clouds are often used in two ways. First, in a tag cloud for a single item, we can use the size of
a tag to represent the number of times that the tag is applied to this item by different users. Second,
when visualizing the tag statistics on multiple items, we can use the size of a tag to represent the
number of items that the tag has been applied to, that is, the popularity of the tag.
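The size rule can be sketched as a mapping from tag counts to font sizes. Linear scaling between a minimum and maximum point size is an assumption made for this example, and `tag_sizes` and the tag list are hypothetical.

```python
from collections import Counter


def tag_sizes(tag_events, min_pt=10, max_pt=40):
    """Map each tag to a font size that grows linearly with how often it occurs."""
    counts = Counter(tag_events)
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # guard against all tags having the same count
    return {
        tag: min_pt + (max_pt - min_pt) * (count - lo) / span
        for tag, count in counts.items()
    }


# Made-up input: one entry per time a user applied a tag to the item.
events = ["python"] * 5 + ["data"] * 3 + ["viz"]
sizes = tag_sizes(events)
```

For the multi-item case, the same function works if each entry in the input stands for one item the tag was applied to.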
Word clouds can also be used to compare two bodies of text.
Complex Relation
In addition to complex data, complex relations among data entries also raise challenges for
visualization. For example, the figure uses a disease influence graph to visualize the correlations
between diseases.
Disease influence graph of people at least 20 years old in the NHANES data set
The nodes in the graph are diseases, and the size of each node is proportional to the prevalence of
the corresponding disease. Two nodes are linked by an edge if the corresponding diseases have a
strong correlation. The width of an edge is proportional to the strength of the correlation pattern of
the two corresponding diseases.
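The graph construction described above can be sketched as follows. The disease names, prevalence values, and correlation values are placeholders, not NHANES figures, and the threshold for a "strong" correlation and the scale factors are assumptions; `influence_graph` is a hypothetical helper.

```python
def influence_graph(prevalence, correlation, threshold=0.5,
                    node_scale=100.0, edge_scale=10.0):
    """Return (node_sizes, edge_widths) for an influence graph.

    Node size is proportional to prevalence; an edge is kept only when the
    absolute correlation reaches the threshold, and its width is
    proportional to the correlation strength.
    """
    nodes = {name: node_scale * p for name, p in prevalence.items()}
    edges = {
        pair: edge_scale * abs(r)
        for pair, r in correlation.items()
        if abs(r) >= threshold
    }
    return nodes, edges


# Placeholder inputs, not real measurements.
prevalence = {"disease_a": 0.25, "disease_b": 0.5, "disease_c": 0.125}
correlation = {
    ("disease_a", "disease_b"): 0.75,
    ("disease_a", "disease_c"): 0.25,
    ("disease_b", "disease_c"): -0.5,
}
nodes, edges = influence_graph(prevalence, correlation)
```

The resulting node sizes and edge widths can then be handed to any graph-drawing tool.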
In summary, visualization provides effective tools to explore data. Visualization can be used in
data mining in various aspects. In addition to visualizing data, visualization can be used to
represent the data mining process, the patterns obtained from a mining method, and user interaction
with the data. Visual data mining is an important research and development direction.
DATA VISUALIZATION IN TABLEAU
Data visualization with Tableau is the process of presenting information through visual rendering.
It helps decision-makers identify what is relevant among millions of variables and communicate
concepts and hypotheses.
Data visualization is the graphical representation of information and data. By using visual elements
like charts, graphs, and maps, data visualization tools provide an accessible way to see and
understand trends, outliers, and patterns in data.
What is Tableau?
Tableau provides an easy drag-and-drop interface that quickly converts your data into business
insights. It presents the visualized data in an appealing format called a data visualization
dashboard.
Features of Tableau
The visual analysis in a click: Tableau analyzes data in a logical and easy manner. The data is
converted into visualizations in less development time. Making quick visualizations is a great
advantage of Tableau.
Interactive dashboard: Tableau has a beautiful and interactive dashboard that gives results very
quickly. The dashboard will also show you rich visualizations. The data visualization dashboard
will give you an in-depth knowledge of the data.
Ease of use: Some data visualization tools make data analytics hard, but Tableau makes it easier. It
reduces unnecessary complications. Tableau uses a drag-and-drop interface, which offers a
user-friendly environment. Anyone who knows the basics of MS Excel can use Tableau.
Direct connection: Tableau connects the users directly to databases, data warehouses and other
sources of data. It does not require any complicated setup.
Deals with big data: Tableau can analyze very large data sets and is claimed to visualize them
better than other data visualization tools (DVTs).
Publishing and sharing: The dashboard can be published live on the platform from which the
user is accessing it. The results can also be shared live.
Deals with all types of data: Tableau has a fast data engine that extracts data from a variety of
sources and formats. All data is treated equally in Tableau.
Trending in the market: Tableau is growing rapidly in the business analytics market. It is being
used in all industries now. Many top-rated companies like Microsoft, Nokia, and Deloitte use
Tableau to meet their business intelligence requirements.
Interactive visualizations: One of the most important features of Tableau is its ability to create
attractive, functional visualizations that help the user make decisions.
Smart Maps: Tableau also lets you search within a map or lasso data points. This helps answer
the geographical question “Where?”
Works across multiple platforms: Tableau can be accessed by the users through desktop,
browser, iPad, or mobile phone. By this feature, it has created a revolution in the data analytics
industry.
Copying between dashboards: Tableau lets you copy worksheets or any dashboard elements
between different workbooks. Because of this feature, you need not start everything from scratch
if you have different business analysts using the software. You can also have seamless
interactions between the dashboards.
64-bit version: Tableau gives you the option to choose between the 32-bit and 64-bit versions. The
version downloaded depends on the OS. On a 32-bit OS, you can install only the 32-bit version of
Tableau. The 64-bit version can use more memory, which improves speed.
SAML Authentication: SAML authentication lets a user set up single sign-on in a mixed
infrastructure environment. This lets Tableau Server integrate with your core business areas and
internal applications.
Metadata Management: Tableau lets you rename the fields and modify the formats quickly and
easily. You can also create subsets of data by selecting groups of points.
Your First Data Visualization
• Drag and drop the data field icons onto the columns and rows sections at the top of the
screen.
• Drag the medium icon into the columns section and the visits icon into the rows section.
• This lets you see the data in visual form.
Three line charts on the same axis comparing visits side by side.
Example:
To calculate the bounce rate, right-click in the measures area and select calculated fields. Then
enter [Bounces]/[Visits].
The same approach can be used to do a variety of calculations. For instance, the code below
could be used to distinguish weekend vs. weekday traffic:
The above will give you a new dimension that allows you to separate your traffic by
weekend and weekday.
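As a rough analogue outside Tableau, the two calculations (bounce rate as bounces divided by visits, and a weekend vs. weekday dimension) can be sketched in Python. This is not Tableau calculated-field syntax; the function names and the traffic figures are hypothetical.

```python
from datetime import date


def day_type(d):
    """Classify a date as 'Weekend' or 'Weekday'."""
    # date.weekday(): Monday is 0 and Sunday is 6.
    return "Weekend" if d.weekday() >= 5 else "Weekday"


def bounce_rate_by_day_type(rows):
    """rows: (date, visits, bounces) tuples. Returns bounce rate per day type."""
    totals = {}
    for d, visits, bounces in rows:
        key = day_type(d)
        v, b = totals.get(key, (0, 0))
        totals[key] = (v + visits, b + bounces)
    # Bounce rate = bounces / visits, guarding against zero visits.
    return {key: (b / v if v else 0.0) for key, (v, b) in totals.items()}


# Made-up traffic data: (date, visits, bounces)
traffic = [
    (date(2024, 1, 6), 100, 50),   # Saturday
    (date(2024, 1, 7), 100, 25),   # Sunday
    (date(2024, 1, 8), 200, 50),   # Monday
]
rates = bounce_rate_by_day_type(traffic)
```

Summing visits and bounces before dividing gives the rate for the whole group, rather than an average of daily rates.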