
DA UNIT V Notes

The document discusses various data visualization techniques, including pixel-oriented, geometric projection, icon-based, and hierarchical methods, emphasizing their applications in effectively communicating data. It highlights the importance of visualization in understanding complex data relationships and trends, as well as the use of tools like Tableau for presenting data insights. Key features of Tableau include its user-friendly interface, interactive dashboards, and ability to quickly convert data into visual formats.

UNIT V - DATA VISUALIZATION

1. Pixel-Oriented Visualization Techniques
2. Geometric Projection Visualization Techniques
3. Icon-Based Visualization Techniques
4. Hierarchical Visualization Techniques
5. Visualizing Complex Data and Relations

Data Visualization:
• Data visualization aims to communicate data clearly and effectively through graphical
representation.
• Data visualization is the graphical representation of information and data.
• Data visualization has been used extensively in many applications—for example, at work
for reporting, managing business operations, and tracking progress of tasks.
• More popularly, we can take advantage of visualization techniques to discover data
relationships.

Visualization methods can be categorized as follows:

• Pixel-oriented visualization techniques
• Geometric projection visualization techniques
• Icon-based visualization techniques
• Hierarchical visualization techniques

1. Pixel-Oriented Visualization Techniques:


A simple way to visualize the value of a dimension is to use a pixel where the color of the
pixel reflects the dimension’s value. For a data set of m dimensions, pixel-oriented techniques
create m windows on the screen, one for each dimension. The m dimension values of a record are
mapped to m pixels at the corresponding positions in the windows.
The colors of the pixels reflect the corresponding values.
Inside a window, the data values are arranged in some global order shared by all windows.
The global order may be obtained by sorting all data records in a way that’s meaningful for the
task at hand.
AllElectronics (the name of a shop) maintains a customer information table with
four dimensions: income, credit limit, transaction volume, and age.
Can we analyze the correlation between income and the other attributes by visualization?
We can sort all customers in income-ascending order and use this order to lay out
the customer data in the four visualization windows, as shown in the following figure.

Pixel-oriented visualization of four attributes by sorting all customers in income ascending
order
The pixel colors are chosen so that the smaller the value, the lighter the shading. Using pixel-based
visualization, we can easily observe the following:
❖ Credit limit increases as income increases;
❖ Customers whose income is in the middle range are more likely to purchase more
from AllElectronics;
❖ There is no clear correlation between income and age.
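The layout just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure behind the figure: the function name `pixel_windows`, the sample customer values, and the window width are hypothetical, and each pixel's shading is returned as a number in [0, 1] (0 = lightest) instead of an actual color.

```python
def pixel_windows(records, dims, sort_key, width):
    """Lay out records in one window per dimension, using one shared global order.

    records:  list of dicts, one per data record
    dims:     dimension names; one window is built per dimension
    sort_key: dimension whose ascending order defines the global order
    width:    number of pixels per row in every window
    Returns {dimension: rows of shading values in [0, 1]}; smaller values are lighter.
    """
    ordered = sorted(records, key=lambda r: r[sort_key])  # global order shared by all windows
    windows = {}
    for d in dims:
        values = [r[d] for r in ordered]
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1  # avoid dividing by zero when a column is constant
        shades = [(v - lo) / span for v in values]
        # Linear (row-by-row) fill: record i goes to row i // width, column i % width
        windows[d] = [shades[i:i + width] for i in range(0, len(shades), width)]
    return windows


# Hypothetical customer records (income in thousands)
customers = [
    {"income": 30, "credit_limit": 1, "transaction_volume": 5, "age": 41},
    {"income": 90, "credit_limit": 9, "transaction_volume": 3, "age": 29},
    {"income": 55, "credit_limit": 3, "transaction_volume": 8, "age": 63},
    {"income": 70, "credit_limit": 5, "transaction_volume": 7, "age": 35},
]
dims = ["income", "credit_limit", "transaction_volume", "age"]
win = pixel_windows(customers, dims, sort_key="income", width=2)
print(win["credit_limit"])  # [[0.0, 0.25], [0.5, 1.0]]
```

Because all four windows share the income-ascending order, a window whose shading darkens steadily from the first pixel to the last (as `credit_limit` does here) suggests that the attribute increases with income.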

In pixel-oriented techniques, data records can also be ordered in a query-dependent way, which
visualizes the relevance of each data item with respect to a query. The technique calculates the
distances between the data values and the query values, combines the distances for each data item
into an overall distance, and visualizes the per-variable distances and the overall distance, with the
records sorted by the overall distance. For example, given a point query, we can sort all records in
descending order of similarity to the point query.
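A query-dependent ordering can be sketched the same way. The sketch assumes numeric dimensions, takes the per-dimension distance as the absolute difference from the query value, and combines the distances with the Euclidean norm; the notes do not fix these choices, and `query_dependent_order` is a hypothetical name.

```python
import math


def query_dependent_order(records, query, dims):
    """Sort records by their overall distance to a point query, most similar first.

    Returns a list of (overall_distance, per_dimension_distances, record).
    The per-dimension distances give one window each; the overall distance
    gives an additional window and defines the shared order.
    """
    scored = []
    for r in records:
        per_dim = {d: abs(r[d] - query[d]) for d in dims}  # distance per variable
        overall = math.sqrt(sum(v * v for v in per_dim.values()))  # combined distance
        scored.append((overall, per_dim, r))
    scored.sort(key=lambda item: item[0])  # ascending distance = descending similarity
    return scored


records = [
    {"income": 90, "age": 20},
    {"income": 50, "age": 40},
    {"income": 52, "age": 41},
]
ranked = query_dependent_order(records, {"income": 50, "age": 42}, ["income", "age"])
print([r["income"] for _, _, r in ranked])  # [50, 52, 90]
```

In practice the dimensions would usually be normalized first, so that a dimension with a large numeric range does not dominate the overall distance.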

Filling a window by laying out the data records in a linear way may not work well for
a wide window. The first pixel in a row is far away from the last pixel in the previous row, though
they are next to each other in the global order. Moreover, a pixel is next to the one above it in the
window, even though the two are not next to each other in the global order. To solve this problem,
we can lay out the data records in a space-filling curve to fill the windows.
Space-filling curve
A space-filling curve is a curve with a range that covers the entire n-dimensional unit
hypercube. Since the visualization windows are 2-D, we can use any 2-D space-filling curve.
Figure 2.11 shows some frequently used 2-D space-filling curves. Note that the windows do not
have to be rectangular.

Some frequently used 2-D space-filling curves
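One frequently used 2-D space-filling curve is the Hilbert curve. The sketch below uses the well-known iterative conversion from a position d along the curve to grid coordinates (x, y); the function names are hypothetical, and the window side n is assumed to be a power of 2.

```python
def _rotate(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the sub-curve inside it has the right orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def hilbert_d2xy(n, d):
    """Map position d along a Hilbert curve to (x, y) in an n x n grid (n a power of 2)."""
    x = y = 0
    s = 1
    t = d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rotate(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# Place 16 records, already in the global order, in a 4 x 4 window
positions = [hilbert_d2xy(4, i) for i in range(16)]
print(positions[:4])  # [(0, 0), (1, 0), (1, 1), (0, 1)]
```

Consecutive records always land in neighboring pixels, which avoids the jump at the end of each row that the linear layout suffers from.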

The circle segment technique: (a) Representing a data record in circle segments. (b) Laying
out pixels in circle segments.

For example, the circle segment technique uses windows in the shape of segments of a circle, as
illustrated in Figure 2.12. This technique can ease the comparison of dimensions because the
dimension windows are located side by side and form a circle.
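The idea of giving each dimension its own circle segment can be sketched as follows. This is a simplified variant under stated assumptions: each of the m dimensions gets an equal angular segment, and records are placed on arcs of growing radius, reversing direction on every other arc so that consecutive records stay close. The original technique arranges the pixels within a segment differently; arcs are used here only to keep the example short, and `circle_segment_position` is a hypothetical name.

```python
import math


def circle_segment_position(dim_index, num_dims, record_index, per_arc):
    """Return the (x, y) pixel position of one record in one dimension's segment.

    The circle is divided into num_dims equal segments; dimension k covers
    angles [2*pi*k/m, 2*pi*(k+1)/m). Records, already in global order, fill
    the segment arc by arc from the center outward; every other arc is
    traversed in reverse (back and forth) so neighbors in the order stay close.
    """
    seg_width = 2 * math.pi / num_dims
    seg_start = dim_index * seg_width
    arc, slot = divmod(record_index, per_arc)
    if arc % 2 == 1:  # reverse direction on odd arcs
        slot = per_arc - 1 - slot
    angle = seg_start + (slot + 0.5) * seg_width / per_arc
    radius = arc + 1  # arcs grow outward from the center
    return radius * math.cos(angle), radius * math.sin(angle)


# A record index occupies the same relative position in every segment,
# which is what lets the dimensions be compared side by side.
pts = [circle_segment_position(1, 4, i, per_arc=4) for i in range(8)]
```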
Drawback of pixel-oriented visualization techniques
❖ A drawback of pixel-oriented visualization techniques is that they cannot help us much in
understanding the distribution of data in a multidimensional space.
❖ For example, they do not show whether there is a dense area in a multidimensional subspace.

2. Geometric Projection Visualization Techniques:


❖ Geometric projection techniques help users find interesting projections of multidimensional
data sets. The central challenge the geometric projection techniques try to address is how to
visualize a high-dimensional space on a 2-D display.

Methods
• Scatterplot and scatterplot matrices
• Parallel coordinates
Scatterplot
A scatter plot displays 2-D data points using Cartesian coordinates. A third dimension can be added
using different colors or shapes to represent different data points. Figure shows an example, where
X and Y are two spatial attributes and the third dimension is represented by different shapes.
Through this visualization, we can see that points of types “+” and “_” tend to be colocated.

Visualization of a 2-D data set using a scatter plot

A 3-D scatter plot uses three axes in a Cartesian coordinate system. If it also uses color, it can
display up to 4-D data points (Figure).

Visualization of a 3-D data set using a scatter plot


Problem with scatterplot

For data sets with more than four dimensions, scatter plots are usually ineffective.

Scatter-plot matrix
❖ The scatter-plot matrix technique is a useful extension to the scatter plot.
❖ For an n-dimensional data set, a scatter-plot matrix is an n × n grid of 2-D scatter plots that
provides a visualization of each dimension with every other dimension.
❖ Figure shows an example, which visualizes the Iris data set.

Visualization of the Iris dataset


❖ The data set consists of 150 samples, 50 from each of three species of Iris flowers. There are five
dimensions in the data set: length and width of sepal and petal, and species.

Problem with scatter-plot matrix


❖ The scatter-plot matrix becomes less effective as the dimensionality increases.
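The structure of a scatter-plot matrix can be sketched without any plotting library: the sketch below (hypothetical function name `scatter_plot_matrix`) only computes which points belong in each of the n × n panels, leaving the drawing to whatever charting tool is used. The three records are sample values in the Iris format, chosen for illustration.

```python
def scatter_plot_matrix(rows, dims):
    """Return the n x n grid of a scatter-plot matrix.

    Cell (i, j) holds the (x, y) points of dims[j] (x axis) against
    dims[i] (y axis). Diagonal cells would plot a dimension against itself,
    so they are left as None (they often show a label or a histogram).
    """
    n = len(dims)
    grid = [[None] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                grid[i][j] = [(r[dims[j]], r[dims[i]]) for r in rows]
    return grid


iris_rows = [  # sample records in the Iris format (lengths and widths in cm)
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4, "petal_width": 0.2},
    {"sepal_length": 7.0, "sepal_width": 3.2, "petal_length": 4.7, "petal_width": 1.4},
    {"sepal_length": 6.3, "sepal_width": 3.3, "petal_length": 6.0, "petal_width": 2.5},
]
dims = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
grid = scatter_plot_matrix(iris_rows, dims)
print(grid[2][3])  # petal_width (x) vs petal_length (y): [(0.2, 1.4), (1.4, 4.7), (2.5, 6.0)]
```

With n dimensions the grid has n × n cells, of which n(n − 1) are scatter plots: 12 plots for 4 dimensions but 380 for 20 dimensions, which is why the technique becomes less effective as dimensionality grows.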

Parallel coordinates

❖ Another popular technique, called parallel coordinates, can handle higher dimensionality.
❖ To visualize n-dimensional data points, the parallel coordinates technique draws n equally
spaced axes, one for each dimension, parallel to one of the display axes.
❖ A data record is represented by a polygonal line that intersects each axis at the point
corresponding to the associated dimension value (Figure).

Visualization that uses parallel coordinates
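The mapping from a record to its polygonal line can be sketched as follows, assuming each axis is scaled linearly to [0, 1] between the minimum and maximum of its dimension (a common but not mandatory choice; the function name `parallel_coordinates` and the sample values are hypothetical).

```python
def parallel_coordinates(record, dims, ranges):
    """Return the vertices of one record's polygonal line in parallel coordinates.

    Axis k is drawn as the vertical line x = k. The record crosses axis k at
    height (value - min) / (max - min), so every axis is scaled to [0, 1];
    ranges maps each dimension to its (min, max) over the whole data set.
    """
    vertices = []
    for k, d in enumerate(dims):
        lo, hi = ranges[d]
        y = (record[d] - lo) / ((hi - lo) or 1)  # guard against a constant dimension
        vertices.append((k, y))
    return vertices


ranges = {"income": (0, 100), "age": (20, 60), "credit_limit": (0, 8)}
line = parallel_coordinates({"income": 50, "age": 30, "credit_limit": 8},
                            ["income", "age", "credit_limit"], ranges)
print(line)  # [(0, 0.5), (1, 0.25), (2, 1.0)]
```

Drawing one such line per record makes the clutter problem easy to see: with thousands of records, the lines overlap heavily.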

Limitation with parallel coordinates


A major limitation of the parallel coordinates technique is that it cannot effectively
show a data set of many records. Even for a data set of several thousand records, visual clutter and
overlap often reduce the readability of the visualization and make the patterns hard to find.

3. Icon-Based Visualization Techniques:


Icon-based visualization techniques use small icons to represent multidimensional
data values.
We look at two popular icon-based techniques: Chernoff faces and stick figures.

Chernoff faces were introduced in 1973 by statistician Herman Chernoff. They display
multidimensional data of up to 18 variables (or dimensions) as a cartoon human face (Figure).
Chernoff faces help reveal trends in the data. Components of the face, such as the eyes, ears,
mouth, and nose, represent values of the dimensions by their shape, size, placement, and
orientation.

Goal of Chernoff’s faces

❖ The goal of Chernoff faces is to show several variables at once through facial features such as
lips, eyes, and nose size.

Chernoff faces. Each face represents an n-dimensional data point (n ≤ 18)

For example, dimensions can be mapped to the following facial characteristics: eye size, eye
spacing, nose length, nose width, mouth curvature, mouth width, mouth openness, pupil size,
eyebrow slant, eye eccentricity, and head eccentricity.
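The mapping step of Chernoff faces, turning each dimension value into a size, curvature, or length, can be sketched as follows. Drawing the face itself is omitted; the function name `chernoff_features`, the feature names, and the drawing ranges are illustrative assumptions, not Chernoff's original parameterization.

```python
def chernoff_features(record, dims, ranges, features):
    """Map one record's dimensions to facial-feature parameters.

    dims[k] drives features[k]; each value is scaled linearly from its
    data range into that feature's drawing range.
    ranges:   {dimension: (min, max)} over the data set
    features: list of (feature_name, (low, high)) drawing ranges
    """
    params = {}
    for d, (name, (f_lo, f_hi)) in zip(dims, features):
        lo, hi = ranges[d]
        t = (record[d] - lo) / ((hi - lo) or 1)  # position of the value in [0, 1]
        params[name] = f_lo + t * (f_hi - f_lo)
    return params


# Hypothetical feature ranges in arbitrary drawing units
features = [("eye_size", (0.1, 0.5)), ("mouth_curvature", (-1.0, 1.0)), ("nose_length", (0.2, 0.6))]
ranges = {"income": (20, 100), "age": (20, 70), "credit_limit": (0, 10)}
face = chernoff_features({"income": 60, "age": 30, "credit_limit": 5},
                         ["income", "age", "credit_limit"], ranges, features)
print(face)
```

Passing the dimensions in a different order assigns, for example, income to the mouth instead of the eyes, which changes how similar two faces appear; this is the sensitivity to the dimension-to-feature mapping discussed below.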

General icon-based techniques

❖ Shape coding: use shape to encode certain information
❖ Color icons: use color icons to encode more information
❖ Tile bars: use small icons to represent the relevant feature vectors in document retrieval
Shape Coding

• It uses shape to encode certain information.

Color Icons and Tile Bars

• Color icons use color to represent more information.
• Tile bars use small icons to represent the relevant feature vectors in a document.
Chernoff faces make use of the ability of the human mind to recognize small differences in
facial characteristics and to assimilate many facial characteristics at once.
Viewing large tables of data can be tedious. By condensing the data, Chernoff faces make
the data easier for users to digest. In this way, they facilitate visualization of regularities and
irregularities present in the data, although their power in relating multiple relationships is limited.
Another limitation is that specific data values are not shown. Furthermore, facial features
vary in perceived importance. This means that the similarity of two faces (representing two
multidimensional data points) can vary depending on the order in which dimensions are assigned to
facial characteristics. Therefore, this mapping should be carefully chosen. Eye size and eyebrow
slant have been found to be important.
Asymmetrical Chernoff faces were proposed as an extension to the original technique. Since a
face has vertical symmetry (along the y-axis), the left and right side of a face are identical,
which wastes space. Asymmetrical Chernoff faces double the number of facial
characteristics, thus allowing up to 36 dimensions to be displayed.

Drawback

• The assignment of variables to facial features affects how the faces look, so different assignments can make the same data appear different.


Stick figures
The stick figure visualization technique maps multidimensional data to five-piece stick figures,
where each figure has four limbs and a body.

Two dimensions are mapped to the display (x and y) axes and the remaining dimensions are
mapped to the angle and/or length of the limbs. Figure 2.18 shows census data, where age and
income are mapped to the display axes, and the remaining dimensions (gender, education, and so
on) are mapped to stick figures.

Census data represented using stick figures

If the data items are relatively dense with respect to the two display dimensions, the resulting
visualization shows texture patterns, reflecting data trends.
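The mapping from a census-style record to a stick figure can be sketched as follows. Only limb angles are varied here, although the text notes that limb lengths can also be used; the function name `stick_figure`, the dimension names, their ranges, and the 0/1 coding of gender are hypothetical.

```python
import math


def stick_figure(record, x_dim, y_dim, limb_dims, ranges, max_angle=math.pi / 2):
    """Describe the stick figure for one record.

    Two dimensions give the figure's position on the display axes; each of
    the remaining dimensions (one per limb, four limbs here) sets one limb's
    angle relative to the body, scaled from its data range into [0, max_angle].
    """
    angles = []
    for d in limb_dims:
        lo, hi = ranges[d]
        angles.append((record[d] - lo) / ((hi - lo) or 1) * max_angle)
    return {"position": (record[x_dim], record[y_dim]), "limb_angles": angles}


ranges = {"gender": (0, 1), "education_years": (0, 20), "household_size": (1, 9), "hours_worked": (0, 80)}
person = {"age": 35, "income": 42000, "gender": 1, "education_years": 10,
          "household_size": 5, "hours_worked": 40}
fig = stick_figure(person, "age", "income",
                   ["gender", "education_years", "household_size", "hours_worked"], ranges)
```

Drawing one figure per record at its (age, income) position produces the texture patterns described above when the records are dense.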

4. Hierarchical Visualization Techniques:


The visualization techniques discussed so far focus on visualizing multiple dimensions
simultaneously. However, for a large data set of high dimensionality, it would be difficult to
visualize all dimensions at the same time.

Hierarchical visualization techniques partition all dimensions into subsets (i.e., subspaces). The
subspaces are visualized in a hierarchical manner.

o Worlds-within-Worlds
o Treemap
o Cone Trees
Worlds-within-Worlds
“Worlds-within-Worlds,” also known as n-Vision, is a representative hierarchical visualization
method. Suppose we want to visualize a 6-D data set, where the dimensions are F, X1, ..., X5. We
want to observe how dimension F changes with respect to the other dimensions. We can first fix
the values of dimensions X3, X4, X5 to some selected values, say, c3, c4, c5. We can then visualize
F, X1, X2 using a 3-D plot, called a world, as shown in the figure.

Worlds-within-Worlds (also known as n-Vision)


The origin of the inner world is located at the point (c3, c4, c5) in the outer world,
which is another 3-D plot using dimensions X3, X4, X5. A user can interactively change, in the outer
world, the location of the origin of the inner world.

The user then views the resulting changes in the inner world. Moreover, a user can vary the
dimensions used in the inner world and the outer world. Given more dimensions, more levels of
worlds can be used, which is why the method is called “worlds-within-worlds.”
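The construction of an inner world can be sketched as follows, under the simplifying assumption that F can be evaluated as a function of X1, ..., X5 (with measured data one would instead select or interpolate the records near (c3, c4, c5)). The function `F` and the helper `inner_world` are hypothetical.

```python
def inner_world(f, c3, c4, c5, x1_values, x2_values):
    """Compute the inner world: F over an (X1, X2) grid with X3, X4, X5 fixed.

    The result is a list of (x1, x2, F) points for a 3-D plot whose origin
    sits at (c3, c4, c5) in the outer world spanned by X3, X4, X5.
    """
    return [(x1, x2, f(x1, x2, c3, c4, c5)) for x1 in x1_values for x2 in x2_values]


def F(x1, x2, x3, x4, x5):
    """Hypothetical dependent dimension used only for illustration."""
    return x1 + 2 * x2 + x3 * x4 - x5


world = inner_world(F, 1, 2, 3, [0, 1], [0, 1])
print(world)  # [(0, 0, -1), (0, 1, 1), (1, 0, 0), (1, 1, 2)]

# Moving the inner world's origin in the outer world means recomputing with new (c3, c4, c5)
moved = inner_world(F, 2, 2, 3, [0, 1], [0, 1])
```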

Tree-map
As another example of hierarchical visualization methods, tree-maps display hierarchical data as a
set of nested rectangles.

Newsmap: Use of tree-map to visualize Google news headline stories
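A tree-map layout can be computed with the slice-and-dice algorithm, one of the simplest layouts: each node's rectangle is divided among its children in proportion to their sizes, cutting the width at one level and the height at the next. Other layouts, such as squarified tree-maps, give more square-like rectangles; the sketch below, with hypothetical names and data, is not meant to reproduce the Newsmap figure.

```python
def node_size(node):
    """Total size of a node: its own size for a leaf, else the sum over its children."""
    _, content = node
    return sum(node_size(c) for c in content) if isinstance(content, list) else content


def slice_and_dice(node, x, y, w, h, depth=0, out=None):
    """Lay out a tree as nested rectangles; returns a list of (name, x, y, w, h).

    node is (name, size) for a leaf or (name, [children]) for an internal node.
    """
    if out is None:
        out = []
    name, content = node
    out.append((name, x, y, w, h))
    if isinstance(content, list):
        total = node_size(node)
        offset = 0.0  # fraction of the parent already used by earlier children
        for child in content:
            frac = node_size(child) / total
            if depth % 2 == 0:  # even depth: split the width
                slice_and_dice(child, x + offset * w, y, frac * w, h, depth + 1, out)
            else:  # odd depth: split the height
                slice_and_dice(child, x, y + offset * h, w, frac * h, depth + 1, out)
            offset += frac
    return out


news = ("news", [("world", [("story_a", 2), ("story_b", 2)]), ("sports", 4)])
rects = {name: (x, y, w, h) for name, x, y, w, h in slice_and_dice(news, 0, 0, 8, 4)}
print(rects["story_b"])  # (0.0, 2.0, 4.0, 2.0)
```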

5. Visualizing Complex Data and Relations:


In the early days, visualization techniques were mainly for numeric data. Recently, more
and more non-numeric data, such as text and social networks, have become available.
Visualizing and analyzing such data attracts a lot of interest. There are many new visualization
techniques dedicated to these kinds of data. For example, many people on the Web tag various
objects such as pictures, blog entries, and product reviews. A word cloud is a collection, or
cluster, of words depicted in different sizes. A tag cloud is a visualization of statistics of user-
generated tags. Often, in a tag cloud, tags are listed alphabetically or in a user-preferred order. The
importance of a tag is indicated by font size or color. Figure shows a tag cloud for visualizing the
popular tags used in a Web site.

Using a tag cloud to visualize popular website tags

Tag clouds are often used in two ways. First, in a tag cloud for a single item, we can use the size of
a tag to represent the number of times that the tag is applied to this item by different users. Second,
when visualizing the tag statistics on multiple items, we can use the size of a tag to represent the
number of items that the tag has been applied to, that is, the popularity of the tag.
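The second use, sizing each tag by its popularity, can be sketched as follows. Linear scaling between a minimum and a maximum font size is an assumption (logarithmic scaling is also common), as are the function name `tag_cloud` and the sample tags; tags are listed alphabetically, one of the orderings mentioned above.

```python
from collections import Counter


def tag_cloud(items, min_font=10, max_font=40):
    """Return (tag, font_size) pairs in alphabetical order.

    items is a list of tag sets, one per item (photo, blog entry, review, ...).
    A tag's popularity is the number of items it has been applied to; font
    size grows linearly with popularity from min_font to max_font.
    """
    popularity = Counter(tag for tags in items for tag in set(tags))
    lo, hi = min(popularity.values()), max(popularity.values())
    span = (hi - lo) or 1  # all tags equally popular -> all get min_font
    return [(tag, min_font + (n - lo) * (max_font - min_font) / span)
            for tag, n in sorted(popularity.items())]


items = [{"python", "data"}, {"python", "viz"}, {"python", "data"}]
print(tag_cloud(items))  # [('data', 25.0), ('python', 40.0), ('viz', 10.0)]
```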

Word clouds can also be used to compare two different bodies of text.

Complex Relation
In addition to complex data, complex relations among data entries also raise challenges for
visualization. For example, figure uses a disease influence graph to visualize the correlations
between diseases.

Disease influence graph of people at least 20 years old in the NHANES data set

The nodes in the graph are diseases, and the size of each node is proportional to the prevalence of
the corresponding disease. Two nodes are linked by an edge if the corresponding diseases have a
strong correlation. The width of an edge is proportional to the strength of the correlation pattern of
the two corresponding diseases.
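A graph of this kind can be built from patient records as sketched below. The notes do not say which correlation measure or threshold the NHANES figure uses; the phi coefficient (the Pearson correlation of 0/1 indicators), the threshold of 0.5, the function names, and the tiny patient list are assumptions for illustration.

```python
def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (the phi coefficient for 0/1 data)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx and vy else 0.0  # 0.0 if a disease never/always occurs


def influence_graph(patients, diseases, threshold=0.5):
    """Return (nodes, edges) of a disease influence graph.

    patients: list of sets of diagnosed diseases, one set per person
    nodes:    {disease: prevalence}, used for node size
    edges:    {(a, b): correlation} for pairs correlated at least `threshold`,
              used for edge width
    """
    n = len(patients)
    indicator = {d: [1 if d in p else 0 for p in patients] for d in diseases}
    nodes = {d: sum(indicator[d]) / n for d in diseases}
    edges = {}
    for i, a in enumerate(diseases):
        for b in diseases[i + 1:]:
            r = pearson(indicator[a], indicator[b])
            if r >= threshold:  # keep only strong positive correlations
                edges[(a, b)] = r
    return nodes, edges


patients = [{"diabetes", "hypertension"}, {"diabetes", "hypertension"}, {"asthma"}, set()]
nodes, edges = influence_graph(patients, ["diabetes", "hypertension", "asthma"])
print(nodes)  # {'diabetes': 0.5, 'hypertension': 0.5, 'asthma': 0.25}
print(edges)  # {('diabetes', 'hypertension'): 1.0}
```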

Visualizing the coronavirus outbreak using a tag cloud.


Advantages of word clouds
• Simple
• Easy to understand
Disadvantages of word clouds
• Long words are emphasized over short words.
• Words whose letters contain many ascenders and descenders may receive more attention.
• They are not well suited to precise analysis, so they are often used for aesthetic reasons instead.

In summary, visualization provides effective tools to explore data. Visualization can be used in
data mining in various aspects. In addition to visualizing data, visualization can be used to
represent the data mining process, the patterns obtained from a mining method, and user interaction
with the data. Visual data mining is an important research and development direction.
DATA VISUALIZATION IN TABLEAU
Data visualization with Tableau is the process of presenting information through visual rendering.
It helps decision-makers find relationships among large numbers of variables and communicate
concepts and hypotheses.
Data visualization is the graphical representation of information and data. By using visual elements
like charts, graphs, and maps, data visualization tools provide an accessible way to see and
understand trends, outliers, and patterns in data.

Importance of Data Visualization in Tableau


• Visualization helps people understand a topic clearly and gain better insight into it.
• Visualization makes trends easier to see, which helps in forecasting and in making better decisions.
• Patterns in large volumes of data can be spotted easily and quickly.
• Data visualization conveys information in a way that is universally understood.
• It makes it simple to share ideas with others.
• Data visualization shows people where they need to make adjustments in their business to get
better results.
• It scales to large amounts of data.
• It makes interpretation easy.

What is Tableau?
Tableau provides an easy drag-and-drop interface that quickly converts your data into business
insights. It presents the visualized data in an appealing format called data visualization
dashboards.

Features of Tableau
Visual analysis in a click: Tableau analyzes data in a logical and easy manner, and data is
converted into visualizations with little development time. Making quick visualizations is a great
advantage of Tableau.
Interactive dashboard: Tableau has an attractive, interactive dashboard that gives results very
quickly and shows rich visualizations. The data visualization dashboard gives you in-depth
knowledge of the data.
Easy use: Some data visualization tools make data analytics hard, but Tableau makes it easier by
reducing unnecessary complications. Its drag-and-drop interface offers a user-friendly environment,
and anyone who knows the basics of MS Excel can start using Tableau.
Direct connection: Tableau connects the users directly to databases, data warehouses and other
sources of data. It does not require any complicated setup.
Deals with big data: Tableau can analyze very large data sets and visualize them with little
effort.
Publishing and sharing: The dashboard can be published live on the platform from which the
user is accessing it. The results can also be shared live.
Deals with all types of data: Tableau's fast data engine extracts data from many different kinds
of sources, and once loaded, all of the data is treated the same way in Tableau.

Trending in the market: Tableau is growing rapidly in the business analytics market and is now
used across many industries. Many top-rated companies like Microsoft, Nokia, and Deloitte use
Tableau to meet their business intelligence requirements.
Interactive visualizations: One of the most important features of Tableau is its ability to create
a wide variety of visualization types. It produces attractive and functional visualizations that
help the user make decisions.
Smart Maps: Tableau also lets you search in a map or lasso data points on it. This helps answer
the geographical question “Where?”
Works across multiple platforms: Users can access Tableau through a desktop, a browser, an iPad,
or a mobile phone. This feature has made it highly influential in the data analytics
industry.
Copying between dashboards: Tableau lets you copy worksheets or any dashboard elements
between different workbooks. Because of this feature, business analysts working with the same
data need not start everything from scratch. You can also have seamless interactions between the
dashboards.
64-bit version: Tableau gives you the option to choose between 32-bit and 64-bit versions. The version
you download depends on the OS: on a 32-bit OS, you can install only the 32-bit version of Tableau.
The 64-bit version can use more memory, which improves speed.
SAML authentication: SAML authentication lets users sign in with single sign-on in a mixed
infrastructure environment. This lets Tableau Server integrate with your core business areas and
internal applications.
Metadata Management: Tableau lets you rename the fields and modify the formats quickly and
easily. You can also create subsets of data by selecting groups of points.

Creating a Dashboard in Tableau


• Connect to your data
Start Tableau and select a data source to connect to from the variety of supported sources.
• Extract the data:
Choose the dimensions and measures of the data to analyze. Dimensions are categorical
fields such as landing page and source medium. Measures are numeric fields such as
visits and bounces. The more dimensions you add, the larger the data set will get.
[Eg.] Adding a device type gives one row of measures for each device. If the default
data has 10,000 rows and the hour dimension is added, the data can grow to
10,000 × 24 (hours) = 240,000 rows. If both hour and mobile device type (about 250 types)
are added, it can grow to 10,000 × 24 × ~250 = 60,000,000 rows.
• The Workspace
The workspace is divided into three major sections: data, settings, and visualizations. The
Data pane on the left side of the screen shows two sets of fields: dimensions at the top and
measures at the bottom. The Columns and Rows shelves are located near the top of the screen.
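The worst-case row counts in the example above can be checked with a small helper (hypothetical name `max_rows`): each added dimension multiplies the row count by its number of distinct values, assuming every combination actually occurs in the data.

```python
from math import prod


def max_rows(base_rows, *cardinalities):
    """Worst-case row count after adding dimensions with the given numbers of distinct values."""
    return base_rows * prod(cardinalities)


print(max_rows(10_000, 24))       # 240000    (hour added)
print(max_rows(10_000, 24, 250))  # 60000000  (hour and ~250 device types added)
```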

Your First Data Visualization
• Drag fields from the Data pane onto the Columns and Rows shelves at the top of the screen.
• For example, drag the medium dimension to the Columns shelf and the visits measure to the Rows shelf.
• Tableau then displays the data in visual form.

Three line charts on the same axis comparing visits side by side.

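Example:
To calculate the bounce rate, right-click in the measures area, select Create Calculated Field, and
enter SUM([Bounces]) / SUM([Visits]). Summing before dividing gives the overall bounce rate; a
row-level [Bounces]/[Visits] would itself be summed across rows when placed in a view.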
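Because a bounce rate is a ratio, it should be computed from totals rather than by adding up per-row ratios; in Tableau the aggregate form is SUM([Bounces]) / SUM([Visits]). The plain-Python sketch below, with hypothetical rows, shows how far apart the two results can be.

```python
rows = [  # hypothetical rows for one medium: one row per day
    {"visits": 100, "bounces": 50},  # 50% bounce rate
    {"visits": 300, "bounces": 60},  # 20% bounce rate
]

# Row-level ratio, then summed: what SUM([Bounces]/[Visits]) would produce
sum_of_ratios = sum(r["bounces"] / r["visits"] for r in rows)

# Aggregate first, then divide: the SUM([Bounces]) / SUM([Visits]) form
ratio_of_sums = sum(r["bounces"] for r in rows) / sum(r["visits"] for r in rows)

print(round(sum_of_ratios, 3), round(ratio_of_sums, 3))  # 0.7 0.275
```

Only the second value, 110 bounces out of 400 visits, is a meaningful bounce rate for the medium.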

The same approach can be used to do a variety of calculations. For instance, the code below
could be used to distinguish weekend vs. weekday traffic:

IF DATEPART('weekday', [Date]) = 1 THEN 'Weekend'      // Sunday (assuming the week starts on Sunday)
ELSEIF DATEPART('weekday', [Date]) = 7 THEN 'Weekend'  // Saturday
ELSE 'Weekday'                                         // Monday to Friday
END

The above will give you a new dimension that allows you to separate your traffic by
weekend and weekday.
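When checking such a field against data outside Tableau, the same classification can be reproduced in Python. Note that Python numbers weekdays differently: `date.weekday()` returns 0 for Monday through 6 for Sunday, so the weekend is 5 and 6 rather than 1 and 7. The function name `day_type` is hypothetical.

```python
from datetime import date


def day_type(d):
    """Return 'Weekend' for Saturday or Sunday, otherwise 'Weekday'."""
    return "Weekend" if d.weekday() >= 5 else "Weekday"  # 5 = Saturday, 6 = Sunday


print(day_type(date(2024, 1, 6)), day_type(date(2024, 1, 8)))  # Weekend Weekday
```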
