Module 2e - Data Visualization - Nv

Uploaded by

upneo
Copyright
© © All Rights Reserved

COMP ED 20 – INTRODUCTION TO ANALYTICS

UP Open University

Module 2d – Data Visualization

Introduction

Data visualization is an important part of data analytics. It facilitates our
understanding and knowledge of the data under investigation. With today’s
computing technology, creating visual presentations of the data under study can be
done with ease and efficiency.

In this module, you will learn the different types of visualization, and how to
visualize numerical and non-numerical data.

Objectives: At the end of this module, you should be able to:

1. Identify and differentiate the different types of visualization,
2. Apply the appropriate visualization for each type of data, and
3. Create visualizations of numerical and non-numerical data.

Key Concepts and Activities

What is Data Visualization?

Visualization is the process of converting raw data into graphical form to make it easier to
understand the characteristics of the data and to interpret the results of analysis. With
visualization, we can see whether there are trends or patterns in the data, whether the data is
normally distributed, whether outlier values are present, how the data set clusters, and much
more. When done correctly, visualization helps decision makers in data-driven organizations
make fast and accurate decisions.

Visualization can be done at any stage of the data analytics process, such as during
exploratory data analysis, model building, and the presentation of results.

Data visualizations are primarily used for analysis and communication.

A visualization such as a box plot helps determine the presence of outlier values. A frequency
distribution helps us understand how the values in a data set are distributed. A scatter plot
helps determine the trends and relationships between variables.

Visualizations serve as a communication tool between data experts and the
public. Graphical presentations of data are easier to comprehend than
columns and rows of data. Visualization also allows multiple presentations of a single
data set. This means that certain features in the dataset can be presented as a bar chart,
while other attributes can be presented as a pie chart.

Types of Data Visualization


Calag, VB (2021). Introduction to Analytics. Los Baños: University of the Philippines Open University

Visualization can be applied to both numerical and non-numerical data. For our
demonstration, we will use the sample data provided in the Orange software.
Run/execute the Orange Data Mining software and click the Examples widget, and
choose Visualization of Data Sets.

To learn about the data set, double-click the File widget to see information about
the file associated with it. As shown in the info sheet, the file is
the Iris Flower Dataset (iris.tab), with five attributes (4 numerical, 1 text) and 150
instances or records.

Double click Data Table widget to view the data set or the Scatter Plot widget to
view the relationship between two of the variables in the data set. The

Some of the popular types of data visualization include the following:

a. Distributions widget. This tool displays the distribution of the values of a
single variable. Add the Distributions widget to the canvas and connect it
to the File widget. Then, double-click the Distributions widget to view the frequency
distribution.

The chart below shows the distribution of sepal length for all Iris flowers.
On the horizontal axis, the values of this variable range
from 4 to 8. At the upper right of the chart are the mean sepal length of 5.843
and the standard deviation of 0.825. The fitted distribution is set to Normal, with a
bin width of 0.5. You can adjust the bin width according to your preference.
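
For readers who want to see what the bin width does outside Orange, the following is a minimal Python sketch that groups values into bins of width 0.5 and reports the mean and standard deviation. The sample values and the helper name `bin_counts` are hypothetical and are not the 150-record Iris data, so the numbers will not match the chart. The population formula (`pstdev`) is used here as an assumption about how the standard deviation is computed.

```python
import math
import statistics
from collections import Counter

# Hypothetical sample of sepal lengths in cm (NOT the full Iris data set).
sepal_length = [4.6, 4.9, 5.0, 5.1, 5.1, 5.4, 5.7,
                5.8, 6.0, 6.3, 6.4, 6.7, 7.0, 7.7]


def bin_counts(values, bin_width):
    """Count how many values fall in each bin [start, start + bin_width)."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


hist = bin_counts(sepal_length, 0.5)
mean = statistics.mean(sepal_length)
std = statistics.pstdev(sepal_length)  # population standard deviation (assumed)

# Text version of the frequency distribution: one row per bin.
for start, count in hist.items():
    print(f"{start:.1f}-{start + 0.5:.1f} | {'#' * count}")
print(f"mean = {mean:.3f}, std = {std:.3f}")
```

Calling `bin_counts(sepal_length, 1.0)` instead gives fewer, wider bins, which is the same effect as increasing the bin width in the Distributions widget.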

To view the distribution of sepal width, select the variable sepal width in the
widget.

To display the distributions of sepal length for each type of flower, click the
pull down button Split by and select Iris. The result of this action is shown
below:

b. Box plot. This chart shows the summary values of a numeric variable. These summary
values include the first quartile, median, third quartile, mean, and standard deviation. It also
allows you to show the box plots of the variable grouped according to an associated
categorical variable. The box plot is a graphical approach to univariate exploratory data
analysis.

Using the same data set, click and drag Box Plot to the Canvas and connect it
from the File Widget.

Double click the Box Plot widget to view the chart of the selected variable.

The yellow vertical line at the center, with value 5.8, marks the median, while
5.843 and 0.83 are the mean and standard deviation, respectively. Since
the mean (5.843) is greater than the median (5.8), this suggests that
the distribution of this variable is right-skewed (has a long right tail). The values 5.1
and 6.4 are the 1st quartile and 3rd quartile of the data set.
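
The summary values that a box plot displays can be computed directly, as in the minimal sketch below. The sample is hypothetical (not the Iris data), the quartile method (`inclusive`) is one of several conventions and may differ from the one Orange uses, and the 1.5 × IQR outlier fences are a common convention added here as an assumption. Comparing the mean with the median is only a rule of thumb for skewness.

```python
import statistics

# Hypothetical sepal-length sample in cm (illustrative, NOT the Iris data).
values = [4.6, 4.9, 5.0, 5.1, 5.1, 5.4, 5.7,
          5.8, 6.0, 6.3, 6.4, 6.7, 7.0, 7.7]

# Quartiles: the three cut points that split the sorted data into four parts.
q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
mean = statistics.mean(values)
std = statistics.stdev(values)  # sample standard deviation
iqr = q3 - q1                   # interquartile range: width of the box

# Assumed convention: values beyond 1.5 * IQR from the box are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in values if v < lower_fence or v > upper_fence]

# Rule of thumb only: a mean above the median hints at a longer right tail.
skew_hint = "right-skewed" if mean > median else "left-skewed or symmetric"

print(f"Q1 = {q1:.2f}, median = {median:.2f}, Q3 = {q3:.2f}")
print(f"mean = {mean:.3f}, std = {std:.3f}, hint: {skew_hint}")
print(f"outliers: {outliers}")
```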

To view the box-plot of the variable sepal length per type of Iris flower, click
Iris in the subgroups.

c. Line graph. This is used to find trends or patterns in the data over a period of
time. It is one of the easiest charts to create, and it can be done easily with MS
Excel.
Data type: Both the x and y axes are quantitative
Example: humidity over time, income over months

d. Pie chart. This chart is typically used to show the proportions of values relative to
the total value. This is best done using the MS Excel application. For example,
assuming that we have 30 records in our dataset, with 13 male respondents
and 17 female respondents, their proportions or percentages can be viewed graphically
using a pie chart as follows:

[Pie chart: M = 13 (43%), F = 17 (57%)]
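
The percentages on the pie chart are each count divided by the total number of records. A minimal sketch of that arithmetic, using the counts from the example (the variable names are illustrative):

```python
# Counts from the example: 13 male and 17 female respondents.
counts = {"M": 13, "F": 17}
total = sum(counts.values())

# Share of each category relative to the total number of records.
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    # Each slice's angle is its share of the full 360-degree circle.
    angle = round(share * 360)
    print(f"{label}: {counts[label]} of {total} = {share:.0%} ({angle} degrees)")
```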

e. Scatter plot. This is used to plot the points of two variables on a Cartesian plane and
to see whether there is a pattern or relationship between the values of the two variables.
This relationship could be positive, negative, or absent. The variables on both the x-
and y-axes should be quantitative. The scatter plot is a graphical method for
bivariate or multivariate exploratory data analysis.

In this example, let us visualize the relationship between the petal width and
petal length of iris flower.

Since there are three types of Iris flower (Iris-Setosa, Iris-Versicolor, and Iris-
Virginica), we can check the Show Legend to help us understand where
these points belong. Let us also check the Show Regression Line to view
the linear relationship of the variables.

From our scatter plot, the red, green, and blue dots represent the
petal width and petal length measurements of Iris-versicolor, Iris-virginica,
and Iris-setosa, respectively.

The different colored lines are the regression lines, shown with the correlation coefficients (r
values), of petal length and petal width for each type of iris flower. The r values
of Iris-virginica, Iris-versicolor, and Iris-setosa are 0.32, 0.79, and 0.31, respectively.

On the other hand, the black line and r = 0.96 represent the overall regression line
and correlation coefficient of petal length and petal width. The value of r
ranges from -1 to +1:

r > 0 indicates a positive association between the two variables

r < 0 indicates a negative association between the two variables

An r value that is close to 1 indicates a strong positive linear relationship between the
two variables, while an r value that is close to -1 indicates a strong negative linear
relationship. An r value that is close to zero means that
the linear relationship between the two variables is weak.

The r value of 0.96 implies that the overall relationship between petal length and
petal width is a strong positive correlation. In other words, the longer the
petal, the wider it tends to be.
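
The correlation coefficient discussed above can be computed by hand, as in the minimal sketch below. The function `pearson_r` and the two small groups of petal measurements are hypothetical, made up for illustration, and are not taken from the Iris data. The sketch also illustrates why an overall r can be higher than the r within each group: when the groups sit far apart along both axes, the pooled points line up more strongly than the points inside any one group.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical petal measurements in cm for two made-up groups.
group_a_length = [1.3, 1.4, 1.5, 1.6, 1.7]
group_a_width = [0.2, 0.3, 0.2, 0.2, 0.3]
group_b_length = [4.5, 5.0, 5.5, 6.0, 6.5]
group_b_width = [1.5, 1.4, 2.0, 1.8, 2.1]

r_a = pearson_r(group_a_length, group_a_width)
r_b = pearson_r(group_b_length, group_b_width)
# Pool both groups to get the overall coefficient.
r_all = pearson_r(group_a_length + group_b_length,
                  group_a_width + group_b_width)

print(f"group A r = {r_a:.2f}")
print(f"group B r = {r_b:.2f}")
print(f"overall r = {r_all:.2f}")
```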

Visualization Tools

There are many visualization tools that are available either offline or online. These
include the following:

1. MS Excel. Aside from its ability to handle columnar data, this application can
also visualize data and perform statistical analysis.
2. Tableau. This is one of the popular visualization applications these days.
Its public edition can be downloaded and used for free.
3. Orange Data Mining. Orange is a machine learning and data mining application that
has various features for visualizing data. It also has various statistical and machine
learning algorithms. It is open source and provides interactive graphical tools for using its
functions.
4. RapidMiner. This is a machine learning application that can perform various kinds of data
analysis and visualization. It also has a free version, which allows analysis of up to
10,000 records.
5. Google Charts. This is one of the applications offered by Google for free.
6. Datawrapper. This is an online charting tool for creating charts and maps.
7. Infogram. This is an online visualization tool that allows you to create infographics
and reports.
8. Online Data Visualization Websites
• Data hero at Datahero.com
• Raw graphs at Rawgraphs.io
• Data Visual at https://datavisu.al/

Assignment 4. 35 pts.

References/ Supplementary Materials:

1. Klipfolio. What is Data Visualization?
https://www.klipfolio.com/resources/articles/what-is-data-visualization
2. Watch BBC Four. Hans Rosling's 200 Countries, 200 Years, 4 Minutes - The Joy of
Stats. https://www.youtube.com/watch?v=jbkSRLYSojo
3. Watch the video on classifying shapes of distributions:
https://www.khanacademy.org/math/ap-statistics/quantitative-data-ap/describing-comparing-distributions/v/classifying-distributions?modal=1
4. University of Illinois at Urbana-Champaign. Overview of Visualization.
https://coursera.org/share/a2818b697bd48f44a79b5e9cb5c36fcc
5. University of Illinois at Urbana-Champaign. Charts.
https://coursera.org/share/95376df1b41a909d376ddbe9c4911660
6. Saranya, K. 2019. Benefits and Importance of Data Visualization.
https://www.boldbi.com/blog/data-visualization-importance-and-benefits
7. Import.io. 2019. Types of data visualization charts.
https://www.import.io/post/what-is-data-visualization/
8. Matejka, J. and Fitzmaurice, G. Same Stats, Different Graphs: Generating Datasets
with Varied Appearance and Identical Statistics through Simulated Annealing.
https://damassets.autodesk.net/content/dam/autodesk/www/autodesk-reasearch/Publications/pdf/same-stats-different-graphs.pdf
9. Kansas State University. Data Visualization.
https://guides.lib.k-state.edu/c.php?g=181742&p=1196015#:~:text=What%20are%20Temporal%20Data%20Visualizations,to%20one%2Ddimensional%20linear%20visualizations.&text=Timeline%20visualizations%20usually%20include%20all,some%20time%20period%20or%20moment.
10. Statistics Canada. Constructing box and whisker plots.
https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch12/5214889-eng.htm#:~:text=A%20box%20and%20whisker%20plot%20is%20a%20way%20of%20summarizing,central%20value%2C%20and%20its%20variability.
