
COMP ED 20 – INTRODUCTION TO ANALYTICS

UP Open University

Module 2d – Data Visualization

Introduction

Data visualization is an important part of data analytics. It facilitates our
understanding and knowledge of the data under investigation. With today’s
computing technology, creating visual presentations of the data under study can be
done with ease and efficiency.

In this module, you will learn the different types of visualization, and how to
visualize numerical and non-numerical data.

Objectives: At the end of this module, you should be able to:

1. Identify and differentiate the different types of visualization;
2. Apply the appropriate visualization for each type of data; and
3. Create visualizations of numerical and non-numerical data.

Key Concepts and Activities

What is Data Visualization?

Visualization is the process of converting raw data into graphical forms to aid in
understanding the characteristics of the data and in interpreting the results of
analysis. With visualization, we can see whether there are trends and what patterns
the data follow, check whether the data are normally distributed, detect outlier
values, understand the clusters in the data set, and much more. When done correctly,
visualization can help decision makers in data-driven organizations make fast and
accurate decisions.

Visualization can be done at any stage of the data analytics process, such as during
exploratory data analysis, model building, and the presentation of results.

Data visualizations are primarily used for two purposes: analysis and communication.

For analysis, a box plot helps determine whether outlier values are present, a
frequency distribution helps us understand how the values of a data set are
distributed, and a scatter plot helps determine the trends and relationships between
variables.

For communication, visualizations serve as a communication tool between data experts
and the public. Graphical presentations of data are easier to comprehend than columns
and rows of numbers. Visualizations also allow a single data set to be presented in
multiple ways: for example, some attributes in the data set can be presented as a bar
chart, while others can be presented as a pie chart.

Types of Data Visualization

Calag, VB (2021). Introduction to Analytics. Los Baños: University of the Philippines Open University

Visualization can be applied to both numerical and non-numerical data. For our
demonstration, we will use the sample data provided in the Orange software. Run the
Orange Data Mining software, click Examples, and choose Visualization of Data Sets.

Course code: COMP ED 20 Page | 2

To learn about the data set, double-click the File widget to see the file associated
with it. As shown in the info sheet, the file is the Iris flower data set (iris.tab),
with five attributes (four numerical and one categorical, the iris class) and 150
instances or records.

Double-click the Data Table widget to view the data set, or the Scatter Plot widget to
view the relationship between two of the variables in the data set.

Some of the popular types of data visualization include the following:

a. Distributions widget. This tool displays the distribution of a single variable. To
use it, add a Distributions widget to the canvas and connect it to the File widget.
Then, double-click the Distributions widget to view the frequency distribution.
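
The binning behind a frequency distribution can be sketched in Python. This is a
minimal illustration, not Orange's own code: the function name
frequency_distribution, the use of half-open bins [lower, upper), and the sample
values are all hypothetical assumptions.

```python
import math
from collections import Counter


def frequency_distribution(values, bin_width, start=None):
    """Count the values falling in each half-open bin [lower, lower + bin_width)."""
    if start is None:
        # Put the first bin edge on a multiple of bin_width at or below the minimum.
        start = math.floor(min(values) / bin_width) * bin_width
    counts = Counter(math.floor((v - start) / bin_width) for v in values)
    # One (lower edge, upper edge, count) triple per bin, including empty bins.
    return [
        (start + i * bin_width, start + (i + 1) * bin_width, counts.get(i, 0))
        for i in range(max(counts) + 1)
    ]


sample = [4.3, 4.9, 5.0, 5.1, 5.8, 6.3, 6.4, 7.9]  # hypothetical sepal lengths
for lower, upper, n in frequency_distribution(sample, bin_width=0.5):
    print(f"[{lower:.1f}, {upper:.1f})  {'#' * n}")
```

Changing bin_width regroups the same values, which is why the chart's shape can look
different at different bin widths.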

The chart below shows the distribution of sepal length for all iris flowers. Along
the horizontal axis, the values of this variable range from about 4 to 8. The upper
right of the chart shows the mean sepal length of 5.843 and the standard deviation of
0.825. The fitted distribution is set to Normal, with a bin width of 0.5. You can
adjust the bin width according to your preference.
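
The mean, standard deviation, and fitted Normal curve described above can be
approximated as follows. This is a sketch under stated assumptions: the helper names
summarize and normal_curve_height are hypothetical, the data are made up, and we do
not claim this is exactly how Orange computes or scales its curve. A standard
deviation can be computed with divisor n or n - 1, and different tools report
different ones, so both are returned.

```python
import math
import statistics


def summarize(values):
    """Return the mean and both common versions of the standard deviation."""
    mean = statistics.fmean(values)
    population_sd = statistics.pstdev(values)  # divides by n
    sample_sd = statistics.stdev(values)       # divides by n - 1
    return mean, population_sd, sample_sd


def normal_curve_height(x, mean, sd, n, bin_width):
    """Height of a fitted Normal(mean, sd) curve at x, scaled from a density to a
    count so that it can be drawn over a histogram of n values with this bin width."""
    density = math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
    return n * bin_width * density


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
mean, pop_sd, samp_sd = summarize(data)
print(mean, pop_sd, round(samp_sd, 3))
print(round(normal_curve_height(mean, mean, pop_sd, len(data), 0.5), 3))
```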

To view the distribution of Sepal width, just select the variable Sepal width to
display its distribution.

To display the distribution of sepal length for each type of flower, open the Split by
drop-down list and select Iris. The result of this action is shown
below:

b. Box plot. This chart summarizes a numeric variable. The summary values shown
include the first quartile, median, third quartile, mean, and standard deviation. It
can also show separate box plots of the variable for each category of an associated
categorical variable. The box plot is a graphical approach to univariate exploratory
data analysis.
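
The quartiles behind a box plot, and one common way of flagging outliers, can be
sketched as below. The outlier rule used here is Tukey's convention (values more than
1.5 times the interquartile range beyond the quartiles); it is an assumption, not
necessarily the rule Orange applies. Quartile formulas also differ between tools, so
results may not match Orange exactly. The function name and data are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, whisker ends and outliers using Tukey's 1.5 x IQR fences."""
    # Tools use different quartile formulas; "inclusive" is one common choice.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "whisker_low": min(inside),    # smallest value within the fences
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": max(inside),   # largest value within the fences
        "outliers": [v for v in values if v < low_fence or v > high_fence],
    }


print(box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))  # hypothetical data
```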

Using the same data set, drag a Box Plot widget onto the canvas and connect it to the
File widget.

Double-click the Box Plot widget to view the chart of the selected variable.

The yellow vertical line at the center, with value 5.8, marks the median, while 5.843
and 0.83 are the mean and standard deviation, respectively. Since the mean (5.843) is
greater than the median (5.8), this suggests that the distribution of this variable
is slightly right-skewed (it has a longer right tail). The values 5.1 and 6.4 are the
first and third quartiles of the data set.
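
The mean-versus-median reasoning above can be made numeric with Pearson's second
(median) skewness coefficient, 3 * (mean - median) / standard deviation. With the
values quoted above, 3 * (5.843 - 5.8) / 0.83 is about 0.16, a small positive value
consistent with a slight right skew. The sketch below is illustrative only; the
function name skew_check and the data are hypothetical, and comparing mean and median
is a rule of thumb rather than a formal test.

```python
import statistics


def skew_check(values):
    """Compare mean and median and compute Pearson's median skewness coefficient."""
    mean = statistics.fmean(values)
    median = statistics.median(values)
    # Pearson's second skewness coefficient: 3 * (mean - median) / standard deviation
    coefficient = 3 * (mean - median) / statistics.stdev(values)
    if mean > median:
        hint = "right (positive) skew suggested"
    elif mean < median:
        hint = "left (negative) skew suggested"
    else:
        hint = "mean equals median; no skew suggested"
    return mean, median, coefficient, hint


print(skew_check([1, 2, 3, 4, 10]))  # hypothetical data with one large value
```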

To view box plots of sepal length for each type of iris flower, click Iris in the
Subgroups list.
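
Showing a variable per subgroup (here, or with Split by in the Distributions widget)
amounts to collecting its values separately for each class label. A minimal sketch,
assuming rows stored as dictionaries; the function name split_by and the records are
hypothetical:

```python
import statistics
from collections import defaultdict


def split_by(rows, value_key, group_key):
    """Collect the values of one variable separately for each class label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return dict(groups)


rows = [  # hypothetical records laid out like the iris table
    {"sepal length": 5.1, "iris": "Iris-setosa"},
    {"sepal length": 4.9, "iris": "Iris-setosa"},
    {"sepal length": 7.0, "iris": "Iris-versicolor"},
    {"sepal length": 6.4, "iris": "Iris-versicolor"},
    {"sepal length": 6.3, "iris": "Iris-virginica"},
    {"sepal length": 5.8, "iris": "Iris-virginica"},
]
for label, values in split_by(rows, "sepal length", "iris").items():
    print(label, len(values), statistics.median(values))
```

Each group's list can then be summarized or binned on its own, giving one box plot or
one distribution per type of flower.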

c. Line graph. This is used to find trends or patterns in data over a period of time.
It is one of the easiest charts to create and can easily be made in MS Excel.
Data type: Both x and y axes are quantitative (the x-axis is usually time)
Example: humidity over time, income over months
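
A line graph joins observations in time order, so the rises and falls a reader sees
are the changes between consecutive points. The sketch below computes exactly those
changes; the function name period_changes and the monthly income figures are
hypothetical.

```python
def period_changes(series):
    """Sort (time, value) pairs by time, the order in which a line graph joins them,
    and return the change from each point to the next."""
    ordered = sorted(series)
    return [(t2, v2 - v1) for (_, v1), (t2, v2) in zip(ordered, ordered[1:])]


income = [(3, 1200), (1, 1000), (2, 1100), (4, 1150)]  # hypothetical (month, income)
for month, change in period_changes(income):
    print(f"month {month}: {change:+d}")
```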

d. Pie chart. This chart is typically used to show the proportion of each value
relative to the total. It can easily be made in MS Excel. For example, if we have 30
records in our data set, with 13 male and 17 female respondents, their proportions or
percentages can be viewed graphically using a pie chart as follows:

(Figure: pie chart of respondents by sex; M: 13, 43%; F: 17, 57%)
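
The percentages in the pie chart are each count divided by the total: 13 / 30 is
about 43.3% and 17 / 30 is about 56.7%, which round to 43% and 57%. Each slice's
angle is the same fraction of 360 degrees. A small sketch; the function name
pie_shares is hypothetical:

```python
def pie_shares(counts):
    """Return each category's count, percentage of the total, and slice angle."""
    total = sum(counts.values())
    return {
        label: (n, 100 * n / total, 360 * n / total)  # a full pie is 360 degrees
        for label, n in counts.items()
    }


shares = pie_shares({"M": 13, "F": 17})  # the 30 respondents in the example above
for label, (n, percent, angle) in shares.items():
    print(f"{label}: {n} ({percent:.0f}%), slice of {angle:.0f} degrees")
```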

e. Scatter plot. This is used to plot the points of two variables in a Cartesian plane
and to see whether there is a pattern or relationship between the values of the two
variables. This relationship could be positive, negative, or absent. The variables on
both the x- and y-axes should be quantitative. The scatter plot is a graphical method
for bivariate/multivariate exploratory data analysis.

In this example, let us visualize the relationship between the petal width and petal
length of iris flowers.

Since there are three types of iris flower (Iris-setosa, Iris-versicolor, and
Iris-virginica), we can check Show Legend to see which type each point belongs to.
Let us also check Show Regression Line to display the linear relationship between the
variables.

In our scatter plot, the red, green and blue dots represent individual Iris-versicolor,
Iris-virginica and Iris-setosa flowers, respectively, each plotted at its petal length
and petal width.

The colored lines are the regression lines for each type of iris flower, each labeled
with its correlation coefficient (r value) between petal length and petal width. The
r values of Iris-virginica, Iris-versicolor and Iris-setosa are 0.32, 0.79 and 0.31,
respectively.
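
A regression line of this kind can be fitted by ordinary least squares. The sketch
below is a standard least-squares fit of y on x; we assume, without having checked
Orange's source, that its regression line is of this type. The function name
least_squares_line and the petal measurements are hypothetical. Fitting each type of
flower separately gives the per-type lines, while fitting all points together gives
the overall line.

```python
def least_squares_line(xs, ys):
    """Intercept a and slope b of the ordinary least-squares line y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


lengths = [1.4, 1.5, 4.5, 4.7, 5.6, 6.0]  # hypothetical petal lengths
widths = [0.2, 0.2, 1.5, 1.4, 2.1, 2.5]   # hypothetical petal widths
intercept, slope = least_squares_line(lengths, widths)
print(f"width = {intercept:.2f} + {slope:.2f} * length")
```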

On the other hand, the black line and r = 0.96 represent the overall regression line
and correlation coefficient of petal length and petal width across all flowers. The
value of r always lies between -1 and +1:

r > 0 indicates a positive association between the two variables

r < 0 indicates a negative association between the two variables

An r value close to 1 indicates a strong positive linear relationship between the two
variables, while an r value close to -1 indicates a strong negative linear
relationship. An r value close to zero means that the linear relationship between the
two variables is weak.
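
The correlation coefficient and its reading can be sketched in Python. pearson_r is
the standard Pearson formula; describe_r turns r into words, and its cut-offs (0.7
for "strong", 0.3 for "weak") are illustrative assumptions, since "close to 1" and
"close to zero" have no single fixed threshold. Both function names are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient; always between -1 and +1."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def describe_r(r, strong=0.7, weak=0.3):
    """Verbal reading of r. The 0.7 and 0.3 cut-offs are illustrative assumptions."""
    if r == 0:
        return "no linear association"
    direction = "positive" if r > 0 else "negative"
    if abs(r) >= strong:
        strength = "strong"
    elif abs(r) <= weak:
        strength = "weak"
    else:
        strength = "moderate"
    return f"{strength} {direction} linear association"


print(describe_r(0.96))  # the overall r reported for petal length vs. petal width
```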

The r value of 0.96 implies a strong positive correlation between petal length and
petal width overall. In other words, flowers with longer petals tend to have wider
petals.

Visualization Tools

There are many visualization tools that are available either offline or online. These
include the following:

1. MS Excel. Aside from its ability to handle columnar data, this application is also
capable of visualizing data and performing statistical analysis.
2. Tableau. This application is one of the most popular visualization tools today.
Its public edition can be downloaded and used for free.
3. Orange Data Mining. Orange is machine learning and data mining software with
various features for visualizing data. It also provides various statistical and
machine learning algorithms. It is open source, and its functions are used through
interactive graphical tools.
4. RapidMiner. This is machine learning software that can perform various kinds of
data analysis and visualization. It also has a free version, which allows analysis of
up to 10,000 records.
5. Google Charts. This is one of the applications offered by Google for free.
6. Datawrapper. This is an online charting tool for creating charts and maps.
7. Infogram. This is an online visualization tool that allows you to create infographics
and reports.
8. Online Data Visualization Websites
• Data hero at Datahero.com
• Raw graphs at Rawgraphs.io
• Data Visual at https://datavisu.al/

Assignment 4. 35 pts.

References/Supplementary Materials:

1. Klipfolio. What is Data Visualization?
https://www.klipfolio.com/resources/articles/what-is-data-visualization
2. Watch BBC Four. Hans Rosling's 200 Countries, 200 Years, 4 Minutes - The Joy of
Stats. https://www.youtube.com/watch?v=jbkSRLYSojo
3. Watch the video on classifying shapes of distributions:
https://www.khanacademy.org/math/ap-statistics/quantitative-data-ap/describing-comparing-distributions/v/classifying-distributions?modal=1
4. University of Illinois at Urbana-Champaign. Overview of Visualization.
https://coursera.org/share/a2818b697bd48f44a79b5e9cb5c36fcc
5. University of Illinois at Urbana-Champaign. Charts.
https://coursera.org/share/95376df1b41a909d376ddbe9c4911660
6. Saranya, K. 2019. Benefits and Importance of Data Visualization.
https://www.boldbi.com/blog/data-visualization-importance-and-benefits
7. Import.io. 2019. Types of data visualization charts.
https://www.import.io/post/what-is-data-visualization/
8. Matejka, J. and Fitzmaurice, G. Same Stats, Different Graphs: Generating Datasets
with Varied Appearance and Identical Statistics through Simulated Annealing.
https://damassets.autodesk.net/content/dam/autodesk/www/autodesk-reasearch/Publications/pdf/same-stats-different-graphs.pdf
9. Kansas State University. Data Visualization.
https://guides.lib.k-state.edu/c.php?g=181742&p=1196015
10. Statistics Canada. Constructing box and whisker plots.
https://www150.statcan.gc.ca/n1/edu/power-pouvoir/ch12/5214889-eng.htm
