Data Wrangling and Visualization
DATA VISUALIZATION
Learning Objectives
At the end of this module, learners are expected to:
1. Visualize categorical and numerical variables.
2. Construct and interpret a summary table, bar chart, and pie chart
for categorical variables.
3. Construct and interpret a contingency table and a stacked bar chart
for two categorical variables.
4. Construct and interpret a scatterplot, bubble plot, histogram, and
line chart for numerical variables.
5. Calculate and interpret summary measures (descriptive measures).
6. Use boxplots and z-scores to identify outliers.
The 3 Stages of Business Analytics (Jaggia et al., 2021)
-Optimization
-Simulation
-Regression
-Supervised data mining
-Forecasting
-Data wrangling
-Data visualization
-Unsupervised data mining
Unsupervised Data Mining (Jaggia et al., 2021)
• Unsupervised data mining refers to techniques, such as
clustering, that are applied without a target variable.
• It aims to search for patterns and structure
among all the variables.
• Clustering is probably the most common
unsupervised method.
• Clustering is also known as “segmentation” in
marketing circles.
• Clustering aims to group entities (customers,
companies, cities, or whatever) into clusters of
similar entities based on the values of their
variables.
Business Analytics by Albright and Winston, p.17
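To make the idea of grouping entities by the values of their variables concrete, here is a minimal sketch in Python. It uses k-means, one common clustering algorithm chosen here only for illustration; the slides do not name a specific algorithm. The customer data, the variable meanings, and the starting centers are all hypothetical.

```python
import math

def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
        for p in points
    ]

def update(points, labels, centers):
    """Move each center to the mean of the points assigned to it."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            new_centers.append(old)  # keep an empty cluster's center in place
        else:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
    return new_centers

def k_means(points, centers, max_iter=100):
    """Alternate assignment and update steps until the labels stop changing."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
        centers = update(points, labels, centers)
    return labels, centers

# Hypothetical customers: (annual spend in $ thousands, store visits per month)
customers = [(2, 1), (3, 2), (2.5, 1.5), (20, 10), (22, 12), (21, 11)]

# Assumption: start from two of the observed customers as initial centers.
labels, centers = k_means(customers, centers=[customers[0], customers[3]])
print(labels)   # [0, 0, 0, 1, 1, 1]
print(centers)  # [(2.5, 1.5), (21.0, 11.0)]
```

The two segments that come out (low-spend, infrequent visitors versus high-spend, frequent visitors) were not labeled in advance; they emerge from the values of the variables alone, which is what makes the method unsupervised.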
Welcome to Excel
Websites to visit:
Excel video training - Office Support (microsoft.com)
My Excel Power
Key tasks in Data Wrangling
2. Data Inspection:
https://github.com/ropensci/plotly/issues/1114
Using the z-scores to check for Outliers:
1. Determine whether 75 is an outlier in a
distribution with a mean of 60 and a standard
deviation of 10.
Solution:
Find the z-score that corresponds to X = 75.
z = (x - μ) / σ = (75 - 60) / 10 = 1.5
X = 75 is not an outlier since its z-score of 1.5 is
within the acceptable interval [-3, 3].
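The same check can be sketched in a few lines of Python. This is only an illustration of the calculation above; the function names `z_score` and `is_outlier` are hypothetical, and the threshold of 3 comes from the acceptable interval [-3, 3] used in the solution.

```python
def z_score(x, mean, std_dev):
    """Return how many standard deviations x lies from the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev


def is_outlier(x, mean, std_dev, threshold=3.0):
    """Flag x as an outlier when its z-score falls outside [-threshold, threshold]."""
    return abs(z_score(x, mean, std_dev)) > threshold


z = z_score(75, mean=60, std_dev=10)
print(z)                       # 1.5
print(is_outlier(75, 60, 10))  # False
```

A value exactly 3 standard deviations from the mean (for example, 90 in this distribution) is treated as inside the interval, so it is not flagged.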
Using the z-scores to check for Outliers: