Visualize Your Data
Guest Lecture
Depok, 20 November 2024
https://images.app.goo.gl/P2UNLT2jhiPA1xkF8
Outline
• Introduction
• Histogram
• Position and Boxplot
• Summary
Introduction-Definition
• Variable
• Characteristic or attribute that can assume different values.
• Data
• Values obtained from measurements or observations related to the variables.
• Data set
• Collection of data values.

Subject n   Weight (kg)   Height (cm)
1           56            168
2           63            165
3           70            170
Introduction-Definition
• Statistics
• Science of conducting studies to:
• Collect,
• Organize,
• Summarize,
• Analyze,
• Draw conclusions from the data.

Subject n   Weight (kg)   Height (cm)
1           56            168
2           63            165
3           70            170
Avg         63            167.67

BMI (from the averages) = 22.4 kg/m2
Healthy BMI range: 18.5 kg/m2 - 25 kg/m2
Conclusion: Normal
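The summarize-analyze-conclude chain on this slide can be sketched in a few lines of Python. This is only an illustration computed from the three rows of the table; the variable names are assumptions, and BMI is taken as average weight (kg) divided by the square of average height (m).

```python
from statistics import mean

# Rows from the slide's table: (subject, weight in kg, height in cm)
subjects = [(1, 56, 168), (2, 63, 165), (3, 70, 170)]

# Summarize: column averages
avg_weight = mean(w for _, w, _ in subjects)   # kg
avg_height = mean(h for _, _, h in subjects)   # cm

# Analyze: BMI = weight (kg) / (height (m))^2, here from the averages
bmi = avg_weight / (avg_height / 100) ** 2

# Conclude: compare with the healthy range quoted on the slide (18.5 to 25)
conclusion = "Normal" if 18.5 <= bmi <= 25 else "Outside healthy range"

print(f"avg weight = {avg_weight:.2f} kg, avg height = {avg_height:.2f} cm")
print(f"BMI = {bmi:.1f} kg/m2 -> {conclusion}")
```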
Introduction-Area of Statistics
• Descriptive statistics
• Collection,
• Organization,
• Summarization,
• Presentation of data,
• Describes a situation

Subject n   Weight (kg)   Height (cm)
1           56            168
2           63            165
3           70            170
Avg         63            167.67

BMI (from the averages) = 22.4 kg/m2
Healthy BMI range: 18.5 kg/m2 - 25 kg/m2
Conclusion: Normal
Introduction-Area of Statistics
• Inferential statistics
• Generalization from samples to populations,
• Performing estimations,
• Hypothesis tests,
• Determining relationships among variables,
• Making predictions
Geller, Andrew I., et al. (2015)
https://images.app.goo.gl/jzdKnoYcwSwMBL8i6
Introduction-Type of Variables
• Qualitative Variables
• Have distinct categories according to some characteristic or attribute.
• Gender, religion, geographic location, blood type
• Quantitative Variables
• Can be counted or measured
• Discrete: number of students, population of a city.
• Continuous: temperature, car speed.
https://images.app.goo.gl/Gr2s6dht7ew244is7
Introduction-Type of Variables
• Data Visualization
• Readers seek to understand the visual display.
• Write down the aim of each chart
• Put the aim into words, explaining in very specific terms what the chart is for.
• State it clearly and specifically.
Introduction-Type of Data
• Nominal
• Mutually exclusive (nonoverlapping) categories on which no order or ranking can be imposed.
• Ordinal
• Categories that can be ranked, but precise differences between the ranks do not exist.
Introduction-Type of Data
• Interval
• Ranked data in which precise differences exist, but there is no true zero.
• Ratio
• Has all the characteristics of interval data, and a true zero exists.
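A small worked example may make the "true zero" distinction concrete. It assumes temperature as the illustration (an example chosen here, not taken from the slide): degrees Celsius behave as interval data, while kelvin has a true zero and behaves as ratio data.

```python
# Temperatures in degrees Celsius (interval scale: 0 °C does not mean "no temperature")
t1_c, t2_c = 10.0, 20.0

# Differences are meaningful on an interval scale
diff_c = t2_c - t1_c            # 10 degrees warmer

# Ratios are only meaningful on a ratio scale, e.g. kelvin (true zero)
t1_k, t2_k = t1_c + 273.15, t2_c + 273.15
ratio_c = t2_c / t1_c           # 2.0, but 20 °C is not "twice as hot" as 10 °C
ratio_k = t2_k / t1_k           # roughly 1.035, the physically meaningful ratio

print(f"difference: {diff_c} degrees")
print(f"ratio in Celsius: {ratio_c:.3f}, ratio in kelvin: {ratio_k:.3f}")
```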
Introduction-Type of Variables
• Data Visualization
https://images.app.goo.gl/jzdKnoYcwSwMBL8i6 https://images.app.goo.gl/8AkZLHsLjceFaMVM6
Histogram
• Bar chart Vs. Histogram
Histogram
• For showing the frequency distribution of the data.
• A frequency distribution organizes raw data in table form, using classes and frequencies.
• Categorical frequency distribution
• Data that can be placed in specific categories.
• Nominal or ordinal
• Grouped frequency distribution
• Larger data range,
• Grouped into classes that are more than one unit in width
Ed Swires-Hennessy, 2014
Histogram
• Class limits
• Lower-class limit
• The smallest data value that can be included in the
class.
• Upper-class limit
• The largest data value that can be included in the
class.
• Class boundaries
• Numbers used to separate the classes so that
there are no gaps in the frequency distribution.
Ed Swires-Hennessy, 2014
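The relation between class limits and class boundaries can be sketched as follows. This assumes the common textbook convention of extending each limit by half of the measurement unit; the class limits used are hypothetical.

```python
def class_boundaries(lower_limit, upper_limit, unit=1):
    """Return (lower boundary, upper boundary) for one class.

    Assumes the half-unit convention: each limit is extended by half of
    the measurement unit so that adjacent classes meet without gaps.
    """
    half = unit / 2
    return lower_limit - half, upper_limit + half

# Hypothetical class limits for data recorded to the nearest whole number
classes = [(10, 14), (15, 19), (20, 24)]
boundaries = [class_boundaries(lo, hi) for lo, hi in classes]

for (lo, hi), (lb, ub) in zip(classes, boundaries):
    print(f"limits {lo}-{hi}  ->  boundaries {lb}-{ub}")
```

The upper boundary of one class equals the lower boundary of the next, which is what removes the gaps between bars in a histogram.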
Histogram
• Frequency distribution
• Class width
• Found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class.
• Appropriate number of classes and class width
• 5-20 classes (small vs. large data sets)
• An odd number of classes is recommended
• Classes must be mutually exclusive (non-overlapping)
• Continuous (no gaps between classes)
• Exhaustive (cover all the data)
• Equal in width
Ed Swires-Hennessy, 2014
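The rules above can be sketched as a small routine that builds a grouped frequency distribution. It is a minimal illustration for whole-number data, not a prescribed procedure: the data values, the function name, and the choice of five classes are all assumptions.

```python
import math

def grouped_frequency(data, num_classes):
    """Build equal-width, non-overlapping classes covering all integer data."""
    low, high = min(data), max(data)
    # Round the width up so the classes are exhaustive (cover the maximum)
    width = math.ceil((high - low + 1) / num_classes)
    table = []
    for i in range(num_classes):
        lower = low + i * width          # lower class limit
        upper = lower + width - 1        # upper class limit (no overlap)
        freq = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, freq))
    return width, table

# Hypothetical raw data (20 whole-number observations)
data = [12, 15, 17, 21, 22, 22, 25, 28, 30, 31,
        33, 34, 36, 38, 40, 41, 45, 47, 50, 53]

width, table = grouped_frequency(data, num_classes=5)
print(f"class width = {width}")
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")
```

Each value falls in exactly one class, so the frequencies add up to the number of observations.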
Histogram
• Displays the data by using contiguous vertical (or horizontal) bars (unless the frequency of a class is 0) of various heights (or lengths).
• The bar heights (or lengths) represent the frequencies of the classes.
Ed Swires-Hennessy, 2014
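To show how bar length encodes class frequency, here is a rough text-only sketch that draws horizontal bars from a frequency table. The table values and the function name are hypothetical; one class has frequency 0 to show the resulting gap.

```python
def text_histogram(table, symbol="#"):
    """Return one line per class; bar length equals the class frequency."""
    lines = []
    for lower, upper, freq in table:
        lines.append(f"{lower:>3}-{upper:<3} | {symbol * freq} {freq}")
    return "\n".join(lines)

# Hypothetical frequency table: (lower limit, upper limit, frequency)
table = [(12, 20, 3), (21, 29, 5), (30, 38, 0), (39, 47, 4)]

print(text_histogram(table))
```

In practice a plotting library would draw the bars; matplotlib's `pyplot.hist`, for example, bins raw data and draws the histogram in one call.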
Histogram
• 125 airborne measurements
• Total radiation
• Istrian Peninsula, Croatia
https://images.app.goo.gl/zWXZ5SaJ38p6ct2a9
Position and Boxplot
• Measures of Position
• Function
• Locate the relative position of a data value in the data set/distribution
• Standard score
• Percentiles
• Deciles
• Quartiles
Bluman, 2017
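As an illustration of the first measure in the list, the standard score (z-score) says how many standard deviations a value lies from the mean: z = (x - mean) / s. The sketch below assumes the sample standard deviation and reuses the three weights from the introduction table purely as example data.

```python
from statistics import mean, stdev

def z_score(x, data):
    """Standard score of x relative to data, using the sample standard deviation."""
    return (x - mean(data)) / stdev(data)

# Weights (kg) of the three subjects from the introduction table
weights = [56, 63, 70]

for w in weights:
    print(f"weight {w} kg -> z = {z_score(w, weights):+.2f}")
```

A positive score marks a value above the mean and a negative score a value below it, so scores from different variables can be compared on a common scale.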
Position and Boxplot
• Quartiles
• Quartiles divide the distribution into four equal groups, denoted by Q1, Q2,
Q3
• Quartiles can be used as a rough measure of variability
• Interquartile range (middle 50% of the data values)
Bluman, 2017
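One way to compute the quartiles and the interquartile range is sketched below. It assumes the "median of each half" convention (Q2 is the median, Q1 and Q3 are the medians of the lower and upper halves); the data values are hypothetical.

```python
from statistics import median

def quartiles(data):
    """Return (Q1, Q2, Q3) using the median-of-halves convention."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # For an odd count, the middle value belongs to neither half
    upper = values[half + 1:] if n % 2 else values[half:]
    return median(lower), median(values), median(upper)

# Hypothetical data set
data = [15, 13, 6, 5, 12, 50, 22, 18]

q1, q2, q3 = quartiles(data)
iqr = q3 - q1   # spread of the middle 50% of the data
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```

Other tools interpolate differently (for example `statistics.quantiles` in Python), so their quartiles may differ slightly from these on small data sets.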
Position and Boxplot
• Outliers
• An extremely high or an extremely low data value when compared with the
rest of the data values.
• Strongly affect the mean and standard deviation
• Sources:
• Measurement or observational error
• A recording error
• From a subject outside the defined population.
• Value that occurred by chance
Bluman, 2017
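How strongly one outlier pulls the mean and standard deviation can be checked with a quick sketch. The values are hypothetical; the median is included only for comparison, as a measure that moves much less.

```python
from statistics import mean, median, stdev

# Hypothetical data without and with one extremely high value
values = [5, 6, 12, 13, 15, 18, 22]
with_outlier = values + [50]

for label, d in (("without outlier", values), ("with outlier", with_outlier)):
    print(f"{label}: mean = {mean(d):.2f}, "
          f"stdev = {stdev(d):.2f}, median = {median(d)}")
```

Adding the single value 50 raises the mean by several units and more than doubles the standard deviation, while the median shifts only slightly.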
Position and Boxplot
• Outliers
https://images.app.goo.gl/tQyqq3KnsxsLiPNo8
Bluman, 2017
Position and Boxplot
• Box Plot/Box & Whisker Plot
• Graphical display that simultaneously describes several important features of a data set:
• Center,
• Spread,
• Departure from symmetry/skewness,
• Unusual observations or outliers
• Displays
• The three quartiles,
• The minimum and the maximum of the data, on a rectangular box aligned either horizontally or vertically.
• The box encloses the interquartile range.
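The five values a box plot displays (minimum, three quartiles, maximum) can be collected as a five-number summary. The sketch below assumes the median-of-halves convention for the quartiles and uses hypothetical data.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a data set."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # For an odd count, the middle value belongs to neither half
    upper = values[half + 1:] if n % 2 else values[half:]
    return values[0], median(lower), median(values), median(upper), values[-1]

# Hypothetical data set
data = [15, 13, 6, 5, 12, 50, 22, 18]

minimum, q1, q2, q3, maximum = five_number_summary(data)
print(f"min = {minimum}, Q1 = {q1}, median = {q2}, Q3 = {q3}, max = {maximum}")
print(f"box spans {q1} to {q3} (IQR = {q3 - q1})")
```

Many plotting tools draw the whiskers only to the most extreme non-outlying values and mark outliers as separate points, so a drawn box plot may not extend to the raw minimum and maximum.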
Position and Boxplot
[Figure: box plot with limits marked at Q1 - 3 IQR, Q1 - 1.5 IQR, Q3 + 1.5 IQR, and Q3 + 3 IQR]
Bluman, 2017
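The four limits in the figure can be turned into a simple outlier check. This is a sketch under assumptions: the labels "outlier" and "extreme outlier" for values beyond the 1.5 IQR and 3 IQR limits are a common naming choice rather than something stated on the slide, and the quartile values are hypothetical.

```python
def fences(q1, q3):
    """Return the four limits built from the quartiles and the IQR."""
    iqr = q3 - q1
    return {
        "outer_low": q1 - 3 * iqr,
        "inner_low": q1 - 1.5 * iqr,
        "inner_high": q3 + 1.5 * iqr,
        "outer_high": q3 + 3 * iqr,
    }

def classify(x, f):
    """Label a value by where it falls relative to the limits (labels assumed)."""
    if x < f["outer_low"] or x > f["outer_high"]:
        return "extreme outlier"
    if x < f["inner_low"] or x > f["inner_high"]:
        return "outlier"
    return "regular"

# Hypothetical quartiles: Q1 = 9, Q3 = 20, so IQR = 11
f = fences(9, 20)
for x in (22, 50, 60):
    print(x, "->", classify(x, f))
```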
Position and Boxplot
• Box plot corresponds to histogram
• Histogram only shows mode, dispersion, skewness.
• Box plot provides more
[Figure: Box plot for compressive strength data]
Bluman, 2017
Position Related Visualization
• Interpretation
• Too much variability at plant 2
• Plants 2 & 3 need to improve the
quality index performance
Box plot for compressive strength data
Bluman, 2017
Outline
• Chart Interpretation
• Histogram
• Pie Chart
• Pictogram
• Pareto Diagram
• Summary
Summary
• A chart must represent the data clearly and accurately.
• Before starting to create a chart, have a clear and specific written aim.
• A histogram is a graph for showing the distribution of the data.
• To visualize the position of the data, we can use a box plot.
Exercises
1. Collect the data for these variables
a) Sex
b) Body height
c) Body weight
2. Show the distributions of weight and height from the data set.
a) Distribution for the whole group
b) Distribution for each sex group
3. Show the important features of the data set using visualization and compare between sex groups.
4. Interpret the charts you draw.
References
1. Swires-Hennessy, E. (2014). Presenting data: how to communicate your message effectively. John Wiley & Sons.
2. Tague, N. R. (2004). The Quality Toolbox, American Society for Quality.
3. Davis, J. C., & Sampson, R. J. (2022). Statistics and data analysis in geology. New York: Wiley.
Thank you