Visualize your Data

Ir. Twin Yoshua R. Destyanto, M.Sc., Ph.D.


Industrial Engineering Department

Guest Lecture
Depok, 20 November 2024

https://images.app.goo.gl/P2UNLT2jhiPA1xkF8
Outline
• Introduction
• Histogram
• Position and Boxplot
• Summary

Introduction-Definition
• Variable
  • A characteristic or attribute that can assume different values.
• Data
  • Values obtained from measurements or observations related to the variables.
• Data set
  • A collection of data values.

Subject   Weight (kg)   Height (cm)
1         56            168
2         63            165
3         70            170

Introduction-Definition
• Statistics
  • The science of conducting studies to:
    • Collect,
    • Organize,
    • Summarize,
    • Analyze,
    • Draw conclusions from the data.

Subject   Weight (kg)   Height (cm)
1         56            168
2         63            165
3         70            170
Avg       63            167.67

BMI (from the averages) = 63 / (1.6767)² ≈ 22.4 kg/m²
Healthy BMI range: 18.5 kg/m² to 25 kg/m²
Conclusion: Normal
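The averages and the BMI on this slide can be checked with a few lines of Python. This is a minimal sketch using only the standard library; following the slide, the BMI is computed from the average weight and average height rather than averaged over the individual subjects.

```python
from statistics import mean

# Sample data from the slide: (weight in kg, height in cm) for subjects 1-3
subjects = [(56, 168), (63, 165), (70, 170)]

avg_weight = mean(w for w, _ in subjects)   # kg
avg_height = mean(h for _, h in subjects)   # cm

# BMI = weight (kg) / height (m)^2, computed here from the averages
bmi = avg_weight / (avg_height / 100) ** 2

# Healthy range quoted on the slide: 18.5 to 25 kg/m^2
conclusion = "Normal" if 18.5 <= bmi <= 25 else "Outside healthy range"

print(f"Avg weight = {avg_weight:.2f} kg, avg height = {avg_height:.2f} cm")
print(f"BMI = {bmi:.1f} kg/m^2 -> {conclusion}")
```

Computing each subject's BMI first and then averaging is a different summary and gives a slightly different number, so it is worth stating which one a chart or table reports.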

Introduction-Area of Statistics
• Descriptive statistics
  • Collection,
  • Organization,
  • Summarization,
  • Presentation of data,
  • Describing a situation.

Subject   Weight (kg)   Height (cm)
1         56            168
2         63            165
3         70            170
Avg       63            167.67

BMI (from the averages) = 63 / (1.6767)² ≈ 22.4 kg/m²
Healthy BMI range: 18.5 kg/m² to 25 kg/m²
Conclusion: Normal

Introduction-Area of Statistics
• Inferential statistics
  • Generalization from samples to populations,
  • Performing estimations,
  • Hypothesis tests,
  • Determining relationships among variables,
  • Making predictions.
Geller, Andrew I., et al. (2015)

https://images.app.goo.gl/jzdKnoYcwSwMBL8i6

Introduction-Type of Variables
• Qualitative variables
  • Have distinct categories according to some characteristic or attribute.
  • Examples: gender, religion, geographic location, blood type.
• Quantitative variables
  • Can be counted or measured.
  • Discrete: number of students, population of a city.
  • Continuous: temperature, car speed.
https://images.app.goo.gl/Gr2s6dht7ew244is7

Introduction-Type of Variables
• Data Visualization
  • Readers seek to understand the visual display.
  • Write down an aim for each chart.
    • Explain, in very specific terms, what the chart is meant to show.
    • State the aim clearly and specifically.

Introduction-Type of Data
• Nominal
  • Mutually exclusive (nonoverlapping) categories on which no order or ranking can be imposed.
• Ordinal
  • Categories that can be ranked, but precise differences between the ranks do not exist.
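One practical consequence of the nominal/ordinal distinction is which summary makes sense: for nominal data only counts and the mode are meaningful, while ordinal data can also be sorted, so a median category exists. The sketch below uses small hypothetical data sets; the three-level satisfaction scale is an illustrative assumption, not from the slides.

```python
from statistics import mode

# Nominal: blood types have no natural order, so we can only count or take the mode
blood_types = ["A", "O", "B", "O", "AB", "O", "A"]
print("Most common blood type:", mode(blood_types))

# Ordinal: categories can be ranked (hypothetical satisfaction scale),
# but the "distance" between low and medium is not defined
levels = ["low", "medium", "high"]
rank = {level: i for i, level in enumerate(levels)}

ratings = ["high", "low", "medium", "medium", "high", "medium", "low"]
ordered = sorted(ratings, key=rank.get)
median_rating = ordered[len(ordered) // 2]   # odd count, so a single middle value
print("Median satisfaction:", median_rating)
```

Averaging the ranks (0, 1, 2) would silently treat the ordinal scale as interval data, which is exactly what the definition above rules out.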

Introduction-Type of Data
• Interval
  • Ranked data for which precise differences do exist, but there is no true zero.
• Ratio
  • Has all the characteristics of interval data, and a true zero exists.
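Temperature in degrees Celsius is a standard example of interval data (0 °C does not mean "no temperature"), while temperature in kelvins has a true zero and is ratio data. The sketch below, using two hypothetical readings, shows that differences agree on both scales but ratios do not.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale with a true zero)."""
    return c + 273.15

t1, t2 = 10.0, 20.0  # hypothetical temperatures in degrees Celsius

# Differences are meaningful on an interval scale and survive the conversion
diff_c = t2 - t1
diff_k = celsius_to_kelvin(t2) - celsius_to_kelvin(t1)

# Ratios are only meaningful on a ratio scale: 20 °C is not "twice as hot" as 10 °C
ratio_c = t2 / t1
ratio_k = celsius_to_kelvin(t2) / celsius_to_kelvin(t1)

print(f"Difference: {diff_c:.1f} C vs {diff_k:.1f} K")
print(f"Ratio: {ratio_c:.2f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```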

Introduction-Type of Variables
• Data Visualization

https://images.app.goo.gl/yZsZN49By5ZUuF5d7 https://images.app.goo.gl/yZsZN49By5ZUuF5d7 https://images.app.goo.gl/6vbg24iiwE4HEKas6

Introduction-Type of Variables
• Data Visualization

https://images.app.goo.gl/jzdKnoYcwSwMBL8i6 https://images.app.goo.gl/8AkZLHsLjceFaMVM6

Histogram
• Bar chart vs. histogram
  • Bar chart: categorical data
    • Each category is independent of the others.
    • Bars for each category are usually separated by spaces.
  • Histogram: continuous data
    • A measured quantity; the numbers can take any value in a range.
    • Bars are placed next to each other to show the continuous nature of the variable's distribution.
Ed Swires-Hennessy, 2014

Histogram
• A histogram shows the frequency distribution of the data.
• A frequency distribution organizes raw data into a table of classes and their frequencies.
• Categorical frequency distribution
  • For data that can be placed in specific categories.
  • Nominal or ordinal data.
• Grouped frequency distribution
  • For data with a larger range.
  • Data are grouped into classes that are more than one unit in width.
Ed Swires-Hennessy, 2014
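A categorical frequency distribution is simply a count per category, often with the percentage of the total. The sketch below builds one with `collections.Counter` from the standard library; the 20 blood-type observations are hypothetical.

```python
from collections import Counter

# Hypothetical nominal data: blood types of 20 people
blood_types = ["A", "B", "B", "AB", "O", "O", "O", "B", "AB", "B",
               "B", "B", "O", "A", "O", "A", "O", "O", "O", "AB"]

counts = Counter(blood_types)
n = len(blood_types)

print(f"{'Class':<6}{'Frequency':>10}{'Percent':>9}")
for category in ["A", "B", "O", "AB"]:          # fixed display order for a nominal variable
    f = counts[category]
    print(f"{category:<6}{f:>10}{100 * f / n:>8.0f}%")
print(f"{'Total':<6}{n:>10}{100:>8.0f}%")
```

Because the categories are nominal, the display order is a presentation choice; a bar chart of this table would keep gaps between the bars.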

Histogram
• Class limits
  • Lower class limit: the smallest data value that can be included in the class.
  • Upper class limit: the largest data value that can be included in the class.
• Class boundaries
  • Numbers used to separate the classes so that there are no gaps in the frequency distribution.

Ed Swires-Hennessy, 2014
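For data recorded to a fixed unit (for example, whole numbers), one common convention is to place each class boundary half a unit below the lower class limit and half a unit above the upper class limit, so that adjacent classes meet without gaps. The helper name `class_boundaries` and the example limits below are hypothetical.

```python
def class_boundaries(limits, unit=1.0):
    """Return (lower boundary, upper boundary) for each (lower limit, upper limit) pair.

    Assumes the data are recorded to the given unit, e.g. unit=1 for whole numbers
    or unit=0.1 for one decimal place.
    """
    half = unit / 2
    return [(lo - half, hi + half) for lo, hi in limits]

# Hypothetical class limits for whole-number data
limits = [(24, 30), (31, 37), (38, 44)]
for (lo, hi), (lb, ub) in zip(limits, class_boundaries(limits)):
    print(f"limits {lo}-{hi}  ->  boundaries {lb}-{ub}")
```

Each upper boundary equals the next class's lower boundary, which is what removes the gaps between, say, 30 and 31.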

Histogram
• Frequency distribution
  • Class width
    • Found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class.
  • Choosing proper classes and a proper width
    • Use 5 to 20 classes (fewer for small data sets, more for large ones).
    • An odd class width is recommended.
  • Classes should be:
    • Mutually exclusive (non-overlapping),
    • Continuous (no gaps between classes),
    • Exhaustive (covering all the data),
    • Equal in width.
Ed Swires-Hennessy, 2014
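The rules above can be combined into a simple procedure: choose a number of classes k between 5 and 20, pick a whole-number class width large enough that k classes cover the whole range, then count the values in each class. This is a minimal sketch for whole-number data; the function name, the data values, and the choice k = 6 are hypothetical, and the width rule used is one convention among several.

```python
def grouped_frequency(data, k):
    """Split whole-number data into k equal-width classes and count each class."""
    lo, hi = min(data), max(data)
    # Smallest whole-number width for which k classes cover the whole range
    width = (hi - lo) // k + 1
    table = []
    for i in range(k):
        lower = lo + i * width          # lower class limit
        upper = lower + width - 1       # upper class limit (whole-number data)
        freq = sum(lower <= x <= upper for x in data)
        table.append((lower, upper, freq))
    return width, table

# Hypothetical data set of 20 whole-number observations
data = [12, 15, 18, 21, 22, 25, 27, 28, 30, 31,
        33, 35, 36, 38, 40, 41, 44, 45, 47, 50]

width, table = grouped_frequency(data, k=6)
print("Class width:", width)
for lower, upper, freq in table:
    print(f"{lower}-{upper}: {freq}")

# Exhaustive and mutually exclusive: every value is counted exactly once
assert sum(f for _, _, f in table) == len(data)
```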

Histogram
• Displays the data using contiguous vertical (or horizontal) bars of various heights (or lengths) that represent the frequencies of the classes.
• The bars touch one another unless the frequency of a class is 0.

Ed Swires-Hennessy, 2014
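Without a plotting library, the idea of contiguous bars whose lengths represent class frequencies can be sketched as a horizontal text histogram. The function name and the class boundaries and frequencies below are hypothetical; in practice a plotting library would draw bars that touch at the class boundaries.

```python
def text_histogram(table, symbol="#"):
    """Return one line per class, with a bar whose length equals the frequency.

    A class with frequency 0 still gets a line (with an empty bar), so the
    classes stay contiguous and the gap in the data remains visible.
    """
    width = max(len(label) for label, _ in table)
    return [f"{label:>{width}} | {symbol * freq}" for label, freq in table]

# Hypothetical grouped frequency table: (class boundaries, frequency)
table = [("11.5-18.5", 3), ("18.5-25.5", 3), ("25.5-32.5", 0),
         ("32.5-39.5", 4), ("39.5-46.5", 4), ("46.5-53.5", 2)]

for line in text_histogram(table):
    print(line)
```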

Histogram
• 125 airborne measurements of total radiation
• Istrian Peninsula, Croatia

https://images.app.goo.gl/zWXZ5SaJ38p6ct2a9

Position and Boxplot
• Measures of Position
  • Function: locate the relative position of a data value in the data set/distribution.
  • Types:
    • Standard score
    • Percentiles
    • Deciles
    • Quartiles
Bluman, 2017
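Two of these measures are easy to compute directly. The standard score is z = (x − mean) / standard deviation, and one common textbook convention for the percentile rank of a value x is (number of values below x + 0.5) / n × 100. The function names and data below are hypothetical illustrations.

```python
from statistics import mean, stdev

def z_score(x, mu, sigma):
    """Standard score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def percentile_rank(data, x):
    """Percentile rank of x using (values below x + 0.5) / n * 100.

    This convention assumes x occurs in the data once; other conventions exist.
    """
    below = sum(v < x for v in data)
    return (below + 0.5) / len(data) * 100

# Hypothetical test: score 65 where the class mean is 50 and the standard deviation is 10
print(z_score(65, 50, 10))   # 1.5 standard deviations above the mean

# Hypothetical data set of 10 values
data = [2, 3, 5, 6, 8, 10, 12, 15, 18, 20]
print(percentile_rank(data, 12))
print(round(z_score(12, mean(data), stdev(data)), 2))
```

Deciles and quartiles are special percentiles (the 10th, 20th, ... and the 25th, 50th, 75th), so the same idea of position carries over to them.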

Position and Boxplot
• Quartiles
  • Three quartiles, denoted Q1, Q2, and Q3, divide the distribution into four groups of equal size.
  • Quartiles can be used as a rough measure of variability.
  • The interquartile range, IQR = Q3 − Q1, covers the middle 50% of the data values.

Bluman, 2017
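A simple hand method for the quartiles is: sort the data, take the median as Q2, then take Q1 as the median of the values below Q2 and Q3 as the median of the values above Q2. The sketch below implements that method on a hypothetical data set; statistical software often uses interpolation rules instead, so its quartiles can differ slightly.

```python
from statistics import median

def quartiles(data):
    """Q1, Q2, Q3 by the 'median of each half' method.

    When n is odd, the middle value (Q2) is excluded from both halves.
    """
    s = sorted(data)
    n = len(s)
    lower = s[: n // 2]
    upper = s[(n + 1) // 2 :]
    return median(lower), median(s), median(upper)

data = [15, 13, 6, 5, 12, 50, 22, 18]   # hypothetical values
q1, q2, q3 = quartiles(data)
iqr = q3 - q1                            # spread of the middle 50% of the data
print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
```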

Position and Boxplot
• Quartiles

Bluman, 2017

Position and Boxplot
• Outliers
  • An extremely high or an extremely low data value when compared with the rest of the data values.
  • Outliers can strongly affect the mean and standard deviation.
  • Sources:
    • Measurement or observational error
    • A recording error
    • A subject outside the defined population
    • A value that occurred by chance

Bluman, 2017
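The claim that outliers strongly affect the mean and standard deviation is easy to check numerically. In the hypothetical data below, adding one extreme value moves the mean and the standard deviation far more than the median.

```python
from statistics import mean, median, stdev

base = [5, 6, 12, 13, 15, 18, 22]     # hypothetical values
with_outlier = base + [50]            # the same data plus one extreme value

for label, values in [("without outlier", base), ("with outlier", with_outlier)]:
    print(f"{label:>16}: mean = {mean(values):5.2f}, "
          f"median = {median(values):5.2f}, sd = {stdev(values):5.2f}")
```

This is one reason the median and the IQR, which the box plot is built on, are preferred summaries when outliers may be present.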

Position and Boxplot
• Outliers
  • IQR = Q3 − Q1
  • A data value is an outlier if it is less than Q1 − 1.5(IQR) or greater than Q3 + 1.5(IQR).

https://images.app.goo.gl/tQyqq3KnsxsLiPNo8

Bluman, 2017
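The 1.5 × IQR rule translates directly into code. The function names `iqr_fences` and `find_outliers` are hypothetical, and the quartiles Q1 = 9 and Q3 = 20 are assumed to have been computed beforehand for the hypothetical data shown.

```python
def iqr_fences(q1, q3, k=1.5):
    """Lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(data, q1, q3, k=1.5):
    """Values strictly below the lower fence or strictly above the upper fence."""
    low, high = iqr_fences(q1, q3, k)
    return [x for x in data if x < low or x > high]

# Hypothetical data with quartiles Q1 = 9 and Q3 = 20 (IQR = 11)
data = [5, 6, 12, 13, 15, 18, 22, 50]
print(iqr_fences(9, 20))            # (-7.5, 36.5)
print(find_outliers(data, 9, 20))   # [50]
```

Passing k = 3 gives the wider Q1 − 3(IQR) and Q3 + 3(IQR) limits that also appear in this deck; for these data they are −24 and 53, so 50 lies between the two sets of limits.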

Position and Boxplot
• Box Plot/Box & Whisker Plot
  • A graphical display that simultaneously describes several important features of a data set:
    • Center,
    • Spread,
    • Departure from symmetry (skewness),
    • Unusual observations or outliers.
  • Displays the three quartiles and the minimum and maximum of the data on a rectangular box aligned either horizontally or vertically.
  • The box encloses the interquartile range.
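The numbers a box plot displays can be computed before drawing it. The sketch below uses `statistics.quantiles` with `method="inclusive"` from Python's standard library (one of several quartile conventions, so values may differ slightly from a hand calculation) and ends the whiskers at the most extreme values inside the 1.5 × IQR fences, a convention many plotting tools follow. The function name and data are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(data):
    """Numbers needed to draw a box plot, using the 1.5*IQR rule for whiskers."""
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "minimum": min(data),
        "Q1": q1,
        "median": q2,
        "Q3": q3,
        "maximum": max(data),
        "whisker_low": min(inside),    # whiskers end at the most extreme values
        "whisker_high": max(inside),   # that still lie within the fences
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

data = [5, 6, 12, 13, 15, 18, 22, 50]   # hypothetical values
for key, value in box_plot_summary(data).items():
    print(f"{key:>12}: {value}")
```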

Position and Boxplot

[Figure: box plot with limits marked at Q1 − 3IQR, Q1 − 1.5IQR, Q3 + 1.5IQR, and Q3 + 3IQR]

Bluman, 2017

Position and Boxplot
• A box plot corresponds to a histogram of the same data.
  • The histogram shows only the mode, dispersion, and skewness.
  • The box plot provides more information: median, quartiles, skewness, and outliers.

[Figure: Box plot for compressive strength data]

Bluman, 2017

Position Related Visualization
• Interpretation
  • Plant 2 shows too much variability.
  • Plants 2 and 3 need to improve their quality index performance.

[Figure: Comparative box plots of a quality index at three plants]

Bluman, 2017

Outline
• Chart Interpretation
• Histogram
• Pie Chart
• Pictogram
• Pareto Diagram
• Summary

Summary
• A chart must represent the data clearly and accurately.
• Before starting to create a chart, have a clear and specific written aim.
• A histogram is a graph for showing the data distribution.
• To visualize the position of the data, we can use a box plot.

Exercises
1. Collect data on these variables:
   a) Sex
   b) Body height
   c) Body weight
2. Show the distributions of weight and height from the dataset:
   a) For the whole population
   b) For each sex group
3. Show the important features of the data set using visualization, and compare the sex groups.
4. Interpret the charts you draw.

References
1. Swires-Hennessy, E. (2014). Presenting data: How to communicate your message effectively. John Wiley & Sons.
2. Tague, N. R. (2004). The quality toolbox. American Society for Quality.
3. Davis, J. C., & Sampson, R. J. (2022). Statistics and data analysis in geology. Wiley.

Thank you
