05.1 Data Organization PRESENTATION
Presentation of Data
• Data obtained from experiments and investigations are
often large and need to be condensed into a suitable form
before any meaningful information can be extracted.
• It is typical to group the data and present them in tabular
or graphical form.
• Graphical presentations are often the most effective way of
communicating the information.
Raw data:
71 91 63 99 93 88 95 63 67
76 65 68 82 81 83 61 77 100
87 68 85 60 98 89 80 78 82
[Figure: bar chart of Frequency versus Grade Range, with classes 60-69, 70-79, 80-89, and 90-100]
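The grouping behind the bar chart can be sketched in a few lines of Python. This is a minimal illustration, assuming the four grade ranges named on the chart with both ends inclusive; the names `grades`, `ranges`, and `tally` are hypothetical.

```python
# Grades copied from the raw data above.
grades = [
    71, 91, 63, 99, 93, 88, 95, 63, 67,
    76, 65, 68, 82, 81, 83, 61, 77, 100,
    87, 68, 85, 60, 98, 89, 80, 78, 82,
]

# Grade ranges as (label, low, high); both ends are inclusive (assumed).
ranges = [
    ("60-69", 60, 69),
    ("70-79", 70, 79),
    ("80-89", 80, 89),
    ("90-100", 90, 100),
]


def tally(values, ranges):
    """Count how many values fall in each inclusive range."""
    return {
        label: sum(low <= v <= high for v in values)
        for label, low, high in ranges
    }


counts = tally(grades, ranges)

# Print a rough text version of the bar chart.
for label, n in counts.items():
    print(f"{label:>6} | {'#' * n} ({n})")
```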
Pareto Diagrams
• A Pareto diagram contains both a bar chart and a line graph.
• The bars represent the individual values, usually arranged in
descending order.
• The cumulative total, or percentage (%), is represented by
the line graph.
• The purpose of the Pareto diagram is to highlight the most
important among a set of factors.
Fault                  Frequency
Power fluctuations     6
Unstable controller    22
Operator error         13
Worn tool              2
Other                  5
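The numbers behind a Pareto diagram for this fault table can be sketched as follows: sort the faults by descending frequency (the bars) and accumulate the percentage of the total (the line). The function name `pareto_table` is hypothetical, and the sketch only computes the values; it does not draw the chart.

```python
# Fault frequencies from the table above.
faults = {
    "Power fluctuations": 6,
    "Unstable controller": 22,
    "Operator error": 13,
    "Worn tool": 2,
    "Other": 5,
}


def pareto_table(freqs):
    """Return rows (name, count, cumulative %) sorted by descending count."""
    total = sum(freqs.values())
    rows = []
    running = 0
    for name, count in sorted(freqs.items(), key=lambda kv: kv[1], reverse=True):
        running += count
        rows.append((name, count, 100 * running / total))
    return rows


rows = pareto_table(faults)
for name, count, cum_pct in rows:
    print(f"{name:<20} {count:>3} {cum_pct:6.1f}%")
```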
Dot Diagrams
3 6 -2 4 7 4 3
• Dot plots are usually used for small data sets. They are
useful for highlighting clusters, gaps, skews in distribution,
and outliers.
38.9 58.0 96.3 122.2 155.6 333.3 3408.0
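A rough text-only dot plot of the first data set (3, 6, -2, 4, 7, 4, 3) can be sketched as below. This assumes integer observations so that each value gets its own column; the name `dot_plot` is hypothetical.

```python
from collections import Counter

# First data set shown above.
data = [3, 6, -2, 4, 7, 4, 3]


def dot_plot(values):
    """Return a text dot plot: one column per integer, one 'o' per observation."""
    counts = Counter(values)
    axis = list(range(min(values), max(values) + 1))
    lines = []
    # Build rows from the tallest stack of dots down to the axis.
    for level in range(max(counts.values()), 0, -1):
        row = ""
        for x in axis:
            mark = "o" if counts[x] >= level else " "
            row += f"{mark:>4}"
        lines.append(row)
    # Axis labels go on the last line.
    lines.append("".join(f"{x:>4}" for x in axis))
    return "\n".join(lines)


plot = dot_plot(data)
print(plot)
```

Repeated values stack vertically, and empty columns show up as gaps.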
Frequency Distributions
• A frequency distribution is a table that divides a set of data
into a suitable number of classes (or categories) and shows
the number of items belonging to each class.
• This grouping often highlights some important features of
the data.
• Once the data have been grouped, each observation has lost
its identity in the sense that its exact value is no longer
known.
• The first step in constructing a frequency distribution
consists of deciding how many classes to use and choosing
the class limits for each class.
• Use between 5 and 15 different classes.
• The different classes should:
  • Not overlap
  • Accommodate all the data
  • Have the same width
Raw data:
245 333 296 304 276 336 289 234 253 292
366 323 309 284 310 338 297 314 305 330
266 391 315 305 290 300 292 311 272 312
315 355 346 337 303 265 278 276 373 271
308 276 364 390 298 290 308 221 274 343
Classes Frequency
206 – 245 3
246 – 285 11
286 – 325 23
326 – 365 9
366 – 405 4
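The table above can be reproduced with a short sketch. It assumes integer observations and equal-width classes starting at 206 with a width of 40, as in the table; the names `data` and `frequency_distribution` are hypothetical.

```python
# Raw data copied from above (50 observations).
data = [
    245, 333, 296, 304, 276, 336, 289, 234, 253, 292,
    366, 323, 309, 284, 310, 338, 297, 314, 305, 330,
    266, 391, 315, 305, 290, 300, 292, 311, 272, 312,
    315, 355, 346, 337, 303, 265, 278, 276, 373, 271,
    308, 276, 364, 390, 298, 290, 308, 221, 274, 343,
]


def frequency_distribution(values, start, width, n_classes):
    """Count integer values in equal-width classes.

    Class i covers start + i*width up to start + (i+1)*width - 1, inclusive.
    """
    counts = [0] * n_classes
    for v in values:
        idx = (v - start) // width
        if not 0 <= idx < n_classes:
            raise ValueError(f"{v} falls outside the classes")
        counts[idx] += 1
    return [
        (start + i * width, start + (i + 1) * width - 1, c)
        for i, c in enumerate(counts)
    ]


table = frequency_distribution(data, start=206, width=40, n_classes=5)
for low, high, count in table:
    print(f"{low} - {high}  {count}")
```

Raising an error for out-of-range values enforces the requirement that the classes accommodate all the data.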
The same classes written as intervals, where each class excludes its
left endpoint and includes its right endpoint:
Classes Frequency
(205,245] 3
(245,285] 11
(285,325] 23
(325,365] 9
(365,405] 4
Cumulative Distributions
• A cumulative distribution is an alternative form of distribution
into which data are grouped.
• A cumulative “less-than-or-equal-to” distribution shows the total
number of observations that are less than or equal to the given
values.
• A cumulative “less-than” distribution is used when each class
includes its left-hand endpoint but not its right-hand endpoint.
• A cumulative “greater-than” distribution is constructed similarly
by adding the frequencies, one by one, starting at the last class
of the frequency distribution.
Classes      Frequency   Cumulative (≤)   Cumulative (≥)
206 – 245    3           3                50
246 – 285    11          14               47
286 – 325    23          37               36
326 – 365    9           46               13
366 – 405    4           50               4
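Both cumulative columns are running totals of the class frequencies, taken from opposite ends of the table. A minimal sketch using the standard library (the variable names are hypothetical):

```python
from itertools import accumulate

# Class frequencies from the table above, in class order.
classes = ["206-245", "246-285", "286-325", "326-365", "366-405"]
freqs = [3, 11, 23, 9, 4]

# Running total from the first class: observations <= each upper class limit.
cum_le = list(accumulate(freqs))

# Running total from the last class: observations >= each lower class limit.
cum_ge = list(accumulate(reversed(freqs)))[::-1]

for label, f, le, ge in zip(classes, freqs, cum_le, cum_ge):
    print(f"{label}  {f:>3} {le:>3} {ge:>3}")
```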
Percentage Distributions
• Distributions are easy to compare once each has been
converted into a percentage distribution.
• This is accomplished by dividing each class frequency by the total
frequency (the number of observations) and multiplying by 100.
• The result is the percentage of the data that falls into each class
of the distribution.
Classes      Frequency   Frequency (%)   Cumulative % (≤)
206 – 245    3           6%              6%
246 – 285    11          22%             28%
286 – 325    23          46%             74%
326 – 365    9           18%             92%
366 – 405    4           8%              100%
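The conversion to percentages is a single division per class. The sketch below recomputes both percentage columns from the class frequencies; the variable names are hypothetical.

```python
from itertools import accumulate

# Class frequencies from the table above.
freqs = [3, 11, 23, 9, 4]
total = sum(freqs)  # number of observations

# Percentage of the data in each class.
pct = [100 * f / total for f in freqs]

# Cumulative percentage, computed from the cumulative counts.
cum_pct = [100 * c / total for c in accumulate(freqs)]

for p, cp in zip(pct, cum_pct):
    print(f"{p:5.1f}% {cp:6.1f}%")
```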
Graphs of Frequency Distributions
Classes Cumulative
(205,245] 3
(245,285] 14
(285,325] 37
(325,365] 46
(365,405] 50
Stem-and-Leaf Displays
• The previous methods involved the grouping of large sets of
data to present them in a manageable form.
• This entailed some loss of information.
• To avoid the loss of information, the following stem-and-
leaf display can be used to keep track of the last digits of
the readings within each class.
• The stem is the left-hand column which contains the tens
digits.
• Each number to the right of the vertical line is a leaf.
• For example, the first row corresponds to the data 12, 17,
and 15.
Raw data:
29 44 12 53 21 34 39 25 48 23
17 24 27 32 34 15 42 21 28 37
Classes Frequency
10 – 19 3
20 – 29 8
30 – 39 5
40 – 49 3
50 – 59 1
Stem-and-leaf display:
stem    | leaves
10 – 19 | 2 7 5
20 – 29 | 9 1 5 3 4 7 1 8
30 – 39 | 4 9 2 4 7
40 – 49 | 4 8 2
50 – 59 | 3
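Building the display amounts to splitting each reading into its tens digit (stem) and units digit (leaf). A minimal sketch, assuming non-negative two-digit integers; the name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

# Raw data copied from above.
data = [
    29, 44, 12, 53, 21, 34, 39, 25, 48, 23,
    17, 24, 27, 32, 34, 15, 42, 21, 28, 37,
]


def stem_and_leaf(values):
    """Map each tens digit (stem) to its units digits (leaves), in data order."""
    stems = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        stems[stem].append(leaf)
    return dict(sorted(stems.items()))


display = stem_and_leaf(data)
for stem, leaves in display.items():
    print(f"{stem} | {' '.join(map(str, leaves))}")
```

The number of leaves in each row equals the class frequency, so no information is lost relative to the raw data.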
Descriptive Measures (Example 1)
11 9 17 19 4 15
Descriptive Measures
• The greater the variance, the more widely the data are spread
about the mean.
• The calculation of variance uses squares and thus weights
outliers more heavily than data very near the mean.
• The standard deviation of the observations is the square
root of the variance. It is used more often than the
variance because it is expressed in the same units as the
observations.
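As an illustration, the variance and standard deviation of the six observations from Example 1 (11, 9, 17, 19, 4, 15) can be computed as below. The sample form of the variance, with divisor n - 1, is an assumption, since the defining formula is not reproduced in these notes; the function name `sample_variance` is hypothetical.

```python
import math
import statistics

# Observations from Example 1.
data = [11, 9, 17, 19, 4, 15]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed form)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


s2 = sample_variance(data)
s = math.sqrt(s2)  # standard deviation: same units as the data

print(f"variance = {s2:.3f}, standard deviation = {s:.3f}")

# Cross-check against the standard library, which also uses n - 1.
print(statistics.variance(data), statistics.stdev(data))
```

Because each deviation is squared, the observation 4, which lies farthest from the mean, contributes the largest single term to the sum.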
Descriptive Measures (Example 2)
S² = 0.055
S = 0.235
Quartiles and Percentiles
Quartiles and Percentiles (Example 3)
Find the 1st quartile, 2nd quartile, 3rd quartile, and the 93rd
percentile for the following ordered data.
221 234 245 253 265 266 271 272 274 276
276 276 278 284 289 290 290 292 292 296
297 298 300 303 304 305 305 308 308 309
310 311 312 314 315 315 323 330 333 336
337 338 343 346 355 364 366 373 390 391
Q1 = 278
Q2 = 304.5
Q3 = 330
P0.93 = 366
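One common textbook convention for percentiles can be sketched as follows. With n ordered observations and proportion p, compute np; if np is not an integer, round it up and take that ordered value, and if it is an integer k, average the k-th and (k+1)-th ordered values. This rule is an assumption, since the definition is not reproduced in these notes, and software that interpolates can give slightly different values. The name `percentile` is hypothetical.

```python
# Ordered data from Example 3 (50 observations).
data = [
    221, 234, 245, 253, 265, 266, 271, 272, 274, 276,
    276, 276, 278, 284, 289, 290, 290, 292, 292, 296,
    297, 298, 300, 303, 304, 305, 305, 308, 308, 309,
    310, 311, 312, 314, 315, 315, 323, 330, 333, 336,
    337, 338, 343, 346, 355, 364, 366, 373, 390, 391,
]


def percentile(sorted_values, pct):
    """Percentile by the 'round up or average' rule (assumed convention).

    pct is a whole-number percentage strictly between 0 and 100.
    """
    if not 0 < pct < 100:
        raise ValueError("pct must be strictly between 0 and 100")
    n = len(sorted_values)
    # n * pct / 100 in exact integer arithmetic: whole part k, remainder r.
    k, r = divmod(n * pct, 100)
    if r == 0:
        # np is an integer: average the k-th and (k+1)-th ordered values.
        return (sorted_values[k - 1] + sorted_values[k]) / 2
    # np is not an integer: round up to the (k+1)-th ordered value.
    return sorted_values[k]


for label, pct in [("Q1", 25), ("Q2", 50), ("Q3", 75), ("P0.93", 93)]:
    print(label, "=", percentile(data, pct))
```

Working with a whole-number percentage keeps the integer test exact and avoids floating-point rounding in np.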
Boxplots
• The summary information
contained in the quartiles is
highlighted in a graphic display
called a boxplot.
• The center half of the data,
extending from the 1st to the 3rd
quartile, is represented by a
rectangle.
• The median is identified by a bar
within the box.
• A line extends from the 3rd
quartile to the maximum, and
another line extends from the 1st
quartile to the minimum of the
data set.
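The five numbers a boxplot displays (minimum, 1st quartile, median, 3rd quartile, maximum) can be sketched with the standard library, here for the 50 observations of Example 3. Note that `statistics.quantiles` interpolates between ordered values, so its quartiles may differ slightly from values worked by hand under another convention; the name `five_number_summary` is hypothetical.

```python
import statistics

# Ordered data from Example 3 (50 observations).
data = [
    221, 234, 245, 253, 265, 266, 271, 272, 274, 276,
    276, 276, 278, 284, 289, 290, 290, 292, 292, 296,
    297, 298, 300, 303, 304, 305, 305, 308, 308, 309,
    310, 311, 312, 314, 315, 315, 323, 330, 333, 336,
    337, 338, 343, 346, 355, 364, 366, 373, 390, 391,
]


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for drawing a boxplot."""
    q1, q2, q3 = statistics.quantiles(values, n=4)  # three quartile cut points
    return min(values), q1, q2, q3, max(values)


low, q1, med, q3, high = five_number_summary(data)
print(f"whisker: {low} to {q1}")   # line from the minimum to the box
print(f"box:     {q1} to {q3}")    # rectangle covering the center half
print(f"median:  {med}")           # bar inside the box
print(f"whisker: {q3} to {high}")  # line from the box to the maximum
```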
Descriptive Measures (Grouped Data)