Unit 2 Part 2
•A histogram is a plot that lets you discover, and show, the underlying frequency
distribution (shape) of a set of continuous data.
•This allows the inspection of the data for its underlying distribution (e.g., normal
distribution), outliers, skewness, etc.
•In a bar chart, each column represents a group defined by a categorical variable; in a
histogram, each column represents a group (class interval) defined by a continuous,
quantitative variable.
Histogram
Figure: histogram of No. of Workers (y-axis, 0 to 50 in steps of 5) against Wages in Rs
(x-axis, classes 0-10, 10-20, 20-30, 30-40, 40-50).
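As a rough illustration of how a histogram groups continuous data into class intervals, the sketch below counts how many values fall into each of the wage classes named on the chart axis. The wage values are made up for illustration (the raw data behind the chart is not given), and `class_frequencies` is a hypothetical helper name.

```python
def class_frequencies(values, lower, width, n_classes):
    """Count the values falling in each class [lower + k*width, lower + (k+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        k = int((v - lower) // width)
        if 0 <= k < n_classes:
            counts[k] += 1
    return counts

# Hypothetical wages in Rs (illustrative only, not taken from the chart).
wages = [4, 8, 12, 15, 18, 22, 25, 27, 29, 33, 36, 41, 47]
counts = class_frequencies(wages, lower=0, width=10, n_classes=5)

# Text version of the histogram: one row per class, one '#' per observation.
for k, c in enumerate(counts):
    start, end = k * 10, (k + 1) * 10
    print(f"{start:>2}-{end:<2} | {'#' * c} ({c})")
```

Each class is closed on the left and open on the right, so a value of exactly 10 is counted in the 10-20 class. That is one common convention, assumed here.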
Box Plot
1. A box plot is a method for graphically depicting groups of numerical data through
their quartiles.
2. Box plots may also have lines extending from the boxes (whiskers) indicating variability
outside the upper and lower quartiles, hence the terms box-and-whisker plot and box-
and-whisker diagram.
3. Outliers may be plotted as individual points.
Types:
• Standard Box Plot
• Variable width box plot
• Notched box plot
Standard box plot
Displays:
1. Quartiles Q1, Q3
2. Median, M
3. Max and Min
4. Outliers
Figure: standard box plot with labels for the outliers (marked x), maximum, Q3, median, Q1,
minimum and whiskers.
Important note
•Data sets can sometimes contain outliers that are suspected to be anomalies (perhaps
because of data collection errors).
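•If outliers are present, the whisker on the appropriate side is drawn to
$Q_1 - 1.5 \times \mathrm{IQR}$ (lower side) or $Q_3 + 1.5 \times \mathrm{IQR}$ (upper side),
where $\mathrm{IQR} = Q_3 - Q_1$, rather than to the data minimum or the data maximum.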
•Small circles or unfilled dots are drawn on the chart to indicate where suspected outliers
lie. Filled circles are used for known outliers.
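The whisker limits described in this note can be sketched as below. The function names are hypothetical. The Q1 and Q3 values, and the two data extremes 29.8 and 81.3, are the ones quoted with the notched box plot figure later in this unit.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) limits Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(values, q1, q3):
    """Separate the values inside the limits from the suspected outliers."""
    lower, upper = outlier_fences(q1, q3)
    inside = [v for v in values if lower <= v <= upper]
    outside = [v for v in values if v < lower or v > upper]
    return inside, outside

# Q1 = 54.225 and Q3 = 67, as quoted with the notched box plot figure.
lower, upper = outlier_fences(54.225, 67)
print(round(lower, 4), round(upper, 4))

# The smallest and largest data values quoted with the same figure.
inside, outside = split_outliers([29.8, 81.3], 54.225, 67)
print("inside:", inside, "suspected outliers:", outside)
```

With these inputs the limits agree with the Min and Max values shown with that figure. The data minimum of 29.8 falls below the lower limit, while the maximum of 81.3 stays inside the upper one.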
Variable width box plot
Displays:
1. Quartiles Q1, Q3
2. Median, M
3. Max and Min
4. Outliers
5. Sample size
Figure: variable width box plots for two samples, n = 100 and n = 50.
Notched box plot
Figure: notched box plot with the median and the notch labelled, annotated with these values:
Min  35.0625 (29.8)
Q1   54.225
Med  58.05
Q3   67
Max  86.1625 (81.3)
Run Chart
•A run chart, also known as a run-sequence plot, is a graph that displays observed data in a
time sequence.
•Often, the data displayed represent some aspect of the output or performance of a
manufacturing or other business process. It is therefore a form of line chart.
•By collecting and charting data over time, you can find trends or patterns in the process.
Because they do not use control limits, run charts cannot tell you if a process is stable.
Figure: run chart of No. of complaints (y-axis) against Days (x-axis).
Run chart rules for interpretation
Out of roundness of 19 components, two samples:
Component No.   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19
Sample 1        4  5  7  6  8  7  7  9  4  6 10  9  9  9  4  3  9  8  3
Sample 2        4  2  2  4  4  3  3  4  1  3  3  2  3  2  2  2  5  5  2
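Example
Ordered data:
Sample 1: 3 3 4 4 4 5 6 6 7 7 7 8 8 9 9 9 9 9 10
Sample 2: 1 2 2 2 2 2 2 2 3 3 3 3 3 4 4 4 4 5 5

       Sample 1   Sample 2
Q1     4          2
Md     7          3
Q3     9          4
IQR    5          2

Notch width:
$S_m = \dfrac{1.25 \times \mathrm{IQR}}{1.35 \times \sqrt{n}}$
$\text{Notch} = \pm S_m \times C$
For a 95% confidence interval:
$\text{Notch} = \pm \dfrac{1.57 \times \mathrm{IQR}}{\sqrt{n}}$
(taking $C \approx 1.7$ gives the 1.57 factor, since $1.7 \times 1.25 / 1.35 \approx 1.57$)

Width of notch (n = 19):
Sample 1 = $\pm (1.57 \times 5)/\sqrt{19} = \pm 1.80$
Sample 2 = $\pm (1.57 \times 2)/\sqrt{19} = \pm 0.72$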
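The notch calculation above can be checked with a short sketch. The quartile rule used here (positions (n+1)/4, (n+1)/2 and 3(n+1)/4 in the ordered data) is an assumption, chosen because it reproduces the Q1, Md and Q3 values of this example. The function names are hypothetical.

```python
import math

def quartiles(sorted_values):
    """Q1, median, Q3 read at positions (n+1)/4, (n+1)/2 and 3(n+1)/4.

    Assumes n + 1 is divisible by 4 (true for n = 19), so no interpolation
    between neighbouring values is needed.
    """
    n = len(sorted_values)
    positions = [(n + 1) // 4, (n + 1) // 2, 3 * (n + 1) // 4]
    return tuple(sorted_values[p - 1] for p in positions)

def notch_half_width(iqr, n):
    """Half-width of the 95% notch: 1.57 * IQR / sqrt(n)."""
    return 1.57 * iqr / math.sqrt(n)

sample1 = [4, 5, 7, 6, 8, 7, 7, 9, 4, 6, 10, 9, 9, 9, 4, 3, 9, 8, 3]
sample2 = [4, 2, 2, 4, 4, 3, 3, 4, 1, 3, 3, 2, 3, 2, 2, 2, 5, 5, 2]

for name, sample in (("Sample 1", sample1), ("Sample 2", sample2)):
    q1, md, q3 = quartiles(sorted(sample))
    w = notch_half_width(q3 - q1, len(sample))
    print(f"{name}: Q1={q1} Md={md} Q3={q3} notch = {md} +/- {w:.2f}")
```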
Example
Construct a run chart for the following data showing a process parameter. Comment on
whether the process shows common-cause or special-cause variation. Has there been any
significant trend? Offer your comments.
No.  Value    No.  Value
1    0.20     13   0.37
2    0.36     14   0.24
3    0.32     15   0.42
4    0.38     16   0.26
5    0.23     17   0.42
6    0.37     18   0.28
7    0.38     19   0.68
8    0.22     20   0.40
9    0.24     21   0.21
10   0.26     22   0.39
11   0.27     23   0.30
12   0.30
Example
Figure: run chart of the process parameter (y-axis, 0 to 0.8) against observation number
(x-axis, 1 to 23).
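One way to start on this example is sketched below: take the median of the 23 readings as the centre line and count the runs of consecutive points on the same side of it. Skipping points that lie exactly on the median is an assumed convention. The interpretation rules themselves are not reproduced in the text here, so the sketch stops at the counts.

```python
from statistics import median

# The 23 readings from the example, in time order.
values = [0.2, 0.36, 0.32, 0.38, 0.23, 0.37, 0.38, 0.22, 0.24, 0.26, 0.27, 0.3,
          0.37, 0.24, 0.42, 0.26, 0.42, 0.28, 0.68, 0.4, 0.21, 0.39, 0.3]

centre = median(values)

# True if a point is above the centre line, False if below.
# Points lying exactly on the centre line are skipped (assumed convention).
sides = [v > centre for v in values if v != centre]

# A run is a maximal stretch of consecutive points on the same side.
run_lengths = []
for side, previous in zip(sides, [None] + sides[:-1]):
    if side == previous:
        run_lengths[-1] += 1
    else:
        run_lengths.append(1)

print("centre line (median):", centre)
print("number of runs:", len(run_lengths))
print("longest run:", max(run_lengths))
print("largest reading:", max(values), "at observation", values.index(max(values)) + 1)
```

The counts and the position of the largest reading can then be compared against whichever run chart rules the course uses.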
Stem and Leaf plot
A Stem and Leaf Plot is a special table where each data value is split into a "stem" (the first
digit or digits) and a "leaf" (usually the last digit).
100 1 1
100 11 22 33 55
120 103 112 126 110 2 2
110 33 66 99 99 99
102 142 119 119 120
120 00 00 11 22 55 66 66 77 99
Stem | Leaf
-0.3 | 1 2 3 4
-0.2 | 0 1 5 6 7 8
-0.1 | 3 4 5 5 6 9 9
 0   | 0 0
 0.1 | 2 3 6 8 8 9 9
 0.2 | 4 4 5 8 9 9
 0.3 |
 0.4 | 0 4 8 9
Key
0.1 | 2 = 0.12
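The split into stems and leaves can be sketched in a few lines. This is a minimal version that assumes non-negative whole numbers, with the last digit as the leaf. The ten values are the sample used in the normal probability plot example later in this unit, and `stem_and_leaf` is a hypothetical helper name.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Map each stem (all digits but the last) to its sorted leaves (last digit).

    Assumes non-negative integers.
    """
    plot = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        plot[stem].append(leaf)
    return dict(plot)

data = [176, 191, 214, 220, 205, 192, 201, 190, 183, 185]
plot = stem_and_leaf(data)

# Print every stem in the range, including any with no leaves.
for stem in range(min(plot), max(plot) + 1):
    leaves = " ".join(str(leaf) for leaf in plot.get(stem, []))
    print(f"{stem} | {leaves}")
print("Key: 17 | 6 = 176")
```

Stems with no leaves are still printed so that gaps in the data stay visible, as with the empty 0.3 stem in the plot above.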
Example 3
Normal Probability Plot
•The normal probability plot is a graphical technique for assessing whether or not a data set
is approximately normally distributed.
•The data are plotted against a theoretical normal distribution in such a way that the points
should form an approximate straight line.
•There are two ways to assess normality with this plot:
1. On Normal Distribution Probability Paper
2. On regular graph paper
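Normal Distribution Probability Paper
Figure (source: mathisfun.com): data plotted on normal probability paper, annotated as follows.
Normality: judged from how closely the points follow a straight line.
Mean: the x-value read at cumulative probability 0.5.
SD: the x-value read at 0.84 minus the x-value read at 0.5.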
Test for normality and estimate the parameters from the sample given below
176 192
191 201
214 190
220 183
205 185
j    x_j (X axis)   f(t) = (j - 0.5)/n (Y axis)
1    176            0.05
2    183            0.15
3    185            0.25
4    190            0.35
5    191            0.45
6    192            0.55
7    201            0.65
8    205            0.75
9    214            0.85
10   220            0.95
Figure: the points above plotted on normal probability paper; x-axis from 170 to 220.
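The table and the readings taken from the probability paper can be approximated in code. Converting each cumulative probability to a z-score plays the role of the paper's non-linear vertical scale. The least-squares line is an assumption standing in for the straight line that would be drawn through the points by eye. The mean is then read at probability 0.5 and the SD as the difference between the readings at 0.84 and 0.5.

```python
from statistics import NormalDist

sample = [176, 191, 214, 220, 205, 192, 201, 190, 183, 185]
x = sorted(sample)
n = len(x)

# Plotting positions (j - 0.5)/n for j = 1..n, as in the table.
positions = [(j - 0.5) / n for j in range(1, n + 1)]

# z-scores of the positions: this mimics the probability paper's y-axis.
std_normal = NormalDist()
z = [std_normal.inv_cdf(p) for p in positions]

# Least-squares line x = intercept + slope * z (stand-in for the line drawn by eye).
z_bar = sum(z) / n
x_bar = sum(x) / n
slope = (sum((zi - z_bar) * (xi - x_bar) for zi, xi in zip(z, x))
         / sum((zi - z_bar) ** 2 for zi in z))
intercept = x_bar - slope * z_bar

def x_at(probability):
    """Read the fitted line at a given cumulative probability."""
    return intercept + slope * std_normal.inv_cdf(probability)

mean_est = x_at(0.5)
sd_est = x_at(0.84) - x_at(0.5)
print(f"estimated mean = {mean_est:.1f}, estimated SD = {sd_est:.1f}")
```

Whether the points are close enough to the line to call the data normal is still a visual judgement. The code only supplies the coordinates and the two readings.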
NPP on regular graph paper
Procedure:
•Arrange your x-values in ascending order.
•Calculate
$f_i = \dfrac{i - 0.375}{n + 0.25}$
where $i$ is the position of the data value in the ordered list and $n$ is the
number of observations.
•Find the z-score for each $f_i$.
•Plot your x-values on the horizontal axis and the corresponding z-score
on the vertical axis.
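The procedure above can be sketched as follows, using the ten-value sample from the example that follows. `npp_table` is a hypothetical helper name. The z-score is taken from the standard normal inverse CDF in Python's `statistics` module rather than from a printed table.

```python
from statistics import NormalDist

def npp_table(values):
    """Return rows (i, x_i, f_i, z_i) for a normal probability plot."""
    x = sorted(values)                       # step 1: ascending order
    n = len(x)
    rows = []
    for i, xi in enumerate(x, start=1):
        f = (i - 0.375) / (n + 0.25)         # step 2: plotting position
        z = NormalDist().inv_cdf(f)          # step 3: z-score for f_i
        rows.append((i, xi, f, z))
    return rows

sample = [176, 191, 214, 220, 205, 192, 201, 190, 183, 185]
rows = npp_table(sample)

# Step 4: these (x, z) pairs are the points to plot, x horizontal and z vertical.
print(" i    x     f      z")
for i, xi, f, z in rows:
    print(f"{i:>2} {xi:>4} {f:.3f} {z:+.2f}")
```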
Test for normality and estimate the parameters from the sample given below using
regular graph paper
176 192
191 201
214 190
220 183
205 185
i    x_i (X axis)   f_i = (i - 0.375)/(n + 0.25)   Z value
1    176            0.061                          -1.55
Random Variable
Example
A soft drink bottler is studying the internal pressure strength of 1 litre glass bottles. A
random sample of 16 bottles is tested and pressure strengths are obtained. The data
collected is shown below. Plot this data on regular graph paper. Does it seem reasonable to
conclude that pressure strength is normally distributed?