Control Charts For Non-Normal Data: Illustrative Example From The Construction Industry Business
Abstract: - Statistical Process Control (SPC) charts, widely used by quality professionals in industry and services,
require that the quality characteristic being monitored be normally distributed. If the distribution of this
characteristic is not normal, any conclusion drawn from the control charts about the stability of the process may
be misleading and erroneous. In this paper, an alternative approach is proposed, based on identifying the
distribution that best fits the data. Specifically, the Johnson distribution is used as a model to normalize real
field data that depart from normality. Real field data from the construction industry are used as a case study
to illustrate the proposed analysis.
Key-Words: - Statistical Process Control, Shewhart control charts, non-normal data, Johnson System of
distributions
ISBN: 978-960-474-372-8 71
Mathematical and Computational Methods in Science and Engineering
3 Mathematical Formulation of Johnson's Distributions
As stated earlier, when process data are not normally distributed, it is erroneous to construct standard
control charts for process improvement or to perform capability analysis on them. The practical solution is
to transform the data toward normality using well-established transformations, such as the Box-Cox,
logarithmic (lognormal) or Johnson transformations. Such an approach has been used in the open literature.
The Johnson transformation selects an optimal transformation function from three flexible distribution
families ($S_U$, $S_B$ and $S_L$), which makes it more powerful than transformations based on a single
distribution (Sherill and Johnson, 2009).
In the general Johnson form, a and b are shape parameters, µ is a location parameter, and g(x) is the
function defining the Johnson system of families, given by:

$$g(x) = \begin{cases} \ln(x), & \text{for the lognormal family},\\ \ln\left(x + \sqrt{x^{2}+1}\right), & \text{for the unbounded family},\\ \ln\left(\dfrac{x}{1-x}\right), & \text{for the bounded family},\\ x, & \text{for the normal family}. \end{cases}$$

As discussed by Johnson (1949), this system has the flexibility to match any feasible set of values for the
mean, variance, skewness and kurtosis coefficients. Moreover, the skewness and kurtosis uniquely identify
the appropriate form of the function g.

3.1 Johnson's Translation System:
Johnson proposed three normalizing transformations having the general form:

$$Z = \gamma + \sigma\, f\left(\frac{X-\mu}{\lambda}\right) \qquad (1)$$

where f(·) denotes the transformation function, Z is a standard normal random variable, γ and σ are shape
parameters, λ is a scale parameter and µ is a location parameter. Without loss of generality, it is assumed
that σ > 0 and λ > 0.
The first transformation proposed by Johnson defines the lognormal system of distributions, denoted by $S_L$:

$$Z = \gamma + \sigma \ln\left(\frac{X-\mu}{\lambda}\right) = \gamma^{*} + \sigma \ln(X-\mu), \qquad X > \mu, \qquad (2)$$

where γ* = γ − σ ln λ.
The bounded system, $S_B$:

$$Z = \gamma + \sigma \ln\left(\frac{X-\mu}{\mu+\lambda-X}\right), \qquad \mu < X < \mu+\lambda, \qquad (3)$$

The unbounded system, $S_U$:

$$Z = \gamma + \sigma \ln\left[\frac{X-\mu}{\lambda} + \sqrt{\left(\frac{X-\mu}{\lambda}\right)^{2}+1}\,\right] = \gamma + \sigma \sinh^{-1}\left(\frac{X-\mu}{\lambda}\right), \qquad -\infty < X < +\infty. \qquad (4)$$

The $S_U$ curves are unbounded and cover the t and normal distributions, among others.

3.2 Johnson's Family of Distributions:
The Johnson family of distributions is made up of three distributions: Johnson $S_U$, Johnson $S_B$ and
lognormal. It can match any specified mean, standard deviation, skewness and kurtosis. Together they form
a four-parameter family of distributions.
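Equations (2)-(4) share one shape: Z = γ + σ g(y) with y = (X − µ)/λ, where g is the lognormal, bounded or unbounded case of g(x). The short Python sketch below evaluates that mapping for a given set of parameters. The names `johnson_z` and `G_FUNCTIONS` and the family keys are our own illustrative choices, and estimating γ, σ, µ and λ from data is deliberately left out.

```python
import math

# Hypothetical lookup of g(y) for the standardized variable y = (X - mu) / lam
G_FUNCTIONS = {
    "SL": math.log,                           # lognormal system, Eq. (2): needs y > 0
    "SB": lambda y: math.log(y / (1.0 - y)),  # bounded system, Eq. (3): needs 0 < y < 1
    "SU": math.asinh,                         # unbounded system, Eq. (4): ln(y + sqrt(y^2 + 1))
    "SN": lambda y: y,                        # normal family: identity
}


def johnson_z(x, family, gamma, sigma, mu, lam):
    """Return the normal score Z = gamma + sigma * g((x - mu) / lam)."""
    if sigma <= 0 or lam <= 0:
        raise ValueError("sigma and lam must be positive")
    y = (x - mu) / lam
    # Check the support of each system before applying g
    if family == "SL" and y <= 0:
        raise ValueError("S_L requires x > mu")
    if family == "SB" and not 0 < y < 1:
        raise ValueError("S_B requires mu < x < mu + lam")
    return gamma + sigma * G_FUNCTIONS[family](y)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to exercise each system
    print(johnson_z(360.0, "SU", gamma=0.2, sigma=1.5, mu=357.0, lam=8.0))
    print(johnson_z(360.0, "SB", gamma=0.0, sigma=1.0, mu=320.0, lam=70.0))
    print(johnson_z(360.0, "SL", gamma=0.0, sigma=1.0, mu=300.0, lam=50.0))
```

For instance, under $S_U$ an observation equal to µ maps to Z = γ, because sinh⁻¹(0) = 0.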
This family covers the entire skewness-kurtosis region other than the impossible region. The Johnson $S_U$
distribution covers the area above the lognormal curve, and the Johnson $S_B$ distribution covers the area
below it. A family of distributions is a set of distributions combined so that together they cover a
well-defined region of the skewness-kurtosis plot (for example, the lognormal family, comprising the
lognormal, negative lognormal and normal distributions). Detailed developments of the Johnson family of
distributions can be found in reference books (Hahn and Shapiro, 1967).
This family of distributions is usually parameterized in terms of skewness and kurtosis. Skewness measures
the asymmetry of the data, so it is zero for a normal distribution. Negative skewness indicates that the
data are skewed left, and positive skewness that they are skewed right. Kurtosis, on the other hand,
measures whether the data are peaked or flat relative to a normal distribution. The kurtosis of a normal
distribution is 3.0; a value larger than 3.0 indicates a "peaked" distribution and a value smaller than 3.0
a "flat" one. Both statistics are therefore measures of the shape of a distribution.

Table 1 – Compressive strength data for Ready Mixed Concrete (kgf/cm2)
Sample   Cylinder 1   Cylinder 2   Cylinder 3
1        353.8        363.0        360.6
2        357.8        358.7        370.9
3        365.2        360.0        356.6
4        340.4        335.2        330.1
5        359.6        358.1        351.2
6        368.1        366.7        369.3
7        357.9        355.0        350.6
8        337.8        352.6        361.6
9        359.1        349.2        363.7
10       361.1        358.2        358.3
11       358.3        345.7        341.7
12       357.3        359.2        356.9
13       352.6        363.1        374.6
14       360.8        356.2        352.7
15       347.5        339.8        354.3
16       358.2        359.5        353.9
17       375.2        372.5        370.2
18       357.5        359.5        348.9
19       343.2        355.8        362.4
20       362.1        356.6        359.1
21       365.2        362.0        359.4
22       361.3        346.8        339.0
Grand mean: X̄ = 356.66
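The shape statistics defined above, and the conventional Shewhart X̄-R limits that a "standard" control chart would use, can be computed directly from Table 1. The sketch below assumes moment-based skewness g1 = m3/m2^1.5 and kurtosis b2 = m4/m2² (so a normal distribution gives 3.0), and the tabulated constant A2 = 1.023 for subgroups of three. The function names are hypothetical, and the sketch is not claimed to reproduce the control charts of the original study.

```python
import statistics

# Table 1: compressive strength (kgf/cm2), three cylinders per sample
TABLE_1 = [
    (353.8, 363.0, 360.6), (357.8, 358.7, 370.9), (365.2, 360.0, 356.6),
    (340.4, 335.2, 330.1), (359.6, 358.1, 351.2), (368.1, 366.7, 369.3),
    (357.9, 355.0, 350.6), (337.8, 352.6, 361.6), (359.1, 349.2, 363.7),
    (361.1, 358.2, 358.3), (358.3, 345.7, 341.7), (357.3, 359.2, 356.9),
    (352.6, 363.1, 374.6), (360.8, 356.2, 352.7), (347.5, 339.8, 354.3),
    (358.2, 359.5, 353.9), (375.2, 372.5, 370.2), (357.5, 359.5, 348.9),
    (343.2, 355.8, 362.4), (362.1, 356.6, 359.1), (365.2, 362.0, 359.4),
    (361.3, 346.8, 339.0),
]

A2_N3 = 1.023  # standard X-bar chart constant for subgroups of size 3


def central_moment(data, k):
    """k-th central moment about the sample mean (divisor n)."""
    m = statistics.fmean(data)
    return sum((x - m) ** k for x in data) / len(data)


def skewness(data):
    """Moment skewness g1 = m3 / m2**1.5 (zero for symmetric data)."""
    return central_moment(data, 3) / central_moment(data, 2) ** 1.5


def kurtosis(data):
    """Moment kurtosis b2 = m4 / m2**2 (3.0 for a normal distribution)."""
    return central_moment(data, 4) / central_moment(data, 2) ** 2


def xbar_r_limits(subgroups, a2=A2_N3):
    """Shewhart X-bar limits: grand mean +/- A2 * average subgroup range."""
    means = [statistics.fmean(s) for s in subgroups]
    r_bar = statistics.fmean(max(s) - min(s) for s in subgroups)
    center = statistics.fmean(means)
    return center - a2 * r_bar, center, center + a2 * r_bar


def out_of_control(subgroups, lcl, ucl):
    """1-based sample numbers whose subgroup mean lies outside [lcl, ucl]."""
    return [i for i, s in enumerate(subgroups, start=1)
            if not lcl <= statistics.fmean(s) <= ucl]


if __name__ == "__main__":
    values = [x for row in TABLE_1 for x in row]
    print(f"n = {len(values)}, grand mean = {statistics.fmean(values):.2f}")
    print(f"skewness = {skewness(values):.3f}, kurtosis = {kurtosis(values):.3f}")
    lcl, cl, ucl = xbar_r_limits(TABLE_1)
    print(f"X-bar chart: LCL = {lcl:.2f}, CL = {cl:.2f}, UCL = {ucl:.2f}")
    print("samples outside the limits:", out_of_control(TABLE_1, lcl, ucl))
```

Under these assumptions the grand mean is 356.66 kgf/cm2 and samples 4, 6 and 17 fall outside the conventional X̄ limits, which is consistent with the paper's statement that the standard chart signalled an out-of-control process.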
characteristic was the compressive strength
(kgf/cm2) of concrete as defined by international
distributions. It is evident that the exponential distribution is a poor model for the concrete data. The
Johnson distribution is an alternative model (Kilink et al., 2012; Sherill and Johnson, 2009). The data
transformed by the Johnson system are illustrated in Figure 4, where it can be seen that this distribution,
shown as a mixture, is the best model for these concrete data. The figure also shows that the fit holds
within the percentile interval from 1.054 to 98.94, so normality of the transformed data can be assumed
within this interval. These percentiles correspond to the lower and upper control limits obtained from the
normalized data, LCL = 330.1 kgf/cm2 and UCL = 375.2 kgf/cm2, which are used as the new control limits for
the X chart shown in Figure 5. The control chart with the new limits leads to exactly the opposite of the
earlier conclusion drawn from the standard control chart. The process is

Figure 3 – Probability Plots for the Concrete Compressive Strength (Probability Plot for RMC350; Goodness of
Fit Test: Normal - 95% CI, AD = 1.245, P-Value < 0.005; Exponential - 95% CI)

Real field data from the construction industry were used as a case study to illustrate the analysis.
Assuming normality when the data were not normally distributed led to the conclusion that the monitored
process was out of statistical control, that is, that special causes of variation were present. Such a
conclusion would require management to intervene in the process to eliminate those special causes and
prevent them from recurring, which would certainly be costly for the organization. However, when the data
were transformed to normality through the Johnson transformation and new control limits were calculated,
the new control chart showed no sign of special causes of variation.

Figure 4 – Probability Plots for the Johnson Transformed Data of Concrete Strength (Johnson Transformation
for RMC350: Probability Plot for Original Data, N = 66; Select a Transformation; Probability Plot for
Transformed Data, N = 66, AD = 0.353, P-Value = 0.455; Best Transformation Type: SU, P-Value for Best Fit:
0.454685, Z for Best Fit: 0.5)
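The percentile-based limits described above can be sketched by inverting Eq. (4): the p-th percentile of a fitted $S_U$ distribution is X_p = µ + λ sinh((Φ⁻¹(p) − γ)/σ), where Φ⁻¹ is the standard normal quantile. The fitted $S_U$ parameters are not reported in this section, so the values in the script below are hypothetical placeholders; only the default percentiles 1.054 and 98.94 come from the text, and the function names are illustrative.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def su_quantile(p, gamma, sigma, mu, lam):
    """Invert Eq. (4): X_p = mu + lam * sinh((z_p - gamma) / sigma), z_p = Phi^-1(p)."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    z_p = STD_NORMAL.inv_cdf(p)
    return mu + lam * math.sinh((z_p - gamma) / sigma)


def percentile_limits(gamma, sigma, mu, lam, lower_pct=1.054, upper_pct=98.94):
    """Control limits in original units at the given percentiles (in percent)."""
    lcl = su_quantile(lower_pct / 100.0, gamma, sigma, mu, lam)
    ucl = su_quantile(upper_pct / 100.0, gamma, sigma, mu, lam)
    return lcl, ucl


def signals(values, lcl, ucl):
    """1-based indices of points outside the percentile-based limits."""
    return [i for i, x in enumerate(values, start=1) if not lcl <= x <= ucl]


if __name__ == "__main__":
    # Hypothetical S_U parameters for illustration only (not fitted to Table 1)
    params = dict(gamma=0.5, sigma=2.0, mu=360.0, lam=10.0)
    lcl, ucl = percentile_limits(**params)
    print(f"LCL = {lcl:.1f}, UCL = {ucl:.1f}")
    print("centre line (median):", round(su_quantile(0.5, **params), 1))
```

Given parameters actually fitted to the data (for example by maximum likelihood, which this sketch does not implement), the returned pair plays the role of LCL and UCL in the original kgf/cm2 units, and the median serves as a natural centre line.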
Acknowledgment
The present research work has been undertaken
within the Bin Laden Research Chair on Quality and
Productivity Improvement in the Construction
Industry funded by the Saudi Bin Laden
Constructions Group; this is gratefully
acknowledged. The opinions and conclusions
presented in this paper are those of the authors and
do not necessarily reflect the views of the
sponsoring organization.
References:
[1] Farnum, N. R., Using Johnson Curves to Describe Non-Normal Process Data, Quality Engineering, Vol. 9,
No. 2, December 1996, pp. 329-336.
[2] Chou, Y., Polansky, A. M., and Mason, R. L., Transforming Non-Normal Data to Normality in Statistical
Process Control, Journal of Quality Technology, Vol. 30, April 1998, pp. 133-141.
[3] Sherill, R. W. and Johnson, L. A., Calculated Decisions, Quality Progress, Vol. 42, No. 1, 2009,
pp. 30-35.
[4] Derya, K. and Canan, H., Control Charts for Skewed Distributions: Weibull, Gamma and Lognormal,
Metodološki zvezki, Vol. 9, No. 2, 2012, pp. 95-106.
[5] Johnson, N. L., Systems of Frequency Curves Generated by Methods of Translation, Biometrika, Vol. 36,
1949, pp. 149-176.
[6] Hahn, G. J. and Shapiro, S. S., Statistical Models in Engineering, John Wiley and Sons, 1967.
[7] Johnson, N. L., Kotz, S., and Balakrishnan, N., Continuous Univariate Distributions, Second Edition,
New York: John Wiley & Sons, 1994.
[8] Kilink, K., Celik, A., Tuncan, M., Tuncan, A., Arslan, G., and Arioz, O., Statistical Distributions of
In Situ Microcore Concrete Strength, Construction & Building Materials, Vol. 26, No. 1, January 2012,
pp. 393-403.
[9] ACI Committee 214, Evaluation of Strength Test Results of Concrete (ACI 214R-02), 2005.