Regression Control Chart For Two Related Variables: A Forgotten Lesson
All content following this page was uploaded by Fazel Hayati, PhD on 04 March 2019.
Fazel Hayati
School of Business,
Edgewood College,
1000 Edgewood College Drive Madison,
WI 53711-1997, USA
Email: [email protected]
Abstract: The purpose of this paper is to introduce the regression control chart
for two related variables and to furnish rules for a plan of action. Cases are
presented to show the effectiveness of the regression control chart.
A regression control chart for two related variables integrates linear regression
and Shewhart quality control chart theory and methods in a coherent way, combining
two robust methods as the basis of analysis. Other
researchers are encouraged to investigate related variables using a regression
control chart, as there is a lack of published work in this area.
1 Introduction
The practice and application of conventional Shewhart control charts in industrial and
service processes have a rich history with extensive published literature. However,
conventional control charts deal with techniques and applications where only one
variable characteristic is analysed and controlled. When a factual and logical
relationship between two variables exists, a conventional control chart may not be
sufficient to analyse the variation, since variation may enter the process in two ways and a
control chart using one variable accounts for only one source of variation. To study the
variation from the second variable, a regression control chart can be used. The
regression control chart combines two robust methods as the basis of analysis: the linear
regression model and Shewhart control chart theory. The purpose of this paper is to
introduce the regression control chart for two related variables and to furnish rules and a plan
of action for management decision making under different circumstances. This research
relies on Shewhart control chart theory and builds on the previous works of Wallis
and Roberts (1956) and Mandel (1969), as research in this area has received very little
attention in the more than six decades since its introduction. Cases are presented to show the
effectiveness of the regression control chart. We hope this paper will reignite interest in
the regression control chart as an effective method for quality improvement and management
decision making.
Walter A. Shewhart developed control charts in the 1920s at Bell Laboratories and
published his work in his classic book Economic Control of Quality of Manufactured
Product (Shewhart, 1931). Shewhart control charts have proven to be an effective
way to improve industrial and service processes. One of Shewhart’s greatest
contributions was to classify the sources of process variation into two categories:
chance causes of variation and assignable causes of variation. The eminent statistician
W. Edwards Deming expanded on an understanding of variation as the basis for management
decision making (Deming, 1986, 1994). For pedagogical reasons, Deming replaced
Shewhart’s terms with common causes of variation and special causes of variation.
A process that exhibits variation from common causes only is said to be in statistical control:
a process that is predictable within limits. Conversely, a process that exhibits variation due to
special causes is considered not in statistical control and is therefore unpredictable.
Figure 1 shows a conventional control chart.
The distinction between the two sources of variation is paramount in process analysis
as they call for two different types of action. Variation due to common causes requires
consideration of many sources and their interaction. When variation due to a special
cause is present, the action is specific and focused on the special cause. Confusing
the two sources of variation can lead to two types of errors. Error I occurs when one
reacts to a data value as if it came from a special cause when in fact it came from
common causes of variation. Error II occurs when one reacts to a data value as if it came
from common causes of variation when in fact it was due to a special cause. In order to
distinguish between the two sources of variation and to facilitate the appropriate action,
Shewhart provided the framework by establishing the three sigma limits ($\bar{X} \pm 3\sigma_x$) as the
process boundaries that define the two types of variation. When data values behave
randomly within the three sigma limits they represent common cause variation, that is, a
constant system of causes. Any value outside the three sigma limits is deemed to be
caused by special cause(s) of variation. Therefore, the three sigma limits provide a
balance to minimise the chances of making errors I or II. The basis of the three sigma
limits, Shewhart argued, is mathematical theory, empirical evidence and practical
experience (Shewhart, 1931).
Although Shewhart suggested a value beyond the three sigma limits as a signal of special
causes of variation, other signals have been suggested (Western Electric Quality Control
Handbook, 1985; Nelson, 1986; Wheeler and Chambers, 1992).
The Individual and Moving Range (also known as ImR or XmR) control chart is most
suitable when the data is collected periodically, e.g., daily or monthly. The three-sigma
limits for the individual values, the lower control limit (LCL) and the upper control limit
(UCL), are estimated as $\mathrm{LCL} = \bar{X} - 2.66\,\overline{mR}$ and
$\mathrm{UCL} = \bar{X} + 2.66\,\overline{mR}$, where $\bar{X} = \frac{1}{n}\sum x$ is the
average of the individual values and $\overline{mR} = \frac{1}{n-1}\sum mR$ is the
average moving range. For the moving range chart, $\mathrm{UCL} = 3.26\,\overline{mR}$ and the LCL is set to
zero. In cases where the average moving range is affected by very large moving range
values, it is recommended to use the median moving range ($\widetilde{mR}$). The control limits for
the individual and median moving range control chart are calculated as $\bar{X} \pm 3.14\,\widetilde{mR}$. For a
detailed study of the individual and moving range control chart, the reader is encouraged to
review Wheeler and Chambers (1992) and Wheeler (1995).
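The limit calculations above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function name `xmr_limits`, the `use_median` flag and the sample values are assumptions made for the example, and the constants 2.66, 3.14 and 3.26 are taken from the text as given.

```python
from statistics import mean, median


def xmr_limits(values, use_median=False):
    """Return (LCL, centre, UCL, moving-range UCL) for an individuals chart.

    Hypothetical helper: constants follow the text (2.66, 3.14, 3.26).
    """
    if len(values) < 2:
        raise ValueError("need at least two values to form a moving range")

    # Moving range: absolute difference between consecutive values.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    centre = mean(values)

    if use_median:
        # Median moving range, less sensitive to a few very large ranges.
        half_width = 3.14 * median(moving_ranges)
    else:
        half_width = 2.66 * mean(moving_ranges)

    # The text uses 3.26; the tabulated D4 constant for ranges of two is 3.267.
    mr_ucl = 3.26 * mean(moving_ranges)
    return centre - half_width, centre, centre + half_width, mr_ucl


# Illustrative (made-up) measurements.
lcl, centre, ucl, mr_ucl = xmr_limits([10, 12, 11, 13, 12])
print(f"LCL={lcl:.2f} centre={centre:.2f} UCL={ucl:.2f} mR UCL={mr_ucl:.2f}")
```

Passing `use_median=True` switches the individuals limits to the median moving range form, which may be preferable when one or two very large moving ranges would otherwise widen the limits.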
Although Shewhart examined the relationship between two variables from the viewpoint
of quality and control in his book Economic Control of Quality of Manufactured Product
(Shewhart, 1931), the literature shows very little follow-up. Using the line of regression
and three standard deviations ($s_y = \sigma_y\sqrt{1 - r^2}$), Shewhart studied the relationship
between the depth of penetration in inches and the depth of sapwood in inches
(Shewhart, 1931). Later, a similar method was presented in the Western Electric Quality
Control Handbook (1985) for placing control lines on a regression line.
DiPaola (1945) is the earliest paper showing the relationship between two variables
on the assembly line using a regression line and three standard errors as control limits.
He showed how the relationship between variables can be used for quality control instead of
a traditional attribute control chart. Later, other authors presented cases dealing with various
aspects of this topic. Bemesdefer (1953) related the ratio of two variables to percent
cracks in a disc manufacturing process using regression and three sigma limits. Jackson
(1956) related two variables by proposing a joint control chart based on the bivariate
control region. Sandon (1956) used a regression control chart in personnel
selection, where candidates for a job were evaluated based on ‘scores’ referring to certain
characteristics and ‘marks’ which are the results of a common examination. Weis (1957)
related two variables using a two-way control chart based on the coefficient of correlation
for various samples from two products. Mansfield and Wein (1958) proposed a regression
control chart for cost, first using multiple regression and then a traditional
control chart for the residuals. These papers, for the most part, deal
with simultaneous control of two correlated variables or, in some cases,
with what may be called residual control charts.
Wallis and Roberts (1956) presented the first case examining a factual and
rational relationship between two variables, duration of travel and travel expense, using
a regression line and control limits. Their work contains many elements of the regression
control chart. A more extensive treatment of the regression control chart was given by
Mandel (1969), where the results of a study of the manhours required to process pieces of
mail at 74 post offices were examined using a regression control chart. Mandel’s paper is
the last major work published on this topic. Although the regression control chart for two
related variables is an effective way of analysing two related process variables and a
useful aid to management decision making, its development and application in the
literature have been neglected.
It is noted that the literature is rich in methods for multivariate control charts
since they first appeared in the work of Hotelling (1947). Readers interested in
multivariate control charts may review Jackson (1959), Woodall and Ncube (1985), Ryan
(1989) and Lowry and Montgomery (1995). Our extensive literature search of refereed
journals found no published work on the topic presented in this paper since Mandel
(1969), although there has been considerable research on various aspects of control charts for
multivariate analysis. It is reiterated here that the emphasis in this paper is the application
of the regression control chart to two related variables.
Regression modelling and analysis date back to Sir Francis Galton and have been used
by researchers for over a century, with wide application in agriculture, biological sciences,
medicine, experimental design, and industrial and service processes. Although there are many
approaches to regression analysis, this paper deals with linear regression between
two variables. In linear regression analysis, the aim is to determine and measure the
relationship between two variables: the dependence of a variable Y on the independent
variable X. The regression line, $y = a + bx$, also known as the least squares line, is
estimated, and correlation analysis compares the values of the dependent
variable for different values of the independent variable using the coefficient of
correlation, r.
In regression analysis, a distinction must be made between association and causation
when two variables are compared. In association, using a scatter plot, the aim is first to
investigate if the paired values of the two variables move in the same direction:
• It is judged that the relationship to be studied is factual, logical and rational. That is,
there exists dependency between the two variables; relationship means dependency
in this context. The value of the dependent variable y depends on the magnitude
of the independent variable x.
• There is an interest in studying the impact of variation from the independent variable
on the characteristic of the dependent variable under consideration.
• The coefficient of correlation (r) can be used to show the degree of relationship. The
correlation coefficient is a statistical measure of the strength of the linear relationship
between two variables.
• Estimate the line of regression of y on x; that is, $y = a + bx$.
Next, Shewhart control chart theory can be used to calculate and place the control
limits:
• Determine the standard error of estimate, $\sigma_e$ (others have used the symbols $S_y$ and $S_e$),
where $\sigma_e = \sigma_y\sqrt{1 - r^2}$. Here $\sigma_y$ is the standard deviation of the dependent variable y and
r is the coefficient of correlation between y and x.
• Place parallel control limits at $\pm 3\sigma_e$ from the regression line; that is, $y \pm 3\sigma_e$. The
area between the limits is considered the allowable variation from the independent
variable.
• Points within the control limits are considered common cause variation. Any
values above or below the parallel control limits are deemed to be due to assignable (special)
causes and therefore warrant investigation.
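The steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the function name `regression_control_chart`, the returned dictionary keys and the small dataset are hypothetical, and $\sigma_y$ is taken as the population standard deviation (divisor n), which the text does not specify.

```python
from math import sqrt


def regression_control_chart(x, y, k=3.0):
    """Fit y = a + b*x and place parallel limits at +/- k*sigma_e.

    Hypothetical helper; sigma_y uses divisor n (an assumption).
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")

    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")

    b = sxy / sxx                # slope of the least squares line
    a = y_bar - b * x_bar        # intercept
    r = sxy / sqrt(sxx * syy)    # coefficient of correlation

    # Standard error of estimate: sigma_e = sigma_y * sqrt(1 - r^2).
    sigma_y = sqrt(syy / n)
    sigma_e = sigma_y * sqrt(max(0.0, 1.0 - r * r))

    # Indices (0-based) of points outside the parallel control limits.
    signals = [
        i for i, (xi, yi) in enumerate(zip(x, y))
        if abs(yi - (a + b * xi)) > k * sigma_e
    ]
    return {"a": a, "b": b, "r": r, "sigma_e": sigma_e, "signals": signals}


# Illustrative (made-up) paired observations.
chart = regression_control_chart([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(chart)
```

For any x, the centre line is `a + b * x` and the limits are that value plus or minus `k * sigma_e`; an empty `signals` list corresponds to a chart showing common cause variation only.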
The method for estimating the regression line and the control limits is given in this section to
present the foundations and for pedagogical reasons. Any reliable software package will
make the calculation of these values less laborious.
It is imperative not to draw any conclusions regarding the relationship between the
two variable characteristics until the coefficient of correlation is tested for significance.
For the purpose of this paper, the guidelines suggested in the Western Electric Quality
Control Handbook (1985) are used to test the significance of the correlation between the two
variables as follows:
Figure 2 shows the regression control chart of COGS (cost of goods sold) against revenue. As the chart
indicates, all values are in control and no special causes are present. In this case
the regression control chart is used as a performance indicator: the
three sigma limits give the range of variation in cost that is expected for any
monthly revenue value. For example, if the monthly revenue is $5,000,000 then the company
should expect a COGS value between $2,937,430 and $4,590,130. The $1,652,700 range
($4,590,130 – $2,937,430) may appear too large; however, management must accept
this range of variation in cost under current operating conditions and focus on
process improvement to narrow the variation.
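The arithmetic behind this range can be checked directly. The sketch below backs out the regression-line value and the standard error of estimate implied by the two quoted limits, assuming they are symmetric three sigma limits about the line. The function name is hypothetical, and the paper itself does not report these intermediate values.

```python
def implied_chart_values(lcl, ucl, k=3.0):
    """Back out centre line and sigma_e from symmetric limits (assumption)."""
    centre = (lcl + ucl) / 2.0      # value on the regression line
    half_width = (ucl - lcl) / 2.0  # equals k * sigma_e
    sigma_e = half_width / k
    return centre, half_width, sigma_e


# Limits quoted in the text for a monthly revenue of $5,000,000.
centre, half_width, sigma_e = implied_chart_values(2_937_430, 4_590_130)
print(f"centre={centre:,.0f} half-width={half_width:,.0f} sigma_e={sigma_e:,.0f}")
```

Under that assumption the expected COGS on the regression line is $3,763,780, the half-width of the band is $826,350, and the implied standard error of estimate is $275,450.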
Figure 2 Regression control chart of COGS against revenue (see online version for colours)
Three sigma limits: Control limits = (70.0 + 0.9 × Bed Size) ± 129.6.
Figure 3 shows the regression control chart of survey hours against bed size. As the chart
shows, special cause variation is present: three values are on or above the UCL. This
indicates that some surveys are atypical and must be investigated. Further examination
showed that survey numbers 8, 44 and 52 are facilities that have received “marks”;
that is, complaints registered against the facility. In such circumstances,
additional time is usually required to complete the survey. In this case, the parallel control limits
provide the boundaries for detecting out of control surveys where excessive hours were used.
These exceptional values may also offer management opportunities for improvement.
This case demonstrates that the regression control chart can be used for management by
exception. The exceptions, in this case, were due to excessive use of resources, which
appears to be reasonable given the “marks.” However, there are situations when the
exception is unfavourable and must be investigated and eliminated.
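As a small illustration of management by exception, the sketch below evaluates the stated limits, (70.0 + 0.9 × Bed Size) ± 129.6, for a given facility and flags a survey whose hours fall on or beyond a limit. The function names and the 100-bed facility are hypothetical and are not taken from the paper's data.

```python
def survey_hour_limits(bed_size):
    """Return (LCL, centre, UCL) in survey hours for a given bed size."""
    centre = 70.0 + 0.9 * bed_size   # regression line from the text
    return centre - 129.6, centre, centre + 129.6


def is_exception(bed_size, hours):
    """True when the survey hours are on or outside the control limits."""
    lcl, _, ucl = survey_hour_limits(bed_size)
    return hours >= ucl or hours <= lcl


# Hypothetical 100-bed facility.
print(survey_hour_limits(100))   # roughly (30.4, 160.0, 289.6)
print(is_exception(100, 300))    # 300 hours exceeds the UCL
print(is_exception(100, 200))    # 200 hours is within the limits
```

Treating a value exactly on a limit as an exception mirrors the text, which counts values "on or above" the UCL as signals.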
Figure 3 Regression control chart of survey hours against bed size (see online version
for colours)
Figure 4 Scatter plot of processing time in weeks against applications per week (a), processing
time in weeks against total week amount (b), and processing time in weeks against
applications in process (c)
Next, a regression line was developed and three sigma limits were set parallel to the
regression line. The following values and expressions were used for the regression
control chart:
Equation of the line:
Processing Time in Weeks = 1.92 + 0.0263 × Applications in Process
Figure 5 shows the regression control chart of processing time in weeks against
applications in process. The control chart indicates 19 values well within the
control limits and one value on the UCL. The regression control limits demonstrate the
wide range of variation contributed by the number of applications in process, a major
factor affecting the application processing time. The staff in operations management pointed
out that this makes sense: as the backlog of applications in process increases, the processing
time will increase unless either additional resources are added to clear the backlog or
process improvement initiatives are undertaken to find the causes of the backlog.
One value, Week 8 with 54 applications in process, is on the
UCL and may be considered an exception. One possible reason given for this exception
is vacation scheduling: fewer staff were available to review and process
applications during Week 8, causing a larger backlog.
Figure 5 Regression control chart of processing time in weeks against applications in process
(see online version for colours)
5 Discussion
When there is an interest in the relationship between two variables, there are two sources
of variation entering and affecting the process: variation due to the characteristics of the
dependent variable and variation due to the characteristics of the independent variable. A
conventional control chart, although very effective in analysing variation, only considers
the variation from one characteristic; it disregards the impact of variation from the other.
A regression control chart as presented in this paper is an effective way to account for
allowable variation from the other source.
The regression control chart is based on (1) using a regression model to establish the
relationship and (2) using conventional quality control chart methodology to characterise
variation as the basis of economic decision making. A conventional control chart monitors
only one variable, and the variation from a second variable is ignored. In many cases, we
are interested in how the variation from the second variable contributes to
the variation in the first variable. Hence, the novelty of the regression control chart is that it
is an improvement on the conventional quality control chart. This paper taps into a simple
regression model to improve on the conventional control chart when two
related variables are of interest. It is noted that the parallel control limits in the regression
control chart act as tolerance limits: the width of the control limits indicates the
amount of variation expected for a given value of the independent variable, and it is the basis
of the management decision on how to interpret the variation. Although in the simple regression
model prediction bands are based on statistical sampling theory, control limits in the
regression control chart operationally define the difference between common cause and
special cause variation for management decision making. The cases in this paper clearly
show that a decision based on a conventional control chart may be inadequate when
analysed from the regression control chart view.
When investigating the relationship between two variable characteristics, there may
be situations where the control chart of one variable characteristic is in control while the
regression control chart is not, owing to correlation and the influence of the other
variable characteristic. There may also be situations where the control chart of one variable
characteristic shows values outside the control limits, indicating special causes,
while the regression control chart indicates control because of justified variation
contributed by the other variable characteristic. Consider the travel expense and
duration of travel data from Wallis and Roberts (1956). The data represent the
expenses of 100 consecutive trips by sales engineers of an organisation whose
management is monitoring travel expenses. Figure 6 shows an Individual and Moving
Range control chart of expenses (moving range chart not shown). As the control chart
indicates, trip numbers 1, 33 and 67 are above the UCL, indicating special causes.
Generally, such special causes are investigated without considering any other
contributing factors. However, this control chart only monitors variation in travel
expenses, which makes such analysis and investigation inadequate. The
travel duration is a significant source of variation affecting travel expense; therefore, the
regression control chart is the appropriate method, as the travel duration for each trip
is also included in the analysis. A regression control chart is developed to demonstrate
the relationship between travel expense and travel duration. This chart will allow for
the variation in travel duration that affects travel expense.
Figure 6 Individual control chart of expenses (see online version for colours)
Figure 7 shows the regression control chart of travel expense against travel duration for
the same dataset. This chart allows for the duration of the trip, and the parallel three sigma
limits indicate the allowable range of expenses for various trip durations. As the chart
indicates, trip numbers 1, 33 and 67 are no longer out of control. The reason is that the
regression control chart allows for the variation in the travel duration; that is, the duration
of the travel explains the atypical expenses. Interestingly, the regression control
chart shows that trip number 99, which was in control in the conventional
control chart, is out of control. This trip must be investigated and the reasons for such high expense identified.
This example emphasises the usefulness of the regression control chart in minimising type I
or II errors.
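The contrast between the two charts can be reproduced on a small synthetic dataset. The sketch below is illustrative only: the trip durations and expenses are invented (they are not the Wallis and Roberts data), the function names are hypothetical, and $\sigma_y$ is assumed to use divisor n. One long trip has a high but proportionate expense, and one short trip has an expense that its duration does not explain.

```python
from math import sqrt


def individuals_signals(y):
    """0-based indices outside mean +/- 2.66 * average moving range."""
    centre = sum(y) / len(y)
    mr_bar = sum(abs(b - a) for a, b in zip(y, y[1:])) / (len(y) - 1)
    return [i for i, v in enumerate(y) if abs(v - centre) > 2.66 * mr_bar]


def regression_signals(x, y, k=3.0):
    """0-based indices outside the parallel limits a + b*x +/- k*sigma_e."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / sqrt(sxx * syy)
    # sigma_e = sigma_y * sqrt(1 - r^2), with sigma_y using divisor n.
    sigma_e = sqrt(syy / n) * sqrt(max(0.0, 1.0 - r * r))
    return [
        i for i, (xi, yi) in enumerate(zip(x, y))
        if abs(yi - (a + b * xi)) > k * sigma_e
    ]


# Synthetic trips: duration in days, expense in dollars (made up).
duration = [1, 2, 3, 2] * 5
expense = [110, 190, 310, 190] * 5
duration[9], expense[9] = 8, 810   # long trip, expense in line with duration
expense[16] = 420                  # one-day trip with an unusually high expense

print(individuals_signals(expense))            # flags the long trip
print(regression_signals(duration, expense))   # flags the short, costly trip
```

On this synthetic data the individuals chart signals only the long trip, whose expense is explained by its duration, while the regression control chart leaves that trip in control and instead signals the short, costly one. This is the same pattern the text describes for trips 1, 33, 67 and 99.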
Figure 7 Regression control chart of travel expenses against travel duration (see online version
for colours)
The choice of three standard errors, similar to the three sigma limits of the conventional
control chart, is based on Shewhart’s reliance on Tchebycheff’s inequality, which states that more than
$100\left(1 - \frac{1}{t^2}\right)\%$ of the values may be expected to lie within the band $y \pm t\sigma_e$. For $t = 3$ the
inequality guarantees more than 88.9% whatever the distribution, and when the values are
approximately normally distributed the
control limits will bracket 99.73% of the variation (Shewhart, 1931). The three sigma
limits indicate the allowable variation for the dependent variable y given a value of the
independent variable x under current performance. Some authors have used two sigma
limits based on preference; however, three sigma limits are the most effective from a
theoretical and practical viewpoint. Three sigma limits provide the basis for the best
economic decision by minimising the chances of making error types I or II. However,
limits narrower than three sigma may be used if a stricter criterion is needed to detect
special causes, particularly when the cost of missing a special cause is extremely
high.
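The coverage figures behind two and three sigma limits can be checked numerically. The sketch below compares the distribution-free lower bound from Tchebycheff's inequality with the coverage under a normal distribution; the function names are hypothetical, and the normal case is an assumption about the residuals rather than something the chart requires.

```python
from math import erf, sqrt


def chebyshev_lower_bound(t):
    """Minimum fraction of values within t standard deviations (t > 1)."""
    if t <= 1:
        raise ValueError("the bound is only informative for t > 1")
    return 1.0 - 1.0 / t ** 2


def normal_coverage(t):
    """Fraction of a normal distribution within t standard deviations."""
    return erf(t / sqrt(2.0))


for t in (2, 3):
    print(
        f"t={t}: at least {chebyshev_lower_bound(t):.4f} in general, "
        f"{normal_coverage(t):.4f} if normal"
    )
```

Moving from three sigma to two sigma limits lowers the normal-theory coverage from about 99.73% to about 95.45%, which is why narrower limits raise the chance of reacting to common cause variation.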
In order to make this point clear, it is instructive to use a dataset from Shewhart
(1931, p.42) to develop a regression control chart. The dataset is reproduced
in Table 1. Shewhart presents the tensile strength in pounds per square inch (psi) and
hardness in Rockwells ‘E’ for 60 aluminium die-casting specimens (n = 60).
Figure 8 shows the regression control chart for this dataset. The control limits are drawn
as parallel lines at two sigma ($2\sigma_e$) and three sigma ($3\sigma_e$). At the three sigma level
all values are within the upper and lower control limits, indicating the presence of common cause variation only;
that is, allowable variation in tensile strength due to hardness. At three sigma limits, any
action will be aimed at improving the system. At the two sigma level, four
specimens are outside the limits: specimens 11 and 56 above the control limits and
specimens 17 and 21 below the control limits. Hence, at two sigma limits the chance of
committing error type I is markedly increased, leading to a needless search for special
causes.
A comparison is made here between parallel control limits and prediction bands for a
line of regression. Prediction bands become wider as observations move away from
the mean, and are therefore widest at extreme values. Considering that extreme
values may be potential special causes, we recommend parallel control limits, which
provide stricter limits for detecting extreme values. We have compared multiple
cases using both parallel control limits and curved prediction limits. Figure 7 shows the
regression control chart for the travel expenses against duration of travel dataset, with one
value above the UCL. Figure 9 shows the regression model with prediction bands at the
99.73% level. As the figure shows, the prediction bands curve slightly as the lines move
farther away from the centre of the data. The prediction bands detect the same out of
control value as the parallel control limits. Once again, parallel control limits simply provide
narrower limits for detecting extreme values than curved regression intervals, perhaps a more
favoured condition in business processes.
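The widening of prediction bands away from the mean can be illustrated with the standard textbook prediction interval for simple linear regression, whose half-width is proportional to $\sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{S_{xx}}}$. The sketch below computes only this factor; the function name and the sample x values are hypothetical, and the comparison assumes the same multiplier and the same residual standard deviation for both kinds of limits, so it does not reproduce the exact bands in the paper's figures.

```python
from math import sqrt


def prediction_widening_factor(x_values, x0):
    """Ratio of prediction-band half-width at x0 to a constant-width band.

    Assumes both bands share the same multiplier and residual SD.
    """
    n = len(x_values)
    x_bar = sum(x_values) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x_values)
    if sxx == 0:
        raise ValueError("x values must vary")
    return sqrt(1.0 + 1.0 / n + (x0 - x_bar) ** 2 / sxx)


# Illustrative (made-up) values of the independent variable.
xs = [1, 2, 3, 4, 5]
print(prediction_widening_factor(xs, 3))   # at the mean of x
print(prediction_widening_factor(xs, 5))   # at an extreme value of x
```

The factor is smallest at the mean of x and grows toward the extremes, while parallel control limits keep a constant width across the whole range of x.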
We also compared curved prediction bands (solid line) to parallel control limits
(dashed line) for the travel expense dataset, as shown in Figure 10. As the figure
shows, the prediction limits deviate slightly from the parallel control limits:
they are about 3% wider at the mean and about 10% wider at extreme values. We are
particularly interested in the stricter rule that the parallel limits can provide, particularly
at the extreme values in the event special causes are present. We believe this would
Figure 8 Regression control chart of tensile strength against hardness (see online version
for colours)
Figure 9 Regression model with prediction bands at 99.73% (see online version for colours)
Another reason for the choice of parallel limits is their simplicity in computation and
interpretation, which is of practical interest. Parallel control limits are used in conventional control
charts as the boundaries for detecting outliers (special causes). Hence, in this case, we have
tapped into conventional control charts to set the limits.
Figure 10 Comparison of prediction band (solid line) to parallel control limits (dashed line)
(see online version for colours)
Conventional control charts and regression control charts share many similarities in
theory and structure, while they differ in a few major respects. Although the regression
control chart is an extension of the conventional control chart, it offers many advantages over
the conventional control chart under certain conditions; namely,
• It gives a better understanding of the sources of variation when more than one characteristic is
considered. The control limits show what variability to expect in one characteristic
given a value of the second characteristic.
• The conventional control chart monitors variation about a constant process average $\bar{y}$.
The regression control chart monitors variation about a changing average, $y = a + bx$.
• The scatter plot along with the regression line makes the relationship between the two
characteristics evident.
• The chart and parallel control limits furnish a rational basis for detecting exceptional
values. They establish rules for management by exception. Out of control points offer
opportunities for improvement.
• It minimises overreaction to variation inherent in the system. When variation occurs
due to common causes, accept the variation.
Applying the regression control chart provides, among other advantages, several rules that
can facilitate management decision-making. Using the regression control chart provides
some guidelines for:
• monitoring and predicting the performance range
• identifying exceptions for management by exception
• establishing a rational basis for economic standards of quality and productivity
• minimising overall variability of the process
• minimising cost of inspection and rejection
• recognising that the related variables may each be in control while the regression control chart is not,
indicating that the system is out of control
• measuring gains and losses
• forecasting and tracking deviations from budgeted goals
• scheduling based on production capacity
• using the regression control chart as a tool for management by exception
• establishing work standards and measures of performance.
In conclusion, this paper has shown the regression control chart for two related variables to
be a robust method for process analysis and quality improvement. The regression control
chart can be used extensively in service, manufacturing, research and healthcare
processes. Although the regression control chart was briefly introduced and applied over five
decades ago, we hope this paper will reignite interest in this effective method. Other
researchers are encouraged to investigate related variables using the regression control chart,
as there is a lack of published work in this area.
Acknowledgements
The author would like to thank the reviewers for their helpful comments and suggestions.
References
Bemesdefer, J.L. (1953) ‘Using statistics to improve contact quality’, Industrial Quality Control,
pp.16–22.
Deming, W.E. (1986) Out of the Crisis, MIT Press, Cambridge, Massachusetts.
Deming, W.E. (1994) The New Economics for Industry, Government, Education,
Vol. 2, MIT Press, Cambridge, Massachusetts.
DiPaola, P.P. (1945) ‘Use of correlation in quality control’, Industrial Quality Control, Vol. 2,
No. 1, pp.10–14.
Hotelling, H. (1947) ‘Multivariate quality control–illustrated by the air testing of sample
bombsights’, in Eisenhart, C., Hastay, M.W. and Wallis, W.A. (Eds.): Techniques of
Statistical Analysis, New York, McGraw Hill, pp.111–184.
Jackson, J.E. (1956) ‘Quality control method for two related variables’, Industrial Quality Control,
pp.4–8.
Jackson, J.E. (1959) ‘Quality control methods for several related variables’, Technometrics, Vol. 1,
No. 5, pp.359–377.
Lowry, C.A. and Montgomery, D.C. (1995) ‘A review of multivariate control charts’, IIE
Transactions, Vol. 27, pp.800–810.
Mandel, B.J. (1969) ‘The regression control chart’, Journal of Quality Technology, pp.1–9.
Mansfield, E. and Wein, H.H. (1958) ‘A regression control chart for cost’, Journal of the Royal
Statistical Society: Series C (Applied Statistics), Vol. 2, No. 1, pp.48–57.
Mosteller, F. and Tukey, J.W. (1977) Data Analysis and Regression, A Second Course in Statistics,
Addison-Wesley Publishing Company, Reading, Massachusetts.
Nelson, L.S. (1986) Technical Aids, Collected from the Journal of Quality Technology, American
Society for Quality Control, ASQ Press, Milwaukee, Wisconsin.
Ryan, T.P. (1989) Statistical Methods for Quality Improvement, John Wiley, New York.
Sandon, F. (1956) ‘A regression control chart for use in personnel selection’, Journal of the Royal
Statistical Society: Series C (Applied Statistics), Vol. 5, No. 1, pp.20–31.
Shewhart, W.A. (1931) Economic Control of Quality of Manufactured Product, D. Van Nostrand
Company, Inc., New York.
Wallis, W.A. and Roberts, H.V. (1956) Statistics a New Approach, The Free Press, Glencoe.
Weis, P.E. (1957) ‘An application of a two-way X-Bar chart’, Industrial Quality Control, Vol. 14,
No. 6, pp.23–27.
Western Electric Quality Control Handbook (1985) Statistical Quality Control Handbook,
Vol. 11, AT&T, Indianapolis.
Wheeler, D.J. (1995) Advanced Topics in Statistical Process Control, SPC Press, Inc., Knoxville,
TN.
Wheeler, D.J. and Chambers, D.S. (1992) Understanding Statistical Process Control, 2nd ed.,
SPC Press, Knoxville, TN.
Woodall, W.H. and Ncube, M.M. (1985) ‘Multivariate CUSUM quality control procedures’,
Technometrics, Vol. 27, pp.285–292.