Intro To Essential Stats With Python
Prepared by:
Dr. Gokhan Bingol ([email protected])
October 18, 2024
Statistical tests and machine learning (ML) techniques are both valuable tools for process engineers, and with the ever-rising popularity of ML, process engineers need the necessary skills for descriptive & inferential statistics as well as for supervised/unsupervised modeling methods (Pinheiro & Patetta, 2021).
Statistical tests are well-established methods that provide interpretable results and clear hypothesis testing (Montgomery 2012) and are particularly useful for determining the significance of relationships
when working with small datasets (Box et al. 2005). On the other hand, ML techniques excel in
handling complex datasets with intricate relationships that may not be easily captured by traditional
statistics (Hastie et al. 2009); however, they require more data and computational resources and their
"black-box" nature, particularly deep learning methods, might lack the interpretability needed in
process engineering decisions (Rudin 2019)1.
Given the vast variety of statistical methods available, the current edition focuses on the fundamental
statistical tests most relevant for process engineers. The target audience of the current work is engineers
and therefore this document assumes the reader already has some background in calculus, statistics and
statistical distributions. Furthermore, at least a basic level of understanding of Python is required.
This work heavily uses the following Python packages: numpy and scisuit2. The design of scisuit's statistical library is inspired by R3 and therefore the knowledge gained from the current work can be conveniently adapted to R, which is a popular software in the data science realm.
1 Rudin C (2019). Stop explaining black box machine learning models for high stakes decisions and use interpretable
models instead. Nature Machine Intelligence, 1(5), 206-215.
2 scisuit, at least v1.4.0. Unless otherwise stated, the alias np is used for numpy.
3 https://www.r-project.org
2. Fundamentals
2.1. Point Estimation
A point estimate is a single value (e.g., mean, median, proportion, ...) computed from sampled data to represent a plausible value of a population characteristic (Peck et al. 2016).
Example 2.1
Out of 7421 US college students, 2998 reported using the internet more than 3 hours a day. What is the proportion of all US college students who use the internet more than 3 hours a day? (Adapted from Peck et al. 2016)
Solution:
The solution is straightforward: $p = \dfrac{2998}{7421} \approx 0.40$
Based on this statistic it is possible to claim that approximately 40% of US college students spend more than 3 hours a day using the internet. Please note that based on the survey result, we made a claim about the population, i.e., all US college students. ■
Now that we have made an estimate based on the survey, we should ask ourselves: "How reliable is this estimate?". We know that if we had another group of students, the percentage might not have been 40; maybe it would be 45 or 35. There are no perfect estimators, but we expect that on average the estimator should give us the right answer:
E(Θ) = θ (2.1)
Example 2.2
If X has a binomial distribution with parameters n and p, show that the sample proportion, X/n, is an unbiased estimator of p.
Before we proceed with the solution, let’s refresh ourselves with a simple example: Suppose we
conduct an experiment where we flip a coin 10 times. We already know that the probability of getting
heads (success) is 𝑝=0.5. However, we want to estimate p by flipping the coin and calculating the
sample proportion, X/n. If we flip the coin 10 times and get X=6 heads, the estimate is 0.6. However, averaged over many repetitions of the experiment, the sample proportion will come out as 0.5; this is what it means for X/n to be an unbiased estimator. Formally,
$$E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{1}{n}\cdot np = p$$
therefore, X/n is an unbiased estimator of p. ■
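This can also be illustrated numerically. The following is a minimal simulation sketch (an addition to the text, using only numpy): it repeats the 10-flip experiment many times and shows that the average of the sample proportions settles at p = 0.5.

import numpy as np

rng = np.random.default_rng(seed=1)

n, p = 10, 0.5      #10 coin flips, fair coin
reps = 100_000      #number of repeated experiments

X = rng.binomial(n=n, p=p, size=reps)  #number of heads in each experiment
phat = X/n                             #sample proportion of each experiment

print(phat.mean())  #close to 0.5, illustrating E(X/n) = p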
Example 2.3
Prove that $S^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$ is an unbiased estimator of the population variance (σ2).
Solution:
$$E(S^2) = E\left[\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2\right]$$

$$= \frac{1}{n-1}E\left[\sum_{i=1}^{n}\left((X_i-\mu)-(\bar{X}-\mu)\right)^2\right]$$

Expanding the square and noting that $\sum_{i=1}^{n}(X_i-\mu) = n(\bar{X}-\mu)$:

$$= \frac{1}{n-1}E\left[\sum_{i=1}^{n}(X_i-\mu)^2 - n(\bar{X}-\mu)^2\right]$$

Note that $E(X_i-\mu)^2 = \sigma^2$ and $E(\bar{X}-\mu)^2 = \dfrac{\sigma^2}{n}$. Putting the knowns in the last equation,

$$= \frac{1}{n-1}\left(n\sigma^2 - n\cdot\frac{\sigma^2}{n}\right) = \sigma^2$$
Therefore, S2 is an unbiased estimator of the population variance. ■
There are other properties of estimators: i) minimum variance, ii) efficiency, iii) consistency, iv)
sufficiency, and v) robustness. Interested readers can refer to textbooks on mathematical statistics
(Devore et al. 2021; Larsen & Marx 2011; Miller & Miller 2014).
Suppose you want to estimate the SAT scores of students. For that purpose, 500 randomly selected students have been given an SAT test and a mean value of 461 is obtained (adapted from Moore et al. 2009). Although it is known that the sample mean is an unbiased estimator of the population mean (μ), we already know that had we sampled another 500 students, the mean could (most likely would) have been different than 461. Therefore, how confident are we in claiming that the population mean is 461?
Suppose that the standard deviation of the population is known (σ=100). We know that if we repeatedly draw samples of size 500, the means of these samples will follow the $N\left(\mu,\ \frac{100}{\sqrt{500}}\approx 4.5\right)$ curve. Let's demonstrate this with a simple script:
Script 2.1
import scisuit.plot as plt
from scisuit.stats import rnorm

aver = []
for i in range(1000):
    #draw a sample of 500 students from N(461, 100) and store its mean
    sample = rnorm(n=500, mean=461, sd=100)
    aver.append(sum(sample)/500)

plt.hist(data=aver, density=True)
plt.show()
Computing 461 − 3×4.5 = 447.5 and 461 + 3×4.5 = 474.5, it is seen that the interval (447.5, 474.5) covers almost all possible mean values. Therefore we are 99.7% (3σ) confident (confidence level) that the population mean will be in this interval. Note also that, as a natural consequence, our confidence level decreases as the interval length decreases.
A way to quantify the amount of uncertainty in a point estimator is to construct a confidence interval (Larsen & Marx 2011). The definition of a confidence interval is as follows: "... an interval computed from sample data by a method that has probability C of producing an interval containing the true value of the parameter." (Moore et al. 2009). Peck et al. (2016) give a general form of the confidence interval as follows:

$$\text{point estimate} \pm (\text{critical value})\cdot(\text{estimated standard deviation of the statistic}) \tag{2.2}$$
Note that the estimated standard deviation of the statistic is also known as standard error. In other
words, when the standard deviation of a statistic is estimated from the data (because the population’s
standard deviation is not known), the result is called the standard error of the statistic (Moore et al.
2009).
Example 2.4
Derive an approximate 100(1−α)% confidence interval for the binomial parameter p.
Solution:
We already know that Abraham DeMoivre showed that when X is a binomial random variable and n is large, the probability can be approximated as follows:

$$\lim_{n\to\infty} P\left(a \le \frac{X-np}{\sqrt{np(1-p)}} \le b\right) = \frac{1}{\sqrt{2\pi}}\int_{a}^{b} e^{-z^2/2}\,dz$$
To establish an approximate 100(1−α)% confidence interval,

$$P\left[-z_{\alpha/2} \le \frac{X-np}{\sqrt{np(1-p)}} \le z_{\alpha/2}\right] = 1-\alpha$$

Replacing p under the square root with its estimate X/n and rewriting in terms of the sample proportion,

$$P\left[-z_{\alpha/2} \le \frac{X/n-p}{\sqrt{\dfrac{(X/n)(1-X/n)}{n}}} \le z_{\alpha/2}\right] \approx 1-\alpha$$

Solving the inequality for p, with k denoting the observed number of successes, gives the approximate confidence interval:

$$\left(\frac{k}{n} - z_{\alpha/2}\sqrt{\frac{(k/n)(1-k/n)}{n}},\ \ \frac{k}{n} + z_{\alpha/2}\sqrt{\frac{(k/n)(1-k/n)}{n}}\right)$$
■
If a 95% confidence interval is to be established, then zα/2 would be ≈1.96.
Script 2.2
from scisuit.stats import qnorm

alpha1 = 0.05
alpha2 = 0.01

print(qnorm(alpha1/2), qnorm(1-alpha1/2))
print(qnorm(alpha2/2), qnorm(1-alpha2/2))
-1.95996 1.95996
-2.57583 2.57583
Note that if a 95% confidence interval (CI) yields the interval (0.52, 0.57), it is tempting to say that there is a probability of 0.95 that p is between 0.52 and 0.57. Larsen & Marx (2011) and Peck et al. (2016) warn against this temptation. A close look at Eq. (2.2) reveals that from sample to sample the constructed CI will be different. However, in the long run 95% of the constructed CIs will contain the true p and 5% will not. This is well depicted in the figure (Figure 9.4, pp. 471) presented by Peck et al. (2016).
Note also that a 99% CI will be wider than a 95% CI. However, the higher reliability causes a loss in precision. Therefore, Peck et al. (2016) remark that many investigators consider a 95% CI a reasonable compromise between reliability and precision.
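The long-run interpretation above can be demonstrated with a short simulation. The sketch below is an addition (using only numpy, with an arbitrarily chosen true proportion): it constructs a large number of approximate 95% confidence intervals for a known p and counts how often they contain the true value; the coverage comes out close to 0.95.

import numpy as np

rng = np.random.default_rng(seed=7)

p_true, n, reps = 0.55, 500, 10_000
z = 1.96                                  #z critical value for a 95% CI

X = rng.binomial(n=n, p=p_true, size=reps)
phat = X/n
half = z*np.sqrt(phat*(1 - phat)/n)       #half-width of each approximate CI

covered = (phat - half <= p_true) & (p_true <= phat + half)
print(covered.mean())                     #approximately 0.95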
Confidence intervals and statistical tests are the two most important ideas in the age of modern statistics (Kreyszig et al. 2011). A confidence interval is constructed when we would like to estimate a population parameter. Another type of inference is to assess the evidence provided by data against a
claim about a parameter of the population (Moore et al. 2009). Therefore, after carrying out an
experiment conclusions must be drawn based on the obtained data. The two competing propositions are
called the null hypothesis (H0) and the alternative hypothesis (H1) (Larsen & Marx 2011).
We initially assume that a particular claim about a population (H0) is correct. Then, based on the evidence from the data, we either reject H0 and accept H1 if there is compelling evidence, or we fail to reject H0 (Peck et al. 2016).
An example from Larsen & Marx (2011) would clarify the concepts better: Imagine as an automobile
company you are looking for additives to increase gas mileage. Without the additives, the cars are
known to average 25.0 mpg with a σ=2.4 mpg and with the addition of additives, it was found
(experiment involved 30 cars) that the mileage increased to 26.3 mpg.
Now, in terms of the null and alternative hypotheses, H0 claims that the mean mileage is still 25 mpg (the additive has no effect) whereas H1 claims that it has increased. We know that
if the experiments were carried out with another 30 cars, the result would be different (lower or higher)
than 26.3 mpg. Therefore, “is an increase to 26.3 mpg due to additives or not?”. At this point we
should rephrase our question: “if we sample 30 cars from a population with μ=25.0 mpg and σ=2.4,
what are the chances that we will get 26.3 mpg on average?”. If the chances are high, then the additive
is not working; however, if the chances are low, then it must be due to the additives that the cars are
getting 26.3 mpg.
Let’s evaluate this with a script (note the similarity to Script 2.1):
Script 2.3
from scisuit.stats import rnorm

aver = []
for i in range(10000):
    #sample 30 cars from a population with mu=25.0 mpg and sigma=2.4 mpg
    sample = rnorm(n=30, mean=25, sd=2.4)
    aver.append(sum(sample)/30)
$$P\left(Z \ge \frac{26.50-25.0}{2.4/\sqrt{30}}\right) = 0.0003$$
If, for example, the test statistic yields Z=1.37 and we are carrying out a two-sided test, the p-value would be P(Z≤−1.37 or Z≥1.37), where Z has a standard normal distribution.
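This probability can be computed directly with the cumulative normal distribution function pnorm, which is used later in this text (Example 7.1). The snippet below is an added illustration; it assumes pnorm accepts a single quantile, as in that later usage.

from scisuit.stats import pnorm

z = 1.37
pvalue = 2*(1 - pnorm(z))  #P(Z <= -1.37) + P(Z >= 1.37)
print(pvalue)              #approximately 0.17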
A simpler definition is given by Miller & Miller (2014): "… the lowest level of significance at which the null hypothesis could have been rejected". Let's rephrase this definition: once a level of significance is decided (e.g., α=0.05), if the computed p-value is less than α, then we reject H0. For example, in the gasoline additive example the p-value was computed as 0.0003; if α=0.05, then since p<α, we reject H0 in favor of H1 (i.e., the additive has an effect).
A bakery claims on its packages that its cookies weigh 8 g. It is known that the standard deviation of the 8 g packages of cookies is 0.16 g. As a quality control engineer, you collected 25 packages and found that the average is 8.091 g. Is the production process going alright? (Adapted from Miller & Miller 2014)
Solution:
The test statistic: $z = \dfrac{8.091-8}{0.16/\sqrt{25}} = 2.84$

Since |z| = 2.84 > 1.96, we reject H0 at the 5% level of significance; there is evidence that the filling of the packages is off the 8 g target.
From a population with known mean (μ) and standard deviation (σ), a random sample of size n is taken (generally n≥30) and the sample mean (x̄) is calculated. The test statistic:

$$Z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \tag{3.1}$$
Example 3.1
A filling process is set to fill tubs with 4 g of powder on average. For this filling process it is known that the standard deviation is 1 g. An inspector takes a random sample of 9 tubs and obtains the following data:
Weights = [3.8, 5.4, 4.4, 5.9, 4.5, 4.8, 4.3, 3.8, 4.5]
Solution:
Test statistic: $Z = \dfrac{4.6-4}{1/\sqrt{9}} = 1.8$
Since 1.8 is in the range −1.96 < Z < 1.96, we cannot reject the null hypothesis; therefore the filling process works fine (i.e., there is no evidence to suggest the mean is different than 4 g).
Is it over-filling?
Now we carry out a one-tailed z-test, and therefore the acceptance region is Z < 1.645. Since the test statistic (1.8) is greater than 1.645, we reject the null hypothesis and have evidence that the filling process is over-filling.
Script 3.1
import scisuit.plot as plt
from scisuit.stats import test_z
data = [3.8, 5.4, 4.4, 5.9, 4.5, 4.8, 4.3, 3.8, 4.5]
result = test_z(x=data, sd1=1, mu=4)
print(result)
N=9, mean=4.6, Z=1.799
p-value = 0.072 (two.sided)
Confidence interval (3.95, 5.25)
Since p>0.05, we cannot reject H0.
Script 3.1 requires a minor change to analyze whether the process is over-filling or not: we set the parameter alternative, whose default value is "two.sided", to "greater".
Script 3.2
result = test_z(x=data, sd1=1, mu=4, alternative="greater")
print(result)
p-value = 0.036 (greater)
Confidence interval (4.052, inf)
Since p<0.05, we reject the null hypothesis in favor of alternative hypothesis.
In essence, the two-sample z-test is very similar to the one-sample z-test: we take samples of sizes n1 and n2 from two populations with means (μ1 and μ2) and standard deviations (σ1 and σ2). The test statistic is computed as:

$$Z = \frac{\bar{X}_1-\bar{X}_2-(\mu_1-\mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1}+\dfrac{\sigma_2^2}{n_2}}} \tag{3.2}$$
Example 3.2
A survey has been conducted to see if studying over or under 10 h/week has an effect on overall GPA.
For those who studied less (x) and more (y) than 10 h/week the GPAs were:
x=[2.80, 3.40, 4.00, 3.60, 2.00, 3.00, 3.47, 2.80, 2.60, 2.0]
y = [3.00, 3.00, 2.20, 2.40, 4.00, 2.96, 3.41, 3.27, 3.80, 3.10, 2.50].
respectively. It is known that the standard deviation of GPAs for the whole campus is σ=0.6. Does studying over or under 10 h/week have an effect on GPA? (Adapted from Devore et al. 2021)
Solution:
We have two groups (those studying over and under 10 h/week) from the same population (whole
campus) whose standard deviation is known (σ=0.6).
We will solve this question directly using a Python script and the mathematical computations are left as
an exercise to the reader.
Script 3.3
from scisuit.stats import test_z

x = [2.80, 3.40, 4.00, 3.60, 2.00, 3.00, 3.47, 2.80, 2.60, 2.0]
y = [3.00, 3.00, 2.20, 2.40, 4.00, 2.96, 3.41, 3.27, 3.80, 3.10, 2.50]
mu = 0
sd1, sd2 = 0.6, 0.6

#two-sample call (the y and sd2 parameters are assumed by analogy with Script 3.1)
result = test_z(x=x, y=y, sd1=sd1, sd2=sd2, mu=mu)
print(result)
Then comes the question: what effect does replacing σ with S have on the Z ratio? (Larsen & Marx 2011). In order to answer this question, let's demonstrate the effect with a script:
Script 4.1
import numpy as np
import scisuit.plot as plt
from scisuit.stats import rnorm

N = 4
sigma, mu = 1.0, 0.0 #stdev and mean of population

z, t = [], []
for i in range(1000):
    sample = rnorm(n=N) #sample from the standard normal population
    aver = sum(sample)/N
    s = np.std(sample, ddof=1) #sample standard deviation
    z.append((aver - mu)/(sigma/np.sqrt(N))) #population sigma known
    t.append((aver - mu)/(s/np.sqrt(N)))     #sigma replaced by S

plt.layout(nrows=2, ncols=1)

plt.subplot(row=0, col=0)
plt.hist(data=z, density=True)
plt.title("Population Std Deviation")

plt.subplot(row=1, col=0)
plt.hist(data=t, density=True)
plt.title("Sample Std Deviation")

plt.show()
Fig 4.1: histograms of $\dfrac{\bar{x}-\mu}{\sigma/\sqrt{n}}$ (top) vs $\dfrac{\bar{x}-\mu}{S/\sqrt{n}}$ (bottom)
Note that in Script (4.1), N was intentionally chosen to be a small value (N=4). It is recommended to change N to a greater number, such as 10, 20 or 50, in order to observe the effect of large samples.
Let x̄ and s be the mean and standard deviation of a random sample from a normally distributed
population. Then,
$$t = \frac{\bar{x}-\mu}{s/\sqrt{n}} \tag{4.1}$$
has a t distribution with df=n-1. Here s is the sample’s standard deviation and computed as:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2 \tag{4.2}$$
Example 4.1
In 2006, a report revealed that UK subscribers with 3G phones listened to full-track music for 8.3 hours/month on average. The data for a random sample of size 8 of US subscribers is x=[5, 6, 0, 4, 11, 9, 2, 3]. Is there a difference between US and UK subscribers? (Adapted from Moore et al. 2009)
Solution:
Script 4.2
from math import sqrt
from statistics import stdev
from scisuit.stats import qt

x = [5, 6, 0, 4, 11, 9, 2, 3]
n = len(x)
df = n-1 #degrees of freedom
aver = sum(x)/n
stderr = stdev(x)/sqrt(n) #standard error

t = (aver - 8.3)/stderr #test statistic
print(t, qt(1-0.05/2, df)) #compare with the critical value (R-style qt(p, df) assumed)
Script 4.3
from scisuit.stats import test_t
x=[5, 6, 0, 4, 11, 9, 2, 3]
result = test_t(x=x, mu=8.3)
print(result)
One-sample t-test for two.sided
N=8, mean=5.0
SE=1.282, t=-2.575
p-value =0.037
Confidence interval: (1.97, 8.03)
Since p<0.05 we reject H0 and claim that there is a statistically significant difference between US and UK subscribers. [If in the test_t function the alternative had been set to "less" instead of "two.sided", then p=0.018; therefore we would reject H0 in favor of H1, i.e., US subscribers indeed listen less than UK subscribers.]
4.2. Two-Sample t-Test
4.2.1. Equal Variances
Let X1, …, Xn and Y1, …, Ym be random samples from two normal populations with equal variances. The pooled sample variance and the test statistic are:

$$S_P^2 = \frac{\sum_{i=1}^{n}(X_i-\bar{X})^2 + \sum_{i=1}^{m}(Y_i-\bar{Y})^2}{n+m-2} \tag{4.3}$$

$$T_{n+m-2} = \frac{\bar{X}-\bar{Y}-(\mu_X-\mu_Y)}{S_p\sqrt{\dfrac{1}{n}+\dfrac{1}{m}}} \tag{4.4}$$
Example 4.2
Student surveys are important in academia. An academic who scored low on a student survey joined workshops to improve "enthusiasm" in teaching. X and Y are survey scores from his fall and spring semester classes, which he selected to have the same demographics.
X = [3, 1, 2, 1, 3, 2, 4, 2, 1]
Y = [5, 4, 3, 4, 5, 4, 4, 5, 4]
Is there a difference in the scores of the two semesters? (Adapted from Larsen & Marx 2011)
Solution:
1. The variances of the populations are not known, therefore a z-test cannot be applied.
2. It is reasonable to assume equal variances since X and Y have the same demographics.
Script 4.4
from scisuit.stats import test_t
x = [3, 1, 2, 1, 3, 2, 4, 2, 1]
y = [5, 4, 3, 4, 5, 4, 4, 5, 4]
result = test_t(x=x, y=y, varequal=True)
print(result)
Two-sample t-test assuming equal variances
n1=9, n2=9, df=16
s1=1.054, s2=0.667
Pooled std = 0.882
t = -5.07
p-value = 0.0001 (two.sided)
Confidence interval: (-2.992, -1.230)
Since p<0.05, the difference between the scores of the fall and spring semesters is statistically significant.
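To connect this output with Eqs. (4.3) and (4.4), the pooled variance and the t statistic can be recomputed by hand. The sketch below is an added cross-check using only numpy:

import numpy as np

x = [3, 1, 2, 1, 3, 2, 4, 2, 1]
y = [5, 4, 3, 4, 5, 4, 4, 5, 4]
n, m = len(x), len(y)

#pooled variance, Eq. (4.3)
sp2 = ((n-1)*np.var(x, ddof=1) + (m-1)*np.var(y, ddof=1)) / (n + m - 2)
sp = np.sqrt(sp2)

#test statistic, Eq. (4.4), with mu_X - mu_Y = 0 under H0
t = (np.mean(x) - np.mean(y)) / (sp*np.sqrt(1/n + 1/m))

print(sp, t)  #approximately 0.882 and -5.07, matching the test_t output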
4.2.2. Unequal Variances
Similar to section 4.2.1, we are drawing random samples of size n1 and n2 from normal distributions
with means μX and μY, but with standard deviations σX and σY, respectively.
$$S_1^2 = \frac{\sum_{i=1}^{n_1}(X_i-\bar{X})^2}{n_1-1} \quad\text{and}\quad S_2^2 = \frac{\sum_{i=1}^{n_2}(Y_i-\bar{Y})^2}{n_2-1} \tag{4.5}$$

$$t = \frac{\bar{X}-\bar{Y}-(\mu_X-\mu_Y)}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}} \tag{4.6}$$
In 1938 Welch5 showed that t is approximately distributed as a Student’s t random variable with df:
$$df = \frac{\left(\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}\right)^2}{\dfrac{s_1^4}{n_1^2(n_1-1)}+\dfrac{s_2^4}{n_2^2(n_2-1)}} \tag{4.7}$$
Example 4.3
A study by Larson and Morris6 (2008) surveyed the annual salaries of men and women working as purchasing managers subscribed to Purchasing magazine. The salaries (in thousands of US dollars) are:
Men = [81, 69, 81, 76, 76, 74, 69, 76, 79, 65]
Women = [78, 60, 67, 61, 62, 73, 71, 58, 68, 48]
Is there a difference in salaries between men and women? (Adapted from Peck et al. 2016)
5 https://www.jstor.org/stable/2332010
6 Larson PD & Morris M (2008). Sex and Salary: A Survey of Purchasing and Supply Professionals, Journal of
Purchasing and Supply Management, 112–124.
Solution:
1. A z-test cannot be applied because the variances of the populations are not known.
2. Although the samples were selected from the subscribers of Purchasing magazine, Larson and Morris (2008) considered two populations of interest, i.e. male and female purchasing managers. Therefore, the assumption of equal variances should not be applied.
Script 4.5
from scisuit.stats import test_t
Men = [81, 69, 81, 76, 76, 74, 69, 76, 79, 65]
Women = [78, 60, 67, 61, 62, 73, 71, 58, 68, 48]
result = test_t(x=Women, y=Men, varequal=False)
print(result)
Two-sample t-test assuming unequal variances
n1=10, n2=10, df=15
s1=8.617, s2=5.399
t = -3.11
p-value = 0.007 (two.sided)
Confidence interval: (-16.7, -3.1)
Since p<0.05, there is a statistically significant difference between the salaries of the two groups.
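As an added check of Eq. (4.7), the Welch degrees of freedom can be computed directly from the salary data using only numpy; the value is reported as the truncated df in the output above.

import numpy as np

men   = [81, 69, 81, 76, 76, 74, 69, 76, 79, 65]
women = [78, 60, 67, 61, 62, 73, 71, 58, 68, 48]

n1, n2 = len(men), len(women)
v1, v2 = np.var(men, ddof=1), np.var(women, ddof=1)

num = (v1/n1 + v2/n2)**2
den = v1**2/(n1**2*(n1-1)) + v2**2/(n2**2*(n2-1))

print(num/den)  #approximately 15.1, reported as df=15 above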
In essence a paired t-test is a two-sample t-test as there are two samples. However, the two samples are not independent, as each observation in the first sample is paired in a meaningful way with a particular observation in the second sample (Larsen & Marx 2011; Peck et al. 2016).
The equation to compute the test statistic is similar to the one-sample t-test, Eq. (4.1):
$$t = \frac{\bar{x}-\mu}{s/\sqrt{n}} \tag{4.8}$$
where x̄ and s are mean and standard deviation of the sample differences, respectively. The degrees of
freedom is: df=n-1.
Example 4.4
In a study, 6th grade students who had not previously played chess participated in a program in which they took chess lessons and played chess daily for 9 months. The data below shows their memory test scores before and after taking the lessons:
Pre = [510, 610, 640, 675, 600, 550, 610, 625, 450, 720, 575, 675]
Post = [850, 790, 850, 775, 700, 775, 700, 850, 690, 775, 540, 680]
Is there evidence that playing chess increases the memory scores? (Adapted from Peck et al. 2016).
Solution:
Note that the pre- and post-test scores are not independent since they were obtained from the same subjects; therefore a paired t-test is appropriate.
Script 4.6
from scisuit.stats import test_t
Pre = [510, 610, 640, 675, 600, 550, 610, 625, 450, 720, 575, 675]
Post = [850, 790, 850, 775, 700, 775, 700, 850, 690, 775, 540, 680]
result = test_t(x=Post, y=Pre, paired=True)
print(result)
Paired t-test for two.sided
N=12, mean1=747.9, mean2=603.3, mean diff=144.6
t =4.564
p-value =0.0008
Confidence interval: (74.9, 214.3)
Since p<0.05, there is statistical evidence that playing chess indeed made a difference in increasing the memory scores.
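The result can also be reproduced directly from Eq. (4.8) by working with the individual differences. The snippet below is an added verification using only numpy:

import numpy as np

Pre  = [510, 610, 640, 675, 600, 550, 610, 625, 450, 720, 575, 675]
Post = [850, 790, 850, 775, 700, 775, 700, 850, 690, 775, 540, 680]

d = np.array(Post) - np.array(Pre)   #paired differences
n = len(d)

t = d.mean()/(d.std(ddof=1)/np.sqrt(n))  #Eq. (4.8) with mu = 0
print(d.mean(), t)                       #approximately 144.6 and 4.56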
If the parameter alternative was set to "less", then p=0.99; therefore we would fail to reject H0, i.e., there is no evidence that Post < Pre. If, on the other hand, alternative was set to "greater", then p=0.0004; therefore we would reject H0 and accept H1 (Post > Pre).
5. F-Test for Population Variances
Assume that a metal rod production facility uses two machines on the production line. Each machine produces rods with mean thicknesses μX and μY, which are not significantly different. However, if the variabilities are significantly different, then some of the produced rods might become unacceptable as they will be outside the engineering specifications.
In Section (4.2), it was shown that there are two cases for two-sample t-tests: whether the variances are equal or not. To be able to choose the right procedure, Larsen & Marx (2011) recommend that an F test be used prior to testing for μX=μY.
Let’s draw random samples from populations with normal distribution. Let X1, … , Xm be a random
sample from a population with standard deviation σ1 and let Y1, …, Yn be another random sample from a
population with standard deviation σ2. Let S1 and S2 be the sample standard deviations. Then the test
statistic is:
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \tag{5.1}$$
Example 5.1
α-waves produced by the brain have a characteristic frequency from 8 to 13 Hz. The subjects were 20 inmates in a Canadian prison who were randomly split into two groups: one group was placed in solitary confinement; the other group was allowed to remain in their own cells. Seven days later, α-wave frequencies were measured for all twenty subjects, as shown below:
non-confined = [10.7, 10.7, 10.4, 10.9, 10.5, 10.3, 9.6, 11.1, 11.2, 10.4]
confined = [9.6, 10.4, 9.7, 10.3, 9.2, 9.3, 9.9, 9.5, 9, 10.9]
Using a box-whisker plot, let’s first visualize the data as shown in Fig. (5.1).
Script 5.1
from scisuit.stats import test_f, test_f_Result
nonconfined = [10.7, 10.7, 10.4, 10.9, 10.5, 10.3, 9.6, 11.1, 11.2, 10.4]
confined = [9.6, 10.4, 9.7, 10.3, 9.2, 9.3, 9.9, 9.5, 9, 10.9]
result = test_f(x=confined, y=nonconfined)
print(result)
F test for two.sided
df1=9, df2=9, var1=0.357, var2=0.211
F=1.696
p-value =0.443
Confidence interval: (0.42, 6.83)
Since p>0.05, we cannot reject H0 (σ1=σ2). Therefore, there is no statistically significant difference
between the variances of two groups.
6. Analysis of Variance (ANOVA)
In Section (4.2) we have seen that when exactly two means need to be compared, we can use the two-sample t-test. The methodology for comparing several means is called analysis of variance (ANOVA). When there is only a single factor with multiple levels, e.g. the color of strawberries subjected to different power levels of infrared radiation, we can use one-way ANOVA. However, if besides infrared power we are also interested in different exposure times, then two-way ANOVA needs to be employed.
There are 3 essential assumptions for the test to be accurate (Anon 2024)7:
1. The responses for each factor level follow a normal distribution,
2. These distributions have the same variance,
3. The observations are independent.
A similarity comparison of the two-sample t-test and ANOVA is given by Moore et al. (2009). Suppose we are analyzing whether the means of two different groups of the same size (n) are different. Then we would employ a two-sample t-test with equal variances (due to assumption #2):
$$t = \frac{\bar{X}-\bar{Y}}{S_p\sqrt{\dfrac{1}{n}+\dfrac{1}{n}}} = \frac{(\bar{X}-\bar{Y})\sqrt{n/2}}{S_p} \tag{6.1}$$

$$t^2 = \frac{n(\bar{X}-\bar{Y})^2}{2S_p^2} \tag{6.2}$$
7 https://online.stat.psu.edu/stat500/lesson/10/10.2/10.2.1
If we had used ANOVA, the F-statistic would have been exactly equal to t2 computed using Eq. (6.2). A careful inspection of Eq. (6.2) reveals a couple of things:
1. The numerator measures the variation between the groups (known as fit).
2. The denominator measures the variation within groups (known as residual), see Eq. (4.3).
$$H_0: \mu_1=\mu_2=\dots=\mu_k \qquad\qquad H_a: \text{at least two of the } \mu\text{'s are different} \tag{6.3}$$
Therefore the basic idea is: to test H0, we simply compare the variation between the means of the groups with the variation within the groups. A graphical example adapted from Peck et al. (2016) can cement our understanding:
Let k be the number of populations being compared [in Fig. (6.1) k=3] and n1, n2, …, nk be the sample sizes. The quantities needed for the F statistic are computed as follows:

1. Total number of observations: N = n1 + n2 + … + nk
2. Grand total T (the sum of all N observations) and grand mean: x̄ = T/N
3. Treatment sum of squares: $SS_{TR} = \sum_{i=1}^{k} n_i(\bar{x}_i-\bar{x})^2$, where df = k−1
4. Error sum of squares: $SS_{Error} = \sum_{i=1}^{k} (n_i-1)s_i^2$, where df = N−k
5. Mean squares:
$$MS_{TR} = \frac{SS_{TR}}{k-1} \quad\text{and}\quad MS_{Error} = \frac{SS_{Error}}{N-k}$$
6. Test statistic:
$$F = \frac{MS_{TR}}{MS_{Error}} \tag{6.4}$$
Before proceeding with an example on ANOVA, let's further investigate Eq. (6.4). Remember that the F distribution is the ratio of two independent chi-square random variables, each divided by its degrees of freedom:

$$F = \frac{U/m}{V/n} \tag{6.5}$$
where U and V are independent chi-square random variables with m and n degrees of freedom.
The following theorem establishes the link between Eqs. (6.4 & 6.5):
Theorem: Let Y1, Y2, …, Yn be a random sample from a normal distribution with mean μ and variance σ2. Then,
$$\frac{(n-1)S^2}{\sigma^2} = \frac{1}{\sigma^2}\sum_{i=1}^{n}(Y_i-\bar{Y})^2 \tag{6.6}$$
has a chi-square distribution with n-1 degrees of freedom. A proof of Eq. (6.6) is given by Larsen &
Marx (2011) and is beyond the scope of this study.
Using Eq. (6.6), it is now easy to see that when the sum of squares of treatment (or error) is divided by σ2, it will have a chi-square distribution. Therefore Eq. (6.4) is indeed equivalent to Eq. (6.5) and gives an F distribution with df1=k-1 and df2=N-k.
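The theorem can be illustrated with a quick simulation in the same spirit as Script (4.1). The sketch below is an addition (numpy only): it draws repeated samples from a normal population and checks that (n−1)S²/σ² has mean n−1 and variance 2(n−1), as a chi-square random variable with n−1 degrees of freedom should.

import numpy as np

rng = np.random.default_rng(seed=3)

n, sigma = 10, 2.0
vals = []
for _ in range(20_000):
    sample = rng.normal(loc=0.0, scale=sigma, size=n)
    s2 = np.var(sample, ddof=1)       #sample variance
    vals.append((n - 1)*s2/sigma**2)  #the quantity in Eq. (6.6)

vals = np.array(vals)
print(vals.mean(), vals.var())  #close to n-1 = 9 and 2(n-1) = 18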
Example 6.1
In most integrated circuit manufacturing, a plasma etching process is widely used to remove unwanted material from wafers which are coated with a layer of material, such as silicon dioxide. A process engineer is interested in investigating the relationship between the radio frequency (RF) power and the etch rate. The etch rate data (in Å/min) from a plasma etching experiment is given below:
Does the RF power affect etching rate? (Adapted from Montgomery 2012)
Solution:
Before attempting any numerical solution, let's first visualize the data using a box-whisker plot generated with a Python script:
Script 6.1
import scisuit.plot as plt
Script 6.2
import numpy as np
from scisuit.stats import qf

#create a 2D array
data = np.array([rf_160, rf_180, rf_200, rf_220]) #see Script (6.1)

grandmean = np.mean(data)
k, N = data.shape[0], data.size #number of groups and total number of observations

ss_tr, ss_error = 0, 0
for dt in data:
    n = len(dt) #size of each sample
    ss_tr += n*(np.mean(dt)-grandmean)**2
    ss_error += (n-1)*np.var(dt, ddof=1) #note ddof=1, the sample variance

Fvalue = (ss_tr/(k-1)) / (ss_error/(N-k))
Fcritical = qf(0.95, k-1, N-k) #critical value at alpha=0.05 (R-style qf(p, df1, df2) assumed)
print(f"F={Fvalue}, F-critical={Fcritical}")
F=66.8, F-critical=3.24
Since the computed F-value is considerably greater than F-critical, we can safely reject H0. Using scisuit's built-in aov function:
Script 6.3
from scisuit.stats import aov

aovresult = aov(rf_160, rf_180, rf_200, rf_220)
print(aovresult)
One-Way ANOVA Results
Source df SS MS F p-value
Treatment 3 66870.55 22290.18 66.80 2.8829e-09
Error 16 5339.20 333.70
Total 19 72209.75
Since p<0.05, we can reject H0 in favor of H1.
Now, had we not plotted Fig. (6.2), we would not be able to see why H0 has been rejected. As a matter of fact, among other reasons, due to overlaps in whiskers and boxes or due to outliers, a box-whisker plot does not always clearly show whether H0 will be rejected. Therefore, we need to use post hoc tests along with ANOVA. There are several tests8 for this purpose; here we will be using Tukey's test9. Continuing from Script (6.3):
In one-way ANOVA, the populations were classified according to a single factor; whereas in two-way
ANOVA, as the name implies, there are two factors, each with different number of levels. For example,
a baker might choose 3 different baking temperatures (150, 175, 200°C) and 2 different baking times
(45 and 60 min) to optimize a cake recipe. In this example we have two factors (baking time and
temperature) each with different number of levels (Devore et al. 2021; Moore et al. 2009).
Moore et al. (2009) lists the following advantages for using two-way ANOVA:
1. It is more efficient (i.e., less costly) to study two factors rather than each separately,
2. The variation in residuals can be decreased by the inclusion of a second factor,
3. Interactions between factors can be explored.
8 https://en.wikipedia.org/wiki/Post_hoc_analysis
9 https://en.wikipedia.org/wiki/Tukey%27s_range_test
In order to analyze a data set with two-way ANOVA the following assumptions must be satisfied (Field
2024; Moore 2012):
Let’s start from #5 and take a look at what it means balanced or unbalanced. In ANOVA or design of
experiments, a balanced design has equal number of observations for all possible combinations of
factor levels. For example10, assume that the independent variables are A, B, C with 2 levels. Table
(6.1) shows a balanced design whereas Table (6.2) shows an unbalanced design of the same factors
(since the combination [1, 0, 0] is missing).
Table 6.1: Balanced design
A B C
0 0 0
0 0 1
0 1 0
0 1 1
1 0 0
1 0 1
1 1 0
1 1 1

Table 6.2: Unbalanced design (the combination [1, 0, 0] is missing)
A B C
0 0 0
0 1 0
0 1 0
0 0 1
0 1 0
1 0 1
1 1 0
1 1 1
Note that if Table (6.1) was re-designed such that each row displayed a factor level (0 or 1) and each column displayed a factor (A, B or C), then there would be no empty cells in that table. If the data includes multiple observations for each treatment, the design includes replication.
10 https://support.minitab.com/en-us/minitab/help-and-how-to/statistical-modeling/anova/supporting-topics/anova-
models/balanced-and-unbalanced-designs/
Example 6.2
A study by Moore and Eddleman11 (1991) investigated the removal of marks made by erasable pens on cotton and cotton/polyester fabrics. The following data compare three different pens and four different wash treatments with respect to their ability to remove marks. The response variable is based on the color change: the lower the value, the more marks were removed.
Table 6.3: Effect of washing treatment and different pen brands on color change

            Wash 1   Wash 2   Wash 3   Wash 4
Brand 1     0.97     0.48     0.48     0.46
Brand 2     0.77     0.14     0.22     0.25
Brand 3     0.67     0.39     0.57     0.19
Is there any difference in color change due either to different brands of pen or to the different washing
treatments? (Adapted from Devore et al. 2021)
Solution:
The data satisfies the requirements to be analyzed with two-factor ANOVA, since:
1. There are two independent factors (pen brands and washing treatment),
2. The independent variables consist of discrete levels (e.g., brand #1, #2 and #3)
3. There are no empty cells (data is balanced),
4. There are no replicates (interaction cannot be explored),
5. Observations are independent.
Once a table similar to Table (6.3) is prepared, finding the F-values for both factors is fairly straightforward if spreadsheet software is used.
11 Moore MA, Eddleman VL (1991). An Assessment of the Effects of Treatment, Time, and Heat on the Removal of
Erasable Pen Marks from Cotton and Cotton/Polyester Blend Fabrics. J. Test. Eval.. 19(5): 394-397
Averages of the wash treatments (column means): μtreatments = [0.803, 0.337, 0.423, 0.300], with grand mean T̄ ≈ 0.466.

$$SS_{treatment} = \sum_{i=1}^{4}\left(\mu_{treatments}[i]-\bar{T}\right)^2\times 3 = 0.48 \quad\text{and}\quad MS_{treatment} = \frac{SS_{treatment}}{df} = \frac{0.48}{4-1} = 0.16$$

The brand sum of squares is computed analogously from the brand (row) averages, giving MSbrand = 0.06. The error sum of squares is then:

$$SS_{Error} = \sum\sum\left(y_{ij}-\bar{T}\right)^2 - SS_{treatment} - SS_{brand} = 0.087 \quad\text{and}\quad MS_{Error} = \frac{SS_{Error}}{df} = \frac{0.087}{(3-1)\times(4-1)} = 0.014$$

$$F_{treatment} = \frac{MS_{treatment}}{MS_{Error}} = \frac{0.16}{0.014} = 11.05$$

$$F_{brand} = \frac{MS_{brand}}{MS_{Error}} = \frac{0.06}{0.014} = 4.15$$
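The same computation can be carried out compactly with numpy. The sketch below is an addition that reproduces the sums of squares and F values from Table (6.3); small differences from the rounded values above are due to intermediate rounding in the hand calculation.

import numpy as np

#color-change data from Table (6.3): rows = pen brands, columns = wash treatments
data = np.array([[0.97, 0.48, 0.48, 0.46],
                 [0.77, 0.14, 0.22, 0.25],
                 [0.67, 0.39, 0.57, 0.19]])

a, b = data.shape #a = 3 brands, b = 4 treatments
grand = data.mean()

ss_brand = b*np.sum((data.mean(axis=1) - grand)**2)
ss_treatment = a*np.sum((data.mean(axis=0) - grand)**2)
ss_error = np.sum((data - grand)**2) - ss_brand - ss_treatment

ms_treatment = ss_treatment/(b - 1)
ms_brand = ss_brand/(a - 1)
ms_error = ss_error/((a - 1)*(b - 1))

print(ms_treatment/ms_error, ms_brand/ms_error) #F values for treatment and brand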
Although the solution is straightforward, it is still cumbersome and error-prone; therefore, it is best to
use functions dedicated for this purpose:
Script 6.4
from scisuit.stats import aov2

brand = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
treatment = [1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4]
removal = [0.97, 0.48, 0.48, 0.46, 0.77, 0.14, 0.22, 0.25, 0.67, 0.39, 0.57, 0.19]

result = aov2(y=removal, x1=brand, x2=treatment) #parameter names assumed; consult the scisuit documentation
print(result)
Example 6.3
A process engineer is testing the effect of catalyst type (A, B, C) and reaction temperature (high,
medium, low) on the yield of a chemical reaction. She designs an experiment with 3 replicates for each
combination as shown in the following data. Do both catalyst type and reaction temperature have an
effect on the reaction yield?
Catalyst = [A, A, A, A, A, A, A, A, A, B, B, B, B, B, B, B, B, B, C, C, C, C, C, C, C, C, C]
Temperature = [L, L, L, M, M, M, H, H, H, L, L, L, M, M, M, H, H, H, L, L, L, M, M, M, H, H, H]
%Yield = [85, 88, 90, 80, 82, 84, 75, 78, 77, 90, 92, 91, 85, 87, 89, 80, 83, 82, 88, 90, 91, 84, 86, 85, 79, 80, 81]
Solution:
If one wishes to use a spreadsheet for the solution, a table of averages needs to be prepared as shown
below:
Average yield (%) for each catalyst-temperature combination:

Catalyst     L        M        H
A            87.667   82       76.667
B            91       87       81.667
C            89.667   85       80
After preparing the above-shown table, a methodology similar to Example (6.2) can be followed.
Let's solve the question directly by using scisuit's built-in function:
Script 6.5
from scisuit.stats import aov2
Catalyst = ["A", "A", "A", "A", "A", "A", "A", "A", "A",
"B", "B", "B", "B", "B", "B", "B", "B", "B",
"C", "C", "C", "C", "C", "C", "C", "C", "C"]
Temperature = ["L", "L", "L", "M", "M", "M", "H", "H", "H",
"L", "L", "L", "M", "M", "M", "H", "H", "H",
"L", "L", "L", "M", "M", "M", "H", "H", "H"]
Yield = [85, 88, 90, 80, 82, 84, 75, 78, 77, 90, 92, 91,
         85, 87, 89, 80, 83, 82, 88, 90, 91, 84, 86, 85, 79, 80, 81]

result = aov2(y=Yield, x1=Catalyst, x2=Temperature) #parameter names assumed; consult the scisuit documentation
print(result)
7. Curve Fitting
When a function is to be fitted to data, two general approaches can be distinguished depending on the amount of error associated with the data:
1. Regression: When the data shows a significant degree of error or "noise" (generally originating from experimental measurements), we want a curve that represents the general trend of the data.
2. Interpolation: When the noise in the data can be ignored (the data generally originates from tables), we would like curve(s) that pass directly through each of the data points.
In terms of mathematical expressions, interpolation (Eq. 7.1) and regression (Eq. 7.2) can be shown as
follows:
Y =f ( X ) (7.1)
Y =f ( X )+ϵ (7.2)
Peck et al. (2016) used the terms deterministic and probabilistic relationships for Eq. (7.1) and Eq.
(7.2), respectively. Therefore a probabilistic relationship is actually a deterministic relationship with
noise (random deviations).
To further our understanding of Eq. (7.2), a simple example from Larsen & Marx (2011) can be helpful: Consider a tooling process where the initial weight of the sample determines the finished weight of the steel rods. For example, in a simple experiment, if the initial weight was measured as 2.745 g then the finished weight was measured as 2.080 g. However, even if the initial weight is controlled and is exactly 2.745 g, in reality the finished weight would fluctuate around 2.080 g, and therefore with each x (independent variable) there will be a range of possible y values (dependent variable), which is exactly what Eq. (7.2) tells us.
7.1. Simple Linear Regression
When there is only a single explanatory (independent) variable, the model is referred to as “simple”
linear regression. Therefore, Eq. (7.2) can be expressed as:
Y =β 0 + β 1 x +ϵ (7.3)
where, regardless of the x value, the random variable ε is assumed to follow a N(0, σ) distribution. Taking the expected value of Eq. (7.3) at a fixed value x = x* gives the population regression line:

$$\mu_{Y|x^*} = E(Y|x^*) = \beta_0 + \beta_1 x^* \tag{7.4}$$

and, by the same assumption on ε, the variance of Y at x = x* is constant:

$$Var(Y|x^*) = \sigma^2 \tag{7.5}$$

where the notation Y|x* should be read as the value of Y when x=x*; i.e., Eq. (7.4) is the mean value of Y when x=x*. Note also that Eq. (7.4) tells us something important: the population regression line is the line of mean values of Y.
The following assumptions are made for a linear model (Larsen & Marx, 2011):
1. fY|x(y) is a normal probability density function for all x (i.e., for a known x value, there is a probability density function associated with the y values),
2. The standard deviation σ of fY|x(y) is the same for all x,
3. For all x-values, the distributions associated with fY|x(y) are independent.
Example 7.1
Suppose that the relationship between applied stress (x) and time to fracture (y) is given by the simple linear regression model with β0=65, β1=-1.2, and σ=8. What is the probability of getting a fracture time greater than 50 when the applied stress is 20? (Adapted from Devore et al. 2021)
Solution:
The mean value of Y at x=20 is: y = 65 − 1.2x = 65 − 1.2×20 = 41
Note that if this was a purely deterministic relationship, then whenever the stress value was 20, the fracture time would always have been equal to 41. However, since Eq. (7.2) tells us that random deviations are involved, this cannot be the case. We already know that the random deviations, namely ε, follow a normal distribution. Therefore, it becomes straightforward to compute the probability:
$$P\left(Z > \frac{50-41}{8}\right) = P(Z > 1.125) = 1 - \text{pnorm}(1.125) = 0.13$$
■
In Example (7.1), the coefficients of the regression line, namely β0 and β1, were given. However, in practice we need to estimate these coefficients. It should be noted that there are two commonly used12 methods for estimating the regression coefficients (please note that we use the word estimate): i) least squares estimation and ii) maximum likelihood estimation.
12 https://support.minitab.com/en-us/minitab/help-and-how-to/statistical-modeling/reliability/supporting-topics/estimation-
methods/least-squares-and-maximum-likelihood-estimation-methods/
7.1.1. Least Squares Estimation
The residual sum of squares (RSS), also known as the sum of squares of error (SSE), is:

$$RSS = \sum_{i=1}^{n} e_i^2 = e_1^2 + e_2^2 + \dots + e_n^2 \tag{7.6}$$
If the coefficients of the best line passing through the data points are β0 and β1, then:

$$L = RSS = \sum_{i=1}^{n}\left(y_i - \beta_0 - \beta_1 x_i\right)^2 \tag{7.7}$$
Setting the partial derivatives of L with respect to β0 and β1 to zero,

$$\frac{\partial L}{\partial \beta_0} = -2\sum_{i=1}^{n}\left(y_i-\beta_0-\beta_1 x_i\right) = 0 \qquad\qquad \frac{\partial L}{\partial \beta_1} = -2\sum_{i=1}^{n}\left(y_i-\beta_0-\beta_1 x_i\right)x_i = 0$$

dropping the constant −2 from both equations and simply rearranging the terms yields the normal equations:

$$\sum y_i = n\beta_0 + \beta_1\sum x_i$$

$$\sum x_i y_i = \beta_0\sum x_i + \beta_1\sum x_i^2$$
We have two equations and two unknowns, therefore it is possible to solve this system of equations.
Here, one can use the elimination method; however, Cramer’s rule provides a direct solution. Let’s
solve for β1 and leave β0 as an exercise:
$$\hat{\beta}_1 = \frac{\begin{vmatrix} n & \sum y_i \\ \sum x_i & \sum x_i y_i \end{vmatrix}}{\begin{vmatrix} n & \sum x_i \\ \sum x_i & \sum x_i^2 \end{vmatrix}}$$

If one takes the determinants in the numerator and denominator, then:

$$\hat{\beta}_1 = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{n\sum x_i^2 - \left(\sum x_i\right)^2} \tag{7.8}$$
$\hat{\beta}_1$ can be further simplified if the notations Sxx, Syy and Sxy are defined as:

$$S_{xx} = \sum\left(x_i-\bar{x}\right)^2 = \sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2$$

$$S_{yy} = \sum\left(y_i-\bar{y}\right)^2 = \sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2$$

$$S_{xy} = \sum\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right) = \sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right)$$

$$\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}} \tag{7.9}$$

The intercept then follows from the first normal equation:

$$\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\bar{x} \tag{7.10}$$
The error variance σ2 is estimated from the residuals as:

$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i-\hat{Y}_i\right)^2 \tag{7.11}$$
Example 7.2
Suppose you have been tasked with finding the probability of heads (H) and tails (T) for an unknown coin. You flipped the coin 3 times and the sequence is HTH. What is the probability, p? (Adapted from Larsen & Marx)
Solution:
The flips are independent; therefore, based on the probability model, the likelihood of the observed sequence HTH is:

$$p_X(k) = p\cdot(1-p)\cdot p = p^2(1-p)$$
Using calculus (setting the derivative 2p − 3p2 equal to zero), it can easily be shown that the value that maximizes the probability model is p=2/3. ■
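The maximization can also be checked numerically. The small sketch below is an addition (numpy only) that evaluates the likelihood p²(1−p) over a grid of p values and locates its maximum, which sits at p = 2/3.

import numpy as np

p = np.linspace(0.001, 0.999, 9999)
likelihood = p**2*(1 - p)        #likelihood of the sequence HTH

print(p[np.argmax(likelihood)])  #approximately 0.667 = 2/3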
Now, instead of the sequence HTH (Example 7.2) we have data pairs (x1, y1), (x2, y2), … , (xn, yn)
obtained from a random experiment. Furthermore, it is known that the yi’s are normally distributed with
mean β0+β1xi and variance σ2 (Eqs. 7.4 & 7.5).
Recall the normal probability density function:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \qquad -\infty < x < \infty \tag{7.12}$$
Replacing x and μ in Eq. (7.12) with yi and Eq. (7.4), respectively, yields the probability model for a single data pair:

$$f(y_i) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{y_i-\beta_0-\beta_1 x_i}{\sigma}\right)^2} \tag{7.13}$$
For n data pairs, the maximum likelihood function is:

$$L = \prod_{i=1}^{n}\frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{y_i-\beta_0-\beta_1 x_i}{\sigma}\right)^2} \tag{7.14}$$
In order to find the MLEs of β0 and β1, partial derivatives with respect to β0 and β1 must be taken. However, Eq. (7.14) is not easy to work with as is. Therefore, as suggested by Larsen and Marx (2011), taking the logarithm makes it more convenient to work with:
$$-2\ln L = n\ln(2\pi) + n\ln(\sigma^2) + \frac{1}{\sigma^2}\sum_{i=1}^{n}\left(y_i-\beta_0-\beta_1 x_i\right)^2 \tag{7.15}$$
Taking the partial derivatives of Eq. (7.15) with respect to β0 and β1 and solving the resulting set of equations in a manner similar to that shown in section (7.1.1) will yield Eqs. (7.9 & 7.10).
2. $\hat{\beta}_0$ and $\hat{\beta}_1$ are unbiased; therefore, $E(\hat{\beta}_0)=\beta_0$ and $E(\hat{\beta}_1)=\beta_1$

3. $Var(\hat{\beta}_1) = \dfrac{\sigma^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}$

4. $Var(\hat{\beta}_0) = \dfrac{\sigma^2\sum_{i=1}^{n}x_i^2}{n\sum_{i=1}^{n}(x_i-\bar{x})^2}$
Proof of #2:
In section (2.1.1), it was mentioned that to be an unbiased estimator, E(Θ) = θ must be satisfied. In the case of $\hat{\beta}_1$, we need to show that $E(\hat{\beta}_1)=\beta_1$. If Eq. (7.8) is divided by n, the following equation is obtained:
$$\hat{\beta}_1 = \frac{\sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right)}{\sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2} \tag{I}$$

$$\hat{\beta}_1 = \frac{\sum x_i y_i - \bar{x}\sum y_i}{\sum x_i^2 - n\bar{x}^2} \tag{II}$$

Rearranging the terms in the numerator:

$$\hat{\beta}_1 = \frac{\sum y_i\left(x_i-\bar{x}\right)}{\sum x_i^2 - n\bar{x}^2} \tag{III}$$

Note that due to the assumptions of the linear model, in Eq. (III) all terms except yi can be treated as constants. Therefore, replacing the expected value of yi with Eq. (7.4) gives:

$$E(\hat{\beta}_1) = \frac{\sum\left(\beta_0+\beta_1 x_i\right)\left(x_i-\bar{x}\right)}{\sum x_i^2 - n\bar{x}^2} \tag{IV}$$

Expanding the terms in the numerator:

$$E(\hat{\beta}_1) = \frac{\beta_0\sum\left(x_i-\bar{x}\right) + \beta_1\sum\left(x_i-\bar{x}\right)x_i}{\sum x_i^2 - n\bar{x}^2} \tag{V}$$
Noting that the first term in the numerator equals 0 and that the remaining terms in the numerator (except β1) equal the denominator, the proof is completed.
Example 7.3
It seems logical that riskier investments might offer higher returns. A study by Statman et al. (2008)13
explored this by conducting an experiment. One group of investors rated the risk (x) of a company’s
stock on a scale from 1 to 10, while a different group rated the expected return (y) on the same scale.
This was done for 210 companies, and the average risk and return scores were calculated for each. Data
for a sample of ten companies, ordered by risk level, is given below:
x = [4.3, 4.6, 5.2, 5.3, 5.5, 5.7, 6.1, 6.3, 6.8, 7.5]
y = [7.7, 5.2, 7.9, 5.8, 7.2, 7, 5.3, 6.8, 6.6, 4.7]
How is the risk of an investment related to its expected return? (Adapted from Devore et al. 2021)
Solution:
Script 7.1
import scisuit.plot as plt
x = [4.3, 4.6, 5.2, 5.3, 5.5, 5.7, 6.1, 6.3, 6.8, 7.5]
y = [7.7, 5.2, 7.9, 5.8, 7.2, 7, 5.3, 6.8, 6.6, 4.7]
plt.scatter(x=x, y=y)
plt.show()
13 Statman M, Fisher KL, Anginer D (2008). Affect in a Behavioral Asset-Pricing Model. Financial Analysts Journal, 64(2), 20-29.
It is seen that there is a weak inverse relationship
between the perceived risk of a company’s stock
and its expected return value.
Fig. (7.2) shows that there is no convincing relationship between the risk and the expected return of an investment. Let's check whether this is numerically the case. Continuing from Script (7.1):
Script 7.2
from scisuit.stats import linregress
result = linregress(yobs=y, factor=x)
print(result)
Simple Linear Regression
F=1.85, p-value=0.211, R2=0.19
Have we carried out a reliable analysis, i.e., is there really no relationship between risk and expected returns? Devore et al. (2021) suggest that with a small number of observations it is possible not to detect a relationship, because when the sample size is small, hypothesis tests do not have much power. Also note that the original study uses 210 observations, and Statman et al. (2008) concluded that risk is a useful predictor of expected return, although risk only accounted for 19% of the variation in expected returns. ■
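The scisuit result can be cross-checked against Eqs. (7.9) and (7.10). The sketch below is an added verification (numpy only) that computes the slope, intercept and r² for the risk/return data by hand.

import numpy as np

x = np.array([4.3, 4.6, 5.2, 5.3, 5.5, 5.7, 6.1, 6.3, 6.8, 7.5])
y = np.array([7.7, 5.2, 7.9, 5.8, 7.2, 7, 5.3, 6.8, 6.6, 4.7])

Sxx = np.sum((x - x.mean())**2)
Syy = np.sum((y - y.mean())**2)
Sxy = np.sum((x - x.mean())*(y - y.mean()))

b1 = Sxy/Sxx                 #Eq. (7.9)
b0 = y.mean() - b1*x.mean()  #Eq. (7.10)
r2 = Sxy**2/(Sxx*Syy)        #coefficient of determination

print(b1, b0, r2)  #slope ~ -0.49, intercept ~ 9.2, R2 ~ 0.19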
7.2. Multiple Linear Regression
Suppose the taste of a fruit juice is related to sugar content and pH. We wish to establish an empirical
model, which can be described as follows:
y=β 0 + β 1 x 1 + β 2 x 2 +ϵ (7.16)
where y is the response variable (taste) and x1 and x2 are independent variables (sugar content and pH).
Unlike the simple linear regression (SLR) model, where only one independent variable exists, in multiple linear regression (MLR) problems at least 2 independent variables are of interest to us. Therefore, in general, the response variable may be related to k independent (regressor) variables. The model is:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \epsilon \tag{7.17}$$
This model describes a hyperplane, and the regression coefficient βj represents the expected change in the response per unit change in xj when all the other variables are held constant (Montgomery 2012). If one enters the data in a spreadsheet, it would generally be in the following format (Table 7.1):
y     x1    x2    …    xk
y1    x11   x12   …    x1k
y2    x21   x22   …    x2k
⋮     ⋮     ⋮          ⋮
yn    xn1   xn2   …    xnk

Table 7.1: y is the response variable and the x's are the regressor variables. It is assumed that n>k.

$$y_i = \beta_0 + \sum_{j=1}^{k}\beta_j x_{ij} + \epsilon_i, \qquad i=1, 2, \dots, n \tag{7.18}$$
For example, for the 1st row (i=1) in Table (7.1), Eq. (7.18) yields, y1 = β0 + β1·x11 + β2·x12 +… + βk·x1k.
To find the regression coefficients, we will use an approach similar to that presented in section (7.1.1), such that the sum of the squares of the errors, εi, is minimized. Therefore,
$$L = \sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{k}\beta_j x_{ij}\right)^2 \tag{7.19}$$
where the function L will be minimized with respect to β0, β1, …, βk, which then gives the least squares estimators $\hat{\beta}_0, \hat{\beta}_1, \dots, \hat{\beta}_k$. The derivatives with respect to β0 and βj are:
$$\left.\frac{\partial L}{\partial \beta_0}\right|_{\hat{\beta}_0,\hat{\beta}_1,\dots,\hat{\beta}_k} = -2\sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \sum_{j=1}^{k}\hat{\beta}_j x_{ij}\right) = 0 \tag{7.20-a}$$

$$\left.\frac{\partial L}{\partial \beta_j}\right|_{\hat{\beta}_0,\hat{\beta}_1,\dots,\hat{\beta}_k} = -2\sum_{i=1}^{n}\left(y_i - \hat{\beta}_0 - \sum_{j=1}^{k}\hat{\beta}_j x_{ij}\right)x_{ij} = 0 \tag{7.20-b}$$
After some algebraic manipulation, Eq. (7.20) can be written in matrix notation as follows:
$$\begin{bmatrix} n & \sum x_{i1} & \sum x_{i2} & \dots & \sum x_{ik} \\ \sum x_{i1} & \sum x_{i1}^2 & \sum x_{i1}x_{i2} & \dots & \sum x_{i1}x_{ik} \\ \vdots & \vdots & \vdots & & \vdots \\ \sum x_{ik} & \sum x_{ik}x_{i1} & \sum x_{ik}x_{i2} & \dots & \sum x_{ik}^2 \end{bmatrix} \begin{bmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \\ \vdots \\ \hat{\beta}_k \end{bmatrix} = \begin{bmatrix} \sum y_i \\ \sum x_{i1}y_i \\ \vdots \\ \sum x_{ik}y_i \end{bmatrix} \tag{7.21}$$
In matrix notation the system can be written compactly as:

$$X\cdot\beta = y \tag{7.22}$$

Note that since X is not a square matrix, its inverse does not exist and therefore the equation cannot be solved directly. The least-squares approach to solving Eq. (7.22) is to multiply both sides by the transpose of X, which gives the least squares estimator:

$$X^T X\hat{\beta} = X^T y \quad\Rightarrow\quad \hat{\beta} = \left(X^T X\right)^{-1}X^T y \tag{7.23}$$
Example 7.4
A process engineer tasked with improving the viscosity of a polymer chose, among several factors, two process variables: reaction temperature and feed rate. She ran 16 experiments and collected the following data:
Temperature = [80, 93, 100, 82, 90, 99, 81, 96, 94, 93, 97, 95, 100, 85, 86, 87]
Feed Rate = [8, 9, 10, 12, 11, 8, 8, 10, 12, 11, 13, 11, 8, 12, 9, 12]
Viscosity = [2256, 2340, 2426, 2293, 2330, 2368, 2250, 2409, 2364, 2379, 2440, 2364, 2404, 2317, 2309, 2328]
Explain the effect of feed rate and temperature on polymer viscosity. (Adapted from Montgomery 2012).
Solution:
The solution involves several computations which can be performed by using a spreadsheet or by using Python with the numpy library. A step-by-step solution for the coefficients can be found in the textbook by Montgomery (2012). We will skip all these steps and directly solve it using scisuit's built-in linregress function.
Script 7.3
from scisuit.stats import linregress
#input values
temperature = [80, 93, 100, 82, 90, 99, 81, 96, 94, 93, 97, 95, 100, 85, 86, 87]
feedrate = [8, 9, 10, 12, 11, 8, 8, 10, 12, 11, 13, 11, 8, 12, 9, 12]
viscosity = [2256, 2340, 2426, 2293, 2330, 2368, 2250, 2409, 2364, 2379, 2440, 2364, 2404, 2317, 2309, 2328]

#assumed call: for multiple regressors the factor argument is given as a list of regressor lists
result = linregress(yobs=viscosity, factor=[temperature, feedrate])
print(result)
Based on Eq. (7.24), the p-value tells us that at least one of the two variables (temperature and feed rate) has a nonzero regression coefficient. Furthermore, analysis of the individual regression coefficients shows that both temperature and feed rate have an effect on the polymer's viscosity.
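The coefficients themselves can be verified independently of scisuit via Eq. (7.23). The sketch below is an addition that builds the design matrix X with an intercept column and solves the least-squares problem with numpy.

import numpy as np

temperature = [80, 93, 100, 82, 90, 99, 81, 96, 94, 93, 97, 95, 100, 85, 86, 87]
feedrate = [8, 9, 10, 12, 11, 8, 8, 10, 12, 11, 13, 11, 8, 12, 9, 12]
viscosity = [2256, 2340, 2426, 2293, 2330, 2368, 2250, 2409, 2364, 2379, 2440, 2364, 2404, 2317, 2309, 2328]

X = np.column_stack([np.ones(len(viscosity)), temperature, feedrate])
y = np.array(viscosity)

beta, *_ = np.linalg.lstsq(X, y, rcond=None)  #least squares solution of Eq. (7.23)
print(beta)  #intercept, temperature and feed rate coefficients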
According to Larsen & Marx (2011), applied statisticians find residual plots to be very helpful in
assessing the appropriateness of fitting. Continuing from Script (7.3), let’s plot the residuals:
Script 7.4
import scisuit.plot as plt
import scisuit.plot.gdi as gdi
#x=Fits, y=Residuals
plt.scatter(x=result.Fits, y= result.Residuals)
plt.show()