Engineering Math Class Note II-1
We approach the study of statistics by using a four-step process to learn from data. After the data have been collected and summarized with descriptive statistics, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.
Nature of Variables and Types of Data
Variables can be classified as qualitative or quantitative.
1. Qualitative variables are variables that can be placed into distinct categories according to some characteristic or attribute. For example, when people are categorized according to gender (male or female), the variable gender is qualitative and takes categorical data, say 1 or 2. Other examples are religious preference, ability level and geographical location.
2. Quantitative variables are numerical and can be ordered or ranked. For example,
the variable age is numerical and people can be ranked in order according to the
value of their ages. Other examples are heights, weights and body temperatures.
Quantitative variables can be further classified into two groups: discrete and continuous.
(a) Discrete variables can be assigned values such as 0,1,2,3 and are said to be
countable. Examples of discrete variables are the number of children in a family,
the number of students in the classroom etc. Thus, discrete variables assume
values that can be counted.
(b) Continuous variables, by comparison, can assume all values in an interval between any two specific values. Temperature is a continuous variable, since it can assume all values between any two given temperatures.
Data can also be categorized as purely numerical or not purely numerical; for example, data involving the number of people alongside their mode of collecting information. Data can further assume the nominal level (named categories, e.g. assigning A, B, C), the ordinal level (ordered or ranked), the interval level (precise differences exist) or the ratio level (a true zero exists).
Statistical Tests: Parametric and Non-Parametric Tests
In the literal meaning of the terms, a parametric statistical test is one that makes
assumptions about the parameters (defining properties) of the population
distribution(s) from which one's data are drawn, while a non-parametric test is one that
makes no such assumptions.
Parametric Tests
Parametric statistics involves parameters such as the mean, standard deviation, variance, etc.; it uses the observed data to estimate the parameters of the distribution. Data are often assumed to come from a normal distribution with unknown parameters.
Parametric tests are those that assume that the sample data come from a population that follows a known probability distribution. Thus, parametric methods include tests based on the normal distribution (z tests), the Student's t test and one-way Analysis of Variance (ANOVA).
Non-Parametric Tests
A parametric test requires the sample to come from a normally distributed population. A nonparametric test, by contrast, makes minimal assumptions about the underlying distribution of the data and is often used when the assumptions of parametric tests, such as normality, are not met. Nonparametric tests are also called
"distribution-free" tests because they don't rely on a specific distribution. Examples
include: Chi-Square Test, Wilcoxon Signed Rank Test, Mann-Whitney Test and Kruskal-
Wallis Test.
REGRESSION ANALYSIS
Regression analysis can be used to identify the line or curve which provides the best fit
through a set of data points. This curve can be useful to identify a trend in the data,
whether it is linear, parabolic, or of some other form.
Regression allows researchers to predict or explain the variation in one variable based
on another variable.
• The variable that researchers are trying to explain or predict is called the response
variable. It is also sometimes called the dependent variable because it depends on
another variable.
• The variable that is used to explain or predict the response variable is called the
explanatory variable. It is also sometimes called the independent variable because
it is independent of the other variable.
Applications
There are four broad classes of applications of regression analysis.
• Descriptive or explanatory: interest may be in describing "What factors influence variability in the dependent variable?" For example, the factors contributing to higher sales among a company's sales force.
• Predictive: for example, setting a normal quota or baseline sales. We can also use the estimated equation to identify "normal" and "abnormal" (outlier) observations.
• Comparing Alternative theoretical explanations:
- Consumers use reference price in comparing alternatives,
- Consumers use specific price points in comparing alternatives.
• Decision purpose:
- Estimating variable and fixed costs having calibrated cost function.
- Estimating sales, revenues and profits having calibrated demand function.
- Setting optimal values of marketing mix variables.
- Using estimated equation for “What if” analysis.
Regression Line
Plotting the data points below in a graph
N X Y
1 2 3
2 4 5
3 8 7
4 11 7.5
5 14 8
6 18 9
7 21 12
8 24 14
9 25 17
10 28 19
We have:

[Data plot: scatter of the ten (X, Y) points with the fitted least-squares line y = 0.5481x + 1.6545, R² = 0.9214.]
As explained under "Scatterplots and Correlation", the chart above shows that a scatterplot of two variables that are strongly related tends to describe a line. Software can be used to compute this line precisely.
• This process is called a regression analysis.
• A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
• A regression line can be used to predict the value of y for a given value of x.
• Regression analysis identifies a regression line.
• The regression line shows how much and in what direction the response variable
changes when the explanatory variable changes.
Least-Squares Regression Line
• The regression line is obtained by applying what is called the least-squares
computation procedure.
• On the graph presented earlier, it was explained that individual points are located
near the line, but very few points, if any, are located exactly on the line.
• To obtain the best approximation of the data, the line is placed in the location
where the distance from all the points to the line is minimal.
• In other words, to predict y, the regression line needs to be as close as possible to
the data points in the vertical (y) direction.
• Some of the points are above the line and some are below the line. If the
differences from these points to the points on the line are computed, some
differences will be positive while others will be negative. Direction is not
important, so the differences are squared (to eliminate the negatives).
• This method is called the least-squares computation procedure because it aims to
minimize the squared distances between each of the points and the line.
The line is represented by the straight-line equation below:

Y = a_0 + a_1 X

This is called the regression line of Y on X. The coefficients a_0 and a_1 are determined from equations 1, 2, 3 and 4 below:

\sum Y = a_0 N + a_1 \sum X    (1)

\sum XY = a_0 \sum X + a_1 \sum X^2    (2)
Solving equations 1 and 2 above simultaneously provides the solution for the coefficients a_0 and a_1. They can also be calculated directly from equations 3 and 4 below:
a_0 = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N\sum X^2 - (\sum X)^2}    (3)

a_1 = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}    (4)
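As a quick check of equations (3) and (4), the short Python sketch below (an illustration added here, not part of the original note) fits the line to the ten data points plotted above and recovers the slope and intercept shown on the chart.

# Least-squares fit of Y = a0 + a1*X via the normal-equation solutions (3) and (4).
X = [2, 4, 8, 11, 14, 18, 21, 24, 25, 28]
Y = [3, 5, 7, 7.5, 8, 9, 12, 14, 17, 19]
N = len(X)
Sx, Sy = sum(X), sum(Y)
Sxx = sum(x * x for x in X)
Sxy = sum(x * y for x, y in zip(X, Y))
a1 = (N * Sxy - Sx * Sy) / (N * Sxx - Sx ** 2)    # slope, equation (4)
a0 = (Sy * Sxx - Sx * Sxy) / (N * Sxx - Sx ** 2)  # intercept, equation (3)
print(a0, a1)  # ≈ 1.6545 and 0.5481, matching the chart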
CORRELATION
Correlation analysis measures the degree of relationship between variables. Among its types:
- Partial correlation: the analysis recognizes more than two variables but considers only two of them, keeping the others constant.
- Total correlation: based on all the relevant variables, which is normally not feasible.
Linear correlation: Correlation is said to be linear when the amount of change in one
variable tends to bear a constant ratio to the amount of change in the other. The graph of
the variables having a linear relationship will form a straight line.
Non-Linear correlation: The correlation would be nonlinear if the amount of change in
one variable does not bear a constant ratio to the amount of change in the other variable.
CORRELATION COEFFICIENT
Coefficient of Correlation, denoted by r: the coefficient of correlation r measures the degree of linear relationship between two variables, say x and y. It was developed by Karl Pearson. The correlation coefficient r ranges between -1 and +1; the closer |r| is to unity, the stronger the linear correlation, and vice versa. The correlation coefficient is calculated from the equation below:

r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}
COEFFICIENT OF DETERMINATION
A convenient way of interpreting the value of the correlation coefficient is to use its square, which is called the coefficient of determination (R²).
Suppose r = 0.9; then R² = 0.81, meaning that 81% of the variation in the dependent variable has been explained by the independent variable.
The maximum value of R² is 1, because it is possible to explain all of the variation in y but not more than all of it.
Coefficient of Determination = Explained variation / Total variation
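As an illustration (mine, not the note's), the sketch below computes r and R² for the ten data points used in the regression plot; it reproduces the R² ≈ 0.92 shown on the chart.

import math

X = [2, 4, 8, 11, 14, 18, 21, 24, 25, 28]
Y = [3, 5, 7, 7.5, 8, 9, 12, 14, 17, 19]
N = len(X)
Sx, Sy = sum(X), sum(Y)
Sxx = sum(x * x for x in X)
Syy = sum(y * y for y in Y)
Sxy = sum(x * y for x, y in zip(X, Y))
# Pearson correlation coefficient from the sums formula above.
r = (N * Sxy - Sx * Sy) / math.sqrt((N * Sxx - Sx ** 2) * (N * Syy - Sy ** 2))
print(r, r ** 2)  # r ≈ 0.96, R² ≈ 0.9214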
Spearman's Rank Coefficient of Correlation
For statistical series in which the variables under study cannot be measured quantitatively but can be arranged in serial order, Pearson's correlation coefficient cannot be used; Spearman's rank correlation is used instead.
• This method is useful where we can give ranks but not the actual data (qualitative terms).
• This method is used where the initial data are in the form of ranks.

r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}

where
r_s = rank correlation coefficient
D = difference of ranks between the paired items in the two series
N = total number of observations.
The value of the rank correlation coefficient r_s ranges from -1 to 1:
• If r_s = 1, there is complete agreement in the order of the ranks, and the ranks are in the same direction.
• If r_s = -1, there is complete agreement in the order of the ranks, but the ranks are in opposite directions.
• If r_s = 0, there is no correlation.
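A minimal sketch (illustrative; the helper names are mine) of Spearman's formula, assigning average ranks to tied values:

def ranks(values):
    # 1-based rank of each value; tied values share the average of their ranks.
    svals = sorted(values)
    return [svals.index(v) + svals.count(v) / 2 + 0.5 for v in values]

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return 1 - 6 * d2 / (n * (n * n - 1))  # r_s = 1 - 6*ΣD²/(N(N² - 1))

# Hypothetical example: two judges ranking five items.
print(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8

Note that when ties are present, applying the 6ΣD² formula to average ranks is a common approximation.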
Example 1

Fiddling with Phone (X)   20     5     8    10    13     7    13     5    25    14
GPA (Y)                  2.35  3.80  3.50  2.75  3.25  3.40  2.90  3.50  2.25  2.74

The table above shows a survey on the connection between Grade Point Average (GPA) and the number of hours spent fiddling with phones per week, conducted on 10 University of Uyo students.
(a) Determine the regression line (equation) that represents the expected effect of phone fiddling on GPA.
(b) Determine the correlation and rank correlation coefficients of the relationship, and make your inference on the relationship.
Solution
Completing the statistics table gives
S/N    X      Y      X²      Y²        XY
1 20 2.35 400 5.5225 47
2 5 3.8 25 14.44 19
3 8 3.5 64 12.25 28
4 10 2.75 100 7.5625 27.5
5 13 3.25 169 10.5625 42.25
6 7 3.4 49 11.56 23.8
7 13 2.9 169 8.41 37.7
8 5 3.5 25 12.25 17.5
9 25 2.25 625 5.0625 56.25
10 14 2.74 196 7.5076 38.36
Ʃ (N = 10)    120    30.44    1822    95.1276    337.36
The regression line (equation) that represents the expected effect of phone fiddling on GPA is the regression line of Y on X, with the straight-line equation

Y = a_0 + a_1 X

where

a_0 = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N\sum X^2 - (\sum X)^2} = \frac{(30.44)(1822) - (120)(337.36)}{10(1822) - (120)^2} = \frac{14978.48}{3820} = 3.92

a_1 = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2} = \frac{10(337.36) - (120)(30.44)}{10(1822) - (120)^2} = \frac{-279.20}{3820} = -0.073

Y = 3.92 - 0.073X
The correlation coefficient r is given by

r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} = \frac{10(337.36) - (120)(30.44)}{\sqrt{[10(1822) - (120)^2][10(95.1276) - (30.44)^2]}} = \frac{-279.20}{\sqrt{(3820)(24.68)}} = \frac{-279.20}{307.06} = -0.91
The rank correlation coefficient r_s is obtained by ranking the fiddling times and the GPAs separately and taking the difference of ranks D for each student. [The ranking table is only partly legible in this copy; its surviving total is \sum D^2 = 312.5.] Hence

r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(312.5)}{10(100 - 1)} = 1 - \frac{1875}{990} = -0.89

Both r = -0.91 and r_s = -0.89 indicate a strong negative relationship: the more hours a student spends fiddling with the phone, the lower the expected GPA.
Example 2
The table below shows the final grade obtained in mathematics and physics by 10
students selected at random from a large group of students.
Mathematics (X) 75 80 93 65 87 71 98 68 84 77
Physics (Y) 82 78 86 72 91 80 95 72 89 74
(a)(i) Determine the straight-line equation of the relationship, using X as the independent
variable
(ii) What would be the expected grade in physics for a student who scored 75 in
mathematics?
(iii) If a student scored 95 in physics, what grade is expected in mathematics?
S/N    X      Y      X²       Y²       XY
1 75 82 5625 6724 6150
2 80 78 6400 6084 6240
3 93 86 8649 7396 7998
4 65 72 4225 5184 4680
5 87 91 7569 8281 7917
6 71 80 5041 6400 5680
7 98 95 9604 9025 9310
8 68 72 4624 5184 4896
9 84 89 7056 7921 7476
10 77 74 5929 5476 5698
Ʃ (N = 10)    798    819    64722    67675    66045
Y = a_0 + a_1 X

where

a_0 = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N\sum X^2 - (\sum X)^2} = \frac{(819)(64722) - (798)(66045)}{10(64722) - (798)^2} = \frac{53007318 - 52703910}{647220 - 636804} = \frac{303408}{10416} = 29.13

a_1 = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2} = \frac{10(66045) - (798)(819)}{10(64722) - (798)^2} = \frac{660450 - 653562}{10416} = \frac{6888}{10416} = 0.66

Y = 29.13 + 0.66X
(ii) The expected grade in physics for a student who scored 75 in mathematics:

Y = 29.13 + 0.66(75) = 78.63

(iii) The expected mathematics grade of a student who scored 95 in physics:

X = \frac{Y - 29.13}{0.66} = \frac{95 - 29.13}{0.66} = 99.80
(b)(i) Determine the correlation and rank correlation coefficients of the score relationship.
Ranking the mathematics scores (X) and the physics scores (Y) separately, with tied values sharing the average rank:

X     Rank of X    Y     Rank of Y    D      D²
75    4            82    6           -2     4
80    6            78    4            2     4
93    9            86    7            2     4
65    1            72    1.5         -0.5   0.25
87    8            91    9           -1     1
71    3            80    5           -2     4
98    10           95    10           0     0
68    2            72    1.5          0.5   0.25
84    7            89    8           -1     1
77    5            74    3            2     4
                                     Ʃ     22.5

The Pearson correlation coefficient:

r = \frac{10(66045) - (798)(819)}{\sqrt{[10(64722) - (798)^2][10(67675) - (819)^2]}} = \frac{6888}{\sqrt{(10416)(5989)}} = \frac{6888}{7898.2} = 0.87

The rank correlation coefficient:

r_s = 1 - \frac{6(22.5)}{10(10^2 - 1)} = 1 - \frac{135}{990} = 0.86

Both coefficients indicate a strong positive relationship between the mathematics and physics grades.
SAMPLING THEORY
Sampling theory is the study of the relationship between a population and samples drawn from the population. It is useful in estimating unknown population quantities such as the mean and variance, often called population parameters or simply parameters, from knowledge of the corresponding sample quantities such as the sample mean and variance, often called sample statistics or simply statistics. Sampling theory is also used to determine whether the observed differences between samples are due to chance variation or whether they are really significant.
Basically, a study of the inferences made concerning a population by using samples drawn from it, together with the accuracy of such inferences obtained by using probability theory, is called statistical inference.
Statistical inference draws conclusions about the population on the basis of the information available in a sample which has been drawn from the population by a random sampling technique/procedure. There are two branches of statistical inference, namely ESTIMATION and TESTING OF HYPOTHESIS.
Basic Definitions:
Population: Any collection of individuals under study is said to be a population (universe). The individuals are often called the members or the units of the population, and may be physical objects or measurements expressed numerically or otherwise.
Sample: A part or small section selected from the population is called a sample, and the process of such selection is called sampling.
(The fundamental object of sampling is to get as much information as possible about the whole universe by examining only a part of it. An attempt is thus made through sampling to give the maximum information about the parent universe with the minimum effort.)
Parameters: Statistical measurements such as Mean, Variance, standard deviation, etc. of
the population are called parameters.
Hypothesis: a statement about a population. Usually it is required to make decisions about populations on the basis of sample information; such decisions are called statistical decisions. In attempting to reach decisions, it is often necessary to make assumptions about the population involved. Such assumptions, which are not necessarily true, are called statistical hypotheses.
Sampling Process
1. Random Samples and Random Numbers
In order for the conclusions of sampling theory and statistical inference to be valid, samples must be chosen so as to be representative of the population.
A study of sampling methods and of the related problems that arise is called the design of the experiment.
One method by which a representative sample may be obtained is the process called random sampling, which gives each member of the population an equal chance of being included in the sample. One technique for obtaining a random sample is to assign numbers to each member of the population, write the numbers on pieces of paper, place them in an urn, mix thoroughly, and then draw numbers from the urn. Another method is to use a table of random numbers specially constructed for such purposes.
2. Sampling With Replacement and Without Replacement
If we draw a number from the urn, we have the choice of replacing or not replacing the
number into the urn before a second drawing. In the first case, the number can come up
again and again, whereas in the second it can only come up once.
Sampling where each member of the population may be chosen more than once is called sampling with replacement, while sampling where each member of the population cannot be chosen more than once is called sampling without replacement.
Populations are either finite or infinite. If we draw 10 balls successively without
replacement from an urn containing 100 balls, we are sampling from a finite population.
However, if we toss a coin 50 times and count the number of heads, we are sampling from
an infinite population.
SAMPLING DISTRIBUTION
Consider all possible samples of size N that can be drawn from a given population (either with or without replacement). For each sample, we compute a statistic (such as the mean or standard deviation) that will vary from sample to sample.
standard deviation) that will vary from sample to sample. In this manner we obtain a
distribution of the statistic that is called its sampling distribution.
If, for example, the particular statistic used is the sample mean, then the distribution is called the sampling distribution of means. Similarly, we could have sampling distributions of standard deviations, variances, medians, proportions, etc.
Sampling Distribution of Means
Suppose that all possible samples of size N are drawn without replacement from a finite population of size N_P > N. If we denote the mean and standard deviation of the sampling distribution of means by \mu_{\bar X} and \sigma_{\bar X}, and the population mean and standard deviation by \mu and \sigma respectively, then

\mu_{\bar X} = \mu \qquad and \qquad \sigma_{\bar X} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_P - N}{N_P - 1}}

If the population is infinite, or if sampling is with replacement, the above results reduce to

\mu_{\bar X} = \mu \qquad and \qquad \sigma_{\bar X} = \frac{\sigma}{\sqrt{N}}
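A quick simulation sketch (illustrative, with made-up population values) checking \sigma_{\bar X} = \sigma/\sqrt{N} for sampling with replacement:

import random
import statistics

random.seed(1)
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

N = 25
# Sampling with replacement: the mean of each of 5000 samples of size N.
means = [statistics.mean(random.choices(population, k=N)) for _ in range(5_000)]
print(statistics.mean(means), mu)                  # both ≈ 50
print(statistics.pstdev(means), sigma / N ** 0.5)  # both ≈ 2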
Sampling Distribution of Proportions
Suppose that a population is infinite and that the probability of occurrence of an event (called its success) is p, while the probability of non-occurrence of the event is q = 1 - p. For example, the population may be all tosses of a fair coin, in which the probability of the event 'heads' is p = ½. Consider all possible samples of size N drawn from this population, and for each sample determine the proportion P of successes. In the case of the coin, P would be the proportion of heads turning up in N tosses. We obtain a sampling distribution of proportions whose mean \mu_P and standard deviation \sigma_P are given by

\mu_P = p \qquad and \qquad \sigma_P = \sqrt{\frac{pq}{N}} = \sqrt{\frac{p(1 - p)}{N}}
Sampling Distribution of Differences
Suppose that we are given two populations. For each sample of size N_1 drawn from the first population, let us compute a statistic S_1; this yields a sampling distribution for the statistic S_1 whose mean and standard deviation we denote by \mu_{S_1} and \sigma_{S_1}, respectively. Similarly, for each sample of size N_2 drawn from the second population, let us compute a statistic S_2; this yields a sampling distribution for the statistic S_2 whose mean and standard deviation we denote by \mu_{S_2} and \sigma_{S_2}, respectively. From all possible combinations of these samples from the two populations we can obtain a distribution of the differences, S_1 - S_2, which is called the sampling distribution of differences of the statistics. The mean and standard deviation of this sampling distribution, denoted respectively by \mu_{S_1 - S_2} and \sigma_{S_1 - S_2}, are given by

\mu_{S_1 - S_2} = \mu_{S_1} - \mu_{S_2} \qquad and \qquad \sigma_{S_1 - S_2} = \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2}

(provided that the samples chosen do not in any way depend on each other; that is, the samples are independent).
If S_1 and S_2 are the sample means from the two populations, which we denote by \bar X_1 and \bar X_2 respectively, then the sampling distribution of the differences of means, for infinite populations with means and standard deviations (\mu_1, \sigma_1) and (\mu_2, \sigma_2) respectively, is given by

\mu_{\bar X_1 - \bar X_2} = \mu_{\bar X_1} - \mu_{\bar X_2} = \mu_1 - \mu_2 \qquad and \qquad \sigma_{\bar X_1 - \bar X_2} = \sqrt{\sigma_{\bar X_1}^2 + \sigma_{\bar X_2}^2} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}
STATISTICAL DECISION THEORY
Statistical Decisions
Most times in practice, we are required to make decisions about populations on the basis
of sample information. Such decisions are called statistical decisions.
Statistical Hypothesis or Test Hypothesis
Oftentimes, in an attempt to reach decisions, it is useful to make assumptions (or guesses) about the population involved. Such assumptions, which may or may not be true, are called statistical hypotheses. They are generally statements about the probability distribution of the population.
We often want to determine whether a claim is true or false. Such a claim is called a hypothesis.
• Null hypothesis: A specific hypothesis to be tested in an experiment. The null hypothesis is usually labeled H0. A hypothesis which is tested under the assumption that it is true is called a null hypothesis.
• Alternative hypothesis: A hypothesis that is different from the null hypothesis, and which we usually want to show is true (thereby showing that the null hypothesis is false). The alternative hypothesis is usually labeled H1. (The hypothesis against which we test the null hypothesis is the alternative hypothesis.)
• If the alternative involves showing that some value is greater than or less than a
number, there is some value c that separates the null hypothesis rejection region
from the fail to reject region. This value is known as the critical value.
The null hypothesis is tested through the following procedure (illustrated in the sketch after the list):
1. Determine the null hypothesis and an alternative hypothesis.
2. Pick an appropriate sample.
3. Use measurements from the sample to determine the likelihood of the null
hypothesis.
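As an illustration of this procedure (a sketch with hypothetical numbers, not taken from the note), a one-sample two-tailed z-test of H0: μ = μ0 against H1: μ ≠ μ0 when the population standard deviation σ is known:

from statistics import NormalDist

def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    # Step 3: measure how far the sample mean lies from the hypothesised mean.
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = 0.05
    return z, z_crit, abs(z) > z_crit             # True means "reject H0"

# Hypothetical sample of 36 with mean 52, claimed population N(50, 6).
print(z_test_two_tailed(52, 50, 6, 36))  # z = 2.0 > 1.96, so reject H0 at 5%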
Test: Test is a rule through which we test the null hypothesis against the given alternative
hypothesis.
• Type I error: If the null hypothesis is true but the sample mean is such that the null hypothesis is rejected, a Type I error occurs. The probability that such an error will occur is called the α risk.
• Type II error: If the null hypothesis is false but the sample mean is such that the null hypothesis cannot be rejected, a Type II error occurs. The probability that such an error will occur is called the β risk.
The wrong decision of rejecting a null hypothesis H0, when it is true is called the Type I
Error i.e. we reject H0 when it is true. Similarly, the wrong decision of accepting the null
hypothesis H0 when it is not true is called the Type II Error i.e. we accept H0 when H1 is
true.
Level of Significance: The probability level below which we reject the hypothesis is called
level of significance. The levels of significance usually employed in testing of hypothesis
are 5% and 1%.
P-Value: The p-value is the probability, computed assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. The p-value is used as an alternative to rejection points: it is the smallest level of significance at which the null hypothesis would be rejected.
Critical Region and Acceptance Region: The set of all possible values of a test statistic t is called the sample space. The part of the sample space which amounts to the rejection of the null hypothesis H0 is called the critical region or region of rejection; the remainder is the acceptance region.
Probability Distribution Function
Probability Distribution refers to the function that gives the probability of all possible
values of a random variable. It shows how the probabilities are assigned to the different
possible values of the random variable.
Common types of probability distributions Include:
• Binomial Distribution.
• Bernoulli Distribution.
• Normal Distribution.
• Geometric Distribution.
Probability Distribution Graph
The graph that plots a probability distribution function is called a probability distribution graph. These graphs help us to visualize the probability distribution of a random variable and to find the required solution easily.
The sum of all the probabilities in any discrete distribution is one, and for a continuous distribution of random variables the area under the graph is equal to 1. The distribution graph of a continuous distribution function, showing the probability Pr(a < X < b) that the random variable X lies between a and b, is made using the probability density function. [Graph not reproduced in this copy.]
Example 1: Let a pair of fair dice be tossed, and let X denote the sum of the points obtained. Plot the graph of P(X) against X.
Solution:
X 2 3 4 5 6 7 8 9 10 11 12
P(X) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
[Graph: P(X) plotted against X, rising from P(2) = 1/36 to a peak of P(7) = 6/36 ≈ 0.167 and falling symmetrically to P(12) = 1/36.]
P(X) here is the probability function (probability mass function) of the discrete random variable X. For a continuous random variable, the corresponding function is called a probability density function, and when such a function is given we say that a continuous probability distribution has been defined.
Example 2: The number of old people living in houses on a randomly selected city block
is described by the following probability distribution.
No. of old people, X    3      4      5      6
Probability, P(X)       0.5    0.25   0.1    ?
What is the probability that 6 or more old people live in a randomly selected house?
Solution:
The sum of all the probabilities is equal to 1, so the probability that six or more old people live in a house is
P(X ≥ 6) = 1 - (0.50 + 0.25 + 0.10) = 1 - 0.85 = 0.15
Thus, the probability that six or more old people live in a house is 0.15.
14
THE BINOMIAL DISTRIBUTION
If p is the probability that an event will happen in any single trial (called the probability of success) and q = 1 - p is the probability that it will fail to happen in any single trial (called the probability of failure), then the probability that the event will happen exactly X times in N trials (that is, X successes and N - X failures will occur) is given by
p(X) = \binom{N}{X} p^X q^{N - X} = \frac{N!}{X!(N - X)!} p^X q^{N - X}

where X = 0, 1, 2, …, N; N! = N(N - 1)(N - 2) ⋯ 1; and 0! = 1.
Example 1: What is the probability of getting exactly 2 heads in 6 tosses of a fair coin?
Solution: Using the binomial distribution formula with N = 6, X = 2, p = ½ and q = ½:

p(2) = \binom{6}{2} \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^{6-2} = \frac{6!}{2!\,4!} \left(\frac{1}{2}\right)^6 = \frac{15}{64}

Example 2:
What is the probability of getting at least 4 heads in 6 tosses of a fair coin?
Solution: Using the binomial distribution formula with N = 6; X = 4, 5, 6; p = ½ and q = ½:

p(X \ge 4) = \binom{6}{4} \left(\frac{1}{2}\right)^6 + \binom{6}{5} \left(\frac{1}{2}\right)^6 + \binom{6}{6} \left(\frac{1}{2}\right)^6 = \frac{15}{64} + \frac{6}{64} + \frac{1}{64} = \frac{22}{64} = \frac{11}{32}

An alternative formulation of the binomial distribution is given by the expansion

(q + p)^N = q^N + \binom{N}{1} q^{N-1} p + \binom{N}{2} q^{N-2} p^2 + \dots + p^N

where \binom{N}{1}, \binom{N}{2}, … are the binomial coefficients.
Some properties of the binomial distribution are:
Mean: μ = Np;  Variance: σ² = Npq;  Standard deviation: σ = \sqrt{Npq}
Example 3: In 100 tosses of a fair coin, the mean number of heads is μ = Np = 100(½) = 50, the variance is σ² = Npq = 100(½)(½) = 25, and the standard deviation is σ = \sqrt{25} = 5.
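The sketch below (an illustration, assuming Python 3.8+ for math.comb) evaluates the binomial formula and checks Examples 1-3:

from math import comb

def binom_pmf(X, N, p):
    # p(X) = C(N, X) * p^X * q^(N - X), with q = 1 - p
    return comb(N, X) * p ** X * (1 - p) ** (N - X)

print(binom_pmf(2, 6, 0.5))                          # 15/64 ≈ 0.2344 (Example 1)
print(sum(binom_pmf(x, 6, 0.5) for x in (4, 5, 6)))  # 11/32 ≈ 0.3438 (Example 2)
N, p = 100, 0.5
print(N * p, N * p * (1 - p), (N * p * (1 - p)) ** 0.5)  # 50, 25, 5 (Example 3)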
NORMAL DISTRIBUTION
A normal distribution is a continuous probability distribution whose probability density function gives a symmetrical bell curve. Simply put, it is a plot of the probability function of a variable whose data are concentrated around one central point, with the remaining points tapering off symmetrically towards the two opposite ends.
A normal distribution has a probability distribution that is centred around the mean. This
means that the distribution has more data around the mean. The data distribution
decreases as you move away from the center. The resulting curve is symmetrical about
the mean and forms a bell-shaped distribution.
Consider the graph below, which shows the probability distribution of heights in a class:

[Graph not reproduced: a bell-shaped curve of height frequencies centred on the mean height.]

The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a standard of reference for many probability problems.
The Normal Distribution Curve
A normal distribution, sometimes called the bell curve, is a distribution that occurs
naturally in many situations. For example, the bell curve is seen in general tests like the
SAT and GRE. The bulk of students will score the average (C), while smaller numbers of
students will score a B or D. An even smaller percentage of students score an F or an A.
This creates a distribution that resembles a bell (hence the nickname). The bell curve is
symmetrical. Half of the data will fall to the left of the mean; half will fall to the right.
Using the equation for a normal density function directly could be difficult and tedious. Thus, the equation below, called the standard score, is used:

z = \frac{x - \mu}{\sigma} = \frac{\text{data point} - \text{mean}}{\text{standard deviation}}
The z-score is used to tell how far from the mean the data point is. You calculate it using
the mean and standard deviation, so it can also be said that the Z-Score is how many
standard deviations below or above the mean the data is.
The z-score is used to standardize your normal distribution. Using the z-score, you can
convert each data point into a value in terms of mean and standard deviation, effectively
converting the graph into a scaled-down version. The z-score tells you how far each data
point is from the mean in steps of the standard deviation. So, with the mean and standard deviation, you can plot all the points on the graph.
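A minimal sketch (mine, with hypothetical data) of standardising data points into z-scores and recovering them with X = Zσ + μ:

import statistics

data = [26, 33, 65, 28, 34]  # hypothetical daily travel times in minutes
mu = statistics.mean(data)
sigma = statistics.pstdev(data)  # population standard deviation

z = [(x - mu) / sigma for x in data]
print(z)                              # negative: below the mean; positive: above
print([zi * sigma + mu for zi in z])  # X = Z*sigma + mu recovers the data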
Examples
1. A summary of the daily travel time of a person commuting from work is given below; the values are in minutes. Calculate the mean, standard deviation, and z-scores.
[Data table not reproduced in this copy.]
Now, subtract the mean from each data point and find the variance and standard deviation.
The z-score tells us where a data point falls relative to the other points: how far away from the mean the point is, in steps of the standard deviation. Now, calculate the z-score for each point.
Negative values tell you that the point lies below the mean, and positive values imply that the point is above the mean. Multiplying each z-value by the standard deviation and adding the mean recovers the original data point:

X = Zσ + μ
We now consult the Normal Distribution table for standard scores
2. The sample of scores in an Engineering course exam were 20, 15, 26, 32, 18, 28, 35, 14,
26, 22, and 17 out of 60 marks. The lecturer felt the exam must have been really hard and
decided to standardize all the scores so that failure would be from scores 1 standard
deviation (1σ) below the mean. What are the scores of the students who would fail if the
standard deviation is 6.6?
The equation for the standard score is:

z = \frac{x - \bar x}{sd}

The mean score is \bar x = (20 + 15 + 26 + 32 + 18 + 28 + 35 + 14 + 26 + 22 + 17)/11 = 253/11 = 23. A score one standard deviation below the mean is 23 - 1(6.6) = 16.4, so the students who would fail are those with scores below 16.4: the scores 15 and 14.
3. Find the area under the standard normal curve to the right of z = 2.05 or to the left of z = -1.44. [Statement reconstructed from the surviving computations.]
The z-table value for 2.05 is 0.4798, so the area to the right of z = 2.05 is α = 0.5000 - 0.4798 = 0.0202.
The z-table value for -1.44 is 0.4251, so the area to the left of z = -1.44 is α = 0.5000 - 0.4251 = 0.0749.
The area to the right of z = 2.05 or to the left of z = -1.44 is therefore 0.0202 + 0.0749 = 0.0951.
4. A survey conducted on 1000 middle school students on time spent on social media
weekly has a normal distribution with mean of 20 hours and standard deviation of 5.0
hours. Determine the percentage and number of students
• Who spend less than 25 hours per week:

z = \frac{x - \bar x}{sd} = \frac{25 - 20}{5} = 1.00

The z-table value for 1.00 is 0.3413. Less than 25 hours per week: α = 0.5000 + 0.3413 = 0.8413, representing 84.13% of the population.
Thus, the number of students = 0.8413 × 1000 ≈ 841 students.
• Who spend over 30 hours per week:

z = \frac{30 - 20}{5} = 2.00

The z-table value for 2.00 is 0.4772. Over 30 hours per week: α = 0.5000 - 0.4772 = 0.0228, representing 2.28% of the population.
Thus, the number of students = 0.0228 × 1000 ≈ 23 students.
• What is the remaining population?
1000 – (841 + 23) = 136 students
5. The inner diameter of nuts produced by a company is normally distributed with mean 0.500 inches and standard deviation 0.005 inches. The nuts are considered defective if their inner diameter is less than 0.490 inches or greater than 0.510 inches. Find the percentage of defective nuts. If the company produces 20 million nuts per annum and the cost of producing one nut is ₦12.50, calculate the annual losses due to defective nuts if all defective nuts are counted as complete waste.

z = \frac{x - \bar x}{sd}

For the lower inner diameter: z = \frac{0.490 - 0.500}{0.005} = -2.00
For the larger inner diameter: z = \frac{0.510 - 0.500}{0.005} = 2.00

The z-table value at 2.00 is 0.4772.
The area to the left of z = -2 equals the area to the right of z = 2, giving a total area α = 2(0.5000 - 0.4772) = 0.0456.
The percentage of defective nuts is 0.0456 × 100 = 4.56%.
The annual loss incurred as a result of defective nuts = 20,000,000 × 0.0456 × ₦12.50 = ₦11,400,000.
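The table look-ups in Examples 4 and 5 can be reproduced with the standard normal CDF; a sketch (illustrative) using Python's statistics.NormalDist:

from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1
print(Z.cdf((25 - 20) / 5))      # ≈ 0.8413: less than 25 hours (Example 4)
print(1 - Z.cdf((30 - 20) / 5))  # ≈ 0.0228: over 30 hours (Example 4)

frac_defective = Z.cdf(-2.0) + (1 - Z.cdf(2.0))  # ≈ 0.0455 (Example 5)
print(frac_defective, 20_000_000 * frac_defective * 12.50)  # ≈ ₦11.4 million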
Testing of Normal Distributions
For a Two-Tailed Test
• The acceptance region for the α = 0.05 significance level lies between -1.96 < z < 1.96 (0.95 acceptance region, with two 0.025 critical/rejection regions).
• The acceptance region for the α = 0.01 significance level lies between -2.58 < z < 2.58 (0.99 acceptance region, with two 0.005 critical/rejection regions).
For a One-Tailed Test
• The critical value for the α = 0.05 significance level is z = 1.645: the acceptance region is z < 1.645 for a right-tailed test (or z > -1.645 for a left-tailed test), with a single 0.05 critical/rejection region.
• The critical value for the α = 0.01 significance level is z = 2.33: the acceptance region is z < 2.33 for a right-tailed test (or z > -2.33 for a left-tailed test), with a single 0.01 critical/rejection region.
These critical values can be regenerated in code, as shown in the sketch below.
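A sketch (assuming Python 3.8+ for statistics.NormalDist) that regenerates the critical z values quoted above from the inverse normal CDF:

from statistics import NormalDist

Z = NormalDist()
for alpha in (0.05, 0.01):
    two_tailed = Z.inv_cdf(1 - alpha / 2)  # 1.96 and 2.58
    one_tailed = Z.inv_cdf(1 - alpha)      # 1.645 and 2.33
    print(alpha, two_tailed, one_tailed)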
SMALL SAMPLING THEORY
Student’s t Test
Student's t-test is a statistical method of testing hypotheses about the mean of a small sample drawn from a normally distributed population when the population standard deviation is unknown.
Examples
1. Using the Student's t-test, infer at the 0.05 significance level whether a washer-producing machine (set for a washer thickness of 0.050 inches) is in proper working condition, if the thicknesses of randomly selected samples of washers produced lately by the machine are (0.050, 0.053, 0.051, 0.048, 0.050, 0.058, 0.046, 0.058, 0.056 and 0.060) inches, with a standard deviation of 0.003 inches.
The sample mean thickness is

\bar x = \frac{0.050 + 0.053 + 0.051 + 0.048 + 0.050 + 0.058 + 0.046 + 0.058 + 0.056 + 0.060}{10} = 0.053

t = \frac{\bar x - \mu}{sd} \sqrt{N - 1} = \frac{0.053 - 0.050}{0.003} \sqrt{10 - 1} = (1)\sqrt{9} = 3.00
This is a two-tailed test: at the 0.05 significance level, 1 - 0.05/2 = 0.975, so the acceptance region is -t.975 < t < t.975, with
ν = N - 1 = 10 - 1 = 9
Thus 9t.975 = 2.26, which is less than 3.00; therefore we reject the hypothesis that the machine is in good working condition at α = 0.05.
At the 0.01 significance level, 1 - 0.01/2 = 0.995, so the acceptance region is -t.995 < t < t.995, with ν = 9.
Thus 9t.995 = 3.25, which is greater than 3.00; therefore we accept the hypothesis that the machine is in good working condition at α = 0.01.
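Example 1 can be checked in code. The t statistic needs only the arithmetic above; the critical-value look-up below assumes scipy is installed:

import statistics
from scipy import stats  # assumed available; used only for the critical values

x = [0.050, 0.053, 0.051, 0.048, 0.050, 0.058, 0.046, 0.058, 0.056, 0.060]
N, mu0, sd = len(x), 0.050, 0.003       # sd as given in the problem
xbar = statistics.mean(x)               # 0.053
t = (xbar - mu0) / sd * (N - 1) ** 0.5  # 3.00
for alpha in (0.05, 0.01):
    t_crit = stats.t.ppf(1 - alpha / 2, N - 1)  # 2.26 at 0.05, 3.25 at 0.01
    print(alpha, t, t_crit, abs(t) > t_crit)    # True means reject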
2. A machine used by a company to produce plates of thickness 0.050 in, was checked for
efficiency. A sample of 15 plates chosen at random gave a mean thickness of 0.053 inches
and standard deviation of 0.003 inches. Test the hypothesis that the machine is efficient
at 0.05 and 0.01 significant levels respectively.
t = \frac{\bar x - \mu}{sd} \sqrt{N - 1} = \frac{0.053 - 0.050}{0.003} \sqrt{15 - 1} = \sqrt{14} = 3.74

This is a two-tailed test: at the 0.05 significance level, 1 - 0.05/2 = 0.975, so the acceptance region is -t.975 < t < t.975, with
ν = N - 1 = 15 - 1 = 14
Thus 14t.975 = 2.14, which is less than 3.74; therefore we reject the hypothesis that the machine is efficient at α = 0.05.
At the 0.01 significance level, 1 - 0.01/2 = 0.995, so the acceptance region is -t.995 < t < t.995, with ν = 14.
Thus 14t.995 = 2.98, which is also less than 3.74, so we also reject the hypothesis that the machine is efficient at α = 0.01.
Analysis of Variance (ANOVA)
Examples
[The statement of this worked example is not fully reproduced in this copy; parts of its solution survive. Five types, A to E, were compared with five observations each (a = 5 treatments, b = 5 observations per treatment), and the data were coded by subtracting 60 from every entry to reduce the statistics to a uniform table. The surviving coded rows are:]

Type    Coded values           Total T_j    T_j²
A       8, 12, 17, -18, -7         12        144
C       0, 22, 4, 15, 12           53       2809
E       4, 5, 10, 8, -7            20        400

[With grand total T = 54 and ΣT_j² = 3874, the surviving computations are:]

Between-treatments variation:
V_B = \frac{1}{b}\sum T_j^2 - \frac{T^2}{ab} = \frac{1}{5}(3874) - \frac{(54)^2}{(5)(5)} = 658.16, with a - 1 = 5 - 1 = 4 degrees of freedom, so S_B^2 = V_B/4 = 164.54
Within-treatments variation:
V_W = V - V_B = 2541.36 - 658.16 = 1883.2, with a(b - 1) = 5(5 - 1) = 20 degrees of freedom, so S_W^2 = V_W/20 = 94.16
F = S_B^2 / S_W^2 = 164.54/94.16 = 1.75, which is below the critical value F = 4.43 at α = 0.01; hence there is no significant difference between the types at the 0.01 level.
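Only three of the five treatment rows survive in this copy, so the sketch below (illustrative) runs the note's V_B/V_W computation on just those rows; the numbers therefore differ from the full example's.

# One-way ANOVA on the surviving coded rows (a = 3 treatments, b = 5 observations).
groups = [[8, 12, 17, -18, -7],   # type A
          [0, 22, 4, 15, 12],     # type C
          [4, 5, 10, 8, -7]]      # type E
a, b = len(groups), len(groups[0])
T = sum(sum(g) for g in groups)                               # grand total
V = sum(x * x for g in groups for x in g) - T ** 2 / (a * b)  # total variation
VB = sum(sum(g) ** 2 for g in groups) / b - T ** 2 / (a * b)  # between treatments
VW = V - VB                                                   # within treatments
F = (VB / (a - 1)) / (VW / (a * (b - 1)))
print(VB, VW, F)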
Chi Square (χ2)
The Chi-Square test is a statistical procedure for determining the difference between observed and expected data. The test can also be used to determine whether two categorical variables in our data are associated: it helps to find out whether a difference between the variables is due to chance or to a relationship between them.
A chi-square test is a nonparametric statistical test that is used to compare observed and
expected results. The goal of this test is to identify whether a disparity between actual
and predicted data is due to chance or to a link between the variables under
consideration. As a result, the chi-square test is an ideal choice for aiding in our
understanding and interpretation of the connection between our two categorical
variables
Examples
1. Out of 256 visual artists surveyed to find out their zodiac sign, the results were: Aries
(29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19), Scorpio
(20), Sagittarius (23), Capricorn (18), Aquarius (20), and Pisces (23). Using the Chi-square test at the 0.01 and 0.05 significance levels respectively, test the hypothesis that zodiac signs are evenly distributed across visual artists.
Zodiac        Observed (O)    Expected (E)    (O-E)    (O-E)²    (O-E)²/E
Aries 29 21 8 64 3.0476
Taurus 24 21 3 9 0.4286
Gemini 22 21 1 1 0.0476
Cancer 19 21 -2 4 0.1905
Leo 21 21 0 0 0.0000
Virgo 18 21 -3 9 0.4286
Libra 19 21 -2 4 0.1905
Scorpio 20 21 -1 1 0.0476
Sagittarius 23 21 2 4 0.1905
Capricorn 18 21 -3 9 0.4286
Aquarius 20 21 -1 1 0.0476
Pisces 23 21 2 4 0.1905
Ʃ(O-E)²/E = 5.2381
Degrees of freedom: ν = k - 1 = 12 - 1 = 11
At the 0.01 significance level the critical χ² = 24.7, and at the 0.05 significance level it is 19.7. Since both critical values are greater than the calculated χ² = 5.24, we accept the hypothesis that the zodiac signs are evenly distributed across visual artists.
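A sketch (illustrative) reproducing this computation; the critical-value look-up assumes scipy is available:

from scipy import stats  # assumed available; used only for the critical values

observed = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]
expected = sum(observed) / len(observed)  # 256/12 ≈ 21.33 (the note rounds to 21)
chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(chi2)  # ≈ 5.09 with the exact E; the note's rounded E = 21 gives 5.24
for alpha in (0.05, 0.01):
    print(alpha, stats.chi2.ppf(1 - alpha, len(observed) - 1))  # 19.7 and 24.7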
2. The table below shows the outcome of 500 tosses of a pair of dice. Using the Chi-square,
test if the outcomes are fair enough at α = 0.05 and 0.01 respectively
Outcome      2    3    4    5    6    7    8    9    10   11   12
Frequency   15   35   49   58   65   76   72   60   35   29    6
Outcome  Ways  Probability  Expected (E)  Observed (O)  (O-E)     (O-E)²      (O-E)²/E
2        1     0.02778      13.8889       15             1.1111    1.234568    0.0889
3        2     0.05556      27.7778       35             7.2222   52.160494    1.8778
4        3     0.08333      41.6667       49             7.3333   53.777778    1.2907
5        4     0.11111      55.5556       58             2.4444    5.975309    0.1076
6        5     0.13889      69.4444       65            -4.4444   19.753090    0.2844
7        6     0.16667      83.3333       76            -7.3333   53.777778    0.6453
8        5     0.13889      69.4444       72             2.5556    6.530864    0.0940
9        4     0.11111      55.5556       60             4.4444   19.753090    0.3556
10       3     0.08333      41.6667       35            -6.6667   44.444444    1.0667
11       2     0.05556      27.7778       29             1.2222    1.493827    0.0538
12       1     0.02778      13.8889        6            -7.8889   62.234568    4.4809
Ʃ                                                                             10.3456
v = k – 1 = 11 – 1 = 10
At α = 0.05 and ν = 10, the critical χ² = 18.3; at α = 0.01 and ν = 10, the critical χ² = 23.2. Since both critical values are greater than the calculated χ² = 10.35, we accept the hypothesis that the outcomes are fair at both α = 0.05 and α = 0.01.