Engineering Math Class Note II-1
We approach the study of statistics by using a four-step process to learn from data. After the data have been collected and summarized with descriptive statistics, you can use inferential statistics to formally test hypotheses and make estimates about the population. Finally, you can interpret and generalize your findings.
Nature of Variables and Types of Data
Variables can be classified as qualitative or quantitative.
1. Qualitative variables are variables that can be placed into distinct categories according to some characteristic or attribute. For example, when people are categorized according to gender (male or female), the variable gender is qualitative and takes categorical data, say 1 or 2. Other examples are religious preference, ability level and geographical location.
2. Quantitative variables are numerical and can be ordered or ranked. For example,
the variable age is numerical and people can be ranked in order according to the
value of their ages. Other examples are heights, weights and body temperatures.
Quantitative variables can be further classified into two groups: discrete and continuous.
(a) Discrete variables can be assigned values such as 0,1,2,3 and are said to be
countable. Examples of discrete variables are the number of children in a family,
the number of students in the classroom etc. Thus, discrete variables assume
values that can be counted.
(b) Continuous variables, by comparison, can assume all values in an interval between any two specific values. Temperature is a continuous variable, since it can assume all values between any two given temperatures.
Data can also be categorized as purely numerical or not purely numerical; for example, data involving the number of people alongside their mode of collecting information. Data can further assume the nominal level (named categories, e.g. assigning A, B, C), the ordinal level (ordered or ranked), the interval level (precise differences exist) or the ratio level (a true zero exists).
Statistical Tests: Parametric and Non-Parametric Tests
In the literal meaning of the terms, a parametric statistical test is one that makes
assumptions about the parameters (defining properties) of the population
distribution(s) from which one's data are drawn, while a non-parametric test is one that
makes no such assumptions.
Parametric Tests
Parametric statistics involves parameters such as the mean, standard deviation, variance, etc.; it uses the observed data to estimate the parameters of the distribution. Data are often assumed to come from a normal distribution with unknown parameters.
Parametric tests are those that assume that the sample data come from a population that follows a known probability distribution. Thus, parametric methods include tests based on the normal distribution (z tests), the Student's t test and one-way Analysis of Variance (ANOVA).
Non-Parametric Tests
A parametric test requires the sample to come from a normally distributed population. A nonparametric test, by contrast, makes minimal assumptions about the underlying distribution of the data and is often used when the assumptions of parametric tests, such as normality, are not met. Nonparametric tests are also called
"distribution-free" tests because they don't rely on a specific distribution. Examples
include: Chi-Square Test, Wilcoxon Signed Rank Test, Mann-Whitney Test and Kruskal-
Wallis Test.
REGRESSION ANALYSIS
Regression analysis can be used to identify the line or curve which provides the best fit
through a set of data points. This curve can be useful to identify a trend in the data,
whether it is linear, parabolic, or of some other form.
Regression allows researchers to predict or explain the variation in one variable based
on another variable.
• The variable that researchers are trying to explain or predict is called the response
variable. It is also sometimes called the dependent variable because it depends on
another variable.
• The variable that is used to explain or predict the response variable is called the
explanatory variable. It is also sometimes called the independent variable because
it is independent of the other variable.
Applications
There are four broad classes of applications of regression analysis.
• Descriptive or explanatory: interest may be in describing "What factors influence variability in the dependent variable?" For example, the factors contributing to higher sales among a company's sales force.
• Predictive: for example, setting a normal quota or baseline sales. We can also use the estimated equation to identify "normal" and "abnormal" (outlier) observations.
• Comparing Alternative theoretical explanations:
- Consumers use reference price in comparing alternatives,
- Consumers use specific price points in comparing alternatives.
• Decision purpose:
- Estimating variable and fixed costs having calibrated cost function.
- Estimating sales, revenues and profits having calibrated demand function.
- Setting optimal values of marketing mix variables.
- Using estimated equation for “What if” analysis.
Regression Line
Plotting the data points below in a graph
N X Y
1 2 3
2 4 5
3 8 7
4 11 7.5
5 14 8
6 18 9
7 21 12
8 24 14
9 25 17
10 28 19
We have:

[Data plot: scatter of the ten (X, Y) points with the fitted least-squares line y = 0.5481x + 1.6545, R² = 0.9214.]
As explained under "Scatterplots and Correlation", the chart above shows that a scatterplot of two variables that are strongly related tends to describe a line. Software can be used to compute this line precisely.
• This process is called a regression analysis.
• A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes.
• A regression line can be used to predict the value of y for a given value of x.
• Regression analysis identifies a regression line.
• The regression line shows how much and in what direction the response variable
changes when the explanatory variable changes.
Least-Squares Regression Line
• The regression line is obtained by applying what is called the least-squares
computation procedure.
• On the graph presented earlier, it was explained that individual points are located
near the line, but very few points, if any, are located exactly on the line.
• To obtain the best approximation of the data, the line is placed in the location
where the distance from all the points to the line is minimal.
• In other words, to predict y, the regression line needs to be as close as possible to
the data points in the vertical (y) direction.
• Some of the points are above the line and some are below the line. If the
differences from these points to the points on the line are computed, some
differences will be positive while others will be negative. Direction is not
important, so the differences are squared (to eliminate the negatives).
• This method is called the least-squares computation procedure because it aims to
minimize the squared distances between each of the points and the line.
The line is represented by the straight-line equation below:

Y = a_0 + a_1 X

This is called the regression line of Y on X. The coefficients a_0 and a_1 are determined from equations 1, 2, 3 and 4 below:

\sum Y = a_0 N + a_1 \sum X    (1)

\sum XY = a_0 \sum X + a_1 \sum X^2    (2)
Solving equations 1 and 2 above simultaneously provides the solution for the coefficients a_0 and a_1. They can also be calculated directly from equations 3 and 4 below:
a_0 = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N\sum X^2 - (\sum X)^2}    (3)

a_1 = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2}    (4)
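As a quick check of equations (3) and (4), the short Python sketch below (an illustration added here, not part of the original note) fits the line to the ten data points plotted above and recovers the slope and intercept shown on the chart.

# Least-squares fit of Y = a0 + a1*X via the normal-equation solutions (3) and (4).
X = [2, 4, 8, 11, 14, 18, 21, 24, 25, 28]
Y = [3, 5, 7, 7.5, 8, 9, 12, 14, 17, 19]
N = len(X)
Sx, Sy = sum(X), sum(Y)
Sxx = sum(x * x for x in X)
Sxy = sum(x * y for x, y in zip(X, Y))
a1 = (N * Sxy - Sx * Sy) / (N * Sxx - Sx ** 2)    # slope, equation (4)
a0 = (Sy * Sxx - Sx * Sxy) / (N * Sxx - Sx ** 2)  # intercept, equation (3)
print(a0, a1)  # ≈ 1.6545 and 0.5481, matching the chart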
CORRELATION
Correlation analysis measures the degree of relationship between variables. Among its types:
- Partial correlation: the analysis recognizes more than two variables but considers only two of them, keeping the others constant.
- Total correlation: based on all the relevant variables, which is normally not feasible.
Linear correlation: Correlation is said to be linear when the amount of change in one
variable tends to bear a constant ratio to the amount of change in the other. The graph of
the variables having a linear relationship will form a straight line.
Non-Linear correlation: The correlation would be nonlinear if the amount of change in
one variable does not bear a constant ratio to the amount of change in the other variable.
CORRELATION COEFFICIENT
Coefficient of Correlation, denoted by r: the coefficient of correlation r measures the degree of linear relationship between two variables, say x and y. It was developed by Karl Pearson. The correlation coefficient r ranges between -1 and +1; the closer |r| is to unity, the stronger the linear correlation, and vice versa. The correlation coefficient is calculated from the equation below:

r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}
COEFFICIENT OF DETERMINATION
A convenient way of interpreting the value of the correlation coefficient is to use its square, which is called the coefficient of determination (R²).
Suppose r = 0.9; then R² = 0.81, meaning that 81% of the variation in the dependent variable has been explained by the independent variable.
The maximum value of R² is 1, because it is possible to explain all of the variation in y but not more than all of it.
Coefficient of Determination = Explained variation / Total variation
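As an illustration (mine, not the note's), the sketch below computes r and R² for the ten data points used in the regression plot; it reproduces the R² ≈ 0.92 shown on the chart.

import math

X = [2, 4, 8, 11, 14, 18, 21, 24, 25, 28]
Y = [3, 5, 7, 7.5, 8, 9, 12, 14, 17, 19]
N = len(X)
Sx, Sy = sum(X), sum(Y)
Sxx = sum(x * x for x in X)
Syy = sum(y * y for y in Y)
Sxy = sum(x * y for x, y in zip(X, Y))
# Pearson correlation coefficient from the sums formula above.
r = (N * Sxy - Sx * Sy) / math.sqrt((N * Sxx - Sx ** 2) * (N * Syy - Sy ** 2))
print(r, r ** 2)  # r ≈ 0.96, R² ≈ 0.9214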
Spearman's Rank Coefficient of Correlation
For statistical series in which the variables under study cannot be measured quantitatively but can be arranged in serial order, Pearson's correlation coefficient cannot be used; Spearman's rank correlation is used instead.
• This method is useful where we can give ranks but not the actual data (qualitative terms).
• This method is used where the initial data are in the form of ranks.

r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)}

where
r_s = rank correlation coefficient
D = difference of ranks between the paired items in the two series
N = total number of observations.
The value of the rank correlation coefficient r_s ranges from -1 to 1:
• If r_s = 1, there is complete agreement in the order of the ranks, and the ranks are in the same direction.
• If r_s = -1, there is complete agreement in the order of the ranks, but the ranks are in opposite directions.
• If r_s = 0, there is no correlation.
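A minimal sketch (illustrative; the helper names are mine) of Spearman's formula, assigning average ranks to tied values:

def ranks(values):
    # 1-based rank of each value; tied values share the average of their ranks.
    svals = sorted(values)
    return [svals.index(v) + svals.count(v) / 2 + 0.5 for v in values]

def spearman(x, y):
    rx, ry = ranks(x), ranks(y)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    n = len(x)
    return 1 - 6 * d2 / (n * (n * n - 1))  # r_s = 1 - 6*ΣD²/(N(N² - 1))

# Hypothetical example: two judges ranking five items.
print(spearman([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]))  # 0.8

Note that when ties are present, applying the 6ΣD² formula to average ranks is a common approximation.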
Example 1

Fiddling with Phone (X)   20     5     8    10    13     7    13     5    25    14
GPA (Y)                  2.35  3.80  3.50  2.75  3.25  3.40  2.90  3.50  2.25  2.74

The table above shows a survey on the connection between Grade Point Average (GPA) and the number of hours spent fiddling with phones per week, conducted on 10 University of Uyo students.
(a) Determine the regression line (equation) that represents the expected effect of phone fiddling on GPA.
(b) Determine the correlation and rank correlation coefficients of the relationship, and make your inference on the relationship.
Solution
Completing the statistics table gives
S/N    X      Y      X²      Y²        XY
1 20 2.35 400 5.5225 47
2 5 3.8 25 14.44 19
3 8 3.5 64 12.25 28
4 10 2.75 100 7.5625 27.5
5 13 3.25 169 10.5625 42.25
6 7 3.4 49 11.56 23.8
7 13 2.9 169 8.41 37.7
8 5 3.5 25 12.25 17.5
9 25 2.25 625 5.0625 56.25
10 14 2.74 196 7.5076 38.36
Ʃ (N = 10)    120    30.44    1822    95.1276    337.36
The regression line (equation) that represents the expected effect of phone fiddling on GPA is the regression line of Y on X, with the straight-line equation

Y = a_0 + a_1 X

where

a_0 = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N\sum X^2 - (\sum X)^2} = \frac{(30.44)(1822) - (120)(337.36)}{10(1822) - (120)^2} = \frac{14978.48}{3820} = 3.92

a_1 = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2} = \frac{10(337.36) - (120)(30.44)}{10(1822) - (120)^2} = \frac{-279.20}{3820} = -0.073

Y = 3.92 - 0.073X
The correlation coefficient r is given by

r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}} = \frac{10(337.36) - (120)(30.44)}{\sqrt{[10(1822) - (120)^2][10(95.1276) - (30.44)^2]}} = \frac{-279.20}{\sqrt{(3820)(24.68)}} = \frac{-279.20}{307.06} = -0.91
The rank correlation coefficient r_s is obtained by ranking the fiddling times and the GPAs separately and taking the difference of ranks D for each student. [The ranking table is only partly legible in this copy; its surviving total is \sum D^2 = 312.5.] Hence

r_s = 1 - \frac{6\sum D^2}{N(N^2 - 1)} = 1 - \frac{6(312.5)}{10(100 - 1)} = 1 - \frac{1875}{990} = -0.89

Both r = -0.91 and r_s = -0.89 indicate a strong negative relationship: the more hours a student spends fiddling with the phone, the lower the expected GPA.
Example 2
The table below shows the final grade obtained in mathematics and physics by 10
students selected at random from a large group of students.
Mathematics (X) 75 80 93 65 87 71 98 68 84 77
Physics (Y) 82 78 86 72 91 80 95 72 89 74
(a)(i) Determine the straight-line equation of the relationship, using X as the independent
variable
(ii) What would be the expected grade in physics for a student who scored 75 in
mathematics?
(iii) If a student scored 95 in physics, what grade is expected in mathematics?
S/N    X      Y      X²       Y²       XY
1 75 82 5625 6724 6150
2 80 78 6400 6084 6240
3 93 86 8649 7396 7998
4 65 72 4225 5184 4680
5 87 91 7569 8281 7917
6 71 80 5041 6400 5680
7 98 95 9604 9025 9310
8 68 72 4624 5184 4896
9 84 89 7056 7921 7476
10 77 74 5929 5476 5698
Ʃ (N = 10)    798    819    64722    67675    66045
Y = a_0 + a_1 X

where

a_0 = \frac{(\sum Y)(\sum X^2) - (\sum X)(\sum XY)}{N\sum X^2 - (\sum X)^2} = \frac{(819)(64722) - (798)(66045)}{10(64722) - (798)^2} = \frac{53007318 - 52703910}{647220 - 636804} = \frac{303408}{10416} = 29.13

a_1 = \frac{N\sum XY - (\sum X)(\sum Y)}{N\sum X^2 - (\sum X)^2} = \frac{10(66045) - (798)(819)}{10(64722) - (798)^2} = \frac{660450 - 653562}{10416} = \frac{6888}{10416} = 0.66

Y = 29.13 + 0.66X
(ii) The expected grade in physics for a student who scored 75 in mathematics:

Y = 29.13 + 0.66(75) = 78.63

(iii) The expected mathematics grade of a student who scored 95 in physics:

X = \frac{Y - 29.13}{0.66} = \frac{95 - 29.13}{0.66} = 99.80
(b)(i) Determine the correlation and rank correlation coefficients of the score relationship.
Ranking the mathematics scores (X) and the physics scores (Y) separately, with tied values sharing the average rank:

X     Rank of X    Y     Rank of Y    D      D²
75    4            82    6           -2     4
80    6            78    4            2     4
93    9            86    7            2     4
65    1            72    1.5         -0.5   0.25
87    8            91    9           -1     1
71    3            80    5           -2     4
98    10           95    10           0     0
68    2            72    1.5          0.5   0.25
84    7            89    8           -1     1
77    5            74    3            2     4
                                     Ʃ     22.5

The Pearson correlation coefficient:

r = \frac{10(66045) - (798)(819)}{\sqrt{[10(64722) - (798)^2][10(67675) - (819)^2]}} = \frac{6888}{\sqrt{(10416)(5989)}} = \frac{6888}{7898.2} = 0.87

The rank correlation coefficient:

r_s = 1 - \frac{6(22.5)}{10(10^2 - 1)} = 1 - \frac{135}{990} = 0.86

Both coefficients indicate a strong positive relationship between the mathematics and physics grades.
SAMPLING THEORY
Sampling theory is the study of the relationship between a population and samples drawn from the population. It is useful in estimating unknown population quantities such as the mean and variance, often called population parameters or simply parameters, from knowledge of the corresponding sample quantities such as the sample mean and variance, often called sample statistics or simply statistics. Sampling theory is also used to determine whether the observed differences between samples are due to chance variation or whether they are really significant.
Basically, a study of the inferences made concerning a population by using samples drawn from it, together with the accuracy of such inferences obtained by using probability theory, is called statistical inference.
Statistical inference draws conclusions about the population on the basis of the information available in a sample which has been drawn from the population by a random sampling technique/procedure. There are two branches of statistical inference, namely ESTIMATION and TESTING OF HYPOTHESIS.
Basic Definitions:
Population: Any collection of individuals under study is said to be a population (universe). The individuals are often called the members or the units of the population, and may be physical objects or measurements expressed numerically or otherwise.
Sample: A part or small section selected from the population is called a sample, and the process of such selection is called sampling.
(The fundamental object of sampling is to get as much information as possible about the whole universe by examining only a part of it. An attempt is thus made through sampling to give the maximum information about the parent universe with the minimum effort.)
Parameters: Statistical measurements such as Mean, Variance, standard deviation, etc. of
the population are called parameters.
Hypothesis: a statement about a population. Usually it is required to make decisions about populations on the basis of sample information; such decisions are called statistical decisions. In attempting to reach decisions, it is often necessary to make assumptions about the population involved. Such assumptions, which are not necessarily true, are called statistical hypotheses.
Sampling Process
1. Random Samples and Random Numbers
In order for the conclusions of sampling theory and statistical inference to be valid, samples must be chosen so as to be representative of the population.
A study of sampling methods and of the related problems that arise is called the design of the experiment.
One method by which a representative sample may be obtained is the process called random sampling, which gives each member of the population an equal chance of being included in the sample. One technique for obtaining a random sample is to assign numbers to each member of the population, write the numbers on pieces of paper, place them in an urn, mix thoroughly, and then draw numbers from the urn. Another method is to use a table of random numbers specially constructed for such purposes.
2. Sampling With Replacement and Without Replacement
If we draw a number from the urn, we have the choice of replacing or not replacing the
number into the urn before a second drawing. In the first case, the number can come up
again and again, whereas in the second it can only come up once.
Sampling where each member of the population may be chosen more than once is called sampling with replacement, while sampling where each member of the population cannot be chosen more than once is called sampling without replacement.
Populations are either finite or infinite. If we draw 10 balls successively without
replacement from an urn containing 100 balls, we are sampling from a finite population.
However, if we toss a coin 50 times and count the number of heads, we are sampling from
an infinite population.
SAMPLING DISTRIBUTION
Consider all possible samples of size N that can be drawn from a given population (either with or without replacement). For each sample, we compute a statistic (such as the mean or standard deviation) that will vary from sample to sample.
standard deviation) that will vary from sample to sample. In this manner we obtain a
distribution of the statistic that is called its sampling distribution.
If, for example, the particular statistic used is the sample mean, then the distribution is called the sampling distribution of means. Similarly, we could have sampling distributions of standard deviations, variances, medians, proportions, etc.
Sampling Distribution of Means
Suppose that all possible samples of size N are drawn without replacement from a finite population of size N_P > N. If we denote the mean and standard deviation of the sampling distribution of means by \mu_{\bar X} and \sigma_{\bar X}, and the population mean and standard deviation by \mu and \sigma respectively, then

\mu_{\bar X} = \mu \qquad and \qquad \sigma_{\bar X} = \frac{\sigma}{\sqrt{N}} \sqrt{\frac{N_P - N}{N_P - 1}}

If the population is infinite, or if sampling is with replacement, the above results reduce to

\mu_{\bar X} = \mu \qquad and \qquad \sigma_{\bar X} = \frac{\sigma}{\sqrt{N}}
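A quick simulation sketch (illustrative, with made-up population values) checking \sigma_{\bar X} = \sigma/\sqrt{N} for sampling with replacement:

import random
import statistics

random.seed(1)
population = [random.gauss(50, 10) for _ in range(10_000)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

N = 25
# Sampling with replacement: the mean of each of 5000 samples of size N.
means = [statistics.mean(random.choices(population, k=N)) for _ in range(5_000)]
print(statistics.mean(means), mu)                  # both ≈ 50
print(statistics.pstdev(means), sigma / N ** 0.5)  # both ≈ 2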
Sampling Distribution of Proportions
Suppose that a population is infinite and that the probability of occurrence of an event (called its success) is p, while the probability of non-occurrence of the event is q = 1 - p. For example, the population may be all tosses of a fair coin, in which the probability of the event 'heads' is p = ½. Consider all possible samples of size N drawn from this population, and for each sample determine the proportion P of successes. In the case of the coin, P would be the proportion of heads turning up in N tosses. We obtain a sampling distribution of proportions whose mean \mu_P and standard deviation \sigma_P are given by

\mu_P = p \qquad and \qquad \sigma_P = \sqrt{\frac{pq}{N}} = \sqrt{\frac{p(1 - p)}{N}}
Sampling Distribution of Differences
Suppose that we are given two populations. For each sample of size N_1 drawn from the first population, let us compute a statistic S_1; this yields a sampling distribution for the statistic S_1 whose mean and standard deviation we denote by \mu_{S_1} and \sigma_{S_1}, respectively. Similarly, for each sample of size N_2 drawn from the second population, let us compute a statistic S_2; this yields a sampling distribution for the statistic S_2 whose mean and standard deviation we denote by \mu_{S_2} and \sigma_{S_2}, respectively. From all possible combinations of these samples from the two populations we can obtain a distribution of the differences, S_1 - S_2, which is called the sampling distribution of differences of the statistics. The mean and standard deviation of this sampling distribution, denoted respectively by \mu_{S_1 - S_2} and \sigma_{S_1 - S_2}, are given by

\mu_{S_1 - S_2} = \mu_{S_1} - \mu_{S_2} \qquad and \qquad \sigma_{S_1 - S_2} = \sqrt{\sigma_{S_1}^2 + \sigma_{S_2}^2}

(provided that the samples chosen do not in any way depend on each other; that is, the samples are independent).
If S_1 and S_2 are the sample means from the two populations, which we denote by \bar X_1 and \bar X_2 respectively, then the sampling distribution of the differences of means, for infinite populations with means and standard deviations (\mu_1, \sigma_1) and (\mu_2, \sigma_2) respectively, is given by

\mu_{\bar X_1 - \bar X_2} = \mu_{\bar X_1} - \mu_{\bar X_2} = \mu_1 - \mu_2 \qquad and \qquad \sigma_{\bar X_1 - \bar X_2} = \sqrt{\sigma_{\bar X_1}^2 + \sigma_{\bar X_2}^2} = \sqrt{\frac{\sigma_1^2}{N_1} + \frac{\sigma_2^2}{N_2}}
STATISTICAL DECISION THEORY
Statistical Decisions
Most times in practice, we are required to make decisions about populations on the basis
of sample information. Such decisions are called statistical decisions.
Statistical Hypothesis or Test Hypothesis
Oftentimes, in an attempt to reach decisions, it is useful to make assumptions (or guesses) about the population involved. Such assumptions, which may or may not be true, are called statistical hypotheses. They are generally statements about the probability distribution of the population.
We often want to determine whether a claim is true or false. Such a claim is called a hypothesis.
• Null hypothesis: A specific hypothesis to be tested in an experiment. The null hypothesis is usually labeled H0. A hypothesis which is tested under the assumption that it is true is called a null hypothesis.
• Alternative hypothesis: A hypothesis that is different from the null hypothesis, and which we usually want to show is true (thereby showing that the null hypothesis is false). The alternative hypothesis is usually labeled H1. (The hypothesis against which we test the null hypothesis is the alternative hypothesis.)
• If the alternative involves showing that some value is greater than or less than a
number, there is some value c that separates the null hypothesis rejection region
from the fail to reject region. This value is known as the critical value.
The null hypothesis is tested through the following procedure (illustrated in the sketch after the list):
1. Determine the null hypothesis and an alternative hypothesis.
2. Pick an appropriate sample.
3. Use measurements from the sample to determine the likelihood of the null
hypothesis.
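As an illustration of this procedure (a sketch with hypothetical numbers, not taken from the note), a one-sample two-tailed z-test of H0: μ = μ0 against H1: μ ≠ μ0 when the population standard deviation σ is known:

from statistics import NormalDist

def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    # Step 3: measure how far the sample mean lies from the hypothesised mean.
    z = (sample_mean - mu0) / (sigma / n ** 0.5)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = 0.05
    return z, z_crit, abs(z) > z_crit             # True means "reject H0"

# Hypothetical sample of 36 with mean 52, claimed population N(50, 6).
print(z_test_two_tailed(52, 50, 6, 36))  # z = 2.0 > 1.96, so reject H0 at 5%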
Test: Test is a rule through which we test the null hypothesis against the given alternative
hypothesis.
• Type I error: If the null hypothesis is true but the sample mean is such that the null hypothesis is rejected, a Type I error occurs. The probability that such an error will occur is called the α risk.
• Type II error: If the null hypothesis is false but the sample mean is such that the null hypothesis cannot be rejected, a Type II error occurs. The probability that such an error will occur is called the β risk.
The wrong decision of rejecting a null hypothesis H0, when it is true is called the Type I
Error i.e. we reject H0 when it is true. Similarly, the wrong decision of accepting the null
hypothesis H0 when it is not true is called the Type II Error i.e. we accept H0 when H1 is
true.
Level of Significance: The probability level below which we reject the hypothesis is called
level of significance. The levels of significance usually employed in testing of hypothesis
are 5% and 1%.
P-Value: The p-value is the probability, computed assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed. The p-value is used as an alternative to rejection points: it is the smallest level of significance at which the null hypothesis would be rejected.
Critical Region and Acceptance Region: The set of all possible values of a test statistic t is called the sample space. The part of the sample space which amounts to the rejection of the null hypothesis H0 is called the critical region or region of rejection; the remainder is the acceptance region.
Probability Distribution Function
Probability Distribution refers to the function that gives the probability of all possible
values of a random variable. It shows how the probabilities are assigned to the different
possible values of the random variable.
Common types of probability distributions Include:
• Binomial Distribution.
• Bernoulli Distribution.
• Normal Distribution.
• Geometric Distribution.
Probability Distribution Graph
The graph that plots a probability distribution function is called a probability distribution graph. These graphs help us to visualize the probability distribution of a random variable and to find the required solution easily.
The sum of all the probabilities in any discrete distribution is one, and for a continuous distribution of random variables the area under the graph is equal to 1. The distribution graph of a continuous distribution function, showing the probability Pr(a < X < b) that the random variable X lies between a and b, is made using the probability density function. [Graph not reproduced in this copy.]
Example 1: Let a pair of fair dice be tossed, and let X denote the sum of the points obtained. Plot the graph of P(X) against X.
Solution:
X 2 3 4 5 6 7 8 9 10 11 12
P(X) 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36
[Graph: P(X) plotted against X, rising from P(2) = 1/36 to a peak of P(7) = 6/36 ≈ 0.167 and falling symmetrically to P(12) = 1/36.]
P(X) here is the probability function (probability mass function) of the discrete random variable X. For a continuous random variable, the corresponding function is called a probability density function, and when such a function is given we say that a continuous probability distribution has been defined.
Example 2: The number of old people living in houses on a randomly selected city block
is described by the following probability distribution.
No. of old people, X    3      4      5      6
Probability, P(X)       0.5    0.25   0.1    ?
What is the probability that 6 or more old people live in a randomly selected house?
Solution:
The sum of all the probabilities is equal to 1, so the probability that six or more old people live in a house is
P(X ≥ 6) = 1 - (0.50 + 0.25 + 0.10) = 1 - 0.85 = 0.15
Thus, the probability that six or more old people live in a house is 0.15.
14
THE BINOMIAL DISTRIBUTION
If p is the probability that an event will happen in any single trial (called the probability of success) and q = 1 - p is the probability that it will fail to happen in any single trial (called the probability of failure), then the probability that the event will happen exactly X times in N trials (that is, X successes and N - X failures will occur) is given by
p(X) = \binom{N}{X} p^X q^{N - X} = \frac{N!}{X!(N - X)!} p^X q^{N - X}

where X = 0, 1, 2, …, N; N! = N(N - 1)(N - 2) ⋯ 1; and 0! = 1.
Example 1: What is the probability of getting exactly 2 heads in 6 tosses of a fair coin?
Solution: Using the binomial distribution formula with N = 6, X = 2, p = ½ and q = ½:

p(2) = \binom{6}{2} \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^{6-2} = \frac{6!}{2!\,4!} \left(\frac{1}{2}\right)^6 = \frac{15}{64}

Example 2:
What is the probability of getting at least 4 heads in 6 tosses of a fair coin?
Solution: Using the binomial distribution formula with N = 6; X = 4, 5, 6; p = ½ and q = ½:

p(X \ge 4) = \binom{6}{4} \left(\frac{1}{2}\right)^6 + \binom{6}{5} \left(\frac{1}{2}\right)^6 + \binom{6}{6} \left(\frac{1}{2}\right)^6 = \frac{15}{64} + \frac{6}{64} + \frac{1}{64} = \frac{22}{64} = \frac{11}{32}

An alternative formulation of the binomial distribution is given by the expansion

(q + p)^N = q^N + \binom{N}{1} q^{N-1} p + \binom{N}{2} q^{N-2} p^2 + \dots + p^N

where \binom{N}{1}, \binom{N}{2}, … are the binomial coefficients.
Some properties of the binomial distribution are:
Mean: μ = Np;  Variance: σ² = Npq;  Standard deviation: σ = \sqrt{Npq}
Example 3: In 100 tosses of a fair coin, the mean number of heads is μ = Np = 100(½) = 50, the variance is σ² = Npq = 100(½)(½) = 25, and the standard deviation is σ = \sqrt{25} = 5.
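The sketch below (an illustration, assuming Python 3.8+ for math.comb) evaluates the binomial formula and checks Examples 1-3:

from math import comb

def binom_pmf(X, N, p):
    # p(X) = C(N, X) * p^X * q^(N - X), with q = 1 - p
    return comb(N, X) * p ** X * (1 - p) ** (N - X)

print(binom_pmf(2, 6, 0.5))                          # 15/64 ≈ 0.2344 (Example 1)
print(sum(binom_pmf(x, 6, 0.5) for x in (4, 5, 6)))  # 11/32 ≈ 0.3438 (Example 2)
N, p = 100, 0.5
print(N * p, N * p * (1 - p), (N * p * (1 - p)) ** 0.5)  # 50, 25, 5 (Example 3)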
NORMAL DISTRIBUTION
A normal distribution is a continuous probability distribution whose probability density function gives a symmetrical bell curve. Simply put, it is a plot of the probability function of a variable whose data are concentrated around one central point, with the remaining points tapering off symmetrically towards the two opposite ends.
A normal distribution has a probability distribution that is centred around the mean. This
means that the distribution has more data around the mean. The data distribution
decreases as you move away from the center. The resulting curve is symmetrical about
the mean and forms a bell-shaped distribution.
Consider the graph below, which shows the probability distribution of heights in a class:

[Graph not reproduced: a bell-shaped curve of height frequencies centred on the mean height.]

The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a standard of reference for many probability problems.
The Normal Distribution Curve
A normal distribution, sometimes called the bell curve, is a distribution that occurs
naturally in many situations. For example, the bell curve is seen in general tests like the
SAT and GRE. The bulk of students will score the average (C), while smaller numbers of
students will score a B or D. An even smaller percentage of students score an F or an A.
This creates a distribution that resembles a bell (hence the nickname). The bell curve is
symmetrical. Half of the data will fall to the left of the mean; half will fall to the right.
Using the equation for a normal density function directly could be difficult and tedious. Thus, the equation below, called the standard score, is used:

z = \frac{x - \mu}{\sigma} = \frac{\text{data point} - \text{mean}}{\text{standard deviation}}
The z-score is used to tell how far from the mean the data point is. You calculate it using
the mean and standard deviation, so it can also be said that the Z-Score is how many
standard deviations below or above the mean the data is.
The z-score is used to standardize your normal distribution. Using the z-score, you can
convert each data point into a value in terms of mean and standard deviation, effectively
converting the graph into a scaled-down version. The z-score tells you how far each data
point is from the mean in steps of the standard deviation. So, with the mean and standard deviation, you can plot all the points on the graph.
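A minimal sketch (mine, with hypothetical data) of standardising data points into z-scores and recovering them with X = Zσ + μ:

import statistics

data = [26, 33, 65, 28, 34]  # hypothetical daily travel times in minutes
mu = statistics.mean(data)
sigma = statistics.pstdev(data)  # population standard deviation

z = [(x - mu) / sigma for x in data]
print(z)                              # negative: below the mean; positive: above
print([zi * sigma + mu for zi in z])  # X = Z*sigma + mu recovers the data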
Examples
1. A summary of the daily travel time of a person commuting from work is given below; the values are in minutes. Calculate the mean, standard deviation, and z-scores.
[Data table not reproduced in this copy.]
Now, subtract the mean from each data point and find the variance and standard deviation.
The z-score tells us where a data point falls relative to the other points: how far away from the mean the point is, in steps of the standard deviation. Now, calculate the z-score for each point.
Negative values tell you that the point lies below the mean, and positive values imply that the point is above the mean. Multiplying each z-value by the standard deviation and adding the mean recovers the original data point:

X = Zσ + μ
We now consult the Normal Distribution table for standard scores
2. The sample of scores in an Engineering course exam were 20, 15, 26, 32, 18, 28, 35, 14,
26, 22, and 17 out of 60 marks. The lecturer felt the exam must have been really hard and
decided to standardize all the scores so that failure would be from scores 1 standard
deviation (1σ) below the mean. What are the scores of the students who would fail if the
standard deviation is 6.6?
The equation for the standard score is:

z = \frac{x - \bar x}{sd}

The mean score is \bar x = (20 + 15 + 26 + 32 + 18 + 28 + 35 + 14 + 26 + 22 + 17)/11 = 253/11 = 23. A score one standard deviation below the mean is 23 - 1(6.6) = 16.4, so the students who would fail are those with scores below 16.4: the scores 15 and 14.
3. Find the area under the standard normal curve to the right of z = 2.05 or to the left of z = -1.44. [Statement reconstructed from the surviving computations.]
The z-table value for 2.05 is 0.4798, so the area to the right of z = 2.05 is α = 0.5000 - 0.4798 = 0.0202.
The z-table value for -1.44 is 0.4251, so the area to the left of z = -1.44 is α = 0.5000 - 0.4251 = 0.0749.
The area to the right of z = 2.05 or to the left of z = -1.44 is therefore 0.0202 + 0.0749 = 0.0951.
4. A survey conducted on 1000 middle school students on time spent on social media
weekly has a normal distribution with mean of 20 hours and standard deviation of 5.0
hours. Determine the percentage and number of students
• Who spend less than 25 hours per week:

z = \frac{x - \bar x}{sd} = \frac{25 - 20}{5} = 1.00

The z-table value for 1.00 is 0.3413. Less than 25 hours per week: α = 0.5000 + 0.3413 = 0.8413, representing 84.13% of the population.
Thus, the number of students = 0.8413 × 1000 ≈ 841 students.
• Who spend over 30 hours per week:

z = \frac{30 - 20}{5} = 2.00

The z-table value for 2.00 is 0.4772. Over 30 hours per week: α = 0.5000 - 0.4772 = 0.0228, representing 2.28% of the population.
Thus, the number of students = 0.0228 × 1000 ≈ 23 students.
• What is the remaining population?
1000 – (841 + 23) = 136 students
5. The inner diameter of nuts produced by a company is normally distributed with mean 0.500 inches and standard deviation 0.005 inches. The nuts are considered defective if their inner diameter is less than 0.490 inches or greater than 0.510 inches. Find the percentage of defective nuts. If the company produces 20 million nuts per annum and the cost of producing one nut is ₦12.50, calculate the annual losses due to defective nuts if all defective nuts are counted as complete waste.

z = \frac{x - \bar x}{sd}

For the lower inner diameter: z = \frac{0.490 - 0.500}{0.005} = -2.00
For the larger inner diameter: z = \frac{0.510 - 0.500}{0.005} = 2.00

The z-table value at 2.00 is 0.4772.
The area to the left of z = -2 equals the area to the right of z = 2, giving a total area α = 2(0.5000 - 0.4772) = 0.0456.
The percentage of defective nuts is 0.0456 × 100 = 4.56%.
The annual loss incurred as a result of defective nuts = 20,000,000 × 0.0456 × ₦12.50 = ₦11,400,000.
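The table look-ups in Examples 4 and 5 can be reproduced with the standard normal CDF; a sketch (illustrative) using Python's statistics.NormalDist:

from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1
print(Z.cdf((25 - 20) / 5))      # ≈ 0.8413: less than 25 hours (Example 4)
print(1 - Z.cdf((30 - 20) / 5))  # ≈ 0.0228: over 30 hours (Example 4)

frac_defective = Z.cdf(-2.0) + (1 - Z.cdf(2.0))  # ≈ 0.0455 (Example 5)
print(frac_defective, 20_000_000 * frac_defective * 12.50)  # ≈ ₦11.4 million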
Testing of Normal Distributions
For a Two-Tailed Test
• The acceptance region for the α = 0.05 significance level lies between -1.96 < z < 1.96 (0.95 acceptance region, with two 0.025 critical/rejection regions).
• The acceptance region for the α = 0.01 significance level lies between -2.58 < z < 2.58 (0.99 acceptance region, with two 0.005 critical/rejection regions).
For a One-Tailed Test
• The critical value for the α = 0.05 significance level is z = 1.645: the acceptance region is z < 1.645 for a right-tailed test (or z > -1.645 for a left-tailed test), with a single 0.05 critical/rejection region.
• The critical value for the α = 0.01 significance level is z = 2.33: the acceptance region is z < 2.33 for a right-tailed test (or z > -2.33 for a left-tailed test), with a single 0.01 critical/rejection region.
These critical values can be regenerated in code, as shown in the sketch below.
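A sketch (assuming Python 3.8+ for statistics.NormalDist) that regenerates the critical z values quoted above from the inverse normal CDF:

from statistics import NormalDist

Z = NormalDist()
for alpha in (0.05, 0.01):
    two_tailed = Z.inv_cdf(1 - alpha / 2)  # 1.96 and 2.58
    one_tailed = Z.inv_cdf(1 - alpha)      # 1.645 and 2.33
    print(alpha, two_tailed, one_tailed)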
SMALL SAMPLING THEORY
Student’s t Test
Student's t-test is a statistical method of testing hypotheses about the mean of a small sample drawn from a normally distributed population when the population standard deviation is unknown.
Examples
1. Using the Student's t-test, infer at the 0.05 significance level whether a washer-producing machine (set for a washer thickness of 0.050 inches) is in proper working condition, if the thicknesses of randomly selected samples of washers produced lately by the machine are (0.050, 0.053, 0.051, 0.048, 0.050, 0.058, 0.046, 0.058, 0.056 and 0.060) inches, with a standard deviation of 0.003 inches.
The sample mean thickness is

\bar x = \frac{0.050 + 0.053 + 0.051 + 0.048 + 0.050 + 0.058 + 0.046 + 0.058 + 0.056 + 0.060}{10} = 0.053

t = \frac{\bar x - \mu}{sd} \sqrt{N - 1} = \frac{0.053 - 0.050}{0.003} \sqrt{10 - 1} = (1)\sqrt{9} = 3.00
This is a two-tailed test: at the 0.05 significance level, 1 - 0.05/2 = 0.975, so the acceptance region is -t.975 < t < t.975, with
ν = N - 1 = 10 - 1 = 9
Thus 9t.975 = 2.26, which is less than 3.00; therefore we reject the hypothesis that the machine is in good working condition at α = 0.05.
At the 0.01 significance level, 1 - 0.01/2 = 0.995, so the acceptance region is -t.995 < t < t.995, with ν = 9.
Thus 9t.995 = 3.25, which is greater than 3.00; therefore we accept the hypothesis that the machine is in good working condition at α = 0.01.
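Example 1 can be checked in code. The t statistic needs only the arithmetic above; the critical-value look-up below assumes scipy is installed:

import statistics
from scipy import stats  # assumed available; used only for the critical values

x = [0.050, 0.053, 0.051, 0.048, 0.050, 0.058, 0.046, 0.058, 0.056, 0.060]
N, mu0, sd = len(x), 0.050, 0.003       # sd as given in the problem
xbar = statistics.mean(x)               # 0.053
t = (xbar - mu0) / sd * (N - 1) ** 0.5  # 3.00
for alpha in (0.05, 0.01):
    t_crit = stats.t.ppf(1 - alpha / 2, N - 1)  # 2.26 at 0.05, 3.25 at 0.01
    print(alpha, t, t_crit, abs(t) > t_crit)    # True means reject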
2. A machine used by a company to produce plates of thickness 0.050 in, was checked for
efficiency. A sample of 15 plates chosen at random gave a mean thickness of 0.053 inches
and standard deviation of 0.003 inches. Test the hypothesis that the machine is efficient
at 0.05 and 0.01 significant levels respectively.
t = \frac{\bar x - \mu}{sd} \sqrt{N - 1} = \frac{0.053 - 0.050}{0.003} \sqrt{15 - 1} = \sqrt{14} = 3.74

This is a two-tailed test: at the 0.05 significance level, 1 - 0.05/2 = 0.975, so the acceptance region is -t.975 < t < t.975, with
ν = N - 1 = 15 - 1 = 14
Thus 14t.975 = 2.14, which is less than 3.74; therefore we reject the hypothesis that the machine is efficient at α = 0.05.
At the 0.01 significance level, 1 - 0.01/2 = 0.995, so the acceptance region is -t.995 < t < t.995, with ν = 14.
Thus 14t.995 = 2.98, which is also less than 3.74, so we also reject the hypothesis that the machine is efficient at α = 0.01.
Analysis of Variance (ANOVA)
Examples
[The statement of this worked example is not fully reproduced in this copy; parts of its solution survive. Five types, A to E, were compared with five observations each (a = 5 treatments, b = 5 observations per treatment), and the data were coded by subtracting 60 from every entry to reduce the statistics to a uniform table. The surviving coded rows are:]

Type    Coded values           Total T_j    T_j²
A       8, 12, 17, -18, -7         12        144
C       0, 22, 4, 15, 12           53       2809
E       4, 5, 10, 8, -7            20        400

[With grand total T = 54 and ΣT_j² = 3874, the surviving computations are:]

Between-treatments variation:
V_B = \frac{1}{b}\sum T_j^2 - \frac{T^2}{ab} = \frac{1}{5}(3874) - \frac{(54)^2}{(5)(5)} = 658.16, with a - 1 = 5 - 1 = 4 degrees of freedom, so S_B^2 = V_B/4 = 164.54
Within-treatments variation:
V_W = V - V_B = 2541.36 - 658.16 = 1883.2, with a(b - 1) = 5(5 - 1) = 20 degrees of freedom, so S_W^2 = V_W/20 = 94.16
F = S_B^2 / S_W^2 = 164.54/94.16 = 1.75, which is below the critical value F = 4.43 at α = 0.01; hence there is no significant difference between the types at the 0.01 level.
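Only three of the five treatment rows survive in this copy, so the sketch below (illustrative) runs the note's V_B/V_W computation on just those rows; the numbers therefore differ from the full example's.

# One-way ANOVA on the surviving coded rows (a = 3 treatments, b = 5 observations).
groups = [[8, 12, 17, -18, -7],   # type A
          [0, 22, 4, 15, 12],     # type C
          [4, 5, 10, 8, -7]]      # type E
a, b = len(groups), len(groups[0])
T = sum(sum(g) for g in groups)                               # grand total
V = sum(x * x for g in groups for x in g) - T ** 2 / (a * b)  # total variation
VB = sum(sum(g) ** 2 for g in groups) / b - T ** 2 / (a * b)  # between treatments
VW = V - VB                                                   # within treatments
F = (VB / (a - 1)) / (VW / (a * (b - 1)))
print(VB, VW, F)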
Chi Square (χ2)
The Chi-Square test is a statistical procedure for determining the difference between observed and expected data. The test can also be used to determine whether two categorical variables in our data are associated: it helps to find out whether a difference between the variables is due to chance or to a relationship between them.
A chi-square test is a nonparametric statistical test that is used to compare observed and
expected results. The goal of this test is to identify whether a disparity between actual
and predicted data is due to chance or to a link between the variables under
consideration. As a result, the chi-square test is an ideal choice for aiding in our
understanding and interpretation of the connection between our two categorical
variables
Examples
1. Out of 256 visual artists surveyed to find out their zodiac sign, the results were: Aries
(29), Taurus (24), Gemini (22), Cancer (19), Leo (21), Virgo (18), Libra (19), Scorpio
(20), Sagittarius (23), Capricorn (18), Aquarius (20), and Pisces (23). Using the Chi-square test at the 0.01 and 0.05 significance levels respectively, test the hypothesis that zodiac signs are evenly distributed across visual artists.
Zodiac        Observed (O)    Expected (E)    (O-E)    (O-E)²    (O-E)²/E
Aries 29 21 8 64 3.0476
Taurus 24 21 3 9 0.4286
Gemini 22 21 1 1 0.0476
Cancer 19 21 -2 4 0.1905
Leo 21 21 0 0 0.0000
Virgo 18 21 -3 9 0.4286
Libra 19 21 -2 4 0.1905
Scorpio 20 21 -1 1 0.0476
Sagittarius 23 21 2 4 0.1905
Capricorn 18 21 -3 9 0.4286
Aquarius 20 21 -1 1 0.0476
Pisces 23 21 2 4 0.1905
Ʃ(O-E)²/E = 5.2381
Degrees of freedom: ν = k - 1 = 12 - 1 = 11
At the 0.01 significance level the critical χ² = 24.7, and at the 0.05 significance level it is 19.7. Since both critical values are greater than the calculated χ² = 5.24, we accept the hypothesis that the zodiac signs are evenly distributed across visual artists.
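A sketch (illustrative) reproducing this computation; the critical-value look-up assumes scipy is available:

from scipy import stats  # assumed available; used only for the critical values

observed = [29, 24, 22, 19, 21, 18, 19, 20, 23, 18, 20, 23]
expected = sum(observed) / len(observed)  # 256/12 ≈ 21.33 (the note rounds to 21)
chi2 = sum((o - expected) ** 2 / expected for o in observed)
print(chi2)  # ≈ 5.09 with the exact E; the note's rounded E = 21 gives 5.24
for alpha in (0.05, 0.01):
    print(alpha, stats.chi2.ppf(1 - alpha, len(observed) - 1))  # 19.7 and 24.7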
2. The table below shows the outcome of 500 tosses of a pair of dice. Using the Chi-square,
test if the outcomes are fair enough at α = 0.05 and 0.01 respectively
Outcome      2    3    4    5    6    7    8    9    10   11   12
Frequency   15   35   49   58   65   76   72   60   35   29    6
Outcome  Ways  Probability  Expected (E)  Observed (O)  (O-E)     (O-E)²      (O-E)²/E
2        1     0.02778      13.8889       15             1.1111    1.234568    0.0889
3        2     0.05556      27.7778       35             7.2222   52.160494    1.8778
4        3     0.08333      41.6667       49             7.3333   53.777778    1.2907
5        4     0.11111      55.5556       58             2.4444    5.975309    0.1076
6        5     0.13889      69.4444       65            -4.4444   19.753090    0.2844
7        6     0.16667      83.3333       76            -7.3333   53.777778    0.6453
8        5     0.13889      69.4444       72             2.5556    6.530864    0.0940
9        4     0.11111      55.5556       60             4.4444   19.753090    0.3556
10       3     0.08333      41.6667       35            -6.6667   44.444444    1.0667
11       2     0.05556      27.7778       29             1.2222    1.493827    0.0538
12       1     0.02778      13.8889        6            -7.8889   62.234568    4.4809
Ʃ                                                                             10.3456
v = k – 1 = 11 – 1 = 10
At α = 0.05 and ν = 10, the critical χ² = 18.3; at α = 0.01 and ν = 10, the critical χ² = 23.2. Since both critical values are greater than the calculated χ² = 10.35, we accept the hypothesis that the outcomes are fair at both α = 0.05 and α = 0.01.