Lecture Note On Statistics For Physical
INTRODUCTION
In the modern world of information and communication technology, the
importance of statistics is well recognised by all disciplines. Statistics
originated as a science of statehood and slowly and steadily found
applications in agriculture, economics, commerce, biology, medicine, industry,
planning, education and so on. Today there is hardly any walk of life
where statistics cannot be applied.
Statistics is concerned with the scientific method of collecting, organizing,
summarizing, presenting and analyzing statistical information (data), as well as
drawing valid conclusions on the basis of such analysis. It can be simply
defined as the "science of data". Thus, statistics uses facts or numerical data,
assembled, classified and tabulated so as to present significant information
about a given subject. Statistics is a science of understanding data and making
decisions in the face of randomness.
The study of statistics is therefore essential for sound reasoning, precise
judgment and objective decisions based on up-to-date, accurate and reliable
data. Thus many researchers, educationalists, businessmen and government
agencies at the national, state or local levels rely on data to answer questions
about their operations and programs. Statistics is usually divided into two
categories, which are not mutually exclusive, namely: descriptive statistics and
inferential statistics.
DESCRIPTIVE STATISTICS
This is the act of summarizing and giving a descriptive account of numerical
information in the form of reports, charts and diagrams. The goal of descriptive
statistics is to gain information from collected data. It begins with the
collection of data by either counting or measurement in an inquiry. It involves
summarizing specific aspects of the data, such as average values and measures of
dispersion (spread). Suitable graphs, diagrams and charts are then used to gain
understanding and a clear interpretation of the phenomenon under investigation,
keeping firmly in mind where the data come from. Normally, a descriptive
statistic should:
i. be single-valued
ii. be algebraically tractable
iii. consider every observed value.
INFERENTIAL STATISTICS
This is the act of making deductive statements about a population from the
quantities computed from a representative sample of it. It is a process of making
inferences or generalizations about the population under certain conditions and
assumptions. Statistical inference involves the processes of estimation of
parameters and hypothesis testing.
Statistics in Engineering
Statistics also plays an important role in engineering. For example, such
topics as the study of heat transfer through insulating materials per unit time,
performance guarantee testing programs, production control, inventory control,
standardization of fits and tolerances of machine parts, job analysis of technical
personnel, studies involving the fatigue of metals (endurance properties),
corrosion studies, quality control, reliability analysis and many other specialized
problems in research and development make great use of probabilistic and
statistical methods.
DISADVANTAGES
Cost of data collection is high
Time consuming
There may be a large amount of non-response
2. Secondary data: These are data obtained from publications, newspapers
and annual reports. They are usually summarized data that were collected for a
purpose other than the one at hand. They can be obtained from the following:
(i) Publications, e.g. extracts from publications
(ii) Research/media organizations
(iii) Educational institutions
ADVANTAGES
The outcome is timely
The information is gathered more quickly
It is less expensive to gather.
DISADVANTAGES
Information is often suppressed when working with
secondary data
The information may not be reliable
DIRECT OBSERVATION
Observational methods are used mostly in scientific enquiry where data are
observed directly from controlled experiments. They are used more in the natural
sciences, through laboratory work, than in the social sciences. However, this
method is also very useful for studying small communities and institutions.
INTERVIEWING
In this method, the person collecting the data, called the interviewer, asks
the respondent (the interviewee) direct questions. The interviewer has to go to the
interviewees personally to collect the required information verbally. This
distinguishes it from the next method, the questionnaire method.
QUESTIONNAIRE
A set of questions or statements is assembled to get information on a variable (or
a set of variables). The entire package of questions or statements is called a
questionnaire. Respondents are required to answer the questions
or statements on the questionnaire. Copies of the questionnaire can be
administered personally by its user or sent to people by post. Both the interviewing
and questionnaire methods are used in the social sciences, where human
populations are mostly involved.
PRESENTATION OF DATA
When raw data are collected, they are organized numerically by
distributing them into classes or categories in order to determine the number of
individuals belonging to each class. In most cases, it is necessary to present data in
tables, charts and diagrams in order to have a clear understanding of the data,
and to illustrate the relationship existing between the variables being examined.
FREQUENCY TABLE
This is a tabular arrangement of data into various classes together with
their corresponding frequencies.
4. Determine the numbers of observations falling into each class interval i.e.
find the class frequencies.
NOTE: With the advent of computers, all these steps can be accomplished easily.
SOME BASIC DEFINITIONS
Variable: This is a characteristic of a population which can take different values.
Basically, we have two types, namely: continuous variable and discrete variable.
A continuous variable is a variable which may take all values within a given
range. Its values are obtained by measurements e.g. height, volume, time, exam
score etc.
A discrete variable is one whose values change by steps. Its values may be
obtained by counting. It normally takes integer values, e.g. number of cars,
number of chairs.
Class interval: This is a sub-division of the total range of values which a
(continuous) variable may take. It is a symbol defining a class, e.g. 0-9, 10-19 etc.
There are three types of class interval, namely: exclusive, inclusive and open-end
classes.
Exclusive method:
When the class intervals are so fixed that the upper limit of one class is the
lower limit of the next class, it is known as the exclusive method of
classification. E.g. let the expenditures of some families be classified as follows:
0 – 1000, 1000 – 2000, etc. The exclusive method ensures
continuity of the data, inasmuch as the upper limit of one class is the lower limit
of the next class. In the above example, the first class contains families whose
expenditure is between 0 and 999.99. A family whose expenditure is 1000 would be
included in the class interval 1000 – 2000.
Inclusive method:
In this method, the overlapping of the class intervals is avoided. Both the lower
and upper limits are included in the class interval. This type of classification may
be used for a grouped frequency distribution of a discrete variable, like the number
of members in a family or the number of workers in a factory, where the variable may
take only integral values. It cannot be used for variables that take fractional
values, like age, height, weight etc. In the case of continuous variables, the
exclusive method should be used. The inclusive method should be used in the case of
discrete variables.
ADEOSUN S.A www.crescent-university.edu.ng
Class limits: These are the end points of a class interval (the lower class limit
and the upper class limit). A class interval in which either the upper class limit
or the lower class limit is not indicated is called an open class interval,
e.g. "less than ..." or "... and above".
Class boundaries: The point of demarcation between a class interval and the
next class interval is called a class boundary. For example, the class boundaries
of 10-19 are 9.5 – 19.5.
Solution:
(a) Range (R) = 73 − 47 = 26
No of classes $k = \sqrt{n} = \sqrt{50} = 7.07 \approx 7$
Frequency Table
Mark       Frequency
47 – 50        7
51 – 54        7
55 – 58        7
59 – 62        8
63 – 66       11
67 – 70        6
71 – 74        4
           Σf = 50
Example 2: The following data represent the ages (in years) of people living in a
housing estate in Abeokuta.
18 31 30 6 16 17 18 43 2 8 32 33 9 18 33 19 21 13 13 14
14 6 52 45 61 23 26 15 14 15 14 27 36 19 37 11 12 11
20 12 39 20 40 69 63 29 64 27 15 28.
Present the above data in a frequency table showing the following columns;
class interval, class boundary, class mark (mid-point), tally, frequency and
cumulative frequency in that order.
Solution:
Range (R) = 69 − 2 = 67
No of classes $k = \sqrt{n} = \sqrt{50} = 7.07 \approx 7$
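The remaining steps of Example 2 can be sketched in Python. This is only an illustration: the class width rule (the smallest whole number that lets 7 classes cover the range) and the choice to start the first class at the smallest observation are assumptions, and the function name `frequency_table` is made up for this sketch.

```python
import math

ages = [18, 31, 30, 6, 16, 17, 18, 43, 2, 8, 32, 33, 9, 18, 33, 19, 21, 13, 13, 14,
        14, 6, 52, 45, 61, 23, 26, 15, 14, 15, 14, 27, 36, 19, 37, 11, 12, 11,
        20, 12, 39, 20, 40, 69, 63, 29, 64, 27, 15, 28]

def frequency_table(data):
    """Group whole-number data into about sqrt(n) inclusive classes."""
    n = len(data)
    k = round(math.sqrt(n))                    # number of classes, k ≈ √n
    low, high = min(data), max(data)
    width = math.ceil((high - low + 1) / k)    # assumed rule for the class width
    rows, cumulative = [], 0
    for i in range(k):
        lower = low + i * width
        upper = lower + width - 1
        freq = sum(lower <= x <= upper for x in data)
        cumulative += freq
        rows.append({
            "interval": (lower, upper),
            "boundaries": (lower - 0.5, upper + 0.5),
            "mark": (lower + upper) / 2,       # class mark (mid-point)
            "frequency": freq,
            "cumulative": cumulative,
        })
    return rows

table = frequency_table(ages)
for row in table:
    print(row)
```

With these assumptions the classes are 2 – 11, 12 – 21, and so on up to 62 – 71; a different starting point or width would give an equally valid table.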
Exercise 1
Below are the weights of 40 women randomly selected in Ogun
State. Prepare a table showing the following columns: class interval, frequency,
class boundary, class mark, and cumulative frequency.
96 84 75 80 64 105 87 62 105 101 108 106 110 64 105 117
103 76 93 75 110 88 97 69 94 117 99 114 88 60 98 77
96 96 91 73 82 81 91 84
Use your table to answer the following questions:
i. How many women weigh between 71 and 90?
ii. How many women weigh more than 80?
iii. What is the probability that a woman selected at random from Ogun
State would weigh more than 90?
MEASURES OF LOCATION
These are measures of the centre of a distribution. They are single values
that give a description of the data. They are also referred to as measures of
central tendency. Some of them are the arithmetic mean, geometric mean, harmonic
mean, mode, and median.
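The arithmetic mean of the values $x_1, x_2, \ldots, x_n$ is
\[ \bar{X} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Solution:
\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{16+20+19+21+18+20+17+22+20+17}{10} = \frac{190}{10} = 19 \text{ years.} \]
If the numbers $x_1, x_2, \ldots, x_n$ occur $f_1, f_2, f_3, \ldots, f_n$ times respectively, the
arithmetic mean is
\[ \bar{X} = \frac{f_1 x_1 + f_2 x_2 + \cdots + f_n x_n}{f_1 + f_2 + \cdots + f_n} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f_i} \quad \left(\text{or } \frac{\sum fx}{\sum f} \text{ for short}\right). \]
Solution:
\[ \bar{X} = \frac{1 \times 2 + 3 \times 5 + 4 \times 6 + 2 \times 8}{1+3+4+2} = \frac{57}{10} = 5.7 \]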
Waiting time (in mins)   1.5 – 1.9   2.0 – 2.4   2.5 – 2.9   3.0 – 3.4   3.5 – 3.9   4.0 – 4.4
No. of customers             3          10          18          10           7           2
Find the average waiting time of the customers.
Solution:
Waiting time (in mins)   No. of customers (f)   Class mark (x)     fx
1.5 – 1.9                        3                   1.7           5.1
2.0 – 2.4                       10                   2.2          22.0
2.5 – 2.9                       18                   2.7          48.6
3.0 – 3.4                       10                   3.2          32.0
3.5 – 3.9                        7                   3.7          25.9
4.0 – 4.4                        2                   4.2           8.4
                              Σf = 50                          Σfx = 142
\[ \bar{X} = \frac{\sum fx}{\sum f} = \frac{142}{50} = 2.84 \]
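Therefore,
\[ \bar{X} = \frac{\sum fx}{n} = \frac{\sum f(A+d)}{n} = \frac{\sum fA}{n} + \frac{\sum fd}{n} = \frac{A\sum f}{n} + \frac{\sum fd}{n} = A + \frac{\sum fd}{n}, \quad \text{since } \sum f = n. \]
Pension in N      25   30   35   40   45
No of persons      7    5    6    4    3
\[ \bar{X} = A + \frac{\sum fd}{n} = 35 + \frac{-45}{25} = 35 - 1.8 = 33.2 \]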
\[ \bar{X} = A + \frac{\sum fd}{\sum f} = 2.7 + \frac{7}{50} = 2.7 + 0.14 = 2.84 \]
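As a check, the direct formula and the assumed-mean formula can be compared numerically. The sketch below uses the waiting-time and pension figures from the examples, taking $d = x - A$; the function names are illustrative only.

```python
def grouped_mean(x, f):
    """Direct method: sum of f*x divided by sum of f."""
    return sum(fi * xi for xi, fi in zip(x, f)) / sum(f)

def assumed_mean(x, f, A):
    """Assumed-mean method: A + (sum of f*d) / (sum of f), with d = x - A."""
    return A + sum(fi * (xi - A) for xi, fi in zip(x, f)) / sum(f)

# Waiting-time example: class marks and frequencies
marks = [1.7, 2.2, 2.7, 3.2, 3.7, 4.2]
freqs = [3, 10, 18, 10, 7, 2]
direct = grouped_mean(marks, freqs)
shortcut = assumed_mean(marks, freqs, A=2.7)

# Pension example with assumed mean A = 35
pension = assumed_mean([25, 30, 35, 40, 45], [7, 5, 6, 4, 3], A=35)

print(round(direct, 2), round(shortcut, 2), round(pension, 2))
```

Both methods give the same mean whatever value is chosen for A; a convenient A only makes the hand arithmetic shorter.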
NOTE: It is always easier to select the class mark with the highest frequency as
the assumed mean.
ADVANTAGE OF MEAN
The mean is an average that considers all the observations in the data set. It is
a single value, it is easy to compute, and it is the most widely used average.
DISADVANTAGE OF MEAN
Its value is greatly affected by extremely large or extremely small observations.
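\[ \text{H.M} = \frac{n}{\sum_{i=1}^{n} \frac{1}{x_i}} \]
Solution:
\[ \text{H.M} = \frac{5}{\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{11} + \frac{1}{4}} = \frac{5}{\frac{107}{88}} = 4.112 \]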
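The harmonic mean in this solution can be verified with exact fractions. The sketch assumes the five observations are 2, 4, 8, 11 and 4, as read from the denominators above, and compares the result with `statistics.harmonic_mean` from the Python standard library.

```python
from fractions import Fraction
from statistics import harmonic_mean

data = [2, 4, 8, 11, 4]

# Sum of reciprocals, kept as an exact fraction
reciprocal_sum = sum(Fraction(1, x) for x in data)

# H.M = n / (sum of 1/x)
hm_exact = len(data) / reciprocal_sum
hm_library = harmonic_mean(data)

print(reciprocal_sum, hm_exact, round(float(hm_exact), 3))
```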
Note:
(i) Calculation takes into account every value
(ii) Extreme values have least effect
(iii) The formula breaks down when "0" is one of the observations.
5%, 8%, 12%, 25% and 34%. What was the average rate of inflation per year?
Solution:
\[ \text{G.M} = \sqrt[5]{1.05 \times 1.08 \times 1.12 \times 1.25 \times 1.34} = \sqrt[5]{2.127384} = 1.16 \]
∴ Average rate of inflation is 16%
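The fifth root above is easy to reproduce. The sketch below multiplies the five growth factors and takes the root, then compares with `statistics.geometric_mean` (available from Python 3.8). To three decimal places the factor is about 1.163, which the solution rounds to 1.16.

```python
import math
from statistics import geometric_mean

# Growth factors for inflation rates of 5%, 8%, 12%, 25% and 34%
growth = [1.05, 1.08, 1.12, 1.25, 1.34]

product = math.prod(growth)
gm = product ** (1 / len(growth))          # n-th root of the product
average_rate = (gm - 1) * 100              # average inflation rate in percent

print(round(product, 6), round(gm, 3), round(average_rate, 1))
```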
Note: (1) Calculation takes into account every value.
(2) It cannot be computed when "0" is one of the observations.
THE MEDIAN
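This is the value of the variable that divides a distribution into two equal
parts when the values are arranged in order of magnitude. If there is an odd number
$n$ of observations, the median $\tilde{X}$ is the centre observation in the ordered
list. The location of the median is the $\left(\frac{n+1}{2}\right)$th item.
But if $n$ is even, the median $\tilde{X}$ is the average of the two middle
observations in the ordered list, i.e.
\[ \tilde{X} = \frac{X_{\left(\frac{n}{2}\right)\text{th}} + X_{\left(\frac{n}{2}+1\right)\text{th}}}{2} \]
Example 1: The values of a random variable $x$ are given as 8, 5, 9, 12, 10, 6 and 4.
Find the median.
Solution: In ascending order the values are 4, 5, 6, 8, 9, 10, 12, and $n = 7$ is odd.
The median, $\tilde{X} = X_{\left(\frac{n+1}{2}\right)\text{th}} = X_{4\text{th}} = 8$
Example 2: The values of a random variable $x$ are given as
15, 15, 17, 19, 21, 22, 25, and 28. Find the median.
Solution: $n = 8$ is even.
\[ \text{Median, } \tilde{X} = \frac{X_{\left(\frac{n}{2}\right)\text{th}} + X_{\left(\frac{n}{2}+1\right)\text{th}}}{2} = \frac{X_{4\text{th}} + X_{5\text{th}}}{2} = \frac{19+21}{2} = 20 \]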
For grouped data, the median is obtained from
\[ \tilde{X} = L_1 + \left(\frac{\frac{N}{2} - Cf_b}{f_m}\right) w \]
where
$L_1$ = Lower class boundary of the median class
$N = \sum f$ = Total frequency
$Cf_b$ = Cumulative frequency before the median class
$f_m$ = Frequency of the median class.
$w$ = Class size or width.
Example 3: The table below shows the heights of 70 men randomly selected at
Sango Ota.
Height       118-126   127-135   136-144   145-153   154-162   163-171   172-180
No of men       8         10        14        18         9         7         4
Compute the median.
Solution
Height        Frequency   Cumulative frequency
118 – 126         8               8
127 – 135        10              18
136 – 144        14              32
145 – 153        18              50
154 – 162         9              59
163 – 171         7              66
172 – 180         4              70
               Σf = 70
$\frac{N}{2} = \frac{70}{2} = 35$. The sum of the frequencies of the first three classes is 32,
which means that the median lies in the fourth class, and this is the median class. Then
$L_1 = 144.5$, $N = 70$, $Cf_b = 32$, $f_m = 18$, $w = 9$
\[ \tilde{X} = L_1 + \left(\frac{\frac{N}{2} - Cf_b}{f_m}\right) w = 144.5 + \frac{35-32}{18} \times 9 = 144.5 + \frac{3 \times 9}{18} = 144.5 + 1.5 = 146 \]
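The search for the median class and the interpolation step can be written as a short function. This is a sketch under the assumption that all classes have the same width; the name `grouped_median` and its argument layout are chosen here for illustration.

```python
def grouped_median(lower_boundaries, freqs, width):
    """Median of grouped data: L1 + ((N/2 - Cfb) / fm) * w."""
    N = sum(freqs)
    cumulative = 0                       # cumulative frequency before the current class
    for L1, fm in zip(lower_boundaries, freqs):
        if cumulative + fm >= N / 2:     # first class that reaches N/2 is the median class
            return L1 + (N / 2 - cumulative) / fm * width
        cumulative += fm

# Heights of 70 men at Sango Ota (lower class boundaries and frequencies)
lower_boundaries = [117.5, 126.5, 135.5, 144.5, 153.5, 162.5, 171.5]
freqs = [8, 10, 14, 18, 9, 7, 4]

median_height = grouped_median(lower_boundaries, freqs, width=9)
print(round(median_height, 2))
```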
THE MODE
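The mode is the value of the data which occurs most frequently. A set of
data may have no mode, one mode, or two or more modes. A distribution is said to be
unimodal, bimodal or multimodal if it has one, two or more than two modes
respectively. E.g. the mode of the scores 2, 5, 2, 6, 7 is 2.
Calculation of mode from grouped data
From a grouped frequency distribution, the mode can be obtained from the
formula
\[ \text{Mode, } \hat{X} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w \]
where $L_{mo}$ is the lower class boundary of the modal class, $\Delta_1$ is the difference
between the frequency of the modal class and that of the class before it, $\Delta_2$ is the
difference between the frequency of the modal class and that of the class after it, and
$w$ is the class width.
\[ \text{Mode, } \hat{X} = L_{mo} + \left(\frac{\Delta_1}{\Delta_1 + \Delta_2}\right) w = 20.5 + \left(\frac{14}{14+8}\right) 10 = 20.5 + \left(\frac{14}{22}\right) 10 = 20.5 + 0.64(10) = 20.5 + 6.4 = 26.9 \]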
Exercise 2
1. Find the mean, median and mode of the following observations: 5, 6, 10, 15, 22,
16, 6, 10, 6.
2. The six numbers 4, 9, 8, 7, 4 and Y have a mean of 7. Find the value of Y.
3. From the data below
Class        21 – 23   24 – 26   27 – 29   30 – 32   33 – 35   36 – 38   39 – 41
Frequency       2         5         8         9         7         3         1
MEASURES OF PARTITION
From the previous section, we have seen that the median is an average that
divides a distribution into two equal parts. There are also other quantities that
divide a set of data (in an array) into different equal parts. Such data must have
been arranged in order of magnitude. Some of the partition values are: the
quartiles, deciles and percentiles.
THE QUARTILES
Quartiles divide a set of data in an array into four equal parts.
For ungrouped data, the distribution is first arranged in ascending order of
magnitude. Then
First Quartile: $Q_1 = \left(\frac{N+1}{4}\right)$th member of the distribution
Second Quartile: $Q_2 = 2\left(\frac{N+1}{4}\right)$th member of the distribution = median
Third Quartile: $Q_3 = 3\left(\frac{N+1}{4}\right)$th member of the distribution
For grouped data,
\[ Q_i = L_{q_i} + \left(\frac{\frac{iN}{4} - Cf_{q_i}}{f_{q_i}}\right) w, \quad i = 1, 2, 3 \]
Where
$i$ = The quartile in reference
$L_{q_i}$ = Lower class boundary of the class containing the quartile
$N$ = Total frequency
$Cf_{q_i}$ = Cumulative frequency before the $Q_i$ class
$f_{q_i}$ = The frequency of the $Q_i$ class
$w$ = Class size of the $Q_i$ class.
DECILES
The values of the variable that divide the frequency of the distribution
into ten equal parts are known as deciles and are denoted by $D_1, D_2, \ldots, D_9$.
The fifth decile is the median.
$D_1 = \left(\frac{n+1}{10}\right)$th member of the distribution
$D_2 = 2\left(\frac{n+1}{10}\right)$th member of the distribution
$D_9 = 9\left(\frac{n+1}{10}\right)$th member of the distribution
For grouped data,
\[ D_i = L_{D_i} + \left(\frac{\frac{iN}{10} - Cf_{D_i}}{f_{D_i}}\right) w, \quad i = 1, 2, \ldots, 9 \]
Where
$i$ = Decile in reference
$L_{D_i}$ = Lower class boundary of the class containing the decile
$N$ = Total frequency
$Cf_{D_i}$ = Cumulative frequency up to the lower boundary of the $D_i$ class
$f_{D_i}$ = The frequency of the $D_i$ class
$w$ = Class size of the $D_i$ class.
PERCENTILE
The values of the variable that divide the frequency of the distribution
into hundred equal parts are known as percentiles and are generally
denoted by $P_1, P_2, \ldots, P_{99}$.
$P_1 = \left(\frac{n+1}{100}\right)$th member of the distribution
$P_2 = 2\left(\frac{n+1}{100}\right)$th member of the distribution
$P_{99} = 99\left(\frac{n+1}{100}\right)$th member of the distribution
For grouped data,
\[ P_i = L_{p_i} + \left(\frac{\frac{iN}{100} - Cf_{p_i}}{f_{p_i}}\right) \times w, \quad i = 1, \ldots, 99 \]
Where
$i$ = Percentile in reference
$L_{p_i}$ = Lower class boundary of the class containing the percentile
$N$ = Total frequency
$Cf_{p_i}$ = Cumulative frequency up to the lower class boundary of the $P_i$ class
$f_{p_i}$ = Frequency of the $P_i$ class.
Example: For the table below, find by calculation (using the appropriate expression)
(i) Lower quartile, $Q_1$
(ii) Upper quartile, $Q_3$
(iii) 6th decile, $D_6$
(iv) 45th percentile, $P_{45}$, of the following distribution
Mark        20 – 29   30 – 39   40 – 49   50 – 59   60 – 69   70 – 79   80 – 89   90 – 99
Frequency      8        10        14        26        20        16         4         2
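Solution
Marks      Frequency   Cumulative frequency
20 – 29        8               8
30 – 39       10              18
40 – 49       14              32
50 – 59       26              58
60 – 69       20              78
70 – 79       16              94
80 – 89        4              98
90 – 99        2             100
             Σf = 100
(i) $Q_1 = L_{q_1} + \left(\frac{\frac{iN}{4} - Cf_{q_1}}{f_{q_1}}\right) w$
$\frac{iN}{4} = \frac{1 \times 100}{4} = 25$, $Cf_{q_1} = 18$, $f_{q_1} = 14$, $w = 10$, $L_{q_1} = 39.5$
\[ Q_1 = 39.5 + \left(\frac{25 - 18}{14}\right) 10 = 44.5 \]
(ii) $Q_3 = L_{q_3} + \left(\frac{\frac{3N}{4} - Cf_{q_3}}{f_{q_3}}\right) w$
$\frac{3N}{4} = \frac{3 \times 100}{4} = 75$, $L_{q_3} = 59.5$, $Cf_{q_3} = 58$, $f_{q_3} = 20$, $w = 10$
\[ Q_3 = 59.5 + \left(\frac{75 - 58}{20}\right) 10 = 68 \]
(iii) $D_6 = L_{D_6} + \left(\frac{\frac{6N}{10} - Cf_{D_6}}{f_{D_6}}\right) w$
$\frac{6N}{10} = \frac{6 \times 100}{10} = 60$, $L_{D_6} = 59.5$, $Cf_{D_6} = 58$, $f_{D_6} = 20$, $w = 10$
\[ D_6 = 59.5 + \left(\frac{60 - 58}{20}\right) 10 = 60.5 \]
(iv) $P_{45} = L_{p_{45}} + \left(\frac{\frac{45N}{100} - Cf_{p_{45}}}{f_{p_{45}}}\right) w$
$\frac{45N}{100} = \frac{45 \times 100}{100} = 45$, $L_{p_{45}} = 49.5$, $Cf_{p_{45}} = 32$, $f_{p_{45}} = 26$, $w = 10$
\[ P_{45} = 49.5 + \left(\frac{45 - 32}{26}\right) 10 = 49.5 + 5 = 54.5 \]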
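Quartiles, deciles and percentiles of grouped data all follow one pattern: locate the class in which position $iN/m$ falls ($m$ = 4, 10 or 100) and interpolate inside it. A minimal sketch, assuming equal class widths; the function name `grouped_partition` is hypothetical.

```python
def grouped_partition(i, parts, lower_boundaries, freqs, width):
    """i-th partition value when the data are split into `parts` equal parts.

    parts = 4 gives quartiles, 10 gives deciles, 100 gives percentiles.
    """
    N = sum(freqs)
    target = i * N / parts               # position iN/4, iN/10 or iN/100
    cumulative = 0                       # cumulative frequency before the current class
    for L, f in zip(lower_boundaries, freqs):
        if cumulative + f >= target:
            return L + (target - cumulative) / f * width
        cumulative += f

# Marks distribution from the example
lower_boundaries = [19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
freqs = [8, 10, 14, 26, 20, 16, 4, 2]

q1 = grouped_partition(1, 4, lower_boundaries, freqs, width=10)
q3 = grouped_partition(3, 4, lower_boundaries, freqs, width=10)
d6 = grouped_partition(6, 10, lower_boundaries, freqs, width=10)
p45 = grouped_partition(45, 100, lower_boundaries, freqs, width=10)
print(q1, q3, d6, p45)
```

Calling the function with `i=2, parts=4` gives the grouped median, since the second quartile and the median coincide.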
MEASURES OF DISPERSION
Dispersion or variation is the degree of scatter of the individual values of
a variable about a central value such as the median or the mean. Measures of
dispersion include the range, mean deviation, semi-interquartile range, variance,
standard deviation and coefficient of variation.
THE RANGE
This is the simplest method of measuring dispersions. It is the difference
between the largest and the smallest value in a set of data. It is commonly used
in statistical quality control. However, the range may fail to discriminate if the
distributions are of different types.
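\[ \text{Coefficient of Range} = \frac{L - S}{L + S} \]
where $L$ is the largest value and $S$ is the smallest value.
For a frequency distribution, the mean deviation from the mean is
\[ MD_{\bar{x}} = \frac{\sum_{i=1}^{n} f\,|X_i - \bar{X}|}{\sum_{i=1}^{n} f} \]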
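(i) Range
(ii) Mean
(iii) Mean deviation from the mean
(iv) Mean deviation from the median.
Solution:
(i) Range = 60 − 41 = 19
(ii) Mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{47+45+56+60+41+54}{6} = \frac{303}{6} = 50.5$
(iii) Mean deviation from the mean
\[ MD_{\bar{x}} = \frac{|47-50.5| + |45-50.5| + |56-50.5| + |60-50.5| + |41-50.5| + |54-50.5|}{6} = \frac{|-3.5| + |-5.5| + |5.5| + |9.5| + |-9.5| + |3.5|}{6} = \frac{37}{6} = 6.17 \]
(iv) From the ordered values 41, 45, 47, 54, 56, 60, the median is $\frac{X_3 + X_4}{2} = \frac{47+54}{2} = 50.5$
\[ MD_{\tilde{x}} = \frac{|47-50.5| + |45-50.5| + |56-50.5| + \cdots + |54-50.5|}{6} = 6.17 \]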
Example 2: The table below shows the frequency distribution of the scores of 42
students.
Scores            0 – 9   10 – 19   20 – 29   30 – 39   40 – 49   50 – 59   60 – 69
No of students      2        5         8        12         9         5         1
Find the mean deviation from the mean for the data.
Solution:
Classes    Midpoint x     f      fx      x − x̄     |x − x̄|    f|x − x̄|
0 – 9         4.5         2      9.0    −29.52     29.52      59.04
10 – 19      14.5         5     72.5    −19.52     19.52      97.60
20 – 29      24.5         8    196.0     −9.52      9.52      76.16
30 – 39      34.5        12    414.0      0.48      0.48       5.76
40 – 49      44.5         9    400.5     10.48     10.48      94.32
50 – 59      54.5         5    272.5     20.48     20.48     102.40
60 – 69      64.5         1     64.5     30.48     30.48      30.48
Total                    42   1429                           465.76
\[ \bar{X} = \frac{\sum_{i=1}^{n} f_i x_i}{\sum_{i=1}^{n} f} = \frac{1429}{42} \approx 34.02 \]
\[ MD_{\bar{x}} = \frac{\sum_{i=1}^{n} f\,|x_i - \bar{x}|}{\sum_{i=1}^{n} f} = \frac{465.76}{42} = 11.09 \]
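The columns of this table can be generated directly from the class midpoints and frequencies. The sketch below keeps the unrounded mean throughout, so its intermediate figures differ slightly from the hand-rounded ones while the final answer agrees to two decimal places.

```python
midpoints = [4.5, 14.5, 24.5, 34.5, 44.5, 54.5, 64.5]
freqs = [2, 5, 8, 12, 9, 5, 1]

n = sum(freqs)                                               # total frequency
mean = sum(f * x for x, f in zip(midpoints, freqs)) / n      # sum(fx) / sum(f)
mean_deviation = sum(f * abs(x - mean) for x, f in zip(midpoints, freqs)) / n

print(n, round(mean, 2), round(mean_deviation, 2))
```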
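For ungrouped data, the standard deviation is
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}}, \quad \text{where } \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
(Alternatively, $\sigma = \sqrt{\frac{\sum_{i=1}^{n} X_i^2}{n} - \left(\frac{\sum_{i=1}^{n} X_i}{n}\right)^2}$.)
For grouped data
The standard deviation is computed using the formula
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{n} f_i (X_i - \bar{X})^2}{\sum_{i=1}^{n} f_i}} \quad \text{or} \quad \sigma = \sqrt{\frac{\sum f_i X_i^2}{\sum f_i} - \left(\frac{\sum f_i x_i}{\sum f_i}\right)^2} \]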
MERIT
(i) It is well defined and uses all observations in the distribution.
(ii) It has wide application in other statistical techniques such as skewness,
correlation, and quality control, etc.
DEMERIT
(i) It cannot be used for comparing the dispersion of two or more
distributions given in different units.
THE VARIANCE
The variance of a set of observations is defined as the square of the
standard deviation and is thus given by $\sigma^2$.
COEFFICIENT OF VARIATION/DISPERSION
This is a dimensionless quantity that measures the relative variation
between two series observed in different units. The coefficient of variation
is obtained by dividing the standard deviation by the mean and multiplying by
100. Symbolically,
\[ CV = \frac{\sigma}{\bar{X}} \times 100\% \]
SOLUTION
\[ \bar{X} = \frac{5+6+9+10+12}{5} = 8.4 \]
\[ \sigma^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{N} = \frac{(5-8.4)^2 + (6-8.4)^2 + (9-8.4)^2 + (10-8.4)^2 + (12-8.4)^2}{5} = \frac{11.56+5.76+0.36+2.56+12.96}{5} = \frac{33.2}{5} = 6.64 \]
\[ \therefore \sigma = \sqrt{\sigma^2} = \sqrt{6.64} = 2.58 \]
Hence
\[ \text{C.V} = \frac{2.58}{8.4} \times 100 = 30.71\% \]
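The same figures can be obtained with the population variance and standard deviation functions in Python's `statistics` module, which divide by $N$ as the formula here does. Without rounding $\sigma$ to 2.58 first, the coefficient of variation comes out at about 30.68% rather than 30.71%.

```python
from statistics import mean, pstdev, pvariance

data = [5, 6, 9, 10, 12]

x_bar = mean(data)                 # arithmetic mean
variance = pvariance(data)         # population variance (divides by N)
sigma = pstdev(data)               # population standard deviation
cv = sigma / x_bar * 100           # coefficient of variation in percent

print(x_bar, round(variance, 2), round(sigma, 2), round(cv, 2))
```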
Ages (in years)   50 – 54   55 – 59   60 – 64   65 – 69   70 – 74   75 – 79   80 – 84
Frequency            1         2        10        12        18        25         9
SOLUTION
\[ \bar{X} = \frac{\sum fx}{\sum f} = \frac{5549}{77} = 72.06 \]
\[ \sigma = \sqrt{\frac{\sum f(x - \bar{x})^2}{\sum f}} = \sqrt{\frac{3674.68}{77}} = \sqrt{47.7231} = 6.9082 \]
\[ \text{C.V} = \frac{\sigma}{\bar{X}} \times 100\% = \frac{6.9082}{72.06} \times 100 = 9.59\% \]
Exercise 3
The data below represent the scores of 150 applicants in an achievement test
for the post of Botanist in a large company:
Scores      10 – 19   20 – 29   30 – 39   40 – 49   50 – 59   60 – 69   70 – 79   80 – 89   90 – 99
Frequency      1         6         9        31        42        32        17        10         2
Estimate
(i) The mean score
(ii) The median score
(iii) The modal score
(iv) Standard deviation
(v) Semi-interquartile range
(vi) D4
(vii) P26
(viii) Coefficient of variation
PROBABILITY
Probability Theory is a mathematical model of uncertainty. We shall
briefly consider the following terminologies:
Experiment: This can be described as an act performed.
Trial: This is an act performed.
Outcome: This is a result realized from the trial.
Sample Space: This is the list of all the possible outcomes of an experiment.
Each of the outcomes in a sample space is called a sample point.
Event: This is a subset of a sample space of an experiment.
E.g. when a coin is tossed twice, the sample space is $S = \{HH, HT, TH, TT\}$. Define
the event $A$ as: at least one head is observed. We have $A = \{HH, HT, TH\}$.
Axioms of Probability
Let $S$ be a sample space, consider the class of events, and let $P$ be a
real-valued function defined on this class. Then $P$ is called a probability
function, and $P(A)$ is called the probability of the event $A$, if the following
axioms hold:
(I) For every event $A$, $0 \le P(A) \le 1$.
(II) $P(S) = 1$.
(III) If $A$ and $B$ are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$.
which gives frequency distribution and relative frequency for all 2000 families
living in a small town. Consider $X$ to be the number of heads obtained in 3 tosses
of a coin. Sample space $S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$. Secondly, this
variable is random in the sense that the value that will occur in a given instance
cannot be predicted with certainty. We can make a list of the elementary outcomes
associated with $X$.
Numerical value of $X$ as an event     Composition of the event
[X = 0]                                [TTT]
[X = 1]                                [HTT], [THT], [TTH]
[X = 2]                                [HTH], [THH], [HHT]
[X = 3]                                [HHH]
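Example: Check whether $f(x) = \frac{x+2}{25}$ for
$x = 0, \ldots, 5$ is a probability distribution function.
Solution:
$f(0) = \frac{2}{25}$, $f(1) = \frac{3}{25}$, $f(2) = \frac{4}{25}$, $f(3) = \frac{1}{5}$, $f(4) = \frac{6}{25}$, $f(5) = \frac{7}{25}$.
\[ \sum f(x) = \frac{2}{25} + \frac{3}{25} + \frac{4}{25} + \frac{1}{5} + \frac{6}{25} + \frac{7}{25} = \frac{27}{25} = 1.08 \]
∴ $f(x)$ is not a probability distribution function, since the probabilities do not sum to 1.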
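The check amounts to two conditions: every $f(x)$ must be non-negative and the values must sum to exactly 1. A small sketch using exact fractions, with $f(x) = (x+2)/25$ as inferred from the listed values:

```python
from fractions import Fraction

def f(x):
    # Candidate probability function from the example, f(x) = (x + 2) / 25
    return Fraction(x + 2, 25)

support = range(6)                                  # x = 0, 1, ..., 5
total = sum(f(x) for x in support)
non_negative = all(f(x) >= 0 for x in support)
is_valid = non_negative and total == 1

print(total, float(total), is_valid)
```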
A function with values $f(x)$, defined over the set of all real numbers, is called a
probability density function if and only if
\[ P(a \le x \le b) = \int_a^b f(x)\,dx. \]
Hence, we define the following:
A function is said to be a probability density function of a continuous
variable $x$ if its values $f(x)$ satisfy the following:
(1) $f(x) \ge 0 \;\; \forall x$
(2) $\int_{-\infty}^{\infty} f(x)\,dx = 1$; $-\infty < x < \infty$.
Example: The probability density function (p.d.f) of the random variable $x$ is
given by
\[ f(x) = \begin{cases} k e^{-3x} & \text{for } x > 0 \\ 0 & \text{elsewhere} \end{cases} \]
Find the value of $k$ and $P(0.5 \le x \le 1)$.
Solution:
(i)
\[ \int_0^{\infty} k e^{-3x}\,dx = 1 \;\Rightarrow\; \left[-\frac{k}{3} e^{-3x}\right]_0^{\infty} = 1 \;\Rightarrow\; 0 - \left(-\frac{k}{3}\right)(1) = 1 \;\Rightarrow\; \frac{k}{3} = 1 \;\Rightarrow\; k = 3. \]
(ii)
\[ P(0.5 \le x \le 1) = \int_{0.5}^{1} 3 e^{-3x}\,dx = \left[-e^{-3x}\right]_{0.5}^{1} = 0.1733. \]
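Both results can be checked numerically. The sketch below uses a simple midpoint rule (chosen here only for illustration) to integrate the density with $k = 3$, truncating the infinite range at $x = 20$, where the remaining tail is negligible.

```python
import math

K = 3  # value of k found in part (i)

def pdf(x):
    return K * math.exp(-3 * x) if x > 0 else 0.0

def integrate(g, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of g from a to b."""
    h = (b - a) / steps
    return sum(g(a + (i + 0.5) * h) for i in range(steps)) * h

total_area = integrate(pdf, 0.0, 20.0)          # should be close to 1
numeric = integrate(pdf, 0.5, 1.0)              # P(0.5 <= x <= 1) by integration
exact = math.exp(-1.5) - math.exp(-3.0)         # [-e^(-3x)] from 0.5 to 1

print(round(total_area, 4), round(numeric, 4), round(exact, 4))
```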
PP: Each of the following tables contains values of $x$ and their probabilities.
Determine whether or not each table represents a valid probability distribution.
(1)  x   P(x)      (2)  x   P(x)      (3)  x   P(x)
     0   .08            2   .25            7   .70
                        7   .34            8   .80
     1   .11            4   .28            9   .20
     2   .39            5   .13
Mathematical Expectation
The expected value of a discrete random variable having a distribution
function $P(x)$ is $E(x) = \sum_x x\,P(x)$.
For a continuous random variable, $E(x) = \int_{-\infty}^{\infty} x f(x)\,dx$.
Note: If $X$ is a random variable with p.d.f $P(x)$, and if $g(x)$ is any
real-valued function, then the expectation of $g(x)$ is
\[ E[g(x)] = \sum_x g(x)\,P(x) \quad \text{(discrete r.v.)} \]
\[ E[g(x)] = \int_{-\infty}^{\infty} g(x)\,P(x)\,dx \quad \text{(continuous r.v.)} \]
The variance of $X$ is $\sigma^2(x) = E[(X - \mu)^2]$, where $\mu = E(x)$.
For any random variable $X$, we have
(i) $E(ax + b) = aE(x) + b$
(ii) $V(ax + b) = a^2 V(x)$.
Proof:
(i) $E(ax + b) = \sum_x (ax + b)\,P(x)$
$= \sum_x ax\,P(x) + \sum_x b\,P(x)$
$= a\sum_x x\,P(x) + b\sum_x P(x)$
$= aE(x) + b$ (since $\sum_x P(x) = 1$) ∎
(ii) $V(ax + b) = E\left[\left(ax + b - E(ax + b)\right)^2\right]$
$= E\left[\left(ax + b - (aE(x) + b)\right)^2\right]$
$= E\left[\left(ax - aE(x)\right)^2\right]$
$= a^2 E\left[\left(x - E(x)\right)^2\right]$
$= a^2 V(x)$ ∎
Example: Using the above result, compute the variance of $X$ as given in the table
below:
X       0     1     2
P(x)   0.1   0.5   0.4
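One way to work this example is to code the definition $E[g(x)] = \sum g(x)P(x)$ once and reuse it for the mean, the variance and a check of $V(ax+b) = a^2V(x)$. The constants $a = 3$ and $b = 2$ are arbitrary values picked for the check, not part of the example.

```python
xs = [0, 1, 2]
ps = [0.1, 0.5, 0.4]

def expectation(g, xs, ps):
    """E[g(X)] for a discrete random variable: sum of g(x) * P(x)."""
    return sum(g(x) * p for x, p in zip(xs, ps))

mean = expectation(lambda x: x, xs, ps)
variance = expectation(lambda x: (x - mean) ** 2, xs, ps)

# Numerical check of E(aX + b) = aE(X) + b and V(aX + b) = a^2 V(X)
a, b = 3, 2
mean_t = expectation(lambda x: a * x + b, xs, ps)
variance_t = expectation(lambda x: (a * x + b - mean_t) ** 2, xs, ps)

print(round(mean, 2), round(variance, 2), round(mean_t, 2), round(variance_t, 2))
```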
Exercise 4
(1) Suppose $X$ has probability density function given as
\[ f(x) = \begin{cases} 3x^2, & 0 \le x \le 1 \\ 0, & \text{elsewhere.} \end{cases} \]
Find the mean and variance of $x$.
(2) A lot of 12 television sets chosen at random are defective; if 3 of the sets
are chosen at random for shipment to a hotel, how many defective sets can
they expect?
(3) Certain coded measurements of the pitch diameter of threads of a fitting have
the probability density
\[ f(x) = \begin{cases} \frac{4}{\pi(1+x^2)} & \text{for } 0 < x < 1 \\ 0, & \text{otherwise.} \end{cases} \]
(4) If $X$ has the p.d.f
\[ f(x) = \begin{cases} \ldots \\ 0, & \text{elsewhere} \end{cases} \]
find the expected value and the variance of $x$.
BERNOULLI DISTRIBUTION
A random variable $x$ has a Bernoulli distribution if and only if its
probability distribution is given as $f(x; p) = p^x (1-p)^{1-x}$; $x = 0, 1$. In this
context, $p$ may be the probability of passing or failing an examination.
$f(0; p) = 1 - p$ (a single trial, $n = 1$, e.g. one toss of a coin)
$f(1; p) = p$.
BINOMIAL DISTRIBUTION
An experiment consisting of $n$ repeated trials is called a binomial experiment if
(1) the trials are independent and identical
(2) each trial results in only one of two possible outcomes
(3) the probability of success $p$ remains constant
(4) the random variable of interest is the total number of successes.
The binomial distribution is one of the most widely used distributions in
statistics. It is used to find the probability that an outcome will occur $x$ times
in $n$ performances of an experiment. For example, consider the experiment of
flipping a coin 10 times. When the coin is tossed, the probability of getting a
head is $p$ and that of a tail is $1 - p = q$.
A random variable $x$ has a binomial distribution, and is referred to as a
binomial random variable, if and only if its probability distribution is given by
\[ f(x; n, p) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \ldots, n \]
Where $n$ = Total number of trials
$p$ = Probability of success
$1 - p$ = Probability of failure
$n - x$ = Number of failures in $n$ trials
To find the probability of $x$ successes in $n$ trials, the only values we need are
those of $n$ and $p$.
Properties of Binomial Distribution
Mean $\mu = np$
Variance $\sigma^2 = npq$; $q = 1 - p$
Standard deviation $\sigma = \sqrt{npq}$
Example: Observation over a long period of time has shown that a particular
salesman can make a sale on a single contact with a probability of 20%.
Suppose the same person contacts four prospects.
(a) What is the probability that exactly 2 prospects purchase the product?
(b) What is the probability that at least 2 prospects purchase the product?
(c) What is the probability that all the prospects purchase the product?
(d) What is the expected number of prospects that would purchase the
product?
Solution:
\[ f(x) = \binom{4}{x} (0.2)^x (0.8)^{4-x}, \quad x = 0, 1, \ldots, 4; \qquad f(x) = 0 \text{ elsewhere} \]
(a) $f(x = 2) = \binom{4}{2} (0.2)^2 (0.8)^{4-2} = 0.1536$
(b) $f(x \ge 2) = f(2) + f(3) + f(4)$ (or $1 - f(x \le 1) = 1 - \{f(0) + f(1)\}$)
$= 0.1536 + \binom{4}{3} (0.2)^3 (0.8)^{4-3} + \binom{4}{4} (0.2)^4 (0.8)^{4-4}$
$= 0.1536 + 0.0256 + 0.0016 = 0.1808$
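All four parts of this example follow from the binomial formula with $n = 4$ and $p = 0.2$. A short sketch using `math.comb` for the binomial coefficient; the helper name `binomial_pmf` is illustrative.

```python
from math import comb

def binomial_pmf(x, n, p):
    """f(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 4, 0.2

exactly_two = binomial_pmf(2, n, p)                                  # part (a)
at_least_two = sum(binomial_pmf(x, n, p) for x in range(2, n + 1))   # part (b)
all_four = binomial_pmf(4, n, p)                                     # part (c)
expected_sales = n * p                                               # part (d), mean = np

print(round(exactly_two, 4), round(at_least_two, 4),
      round(all_four, 4), round(expected_sales, 1))
```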
POISSON DISTRIBUTION
When the size of the sample ($n$) is very large and the probability of
obtaining a success in any one trial is very small, the Poisson distribution is
adopted.
Given an interval of real numbers, assume counts occur at random
throughout the interval. Suppose the interval can be partitioned into sub-intervals
of small enough length such that
(1) The probability of more than one count in a sub-interval is 0
(2) The probability of one count in a sub-interval is the same for all
sub-intervals and proportional to the length of the sub-interval
(3) The count in each sub-interval is independent of all other
sub-intervals.
A random experiment of this type is called a Poisson process. If the mean
number of counts in the interval is $\lambda > 0$, the random variable $x$ that equals the
number of counts in the interval has a Poisson distribution with parameter $\lambda$, and
the probability density function is given by
\[ f(x; \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}; \quad x = 0, 1, 2, \ldots \]
Example: Flaws occur at random along the length of a thick wire. Suppose that the
number of flaws follows a Poisson distribution with a mean of 2.3 flaws per mm.
Determine the probability of exactly 2 flaws in one mm of wire.
Solution: $f(x = 2) = \frac{e^{-2.3}\, 2.3^2}{2!} = 0.265$.
For a binomial distribution with large $n$ and small $p$, the probabilities can be
approximated by
\[ f(x) \approx \frac{e^{-np} (np)^x}{x!}, \quad x = 0, 1, \ldots, n \]
Hence we can apply the Poisson distribution to the binomial when $n \ge 30$ and
$np < 5$.
\[ f(3) = \binom{120}{3} (0.03)^3 (0.97)^{117} = 0.215 \]
$f(x \le 3) = 0.026 + 0.095 + 0.177 + 0.215 = 0.513$.
(b) By Poisson distribution:
$f(x; \lambda) = \frac{e^{-\lambda} \lambda^x}{x!}$, $\lambda = np = 120 \times 0.03 = 3.6$
$f(0) = \frac{e^{-3.6}\, 3.6^0}{0!} = 0.027$
$f(1) = \frac{e^{-3.6}\, 3.6^1}{1!} = 0.098$
$f(2) = \frac{e^{-3.6}\, 3.6^2}{2!} = 0.177$
$f(3) = \frac{e^{-3.6}\, 3.6^3}{3!} = 0.213$
$f(x \le 3) = f(0) + f(1) + f(2) + f(3) = 0.515$
Hence, the Poisson distribution result is very close to the binomial distribution
result, showing that it can be used to approximate the binomial distribution in this
problem.
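The comparison can be reproduced by summing both probability functions for $x = 0, \ldots, 3$ with $n = 120$ and $p = 0.03$. The helper names below are illustrative; the last line also rechecks the wire-flaw example with $\lambda = 2.3$.

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """f(x; lambda) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

n, p = 120, 0.03
lam = n * p                                                    # lambda = np = 3.6

binomial_cdf = sum(binomial_pmf(x, n, p) for x in range(4))    # P(X <= 3), exact
poisson_cdf = sum(poisson_pmf(x, lam) for x in range(4))       # P(X <= 3), approximation

print(round(binomial_cdf, 3), round(poisson_cdf, 3))
print(round(poisson_pmf(2, 2.3), 3))                           # exactly 2 flaws in 1 mm
```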
GEOMETRIC DISTRIBUTION
Consider a random experiment in which all the conditions of a binomial
distribution hold. However, instead of a fixed number of trials, trials are
conducted until the first success occurs. Hence by definition, in a series of
independent binomial trials with constant probability $p$ of success, let the
random variable $x$ denote the number of trials until the first success. Then $x$ is
said to have a geometric distribution with parameter $p$, given by
\[ f(x) = p(1-p)^{x-1}; \quad x = 1, 2, \ldots \]
For the geometric distribution, the mean and variance are given by
\[ \mu = \frac{1}{p}, \qquad \sigma^2 = \frac{1-p}{p^2}. \]
Examples:
(1) If the probability that a wafer contains a large particle of contamination is
0.01, and it is assumed that the wafers are independent, what is the probability
that exactly 125 wafers need to be analyzed before a large particle is
detected?
Solution: Let $X$ denote the number of samples analyzed until a large particle is
detected. Then $X$ is a geometric random variable with $p = 0.01$. Hence, the
required probability is $f(x = 125) = 0.01(0.99)^{124} = 0.0029$.
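A direct evaluation of the geometric formula for this example, together with the mean and variance for $p = 0.01$, might look like this (function name illustrative):

```python
def geometric_pmf(x, p):
    """f(x) = p * (1 - p)^(x - 1): first success occurs on trial x."""
    return p * (1 - p) ** (x - 1)

p = 0.01
probability = geometric_pmf(125, p)    # first large particle found on the 125th sample
mean = 1 / p                           # expected number of trials until first success
variance = (1 - p) / p ** 2

print(round(probability, 4), round(mean, 1), round(variance, 1))
```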
(2) Each sample of $n$ has a 10% chance of containing a particular rare
molecule. Assume the samples are independent with regard to the presence of the
rare molecule. Find the probability that in the next 18 samples, (a) exactly
2 contain the rare molecule, (b) at least 4 samples contain the rare molecule.
Solution: Left as an exercise
HYPERGEOMETRIC DISTRIBUTION
Suppose we have a relatively small quantity consisting of $N$ items of which
$k\ (= Np)$ are defective. If two items are sampled sequentially, then the outcome
of the second draw is very much influenced by what happened on the first
draw, provided that the first item drawn is not returned to the quantity. We need to
obtain a formula similar to that of the binomial distribution which applies to sampling
without replacement.
A random variable $x$ is said to have a hypergeometric distribution if and
only if its probability density function is given by
\[ f(x; n, N, k) = \frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}, \quad x = 0, 1, \ldots, n,\ \ x \le k,\ \ n - x \le N - k, \]
and $f(x; n, N, k) = 0$ otherwise. The mean and variance are
\[ E(x) = \frac{nk}{N}; \qquad V(x) = \frac{nk(N-k)(N-n)}{N^{2}(N-1)}. \]
Examples:
(1) A random sample of 3 oranges is taken from a basket containing 12
oranges. If 4 of the oranges in the basket are bad, what is the probability
of getting (a) no bad oranges in the sample (b) more than 2 bad oranges
in the sample?
Solution: $p = \frac{4}{12} = \frac{1}{3}$, $N = 12$, $n = 3$, $k = Np = 4$
\[ f(x) = \frac{\binom{4}{x}\binom{8}{3-x}}{\binom{12}{3}}, \quad x = 0, 1, 2, 3 \]
and $f(x) = 0$ otherwise.
(a) \[ f(0) = \frac{\binom{4}{0}\binom{8}{3}}{\binom{12}{3}} = 0.25 \]
(b) \[ f(x > 2) = f(3) = \frac{\binom{4}{3}\binom{8}{0}}{\binom{12}{3}} = 0.018. \]
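The hypergeometric probabilities can be evaluated directly from binomial
coefficients. The sketch below uses Python's `math.comb`; the function name
`hypergeom_pmf` and its argument order (mirroring $f(x; n, N, k)$) are illustrative
assumptions. The last line uses the figures of the tubing example (4 parts drawn
from 300, of which 100 are local).

```python
from math import comb

def hypergeom_pmf(x: int, n: int, N: int, k: int) -> float:
    """P(x defectives in a sample of n drawn without replacement
    from N items of which k are defective)."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

p_no_bad = hypergeom_pmf(0, 3, 12, 4)         # oranges, part (a)
p_all_bad = hypergeom_pmf(3, 3, 12, 4)        # oranges, part (b)
p_all_local = hypergeom_pmf(4, 4, 300, 100)   # tubing example

print(round(p_no_bad, 2), round(p_all_bad, 3), round(p_all_local, 4))
```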
(2) A batch of parts contains 100 parts from a local supplier of tubing and 200
parts from a supplier of tubing in the next state. If 4 parts are selected at
random without replacement, what is the probability that they are all
from the local supplier?
Solution: Let $X$ equal the number of parts in the sample from the local supplier.
Then $X$ has a hypergeometric distribution and the required probability is
$f(x = 4)$. Consequently,
\[ h(4; 4, 300, 100) = \frac{\binom{100}{4}\binom{200}{0}}{\binom{300}{4}} = 0.0119. \]
喧2
喧
Exercise 5
(1) The probability that an experiment will succeed is 0.6. If the experiment
is repeated until 5 successful outcomes have occurred, what are the mean
and variance of the number of repetitions required?
(2) A high-performance aircraft contains 3 identical computers. Only one is
used to operate the aircraft; the other two are spares that can be
activated in case the primary system fails. During one hour of operation,
the probability of failure in the primary computer or any activated spare
system is 0.0005. Assume that each hour represents an identical trial.
(a) What is the expected time to failure of all the 3 computers?
(b) What is the probability that all the 3 computers fail within a 5-hour
flight?
NORMAL DISTRIBUTION
The normal distribution is the most important and the most widely used
of all continuous distributions in statistics. It is considered the cornerstone
of statistical theory. The graph of a normal distribution is a bell-shaped
curve that extends indefinitely in both directions.
[Figure: normal curve centred at $\mu$; the total area under the curve is 1 or 100%]
Features (Properties) of Normal Curve
1. The curve is symmetrical about the vertical axis through the mean $\mu$.
2. The mode, the point on the horizontal axis where the curve attains its
maximum, occurs at $x = \mu$.
3. The normal curve approaches the horizontal axis asymptotically.
4. The total area under the curve is one (1) or 100%.
Note: The last three of the above properties are arrived at through advanced
mathematical treatment.
It is clear from these properties that a knowledge of the population mean
and standard deviation gives a complete picture of the distribution of all the
values.
\[ n(x; \mu, \sigma) = f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty \]
where $\sigma$ and $\mu$ are the parameters of the distribution. Note that since $n(x; \mu, \sigma)$ is a
p.d.f., the total area under it is 1. In other words, the cumulative
distribution function is given by
\[ F(x) = P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}}\, dt. \]
$z$ is called the standard normal variable, with p.d.f.
\[ f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^{2}}, \quad -\infty < z < \infty. \]
The table usually used to determine probabilities for a random variable
drawn from a normal population with mean 0 and standard deviation 1 is
the standard normal distribution table (or $z$-scores table).
The following points should be noted in the use of the table:
\[ P(Z \le 0) = 0.5 \]
\[ P(Z \le -z) = 1 - P(Z \le z) = P(Z \ge z) \]
[Figure: standard normal curves showing area 0.5 on each side of 0, and the tails below $-z$ and above $z$]
Case I
(i) Area $A = P(Z \le -z_1)$
$= P(Z > z_1)$
$= 1 - P(Z < z_1)$
(ii) $P(Z < z_1) = P(Z > -z_1)$
$P(0 < Z < z_1) = P(-z_1 < Z < 0)$
Case II
[Figure: standard normal curve with area A between $-z_1$ and 0 and area B between 0 and $z_2$]
Area A ≠ Area B
Here $P(-z_1 < Z < z_2) = P(Z < z_2) - P(Z < -z_1)$
$= P(Z < z_2) - [1 - P(Z < z_1)]$
$= P(Z < z_2) + P(Z < z_1) - 1$
Example: Find the probability that a random variable having the standard
normal distribution will take a value
(a) less than 1.72 (b) less than –0.88
(c) between 1.19 and 2.12 (d) between –0.36 and 1.21
Solution:
(a) $P(Z < 1.72) = 0.9573$ (using the table)
(b) $P(Z < -0.88) = 1 - P(Z < 0.88)$
$= 1 - 0.8106$
$= 0.1894$
(c) $P(1.19 < Z < 2.12) = P(Z < 2.12) - P(Z < 1.19)$
$= 0.9830 - 0.8830$
$= 0.1000$
(d) $P(-0.36 < Z < 1.21) = P(Z < 1.21) - P(Z < -0.36)$
$= P(Z < 1.21) - [1 - P(Z < 0.36)]$
$= P(Z < 1.21) + P(Z < 0.36) - 1$
$= 0.8869 + 0.6406 - 1$
$= 0.5275$
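When a table is not to hand, the standard normal probabilities can be computed
from the error function, since $P(Z \le z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The sketch
below recomputes the four parts of the example; the helper name `phi` is an
illustrative assumption.

```python
import math

def phi(z: float) -> float:
    """Standard normal cumulative probability P(Z <= z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_a = phi(1.72)               # (a) less than 1.72
p_b = phi(-0.88)              # (b) less than -0.88
p_c = phi(2.12) - phi(1.19)   # (c) between 1.19 and 2.12
p_d = phi(1.21) - phi(-0.36)  # (d) between -0.36 and 1.21

print(round(p_a, 4), round(p_b, 4), round(p_c, 4), round(p_d, 4))
```

An interval probability is always a difference of two cumulative probabilities,
which is why part (c) is $P(Z < 2.12) - P(Z < 1.19)$.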
\[ z = \frac{x - \mu}{\sigma} \quad (\text{for } \mu \ne 0 \text{ and } \sigma \ne 1) \]
Example:
(1) Suppose the current measurements in a strip of wire are assumed to be
normally distributed with a mean of 10 mA and a variance of 4 mA$^2$. What is
the probability that the current is greater than 13 mA?
\[ = P\!\left(\frac{51-55}{4} < z < \frac{59-55}{4}\right) \]
\[ = 1 - P(z \le -0.5) \]
\[ = 1 - 0.3085 \]
\[ = 0.6915 \]
∴ 69% of all the candidates will be expected to do better than you.
PP: It is known from the previous examination results that the marks of
candidates have a normal distribution with mean 55 and standard deviation 10.
If the pass mark in a new examination is set at 45, what percentage of the
candidates will be expected to fail?
Exercise 6
(1) Two fair dice are tossed 600 times. Let $X$ denote the number of times a total
of 7 occurs. Find the probability that $X$ lies between 80 and 110.
(2) A manufacturer of machine parts claims that at most 10% of his parts
are defective. A purchaser needs 120 such parts and, to be sure of
getting enough good ones, he places an order for 140 parts. If the
manufacturer's claim is valid, what is the probability that the purchaser
will receive at least 120 good parts?
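\[ \Rightarrow P\!\left(z > \frac{\lambda - 2}{2}\right) = 0.10 \]
\[ \Rightarrow 1 - P\!\left(z \le \frac{\lambda - 2}{2}\right) = 0.10 \]
\[ \Rightarrow P\!\left(z \le \frac{\lambda - 2}{2}\right) = 0.90 \]
i.e.
\[ \frac{\lambda - 2}{2} = \Phi^{-1}(0.90) = 1.285 \]
\[ \Rightarrow \lambda = 2(1.285) + 2 = 4.570. \]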
Scatter diagram: The first step in studying the relationship between two
variables is to draw a scatter diagram. This is a graph that shows visually the
relationship between two variables, in which each point corresponds to a pair of
observations, one variable being plotted against the other. The way in which the
dots lie on the scatter diagram shows the type of relationship that exists.
Regression Models
In order to predict one variable from the other, it is necessary to
construct a line or curve that passes through the middle of the points, such that
the sum of the signed vertical deviations of the points from the line is equal to 0.
Such a line is called the line of best fit.
The simple regression equation of $Y$ on $X$ is defined as
\[ Y = a + bX + e \]
while the multiple regression equation of $Y$ on $X_1, X_2, \ldots, X_k$ is
\[ Y = a + b_1X_1 + b_2X_2 + \cdots + b_kX_k + e, \]
where:
$Y$ is the observed dependent variable
$X$ is the observed independent (explanatory) variable
$a$ is the intercept (the point at which the regression line cuts the $Y$ axis)
$b$ is the slope (regression coefficient). It gives the rate of change in $Y$ per unit
change in $X$.
$e$ is the error term.
\[ \therefore \sum_{i=1}^{n} Y_i = na + b\sum_{i=1}^{n} X_i \qquad (1) \]
\[ \frac{\partial Q}{\partial b} = -2\sum_{i=1}^{n}(Y_i - a - bX_i)X_i = 0 \]
\[ \Rightarrow \sum_{i=1}^{n} Y_iX_i = a\sum_{i=1}^{n} X_i + b\sum_{i=1}^{n} X_i^{2} \qquad (2) \]
Equations (1) and (2) are called the normal equations. From equation (1), we have
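\[ a = \frac{\sum Y_i - b\sum X_i}{n} \qquad (3) \]
Substituting equation (3) in equation (2), we have
\[ \sum Y_iX_i = \left(\frac{\sum Y_i - b\sum X_i}{n}\right)\sum X_i + b\sum X_i^{2} \]
\[ \Rightarrow n\sum Y_iX_i = \sum Y_i\sum X_i - b\left(\sum X_i\right)^{2} + bn\sum X_i^{2} \]
\[ n\sum Y_iX_i = \sum Y_i\sum X_i + b\left[n\sum X_i^{2} - \left(\sum X_i\right)^{2}\right] \]
\[ \Rightarrow b = \frac{n\sum Y_iX_i - \sum Y_i\sum X_i}{n\sum X_i^{2} - \left(\sum X_i\right)^{2}} \qquad (4) \]
Also, dividing through by $n$ we have
\[ b = \frac{\sum Y_iX_i - \dfrac{\sum Y_i\sum X_i}{n}}{\sum X_i^{2} - \dfrac{\left(\sum X_i\right)^{2}}{n}} \]
\[ b = \frac{\sum Y_iX_i - n\bar{Y}\bar{X}}{\sum X_i^{2} - n\bar{X}^{2}} \qquad (5) \]
\[ a = \bar{Y} - b\bar{X} \qquad (6) \quad \blacksquare \]
$\therefore \hat{Y} = a + bX$ is the regression equation.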
The standard error of estimate is given by
\[ S_y = \sqrt{\frac{\sum_{i=1}^{n} e_i^{2}}{n-2}}, \]
where $e_i = Y_i - \hat{Y}_i$ and $S_y$ is the standard error of estimate.
The total variation is $S_T^{2} = \dfrac{\sum_{i=1}^{n}(Y_i - \bar{Y})^{2}}{n-2}$, the unexplained variation is
$S_y^{2} = \dfrac{\sum_{i=1}^{n} e_i^{2}}{n-2}$, and
\[ r^{2} = 1 - \frac{\text{Unexplained variation}}{\text{Total variation}} = 1 - \frac{S_y^{2}}{S_T^{2}}. \]
Examples:
(1) The following are measurements of the height and age (in months) of maize
plants in a plantation farm.
Age (months)   1   2   3   4   5   6   7
Height (cm)    5   13  16  23  33  38  40
(a) Draw the scatter diagram.
(b) Obtain the regression equation of the height of the plant on its age in
months.
(c) Estimate the height of a maize plant aged 10 months.
Solution:
(a) [Figure: scatter diagram of Height (cm) against Age (months), drawn using SPSS and using Excel, with the fitted linear trend line]
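(b)
X       Y       X²      XY
1       5       1       5
2       13      4       26
3       16      9       48
4       23      16      92
5       33      25      165
6       38      36      228
7       40      49      280
Total   28      168     140     844   (ΣX, ΣY, ΣX², ΣXY)
Using equation (4), we have
\[ b = \frac{n\sum Y_iX_i - \sum Y_i\sum X_i}{n\sum X_i^{2} - \left(\sum X_i\right)^{2}} = \frac{7(844) - 168(28)}{7(140) - 28^{2}} = \frac{1204}{196} = 6.143 \]
\[ a = \frac{\sum Y_i - b\sum X_i}{n} = \frac{168 - 6.143(28)}{7} = -0.572 \]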
(c) When $X = 10$, $\hat{Y} = -0.572 + 6.143(10) = 60.858$. That is, the height of the
maize plant aged 10 months is estimated to be about 60.86 cm.
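The least-squares formulas (4) and (3) are easy to apply by program. The following
Python sketch fits the line to the maize data of this example and predicts the
height at 10 months; the function name `fit_line` is an illustrative assumption.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx**2)   # equation (4)
    a = (sy - b * sx) / n                         # equation (3)
    return a, b

age = [1, 2, 3, 4, 5, 6, 7]
height = [5, 13, 16, 23, 33, 38, 40]

a, b = fit_line(age, height)
predicted_at_10 = a + b * 10

print(round(a, 3), round(b, 3), round(predicted_at_10, 2))
```

Note that 10 months lies outside the observed range of ages (1 to 7 months), so
the prediction is an extrapolation and should be read with caution.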
(2) A study was made on the effect of a certain brand of fertilizer ($X$) on
cassava yield ($Y$) per plot of farm area, resulting in the following data:
Fertilizer      0  1  2  3   4   5   6   7   8   9   10
Cassava yield   5  9  8  11  14  12  13  15  18  17  22
(a) Plot the scatter diagram and draw the line of best fit.
(b) Fit a simple regression equation to these data.
(c) What is the value of the cassava yield when the fertilizer is 12?
(d) Obtain the standard error of the regression.
Solution:
(a) [Figure: scatter diagram of Cassava yield against Fertilizer, with the fitted linear trend line]
(b)
X     Y     X²    XY    Ŷ = a + bX    e = Y − Ŷ    e²
0     5     0     0     6.091         -1.091       1.190
1     9     1     9     7.491         1.509        2.277
2     8     4     16    8.891         -0.891       0.794
3     11    9     33    10.291        0.709        0.503
4     14    16    56    11.691        2.309        5.331
5     12    25    60    13.091        -1.091       1.190
6     13    36    78    14.491        -1.491       2.223
7     15    49    105   15.891        -0.891       0.794
8     18    64    144   17.291        0.709        0.503
9     17    81    153   18.691        -1.691       2.859
10    22    100   220   20.091        1.909        3.644
\[ b = \frac{n\sum Y_iX_i - \sum Y_i\sum X_i}{n\sum X_i^{2} - \left(\sum X_i\right)^{2}} = \frac{11(874) - 144(55)}{11(385) - 55^{2}} = \frac{1694}{1210} = 1.4 \]
\[ a = \frac{\sum Y_i - b\sum X_i}{n} = \frac{144 - 1.4(55)}{11} = 6.091 \]
(d) The standard error of the regression is $S_y = \sqrt{\dfrac{\sum e_i^{2}}{n-2}}$, with $n - 2 = 9$.
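The residual column and the standard error of estimate for the cassava data can be
reproduced as follows. This is a sketch under the formulas stated in these notes
($S_y = \sqrt{\sum e_i^2/(n-2)}$); the variable names are illustrative.

```python
import math

fertilizer = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
cassava = [5, 9, 8, 11, 14, 12, 13, 15, 18, 17, 22]

n = len(fertilizer)
sx, sy = sum(fertilizer), sum(cassava)
sxx = sum(x * x for x in fertilizer)
sxy = sum(x * y for x, y in zip(fertilizer, cassava))

b = (n * sxy - sx * sy) / (n * sxx - sx**2)   # slope
a = (sy - b * sx) / n                         # intercept

# Residuals e_i = Y_i - (a + b X_i) and their sum of squares
residuals = [y - (a + b * x) for x, y in zip(fertilizer, cassava)]
sse = sum(e * e for e in residuals)
s_y = math.sqrt(sse / (n - 2))                # standard error of estimate

yield_at_12 = a + b * 12                      # part (c)

print(round(a, 3), round(b, 3), round(sse, 3), round(s_y, 3), round(yield_at_12, 3))
```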
CORRELATION
Correlation measures the degree of linear association between two or
more variables when a movement in one variable is associated with the
movement in the other variable either in the same direction or the other
direction. Correlation coefficient is a magnitude, which indicates the degree of
linear association between two variables. It is given by
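\[ r = \frac{n\sum XY - \sum X\sum Y}{\sqrt{\left[n\sum X^{2} - \left(\sum X\right)^{2}\right]\left[n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left(\sum X^{2} - n\bar{X}^{2}\right)\left(\sum Y^{2} - n\bar{Y}^{2}\right)}} \]
Suppose that
\[ \alpha_1 = \frac{n\sum XY - \sum X\sum Y}{n\sum Y^{2} - \left(\sum Y\right)^{2}}, \qquad \beta_1 = \frac{n\sum XY - \sum X\sum Y}{n\sum X^{2} - \left(\sum X\right)^{2}}. \]
Given $\alpha_1$ and $\beta_1$, the product
\[ r^{2} = \alpha_1\beta_1 = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^{2} - n\bar{Y}^{2}} \cdot \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^{2} - n\bar{X}^{2}} = \frac{\left(\sum XY - n\bar{X}\bar{Y}\right)^{2}}{\left(\sum Y^{2} - n\bar{Y}^{2}\right)\left(\sum X^{2} - n\bar{X}^{2}\right)} \]
so that
\[ r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left(\sum Y^{2} - n\bar{Y}^{2}\right)\left(\sum X^{2} - n\bar{X}^{2}\right)}} \quad \text{or} \quad r = \frac{n\sum XY - \sum X\sum Y}{\sqrt{\left[n\sum X^{2} - \left(\sum X\right)^{2}\right]\left[n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}}. \]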
Interpretation of r:
$r = +1$ implies that there is a perfect positive linear (direct) relationship.
$r = -1$ implies a perfect negative (indirect) linear relationship.
$-1 < r < -0.5$ implies there is a strong negative linear relationship.
$-0.5 < r < 0$ implies there is a weak negative linear relationship.
$0 < r < +0.5$ implies there is a weak positive linear relationship.
$+0.5 < r < +1$ implies there is a strong positive linear relationship.
$r = 0$ implies there is no linear relationship between the two variables.
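Solution:
X       Y       X²      XY      Y²
1       5       1       5       25
2       13      4       26      169
3       16      9       48      256
4       23      16      92      529
5       33      25      165     1089
6       38      36      228     1444
7       40      49      280     1600
Total   28      168     140     844     5112   (ΣX, ΣY, ΣX², ΣXY, ΣY²)
\[ r = \frac{n\sum XY - \sum X\sum Y}{\sqrt{\left[n\sum X^{2} - \left(\sum X\right)^{2}\right]\left[n\sum Y^{2} - \left(\sum Y\right)^{2}\right]}} = \frac{7(844) - 28(168)}{\sqrt{\left[7(140) - 28^{2}\right]\left[7(5112) - 168^{2}\right]}} = \frac{1204}{\sqrt{196 \times 7560}} = 0.9891 \]
Comment: There is a strong positive linear relationship between $X$ and $Y$.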
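The product moment correlation coefficient for these data can be verified with a
short Python sketch of the computational formula; the function name `pearson_r`
is an illustrative assumption.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation coefficient from raw sums."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sxy - sx * sy
    denominator = math.sqrt((n * sxx - sx**2) * (n * syy - sy**2))
    return numerator / denominator

x = [1, 2, 3, 4, 5, 6, 7]
y = [5, 13, 16, 23, 33, 38, 40]
r = pearson_r(x, y)

print(round(r, 4))
```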
\[ r_s = 1 - \frac{6\sum D^{2}}{n(n^{2} - 1)} \]
Judge Y 68 56 60 76 45 46 57 85 32 40 85 88 57 68 64
Solution:
Judge X   72  45  48    85  59  94  68  92  29  50  73  90  48    56  75
Judge Y   68  56  60    76  45  46  57  85  32  40  85  88  57    68  64
R_X       7   14  12.5  4   9   1   8   2   15  11  6   3   12.5  10  5
\[ r_s = 1 - \frac{6\sum D^{2}}{n(n^{2} - 1)} = 1 - \frac{6(229.5)}{15(15^{2} - 1)} = 1 - 0.4098 = 0.5902 \]
Comment: There is a fairly strong agreement between the two judges in their
assessment.
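The ranking step, including the averaging of tied ranks, is the part of Spearman's
method most prone to slips. The sketch below ranks each judge's scores from
largest (rank 1) to smallest, gives tied scores the mean of the ranks they occupy,
and then applies $r_s = 1 - 6\sum D^2 / [n(n^2-1)]$. The helper name `average_ranks`
is an illustrative assumption.

```python
def average_ranks(values):
    """Rank from largest (rank 1) to smallest; ties share the mean rank."""
    order = sorted(values, reverse=True)
    ranks = []
    for v in values:
        first = order.index(v) + 1      # first rank position held by v
        count = order.count(v)          # how many observations equal v
        ranks.append(first + (count - 1) / 2)
    return ranks

judge_x = [72, 45, 48, 85, 59, 94, 68, 92, 29, 50, 73, 90, 48, 56, 75]
judge_y = [68, 56, 60, 76, 45, 46, 57, 85, 32, 40, 85, 88, 57, 68, 64]

rx = average_ranks(judge_x)
ry = average_ranks(judge_y)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))   # sum of squared rank differences
n = len(judge_x)
r_s = 1 - 6 * sum_d2 / (n * (n**2 - 1))

print(rx)
print(sum_d2, round(r_s, 4))
```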
Exercise 7
In a certain company, drums of mentholated spirit are kept in storage for
some time before being bottled. During storage, evaporation of part of the water
content of the spirit takes place, and an examination of such drums gives the
following results.
Storage time (weeks) 2 5 7 9 12 13
Evaporation loss (a.m) 38 57 65 73 84 91
(i) Find the regression equation of evaporation loss on the storage time
for the mentholated spirit.
(ii) Find the regression equation of storage time on the evaporation loss.
(iii) Find the product moment correlation coefficient.
(iv) Determine the Spearman’s rank correlation.
ESTIMATION
Estimation is the process of assigning a value to a population parameter based on
sample information. An estimate is a value assigned to a population
parameter based on the value of a statistic. A sample statistic used to
estimate a population parameter is called an estimator. In other words, the
function or rule that is used to guess the value of a parameter is called an
estimator, and an estimate is a particular value calculated from a particular sample
of observations. An estimator, like any statistic, is a random variable.
Parameters are represented by Greek letters, while statistics are represented by
Roman letters:
Parameter (population characteristic)    Statistic (sample characteristic)
$\mu$                                    $\bar{X}$
$\rho$                                   $p$
$\sigma^{2}$                             $S^{2}$
Estimators are of two kinds, namely (i) point estimators and (ii) interval
estimators. A point estimator gives a single value for the population parameter
based on the value of the sample statistic, while an interval estimator consists
of two numerical values between which we believe, with some degree of
confidence, the value of the parameter being estimated lies. In many
situations, a point estimate does not supply complete information to a
researcher; hence, an approach called the confidence interval is used.
The subject of estimation is concerned with the methods by which
population characteristics are estimated from sample information. The
objectives are:
(i) To present properties for judging how well a given sample statistic
estimates the parent population parameter.
(ii) To present several methods for estimating these parameters.
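Solution:
(a) \[ E(\bar{X}) = E\!\left(\frac{\sum X_i}{n}\right) = \frac{1}{n}E\!\left(\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n}\mu = \frac{1}{n}\cdot n\mu = \mu \]
$\therefore \bar{X}$ is an unbiased estimator.
(b) \[ E(S^{2}) = E\!\left[\frac{1}{n}\sum_{i=1}^{n}(X_i - \bar{X})^{2}\right] = E\!\left[\frac{1}{n}\sum_{i=1}^{n}\left\{(X_i - \mu) - (\bar{X} - \mu)\right\}^{2}\right] \]
\[ = \frac{1}{n}\left[\sum_{i=1}^{n} E(X_i - \mu)^{2} - nE(\bar{X} - \mu)^{2}\right] \]
\[ = \frac{1}{n}\left[\sum_{i=1}^{n} V(X_i) - nV(\bar{X})\right] \]
\[ = \frac{1}{n}\cdot n\sigma^{2} - \frac{1}{n}\cdot n\cdot\frac{\sigma^{2}}{n} \]
\[ = \sigma^{2} - \frac{\sigma^{2}}{n} \]
\[ \Rightarrow E(S^{2}) = \frac{n\sigma^{2} - \sigma^{2}}{n} = \frac{(n-1)\sigma^{2}}{n} \]
$\therefore E(S^{2}) \ne \sigma^{2}$. So $S^{2}$ is not an unbiased estimator.
(c) Proof (for independent $X_1, X_2, \ldots, X_n$):
\[ V(\bar{X}) = V\!\left(\frac{\sum X_i}{n}\right) = V\!\left(\frac{X_1 + X_2 + \cdots + X_n}{n}\right) \]
\[ = \frac{1}{n^{2}}\left[V(X_1) + V(X_2) + \cdots + V(X_n)\right] \]
\[ = \frac{1}{n^{2}}\left[\sigma^{2} + \sigma^{2} + \cdots + \sigma^{2}\right] \]
\[ = \frac{1}{n^{2}}\cdot n\sigma^{2} \]
\[ = \frac{\sigma^{2}}{n} \quad \blacksquare \]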
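The three results above can be checked exactly on a very small case by listing
every possible sample. The sketch below assumes a hypothetical population
{1, 2, 3} sampled with replacement with $n = 2$ (so the draws are independent),
and averages $\bar{X}$ and $S^{2} = \frac{1}{n}\sum (X_i - \bar{X})^{2}$ over all 9 equally likely samples.

```python
from itertools import product

population = [1, 2, 3]     # hypothetical population, for illustration only
n = 2                      # sample size

mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)

samples = list(product(population, repeat=n))   # all equally likely samples

def sample_mean(s):
    return sum(s) / len(s)

def s_squared(s):
    """Sample variance with divisor n, as in part (b)."""
    m = sample_mean(s)
    return sum((x - m) ** 2 for x in s) / len(s)

means = [sample_mean(s) for s in samples]
expected_mean = sum(means) / len(samples)                         # E(X-bar)
expected_s2 = sum(s_squared(s) for s in samples) / len(samples)   # E(S^2)
var_of_mean = sum((m - expected_mean) ** 2 for m in means) / len(samples)

print(expected_mean, mu)
print(expected_s2, (n - 1) * sigma2 / n)
print(var_of_mean, sigma2 / n)
```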
2. Efficiency
The most efficient estimator among a group of unbiased estimators is the
one with the smallest variance. This concept refers to the sampling
variability of an estimator.
Answer: $V(\bar{X}) = \dfrac{\sigma^{2}}{n} = 1.576$
The relative efficiency of two unbiased estimators can be
calculated by taking the ratio $\dfrac{V_1}{V_2}$ of their variances, where $V_1$ is the smaller
variance.
3. Sufficiency
An estimator is sufficient if it uses all the information that a sample can
provide about a population parameter and no other estimator can provide
additional information.
4. Consistency
An estimator is consistent if, as the sample size becomes larger, the
probability increases that the estimate will approach the true value of
the population parameter.
INTERVAL ESTIMATION
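This involves specifying a range of values within which we can assert,
with a certain degree of confidence, that a population parameter will fall.
The confidence we have that the population parameter
will fall within the confidence interval is $1 - \alpha$, where $\alpha$ is the probability
that the interval does not contain the parameter.
Confidence Interval (C.I.) $= 1 - \alpha$
[Figure: normal curve with central area $1 - \alpha$ and area $\alpha/2$ in each tail; C.I. $+\ \alpha = 1$]
\[ \bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \quad \text{or, for short,} \quad \bar{X} \pm Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
[Derived from $-Z_{\alpha/2} \le \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le Z_{\alpha/2}$]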
(a) $\bar{X} \pm Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$, with $\bar{X} = 130$, $n = 49$, $\sigma = 10$, $1 - \alpha = 0.90 \Rightarrow \alpha = 0.1$, $\alpha/2 = 0.05$
\[ 130 \pm Z_{0.05}\frac{10}{\sqrt{49}} \]
\[ 130 \pm Z_{0.05}\frac{10}{7} \]
\[ 130 \pm 1.645(1.43) \]
\[ (127.65,\ 132.35) \]
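The interval $\bar{X} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$ can be computed as in the sketch below. It uses the
figures $\bar{X} = 130$, $\sigma = 10$, $n = 49$ and, as an assumption stated here, the standard
table value $Z_{0.05} = 1.645$ for a 90% interval; the function name `z_interval`
is illustrative.

```python
import math

def z_interval(mean: float, sigma: float, n: int, z: float):
    """Confidence interval for a mean when sigma is known."""
    half_width = z * sigma / math.sqrt(n)
    return mean - half_width, mean + half_width

# z = 1.645 is the usual table value for alpha/2 = 0.05 (assumed here)
lower, upper = z_interval(130, 10, 49, 1.645)

print(round(lower, 2), round(upper, 2))
```

Replacing 1.645 by 1.96 or 2.576 would give the corresponding 95% and 99%
intervals, which are wider.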
\[ 73 \pm t_{0.05,24}\frac{S}{\sqrt{25}} \]
\[ 73 \pm 1.71(2) \]
\[ 73 \pm 3.42 \]
\[ (69.58,\ 76.42) \]
\[ p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \quad \text{where } p = \frac{x}{n}. \]
Example: A survey was conducted to study the dental health practices and
attitudes of certain urban adult population. Of 300 adults interviewed, 123 said
that they regularly had a dental check up twice a year. Obtain a 95% confidence
interval for 喧 based on this data.
Solution: $n = 300$, $p = \dfrac{x}{n} = \dfrac{123}{300} = 0.41$, $np = 300 \times 0.41 = 123$, $nq = 300 \times 0.59 = 177$
\[ p \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \]
\[ 0.41 \pm Z_{0.025}\sqrt{\frac{0.41(0.59)}{300}} \]
\[ 0.41 \pm 1.96(0.028) \]
\[ 0.41 \pm 0.055 \]
\[ (0.36,\ 0.47) \]
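The confidence interval for a proportion in the dental survey example can be
reproduced with the following sketch. The function name `proportion_interval` is
an illustrative assumption, and 1.96 is the usual table value for $Z_{0.025}$.

```python
import math

def proportion_interval(x: int, n: int, z: float):
    """Large-sample confidence interval for a population proportion."""
    p = x / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p, half_width, (p - half_width, p + half_width)

p_hat, half_width, (lower, upper) = proportion_interval(123, 300, 1.96)

print(round(p_hat, 2), round(half_width, 3), round(lower, 2), round(upper, 2))
```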
Exercise 8
A medical records librarian drew a random sample of 100 patients' charts and found
that in 8% of them, the face sheet had at least one item of information
contradicting other information in the record. Construct the 90%, 95% and
99% confidence intervals for the population proportion of charts containing such
discrepancies.
\[ \frac{17150}{26.1190} \le \sigma^{2} \le \frac{17150}{5.6287} \]
HYPOTHESIS TESTING
This is another very important aspect of statistical inference. It involves
testing the validity of a statistical statement about a population parameter, based
on the available information, at a given level of significance.
Basic Definitions
Statistical hypothesis: This is a statistical statement, which may or may not be
true, concerning one or more populations.
Test Statistic: This is a quantity calculated from the given information which,
when compared with the tabulated value, is used to take a decision about the
hypothesis being tested. E.g. the test statistic for a single mean is
$Z_c = \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ when $n \ge 30$, and $t_c = \dfrac{\bar{X} - \mu}{S/\sqrt{n}}$ when $n < 30$.
Note: The choice of the appropriate test statistic depends on the type of estimate
and the sampling distribution.
Critical Region: This is the subset of the sample space which leads to the
rejection of the null hypothesis under consideration. It is the set of all values
whose total probability is small under the null hypothesis and which are better
explained by the alternative hypothesis.
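Parameter     Conditions                                   Test statistic
"$\mu$"       $\sigma$ known, population normal            $Z = \dfrac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$
"$\sigma^{2}$"  Population variance, population normal     $\chi^{2} = \dfrac{(n-1)S^{2}}{\sigma_0^{2}}$ with $(n-1)$ df
"$\rho$"      Binomial proportion                          $Z = \dfrac{\hat{p} - p_0}{\sqrt{p_0q_0/n}}$, $\hat{p} = \dfrac{x}{n}$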
3. Determine the critical region (or rejection region) using the table of the
statistic. But the value of 糠 must be known i.e. one or two – tail test.
4. Compute the value of the test statistic based on the sample information
5. Make the statistical decision and interpretation.
population
that is normally distributed with a known variance 45. Can the researcher
conclude that mean enzyme level in this population is different from 25? Take
糠 = 0.05.
Solution: Hypothesis $H_0: \mu_0 = 25$
$H_1: \mu_0 \ne 25$
$\alpha = 0.05$, $n = 10$, $\bar{X} = 22$, $\alpha/2 = 0.025$ (two-tail test)
Test statistic:
\[ Z_c = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{22 - 25}{\sqrt{45/10}} = \frac{-3}{2.1213} = -1.4142 \]

\[ Z_c = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} = \frac{2.52 - 2.49}{0.18/\sqrt{82}} = 1.507 \]
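The test statistic for the enzyme-level example ($\bar{X} = 22$, $\mu_0 = 25$, variance 45,
$n = 10$) can be computed and compared with the two-tail critical value as sketched
below. The critical value 1.96 for $\alpha = 0.05$ is the usual table value and is
stated here as an assumption; the function name `z_statistic` is illustrative.

```python
import math

def z_statistic(sample_mean: float, mu0: float, variance: float, n: int) -> float:
    """Z = (X-bar - mu0) / (sigma / sqrt(n)) for a known population variance."""
    return (sample_mean - mu0) / math.sqrt(variance / n)

z_cal = z_statistic(22, 25, 45, 10)
z_critical = 1.96                       # two-tail table value for alpha = 0.05 (assumed)
reject_h0 = abs(z_cal) > z_critical     # reject only if |Z| exceeds the critical value

print(round(z_cal, 4), reject_h0)
```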
unknown
Example 3: A certain drug company claims that one brand of headache tablet is
capable of curing a headache in one hour. A random sample of 16 headache
patients is given the tablets. The mean curing time was found to be one hour
and nine minutes, while the standard deviation of the 16 times was eight
minutes. Do the data support the company's claim or not at the 95% level of
significance?
Solution: $H_0: \mu_0 = 1.00$
$H_1: \mu_0 \ne 1.00$
\[ t_{cal} = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{1.15 - 1.00}{0.13/\sqrt{16}} = \frac{0.15}{0.0325} = 4.6154 \]
Decision: Since $t_{cal} > t_{0.025,15}$, we reject $H_0$ and conclude that the data do not
support the company's claim.
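Test statistic:
\[ Z = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^{2}}{n_1} + \dfrac{\sigma_2^{2}}{n_2}}} \]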
Hypothesis
$H_0: \mu_1 - \mu_2 = 0$ vs $H_1: \mu_1 - \mu_2 \ne 0$
$H_0: \mu_1 - \mu_2 \ge 0$ vs $H_1: \mu_1 - \mu_2 < 0$
$H_0: \mu_1 - \mu_2 \le 0$ vs $H_1: \mu_1 - \mu_2 > 0$
to assume that the two populations of values are normally distributed with
variances equal to one. Do these data provide sufficient evidence to indicate a
difference in the mean levels between mongolism, using $\alpha = 0.05$?
Solution: $H_0: \mu_1 = \mu_2 \Rightarrow \mu_1 - \mu_2 = 0$
$H_1: \mu_1 \ne \mu_2 \Rightarrow \mu_1 - \mu_2 \ne 0$
\[ Z_{cal} = \frac{(4.5 - 3.4) - 0}{\sqrt{\dfrac{1}{12} + \dfrac{1}{15}}} = \frac{1.1}{\sqrt{0.15}} = 2.8402 \]
\[ Z_{\alpha/2} = Z_{0.025} = \pm 1.96 \]
Decision: We reject $H_0$ since $Z_{cal} > Z_{\alpha/2}$ and conclude that the means are not
equal.
Test Statistic:
\[ t_{cal} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_p^{2}}{n_1} + \dfrac{S_p^{2}}{n_2}}} \quad \text{where} \quad S_p^{2} = \frac{(n_1 - 1)S_1^{2} + (n_2 - 1)S_2^{2}}{n_1 + n_2 - 2} \]
\[ S_p^{2} = \frac{(n_1 - 1)S_1^{2} + (n_2 - 1)S_2^{2}}{n_1 + n_2 - 2} = \frac{14(1225) + 21(1600)}{35} = 1450 \]
\[ t_{cal} = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_p^{2}}{n_1} + \dfrac{S_p^{2}}{n_2}}} = \frac{(96 - 120) - 0}{\sqrt{\dfrac{1450}{15} + \dfrac{1450}{22}}} = -1.88 \]
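The pooled-variance calculation can be checked with the sketch below. The sample
sizes $n_1 = 15$ and $n_2 = 22$ are inferred from the degrees of freedom 14 and 21
in the pooled variance and are therefore an assumption, as is the function name
`pooled_t`.

```python
import math

def pooled_t(mean1, mean2, var1, var2, n1, n2, diff0=0.0):
    """Two-sample t statistic with a pooled variance estimate."""
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se = math.sqrt(sp2 / n1 + sp2 / n2)
    return sp2, (mean1 - mean2 - diff0) / se

# n1 = 15 and n2 = 22 are inferred from the 14 and 21 degrees of freedom (assumption)
sp2, t_cal = pooled_t(96, 120, 1225, 1600, 15, 22)

print(sp2, round(t_cal, 2))
```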
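\[ \chi^{2}_{cal} = \frac{(n-1)S^{2}}{\sigma_0^{2}} = \frac{24(100)}{150} = 16 \]
\[ \chi^{2}_{\alpha/2,\,n-1} = \chi^{2}_{0.025,24} = 39.3641 \]
Decision: Since $\chi^{2}_{cal} < \chi^{2}_{0.025,24}$, we accept $H_0$ and conclude that $\sigma^{2} = 150$ at
$\alpha = 0.05$ level of significance.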
GOODNESS OF FIT
This test is used to compare the observed frequencies with the frequencies
we might expect (expected frequencies) from a given theoretical explanation of
the phenomenon under investigation. A measure of this discrepancy is given by
the $\chi^{2}$ (chi-square) statistic, defined as:
\[ \chi^{2} = \sum_{i=1}^{k}\frac{(O_i - E_i)^{2}}{E_i} \]
Example 1: The distribution of final grades given by STS 202 lecturers in the past
was 10% As, 20% Bs, 30% Cs, 25% Ds and 15% Fs. A new lecturer gave the
following grades for the second semester:
Category   A    B    C    D    F
Observed   12   20   26   14   8
Is there sufficient evidence to suggest that the new lecturer's policy is
different from that of the former lecturers? Use a 5% level of significance.
Solution: $H_0: p_1 = 0.10,\ p_2 = 0.20,\ p_3 = 0.30,\ p_4 = 0.25,\ p_5 = 0.15$
$H_1$: $H_0$ is not true
$n = 12 + 20 + 26 + 14 + 8 = 80$
Category   Observed frequency   Expected frequency
A          12                   (10/100) × 80 = 8
B          20                   (20/100) × 80 = 16
C          26                   (30/100) × 80 = 24
D          14                   (25/100) × 80 = 20
F          8                    (15/100) × 80 = 12
\[ \chi^{2}_{cal} = \sum_{i=1}^{5}\frac{(O_i - E_i)^{2}}{E_i} = \frac{(12-8)^{2}}{8} + \frac{(20-16)^{2}}{16} + \frac{(26-24)^{2}}{24} + \frac{(14-20)^{2}}{20} + \frac{(8-12)^{2}}{12} = 6.3 \]
Since we have five classes (grades), degrees of freedom $= k - 1 = 5 - 1 = 4$
and $\alpha = 0.05$.
$\chi^{2}_{0.05,4} = 9.49$. Hence, since $\chi^{2}_{cal} < \chi^{2}_{0.05,4}$, we do not reject $H_0$, meaning that
the data do not suggest that the new lecturer's grading policy is different.
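The goodness-of-fit statistic for the grades example can be reproduced as follows.
The sketch builds the expected frequencies from the hypothesised proportions and
compares the result with the table value 9.49 quoted above; variable names are
illustrative.

```python
observed = [12, 20, 26, 14, 8]                   # grades A, B, C, D, F
proportions = [0.10, 0.20, 0.30, 0.25, 0.15]     # proportions under H0

n = sum(observed)
expected = [p * n for p in proportions]          # E_i = n * p_i

chi2_cal = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
chi2_table = 9.49                                # chi-square(0.05, 4), from the table
reject_h0 = chi2_cal > chi2_table

print([round(e, 1) for e in expected], round(chi2_cal, 2), df, reject_h0)
```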
2              $O_{21}$   $O_{22}$   ⋯   $O_{2c}$   $R_2$
⋮              ⋮          ⋮              ⋮          ⋮
r              $O_{r1}$   $O_{r2}$   ⋯   $O_{rc}$
Column Total   $C_1$      $C_2$      ⋯
Example 2: Consider the case where pressure gauges are being hydraulically
tested by 3 inspectors prior to shipment. It has been noted that their acceptance
and rejection for some period of time have been as follows:
Inspectors
I II III Totals
Passed 150 50 100 300
Failed 20 10 30 60
Totals 170 60 130 360
Test the hypothesis that all inspectors are equally stringent; take $\alpha = 0.05$.
Solution: $H_0$: All inspectors are equally stringent
$H_1$: $H_0$ is not true
Test statistic:
\[ \chi^{2} = \sum_{i=1}^{2}\sum_{j=1}^{3}\frac{(O_{ij} - E_{ij})^{2}}{E_{ij}} \]
Expected frequencies $E_{ij}$:
\[ E_{11} = \frac{300(170)}{360} = 141.67 \]
\[ E_{12} = \frac{300(60)}{360} = 50 \]
\[ E_{13} = \frac{300(130)}{360} = 108.33 \]
\[ E_{21} = \frac{60(170)}{360} = 28.33 \]
\[ E_{22} = \frac{60(60)}{360} = 10 \]
\[ E_{23} = \frac{60(130)}{360} = 21.67 \]
\[ \chi^{2} = \frac{(150 - 141.67)^{2}}{141.67} + \cdots + \frac{(30 - 21.67)^{2}}{21.67} = 6.7817 \]
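For a contingency table, each expected frequency is (row total × column total) /
grand total. The sketch below applies this to the inspectors' data and sums the
chi-square contributions; because it does not round the expected frequencies to
two decimals, its result differs slightly in the third decimal place from a hand
calculation. Variable names are illustrative.

```python
observed = [
    [150, 50, 100],   # passed, inspectors I, II, III
    [20, 10, 30],     # failed, inspectors I, II, III
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E_ij = (row total i) * (column total j) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi2_cal = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)

print([[round(e, 2) for e in row] for row in expected])
print(round(chi2_cal, 2), df)
```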
Exercise 9
Random samples of 1200 students who live in Saudi hall of the Crescent
University were asked their daily eating habits by means of questionnaire 隙 .
Similarly, information was obtained from another sample of 1050 students
living in Yola hall of the same institution, by means of questionnaire 桁 . The
results were as follows. Can the difference in these distributions be purely due
to chance? Support your answer from a statistical point of view.
                                 Number of Students
                                 X       Y
Miss no meal per week            430     500
Miss 1 – 4 meals per week        500     300
Miss 5 – 8 meals per week        200     225
Miss 9 or more meals per week    70      25
Assumptions
1. The $k$ sets of observed data constitute $k$ random samples from the
respective populations.
2. Each of the populations from which the samples are drawn is normally distributed
with mean $\mu_j$ and variance $\sigma_j^{2}$ respectively.
         1           2           3           …    k
         $x_{11}$    $x_{12}$    $x_{13}$    …    $x_{1k}$
         $x_{21}$    $x_{22}$    $x_{23}$    …    $x_{2k}$
         ⋮           ⋮           ⋮                ⋮
         $x_{n1}$    $x_{n2}$    $x_{n3}$    …    $x_{nk}$
Total    $T_{.1}$    $T_{.2}$    $T_{.3}$    …    $T_{.k}$    $T_{..}$
Mean     $\bar{x}_{.1}$   $\bar{x}_{.2}$   $\bar{x}_{.3}$   …    $\bar{x}_{.k}$   $\bar{x}_{..}$
Hypothesis
$H_0: \mu_1 = \mu_2 = \mu_3 = \cdots = \mu_k$
$H_1$: Not all $\mu_j$ are equal
Alternatively:
$H_0: \tau_j = 0$
$H_1$: Not all $\tau_j$ equal 0.
CALCULATION
The following calculations are going to be used.
$x_{ij}$ = $i$th observation receiving $j$th treatment
$T_{.j} = \sum_{i=1}^{n} x_{ij}$ = Total of the $j$th column
$T_{..} = \sum_{j=1}^{k}\sum_{i=1}^{n} x_{ij}$
$\bar{x}_{..} = \dfrac{T_{..}}{N}$
\[ SS_{total} = \sum_{j=1}^{k}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_{..}\right)^{2} = \sum_{j=1}^{k}\sum_{i=1}^{n} x_{ij}^{2} - \frac{T_{..}^{2}}{N} \]
ANOVA TABLE
Source      d.f    S.S       M.S              F
Treatment   t-1    SStrt     SStrt/(t-1)      T.M.S/E.M.S
Error       N-t    SSerror   SSerror/(N-t)
Total       N-1    SStotal
Solution:
\[ SS_{total} = \sum_{j=1}^{k}\sum_{i=1}^{n} x_{ij}^{2} - \frac{T_{..}^{2}}{N} = 3^{2} + 2^{2} + \cdots + 5^{2} - \frac{8100}{15} = 602 - \frac{8100}{15} = 62 \]
\[ SS_{trt} = \sum_{j=1}^{k}\frac{T_{.j}^{2}}{n} - \frac{T_{..}^{2}}{N} = \frac{9^{2}}{3} + \frac{21^{2}}{3} + \cdots + \frac{18^{2}}{3} - 540 = 576 - 540 = 36 \]
\[ SS_{error} = SS_{total} - SS_{trt} = 62 - 36 = 26 \]
ANOVA TABLE
Source      d.f        S.S   M.S           F
Treatment   5-1=4      36    36/4 = 9      9.0/2.6 = 3.46
Error       15-5=10    26    26/10 = 2.6
Total       15-1=14    62
Decision: Since $F_{cal} < F_{tab}$, we accept $H_0$ and conclude that the treatment means
are equal, i.e. there is no statistical difference in the treatment means.
\[ SS_{total} = \sum_{j=1}^{k}\sum_{i=1}^{n} x_{ij}^{2} - \frac{T_{..}^{2}}{N} = 32.1^{2} + \cdots + 44.6^{2} - \frac{343.3^{2}}{N} = 205.68 \]
\[ SS_{trt} = \sum_{j}\frac{T_{.j}^{2}}{n_j} - \frac{T_{..}^{2}}{N} = \frac{95.9^{2}}{3} + \frac{160^{2}}{4} + \frac{87.4^{2}}{2} - 13094.99 = 189.99 \]
\[ SS_{error} = SS_{total} - SS_{trt} = 205.68 - 189.99 = 15.69 \]
ANOVA TABLE
Source      S.S      d.f   M.S      F
Treatment   189.99   2     94.995   94.995/2.615 = 36.33
Error       15.69    6     2.615
Total       205.68   8
Critical value: $F_{tab} = F_{\alpha,v_1,v_2} = F_{0.05,2,6} = 5.14$
Decision: Since $F_{cal} > F_{tab}$, we reject $H_0$ and conclude that there is evidence to
say that the mean tensile strength differs for the three processes.
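When only the treatment totals and the total sum of squares are available, as in
the tensile strength calculation above, the rest of the one-way ANOVA table
follows mechanically. The sketch below assumes treatment totals 95.9, 160 and
87.4 from 3, 4 and 2 observations respectively (read off the denominators above)
and $SS_{total} = 205.68$; the function name is illustrative.

```python
def one_way_anova_from_totals(totals, sizes, ss_total):
    """Return (SS_trt, SS_error, F) from treatment totals and group sizes."""
    N = sum(sizes)
    grand_total = sum(totals)
    correction = grand_total**2 / N                       # T..^2 / N
    ss_trt = sum(t * t / n for t, n in zip(totals, sizes)) - correction
    ss_error = ss_total - ss_trt
    df_trt = len(totals) - 1
    df_error = N - len(totals)
    f_ratio = (ss_trt / df_trt) / (ss_error / df_error)
    return ss_trt, ss_error, f_ratio

ss_trt, ss_error, f_ratio = one_way_anova_from_totals(
    [95.9, 160.0, 87.4], [3, 4, 2], 205.68
)

print(round(ss_trt, 2), round(ss_error, 2), round(f_ratio, 2))
```

Small differences in the last digit relative to a hand calculation come from
rounding intermediate values.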
⋮        ⋮           ⋮           ⋮                ⋮           ⋮          ⋮
n        $x_{n1}$    $x_{n2}$    $x_{n3}$    …    $x_{nk}$    $T_{n.}$   $\bar{x}_{n.}$
Total    $T_{.1}$    $T_{.2}$    $T_{.3}$    …    $T_{.k}$    $T_{..}$
Mean     $\bar{x}_{.1}$   $\bar{x}_{.2}$   $\bar{x}_{.3}$   …    $\bar{x}_{.k}$   $\bar{x}_{..}$
Total of the $i$th block (Replicate):
\[ T_{i.} = \sum_{j=1}^{k} x_{ij} \]
Mean of the $i$th block:
\[ \bar{x}_{i.} = \frac{\sum_{j=1}^{k} x_{ij}}{k} \]
Grand total:
\[ T_{..} = \sum_{j=1}^{k} T_{.j} \]
The statistical model for two-way classification of ANOVA is
\[ x_{ij} = \mu + \beta_i + \tau_j + e_{ij}, \quad i = 1, \ldots, n,\ \ j = 1, \ldots, k \]
Where
Assumptions
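1. $e_{ij} \sim N(0, \sigma^{2})$
2. $\sum \tau_j = \sum \beta_i = 0$.
Calculation
\[ SS_{total} = \sum_{j=1}^{k}\sum_{i=1}^{n} x_{ij}^{2} - \frac{T_{..}^{2}}{N} \]
\[ SS_{block} = \sum_{i=1}^{n}\frac{T_{i.}^{2}}{k} - \frac{T_{..}^{2}}{N} \]
\[ SS_{treatment} = \sum_{j=1}^{k}\frac{T_{.j}^{2}}{n} - \frac{T_{..}^{2}}{N} \]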
Example: Samples of 200 machined parts were selected from the one-week
output of a machine shop that employs three machinists. The parts were
inspected to determine whether or not they are defective, and were categorized
according to which machinist did the work. The results are as follows:
                   Machinist
                   A     B     C
Defective          10    8     14
Non-defective      52    60    56
Hypothesis $H_0: \tau_j = 0$ vs $H_1$: Not all $\tau_j = 0$
Calculation
\[ SS_{total} = \sum_{j=1}^{k}\sum_{i=1}^{n} x_{ij}^{2} - \frac{T_{..}^{2}}{N} = 3133.33 \]
\[ SS_{block} = \sum_{i=1}^{n}\frac{T_{i.}^{2}}{k} - \frac{T_{..}^{2}}{N} = \frac{32^{2} + 168^{2}}{3} - 6666.67 = 3082.66 \]
\[ SS_{trt} = \sum_{j=1}^{k}\frac{T_{.j}^{2}}{n} - \frac{T_{..}^{2}}{N} = \frac{62^{2} + 68^{2} + 70^{2}}{2} - 6666.67 = 17.33 \]
\[ SS_{error} = SS_{total} - SS_{block} - SS_{trt} = 33.34 \]
ANOVA TABLE
Source      S.S       d.f   M.S       VR
Treatment   17.33     2     8.665
Block       3082.66   1     3082.66   3082.66/16.67 = 184.92
Error       33.34     2     16.67
Total       3133.33   5
$F_{1,2,0.01} = 98.50$
Decision: We reject $H_0$ and conclude that the defective/non-defective classification
is dependent on the machinist.
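The sums of squares in the two-way layout above can be recomputed directly from
the 2 × 3 table of counts. In this sketch rows are treated as blocks and columns
as treatments, following the formulas in these notes; the function name
`two_way_anova` and the dictionary keys are illustrative assumptions.

```python
def two_way_anova(table):
    """Sums of squares and variance ratios for a blocks-by-treatments table
    with one observation per cell."""
    n = len(table)            # number of blocks (rows)
    k = len(table[0])         # number of treatments (columns)
    N = n * k
    grand_total = sum(sum(row) for row in table)
    correction = grand_total**2 / N                        # T..^2 / N

    ss_total = sum(x * x for row in table for x in row) - correction
    ss_block = sum(sum(row) ** 2 for row in table) / k - correction
    col_totals = [sum(row[j] for row in table) for j in range(k)]
    ss_trt = sum(t * t for t in col_totals) / n - correction
    ss_error = ss_total - ss_block - ss_trt

    ms_error = ss_error / ((n - 1) * (k - 1))
    return {
        "ss_total": ss_total,
        "ss_block": ss_block,
        "ss_trt": ss_trt,
        "ss_error": ss_error,
        "vr_block": (ss_block / (n - 1)) / ms_error,
        "vr_trt": (ss_trt / (k - 1)) / ms_error,
    }

result = two_way_anova([[10, 8, 14], [52, 60, 56]])
print({name: round(value, 2) for name, value in result.items()})
```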
Exercise 10
Three kinds of tomato are grown. The yields in grams, after harvesting
are given in the table below.
Kind
A B C
7 10 7
8 11 10
8 12 11
6 10 6
10 9
12
9
Carry out the analysis of variance for the data at 95% level of significance.
Nonparametric tests were developed to deal with situations where
the population distributions are non-normal or unknown, or when little is
known about the distributions of the populations under study, or when these
distributions do not meet the requirements necessary for the use of parametric
tests, especially when the sample size is small (less than 30). Nonparametric
methods employ the median of the population and the method of hypothesis
testing in conducting their tests. Some nonparametric tests include the sign test,
the Wilcoxon signed-rank test, the Mann-Whitney U test, the runs test, etc.
The sign test is used to study the median of a population and to compare two
populations when the samples are dependent. We usually assume that the data
are continuous.
Procedure
1. Hypothesis: $H_0: M_d = M_o$ vs $H_1: M_d \ne M_o$
2. Test Statistic: $X$ = Number of data values in the sample above the
median value given in the null hypothesis $H_0$.
3. Significance level: Choose an appropriate level, say $\alpha$.
4. Critical Region: When $H_0$ is true, $X$ has a binomial distribution with
sample size $n$ and $p = 0.5$. We use the table of binomial probabilities to find
the critical region, noting whether the alternative is $\ne$, $>$ or $<$. If $\alpha$ is the
desired level of significance, we choose the critical value so that the probability
that $X$ falls in the critical region is as close to $\alpha$ as possible.
5. Decision: We check whether the observed value of $X$ is in the critical region
or not; if $X$ falls into the critical region, we reject $H_0$; otherwise, we accept
$H_0$.
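The procedure above can be sketched in Python using exact binomial probabilities
in place of the table. The data and the hypothesised median below are
hypothetical, chosen only for illustration, and observations equal to the
hypothesised median are dropped, which is a common convention assumed here rather
than one stated in these notes. The sketch reports a two-sided p-value, which can
be compared with $\alpha$ instead of locating a critical region.

```python
from math import comb

def sign_test(data, m0):
    """Two-sided sign test of H0: median = m0.
    Returns (x, n, p_value), where x counts the values above m0."""
    kept = [v for v in data if v != m0]      # drop ties with m0 (assumed convention)
    n = len(kept)
    x = sum(1 for v in kept if v > m0)
    tail = min(x, n - x)
    # Under H0, X ~ Binomial(n, 0.5); double the smaller tail probability
    p_value = 2 * sum(comb(n, j) for j in range(tail + 1)) / 2**n
    return x, n, min(1.0, p_value)

sample = [12, 15, 9, 20, 18, 14, 11, 17, 19, 16]   # hypothetical data
x, n, p_value = sign_test(sample, m0=10)

print(x, n, round(p_value, 4))
```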
The sign test may be used to compare two populations when the samples
are dependent (i.e. the values from the two samples occur in pairs).
Although the sign test is very simple to use, it is not a very sensitive test.
Sometimes it will fail to reject a false null hypothesis when another test would
succeed in detecting that the null hypothesis is false. This is because the sign
test throws away a good deal of information about the data: it ignores the
magnitude of each data value and uses only the information about whether the data
value is above the conjectured value of the median or not.
The Wilcoxon signed-rank test is better than the sign test because it uses
more information. We use this test to investigate a single population median and
to compare two populations using a paired experiment.
Let M be the value of the median in question that appears in the null
hypothesis. Calculate D = X – M for each data value X. We then rank the absolute
values of D and place a minus sign in front of each rank corresponding to a negative
difference D. Let W+ = sum of the positive ranks and W- = absolute value of the sum
of the negative ranks.
statistic. We denote this value by W. If $W \le C$, reject $H_0$.