MCM1C03
M.Com
( I SEMESTER)
(2019 Admn Onwards)
QUANTITATIVE TECHNIQUES
FOR BUSINESS DECISIONS
190603
UNIVERSITY OF CALICUT
SCHOOL OF DISTANCE EDUCATION
Calicut University- PO, Malappuram,
2 Correlation Analysis
3 Regression Analysis
4 Probability Distributions
5 Binomial Distribution
6 Poisson Distribution
7 Normal Distribution
8 Exponential Distribution
9 Uniform Distribution
10 Statistical Inferences
11 Chi-Square Test
12 Analysis of Variance
CHAPTER 1
Mathematical techniques
They are quantitative techniques in which numerical data are used along with the
principles of mathematics such as algebra and calculus. They include permutations,
combinations, set theory, matrix analysis, differentiation, integration etc.
Permutations and combinations
Permutation is a mathematical device for finding the possible number of arrangements
or groups which can be made of a certain number of items from a set of observations.
Permutations are groupings in which the order of arrangement is considered.
Combinations are number of selections or subsets which can be made of a certain
number of items from a set of observations, without considering order. Both combinations
and permutations help in ascertaining total number of possible cases.
Set theory
It is a modern mathematical device which solves the various types of critical
problems on the basis of sets and their operations like Union, intersection etc.
Matrix Algebra
Matrix is an orderly arrangement of certain given numbers or symbols in rows and
columns. Matrix analysis is thus a mathematical device of finding out the results of
different types of algebraic operations on the basis of relevant matrices. This is useful to
find values of unknown numbers connected with a number of simultaneous equations.
Differentials
Differential is a mathematical process of finding out changes in the dependent
variable with reference to a small change in the independent variable. It involves
differential coefficients of dependent variables with or without variables.
Integration
It is the technique that reverses the process of differentiation. It involves the
expression ∫ f(x) dx, where f(x) is the function to be integrated.
Statistical techniques
They are techniques which are used in conducting statistical inquiry concerning a
certain phenomenon. They include all the statistical methods beginning from the collection
of data till interpretation of those collected data. Important statistical techniques include
collection of data, classification and tabulation, measures of central tendency, measures of
dispersion, skewness and kurtosis, correlation, regression, interpolation and extrapolation,
index numbers, time series analysis, statistical quality control, ratio analysis, probability
theory, sampling technique, variance analysis, theory of attributes etc.
Programming techniques
These techniques focus on model building, and are widely applied by decision
makers relating to business operations. In programming, problem is formulated in
numerical form, and a suitable model is fitted to the problem and finally a solution is
derived. Prominent programming techniques include linear programming, queuing theory,
inventory theory, theory of games, decision theory, network programming, simulation,
replacement, non-linear programming, dynamic programming, integer programming etc.
which arrangement of orders in terms of time and quantity, will give maximum profits.
Such question can be answered with the help of quantitative techniques.
Cost minimization
Quantitative techniques are helpful in tackling cost minimization problems. For
example waiting line theory enables a manager to minimize waiting and servicing costs.
These techniques help business managers in taking a correct decision through analysis of
the feasibility of adding facilities.
Forecasting
Quantitative techniques are useful in demand forecasting. They provide a scientific
basis of coping with the uncertainties of future demand. Demand forecasts serve as the
basis for capacity planning. Quantitative technique enables a manager to adopt the
minimum risk plan.
Inventory control
Inventory planning techniques help in deciding when to buy and how much to buy.
It enables management to arrive at appropriate balance between the costs and benefits of
holding stocks. The integrated production models technique is very useful in minimizing
costs of inventory, production and workforce. Statistical quality controls help us to
determine whether the production process is under control or not.
Applications of quantitative techniques in business operations
Quantitative techniques are widely applied for solving decision problems of routine
operations of business organizations. They are especially useful for business managers,
economists, statisticians, administrators, technicians and others in the fields of business,
agriculture, industry, services and defense. They have specific applications in the following
functional areas of business organizations.
Planning
In planning, quantitative techniques are applied to determine size and location of plant,
product development, factory construction, installation of equipment and machineries etc.
Purchasing
Quantitative techniques are applied in make or buy decisions, vendor
development, vendor rating, purchasing at varying prices, standardization and variety
reduction, logistics management.
Manufacturing
Quantitative techniques address questions like product mix, production planning,
quality control, job sequencing, and optimum run sizes.
Marketing
Marketing problems like demand forecasting, pricing, competitive strategies,
optimal media planning and sales management can be solved through the application of
appropriate quantitative techniques.
Human resource management
Quantitative techniques support decision making relating to manpower planning
with due consideration to age, skill, wastage and recruitment, recruitment on the basis of
proper aptitude, method study, work measurement, job evaluation, development of
incentive plans, wage structuring and negotiating wage and incentive plans with the union.
REVIEW QUESTIONS:
1. Define Quantitative Techniques.
2. Explain the classification of quantitative techniques.
3. Explain the significance of quantitative decisions.
4. What are the uses of quantitative techniques in Business?
5. Explain the qualitative approach in decision making.
6. What are the important limitations of quantitative techniques?
Chapter 2
CORRELATION ANALYSIS
Meaning and Definition of Correlation
According to Croxton and Cowden, “when the relationship is of quantitative nature, the
appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief
formula is known as correlation”
Correlation analysis helps to know the direction as well as the degree of the
relationship that exists between two or more variables.
Types of Correlation
I. Positive and Negative Correlation
(a) Positive Correlation: If two variables move in the same direction, then the correlation is
called positive. For example, price and supply are positively correlated. When price goes up,
supply goes up and vice versa.
(b) Negative Correlation: If two variables move in the opposite direction, then the correlation is
called negative. For example, price and demand are negatively correlated. When price goes up,
demand falls down and vice versa.
II. Simple, Partial and Multiple Correlation
(a) Simple Correlation: In a correlation analysis, if there are only two variables, then the
correlation analysis is called simple correlation. For example, the relationship between weight
and height, price and demand, price and supply, etc.
(b) Partial Correlation: When there are more than two variables and we study the relationship
between any two variables only, assuming other variables as constant, it is called partial
correlation. For example, the study of the relationship between rainfall and agricultural
produce, without taking into consideration the effects of other factors such as quality of seeds,
quality of soil, use of fertilizer, etc.
(c) Multiple Correlation: When there are more than two variables and we study the relationship
between one variable and all the other variables taken together, then it is the case of multiple
correlation. Suppose there are three variables, namely x, y and z. The correlation between x
and (y & z) taken together is multiple correlation. Similarly, the relation between y and (x & z)
taken together is multiple correlation. Again, the relation between z and (x & y) taken together
is multiple correlation.
III. Linear and Non-linear Correlation
(a) Linear Correlation: When the amount of change in one variable leads to a constant
ratio of change in the other variable, the relationship is called linear correlation. For
example, if every 10% fall in price leads to a 12% fall in supply, it is
linear correlation. When we plot the data on graph, we will get a straight line. Here, the
relationship between the variables may be expressed in the form of y = ax + b.
(b) Non-linear Correlation: When the amount of change in one variable does not lead to
a constant ratio of change in the other variable, the relationship is called non-linear
correlation. When we plot the data on graph, we do not get a straight line. Therefore,
non-linear correlation is also called curvi-linear correlation.
IV. Logical and Illogical Correlation
(a) Logical Correlation: When the correlation between two variables is not only
mathematically defined but also logically sound, it is called logical correlation. For
example, correlation between price and demand.
(b) Illogical Correlation: When the correlation between two variables is mathematically
defined but not logically sound, it is called illogical correlation. For example,
correlation between availability of rainfall and height of people. This type of correlation
is also known as Spurious correlation or Non-sense correlation.
The various methods for studying correlation can be classified into two categories. They are:
I. Graphic Methods:
(1) Scatter diagram method
(2) Correlation Graph method
II. Mathematical Methods:
(1) Karl Pearson’s Product Moment Method
(2) Spearman’s Rank Correlation Method
(3) Concurrent Deviation Method
Graphic Methods:
The scatter diagram method is a simple method for analysing correlation between two variables.
One variable is shown on the X-axis and the other on the Y-axis. Each pair of values is shown
on the graph paper using dots. When all the pairs of observations are plotted as dots, the
relationship that exists between the variables is analysed by observing how the dots are
scattered. If the dots show an
upward or downward trend, then the variables are correlated. We may interpret the scatter
diagram as follows:
(a) If all the dots are lying on a straight line from left bottom corner to the right upper corner,
there is perfect positive correlation between variables.
(b) If all the dots are lying on a straight line from left upper corner to the right bottom corner,
there is perfect negative correlation between variables.
(c) If all the dots are plotted on a narrow band from left bottom corner to the right upper
corner, there is high degree of positive correlation between variables.
(d) If all the dots are plotted on a narrow band from left upper corner to the right bottom
corner, there is high degree of negative correlation between variables.
(e) If all the dots are plotted on a wide band from left bottom corner to the right upper corner,
there is low degree of positive correlation between variables.
(f) If all the dots are plotted on a wide band from left upper part to the right bottom part, there
is low degree of negative correlation between variables.
(g) If the plotted dots do not show any trend, the variables are not correlated.
In correlation graph method, separate curves are drawn for each variable on the same
graph. The relationship between the variables is interpreted on the basis of the direction and
closeness of the curves. If both the curves move in the same direction, there is positive
correlation and if they are moving in opposite directions, there is negative correlation between
the variables.
Mathematical Methods:
Under mathematical methods, the correlation between variables is studied with the help of a
numerical value obtained using an appropriate formula. This numerical value is called
coefficient of correlation. Coefficient of correlation explains both the direction as well as
the degree of the relationship that exists between the variables.
Degree of Correlation:
(f) Moderate Degree of Negative Correlation: When coefficient of correlation lies between – 0.5 and – 0.75
(g) Low Degree of Positive Correlation: When coefficient of correlation lies between 0 and + 0.33
(h) Low Degree of Negative Correlation: When coefficient of correlation lies between 0 and – 0.33
(i) No Correlation: When coefficient of correlation is zero.
$$ r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} $$
where x = deviations of X values from their actual mean
y = deviations of Y values from their actual mean
OR
$$ r = \frac{n\sum dx\,dy - (\sum dx \cdot \sum dy)}{\sqrt{\left[n\sum dx^2 - (\sum dx)^2\right]\left[n\sum dy^2 - (\sum dy)^2\right]}} $$
where n = number of pairs of observations
dx = deviations of observations (variable x) from assumed mean
dy = deviations of observations (variable y) from assumed mean
OR
$$ r = \frac{n\sum XY - (\sum X \cdot \sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}} $$
where n = number of pairs of observations
X = Given values of variable X
Y = Given values of variable Y
Qn: From the following data compute product moment correlation coefficient and interpret it:
X 57 42 40 38 42 45 42 44 40 46 44 43
Y 10 26 30 41 29 27 27 19 18 19 31 29
Sol:
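As an illustrative sketch (not the text's own worked solution), the direct formula using the given X and Y values can be applied to the data of this question. The function name `pearson_r` is a hypothetical helper introduced only for this example.

```python
import math

def pearson_r(xs, ys):
    """Product moment correlation using the direct (given-values) formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Data from the question above
X = [57, 42, 40, 38, 42, 45, 42, 44, 40, 46, 44, 43]
Y = [10, 26, 30, 41, 29, 27, 27, 19, 18, 19, 31, 29]

r = pearson_r(X, Y)
print(round(r, 3))  # approximately -0.733
```

A value of about –0.73 falls in the moderate-to-high negative range, so X and Y tend to move in opposite directions.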
Probable Error
Probable error is a statistical device used to measure the reliability and dependability of
the value of correlation coefficient. When the numerical value of probable error is added to and
subtracted from the value of correlation coefficient, we get two limits within which the
population parameter is expected to lie.
1. Probable error can be used to measure the reliability and dependability of coefficient of
correlation
2. It helps to determine the limits within which population parameter is expected to lie.
3. With the help of P.E, coefficient of correlation can be interpreted more accurately:
(a) If ‘r’ is less than P.E, there is no evidence of correlation.
(b) If ‘r’ is more than 6 times of P.E, correlation is significant
(c) If ‘r’ is 0.5 or more and the P.E is not much, the correlation is considered to be
significant.
Qn: Find Standard Error (S.E) and Probable Error (P.E), if r = 0.8 and number of pairs of observations
= 64. Also interpret the value of ‘r’.
Sol:
S.E = (1 – r²)/√n = (1 – 0.8²)/√64 = 0.36/8 = 0.045
P.E = 0.6745 x S.E = 0.6745 x 0.045 = 0.0304
Since ‘r’ is 26.32 times the P.E (more than 6 times), the value of ‘r’ is highly significant.
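The calculation can be sketched in a few lines, assuming the usual definitions S.E = (1 – r²)/√n and P.E = 0.6745 × S.E. The function names below are hypothetical helpers for this example only.

```python
import math

def standard_error(r, n):
    """Standard error of the correlation coefficient: (1 - r^2) / sqrt(n)."""
    return (1 - r ** 2) / math.sqrt(n)

def probable_error(r, n):
    """Probable error, assumed here to be 0.6745 times the standard error."""
    return 0.6745 * standard_error(r, n)

r, n = 0.8, 64
se = standard_error(r, n)
pe = probable_error(r, n)
ratio = r / pe  # how many times larger r is than its probable error

# Interpretation rules from the text: r < P.E means no evidence of correlation,
# r > 6 x P.E means the correlation is significant.
is_significant = ratio > 6
print(round(se, 3), round(pe, 4), is_significant)
```

The ratio differs slightly in the second decimal depending on whether P.E is rounded before dividing.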
Qn: Following table shows the marks obtained by students in two courses:
Course I 45 70 65 30 90 40 50 75 85 60
Course II 35 90 70 40 95 40 60 80 80 50
Find coefficient of correlation and P.E. Is ‘r’ significant?
Sol:
$$ r = \frac{(10 \times 3575) - (10 \times 40)}{\sqrt{\left[(10 \times 3500) - (10)^2\right]\left[(10 \times 4550) - (40)^2\right]}} = +0.903 $$
Since the coefficient of correlation is 22.86 times the P.E, ‘r’ is very significant.
Coefficient of Determination
Coefficient of Determination = r²
Qn: If the coefficient of correlation between two variables is 0.85, what percentage of variation of
dependent variable is explained? Also find the coefficient of non-determination.
Sol:
Coefficient of Determination
(Percentage of explained Variance) = r²
= 0.85 x 0.85 = 0.7225 = 72.25%
Coefficient of non-determination = 1 – r² = 1 – 0.7225 = 0.2775 = 27.75%
$$ R = 1 - \frac{6\sum D^2}{n^3 - n} $$
Qn: From the following data, compute Spearman’s Rank Correlation Coefficient:
x 330 332 328 331 327 325
y 415 434 420 430 424 428
Sol:
Here ranks are not given. So, at first, we have to assign ranks to each observation:
$$ R = 1 - \frac{6\sum D^2}{n^3 - n} = 1 - \frac{6 \times 20}{6^3 - 6} = 1 - \frac{120}{210} = 1 - 0.5714 = 0.4286 $$
Qn: From the following data, compute Spearman’s Rank Correlation Coefficient:
x 80 45 55 58 55 60 45 68 70 45 85
y 82 56 50 43 56 62 64 65 70 64 90
Sol:
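This is a case of equal (tied) values. Tied observations share the average of their ranks, and a
correction of (m³ – m)/12 is added to ΣD² for each group of m tied values.
$$ R = 1 - \frac{6\left[\sum D^2 + \sum \frac{m^3 - m}{12}\right]}{n^3 - n} = 1 - \frac{6\,[79 + (2 + 0.5 + 0.5 + 0.5)]}{1320} $$
$$ = 1 - \frac{6(79 + 3.5)}{1320} = 1 - \frac{495}{1320} = 1 - 0.375 = 0.625 $$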
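The ranking and tie-correction steps can be sketched as below. This is a minimal illustration that assumes rank 1 goes to the largest value and that tied values share the average rank; `average_ranks`, `tie_correction` and `spearman_rank` are hypothetical names for this example.

```python
from collections import Counter

def average_ranks(values):
    """Rank from largest (rank 1) to smallest; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    positions = {}
    for position, value in enumerate(ordered, start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    counts = Counter(values)
    return sum((m ** 3 - m) / 12 for m in counts.values() if m > 1)

def spearman_rank(xs, ys):
    """Rank correlation with the tie correction added to the sum of D^2."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

# First question (no ties) and second question (tied values)
r_no_ties = spearman_rank([330, 332, 328, 331, 327, 325],
                          [415, 434, 420, 430, 424, 428])
r_ties = spearman_rank([80, 45, 55, 58, 55, 60, 45, 68, 70, 45, 85],
                       [82, 56, 50, 43, 56, 62, 64, 65, 70, 64, 90])
print(round(r_no_ties, 4), round(r_ties, 3))
```

With no ties the correction term is zero, so the same function covers both worked questions.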
the basis of number of concurrent deviations. That is why this method is named as such. The
coefficient of concurrent deviation is denoted by rc.
The formula for computing coefficient of concurrent deviation is:
$$ r_c = \pm\sqrt{\pm\frac{2c - n}{n}} $$
where c = number of concurrent deviations and n = number of pairs of deviations.
Computation of Coefficient of Concurrent Deviation
x      y      dx     dy     dx.dy
180    246    ...    ...    ...
182    240    +      -      -
186    230    +      -      -
191    217    +      -      -
183    233    -      +      -
185    227    +      -      -
189    215    +      -      -
196    195    +      -      -
193    200    -      +      -
                            c = 0
Number of concurrent deviations (c) = 0; number of pairs of deviations (n) = 8
$$ r_c = \pm\sqrt{\pm\frac{2c - n}{n}} = \pm\sqrt{\pm\frac{(2 \times 0) - 8}{8}} = -\sqrt{-\left(\frac{-8}{8}\right)} = -\sqrt{1} = -1 $$
There is perfect negative correlation between x and y.
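The sign-counting procedure in the table can be sketched as follows. This is a minimal illustration: the helper names are hypothetical, and treating a zero change as non-concurrent is an assumption, since the text does not cover that case.

```python
import math

def sign(value):
    """Return +1, -1 or 0 according to the direction of change."""
    return (value > 0) - (value < 0)

def concurrent_deviation(xs, ys):
    """Return (c, r_c) where c is the number of concurrent deviations."""
    dx = [sign(b - a) for a, b in zip(xs, xs[1:])]
    dy = [sign(b - a) for a, b in zip(ys, ys[1:])]
    n = len(dx)  # number of pairs of deviations
    # A deviation pair is concurrent when both series move in the same direction.
    c = sum(1 for p, q in zip(dx, dy) if p * q > 0)
    inner = (2 * c - n) / n
    # The sign of (2c - n)/n is used both inside and outside the square root.
    r_c = math.copysign(math.sqrt(abs(inner)), inner)
    return c, r_c

x = [180, 182, 186, 191, 183, 185, 189, 196, 193]
y = [246, 240, 230, 217, 233, 227, 215, 195, 200]
c, r_c = concurrent_deviation(x, y)
print(c, r_c)
```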
PARTIAL CORRELATION
When there are more than two variables and we study the relationship between any two
variables only, assuming other variables as constant, it is called partial correlation. For example, the
study of the relationship between rainfall and agricultural produce, without taking into consideration
the effects of other factors such as quality of seeds, quality of soil, use of fertilizer, etc.
Partial correlation coefficient measures the relationship between one variable and one of the
other variables assuming that the effect of the rest of the variables is eliminated.
Suppose there are 3 variables namely x1, x2 and x3. Here, we can find three partial correlation
coefficients. They are:
(1) Partial Correlation coefficient between x1 and x2, keeping x3 as constant. This is denoted by
r12.3
(2) Partial Correlation coefficient between x1 and x3, keeping x2 as constant. This is denoted by
r13.2
(3) Partial Correlation coefficient between x2 and x3, keeping x1 as constant. This is denoted by
r23.1
The formulae for computing the above partial correlation coefficients are:
Sol:
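(1)
$$ r_{12.3} = \frac{r_{12} - r_{13}\,r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.98 - 0.2376}{\sqrt{1 - 0.1936}\,\sqrt{1 - 0.2916}} $$
(2)
$$ r_{13.2} = \frac{r_{13} - r_{12}\,r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.44 - 0.5292}{\sqrt{1 - 0.9604}\,\sqrt{1 - 0.2916}} $$
(3)
$$ r_{23.1} = \frac{r_{23} - r_{12}\,r_{13}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{13}^2}} = \frac{0.54 - 0.4312}{\sqrt{1 - 0.9604}\,\sqrt{1 - 0.1936}} $$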
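The three substitutions above can be evaluated with a short sketch. The inputs r12 = 0.98, r13 = 0.44 and r23 = 0.54 are inferred from the substituted numbers (for example 0.44² = 0.1936) and are an assumption to that extent; `partial_r` is a hypothetical helper name.

```python
import math

def partial_r(r_ab, r_ac, r_bc):
    """Partial correlation between a and b, keeping c constant."""
    return (r_ab - r_ac * r_bc) / (math.sqrt(1 - r_ac ** 2) * math.sqrt(1 - r_bc ** 2))

# Simple correlation coefficients inferred from the worked substitutions
r12, r13, r23 = 0.98, 0.44, 0.54

r12_3 = partial_r(r12, r13, r23)  # x1 and x2, keeping x3 constant
r13_2 = partial_r(r13, r12, r23)  # x1 and x3, keeping x2 constant
r23_1 = partial_r(r23, r12, r13)  # x2 and x3, keeping x1 constant
print(round(r12_3, 3), round(r13_2, 3), round(r23_1, 3))
```

The same function serves all three coefficients because only the roles of the variables change between them.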
MULTIPLE CORRELATION
When there are more than two variables and we study the relationship between one
variable and all the other variables taken together, then it is the case of multiple correlation.
Suppose there are three variables, namely x, y and z. The correlation between x and (y & z)
taken together is multiple correlation. Similarly, the relation between y and (x & z) taken
together is multiple correlation. Again, the relation between z and (x & y) taken together is
multiple correlation. In all these cases, the correlation coefficient obtained will be termed as
coefficient of multiple correlation.
Suppose there are 3 variables namely x1, x2 and x3. Here, we can find three multiple
correlation coefficients. They are:
1. Multiple Correlation Coefficient between x1 on one side and x2 and x3 together on the other
side. This is denoted by R1.23
2. Multiple Correlation Coefficient between x2 on one side and x1 and x3 together on the other
side. This is denoted by R2.13
3. Multiple Correlation Coefficient between x3 on one side and x1 and x2 together on the other
side. This is denoted by R3.12
The formulae for computing the above multiple correlation coefficients are:
Qn: If r12 = 0.6, r23 = r13 = 0.8, find R1.23, R2.13 and R3.12 .
Sol:
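One way to work this question is sketched below. It assumes the standard three-variable formula R1.23 = √[(r12² + r13² – 2 r12 r13 r23) / (1 – r23²)], with the other two coefficients obtained by interchanging the subscripts; `multiple_r` is a hypothetical helper name.

```python
import math

def multiple_r(r_ab, r_ac, r_bc):
    """Multiple correlation of a with b and c taken together (assumed standard formula)."""
    explained = r_ab ** 2 + r_ac ** 2 - 2 * r_ab * r_ac * r_bc
    return math.sqrt(explained / (1 - r_bc ** 2))

r12, r13, r23 = 0.6, 0.8, 0.8

R1_23 = multiple_r(r12, r13, r23)  # x1 with x2 and x3
R2_13 = multiple_r(r12, r23, r13)  # x2 with x1 and x3
R3_12 = multiple_r(r13, r23, r12)  # x3 with x1 and x2
print(round(R1_23, 3), round(R2_13, 3), round(R3_12, 3))
```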
REVIEW QUESTIONS:
Marks I 45 56 39 54 45 40 56 60 30 35
Marks II 40 56 30 44 36 32 45 42 20 36
Chapter 3
REGRESSION ANALYSIS
Meaning and Definition of Regression Analysis
Correlation analysis helps to know whether two variables are related or not. Once the
relationship between two variables is established, the same may be used for the purpose of predicting
the unknown value of one variable on the basis of the known value of the other. For this purpose we
have to examine the average functional relationship that exists between the variables. This is known as
regression analysis.
Regression analysis may be defined as the process of ascertaining the average functional
relationship that exists between variables so as to facilitate the mechanism of prediction or estimation or
forecasting. Regression analysis helps to predict the unknown values of a variable with the help of
known values of the other variable. The term regression was first used by Francis Galton.
Types of Regression
1. Simple Regression
In a regression analysis, if there are only two variables, it is called simple regression analysis.
2. Multiple Regression
In a regression analysis, if there are more than two variables, it is called multiple regression analysis.
3. Linear Regression
In a regression analysis, if linear relation exists between variables, it is called linear regression
analysis. Under this, when we plot the data on a graph paper, we get a straight line. Here, the
relationship that exists between variables can be expressed in the form of y = a + bx. In case of linear
regression, the change in dependent variable is proportionate to the changes in the independent
variable.
4. Non-linear Regression:
In case of non-linear regression, the relation between the variables cannot be expressed in the
form of y = a + bx. When the data are plotted on a graph, the dots will be concentrated, more
or less, around a curve. This is also called curvi-linear regression.
According to Francis Galton, “The regression lines show the average relationship between two
variables.”
Regression Equations
Regression equations are algebraic expression of regression lines. As there are two regression
lines, there are two regression equations. They are:
(a) Regression Equation of X on Y : It shows the change in the value of variable X for a given
change in the value of variable Y.
(b) Regression Equation of Y on X : It shows the change in the value of variable Y for a given
change in the value of variable X.
There are two methods for drawing regression lines. They are:
This is a simple method for constructing regression lines. Under this method, the values of paired
observations of the variable are plotted, by way of dots, on a graph paper. The X- axis represents the
independent variable and Y-axis represents the dependent variable. After observing how the dots are
scattered on the graph paper, we draw a straight line in such a way that the areas of the curve above
and below the line are approximately equal. The line so drawn clearly indicates the tendency of the
original data. Since there is subjectivity, this method is not commonly used in practice.
X 10 16 24 36 48
Y 20 12 32 40 55
Sol:
[Figure: scatter diagram of the above data with the freehand regression line; the X-axis and Y-axis each run from 0 to 60]
Under method of least squares, the regression line should be drawn in such a way that the sum
of the squares of the deviations of the actual Y-values from the computed Y-values is the least. In
other words, Σ(y – yc)² = minimum. The line so fitted is called line of best fit.
The following are the two important methods for calculating regression equations:
X on Y : X = a + bY
Y on X : Y = a + bX
For finding out the constants ‘a’ and ‘b’, we have to develop and solve certain equations, called
normal equations. Therefore, this method is called normal equation method.
The normal equations for computing ‘a’ and ‘b’ in respect of regression equation X on Y are:
ΣX = Na + bΣY, and
ΣXY = aΣY + bΣY²
The normal equations for computing ‘a’ and ‘b’ in respect of regression equation Y on X are:
ΣY = Na + bΣX, and
ΣXY = aΣX + bΣX²
After computing the values of the constants ‘a’ and ‘b’, substitute them to the respective
regression equations.
Qn: From the following data, fit the two regression equations:
x 4 5 8 2 1
y 5 6 7 3 2
Sol:
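Here N = 5, ΣX = 20, ΣY = 23, ΣXY = 114, ΣX² = 110 and ΣY² = 123.
Regression equation of X on Y: X = a + bY
The normal equations to find the values of ‘a’ and ‘b’ are:
ΣX = Na + bΣY, and
ΣXY = aΣY + bΣY²
20 = 5a + 23b .............................. (1)
114 = 23a + 123b .......................... (2)
(2) x 5 – (1) x 23: 86b = 110
b = 110/86 = 1.28
Substituting in (1): 20 = 5a + 23 x 1.28; 20 = 5a + 29.44
a = – 9.44/5 = – 1.89
X = – 1.89 + 1.28Y
Regression equation of Y on X: Y = a + bX
ΣY = Na + bΣX, and
ΣXY = aΣX + bΣX²
23 = 5a + 20b .............................. (3)
114 = 20a + 110b .......................... (4)
(4) – (3) x 4: 30b = 22
b = 22/30 = 0.73
Substituting in (3): 23 = 5a + 20 x 0.73; 23 = 5a + 14.6
a = 8.4/5 = 1.68
Y = 1.68 + 0.73X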
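The normal equation method can be sketched in code by solving the two equations for ‘a’ and ‘b’ directly. This is a minimal illustration; `fit_line` is a hypothetical helper, and its closed-form solution is simply the algebraic result of eliminating ‘a’ from the two normal equations.

```python
def fit_line(dependent, independent):
    """Solve the normal equations for dependent = a + b * independent."""
    n = len(dependent)
    sum_d, sum_i = sum(dependent), sum(independent)
    sum_di = sum(d * i for d, i in zip(dependent, independent))
    sum_i2 = sum(i * i for i in independent)
    # Eliminating a from the two normal equations gives b, then a follows.
    b = (n * sum_di - sum_d * sum_i) / (n * sum_i2 - sum_i ** 2)
    a = (sum_d - b * sum_i) / n
    return a, b

x = [4, 5, 8, 2, 1]
y = [5, 6, 7, 3, 2]

a_xy, b_xy = fit_line(x, y)  # regression equation of X on Y
a_yx, b_yx = fit_line(y, x)  # regression equation of Y on X
print(round(a_xy, 2), round(b_xy, 2), round(a_yx, 2), round(b_yx, 2))
```

The constants may differ from the hand calculation in the second decimal place, because the hand calculation rounds ‘b’ before substituting it to find ‘a’.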
Under regression coefficient method, regression equations are developed with the help of
regression coefficients. Since there are two regression equations, two regression coefficients are to be
computed.
Regression equation of X on Y:
X – X̄ = bxy (Y – Ȳ)
bxy = r . (σx/σy)
bxy = Σxy/Σy²
Regression equation of Y on X:
Y – Ȳ = byx (X – X̄)
byx = r . (σy/σx)
byx = Σxy/Σx²
X 7 2 1 1 2 3 2 6
Y 2 6 4 3 2 2 8 4
Using regression coefficients:
Sol.
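Regression equation of Y on X:
Y – Ȳ = byx (X – X̄)
$$ b_{yx} = \frac{n\sum dx\,dy - (\sum dx \cdot \sum dy)}{n\sum dx^2 - (\sum dx)^2} = \frac{8 \times (-17) - (-8 \times 7)}{8 \times 44 - (-8)^2} = \frac{-80}{288} = -0.278 $$
Y – 3.875 = – 0.278 (X – 3)
Y = – 0.278 X + 4.709
Regression equation of X on Y:
X – X̄ = bxy (Y – Ȳ)
$$ b_{xy} = \frac{n\sum dx\,dy - (\sum dx \cdot \sum dy)}{n\sum dy^2 - (\sum dy)^2} = \frac{8 \times (-17) - (-8 \times 7)}{8 \times 39 - (7)^2} = \frac{-80}{263} = -0.3042 $$
X – 3 = – 0.3042 (Y – 3.875)
X = – 0.3042Y + 4.179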
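The regression coefficient method can be checked with a short sketch. It uses deviations from the actual means (byx = Σxy/Σx² and bxy = Σxy/Σy²), which gives the same coefficients as the assumed-mean calculation; `regression_coefficients` is a hypothetical helper name.

```python
def regression_coefficients(xs, ys):
    """Return (mean_x, mean_y, b_yx, b_xy) using deviations from actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_x2 = sum((x - mean_x) ** 2 for x in xs)
    sum_y2 = sum((y - mean_y) ** 2 for y in ys)
    return mean_x, mean_y, sum_xy / sum_x2, sum_xy / sum_y2

X = [7, 2, 1, 1, 2, 3, 2, 6]
Y = [2, 6, 4, 3, 2, 2, 8, 4]
mean_x, mean_y, b_yx, b_xy = regression_coefficients(X, Y)

# Rearranged into intercept form: Y = b_yx * X + c_yx and X = b_xy * Y + c_xy
c_yx = mean_y - b_yx * mean_x
c_xy = mean_x - b_xy * mean_y
print(round(b_yx, 3), round(c_yx, 3), round(b_xy, 4), round(c_xy, 3))
```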
MULTIPLE REGRESSION
In multiple regression there are more than two variables. Here, we examine the effect of
two or more independent variables on one dependent variable. Suppose there are three
variables, namely, x1, x2 and x3. Here we may find three regression equations. They are:
where x̄1, x̄2 and x̄3 are actual means of x1, x2 and x3 respectively.
Yule’s Notation
Yule suggested that the above equations may be simplified by taking (x1 – x̄1) = X1, (x2 – x̄2) = X2
and (x3 – x̄3) = X3. Then the equations of planes of regression are:
X1 = b12.3X2 + b13.2X3
X2 = b21.3X1 + b23.1X3
X3 = b31.2X1 + b32.1X2
In the above three equations, we used six regression coefficients. Following are the formulae
for computing regression coefficients:
Qn: If r12 = 0.7, r31 = r23 = 0.5, σ1 = 2, σ2 = 3 and σ3 = 3, find the equation of plane of regression x1
on x2 and x3.
Sol:
Here, means of the variables are not given, and therefore, it is convenient to write the equations
of planes of regression using Yule’s notation.
X1 = b12.3X2+ b13.2X3
∴ X1 = 0.4X2+ 0.133X3
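The two coefficients in this answer can be reproduced with a short sketch. It assumes the standard formula b12.3 = (σ1/σ2) × (r12 – r13 r23)/(1 – r23²), with b13.2 obtained by interchanging subscripts 2 and 3; the function name is a hypothetical helper.

```python
def partial_regression_coefficient(sd_a, sd_b, r_ab, r_ac, r_bc):
    """b_ab.c: coefficient of X_b in the regression of X_a on X_b and X_c."""
    return (sd_a / sd_b) * (r_ab - r_ac * r_bc) / (1 - r_bc ** 2)

r12, r13, r23 = 0.7, 0.5, 0.5
sd1, sd2, sd3 = 2, 3, 3

b12_3 = partial_regression_coefficient(sd1, sd2, r12, r13, r23)
b13_2 = partial_regression_coefficient(sd1, sd3, r13, r12, r23)
print(round(b12_3, 3), round(b13_2, 3))
```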
Qn: In a trivariate distribution, x̄1 =53, x̄2 = 52, x̄3 = 51, σ1 = 3.88, σ2 = 2.97, σ3 = 2.86, r23= 0.8, r31=
0.81 and r12= 0.78. Find the linear regression equation of x1 on x2 and x3.
Sol:
Qn: In a trivariate distribution, x̄1 =28.02, x̄2 = 4.91, x̄3 = 594, σ1 = 4.4, σ2 = 1.1, σ3 = 80, r23= –0.56,
r31= – 0.4 and r12= 0. 8. Estimate the value of x1 when x2 = 6 and x3 = 650.
Sol:
Here, to estimate the value of x1, we have to find the regression equation of x1 on x2 and x3.
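A possible way to carry out this estimation is sketched below, again assuming the standard formula b12.3 = (σ1/σ2) × (r12 – r13 r23)/(1 – r23²) and the plane x1 – x̄1 = b12.3 (x2 – x̄2) + b13.2 (x3 – x̄3). The helper name is hypothetical.

```python
def partial_regression_coefficient(sd_a, sd_b, r_ab, r_ac, r_bc):
    """b_ab.c: coefficient of X_b in the regression of X_a on X_b and X_c."""
    return (sd_a / sd_b) * (r_ab - r_ac * r_bc) / (1 - r_bc ** 2)

# Values given in the question
mean1, mean2, mean3 = 28.02, 4.91, 594
sd1, sd2, sd3 = 4.4, 1.1, 80
r23, r31, r12 = -0.56, -0.4, 0.8

b12_3 = partial_regression_coefficient(sd1, sd2, r12, r31, r23)
b13_2 = partial_regression_coefficient(sd1, sd3, r31, r12, r23)

# Estimate x1 from the plane of regression of x1 on x2 and x3
x2, x3 = 6, 650
x1_estimate = mean1 + b12_3 * (x2 - mean2) + b13_2 * (x3 - mean3)
print(round(b12_3, 4), round(b13_2, 5), round(x1_estimate, 2))
```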
REVIEW QUESTIONS:
CHAPTER 4
PROBABILITY DISTRIBUTIONS
(THEORETICAL DISTRIBUTIONS)
Definition
Probability distribution (Theoretical Distribution) can be defined as a distribution obtained
for a random variable on the basis of a mathematical model. It is obtained not on the basis of
actual observation or experiments, but on the basis of probability law.
Random variable
Random variable is a variable whose value is determined by the outcome of a random
experiment. Random variable is also called chance variable or stochastic variable.
For example, suppose we toss a coin. Obtaining of head in this random experiment is a random
variable. Here the random variable of “obtaining heads” can take the numerical values 0 and 1.
Now, we can prepare a table showing the values of the random variable and corresponding
probabilities. This is called probability distribution or theoretical distribution.
In the above example, the probability distribution is:
Obtaining of heads (X)     Probability of obtaining heads P(X)
0                          ½
1                          ½
                           ∑ P(X) = 1
X: 0 1 2 3 4
Solution
Here all values of P(X) are more than zero, and the sum of all P(X) values is equal to 1.
Since two conditions, namely P(X) ≥ 0 and ∑P(X) = 1, are satisfied, the given distribution is a probability
distribution.
MATHEMATICAL EXPECTATION
(EXPECTED VALUE)
If X is a random variable assuming values x1, x2, x3,…………,xn with corresponding probabilities P1,
P2, P3,…………,Pn, then the Expectation of X is defined as x1p1+ x2p2+ x3p3+………+ xnpn.
E(X) = ∑ [x. p(x)]
Qn:
A petrol pump proprietor sells on an average Rs. 80,000/- worth of petrol on rainy days and an
average of Rs. 95,000/- on clear days. Statistics from the meteorological department show that the
probability is 0.76 for clear weather and 0.24 for rainy weather on the coming Wednesday. Find the expected
value of petrol sale on the coming Wednesday.
Qn:
There are three alternative proposals before a businessman to start a new project:-
Proposal I: Profit of Rs. 5 lakhs with a probability of 0.6 or a loss of Rs. 80,000 with a
probability of 0.4.
Proposal II: Profit of Rs. 10 lakhs with a probability of 0.4 or a loss of Rs. 2 lakhs with a
probability of 0.6
Proposal III: Profit of Rs. 4.5 lakhs with a probability of 0.8 or a loss of Rs. 50,000 with a
probability of 0.2
If he wants to maximize profit and minimize the loss, which proposal should he prefer?
Sol:
Here, we should calculate the mathematical expectation of each proposal.
Expected Value E(X) = ∑ [x. p(x)]
Expected Value of Proposal I = (500000 x 0.6) + (– 80000 x 0.4) = 300000 – 32000
= Rs. 2,68,000
Expected Value of Proposal II = (1000000 x 0.4) + (– 200000 x 0.6) = 400000 – 120000
= Rs. 2,80,000
Expected Value of Proposal III = (450000 x 0.8) + (– 50000 x 0.2) = 360000 – 10000
= Rs. 3,50,000
Since expected value is highest in case of proposal III, the businessman should prefer proposal III.
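The expectation E(X) = ∑ [x . p(x)] can be sketched for both questions above. In this minimal illustration losses are entered as negative amounts, and `expected_value` is a hypothetical helper name.

```python
def expected_value(outcomes):
    """E(X) = sum of x * p(x) over (value, probability) pairs."""
    return sum(x * p for x, p in outcomes)

# Petrol sales question: Rs. 80,000 on rainy days, Rs. 95,000 on clear days
petrol_sale = expected_value([(80_000, 0.24), (95_000, 0.76)])

# Project proposals: profit entered as positive, loss as negative
proposals = {
    "I": [(500_000, 0.6), (-80_000, 0.4)],
    "II": [(1_000_000, 0.4), (-200_000, 0.6)],
    "III": [(450_000, 0.8), (-50_000, 0.2)],
}
expected = {name: expected_value(o) for name, o in proposals.items()}
best = max(expected, key=expected.get)
print(round(petrol_sale), {k: round(v) for k, v in expected.items()}, best)
```

On these figures the expected petrol sale works out to Rs. 91,400.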
REVIEW QUESTIONS:
1. Define frequency distribution.
2. Define Random Variable.
3. What are the important properties of frequency distribution?
4. What is meant by Expected Value?
5. What are the different types of probability distributions?
CHAPTER 5
BINOMIAL DISTRIBUTION
Meaning & Definition:
Binomial Distribution is associated with James Bernoulli, a Swiss mathematician.
Therefore, it is also called Bernoulli distribution. Binomial distribution is the probability
distribution expressing the probability of one set of dichotomous alternatives, i.e., success or
failure. In other words, it is used to determine the probability of success in experiments in which
there are only two mutually exclusive outcomes. Binomial distribution is a discrete probability
distribution.
Binomial Distribution can be defined as follows: “A random variable r is said to follow
Binomial Distribution with parameters n and p if its probability function is:
$$ P(r) = {}^{n}C_{r}\; p^{r} q^{n-r} $$”
Where, p = probability of success in a single trial
q = 1 – p
n = number of trials
r = number of successes in ‘n’ trials.
Assumption of Binomial Distribution
(Situations where Binomial Distribution can be applied)
Binomial distribution can be applied when:-
1. The random experiment has two outcomes i.e., success and failure.
2. The probability of success in a single trial remains constant from trial to trial of the
experiment.
3. The experiment is repeated for finite number of times.
4. The trials are independent.
Properties (Features) of Binomial Distribution
1. It is a discrete probability distribution.
2. The shape and location of Binomial distribution changes as ‘p’ changes for a given ‘n’.
3. The mode of the Binomial distribution is equal to the value of ‘r’ which has the largest
probability.
4. Mean of the Binomial distribution increases as ‘n’ increases with ‘p’ remaining
constant.
5. The mean of Binomial distribution is np.
6. The Standard deviation of Binomial distribution is √npq
7. The variance of Binomial Distribution is npq
8. If ‘n’ is large and if neither ‘p’ nor ‘q’ is too close to zero, Binomial distribution may be
approximated to Normal Distribution.
9. If two independent random variables follow Binomial distribution with the same ‘p’, their sum also
follows Binomial distribution.
Qn: Six coins are tossed simultaneously. What is the probability of obtaining 4 heads?
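Sol: $P(r) = {}^{n}C_{r}\; p^{r} q^{n-r}$
r = 4
n = 6
p = ½
q = 1 – p = 1 – ½ = ½
$$ P(r = 4) = {}^{6}C_{4} \left(\tfrac{1}{2}\right)^{4} \left(\tfrac{1}{2}\right)^{6-4} = \frac{6!}{(6-4)!\,4!} \times \left(\tfrac{1}{2}\right)^{6} = \frac{6 \times 5}{2 \times 1} \times \frac{1}{64} = \frac{30}{128} = 0.234 $$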
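The probability function can be evaluated directly, as in the minimal sketch below. `binomial_pmf` is a hypothetical helper name; `math.comb` supplies the nCr term.

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(r) = nCr * p^r * q^(n - r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p ** r * q ** (n - r)

# Six coins tossed, probability of exactly 4 heads
p_four_heads = binomial_pmf(4, 6, 0.5)
print(round(p_four_heads, 3))
```

The same function applies to any ‘n’, ‘p’ and ‘r’, for instance `binomial_pmf(2, 5, 1/3)` for two successes in five trials with p = 1/3.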
Qn: The probability that Sachin scores a century in a cricket match is 1/3. What is the probability that
out of 5 matches, he may score century in:
(1) Exactly 2 matches
(2) No match
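Sol: Here p = 1/3, n = 5, q = 2/3
$P(r) = {}^{n}C_{r}\; p^{r} q^{n-r}$
(1) Probability that Sachin scores century in exactly 2 matches is:
$$ P(r = 2) = {}^{5}C_{2} \left(\tfrac{1}{3}\right)^{2} \left(\tfrac{2}{3}\right)^{5-2} = \frac{5!}{(5-2)!\,2!} \times \frac{1}{9} \times \frac{8}{27} = \frac{5 \times 4}{2 \times 1} \times \frac{1}{9} \times \frac{8}{27} = \frac{160}{486} = \frac{80}{243} = 0.329 $$
(2) Probability that Sachin scores century in no match is:
$$ P(r = 0) = {}^{5}C_{0} \left(\tfrac{1}{3}\right)^{0} \left(\tfrac{2}{3}\right)^{5-0} = \frac{5!}{(5-0)!\,0!} \times 1 \times \left(\tfrac{2}{3}\right)^{5} = 1 \times 1 \times \left(\tfrac{2}{3}\right)^{5} = \frac{32}{243} = 0.132 $$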
Qn: Consider families with 4 children each. What percentage of families would you expect to have :-
(a) Two boys and two girls
(b) At least one boy
(c) No girls
(d) At the most two girls
Sol: (a) P(having a boy) = ½
P(having a girl) = ½
n = 4
P(getting 2 boys and 2 girls) = P(getting 2 boys)
= P(r = 2) = 4C2 × (½)^2 × (½)^(4−2)
= [4! / ((4 – 2)! × 2!)] × (½)^2 × (½)^2
= [(4 × 3) / 2] × (½)^4
= 6 × 1/16 = 6/16 = 3/8, i.e. 37.5% of families
∴p = 1 - 1/3 = 2/3
∴ n x 2/3 = 4, n = 4 x 3/2 = 6
n = 6
Qn: Eight coins were tossed together for 256 times. Fit a Binomial Distribution of getting heads.
Also find mean and standard deviation.
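Sol: Here n = 8, p = ½, q = ½ and N = 256.
P(r) = nCr × p^r × q^(n−r)
Put r = 0, 1, 2, 3, ..., 8; then we get the terms of the Binomial Distribution. Multiplying each
probability by N = 256 gives the expected frequencies.
Mean = np = 8 × ½ = 4
Standard deviation = √npq = √(8 × ½ × ½) = √2 = 1.414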
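Fitting the Binomial Distribution for the eight coins can be sketched as follows. This is a minimal illustration in Python, assuming fair coins (p = ½); the variable names are assumptions made for the example.

```python
from math import comb, sqrt

n, p, N = 8, 0.5, 256   # coins per toss, P(head), number of tosses
q = 1 - p

# Expected frequency of r heads = N * nCr * p^r * q^(n-r)
expected = [round(N * comb(n, r) * p**r * q**(n - r)) for r in range(n + 1)]

mean = n * p            # np
sd = sqrt(n * p * q)    # square root of npq

for r, freq in enumerate(expected):
    print(r, freq)
print("mean =", mean, "S.D =", round(sd, 3))
```

Because N = 256 = 2^8 and p = ½, each expected frequency here equals 8Cr.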
REVIEW QUESTIONS:
1. Define Binomial Distribution.
2. What are the important properties of Binomial Distribution?
3. Examine whether the following statement is true:
“ For a Binomial Distribution, mean = 10 and S D = 4”
4. For a Binomial Distribution, mean = 6 and S D = √2. Find parameters. Write down all the
terms of the distribution.
*********
CHAPTER 6
POISSON DISTRIBUTION
Meaning and Definition
Poisson distribution is a limiting form of Binomial Distribution. In Binomial distribution,
the total number of trials is known previously. But in certain real life situations, it may be
impossible to count the total number of times a particular event occurs or does not occur. In such
cases Poisson distribution is more suitable.
Poisson Distribution is a discrete probability distribution. It was originated by Simeon
Denis Poisson.
A random variable ‘r’ is said to follow Poisson distribution if its probability function is:
P(r) = (e^(−m) × m^r) / r!
where r = 0, 1, 2, ...
e = 2.7183
m = mean of Poisson distribution.
Properties of Poisson Distribution
1. Poisson distribution is a discrete probability distribution.
2. Poisson distribution has a single parameter ‘m’. When ‘m’ is known all the terms can
be found out.
3. It is a positively skewed distribution.
4. Mean and Variance of Poisson distribution are equal to ‘m’.
5. In Poisson distribution, the number of success is relatively small.
6. Standard deviation of Poisson distribution is √m.
Practical situations where Poisson distribution can be used
1. To count the number of telephone calls arriving at a telephone switch board in a unit of
time.
2. To count the number of customers arriving at the super market in a unit of time.
3. To count the number of defects in Statistical Quality Control.
4. To count the number of bacteria per unit.
5. To count the number of defectives in a pack of manufactured goods.
Qn: A fruit seller, from his past experience, knows that on an average 3 apples in each basket will be
defective. What is the probability that exactly 4 apples will be defective in a given basket?
Sol. m = 3, r = 4
P(r) = (e^(−m) × m^r) / r!
P(r = 4) = (e^(−3) × 3^4) / 4! = (0.0498 × 81) / 24
= 0.16807
Qn: It is known from the past experience that in a certain plant, there are on an average four
industrial accidents per year. Find the probability that in a given year there will be less
than four accidents. Assume Poisson distribution.
Sol:
P(r) = (e^(−m) × m^r) / r!
m = 4
∴ P(less than 4 accidents) = P(r < 4)
P(r < 4) = P(r = 0 or 1 or 2 or 3)
= P(r = 0) + P(r = 1) + P(r = 2) + P(r = 3)
P(r = 0) = (e^(−4) × 4^0) / 0! = (0.0183 × 1) / 1 = 0.0183
P(r = 1) = (e^(−4) × 4^1) / 1! = (0.0183 × 4) / 1 = 0.0732
P(r = 2) = (e^(−4) × 4^2) / 2! = (0.0183 × 16) / 2 = 0.1464
P(r = 3) = (e^(−4) × 4^3) / 3! = (0.0183 × 64) / 6 = 0.1952
∴ P(r < 4) = 0.0183 + 0.0732 + 0.1464 + 0.1952 = 0.4331
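The summation above can be reproduced with a few lines of code. The following is a minimal sketch in Python; the function name poisson_pmf is an illustrative assumption.

```python
from math import exp, factorial

def poisson_pmf(r, m):
    """P(r) = e^(-m) * m^r / r!"""
    return exp(-m) * m**r / factorial(r)

m = 4  # average number of accidents per year

# P(r < 4) = P(0) + P(1) + P(2) + P(3)
p_less_than_4 = sum(poisson_pmf(r, m) for r in range(4))
print(round(p_less_than_4, 4))
```

The program gives about 0.4335. The hand calculation gives 0.4331 because e^(−4) is rounded to 0.0183 before multiplying.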
Qn: Out of 500 items selected for inspection, 0.2% is found to be defective. Find how many lots
P(r) = (e^(−m) × m^r) / r!
m = 500 x 0.2% = 1
Qn: In a certain factory producing optical lenses, there is a small chance of 1/500 for any one lens to
be defective. The lenses are supplied in packets of 10. Use P.D to calculate the approximate number of
packets containing no defectives, one defective, two defectives and three defective lenses respectively
in a consignment of 20,000 packets.
Sol:
P(r) = (e^(−m) × m^r) / r!
m = 10 x 1/500 = 0.02
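With m = 0.02, the expected number of packets for each value of r is obtained by multiplying P(r) by 20,000. The following is a minimal sketch in Python of that calculation; the variable names are assumptions made for the example.

```python
from math import exp, factorial

m = 10 * (1 / 500)   # mean number of defective lenses per packet = 0.02
packets = 20000      # packets in the consignment

# Expected packets with r defectives = 20000 * e^(-m) * m^r / r!
expected_packets = {
    r: packets * exp(-m) * m**r / factorial(r) for r in range(4)
}

for r, count in expected_packets.items():
    print(r, round(count))
```

Rounded to whole packets, this gives roughly 19604, 392, 4 and 0 packets with 0, 1, 2 and 3 defective lenses respectively.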
Qn: A Systematic sample of 100 pages was taken from a dictionary and the observed frequency
distribution of foreign words per page was found to be as follows:
No. of foreign words per page (x) : 0 1 2 3 4 5 6
Frequency (f) : 48 27 12 7 4 1 1
Calculate the expected frequencies using Poisson distribution.
Sol: At first, we have to know the parameter of P.D, which is equal to the mean of the given
distribution. So find the mean of the distribution:
Mean = (Σfx) / Σf
x 0 1 2 3 4 5 6
f 48 27 12 7 4 1 1 N = Σf = 100
fx 0 27 24 21 16 5 6 Σfx = 99
Mean = 99/100 = 0.99
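Once the mean is known, each expected frequency is N × P(r) with m = 0.99. The following is a minimal sketch in Python of this step; it uses only the observed data given in the question, and the variable names are assumptions.

```python
from math import exp, factorial

x = [0, 1, 2, 3, 4, 5, 6]        # foreign words per page
f = [48, 27, 12, 7, 4, 1, 1]     # observed frequencies

N = sum(f)
m = sum(fi * xi for fi, xi in zip(f, x)) / N   # mean of the distribution

# Expected frequency for each x = N * e^(-m) * m^x / x!
expected = [N * exp(-m) * m**r / factorial(r) for r in x]

for r, obs, exp_f in zip(x, f, expected):
    print(r, obs, round(exp_f, 1))
```

The expected frequencies can then be rounded to whole numbers and compared with the observed frequencies.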
REVIEW QUESTIONS:
1. Define Poisson distribution.
2. What are the important properties of P.D?
3. What are the situations under which P D can be applied?
4. Write down the probability function of P.D. whose mean is 2. What is its variance?
5. A machine is producing 4% defectives. What is the probability of getting at least 4 defectives
in a sample of 50, using (a) BD and (b) PD?
6. The following table gives the number of days in a 50 day period during which automobile
accidents occurred in a certain part of the city. Fit a Poisson distribution to the data:
No. of accidents 0 1 2 3 4
No. of days 19 18 8 4 1
********
CHAPTER 7
NORMAL DISTRIBUTION
Meaning and Definition
The normal distribution is a continuous probability distribution. It was first developed by
De-Moivre in 1733 as limiting form of binomial distribution. Fundamental importance of normal
distribution is that many populations seem to follow approximately a pattern of distribution as
described by normal distribution. Numerous phenomena such as the age distribution of any
species, height of adult persons, intelligence test scores of students, etc. are considered to be
normally distributed.
A continuous random variable, ‘X’, is said to follow Normal Distribution if its probability
function is:
P(x) = [1 / (σ√(2π))] × e^(−½ × ((x − μ)/σ)²)
where μ = mean and σ = standard deviation of the distribution.
Qn: The variable, x, follows normal distribution with mean = 45 and S.D = 10. Find the
probability that x ≥ 60.
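Sol: Z = (x – μ) / σ
When x = 60, Z = (60 – 45) / 10 = 15 / 10 = 1.5
Area between Z = 0 and Z = 1.5 is 0.4332
P (x ≥ 60) = 0.5 – 0.4332 = 0.0668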
Qn: The variable, x, follows normal distribution with mean = 45 and S.D = 10. Find the
probability that x ≤ 40.
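Sol: Z = (x – μ) / σ
When x = 40, Z = (40 – 45) / 10 = –5 / 10 = –0.5
Area between Z = –0.5 and Z = 0 is 0.1915
P (x ≤ 40) = 0.5 – 0.1915 = 0.3085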
Qn: The variable, x, follows normal distribution with mean = 45 and S.D = 10. Find the
probability that 40 ≤ x ≤ 56.
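Sol: Z = (x – μ) / σ
When x = 40, Z = (40 – 45) / 10 = –5 / 10 = –0.5
When x = 56, Z = (56 – 45) / 10 = 11 / 10 = 1.1
Area between Z = –0.5 and Z = 0 is 0.1915; area between Z = 0 and Z = 1.1 is 0.3643
P (40 ≤ x ≤ 56) = 0.1915 + 0.3643 = 0.5558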
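Qn: The scores of students in a test follow normal distribution with mean = 80 and S D = 15. A
sample of 1000 students has been drawn from the population. Find (1) probability that a randomly
chosen student has score between 85 and 95 (2) approximate number of students scoring less than 60.
Sol.
(1) μ = 80, σ = 15, x1 = 85, x2 = 95
Z = (x – μ) / σ
When x = 85, Z = (85 – 80) / 15 = 0.33; area between Z = 0 and Z = 0.33 is 0.1293
When x = 95, Z = (95 – 80) / 15 = 1; area between Z = 0 and Z = 1 is 0.3413
P (85 ≤ x ≤ 95) = 0.3413 – 0.1293 = 0.212
(2) When x = 60, Z = (60 – 80) / 15 = –1.33; area between Z = –1.33 and Z = 0 is 0.4082
P (x < 60) = 0.5 – 0.4082 = 0.0918
Number of students scoring less than 60 = 1000 × 0.0918 = 91.8, i.e. about 92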
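The table look-ups in the normal distribution problems above can be cross-checked by computer. The following is a minimal sketch in Python, assuming version 3.8 or later for statistics.NormalDist; the variable names are illustrative.

```python
from statistics import NormalDist

# Variable x with mean 45 and S.D 10
x_dist = NormalDist(mu=45, sigma=10)
p_ge_60 = 1 - x_dist.cdf(60)                 # P(x >= 60)
p_le_40 = x_dist.cdf(40)                     # P(x <= 40)
p_40_56 = x_dist.cdf(56) - x_dist.cdf(40)    # P(40 <= x <= 56)

# Test scores with mean 80 and S.D 15, sample of 1000 students
scores = NormalDist(mu=80, sigma=15)
p_85_95 = scores.cdf(95) - scores.cdf(85)    # P(85 <= x <= 95)
n_below_60 = 1000 * scores.cdf(60)           # expected number below 60

print(round(p_ge_60, 4), round(p_le_40, 4), round(p_40_56, 4))
print(round(p_85_95, 3), round(n_below_60))
```

The results may differ slightly in the third decimal place from the table method, because the table uses Z rounded to two decimal places (for example 0.33 instead of 0.3333).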
Qn: In a competitive examination, 5000 candidates have appeared. Their average mark was
62 and S.D was 12. If there are only 100 vacancies, find the minimum marks that one should
score in order to get selection.
Sol: 𝛍 = 62, σ = 12
Number of vacancies = 100
Percentage of vacancies to the total number of candidates = (100/5000) x 100 = 2% = 0.02
The students who will get selection correspond to the area of 0.02 in the right tail of the
normal curve.
Therefore, the area between the mean and the cut-off point is 0.5 – 0.02 = 0.48.
Locate the area of 0.48 in the table and find the Z-value corresponding to it.
The table shows the area nearest to 0.48 is 0.4798, and the corresponding z-value is 2.05
Z = 2.05
(x – μ)/σ = 2.05
(x -- 62)/12 = 2.05, x – 62 = 2.05 x 12
x -- 62 = 24.6, ∴ x = 24.6 + 62 = 86.6
∴ The minimum marks one should score to get selection = 86.6 marks
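The reverse look-up (from area to Z-value) can also be done by computer. The following is a minimal sketch in Python, assuming version 3.8 or later for statistics.NormalDist and its inv_cdf method; the variable names are illustrative.

```python
from statistics import NormalDist

marks = NormalDist(mu=62, sigma=12)

vacancies, candidates = 100, 5000
top_share = vacancies / candidates        # 0.02 of candidates are selected

# Mark below which 98% of the candidates fall
cutoff = marks.inv_cdf(1 - top_share)
print(round(cutoff, 1))
```

The program gives about 86.6, which agrees with the table method using Z = 2.05.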
Procedure:
1. Find the mean and S.D of the given distribution and take them as μ and σ
(parameters) of the normal distribution.
2. Take the lower limit of each class as the x values.
3. Calculate the z-value corresponding to each x-value by using the formula z = (x – μ)/σ.
Z-value of first and last values need not be computed.
4. Find the area corresponding to each z-value from the standard normal distribution table. The
area corresponding to the first and last z-values will be 0.5.
5. Find the area of each class using the area (probability) of respective class limits.
(Take the difference in case of same signs; and take the total in case of opposite signs)
6. Multiply the area of each class by the total frequency to get the expected frequency of the class.
The new frequency distribution with theoretical frequencies will be a normal
approximation to the given frequency distribution.
Qn: Fit a normal distribution to the following data:
∴ μ = 44 and σ = 12.45
Review Questions:
1. Define normal distribution.
2. What are the important properties of normal distribution?
3. Explain the importance of normal distribution.
4. Explain the procedure for construction of normal distribution.
5. If x follows a normal distribution with mean 12 and variance 16, find P(x≥20).
6. The weekly wages of 1000 workers are normally distributed with mean of 70 and S.D
of 5. Estimate the number of workers whose wages lie between 69 and 72.
7. In an aptitude test administered to 900 students, the mean score is 50 and S.D is 20.
Find the number of students securing scores (a) between 30 and 70 (b) exceeding 65.
Find the value of the score exceeded by the top 90 students.
8. Construct a normal distribution to the following data of marks obtained by 100
students:
Marks 60-62 63-65 66-68 69-71 72-74
No. of Students 5 18 42 27 8
*********
Chapter 8
EXPONENTIAL DISTRIBUTION
Definition of Exponential Distribution
REVIEW QUESTIONS:
Chapter 9
UNIFORM DISTRIBUTION
Definition of Uniform Distribution
A discrete random variable, x, follows uniform distribution if its probability density function
is :
For example, when a die is thrown, let x stands for the numbers obtained.
REVIEW QUESTIONS:
CHAPTER 10
STATISTICAL INFERENCE
Basic Concepts
Population: In statistics, ‘Population’ refers to collection of all individuals or objects or
items or things under consideration.
Finite Population: If a population contains a finite number of objects, it is called finite
population. Eg: Students in a college.
Infinite Population: If a population contains an infinite number of objects, it is called infinite
population. Eg: Stars in the sky.
Sample: A sample is a representative part of the population.
Sample size: Number of units in a sample group is called sample size. If sample size is too
small, it may not represent the population. If it is very large, it may require more time and
money for investigation. Hence, the size of a sample should be optimum.
Large Sample: If the size of a sample exceeds thirty, it is called as large sample.
Small Sample: If the size of a sample does not exceed thirty, it is called as small sample.
Parameter: It is a statistical measure derived from population elements. If the arithmetic
mean is computed from all the elements of a population, it is a population parameter. Here it
is called population mean. Population mean is denoted by the symbol μ. Population standard
deviation is denoted by σ.
Statistic: It is a statistical measure derived from sample elements. If the arithmetic mean is
computed from the elements of a sample group, it is a sample statistic. Here it is called sample
mean. Sample mean is denoted by the symbol x̄. Sample standard deviation is denoted by ‘s’.
Statistical inference
Statistical inference refers to the process of selecting samples and using sample statistic to
draw inference or conclusion about the population parameter or population distribution. The
two main branches of statistical inference are:
(a) Testing of Hypothesis
(b) Estimation
Testing of Hypothesis
Testing of hypothesis is the process under which a statistical hypothesis about a population is
formulated and its validity is tested on the basis of a random sample drawn from that
population. For testing the validity of a hypothesis, a number of tests are used. All these tests
can be classified into two categories, namely (i) parametric tests and (ii) non-parametric
tests. Z-test, t-test, Chi-square test, F-test, etc. are commonly used statistical tests.
(6) Locate the table value (critical value) of the test statistic at specified level of
significance
(7) Compare the calculated value of the test statistic with the corresponding table value
(critical value) and decide whether to accept or reject the null hypothesis. If calculated
value of the test statistic is numerically less than the table value, the null hypothesis is
accepted. If calculated value of the test statistic is numerically more than the table
value, the null hypothesis is rejected.
Hypothesis
Hypothesis is a tentative solution or assumption or proposition about the parameter or
nature of the population. It is a logically drawn conclusion about the population.
Null Hypothesis
This is the original hypothesis. A null hypothesis is a hypothesis which is formulated for
the purpose of rejection. The term “null” refers to ‘nil’ or ‘no’ or ‘amounting to
nothing’. This hypothesis is generally set up as there is no significant difference
between the sample statistic and population parameter. A null hypothesis is denoted
by H0
Alternative Hypothesis
Any hypothesis other than null hypothesis is called alternative hypothesis. It is the
hypothesis which is accepted when the null hypothesis is rejected. An alternative
hypothesis is denoted by H1 or Ha
Sampling Distribution
Sampling distribution is a distribution of sample statistic derived from various samples
drawn from the same population. Since sample statistic is a random variable,
sampling distribution is a probability distribution.
Standard Error
Standard Error (SE) of a statistic is the standard deviation of the sampling distribution
of that statistic. For example, the Standard deviation of the sampling distribution of
the sample mean is σ/√n, where σ = population S.D. and n = sample size. Therefore
the Standard Error (SE) of sampling distribution of mean is σ/√n .
Uses of Standard Error
(1) Standard Error is used for testing a given hypothesis.
(2) Standard Error gives an idea about the reliability of a sample. The reciprocal of
Standard Error is a measure of reliability of the sample.
(3) Standard Error can be used to determine the confidence limits for population
values like mean, proportion and standard deviation.
Errors in Testing of Hypotheses
In any test of hypothesis, the decision is to accept or to reject a null hypothesis. The
decision is based on the information supplied by the sample data. The four
possibilities of the decision are:
(1) Accepting a null hypothesis when it is true
(2) Rejecting a null hypothesis when it is false
(3) Rejecting a null hypothesis when it is true
(4) Accepting a null hypothesis when it is false
It is clear that the possibilities (1) and (2) are correct decisions. But the
possibilities (3) and (4) are errors.
Type I Error:
The error which is committed by rejecting the null hypothesis even when it is
true is called Type I error. It is denoted by alpha (α).
Type II Error:
The error which is committed by accepting the null hypothesis even when it is
wrong is called Type II error. It is denoted by beta (β).
When we try to reduce the possibility for one error, the possibility for the
other will be increased. Therefore, a compromise of these two is to be ensured. Type
II error is more dangerous than Type I error.
Power of a Test
Probability for rejecting the null hypothesis when the alternative hypothesis is true is
called power of a test.
Power of a test = 1 – P(Type II Error)
Level of Confidence
Level of confidence is the probability of accepting a true null hypothesis.
Level of Confidence = 1 – Level of significance.
If Level of significance is 5%, Level of Confidence = 95%.
Level of Significance
Level of Significance is the probability of rejecting a true null hypothesis. Level of
Significance is denoted by alpha (α). If nothing is mentioned about the level of
significance, it is taken as 5%.
Level of Significance (α) = 1 – level of acceptance
Acceptance Region
The area under the normal curve which represents the acceptance of a null hypothesis
(i.e; level of confidence) is called the Acceptance Region or Acceptance Area.
Acceptance Region = 100% -- Rejection Region
Rejection Region (Critical Region)
The area under the normal curve which represents the rejection of a null hypothesis
(i.e; level of significance) is called the Rejection Region or Critical Region.
Rejection Region = 100% -- Acceptance region
Degree of Freedom
Degree of freedom is defined as the number of independent observations which is
obtained by subtracting the number of constraints from the total number of
observations.
Degree of freedom (d.f) = Total No. observations – No. of constraints.
Two-tailed Test
A two tailed test is one in which we reject the null hypothesis if the computed value
of the test statistic is significantly greater than or lower than the critical value (table
value) of the test statistic. Thus in two tailed tests the critical region is represented by
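both tails. If we test the hypothesis at 10% level of significance, the size of the
acceptance region is 90% and the size of the rejection region is 10% on both sides
together, i.e. 5% in each tail.
[Figure: normal curve for a two-tailed test, with a central acceptance region of 90% (45% on each side of the mean) and a rejection region of 5% in each tail]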
One-tailed Test
One tailed test is one in which the rejection region is located in only one tail of the
normal curve. It may be at left tail or right tail, depending on the alternative
hypothesis. If the alternative hypothesis is with ‘<’ (less than) sign, the rejection
region is placed on the left tail, and the test is called left-tailed test. If the alternative
hypothesis is with ‘>’ (more than) sign, the rejection region is placed on the right tail,
and the test is called right-tailed test.
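[Figure: left-tailed test, with a rejection region of 5% in the left tail and an acceptance region of 95% (45% + 50%)]
[Figure: right-tailed test, with an acceptance region of 95% (50% + 45%) and a rejection region of 5% in the right tail]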
Parametric Tests
When testing of hypothesis is done, if some assumptions are made about the nature of
population distribution, then the test statistic applied there is called parametric test.
There are a number of parametric tests. Eg: t-test, Z test, F test, etc.
Non-Parametric Tests
When testing of hypothesis is done, if no assumptions are made about the nature of
population distribution, then the test statistic applied there is called non-parametric
test. There are a number of non-parametric tests. Eg: Chi-square Test, Sign tests, Signed
Rank Tests, Rank Sum Tests, Run Test, Kolmogorov-Smirnov Test, etc. Since no
assumptions are made about the nature of population, non-parametric tests are also
called distribution-free tests.
Z / t = Difference/Standard Error
Difference = Difference between sample mean and the given population mean
Standard Error = σ / √n ( If population S.D is known)
Standard Error = s / √n ( If population S.D is unknown, but sample is large)
Qn: The mean life of random sample of 100 tyres is 15269 km. The manufacturer
claims that the average life of tyres manufactured by the company is 15200 km with
SD of 1248 km. Test the validity of company’s claim.
Sol:
H0 : There is no significant difference between sample mean and population mean
( i.e; μ = 15200)
H1 : There is significant difference between sample mean and population mean
( i.e; μ ≠ 15200)
Since population S.D is known, the test statistic applicable here is Z-test
Z = D/SE
D = x̄ - μ = 15269-15200 = 69
S E = σ/√n = 1248/√100 = 1248/10 = 124.8
Z = 69/124.8 = 0.553
Level of significance = 5%
Degree of freedom = infinity (population S D is known)
Table value (Critical value) at 5 % level of significance and infinity degree of
freedom is 1.96
Since calculated value of Z is less than the critical value, H0 is accepted. That is, there
is no significant difference between sample mean and population mean. μ =
15200. So, we may conclude that the claim of the company is valid.
Qn: A sample of size 400 was drawn and the sample mean was found to be 99. Test
whether this sample could have come from the normal population with mean = 100 and
S.D = 8 at 5% level of significance.
Sol:
H0 : There is no significant difference between sample mean and population mean
( i.e; μ = 100)
H1 : There is significant difference between sample mean and population mean
( i.e; μ ≠ 100)
Since population S.D is known, the test statistic applicable here is Z-test
Z = D/SE
D = |x̄ – μ| = |99 – 100| = 1
S E = σ/√n = 8/√400 = 8/20 = 0.4
Z = 1/0.4 = 2.5
Level of significance = 5%
Degree of freedom = infinity (population S D is known)
Table value (Critical value) at 5 % level of significance and infinity degree of
freedom is 1.96
Since calculated value of Z is more than the critical value, H0 is rejected. H1 is
accepted. That is, there is significant difference between sample mean and
population mean. So, we may conclude that μ ≠ 100
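The Z-test for a single mean used in the two problems above can be sketched as a small program. This is a minimal illustration in Python (version 3.8 or later is assumed for statistics.NormalDist); the function name one_sample_z and its return values are assumptions made for the example, and a two-tailed test is assumed.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z(sample_mean, pop_mean, sd, n, alpha=0.05):
    """Return (calculated Z, critical Z, reject H0?) for a two-tailed Z-test."""
    se = sd / sqrt(n)                              # Standard Error = S.D / root n
    z = abs(sample_mean - pop_mean) / se           # Z = Difference / S E
    critical = NormalDist().inv_cdf(1 - alpha / 2) # 1.96 at 5% level
    return z, critical, z > critical

# Tyres: sample mean 15269, claimed mean 15200, S.D 1248, n = 100
tyres = one_sample_z(15269, 15200, 1248, 100)

# Sample of 400 with mean 99, population mean 100, S.D 8
sample_400 = one_sample_z(99, 100, 8, 400)

print(round(tyres[0], 3), tyres[2])
print(round(sample_400[0], 3), sample_400[2])
```

For the tyres the calculated Z is below the critical value, so H0 is accepted; for the sample of 400 it is above the critical value, so H0 is rejected.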
Qn: A random sample of 200 bottles of talcum powder gave an average weight of
49.5 gram with a S.D of 2.1 gram. Do we accept the hypothesis of weight per
bottle is 50 gram at 1% level of significance?
Sol:
H0 : There is no significant difference between sample mean and population mean
( i.e; μ = 50)
H1 : There is significant difference between sample mean and population mean
( i.e; μ ≠ 50)
Since sample is large, the test statistic applicable here is Z-test
Z = D/SE
D = |x̄ – μ| = |49.5 – 50| = 0.5
S E = s/√n = 2.1/√200 = 2.1/14.142 = 0.148
Z = 0.5/0.148 = 3.378 (Calculated value)
Level of significance = 1%
Degree of freedom = infinity (sample is large)
Table value (Critical value) at 1 % level of significance and infinity degree of
freedom is 2.58
Since calculated value of Z is more than the critical value, H0 is rejected. H1 is
accepted. That is, there is significant difference between sample mean and
population mean. So, we may conclude that μ ≠ 50
Qn: The average life of 26 bulbs was found to be 1200 hours with a S.D of 150 hours. Test
whether these bulbs could be considered as a random sample from a normal population with
mean 1300 hours.
Sol: H0 : There is no significant difference between sample mean and population mean
( i.e; μ = 1300)
H1 : There is significant difference between sample mean and population mean
( i.e; μ ≠ 1300)
Since sample is small, the test statistic applicable here is t-test
t = D/SE
D = |x̄ – μ| = |1200 – 1300| = 100
S E = s/√(n – 1) = 150/√(26 – 1) = 150/5 = 30
t = 100/30 = 3.333 (Calculated value)
Level of significance = 5%
Degree of freedom = 26-1 = 25 (sample is small)
Table value (Critical value) at 5% level of significance and 25 degree of freedom is
2.06
Since calculated value of t is more than the critical value, H0 is rejected. H1 is
accepted. That is, there is significant difference between sample mean and
population mean. So, we may conclude that the bulbs could not be drawn from the
normal population with mean 1300 hours ( i.e; μ ≠ 1300).
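The t-test for a small sample can be sketched in the same way. The following is a minimal illustration in Python; the function name one_sample_t is an assumption, the Standard Error follows the formula s/√(n – 1) used in the solution above, and the critical value 2.06 is taken from the t-table as quoted in the solution rather than computed.

```python
from math import sqrt

def one_sample_t(sample_mean, pop_mean, s, n):
    """Calculated t, with S E = s / root(n - 1) as used in the text."""
    se = s / sqrt(n - 1)
    return abs(sample_mean - pop_mean) / se

# 26 bulbs, sample mean 1200 hours, sample S.D 150, hypothetical mean 1300
t_bulbs = one_sample_t(1200, 1300, 150, 26)

critical_t = 2.06          # table value at 5% level and 25 d.f (from the text)
reject_h0 = t_bulbs > critical_t

print(round(t_bulbs, 3), reject_h0)
# prints: 3.333 True
```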
Qn: A typist claims that he can type at a speed of more than 120 words per minute. Of the 12
tests given to him, he could perform an average of 135 words with a S.D of 40. Is his claim
valid at 1% level of significance?
Sol: H0 : There is no significant difference between sample mean and population mean
( i.e; μ = 120)
H1 : There is significant difference between sample mean and population mean
( i.e; μ > 120)
Here, the test is a one-tailed test (right-tailed test).
Since sample is small, the test statistic applicable here is t-test
t = D/SE
D = x̄ - μ = 135 – 120 = 15
S E = s/√(n – 1) = 40/√(12 – 1) = 40/√11 = 40/3.32 = 12.05
t = 15/12.05 = 1.245 (Calculated value)
Level of significance = 1%
This testing of hypothesis is used to test whether the difference between two sample means
are significant or not. If the difference is not significant, they are treated as equal; or we may
think that the two samples are drawn from the same population.
Procedure:
1. Set up H0 and H1
H0 : There is no significant difference between two sample means( i.e; μ1 = μ2)
H1 : There is significant difference between two sample means( i.e; μ1 ≠ μ2)
2. Decide the test statistic:
The test statistic applicable here is Z-test or t-test.
If population S.D.(i.e; σ) is known, apply Z-test
If population S.D.(i.e; σ) is unknown but sample is large, apply Z-test
If population S.D.(i.e; σ) is unknown but sample is small, apply t-test
3. Apply the appropriate formula for computing the value of the test statistic:
Z / t = Difference/Standard Error
6. Locate the table value (critical value) of the test statistic at specified level of
significance and fixed degree of freedom.
7. Compare the calculated value of test statistic with the table value and decide
whether to accept or reject the null hypothesis. If calculated value of the test
statistic is numerically less than the table value, the null hypothesis is accepted. If
calculated value of the test statistic is numerically more than the table value, the
null hypothesis is rejected.
Qn: The mean yield of wheat from District I was 210Kg per acre from a sample of 100 plots.
In another District II, the mean yield was 200 Kg per acre from a sample of 150 plots.
Assuming that the S.D of yield of the entire State was 11 Kg, test whether there is any
significant difference between the mean yields of the crop in the two districts.
Sol:
District I District II
n1 = 100 n2 = 150
x̄1 = 210 x̄2 = 200
σ 1 = 11 σ 2 = 11
Since population S.Ds are given, the test statistic applicable here is Z-test.
Z = Difference / S E
Difference = x̄1 – x̄2 = 210 – 200 = 10
SE = √[(σ1² / n1) + (σ2² / n2)] (Population S.Ds are known. For the entire State SD is 11).
= √[(11² / 100) + (11² / 150)] = √(1.21 + 0.81) = √2.02 = 1.42
Z = 10/1.42 = 7.04
Level of significance = 5%
Since the calculated value of Z is more than the table value, H0 is rejected. We accept H1. So
we may conclude that there is significant difference in the mean yields of crops in two
districts.
Qn: Electric bulbs manufactured by X Ltd. and Y Ltd. gave the following results:
Sol: H0 : There is no significant difference between two sample means( i.e; μ1 = μ2)
H1 : There is significant difference between two sample means( i.e; μ1 ≠ μ2)
Since population S.Ds are unknown but samples are large, the test statistic applicable
here is Z-test.
Z = Difference / S E
Difference = x̄1 – x̄2 = 1300 – 1248 = 52
SE = √[(s1² / n1) + (s2² / n2)] (Population S.Ds are unknown)
= √[(82² / 100) + (93² / 100)] = √(67.24 + 86.49) = √153.73 = 12.4
Z = 52/12.4 = 4.19
Level of significance = 5%
Since the calculated value of Z is more than the table value, H0 is rejected. We accept H1. (i.e;
μ1 ≠ μ2). So we may conclude that there is significant difference in the mean life of bulbs of
the two makes.
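The two-sample Z-test used in the wheat and bulb problems can be sketched as follows. This is a minimal illustration in Python; the function name two_sample_z is an assumption, and the figures are those appearing in the two solutions above.

```python
from math import sqrt

def two_sample_z(mean1, mean2, sd1, sd2, n1, n2):
    """Z = Difference / S E, with S E = root(sd1^2/n1 + sd2^2/n2)."""
    se = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return abs(mean1 - mean2) / se

# Wheat yield: District I (210, n = 100), District II (200, n = 150), S.D 11
z_wheat = two_sample_z(210, 200, 11, 11, 100, 150)

# Bulbs: means 1300 and 1248, S.Ds 82 and 93, 100 bulbs each
z_bulbs = two_sample_z(1300, 1248, 82, 93, 100, 100)

print(round(z_wheat, 2), round(z_bulbs, 2))
```

Both calculated values are well above 1.96, the table value of Z at 5% level of significance, so H0 is rejected in both cases.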
Qn: Two batches of same product are tested for their mean life. Assuming that lives of the
two products follow a normal distribution, test the hypothesis that the mean life is same for
both the batches, given the following information:
Since population S.Ds are unknown and samples are small, the test statistic applicable
here is t-test.
t = Difference / S E
Difference = x̄1 – x̄2 = 820 – 750 = 70
SE = √{[(n1s1² + n2s2²) / (n1 + n2 – 2)] × (1/n1 + 1/n2)} (Population S.Ds are unknown and
samples are small)
= √{[(10 × 12²) + (8 × 14²)] / (10 + 8 – 2) × (1/10 + 1/8)}
= √[(3008/16) × 0.225] = √42.3 = 6.5
t = 70/6.5 = 10.77 (Calculated Value)
Level of significance = 5%
Since the calculated value of t is more than the table value, H0 is rejected. We accept H1. (i.e;
μ1 ≠ μ2). So we may conclude that the lives of products produced in two batches are not
same.
Qn: In a test given to 2 groups of students, the marks obtained were as follows:
Group I 18 20 36 50 49 36 34 49 41
Group II 29 26 28 35 30 44 46
Test whether the group means are equal.
Sol: Here we have to find the Means and S.Ds of the two samples.
Since population S.Ds are unknown and samples are small, the test statistic applicable
here is t-test.
t = Difference / S E
Difference = x̄1 – x̄2 = 37 – 34 = 3
SE = √{[(n1s1² + n2s2²) / (n1 + n2 – 2)] × (1/n1 + 1/n2)} (Population S.Ds are unknown and
samples are small)
= √{[(9 × 126) + (7 × 55.14)] / (9 + 7 – 2) × (1/9 + 1/7)}
= √[(1519.98/14) × 0.254] = √27.58 = 5.25
t = 3/5.25 = 0.571 (Calculated Value)
Level of significance = 5%
Since the calculated value of t is less than the table value, H0 is accepted. (i.e; μ1 = μ2). So
we may conclude that the difference in the group means are not significant. They are equal.
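The means, variances and the t value for the two groups can be computed directly from the raw marks. The following is a minimal sketch in Python; variances are taken with divisor n (statistics.pvariance), which matches the formula for S E used above, and the critical value 2.145 is the standard t-table value for 14 degrees of freedom at 5% level (two-tailed), which is not stated in the text.

```python
from math import sqrt
from statistics import mean, pvariance

group1 = [18, 20, 36, 50, 49, 36, 34, 49, 41]
group2 = [29, 26, 28, 35, 30, 44, 46]

n1, n2 = len(group1), len(group2)
m1, m2 = mean(group1), mean(group2)
v1, v2 = pvariance(group1), pvariance(group2)   # s1^2 and s2^2 (divisor n)

# S E = root( (n1*s1^2 + n2*s2^2) / (n1 + n2 - 2) * (1/n1 + 1/n2) )
se = sqrt((n1 * v1 + n2 * v2) / (n1 + n2 - 2) * (1 / n1 + 1 / n2))
t = abs(m1 - m2) / se

critical_t = 2.145      # t-table, 14 d.f, 5% level (assumed from standard tables)
print(m1, m2, round(v1, 2), round(v2, 2), round(t, 3), t > critical_t)
```

The calculated t is below the critical value, so H0 is accepted.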
Here the observations in one sample are some way related to the observations in the other.
Therefore they are called paired observations. The test statistic applicable here is t-test.
Procedure:
1. Set up H0 and H1
H0 : There is no significant difference between samples
H1 : There is significant difference between samples
2. Decide test statistic:
Since the paired data are comparatively less, the test statistic applicable here is always
t-test.
3. Apply the appropriate formula for computing the value of the test statistic.
t = d/SE
Where:
4. Specify the level of significance. Take 5%, if nothing is mentioned in the question.
5. Fix the degree of freedom. d.f = n – 1 , where n= Number of pairs of observations.
6. Locate the critical value of the test statistic (t-test) at specified level of significance
and fixed degree of freedom.
7. Compare the calculated value of test statistic with the table value and decide whether
to accept or reject the null hypothesis. If calculated value of the test statistic is
numerically less than the table value, the null hypothesis is accepted. If calculated
value of the test statistic is numerically more than the table value, the null hypothesis
is rejected.
Qn: The marks scored by 10 students, before and after providing special coaching, are
given in the following table:
Before 67 24 57 55 63 54 56 68 33 43
After 70 38 58 58 56 67 68 72 42 38
Test whether there is any significant difference in their performance.
Sol: H0 : There is no significant difference between samples
t = d/SE
Computation of mean and standard deviation of the difference between the values
Score (Before) Score (After) Difference (d) d²
67 70 3 9
24 38 14 196
57 58 1 1
55 58 3 9
63 56 -7 49
54 67 13 169
56 68 12 144
68 72 4 16
33 42 9 81
43 38 -5 25
Σd = 47 Σd² = 699
Level of significance = 5%
Table value (critical value) of t at 5% level of significance and 9 degree of freedom is 2.262.
Since the calculated value of t is less than the critical value, the null hypothesis is accepted.
So, we may conclude that there is no significant difference in the performance of the
students.
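The calculated value of t for the paired data can be obtained as follows. This is a minimal sketch in Python; it assumes that d̄ is the mean of the differences, that the S.D of the differences is taken with divisor n, and that S E = s/√(n – 1), in line with the convention used for small samples earlier in this chapter.

```python
from math import sqrt
from statistics import mean, pstdev

before = [67, 24, 57, 55, 63, 54, 56, 68, 33, 43]
after = [70, 38, 58, 58, 56, 67, 68, 72, 42, 38]

d = [a - b for a, b in zip(after, before)]   # differences (After - Before)
n = len(d)

d_bar = mean(d)            # mean difference = 47/10
s = pstdev(d)              # S.D of differences (divisor n)
se = s / sqrt(n - 1)       # Standard Error (assumed convention)
t = abs(d_bar) / se

critical_t = 2.262         # table value at 5% level and 9 d.f (from the text)
print(sum(d), sum(x * x for x in d), round(t, 3), t > critical_t)
```

The calculated t (about 2.04) is less than 2.262, which agrees with the conclusion that the null hypothesis is accepted.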
This type of testing of hypothesis is used to test whether there is any significant difference
between the sample proportion and the given population proportion.
Procedure:
1. Set up H0 and H1 :
H0 : There is no significant difference between sample proportion and population
proportion ( i.e; H0 : P = P0)
H1 : There is significant difference between sample proportion and population
proportion ( i.e; H1 : P ≠ P0)
2. Decide the test statistic:
The test statistic applicable here is Z-test
3. Apply appropriate formulae for computing the value of Z ( i.e; calculated value):
Z = Difference / S E ie; Z = ( p -- P) / S E
Where p = sample proportion, P = Population proportion
S E = √ PQ / n
4. Decide the level of significance (Take 5%, if nothing is mentioned in the question).
5. Fix the degree of freedom ( Infinity d.f)
6. Locate the table value of Z at specified level of significance and fixed degree of
freedom.
7. Compare the calculated value of Z with the table value and decide whether to accept
or reject the null hypothesis. If calculated value of Z is numerically less than the table
value, the null hypothesis is accepted. If calculated value of Z is numerically more
than the table value, the null hypothesis is rejected.
Qn: It is found that out of 500 units of a product produced by a machine, 30 are
defectives. Test whether the machine produces 2% defective items on an average.
Sol:
H0 : There is no significant difference between sample proportion and population
proportion ( i.e; H0 : P = 0.02)
H1 : There is significant difference between sample proportion and population
proportion ( i.e; H1 : P ≠ 0.02)
Z = (p – P) / S E
P = 0.02, p = 30/500 = 0.06, Q = 1 – P = 1 – 0.02 = 0.98, n = 500
S E = √(PQ / n) = √(0.02 × 0.98 / 500) = √(0.0196/500) = √0.0000392 = 0.0063
∴ Z = (0.06 – 0.02) / 0.0063 = 0.04/0.0063 = 6.349
Level of significance = 5%
Degree of freedom = infinity
Table value of Z at 5% level of significance and infinity degree of freedom is 1.96
Since the calculated value of Z is more than the table value, null hypothesis is rejected.
We accept alternative hypothesis. P ≠ 0.02. So, it is not possible to think that the
machine produces 2% defective items.
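The arithmetic of this problem can be re-checked with a minimal Python sketch. The function name `proportion_z` is an illustrative choice and not part of the text. Because the sketch does not round S E to 0.0063, it gives Z of about 6.39 instead of 6.349; the decision is the same.

```python
import math

def proportion_z(successes, n, p0):
    """Z statistic for H0: population proportion = p0 (illustrative helper)."""
    p = successes / n                      # sample proportion
    se = math.sqrt(p0 * (1 - p0) / n)      # S E = sqrt(PQ/n), with Q = 1 - P
    return p, se, (p - p0) / se

p, se, z = proportion_z(30, 500, 0.02)
print(f"p = {p:.2f}, SE = {se:.4f}, Z = {z:.3f}")
# 1.96 is the table value of Z at 5% level of significance
print("H0 rejected" if abs(z) > 1.96 else "H0 accepted")
```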
REVIEW QUESTIONS:
Sample I 25 32 30 32 24 14 32
Sample II 24 34 30 22 42 31 40 35 32 30
20. In a sample of 600 people in Bihar 336 are coffee drinkers and the rest are tea drinkers. Can
we assume that both coffee and tea are equally popular in the State at 1% level of
significance?
21. In a sample of 900 men from a certain large city 675 were found to be smokers. In a random
sample of 1350 men from another large city 675 were found to be smokers. Do the data
indicate that the cities are significantly different in respect of the prevalence of smoking
among men?
22. A sample of size 50 has S.D of 10.5. Can you contradict the hypothesis that the population
S.D. is 12?
CHAPTER 11
CHI-SQUARE TEST
What is Chi-Square Value?
Chi-square is denoted by the symbol χ². Chi-square is a value (quantity)
which describes the magnitude of the difference between observed frequencies and expected
frequencies.
Chi-Square Test
Chi-square test is a statistical test used to test the significance of the difference between
observed frequencies and the corresponding theoretical frequencies (expected frequencies) of
a distribution, without any assumption about the nature of distribution of the population. This
is the most popular and widely used non-parametric test. It was developed by Prof. Karl Pearson.
1. Used to test goodness of fit: As a test for goodness of fit, χ² test can be used to test
how far the theoretical frequencies fit to the observed frequencies.
2. Used to test independence: As a test of independence, χ² test is used to test whether
the attributes of a sample are associated or not.
3. Used to test homogeneity: As a test of homogeneity, χ² test is used to test whether
different samples are homogeneous as far as a particular attribute is concerned.
4. Used to test population variance: Here, Chi-square test is used for testing the given
population variance when the sample is small. In other words, it is used to test whether
there is any significant difference between sample variance and population variance.
Here, the test statistic value (Chi-square value) is obtained by using the following
formula: χ² = ns²/σ².
Procedure:
1. Set up H0 and H1
H0 : There is goodness of fit between observed frequencies and expected frequencies.
H1 : There is no goodness of fit between observed frequencies and expected
frequencies.
2. Decide the test statistic. Here, the test statistic is Chi-Square test.
3. Apply the appropriate formula:
χ² = Σ[(O – E)²/E]
where O = Observed frequencies and E = Expected frequencies
4. Specify the level of significance. If nothing is mentioned, take 5% level of
significance.
5. Fix the degree of freedom. Degree of freedom = n – r – 1
Where n = number of pairs of observations
r = number of parameters computed from the given data to find the expected
frequencies.
6. Obtain the table value of Chi-square at specified level of significance and fixed
degree of freedom.
7. Compare the actual value of Chi-Square with the table value and decide whether to
accept or reject the null hypothesis. If calculated value is less than the table value, null
hypothesis is accepted and otherwise it is rejected.
Qn: The numbers of road accidents per week in a certain city were as follows:
Are these frequencies in agreement with the belief that the accidents occurred were
the same during the 10 week period?
Sol:
χ² = Σ[(O – E)²/E]
Here the Observed values (Actual values) are 12, 8, 20, 2, 14, 10, 15, 6, 9 and 4.
If accidents occurred are same, then the number of accidents per week which we may
expect is 10 (i.e; the average of the given values).
i.e; E = 10
O     E     (O – E)²    (O – E)²/E
12    10       4          0.4
8     10       4          0.4
20    10     100         10.0
2     10      64          6.4
14    10      16          1.6
10    10       0          0.0
15    10      25          2.5
6     10      16          1.6
9     10       1          0.1
4     10      36          3.6
χ² = Σ[(O – E)²/E] = 26.6
Level of significance = 5%
Degree of Freedom = n – r – 1 = 10 – 0 – 1 = 9
Since calculated value is more than the table value, null hypothesis is rejected. We
accept alternative hypothesis. So we may conclude that the given figures do not agree
with the belief that accident occurred were same during the 10 weeks period.
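The computation for the accident data can be sketched in a few lines of Python. The helper name `chi_square_gof` is hypothetical, and 16.919 is the standard table value of Chi-square at 5% level of significance and 9 degrees of freedom, which the text does not quote.

```python
def chi_square_gof(observed, expected):
    """Chi-square = sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [12, 8, 20, 2, 14, 10, 15, 6, 9, 4]
average = sum(observed) / len(observed)        # E = 10 for every week
expected = [average] * len(observed)

chi2 = chi_square_gof(observed, expected)
df = len(observed) - 0 - 1                     # n - r - 1, no parameter estimated
TABLE_VALUE = 16.919                           # assumed: chi-square table, 5%, 9 d.f

print(round(chi2, 1), df)
print("H0 rejected" if chi2 > TABLE_VALUE else "H0 accepted")
```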
Sol:
H0 : There is goodness of fit between the given figures and the figures expected in
general examination
H1 : There is no goodness of fit between the given figures and the figures expected in
general examination
χ² = Σ[(O – E)²/E]
Here the Observed values (Actual values) for first, second, third and failed categories
of students are respectively 24, 62, 68 and 46.
If results are in the ratio of 2:3:3:2, then the number of students for above categories
may be expected as follows:
Level of significance = 5%
Degree of Freedom = n – r – 1 = 4 – 0 – 1 = 3
Since calculated value is more than the table value, null hypothesis is rejected. We
accept alternative hypothesis. So we may conclude that the given figures are not
commensurate with the general examination result which is in the ratio of 2:3:3:2.
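As a rough illustration of this solution, the expected frequencies can be derived from the ratio 2:3:3:2 and the Chi-square value computed as below. The helper `expected_from_ratio` is a hypothetical name, and 7.815 is the standard table value at 5% level of significance and 3 degrees of freedom, assumed here because the text does not state it.

```python
def expected_from_ratio(total, ratio):
    """Split a total into expected frequencies in the given ratio."""
    parts = sum(ratio)
    return [total * r / parts for r in ratio]

observed = [24, 62, 68, 46]                    # first, second, third, failed
expected = expected_from_ratio(sum(observed), [2, 3, 3, 2])

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 0 - 1
TABLE_VALUE = 7.815                            # assumed: chi-square table, 5%, 3 d.f

print(expected, round(chi2, 3), df)
print("H0 rejected" if chi2 > TABLE_VALUE else "H0 accepted")
```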
Testing of Independence
Procedure:
1. Set up H0 and H1
H0 : The attributes are independent (i.e; there is no association between them).
H1 : The attributes are not independent (i.e; they are associated).
2. Decide the test statistic. Here, the test statistic is Chi-Square test.
3. Apply the appropriate formula:
χ² = Σ[(O – E)²/E]
where O = Observed frequencies and E = Expected frequencies
Here E values are obtained by using the following formula:
E Value = [(Row Total x Column Total)/Grand Total]
E Values are computed by preparing a table called Contingency Table.
4. Specify the level of significance. If nothing is mentioned, take 5% level of
significance.
5. Fix the degree of freedom. Degree of freedom = (r – 1) x (c – 1)
Where r = number of rows; c = number of columns
6. Obtain the table value of Chi-square at specified level of significance and fixed
degree of freedom.
7. Compare the actual value of Chi-Square with the table value and decide whether to
accept or reject the null hypothesis. If calculated value is less than the table value, null
hypothesis is accepted and otherwise it is rejected.
Qn: From the following data, can you say that there is relation between the habit of
smoking and literacy:
              Smokers    Non-smokers
Literates       83           57
Illiterates     45           68
Sol:
χ² = Σ[(O – E)²/E]
Here the Observed values (Actual values) are 83, 57, 45 and 68.
The E Values corresponding to the above ‘O’ values can be found out by preparing a 2
X 2 contingency table:
2 X 2 Contingency Table
              Smokers                          Non-smokers               Total
Literates     [(83+57) x (83+45)]/253 = 71     (140 x 125)/253 = 69      140
Illiterates   (113 x 128)/253 = 57             (113 x 125)/253 = 56      113
Total         128                              125                       253
So, the E values are 71, 69, 57 and 56.
Level of significance = 5%
Degree of Freedom = (2 – 1) x (2 – 1) = 1 x 1 = 1
Since calculated value is more than the table value, null hypothesis is rejected. We
accept alternative hypothesis. So we may conclude that there is no independence
between smoking habit and literacy. In other words, smoking habit and literacy are
related.
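The contingency-table working above can be sketched in Python as follows. The function names are illustrative. The sketch keeps the expected frequencies unrounded, so its Chi-square value (about 9.48) is slightly higher than the value obtained from the rounded E values 71, 69, 57 and 56; 3.841 is the standard table value at 5% level of significance and 1 degree of freedom.

```python
def expected_frequencies(observed):
    """E = (row total x column total) / grand total for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(observed):
    expected = expected_frequencies(observed)
    value = sum((o - e) ** 2 / e
                for o_row, e_row in zip(observed, expected)
                for o, e in zip(o_row, e_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return value, df, expected

observed = [[83, 57],    # literates: smokers, non-smokers
            [45, 68]]    # illiterates: smokers, non-smokers
chi2, df, expected = chi_square(observed)

print([[round(e) for e in row] for row in expected])
print(round(chi2, 3), df)
print("H0 rejected" if chi2 > 3.841 else "H0 accepted")
```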
Qn: In a sample study about the tea drinking habit in a town, following data are
observed in a sample of size 200.
46% were male, 26% were tea drinkers and 17% were male tea drinkers.
Sol:
χ² = Σ[(O – E)²/E]
Here all the Observed values (Actual values) are not directly given in the question.
So, we have to find the missing figures with the help of a 2 x 2 contingency table:
The E Values corresponding to the above ‘O’ values can be found out by preparing a 2
X 2 contingency table:
2 X 2 Contingency Table ( ‘E’ values)
Tea drinkers Non-tea drinkers Total
Male (92 x 52) / 200 = 24 (92 x 148) / 200 = 68 92
Female (108 x 52)/200 = 28 (108 x 148)/200 = 80 108
Total 52 148 200
Level of significance = 5%
Degree of Freedom = (2 – 1) x (2 – 1) = 1 x 1 = 1
Since calculated value is more than the table value, null hypothesis is rejected. We
accept alternative hypothesis. So we may conclude that there is no independence
between gender and tea drinking habit. In other words, gender and tea drinking habit are
closely associated.
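The missing observed figures for this problem can be recovered from the percentages, as the following sketch shows. Variable names are illustrative, and the expected frequencies are kept unrounded, so the statistic (about 10.63) may differ slightly from a hand calculation with E values rounded to 24, 68, 28 and 80.

```python
n = 200
male = n * 46 // 100          # 92 males
tea = n * 26 // 100           # 52 tea drinkers
male_tea = n * 17 // 100      # 34 male tea drinkers

observed = [
    [male_tea, male - male_tea],                    # male: tea, non-tea
    [tea - male_tea, n - male - (tea - male_tea)],  # female: tea, non-tea
]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))

print(observed)
print(round(chi2, 2))
# 3.841 = table value of chi-square at 5% level and 1 d.f
print("H0 rejected" if chi2 > 3.841 else "H0 accepted")
```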
Testing of Homogeneity
Procedure:
1. Set up H0 and H1
H0 : There is homogeneity between the samples on the basis of the attribute.
H1 : There is no homogeneity between the samples on the basis of the attribute.
2. Decide the test statistic. Here, the test statistic is Chi-Square test.
3. Apply the appropriate formula:
χ² = Σ[(O – E)²/E]
where O = Observed frequencies and E = Expected frequencies
Here ‘E’ values are obtained by using the following formula:
‘E’ Value = [(Row Total x Column Total)/Grand Total]
‘E’ Values are computed by preparing a table called Contingency Table.
4. Specify the level of significance. If nothing is mentioned, take 5% level of
significance.
5. Fix the degree of freedom. Degree of freedom = (r – 1) x (c – 1)
Where r = number of rows; c = number of columns
6. Obtain the table value of Chi-square at specified level of significance and fixed
degree of freedom.
7. Compare the actual value of Chi-Square with the table value and decide whether to
accept or reject the null hypothesis. If calculated value is less than the table value, null
hypothesis is accepted and otherwise it is rejected.
                                   Hindus    Muslims
No. of families drinking tea        124        16
No. of families not drinking tea     56        10
Is there any difference between the communities in the matter of tea drinking?
Sol:
χ² = Σ[(O – E)²/E]
Here the Observed values (Actual values) are 124, 16, 56, and 10
The ‘E’ values corresponding to the above ‘O’ values can be found out by preparing a 2
X 2 contingency table:
2 X 2 Contingency Table
                                   Hindus                   Muslims                Total
No. of families drinking tea       (140 x 180)/206 = 122    (140 x 26)/206 = 18    140
No. of families not drinking tea   (66 x 180)/206 = 58      (66 x 26)/206 = 8      66
Total                              180                      26                     206
So, the ‘E’ values are 122, 18, 58 and 8.
Level of significance = 5%
Degree of Freedom = (2 – 1) x (2 – 1) = 1 x 1 = 1
Since calculated value is less than the table value, null hypothesis is accepted. So we
may conclude that there is homogeneity between communities in the matter of tea
drinking.
Testing of Variance
Procedure:
1. Set up H0 and H1
H0 : There is no significant difference between sample variance and population
variance.
H1 : There is significant difference between sample variance and population variance.
2. Decide the test statistic. Here, the test applicable is Chi-square test.
3. Apply the appropriate formula for computing the value of test statistic.
χ² = ns²/σ², where n = sample size, s² = sample variance, σ² = population variance.
4. Specify the level of significance. Take 5%, unless specified otherwise.
5. Fix the degree of freedom. d.f = n –1.
6. Locate the table value of Chi-square at specified level of significance and fixed
degree of freedom.
7. Compare the actual value of Chi-Square with the table value and decide whether to
accept or reject the null hypothesis. If calculated value is less than the table value, null
hypothesis is accepted and otherwise it is rejected.
Qn: A sample is drawn from a population which follows normal distribution. The size of
sample and S.D are respectively 10 and 5. Test whether this is consistent with the
hypothesis that the S D of the population is 5.3
Sol:
H0 : There is no significant difference between sample S.D and population S.D. (i.e; H0 :
S.D of population = 5.3)
H1 : There is significant difference between sample S.D and population S.D. (i.e; H1 : S.D
of population ≠ 5.3)
Since calculated value is less than the table value, null hypothesis is accepted. So we may
conclude that there is no significant difference between sample S.D and population S.D.
The population S.D = 5.3
Qn: A sample group of 10 students are selected randomly from a class. Their weights (in
K.g) are 49, 40, 53, 38, 52, 47, 48, 45, 55, and 43. Can we say that the population
variance is 20 Kg?
Sol:
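H0 : There is no significant difference between sample variance and population variance. (i.e;
H0 : Variance of population = 20)
H1 : There is significant difference between sample variance and population variance. (i.e;
H1 : Variance of population ≠ 20)
χ² = ns²/σ²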
Here, n = 10, σ2 = 20, Sample variance is to be computed from the given data.
Sample Variance (s²) = [Σ(X – x̄)²]/n = 280/10 = 28.
∴ χ² = ns²/σ² = (10 x 28)/20 = 14
Level of significance = 5%
Since calculated value is less than the table value, null hypothesis is accepted. So we may
conclude that there is no significant difference between sample variance and population
variance. ∴ The population variance = 20 Kg.
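Both variance problems above can be checked with a short sketch. The function name `chi_square_variance` is an assumption made for illustration; the sample variance uses the divisor n, as in the text, and 16.919 is the standard table value at 5% level of significance and 9 degrees of freedom.

```python
def chi_square_variance(n, sample_variance, population_variance):
    """Chi-square = n * s^2 / sigma^2."""
    return n * sample_variance / population_variance

# Problem 1: n = 10, sample S.D = 5, hypothesised population S.D = 5.3
chi2_sd = chi_square_variance(10, 5 ** 2, 5.3 ** 2)

# Problem 2: weights of 10 students, hypothesised population variance = 20
weights = [49, 40, 53, 38, 52, 47, 48, 45, 55, 43]
n = len(weights)
mean = sum(weights) / n
s2 = sum((x - mean) ** 2 for x in weights) / n     # divisor n, as in the text
chi2_weights = chi_square_variance(n, s2, 20)

TABLE_VALUE = 16.919                               # assumed: 5% level, n - 1 = 9 d.f
for label, value in (("S.D problem", chi2_sd), ("weights problem", chi2_weights)):
    decision = "H0 accepted" if value < TABLE_VALUE else "H0 rejected"
    print(label, round(value, 3), decision)
```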
REVIEW QUESTIONS:
Chapter 12
ANALYSIS OF VARIANCE
Meaning of Analysis of Variance
The testing of hypotheses so far discussed consists of different sample groups which
do not exceed two. If there are three or more sample groups, the testing of equality of them
cannot be done in any of the methods which have already been discussed. The testing of
significance of the difference among three or more samples is generally done by using the
technique of analysis of variance. In case of analysis of variance, as part of testing procedure,
we have to prepare a separate statement called Analysis of Variance Table or ANOVA Table.
Therefore, this type of testing of hypothesis is also called analysis of variance. The test
statistic used for Analysis of Variance is F-test. F-test is a parametric test.
(iii) Draw one-way ANOVA Table and enter the values of SST and SSC
(iv) Find the value of SSE. SSE = SST – SSC
(v) Find the degree of freedom in the third column as indicated in the proforma.
(vi) Find MSC. MSC = SSC ÷ (C–1)
(vii) Find MSE. MSE = SSE ÷ (N–C)
(viii) Find F–Ratio.
F= Larger variance ÷ Smaller variance; [i.e; F= MSC÷MSE, or F= MSE÷MSC]
4. Specify the level of significance. Take 5% if nothing is mentioned.
5. Fix the degrees of freedom. Here we have to fix a pair of d.f.
If ‘F’ is obtained by using F= MSC÷MSE, then pair of df is (d f of MSC, d f of MSE)
If ‘F’ is obtained by using F= MSE÷MSC, then pair of df is (d f of MSE, d f of MSC)
6. Obtain table value of F at specified level of significance and fixed degree of freedom.
7. Compare the Calculated value of F with the Table value, and decide whether to accept
or reject the null hypothesis. If calculated value is less than the table value, H0 is
accepted. If calculated value is more than the table value, H0 is rejected.
Qn: Four varieties of a crop were grown on 3 plots, and the following yields were obtained. You
are required to test whether there is significant difference in the productivity of seeds:
Plot     Variety of Seeds
         P     Q     R     S
I       10     7     8     5
II       9     7     5     4
III      8     6     4     4
Sol:
Qn: The following table shows the yield of 3 varieties. Perform analysis of variance and test
whether there is significant difference between varieties:
Varieties    Plots
             A     B     C     D     E
I           30    27    42     –     –
II          51    47    37    48    42
III         44    35    41    36     –
Sol:
Here, we are asked to test whether there is significant difference between varieties. But
varieties are given in rows, not in columns. In one way ANOVA, the samples must be in
columns. Therefore, we have to rearrange the given data so as to bring the samples in
columns as shown below:
Plots    Varieties
         I     II    III
A       30    51    44
B       27    47    35
C       42    37    41
D        –    48    36
E        –    42     –
H0 : There is no significant difference in the productivity of varieties.
H1 : There is significant difference in the productivity of varieties.
Test Statistic applicable here is F-test
SST = Sum of squares of all items – (T²/N)
= 900+2601+1936+729+2209+1225+1764+1369+1681+2304+1296+1764 – (480²/12)
= 19778 – 19200 = 578
SSC = [(Σx1)²/n1] + [(Σx2)²/n2] + [(Σx3)²/n3] + ... – (T²/N)
= [(30+27+42)²÷3] + [(51+47+37+48+42)²÷5] + [(44+35+41+36)²÷4] – (480²/12)
= (9801/3) + (50625/5) + (24336/4) – 19200
= 3267 + 10125 + 6084 – 19200 = 19476 – 19200 = 276
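The SST and SSC figures above, and the remaining steps of the one-way ANOVA procedure, can be reproduced with the sketch below. The function name and the returned dictionary are illustrative choices. F is computed here as MSC ÷ MSE; the text's rule of dividing the larger variance by the smaller gives the same ratio for this data because MSC is the larger.

```python
def one_way_anova(groups):
    """One-way ANOVA quantities for samples (columns) of possibly unequal size."""
    items = [x for g in groups for x in g]
    N, C = len(items), len(groups)
    cf = sum(items) ** 2 / N                              # correction factor T^2/N
    sst = sum(x * x for x in items) - cf
    ssc = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    sse = sst - ssc
    msc, mse = ssc / (C - 1), sse / (N - C)
    return {"SST": sst, "SSC": ssc, "SSE": sse,
            "MSC": msc, "MSE": mse, "F": msc / mse, "df": (C - 1, N - C)}

varieties = [
    [30, 27, 42],            # variety I
    [51, 47, 37, 48, 42],    # variety II
    [44, 35, 41, 36],        # variety III
]
result = one_way_anova(varieties)
for name, value in result.items():
    print(name, value if name == "df" else round(value, 3))
```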
In two way classification, observations are classified into different groups on the basis
of two criteria. Consider the example mentioned in one-way classification. If we study the
effect of both the quality of seeds and the type of fertilizers on the productivity of crop, the
data are to be classified on the basis of two criteria, namely type of seed and type of fertilizer.
This is called two-way analysis of variance. In case of two-way analysis of variance, we need
not make any kind of rearrangement in the given data. Since two criteria are considered, here,
there will be two sets of hypotheses.
Total SST = N –1
(i) Find SST.
SST = Sum of squares of all items – (T²/N)
Where T = Total of all observations, N = Total number of observations
(T²/N) is generally called the correction factor
(ii) Find SSC.
SSC = [(Σx1)²/n1] + [(Σx2)²/n2] + [(Σx3)²/n3] + ... – (T²/N)
Where Σx1 = sum of items in the first column
Σx2 = sum of items in the second column
n1 = number of items in the first column
n2 = number of items in the second column
(iii)Find SSR.
(iv) Draw two-way ANOVA Table and enter the values of SST, SSC and SSR
(v)Find the value of SSE. SSE = SST – (SSC+SSR)
(vi)Find the degree of freedom in the third column as indicated in the proforma.
(vii)Find MSC. MSC = SSC ÷ (c–1)
(viii)Find MSR. MSR = SSR ÷ (r–1)
(ix)Find MSE. MSE = SSE ÷ [(c–1) x (r–1)]
(x)Find F–Ratios (i. e; FC and FR)
Qn: Following table shows the yield of crops using 3 varieties of seeds:
Plots    Varieties of Seeds
         P     Q     R
I        6     7     8
II       4     6     5
III      8     6    10
IV       6     9     9
Between Columns:
Calculated value of FC = 2.396
Level of Significance = 5%
Degrees of freedom = (2,6)
Table value of FC at 5% level of significance and (2,6) degrees of freedom = 5.14
Since calculated value is less than the table value, null hypothesis is accepted. So we
may conclude that there is no significant difference in the productivity of three
varieties of seeds.
Between Rows:
Calculated value of FR = 3.593
Level of Significance = 5%
Degrees of freedom = (3,6)
Table value of FR at 5% level of significance and (3,6) degrees of freedom = 4.76
Since calculated value is less than the table value, null hypothesis is accepted. So we
may conclude that there is no significant difference in the productivity of plots.
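The two F ratios quoted above can be re-derived with a small two-way ANOVA sketch. The function name `two_way_anova` is hypothetical. Without intermediate rounding the ratios come out as FC = 2.4 and FR = 3.6, which agree with the quoted 2.396 and 3.593 up to rounding of MSE.

```python
def two_way_anova(table):
    """Two-way ANOVA (one observation per cell); rows and columns are the two criteria."""
    r, c = len(table), len(table[0])
    items = [x for row in table for x in row]
    cf = sum(items) ** 2 / (r * c)                          # correction factor T^2/N
    sst = sum(x * x for x in items) - cf
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / r - cf
    ssr = sum(sum(row) ** 2 for row in table) / c - cf
    sse = sst - (ssc + ssr)
    msc, msr = ssc / (c - 1), ssr / (r - 1)
    mse = sse / ((c - 1) * (r - 1))
    return {"SST": sst, "SSC": ssc, "SSR": ssr, "SSE": sse,
            "FC": msc / mse, "FR": msr / mse}

yields = [
    [6, 7, 8],     # plot I:   seeds P, Q, R
    [4, 6, 5],     # plot II
    [8, 6, 10],    # plot III
    [6, 9, 9],     # plot IV
]
result = two_way_anova(yields)
print({name: round(value, 3) for name, value in result.items()})
# Table values quoted in the text: 5.14 for (2,6) d.f and 4.76 for (3,6) d.f
print("columns:", "H0 accepted" if result["FC"] < 5.14 else "H0 rejected")
print("rows:", "H0 accepted" if result["FR"] < 4.76 else "H0 rejected")
```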
Coding Method
In analysis of variance, while preparing ANOVA table (both one-way and two-way),
at first, we have to find the values of SST, SSC, SSR, etc. But, if the individual observations
of the given data are of large values, the computation of SST, SSC, SSR, etc becomes a
tedious task. To avoid this complication, we may apply coding method. Coding method
refers to adding a constant to, subtracting a constant from, or multiplying or dividing by a
constant, every individual observation of the given data. Such a transformation of all the
individual items will not affect the value of F.
Qn: The following table shows the number of units of a product produced by 5 workers using
4 different types of machines:
Workers    Machines
           P     Q     R     S
I         44    38    47    36
II        46    40    52    43
III       34    36    44    32
IV        43    38    46    33
V         38    42    49    39
Let us apply coding method by subtracting 45 from each observation of the given data. Then
we get;
Workers    Machines
           P     Q     R     S
I         -1    -7     2    -9
II         1    -5     7    -2
III      -11    -9    -1   -13
IV        -2    -7     1   -12
V         -7    -3     4    -6
SST = Sum of squares of all items – (T²/N)
= 1+1+121+4+49+49+25+81+49+9+4+49+1+1+16+81+4+169+144+36 – (6400/20)
= 894 – 320 = 574
Since calculated value is more than the table value, null hypothesis is rejected.
Alternative hypothesis is accepted. So we may conclude that there is significant
difference in the mean productivity of machines.
Between Rows:
Calculated value of FR = 6.574
Level of Significance = 5%
Degrees of freedom = (4,12)
Table value of FR at 5% level of significance and (4,12) degrees of freedom = 3.26
Since calculated value is more than the table value, null hypothesis is rejected.
Alternative hypothesis is accepted. So we may conclude that there is significant
difference in the mean productivity of workers.
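The claim that coding does not change F can be illustrated with the sketch below, which computes the two F ratios for the original machine data and for the data coded by subtracting 45. The function name `two_way_f` is an assumption for illustration, and the third row is taken to be worker III.

```python
def two_way_f(table):
    """Return (FC, FR) = (MSC/MSE, MSR/MSE) for a rows x columns table."""
    r, c = len(table), len(table[0])
    items = [x for row in table for x in row]
    cf = sum(items) ** 2 / (r * c)                          # correction factor T^2/N
    sst = sum(x * x for x in items) - cf
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / r - cf
    ssr = sum(sum(row) ** 2 for row in table) / c - cf
    mse = (sst - ssc - ssr) / ((c - 1) * (r - 1))
    return (ssc / (c - 1)) / mse, (ssr / (r - 1)) / mse

output = [
    [44, 38, 47, 36],    # worker I:   machines P, Q, R, S
    [46, 40, 52, 43],    # worker II
    [34, 36, 44, 32],    # worker III
    [43, 38, 46, 33],    # worker IV
    [38, 42, 49, 39],    # worker V
]
coded = [[x - 45 for x in row] for row in output]   # coding: subtract 45

fc_raw, fr_raw = two_way_f(output)
fc_coded, fr_coded = two_way_f(coded)
print(round(fc_raw, 3), round(fr_raw, 3))
print(round(fc_coded, 3), round(fr_coded, 3))
```

Both lines print the same pair of ratios, with FR matching the 6.574 quoted in the text.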
REVIEW QUESTIONS:
Methods Scores
I 84 71 84 76 85
II 85 76 88 86 90
III 81 68 73 71 82
Test whether there is significant difference in the scores under three methods.
11. A company had 4 salesmen P,Q,R and S, each of whom was sent for a period of one
month to three types of areas, namely, urban area, rural area and semi-urban area. The
sales (in thousand rupees) achieved by the salesmen are shown in the following table:
Salesmen
Area
P Q R S
Urban 80 80 60 100
Rural 30 30 70 30
Semi-urban 70 40 50 80
Carry out an analysis of variance and interpret the results.
Chapter 13
NON-PARAMETRIC TESTS
Meaning:
A test which is not concerned with testing of parameters is called Non-parametric test.
Non-parametric test does not make any assumption about the nature of distribution.
Therefore, non-parametric tests are called distribution-free tests.
1. They can be used only if the observations are measured on ordinal or nominal scale.
2. They cannot be used for estimating population parameters
3. Not all non-parametric tests are simple to apply.
1. Chi-square Test
2. Sign Tests
3. Signed Rank Test (Wilcoxon Matched Pairs Test)
4. Rank Sum Tests
5. One Sample Runs Test (Wald-Wolfowitz Runs Test)
6. Kolmogorov-Smirnov Test (K-S Test)
Sign tests
t-test is generally used when sample is small and there is an assumption that the population is
normal. Therefore, when sample is small but it is not possible to make an assumption about
the nature of population distribution, t-test cannot be applied. In such a case sign test is used.
In sign test, to find the value of test statistic, we use the proportion of signs (+ve or -ve
signs), not the numerical magnitude. That is why, the test is known as sign test. There are two
types of sign tests. They are (a) One sample sign test and (b) Two sample sign test.
One sample sign test is used to test whether the sample belongs to a particular population.
Procedure:
Qn: Mr. A had to wait for the following times (in minutes) for the bus on 15 occasions:
Sol:
H0: There is no significant difference between sample mean and population mean (i.e;
μ = 5)
H1: There is significant difference between sample mean and population mean ( i.e; μ ≠
5)
The test statistic applicable is one sample sign test.
Test statistic = (p – P)/SE where p = proportion of + signs, P = ½
S E = √(PQ/n)
Two sample sign test is used to test whether two populations are identical. In case of two
sample sign test, each pair is replaced by a +ve or –ve sign. If the first value in a pair is larger,
assign +ve sign to that pair, and otherwise assign –ve sign. The procedure is the same
as in the case of one sample sign test.
Qn: The following are the scores obtained by 2 students in different tests:
Student I 7 10 14 12 6 9 11 13 7 6 10
Student II 10 13 14 11 10 7 15 11 10 9 8
Use the sign test at 1% level of significance to test the null hypothesis that on an average the
two students are identical.
Sol:
H0: There is no significant difference between students ( i.e; performance of the students are
identical)
H1: There is significant difference between students
S E = √(PQ/n)
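A minimal sketch of the two sample sign test for the scores of the two students is given below. It assumes that pairs with equal values are dropped before counting signs, which is a common convention but is not stated in the text; the function name is illustrative, and 2.576 is the table value of Z at 1% level of significance.

```python
import math

def two_sample_sign_test(first, second, P=0.5):
    """Z = (p - P) / SE, where p is the proportion of + signs among non-tied pairs."""
    plus = sum(a > b for a, b in zip(first, second))
    minus = sum(a < b for a, b in zip(first, second))
    n = plus + minus                       # assumption: tied pairs are ignored
    p = plus / n
    se = math.sqrt(P * (1 - P) / n)        # S E = sqrt(PQ/n)
    return plus, minus, (p - P) / se

student_1 = [7, 10, 14, 12, 6, 9, 11, 13, 7, 6, 10]
student_2 = [10, 13, 14, 11, 10, 7, 15, 11, 10, 9, 8]

plus, minus, z = two_sample_sign_test(student_1, student_2)
print(plus, minus, round(z, 3))
print("H0 accepted" if abs(z) < 2.576 else "H0 rejected")
```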
Signed Rank Test is another important non-parametric test used to test whether matched
paired samples are identical or not. Here we use the signed ranks for testing. Wilcoxon
Matched Pairs Test is used differently depending upon following two situations:
Here, we find the difference of matched pairs and assign them ranks. Then ranks are
classified into two categories based on their respective signs. Then take the sum of each of the two
categories of ranks. The minimum of the two is considered as the value of test statistic.
Procedure:
Qn: The following table shows the details of number of units of a product produced by
two workers. Test whether there is significant difference between the performances of the
workers using Wilcoxon matched pairs test.
Worker P 73 43 47 53 58 47 52 58 38 61 56 56 34 55 65 75
Worker Q 51 41 43 41 47 32 24 58 43 53 52 57 44 57 40 68
Sol:
56 52 4 4 4.5 4.5
56 57 -1 1 1 -1
34 44 -10 10 9 -9
55 57 -2 2 2.5 -2.5
65 40 25 25 14 14
75 68 7 7 7 7
Total of Signed Ranks 101.5 18.5
The calculated value of T = 101.5 or 18.5 whichever is lower.
∴ T value = 18.5
Level of significance = 5%
Degree of freedom = n-1, (n = number of values which have either +ve or –ve sign)
N = 15
Table value of Wilcoxon’s T test at 5% level of significance and 15 df = 25
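Since the calculated value of T is less than the table value, the null hypothesis is rejected
(in Wilcoxon's T test, unlike the Z, t and χ² tests, a calculated value at or below the table
value leads to rejection). So we may conclude that there is significant difference in the
performances of workers P and Q.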
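The rank sums in the working table can be re-derived with the sketch below. Helper names are illustrative. The 13th value for Worker P is taken as 34, as it appears in the working table, and pairs with zero difference are dropped, which leaves 15 pairs.

```python
def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1    # mean of ranks i+1 .. j+1
        i = j + 1
    return ranks

def wilcoxon_t(first, second):
    diffs = [a - b for a, b in zip(first, second) if a != b]   # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return len(diffs), positive, negative, min(positive, negative)

worker_p = [73, 43, 47, 53, 58, 47, 52, 58, 38, 61, 56, 56, 34, 55, 65, 75]
worker_q = [51, 41, 43, 41, 47, 32, 24, 58, 43, 53, 52, 57, 44, 57, 40, 68]

n, positive, negative, t_value = wilcoxon_t(worker_p, worker_q)
print(n, positive, negative, t_value)
```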
Signed Rank Test (When the number of matched pairs > 25)
Here, we find the difference of matched pairs and assign them ranks. Then ranks are
classified into two categories based on their respective signs. Then take the total of two
categories of ranks. The test statistic is Z test.
Procedure:
Qn: The following are the marks obtained by 26 students before and after giving a special
coaching to them:
Marks (before) : 70, 35, 21, 16, 75, 63, 70, 54, 77, 82, 68, 19, 13, 72, 78, 17, 24, 3, 45, 80, 15,
20, 58, 65, 35, 52.
Marks (after) : 79, 62, 90, 37, 35, 14, 26, 32, 90, 54, 85, 44, 83, 90, 92, 32, 34, 28, 34, 79, 35,
32, 62, 63, 30, 68.
Use the signed rank test to test whether there is significant difference in the marks of
students before and after providing special coaching (α = 5%).
Sol:
H0: There is no significant difference in the marks of students before and after giving
special coaching.
H1: There is significant difference in the marks of students before and after giving
special coaching.
The test statistic is Wilcoxon matched pairs test ( i.e; Z test)
Z = [(T – μ)/σ ] where T = Sum of Positive Ranks or Sum of Negative Ranks,
whichever is less; μ = [n(n+1)]/4; σ = √{[n(n+1)(2n+1)]/24}
45 34 11 11 8 8
80 79 1 1 1 1
15 35 -20 20 16 16
20 32 -12 12 9 9
58 62 -4 4 3 3
65 63 2 2 2 2
35 30 5 5 4 4
52 68 -16 16 13 13
Total of Signed Ranks 128 223
T = 128 (128 or 223, whichever is low)
μ = [n(n + 1)]/4, = 26(26+1)/4 = (26 x 27)/4 = 702/4 = 175.5
σ = √{[n(n+1)(2n+1)]/24} = √{[26(26+1)(52+1)]/24} = √[(26 x 27 x 53)/24]
= √(37206/24) = √1550.25 = 39.373
∴ Z = (128 – 175.5)/39.373 = – 47.5/39.373 = – 1.21, i.e; 1.21 numerically
Level of significance = 5%
Degree of freedom = infinity
Table value of Z at 5% level of significance and infinity degree of freedom = 1.96
Since calculated value is less than the table value, null hypothesis is accepted. So
we may conclude that there is no significant difference in the performances of
students before and after giving special coaching.
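The final step of this solution, converting T into a Z value, can be sketched as follows. The function name is hypothetical, and T = 128 and n = 26 are taken directly from the working above.

```python
import math

def wilcoxon_z(T, n):
    """Large-sample Z for the signed rank test: Z = (T - mu) / sigma."""
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return mu, sigma, (T - mu) / sigma

mu, sigma, z = wilcoxon_z(128, 26)
print(mu, round(sigma, 3), round(z, 2))
print("H0 accepted" if abs(z) < 1.96 else "H0 rejected")   # 1.96 = table value at 5%
```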
This method is used when there are two groups of samples. The testing procedure is:
U = U1 or U2 whichever is less.
U1 = n1.n2 + [n1(n1+1)]/2 – R1
U2 = n1.n2 + [n2(n2+1)]/2 – R2
R1 = Rank sum of Sample I
R2 = Rank sum of Sample II
n1 = Number of observations in Sample I
n2 = Number of observations in Sample II
SE = √{[n1.n2 (n1 + n2 + 1)]/12}
4. Specify the level of significance. Take 5% unless specified otherwise.
5. Fix the degrees of freedom. df = infinity
6. Locate the table value of test statistic (i.e; Z test) at specified level of significance and
fixed degrees of freedom.
7. Compare the calculated value with table value and decide whether to accept or reject
the hypothesis. If calculated value is less than the table value, null hypothesis is
accepted and otherwise, it is rejected.
Qn: Apply Wilcoxon- Mann-Whitney Test to test whether the following samples come from
populations with same mean (i.e; they are identical):
Sample I 54 39 70 58 47 40 74 49 74 75 61 79
Sample II 45 41 62 53 33 45 71 42 68 73 54 73
Sol:
H0: There is no significant difference between two samples (i.e; they are identical)
H1: There is significant difference between two samples (i.e; they are not identical)
The test statistic is Wilcoxon Mann Whitney test ( i.e; U- test)
Value of Test Statistic = [(μ–U)/SE ]
where μ = (n1.n2)/2
U = U1 or U2 whichever is less.
U1 = n1.n2 + [n1(n1+1)]/2 – R1
U2 = n1.n2 + [n2(n2+1)]/2 – R2
R1 = Rank sum of Sample I
R2 = Rank sum of Sample II
SE = √{[n1.n2 (n1 + n2 + 1)]/12}
Computation of Rank Sums
Values of samples together    Rank     Rank in       Rank in
(ascending order)                      Sample I      Sample II
33                            1                      1
39                            2        2
40                            3        3
41                            4                      4
42                            5                      5
45                            6.5                    6.5
45                            6.5                    6.5
47                            8        8
49                            9        9
53                            10                     10
54                            11.5     11.5
54                            11.5                   11.5
58                            13       13
61                            14       14
62                            15                     15
68                            16                     16
70                            17       17
71                            18                     18
73                            19.5                   19.5
73                            19.5                   19.5
74                            21.5     21.5
74                            21.5     21.5
75                            23       23
79                            24       24
Rank Sum                               R1 = 167.5    R2 = 132.5
U1 = 12 x 12 + [12(12+1)]/2 – 167.5 = 144 + 78 – 167.5 = 54.5
U2 = 12 x 12 + [12(12+1)]/2 – 132.5 = 144 + 78 – 132.5 = 89.5
U = 54.5 or 89.5 whichever is lower, ∴ U = 54.5
μ = (12 x 12)/2 = 144/2 = 72
SE = √{[(12 x 12)(12 + 12 + 1)]/12} = √[(144 x 25)/12] = √300 = 17.32
∴ Test Statistic = (72 – 54.5)/17.32 = 17.5/17.32 = 1.010
Table value of test statistic (i.e; Z test) at 5% level of significance and infinity degrees
of freedom = 1.96
Since calculated value is less than the table value, null hypothesis is accepted. So
we may conclude that there is no significant difference between two samples. Both
the samples come from populations with the same mean.
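The rank sums, U and the test statistic for this problem can be reproduced with the sketch below. Helper names are illustrative; tied values receive the average of the ranks they occupy, as in the table above.

```python
import math

def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

sample_1 = [54, 39, 70, 58, 47, 40, 74, 49, 74, 75, 61, 79]
sample_2 = [45, 41, 62, 53, 33, 45, 71, 42, 68, 73, 54, 73]
n1, n2 = len(sample_1), len(sample_2)

ranks = average_ranks(sample_1 + sample_2)        # rank both samples together
r1, r2 = sum(ranks[:n1]), sum(ranks[n1:])
u1 = n1 * n2 + n1 * (n1 + 1) / 2 - r1
u2 = n1 * n2 + n2 * (n2 + 1) / 2 - r2
u = min(u1, u2)
mu = n1 * n2 / 2
se = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (mu - u) / se

print(r1, r2, u1, u2, round(se, 2), round(z, 3))
print("H0 accepted" if abs(z) < 1.96 else "H0 rejected")
```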
Sol:
H0: There is no significant difference between salesmen
H1: There is significant difference between salesmen
The test statistic applicable here is Kruskal – Wallis test ( i.e; H- test)
Use the appropriate formula for computing the value of test statistic.
Test Statistic H = {12/[n(n+1)]} x [(ΣR1)²/n1 + (ΣR2)²/n2 + ...] – 3(n+1)
σ = √{[2n1n2(2n1n2 – n1 – n2)] / [(n1+n2)²(n1+n2 – 1)]}
Qn: Test the randomness of following arrangement of students (Boys and Girls) in a class:
B,G,B,G,B,B,B,G,B,G,B,B,B,G,G,B,B,B,B,G,G,B,G,B,B,B,G,B,B,B,G,G,G,B,G,B,B,B
,G,B,G,B,B,B,B,G,G,B
Sol:
H0: There is randomness
H1: There is no randomness
The test statistic applicable here is Z-test.
Use the appropriate formula for computing the value of test statistic.
Z = (r – μ)/σ
r = Number of runs
μ = [2n1n2/(n1+n2)] + 1
σ = √{[2n1n2(2n1n2 – n1 – n2)] / [(n1+n2)²(n1+n2 – 1)]}
= √{[2 x 30 x 18(2 x 30 x 18 – 30 – 18)] / [(30+18)²(30+18 – 1)]}
Level of significance = 5%
Degree of freedom = infinity
Table value of Z at 5% level of significance and infinity d f = 1.96
Since calculated value is less than table value, null hypothesis is accepted. So we may
conclude that the arrangement is made at random.
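The runs test for the seating arrangement can be checked with the following sketch. The function name `runs_test` is illustrative; the sequence is copied from the question, and a run is counted each time the symbol changes.

```python
import math

def runs_test(sequence):
    """One sample runs test for a sequence made of exactly two kinds of symbol."""
    first, second = sorted(set(sequence))
    n1, n2 = sequence.count(first), sequence.count(second)
    runs = 1 + sum(a != b for a, b in zip(sequence, sequence[1:]))
    mu = 2 * n1 * n2 / (n1 + n2) + 1
    sigma = math.sqrt(2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                      / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    return n1, n2, runs, mu, sigma, (runs - mu) / sigma

raw = ("B,G,B,G,B,B,B,G,B,G,B,B,B,G,G,B,B,B,B,G,G,B,G,B,B,B,G,B,B,B,G,G,G,B,G,B,B,B"
       ",G,B,G,B,B,B,B,G,G,B")
arrangement = raw.split(",")

n1, n2, runs, mu, sigma, z = runs_test(arrangement)
print(n1, n2, runs, mu, round(sigma, 3), round(z, 3))
print("H0 accepted" if abs(z) < 1.96 else "H0 rejected")
```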
REVIEW QUESTIONS:
10. Explain the hypothesis testing procedure under two sample sign test.
11. What do you mean by Wilcoxon matched pairs test?
12. Explain the hypothesis testing procedure of Wilcoxon matched pairs test.
13. What is meant by Wilcoxon Mann Whitney U-test?
14. Explain the hypothesis testing procedure of Wilcoxon Mann Whitney U-test.
15. What is meant by Kruskal-Wallis H-test?
16. Explain the hypothesis testing procedure of H-test.
17. What do you mean by one sample runs test?
18. Explain the hypothesis testing procedure under one sample runs test.
19. The following are the measurements of the breaking strength of a certain commodity:
173, 187,163, 172, 166, 163, 165, 160, 189, 161, 171, 158, 151, 169, 162, 163, 139,
172, 165 and 148. Use sign test to test the null hypothesis that mean breaking strength
of the commodity is 160.
20. A driver buys petrol either at station X or at station Y. The following arrangement
shows the order of the stations from which the driver bought petrol over a certain
period of time:
X, X, X, Y, X,Y, X,Y, X, Y, Y, Y, X, Y, Y, Y, X, Y, Y, X, Y, X, Y, X, Y, Y, X, Y,
Y, X, X,Y, X, Y, Y, Y, X, Y, X, X, X, Y, X, X, Y, X, X, X, X, Y.
Chapter 14
SAMPLE SIZE DETERMINATION
Determination of size of sample is very important. If the sample size is very large, it
will be very difficult to manage the data. But, if the size is too small, the sample will not
represent the population, and the conclusion drawn may not be correct. Therefore, the size of
sample must be optimum.
Following are some of the important formulae commonly used for determining
sample size:
Sol:
n = (Zσ/e)²
Population S D (σ) = 15
Allowable difference (e) = 6
Value of Z at 1% level of significance and infinity d f = 2.576
∴ n = [(2.576 x 15)/6]² = (38.64/6)² = 6.44² = 41.474
= 41
Sol:
n = [Z²Nσ²] / {[(N−1)e²] + [Z²σ²]}
Population Size (N) = 5000
Population S D (σ) = √4 = 2
Allowable difference (e) = 0.4
Value of Z at 1% level of significance and infinity d f = 2.576
∴ n = [2.576² × 5000 × 2²] / {[(5000−1) × 0.4²] + [2.576² × 2²]}
= (6.6358 × 5000 × 4) / (799.84 + 26.543) = 132716/826.383
= 160.599 ≈ 161
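As a check on this finite-population calculation, the formula can be evaluated in a few lines of Python. This is only a sketch: the function name sample_size_mean_finite is hypothetical, and the inputs are the ones used in the solution.

```python
def sample_size_mean_finite(z, sigma, e, population_size):
    """Sample size for estimating a mean from a finite population of size N:
    n = Z^2 * N * sigma^2 / ((N - 1) * e^2 + Z^2 * sigma^2)."""
    numerator = z ** 2 * population_size * sigma ** 2
    denominator = (population_size - 1) * e ** 2 + z ** 2 * sigma ** 2
    return numerator / denominator


# Values from the worked solution: Z = 2.576, sigma = 2, e = 0.4, N = 5000.
n = sample_size_mean_finite(z=2.576, sigma=2, e=0.4, population_size=5000)
print(round(n, 1))  # about 160.6, rounded up to 161 in the solution
```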
C. Sample Size Determination While Estimating Population Proportion When
Population is Infinite
Sol:
n = [Z²pq / e²]
n = 292
N = Size of population
e = allowable difference between population proportion and sample
proportion.
Qn: An optimal sample is to be drawn from a population of 5000 units to estimate the
proportion of defectives within 0.05 of its true value. A sample showed 3% defectives, and
the level of confidence desired is 95%. Determine the sample size.
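Sol:
Sample Size (n) = [Z²Npq] / {[(N−1)e²] + [Z²pq]}
N = 5000
p = 3% = 0.03
q = (1 – 0.03) = 0.97
e = 0.05
Value of Z at 95% level of confidence = 1.96
∴ n = [1.96² × 5000 × 0.03 × 0.97] / {[(5000−1) × 0.05²] + [1.96² × 0.03 × 0.97]}
= 558.953 / (12.4975 + 0.1118) = 558.953/12.6093
= 44.33 ≈ 44.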
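The result of 44.33 can be reproduced with the sketch below. It assumes Z = 1.96 for the 95% level of confidence (the standard normal table value). Both function names are hypothetical helpers; the infinite-population version is included only to show how little the finite-population correction changes the answer here.

```python
def sample_size_proportion_infinite(z, p, e):
    """n = Z^2 * p * q / e^2, with q = 1 - p (infinite population)."""
    q = 1 - p
    return z ** 2 * p * q / e ** 2


def sample_size_proportion_finite(z, p, e, population_size):
    """n = Z^2 * N * p * q / ((N - 1) * e^2 + Z^2 * p * q), with q = 1 - p."""
    q = 1 - p
    numerator = z ** 2 * population_size * p * q
    denominator = (population_size - 1) * e ** 2 + z ** 2 * p * q
    return numerator / denominator


# Values from the question: N = 5000, p = 0.03, e = 0.05; Z = 1.96 is assumed for 95%.
n_finite = sample_size_proportion_finite(z=1.96, p=0.03, e=0.05, population_size=5000)
n_infinite = sample_size_proportion_infinite(z=1.96, p=0.03, e=0.05)
print(round(n_finite, 2))    # 44.33
print(round(n_infinite, 2))  # 44.72
```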
REVIEW QUESTIONS:
2. What are the important formulae used for determining sample size while estimating
population mean?
3. What are the important formulae used for determining sample size while estimating
population proportion?
Chapter 15
STATISTICAL ESTIMATION
Statistical estimation is one of the important branches of statistical inference. It is
concerned with estimating population parameters with the help of samples drawn from that
population. The exact value of a population parameter can be computed only by an
exhaustive study of the population. But it is usually infeasible to collect data from each and
every element of the population. Therefore, we estimate the population parameters from a
sample. This is the process of statistical estimation.
Two types of estimates are generally used for estimating population parameter. They
are (a) Point Estimate and (b) Interval Estimate.
Point Estimation
If a single statistic is used as an estimate of an unknown parameter, it is called a point
estimate of that parameter. For example, when the sample mean is used to estimate the
population mean, the sample mean is called the “estimator” and its particular value in a given
sample is called the “estimate”.
Interval Estimation
An estimate which gives the lowest and highest values within which the population parameter
is expected to lie is called an interval estimate. Here, the two limits (lower and upper)
give an interval.
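The difference between the two types of estimate can be illustrated with a minimal Python sketch. The figures used (sample mean 50, standard deviation 10, sample size 100) are hypothetical and do not come from the text, and the function name mean_interval is likewise an illustrative assumption. The sketch uses the normal approximation, so it suits large samples.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval(sample_mean, sd, n, confidence=0.95):
    """Return (lower, upper) limits for the population mean, large-sample case."""
    # Two-tailed critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    standard_error = sd / sqrt(n)
    return sample_mean - z * standard_error, sample_mean + z * standard_error


# Hypothetical sample: mean 50, standard deviation 10, size 100.
point_estimate = 50  # the sample mean itself is the point estimate
lower, upper = mean_interval(sample_mean=50, sd=10, n=100)
print(point_estimate)                    # 50
print(round(lower, 2), round(upper, 2))  # 48.04 51.96
```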
Qn: From the following data, find the limits within which population mean may lie:
Sol:
10. In a sample of 500 units of a commodity from a large consignment, 40 units were
considered defective. Estimate the percentage of defective in the whole consignment
and limits within which the percentage will probably lie.
.*********.
Chapter 16
With so many specialist software packages available, why use Excel for statistical
analysis? Convenience and cost are two important reasons: many of us have access to Excel
on our own computers and do not need to source and invest in other software. Another
benefit, particularly for those new to data analysis, is to remove the need to learn a software
program as well as getting to grips with the analysis techniques. Excel also integrates easily
into other Microsoft Office software products which can be helpful when preparing reports or
presentations.
Limitations of Excel
Even though it has wide applications and usage in data analysis, Excel is not free from
limitations. It remains first and foremost a spreadsheet package. Inevitably it does not cover
many of the more advanced statistical techniques that are used in research. More surprisingly,
it lacks some common tools (such as box plots) that are widely taught in basic statistics.
There is also concern amongst some statisticians over the format of specific output in some
functions. The extensive range of graph (chart) templates is also criticised for encouraging
bad practice in data presentation through inappropriate use of colour, 3-D display, etc.
Despite these limitations Excel remains a very valuable tool for quantitative data analysis as
you will see.
Excel offers a broad range of built-in statistical functions. These are used to carry out
specific data manipulation tasks, including statistical tests. An example is the AVERAGE
function, which calculates the arithmetic mean of the cells in a specified range. A list of Excel
functions referred to in this and other guides is included in Appendix A along with
instructions on how to access them.
The Data Analysis ToolPak is an Excel add-in. It contains more extensive functions,
including some useful inferential statistical tests. An example is the Descriptive Statistics
routine that will generate a whole range of useful statistics in one go. An introduction to
loading and using the ToolPak add-in is included at Appendix B. The ToolPak is not
available in some older versions of Excel for Mac. See Appendix B for an alternative.
(3) Charts:
Excel’s in-built charts (graphs) cover most of the chart types introduced in Chapter 13
and are invaluable in data exploration and presentation. We illustrate their use in Chapter 13
and also in the other guides.
Pivot tables provide a way of generating summaries of your data and organising data
in ways that are more useful for particular tasks. They are extremely useful for creating
contingency tables, cross-tabulations and tables of means or other summary statistics. A brief
introduction to creating pivot tables is given in the guide Data exploration in Excel:
univariate analysis.
1. Import the data in a suitable format from, for example, an online survey tool.
2. Enter the data manually.
If you are going to enter your data manually use a single worksheet to hold all the
data in your dataset and set up the worksheet with variables (questions) as the columns and
the cases (e.g. respondents) as the rows. An individual cell, therefore, contains a respondent’s
answer to a specific question.
If they do not have one already, allocate each case in the dataset a unique numerical
identifier (ID). The easiest way to do this is simply to number them consecutively from 1
through to n (where n is the number of cases). For clarity, it is best to put the ID as the first
column in the worksheet. Giving each respondent a unique ID aids in sorting and tracking
individual responses when (for example) cleaning the data or checking outliers. A simple,
consecutive number ID system also makes it easy to reorder the data if needed. If you are
transferring data from paper copies of a questionnaire, it is useful to write the ID number
onto the paper copy to make it easier to check any errors.
Enter the re-coded numerical values (e.g. 0/1 for male/female), ensuring you keep a
record in a code book (Chapter 13). A worksheet in the workbook is a useful place to
record details of your variables and to store your code book as shown in Figure 3.
Which to do depends on your analysis needs. Some tools in Excel (e.g. pivot tables) work
well with text and generate meaningful output but some analysis tasks may require
numerically coded data. If you are exporting your data to another software package, check
the format required by that package. In some cases, it may be helpful to have both formats.
You can do this by creating a copy of the column containing the original data, then selecting
the new column and using Home > Find & Select > Replace to replace the original values
with the new ones. Ensure you give the new column a unique header.
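The same idea of keeping the original text column and adding a numerically coded copy can be sketched outside Excel. The Python example below is illustrative only: the rows, the column headers and the name code_book are hypothetical, and the 0/1 coding follows the male/female example given above.

```python
# Hypothetical survey rows: one dict per respondent, keyed by column header.
rows = [
    {"ID": 1, "Gender": "male"},
    {"ID": 2, "Gender": "female"},
    {"ID": 3, "Gender": "female"},
]

# Code book: maps each original text value to its numerical code.
code_book = {"male": 0, "female": 1}

for row in rows:
    # Keep the original column and add the coded copy under a unique header.
    row["Gender_code"] = code_book[row["Gender"]]

print([row["Gender_code"] for row in rows])  # [0, 1, 1]
```

Keeping both columns means that tools which work well with text and tasks which need numerical codes can use the same dataset.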
Importing data
If you are importing the data from another electronic file, check that the layout is suitable (i.e.
respondents as rows, variables as columns), add or modify variable names if required, add
respondent ID if needed and check that the data has imported correctly.
Give files a meaningful name. It is also helpful to date them as this makes it easier to track
back if you need to do so. Worksheet tabs can also be named to help you manage your
data.
AVERAGE to help you with creating summated scales.) If you are creating new variables
during data transformation ensure they are given unique column headers.
Using a function
We will introduce specific functions in the other guides but the following example of
applying the AVERAGE function to calculate the mean age in the sample dataset in Figure
2 illustrates their use:
Select the cell in which you wish the calculation to be placed (Hint: if you are
using the same worksheet as your dataset, avoid cells that are immediately
adjacent to your data).
Select Formulas > More Functions > Statistical > AVERAGE to open the
Function Argument dialogue box
With the help of the statistical information obtained, researchers can easily understand
the demand for a product in the market and can change their strategy accordingly. Basically,
SPSS first stores and organizes the data provided, and then processes the data set to produce
suitable output. SPSS is designed in such a way that it can handle a large variety of data
formats.
SPSS is software mainly used by researchers, which helps them process critical data in
simple steps. Working on data is a complex and time-consuming process, but this software
can handle and operate on information with the help of some techniques. These techniques
are used to analyze and transform data and to reveal characteristic patterns between different
variables. In addition, the output can be obtained in graphical form so that a user can easily
understand the result. Read below to understand the factors involved in the process of data
handling and its execution.
1. Data Transformation: This technique is used to convert the format of the data. After
changing the data type, it brings data of the same type together in one place, which makes
the data easier to manage. You can enter different kinds of data into SPSS and it will change
their structure as per the system specification and requirement. This means that even if you
change the operating system, SPSS can still work on old data.
processes, and find out the difference between them. It can help you understand which method
is more suitable for executing a task. By looking at the result, you can find the feasibility and
effectiveness of the particular method.
5. T-tests: A t-test is used to examine the difference between two samples, and researchers
apply this method to find out whether two kinds of groups differ in their interest. The test
also indicates whether the observed difference is statistically significant or could have arisen
by chance.
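The calculation behind a two-sample t-test can be sketched in Python as follows. This is not SPSS output. The two groups are hypothetical data, the function name pooled_t is an illustrative assumption, and the sketch assumes independent samples with equal population variances (the pooled-variance form of the test).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for two independent samples,
    assuming equal population variances."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    # Pooled estimate of the common variance (variance() uses n - 1).
    pooled_var = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(sample_a) - mean(sample_b)) / standard_error, df


# Hypothetical interest scores for two groups of five respondents each.
group_a = [12, 15, 14, 10, 13]
group_b = [9, 11, 10, 8, 12]
t, df = pooled_t(group_a, group_b)
print(round(t, 3), df)  # 2.514 8
```

For 8 degrees of freedom the two-tailed table value of t at the 5% level is 2.306. Since the calculated value exceeds it, the difference between these two hypothetical groups would be judged significant.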
SPSS was first released in 1968, and IBM acquired it in 2009. IBM has made some significant
changes in the programming of SPSS, and it can now perform many types of research tasks in
various fields. As a result, the use of this software has extended to many industries and
fields, such as marketing, health care, education and survey research.
Advantages of SPSS:
• Can be expanded.
Limitations of SPSS:
• Usually involves added training to completely exploit all the available features.
SPSS Statistics has three main windows and a menu bar at the top. These allow you to:
Students are advised to acquaint themselves with the application of SPSS in performing
tests of hypotheses.
REVIEW QUESTIONS:
.*********.