Statistics For Management 2 Marks


BA 4101 Statistics for Management    Mrs. R. Devi, AP/DOMS

UNIT I INTRODUCTION

Basic definitions and rules for probability, conditional probability, independence of events,
Bayes' theorem, and random variables. Probability distributions: Binomial, Poisson, Uniform
and Normal distributions.

UNIT II SAMPLING DISTRIBUTION AND ESTIMATION

Introduction to sampling distributions, sampling distribution of mean and proportion,


application of central limit theorem, sampling techniques. Estimation: Point and Interval
estimates for population parameters of large sample and small samples, determining the
sample size.

UNIT III TESTING OF HYPOTHESIS - PARAMETRIC TESTS

Hypothesis testing: one sample and two sample tests for means and proportions of large
samples (z-test), one sample and two sample tests for means of small samples (t-test), F-test for
two sample standard deviations. ANOVA one and two way

UNIT IV NON-PARAMETRIC TESTS

Chi-square test for single sample standard deviation. Chi-square tests for independence of
attributes and goodness of fit. Sign test for paired data. Rank sum test. Kolmogorov-Smirnov –
test for goodness of fit, comparing two populations. Mann – Whitney U test and Kruskal Wallis
test. One sample run test.

UNIT V CORRELATION AND REGRESSION

Correlation – Coefficient of Determination – Rank Correlation – Regression – Estimation of


Regression line – Method of Least Squares – Standard Error of estimate.

REFERENCES:
1. Richard I. Levin, David S. Rubin, Sanjay Rastogi, Masood Husain Siddiqui, Statistics for
Management, Pearson Education, 7th Edition, 2016.
2. Prem S. Mann, Introductory Statistics, 7th Edition, Wiley India, 2016.
3. Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, An Introduction to Statistical
Learning with Applications in R, Springer, 2016.

Unit I

1. Define Probability

The term probability means “Measuring the degree of uncertainty and that of certainty also as
a corollary”.

It is also defined in a simple way as "the chance of occurrence of a certain event, expressed
quantitatively".

2. What is meant by a random trial or experiment?

Any experiment whose outcome cannot be predicted or determined in advance. E.g., tossing a
coin or throwing a die.

3. What is meant by sample space?

A set of all possible outcomes of an experiment is called a sample space. E.g., when a coin is
tossed, the result is either head or tail. Let 1 denote head and 0 denote tail; the points 0 and 1
on a straight line are called sample points or event points.

4. What is meant by a discrete sample space?

A sample space whose elements are finite, or infinite but countable, is called a discrete sample
space. E.g., if we toss a coin as many times as required to turn up one head, the sample points
are S1 = (1), S2 = (0,1), S3 = (0,0,1), etc.

5. What is meant by a continuous sample space?

A sample space whose elements are infinite and uncountable, or which assumes all the values
on the real line R or on an interval of R, is called a continuous sample space. E.g., all the points
on a line.

6. Define / What is meant by an event?

A sub-collection of a number of sample points under a definite rule or law is called an event.
E.g., in throwing a die, getting an even number is an event.

7. Define / What is meant by a null event?

An event having no sample point is called a null event and is denoted by ф.

8. Define / What is meant by a simple event?

An event consisting of only one sample point of a sample space is called a simple event. E.g., let
a die be rolled once and A be the event that face number 5 turns up; then A is a simple event.

9. Define / What is meant by a compound event?

When an event is decomposable into a number of simple events, then it is called a compound
event. E.g., the sum of the two numbers shown by the upper face of the two dice is seven in the
simultaneous throw of the two unbiased dice.

10. Define / What is meant by exhaustive cases or events?

It is the total number of all the possible outcomes of an experiment. E.g., in throwing a die, any
one of the 6 faces may turn up; therefore, there are 6 possible outcomes.

11. What are mutually exclusive and independent events?

If the happening of one event excludes the happening of the other events, then the events are
mutually exclusive.

If the happening of one event is not affected or influenced by the happening of the other
events, then they are independent events.

12. Define / What is meant by equally likely events?

Events are said to be equally likely if there is no reason to expect any one of them in preference
to the others. E.g., in throwing a die, all the 6 faces (1, 2, 3, 4, 5 and 6) are equally likely to occur.

13. Define / What is meant by collectively exhaustive events?

The total number of events in a population exhausts the population. So they are known as
collectively exhaustive events.

14. Define / What is meant by equally probable events?

If in an experiment all possible outcomes have equal chances of occurrence, then such events
are said to be equally probable events. E.g., in throwing a die, all 6 faces have equal chances of
occurring.

15. Define / What is meant by an independent event?

If the happening of one event does not depend upon the happening of another event, it is called
an independent event.

16. Define / What is meant by a dependent event?

The events which are not independent are called dependent events.

17. Name a few descriptive statistics.


 Measures of central tendency.
 Arithmetic Mean
 Median
 Mode

18. Name a few discrete probability distributions.
 Binomial distribution
 Poisson distribution
 Geometric distribution
 Hypergeometric distribution

19. Given λ = 4.2 for a Poisson distribution, find P(X ≤ 2).

Solution:
P(X = x) = e^(-m) m^x / x!,  with m = 4.2

P(X ≤ 2) = e^(-4.2) [4.2^0/0! + 4.2^1/1! + 4.2^2/2!]
         = 0.0150 [1 + 4.2 + 8.82]
         = 0.0150 × 14.02

P(X ≤ 2) ≈ 0.21
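A quick numerical check of this result (a minimal sketch, assuming Python with SciPy is available; the mean 4.2 and the cut-off 2 are taken from the question above):

```python
from scipy.stats import poisson

m = 4.2  # mean of the Poisson distribution from the question
# P(X <= 2) summed term by term, as in the worked solution
p_manual = sum(poisson.pmf(x, m) for x in range(3))
# Same probability via the cumulative distribution function
p_cdf = poisson.cdf(2, m)
print(p_manual, p_cdf)  # both approximately 0.21
```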

20. What are the different types of variables used in statistics?

i) Qualitative variable
ii) Quantitative variable, which may be further classified as:
    a) Continuous variable
    b) Discontinuous / discrete variable

21. Write the usefulness of Poisson distribution?

The Poisson distribution can be considered a good approximation of the binomial
distribution when the number of trials (n) is large and the probability of success (p) is very small,
i.e. as n → ∞ and p → 0.

It is given by the function

P(X = x) = e^(-m) m^x / x!,  x = 0, 1, 2, …

22. What is probability distribution?


Let X be a discrete random variable which takes the values x1, x2, …, xn with probabilities
P[X = x1] = p1, P[X = x2] = p2, …, P[X = xn] = pn.
The function p(x) is called the probability mass function (p.m.f.).
It satisfies the following conditions:
1. P(xi) ≥ 0 for every i
2. Σ P(xi) = 1
23. Define conditional probability.
Let A be any event in the sample space S with P(A) > 0. The probability that an event B occurs
given that A has already occurred is called the conditional probability of B given A. It is denoted
by P(B/A).

24. What is random variable?


A random variable is a single real-valued function defined on the sample space.
E.g., in tossing a coin, the outcome head may be assigned the value '1' and the outcome tail
may be assigned the value '0'.

25. Two cards are drawn from a deck of 52 cards. Calculate the probability that the draw
includes an ace and a ten.

Total no. of possible selections = 52C2 = 1326
No. of ways of selecting an ace = 4C1 = 4
No. of ways of selecting a ten = 4C1 = 4
Probability = (4C1 × 4C1) / 52C2
            = (4 × 4) / 1326
            = 16 / 1326
Probability ≈ 0.0121

26. The average number of traffic accidents on a certain section of highway is two per week.
Assume that the number of accidents follows a Poisson distribution. Find the probability of at
most three accidents on this section of highway during a 2-week period.

For a 2-week period the mean is m = 2 × 2 = 4.

P(X = x) = e^(-m) m^x / x!

At most three:
P(X ≤ 3) = P(X=0) + P(X=1) + P(X=2) + P(X=3)
         = e^(-4) [4^0/0! + 4^1/1! + 4^2/2! + 4^3/3!]
         = 0.0183 [1 + 4 + 8 + 10.67]
         = 0.0183 × 23.67
P(X ≤ 3) ≈ 0.4335
27. State Bayes' theorem on the rule of inverse probability.

P(Ei / A) = P(Ei) P(A / Ei) / Σ(i=1 to n) P(Ei) P(A / Ei)

where E1, E2, …, En are a mutually exclusive and exhaustive set of events.
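A small numerical illustration of the formula (a sketch in plain Python; the three events and all the prior and likelihood values below are hypothetical):

```python
# Hypothetical priors P(Ei) and likelihoods P(A|Ei) for three exclusive, exhaustive events
prior = [0.5, 0.3, 0.2]          # P(E1), P(E2), P(E3) -- must sum to 1
likelihood = [0.02, 0.05, 0.10]  # P(A|E1), P(A|E2), P(A|E3)

# Denominator: total probability of A
p_a = sum(p * l for p, l in zip(prior, likelihood))

# Posterior P(Ei|A) by Bayes' theorem
posterior = [p * l / p_a for p, l in zip(prior, likelihood)]
print(posterior)  # e.g. P(E1|A) = 0.5*0.02/0.045, roughly 0.222
```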

28. Find the probability of getting a total of 5 at least once in three tosses of a pair of fair
dice.

p = 4/36 = 1/9,  q = 8/9
P(no total of 5 in 3 tosses) = 3C0 (1/9)^0 (8/9)^3 = 0.7023
P(a total of 5 at least once) = 1 - 0.7023 = 0.2977

29. Probability (Classical Definition):


If an experiment has n mutually exclusive, equally likely and exhaustive cases, out of which m
are favorable to the happening of the event A, then the probability of the happening of A is
denoted by P(A) and is defined as:

P(A) = Number of cases favorable to A (m) / Total (exhaustive) number of cases (n)

The probability of an event which is certain to occur is 1.
The probability of an impossible event is 0.
The probability of occurrence of any event lies between 0 and 1, both inclusive.

Probability of an event = Total number of favorable cases / Total number of equally likely cases
30. Define Bayes' theorem.
Let A1, A2, …, An be a collection of events such that P(Ai) > 0 for all i, ∪(i=1 to n) Ai = S, and
Ai ∩ Aj = ф for i ≠ j. Also let B be an event such that P(B) > 0. Then

P(Ai / B) = P(Ai) P(B / Ai) / Σ(i=1 to n) P(Ai) P(B / Ai)
31. Define / What is meant by the binomial distribution?
It is a discrete probability distribution which is obtained when the probability p of the
happening of an event is the same in all the trials and there are only two possible outcomes in
each trial.
E.g., the probability of getting a head, when a coin is tossed a number of times, must remain
the same in each toss, i.e. p = 1/2.
The probability of "r" successes in "n" trials is given by
P(r) = nCr p^r q^(n-r)
Mean = np = Σfx / Σf  (where Σf = N)
Variance = npq
p + q = 1,  q = 1 - p
Expected frequency = N × nCr p^r q^(n-r)
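The coin-tossing case mentioned above can be checked numerically (a minimal sketch, assuming SciPy; the choice of n = 10 tosses and r = 4 heads is illustrative, not part of the original text):

```python
from scipy.stats import binom

n, p = 10, 0.5                       # 10 tosses of a fair coin (illustrative)
print(binom.pmf(4, n, p))            # P(r = 4 heads) = 10C4 * p^4 * q^6, roughly 0.2051
print(binom.mean(n, p), binom.var(n, p))  # mean = np = 5, variance = npq = 2.5
```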
32. Define / What is meant by the Poisson distribution? Write the probability mass function of
the Poisson distribution.
A random variable X is said to have a Poisson distribution if it takes only non-negative values
and its distribution is given by

P(X = x) = e^(-m) m^x / x!,  x = 0, 1, 2, …
Mean = m,  Variance = m,  S.D. = √m
Expected frequency = N e^(-m) m^x / x!  (or)  N e^(-λ) λ^x / x!,  x = 0, 1, 2, …
33. Explain Normal distribution.
Normal distribution is an approximation to binomial distribution, whether or not “p = q”. The
binomial distribution tends to the form of the continuous curve and when “n” becomes large,
the normal distribution may be expressed through the following formula:
P(x) = (1 / (σ√(2π))) e^(-(x - μ)² / (2σ²))

Z = (x - µ) / σ

where x denotes the value of the continuous random variable, σ denotes the standard
deviation, μ denotes the mean of the random variable, e denotes the mathematical constant
approximately equal to 2.7183, and π denotes the mathematical constant approximately equal
to 3.1416.
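The standardisation Z = (x - μ)/σ can be illustrated as follows (a minimal sketch, assuming SciPy; the mean 50, standard deviation 10 and the cut-off 65 are hypothetical values):

```python
from scipy.stats import norm

mu, sigma = 50, 10             # hypothetical mean and standard deviation
x = 65
z = (x - mu) / sigma           # standard normal variate, here z = 1.5
print(z)
print(norm.cdf(z))             # P(X <= 65) = P(Z <= 1.5), roughly 0.9332
print(norm.cdf(x, mu, sigma))  # same probability without standardising
```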

34. What’s the definition of Statistics?


Statistics are usually defined as
1. A collection of numerical data that measure something.
2. The science of recording, organizing, analyzing and reporting quantitative information.

35. What is a random experiment?


An experiment is said to be a random experiment, if it’s out-come can’t be predicted with
certainty.

36. Give two examples of categorical variables.
1. Type of climate - hot or cold.
2. Favorite ice cream flavor - vanilla, strawberry, etc.

37. Uniform distribution:-


f(x) = 1/(b - a)  for a < x < b

Mean = (a + b)/2,  Variance = (b - a)²/12

38. Characteristics of Binomial Distribution:


1. It is a discrete distribution which gives the theoretical probabilities.
2. It depends on the parameters p or q (the probability of success or failure) and n (the
number of trials). The parameter n is always a positive integer.
3. The distribution is symmetrical if p = q; it is skewed if p ≠ q, although it becomes
approximately symmetrical as n becomes large.
4. The statistics of the binomial distribution are mean = np, variance = npq and
standard deviation = √(npq).

39. Characteristics of Poisson Distribution:


1. Poisson distribution is a discrete distribution. It gives theoretical probabilities and
theoretical frequencies of a discrete variable.
2. It depends mainly on the value of the mean m.
3. This distribution is positively skewed. With an increase in the value of the
mean m, the distribution shifts to the right and the skewness diminishes.
4. Its arithmetic mean in relative distribution is P and in absolute distribution is np.
40. Properties of a Normal Curve:
The normal probability curve with mean µ and standard deviation σ has the
following properties:
1. The curve is bell-shaped. The top of the bell is directly above the mean µ.
2. The curve is symmetrical about the line x = µ, and x ranges over -∞ < x < ∞.
3. Mean, mode and median coincide at x = µ, as the distribution is symmetrical.
4. It can be shown that it has arithmetic mean = µ and variance = σ².
5. The X-axis is an asymptote to the curve.
6. The points of inflexion of the curve are at x = µ + σ and x = µ - σ, where the curve changes
from concave to convex.
Unit-2
1. Sampling techniques:-
I) Probability sampling:-
Every item of the universe has an equal chance of inclusion in the sample
a) Simple probability sampling: (equal chance)
Eg:- 1) lottery method
2) Random method

b) Stratified probability sampling:
(Random selection is made not from the heterogeneous universe as a whole, but from the
homogeneous strata into which it is divided.)
c) Systematic sampling:-
One unit is selected at random from the universe and the other units are taken at a
specified interval from the selected unit.
d) Cluster sampling:-
Universe is divided into some recognizable sub-groups which are called cluster.
e) Multi-stage sampling:-
Sample units are selected in 2 or 3 or 4 stages.
f) Area sampling:-
Lists or registers are used as the sampling frame. The plus and minus points of cluster
sampling are also applicable to area sampling.
II) Non-probability sampling:-
The organisers of the inquiry purposively choose particular units of the universe to
constitute a sample, on the belief that the small mass they select out of a huge one will
be typical or representative of the whole.
a) Convenience sampling:-
The researcher chooses the sampling units on the basis of convenience or accessibility.
b) Judgement sampling:-
The universe items are selected by means of the expert judgement of specialists in the subject.
c) Quota sampling:-
It uses the principle of stratification; the bases for stratification in consumer surveys are
commonly demographic.
d) Panel sampling:-
The initial samples are drawn on a random basis and information from them is collected
on a regular basis. It makes it possible to select and quickly contact well-balanced samples
and to obtain a relatively high response rate, even by mail.
e) Snowball sampling:-
It relies on referrals from initial subjects to generate additional subjects.

2.Sampling errors:-
Sampling errors or variations among sample statistics are due to difference between each
sample and the population and among several samples.
o Biased sampling error,
o Unbiased sampling error.

3.Non sampling error:-


It occurs at the time of observation, approximation and processing of
data. This error is common to both sample and census surveys. It is due to a faulty sampling
plan, lack of trained and qualified investigators, or inaccuracy in compilation and publication.

4. Sampling distribution:-
It is the probability distribution, under repeated sampling of the population of a given statistic.

5. Sampling distribution of the mean (x̄):-
The probability distribution of all possible values of the sample mean x̄ is called the sampling
distribution of x̄.

6. Sampling distribution of the sample proportion (p̄):-

p̄ = (number of elements of the sample having the characteristic (x)) / (sample size (n))
7. Standard error of the mean (x̄):-
The standard deviation of a sampling distribution of a statistic is often called its standard
error.
σx̄ = σ/√n  (σ known)
σx̄ = s/√(n - 1)  (σ unknown)
For a finite population,
σx̄ = (σ/√n) √((N - n)/(N - 1))
Confidence interval for the population mean for a large sample: x̄ ± Z(α/2) σ/√n
Confidence interval for the population mean for a small sample: x̄ ± t(α/2) s/√(n - 1)
8. Sampling error of proportions:-
σp = √(PQ/n)

Confidence interval for the population proportion for a large sample: p ± Z(α/2) √(pq/n)
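A worked sketch of the two large-sample confidence intervals above (assuming SciPy; the sample figures x̄ = 52, σ = 8, n = 100 and p = 0.4, n = 200 are hypothetical):

```python
from math import sqrt
from scipy.stats import norm

alpha = 0.05
z = norm.ppf(1 - alpha / 2)        # Z(alpha/2), roughly 1.96 for a 95% interval

# 95% confidence interval for a population mean (large sample, sigma known)
xbar, sigma, n = 52.0, 8.0, 100    # hypothetical sample results
half = z * sigma / sqrt(n)
print(xbar - half, xbar + half)    # roughly (50.43, 53.57)

# 95% confidence interval for a population proportion (large sample)
p, n2 = 0.4, 200                   # hypothetical sample proportion and size
half_p = z * sqrt(p * (1 - p) / n2)
print(p - half_p, p + half_p)      # roughly (0.332, 0.468)
```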
9. Central limit theorem:-
When sampling is done from a population with mean µ and finite standard deviation σ, the
sampling distribution of the sample mean x̄ will tend to a normal distribution with mean µ and
standard deviation σ/√n as the sample size n becomes large.

x̄ ~ N(µ, σ²/n),  i.e.  Z = (x̄ - µ)/(σ/√n)
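The theorem can be illustrated by simulation (a minimal sketch, assuming NumPy; the exponential population with mean 2 and the sample size 40 are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 40, 5000
# Draw many samples from a clearly non-normal population (exponential, mean 2)
means = rng.exponential(scale=2.0, size=(reps, n)).mean(axis=1)

# By the CLT the sample means should be close to N(mu, sigma^2/n)
print(means.mean())        # close to 2 (the population mean)
print(means.std(ddof=1))   # close to 2/sqrt(40), roughly 0.316
```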
10. Estimation:-
It is the techniques and methods by which population parameters are estimated
from sample studies.

11. Types of estimation:-


a) Point estimation:-
When a single value is used as the estimate, it is called a point estimate of
the population parameter.
b) Interval estimation:-
An estimate of a population parameter given by two numbers, between which the
parameter may be considered to lie, is called an interval estimate. (Or) When a range of
values is used as the estimate, it is called an interval estimate.

12. Characteristics of a good estimator:-

a) Unbiasedness
b) Efficiency,
c) Consistency,
d) Sufficiency.

13. Determining the sample size:-

n = (Zσ/E)²   (or)   n = Z²pq/E²
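For instance, the proportion form of the formula can be evaluated directly (a sketch in plain Python; the 95% level, p = 0.5 and E = 0.05 are hypothetical inputs, with p = 0.5 giving the most conservative size):

```python
from math import ceil

z = 1.96        # Z value for 95% confidence
p = 0.5         # assumed proportion
q = 1 - p
e = 0.05        # desired margin of error

n = (z ** 2) * p * q / (e ** 2)   # n = Z^2 p q / E^2
print(ceil(n))                    # 385 respondents (rounded up)
```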

Unit-3

1. Hypothesis:-
A hypothesis is a statement about the population parameter.

2. Tests of hypothesis:-
It is a procedure that helps us to ascertain the likelihood of hypothesized population parameter
being correct by making use of the sample statistic.

3. Null hypothesis(𝑯𝟎 ):-


The statistical hypothesis that is set up for testing a hypothesis is known as Null hypothesis.
𝐻0 : 𝜇1 = 𝜇2

4. Alternative hypothesis(𝑯𝟏 ):-


The negation of null hypothesis is called the alternative hypothesis.
𝐻1 : 𝜇1 ≠ 𝜇2 , 𝜇1 > 𝜇2 , 𝜇1 < 𝜇2

5. Type 1 error:-
It is the error of rejecting the null hypothesis H0 when it is true.

6. Type 2 error:-
It is the error of accepting the null hypothesis 𝐻0 when it is false.

7. Level of significance:-
The level of significance is the maximum probability of making a type 1 error and it is denoted
by 𝛼.

8. Critical region / Rejection region:-

The set of values of the test statistic which leads to the rejection of the null hypothesis H0 is
known as the rejection region or critical region.

9. Critical value (boundary of the acceptance and rejection regions)

The value of the test statistic which separates the acceptance region from the rejection region
of the null hypothesis H0 is known as the critical value.

10. One tailed test:-

A test of statistical hypothesis where the alternative hypothesis is one sided is called as
one tailed test.

11. Two tailed test:-


When the test of a hypothesis is made on the basis of a rejection region represented on both
sides of the standard normal curve, it is called a two-sided test (or) two-tailed test.

12. Large samples (n ≥ 30) (Z-test) (use a one-tailed or two-tailed test for the table value)

a) Test of significance of a single mean:-
Z = (x̄ - µ)/S.E.(x̄),  S.E.(x̄) = σ/√n  (σ known)  or  s/√n  (σ unknown)

b) Single proportion:-
Z = (p - P)/S.E.(p),  S.E.(p) = √(PQ/n),  Q = 1 - P

c) Difference between two proportions:-
Z = (p1 - p2)/S.E.(p1 - p2),  S.E.(p1 - p2) = √(P1Q1/n1 + P2Q2/n2)

d) Difference between two means:-
Z = (x̄1 - x̄2)/S.E.(x̄1 - x̄2)
S.E.(x̄1 - x̄2) = √(σ1²/n1 + σ2²/n2)  (σ known)
S.E.(x̄1 - x̄2) = √(s1²/n1 + s2²/n2)  (σ unknown)

e) Difference of two standard deviations:-
Z = (S1 - S2)/S.E.(S1 - S2)
S.E.(S1 - S2) = √(σ1²/2n1 + σ2²/2n2)  (σ known)
S.E.(S1 - S2) = √(S1²/2n1 + S2²/2n2)  (σ unknown)
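A sketch of case (a), the one-sample Z-test for a mean (assuming SciPy for the p-value; the hypothesised mean 50 and the sample figures x̄ = 52.4, σ = 8, n = 64 are hypothetical):

```python
from math import sqrt
from scipy.stats import norm

mu0 = 50.0                        # hypothesised population mean
xbar, sigma, n = 52.4, 8.0, 64    # hypothetical sample mean, known sigma, n >= 30

z = (xbar - mu0) / (sigma / sqrt(n))       # test statistic
p_two_tailed = 2 * (1 - norm.cdf(abs(z)))
print(z, p_two_tailed)   # z = 2.4, p roughly 0.016 -> reject H0 at the 5% level
```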

13. Small samples (n < 30) (t-test) (refer to the Student's t table)

a) Test of significance of a single mean:-
t = (x̄ - µ)/S.E.(x̄),  S.E.(x̄) = S/√(n - 1)
S² = [1/(n - 1)] Σ(x - x̄)²,  S = √[Σ(x - x̄)²/(n - 1)]
Degrees of freedom = n - 1 (for the table value)

b) Paired t-test (difference of Student's t test):-
t = D̄/S.E.(D̄),  D̄ = ΣD/n
S = √[Σ(D - D̄)²/(n - 1)],  S.E.(D̄) = S/√n

c) Difference between two means:-
t = (x̄1 - x̄2)/S.E.(x̄1 - x̄2),  S.E.(x̄1 - x̄2) = s × √(1/n1 + 1/n2)

Estimated standard deviation of the population:
s = √[(n1S1² + n2S2²)/(n1 + n2 - 2)]

If an unbiased estimate of the common population variance is used:
s = √[((n1 - 1)S1² + (n2 - 1)S2²)/(n1 + n2 - 2)]

Degrees of freedom = n1 + n2 - 2 (for the table value)
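A minimal sketch of case (c), the two-sample t-test, using SciPy's built-in routine (the two small samples below are hypothetical data):

```python
from scipy import stats

# Two small independent samples (hypothetical data, n < 30)
sample1 = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
sample2 = [11.5, 11.7, 11.6, 11.9, 11.4, 11.8]

# ttest_ind with equal_var=True pools the variances, matching the pooled-s formula above
t_stat, p_value = stats.ttest_ind(sample1, sample2, equal_var=True)
print(t_stat, p_value)   # compare p_value with the chosen level of significance
```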

14. F-test / Fisher's test:-

It is the ratio of two independent estimates of the population variance (with the larger estimate
in the numerator) and is expressed as
F = σ̂1² / σ̂2²,
where σ̂1² = n1S1²/(n1 - 1), σ̂2² = n2S2²/(n2 - 1),
S1² = Σ(x - x̄)²/n1, S2² = Σ(y - ȳ)²/n2

15. ANOVA:- (Analysis Of Variance)


It is a method of splitting the total variation of a data into constituent parts which
measure different sources of variations.

16. ANOVA Table:-


A table showing the source of variations the sum of squares, degree of freedom, mean
square and formula for F-ratio is known as ANOVA table.

17. ANOVA Uses:-


It is used to test whether the means of a number of populations are equal.

18. Classification of ANOVA:-

I) One-way classification:-
Correction factor (C.F.) = T²/(rc), where r = number of rows, c = number of columns,
and T = ΣiΣj xij
SSC = Σj (Tj²/r) - C.F.
SST = ΣiΣj xij² - C.F.
SSE = SST - SSC

II) Two-way classification:-
Correction factor (C.F.) = T²/(rc), where T = ΣiΣj xij
SSC = Σj (Tj²/r) - C.F.
SSR = Σi (Ti²/c) - C.F.
SST = ΣiΣj xij² - C.F.
SSE = SST - (SSC + SSR)

19. ANOVA Table (One-Way Classification):-

Source of variation         Sum of squares   Degrees of freedom   Mean square            F-ratio
Between samples (columns)   SSC              c - 1                MSC = SSC/(c - 1)      F = MSC/MSE
Within samples (errors)     SSE              c(r - 1)             MSE = SSE/[c(r - 1)]
Total                       SST              cr - 1

20. ANOVA Table (Two-Way Classification):-

Source of variation         Sum of squares   Degrees of freedom   Mean square                  F-ratio
Between samples (columns)   SSC              c - 1                MSC = SSC/(c - 1)            F0 = MSC/MSE
Between samples (rows)      SSR              r - 1                MSR = SSR/(r - 1)            F1 = MSR/MSE
Within samples (errors)     SSE              (c - 1)(r - 1)       MSE = SSE/[(c - 1)(r - 1)]
Total                       SST              cr - 1
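A one-way ANOVA of the kind summarised in table 19 above can be run in one call (a minimal sketch, assuming SciPy; the three treatment groups are hypothetical data):

```python
from scipy import stats

# Three hypothetical treatment groups (columns of a one-way layout)
group_a = [23, 25, 21, 24, 22]
group_b = [30, 28, 31, 27, 29]
group_c = [26, 24, 25, 27, 23]

# f_oneway computes F = MSC / MSE from the between/within decomposition above
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f_stat, p_value)   # reject H0 (equal means) if p_value < alpha
```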

21. Explain the procedure for testing the comparison of two sample proportions.

Procedure:
1. Hypotheses:
Null hypothesis: the two proportions are equal, i.e. P1 = P2.
Alternative hypothesis: the two proportions are unequal, i.e. P1 ≠ P2.

2. Test statistic:
Z = (p1 - p2)/S.E.(p1 - p2)
S.E.(p1 - p2) = √(p1q1/n1 + p2q2/n2),  where q = 1 - p

3. Fix the level of significance (commonly 5%, 1% or 2%).

4. Find the table value using a one-tailed or two-tailed test.
5. Compare the calculated value with the table value and state whether the null hypothesis
is accepted or rejected.
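A sketch of steps 2 to 5 in code (plain Python plus SciPy for the table value; the sample counts x1, n1, x2, n2 below are hypothetical):

```python
from math import sqrt
from scipy.stats import norm

# Hypothetical samples: successes x out of n in each group
x1, n1 = 45, 200
x2, n2 = 30, 180
p1, p2 = x1 / n1, x2 / n2

# Standard error as in step 2 (unpooled form given above)
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
z = (p1 - p2) / se

alpha = 0.05
z_table = norm.ppf(1 - alpha / 2)      # two-tailed critical value, roughly 1.96
print(z, z_table, abs(z) > z_table)    # reject H0 if the calculated |Z| exceeds the table value
```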

UNIT-4

Chi-square test (χ²):-

1) Chi-square test for goodness of fit:
It is a non-parametric test, as its value is not derived from the parameters of a population.
χ² = Σ(i=1 to n) (Oi - Ei)²/Ei,  with (n - 1) degrees of freedom
Oi = observed frequency, Ei = expected frequency
Note: n ≥ 30.

2) Chi-square test for independence of attributes:
χ² = Σ (Oi - Ei)²/Ei,  with (r - 1)(c - 1) degrees of freedom
It is used to test the independence of attributes.

3) Chi-square test for a single sample variance / standard deviation (homogeneity test):
χ² = (n - 1)S²/σ²,  with (n - 1) degrees of freedom
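A sketch of test (2), independence of attributes, on a small contingency table (assuming SciPy; the 2 x 2 counts are hypothetical):

```python
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 contingency table of observed frequencies
observed = [[30, 20],
            [25, 45]]

chi2, p_value, dof, expected = chi2_contingency(observed)
print(chi2, dof, p_value)   # dof = (r-1)(c-1) = 1; reject independence if p_value < alpha
print(expected)             # the expected frequencies Ei used in the formula
```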

Non-parametric methods:-

They do not require parametric assumptions, because interval data are converted to rank-
ordered data.

1. Sign test for paired data:-

It is based on the direction of a pair of observations and not on their numerical magnitude.
When the number of signs is less than 5, we use the binomial (or Poisson) distribution:
P(X = x) = nCr p^r q^(n-r),   P(X = x) = e^(-α) α^x / x!
When the number of signs is greater than or equal to 5, we use the normal distribution:
Z = (x - µ)/σ,   S.E.(p̄) = √(PQ/n)
For the table value, use a one-tailed or two-tailed test.
2. One sample sign test:-

The population sampled is continuous and symmetrical.
If n is less than 30 (n < 30), we use the binomial (or Poisson) distribution:
P(X = x) = nCr p^r q^(n-r),   P(X = x) = e^(-α) α^x / x!
If n is greater than or equal to 30 (n ≥ 30), we use the normal distribution:
Z = (x - nQ) / √(nQ(1 - Q))
For the table value, use a one-tailed or two-tailed test.

3. Rank sum tests:-

a) Mann-Whitney U test
b) Kruskal-Wallis H test

a) Mann-Whitney U test:
It is used to determine whether two independent samples have been drawn from
populations with the same distribution.
Arrange the data in ascending order and rank it.
U = n1n2 + n1(n1 + 1)/2 - R1   (or)   U = n1n2 + n2(n2 + 1)/2 - R2
µu = n1n2/2,   σu² = n1n2(n1 + n2 + 1)/12
Z = (U - µu)/σu
For the table value, use a one-tailed or two-tailed test.
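A sketch of the Mann-Whitney U test using SciPy's implementation (the two independent samples below are hypothetical):

```python
from scipy.stats import mannwhitneyu

# Two hypothetical independent samples
sample1 = [14, 18, 21, 25, 27, 30]
sample2 = [12, 15, 16, 19, 20, 22]

# Two-sided test of whether both samples come from the same distribution
u_stat, p_value = mannwhitneyu(sample1, sample2, alternative='two-sided')
print(u_stat, p_value)   # reject H0 if p_value < alpha
```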

b) Kruskal-Wallis test (or) H-test:-

It is used to compare 3 or more groups of sample data. It is also an improvement over the
sign test and Wilcoxon's signed-rank test, which ignore the actual magnitudes of the paired
observations.
(i) Arrange the data in ascending order.
(ii) Rank the data.
H = [12/(n(n + 1))] Σ(i=1 to k) Ri²/ni - 3(n + 1),  with degrees of freedom v = k - 1
For the table value, see the chi-square table.

4. One sample run test:-

A run is a succession of identical letters which is followed or preceded by different letters
or by no letters at all.
Find the number of runs (V).
Z = (V - µv)/σv
µv = 2n1n2/(n1 + n2) + 1,  where V = number of runs
σv² = 2n1n2(2n1n2 - n1 - n2) / [(n1 + n2)²(n1 + n2 - 1)]
For the table value, use a one-tailed or two-tailed test.

5. Run above and run below the median test:-
To find out whether the values falling above and below the median of the sample occur at
random.
Z = (V - µv)/σv
µv = 2n1n2/(n1 + n2) + 1,  where V = number of runs
σv² = 2n1n2(2n1n2 - n1 - n2) / [(n1 + n2)²(n1 + n2 - 1)]
For the table value, use a one-tailed or two-tailed test.

6. Kolmogorov-Smirnov test/k-S test:-


It is used to find the significance of the difference between an observed frequency distribution
and a theoretical (or expected) frequency distribution.

𝐷𝑛 = 𝑚𝑎𝑥|𝐹𝑒 − 𝐹𝑜 |

For table value see Kolmogorov-Smirnov table
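The statistic Dn can be obtained directly (a minimal sketch, assuming SciPy; testing a hypothetical sample against a normal distribution with assumed parameters 50 and 10):

```python
from scipy.stats import kstest

# Hypothetical observed data, tested against a theoretical N(50, 10) distribution
data = [43, 47, 50, 52, 55, 58, 61, 48, 49, 53]
d_stat, p_value = kstest(data, 'norm', args=(50, 10))

print(d_stat)    # Dn = max |F_expected - F_observed|
print(p_value)   # a small p_value means the observed and theoretical distributions differ
```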

7. What is a non parametric test?


When the assumption of a normally shaped population curve is not appropriate, we have
certain useful techniques that do not make restrictive assumptions about the shape of the
population distribution. These are known as distribution-free tests or non-parametric tests.

8. What are the types of variables used in the goodness of fit chi-square test?
i)observed frequency
ii)expected frequency

9. What is the primary shortcoming of non-parametric tests?

i) They ignore a certain amount of information, since the observed values are replaced by
ranks or signs.
ii) They are often not as efficient or "sharp" as parametric tests when the assumptions of the
parametric tests are satisfied.

10. What are the major advantages of non-parametric methods?


i) They are simple to understand and quicker and easier to apply when the sample sizes are
small.
ii) They do not require the assumption that a population is distributed in the shape
of a normal curve or another specified shape.
iii) They can be applied to all types of data, including qualitative data.
11. List out the Rank sum tests.
i)Mann-Whitney U test
ii) Kruskal-Wallis H test

12. Name two non parametric tests of association?

1.Rank correlation
2.Chi-square test.

13. Distinguish between the Mann-Whitney test and the Kruskal-Wallis test.
The Mann-Whitney test compares two groups for similarity, while the Kruskal-Wallis test
compares three or more groups.

14. Explain the K-W test procedure with appropriate examples.

The Kruskal-Wallis test is a non-parametric test which is used to compare three or more
groups of sample data.

Procedure:

1. Arrange the data of all samples in a single series in ascending order.

2. Assign ranks to them in ascending order; in the case of repeated values, assign ranks to
them by averaging their rank positions.
3. Separate the ranks of the different samples and sum them as R1, R2, R3, etc.
4. Calculate the K-W statistic using the formula
H = [12/(n(n + 1))] Σ(i=1 to k) Ri²/ni - 3(n + 1)

5. Fix the level of significance (e.g. 5%).

6. Find the table value from the chi-square table.
7. Compare the calculated value with the table value and state whether the null hypothesis
is accepted or rejected.
15. Explain the Mann-Whitney 'U' test procedure with appropriate examples.
The Mann-Whitney 'U' test is used to determine whether two independent samples have
been drawn from populations with the same distribution.
Procedure:

1. Arrange the data of both samples in a single series in ascending order.

2. Assign ranks to them in ascending order; in the case of repeated values, assign ranks
to them by averaging their rank positions.
3. Separate the ranks of the two samples and sum them as R1 and R2.
4. Calculate the Mann-Whitney 'U' statistic:

U = n1n2 + n1(n1 + 1)/2 - R1

µu = n1n2/2,   σu² = n1n2(n1 + n2 + 1)/12

Z = (U - µu)/σu
5. Fix the level of significance (e.g. 5%).
6. Find the table value using the normal (Z) table for a one-tailed or two-tailed test.
7. Compare the calculated value with the table value and state whether the null hypothesis
is accepted or rejected.

Unit-5
1. What is meant by correlation analysis?
Correlation analysis is a statistical technique used to describe not only the degree of
relationship between the variables, but also the direction of influences.

2. Define Correlation.
Correlation analysis attempts to determine the degree of relationship between two
variables.
According to A.M.Tuttle “Correlation is an analysis of the co-variation between two or
more variables.”

rxy = [n ΣXY - (ΣX)(ΣY)] / [√(n ΣX² - (ΣX)²) × √(n ΣY² - (ΣY)²)]

3. What are the methods to study correlation?

 Scatter diagram
 Karl Pearson's coefficient of correlation or covariance method
 Spearman’s Rank Correlation method
 Two way frequency method
 Concurrent deviation method

4. Explain Coefficient of Determination.

It is the Square of the coefficient of correlation i.e r2, where r is the coefficient of
correlation.

5. Explain Types of Correlation:


Correlation can be classified into different ways. The 3 of the most important ways of
classifying correlations are;

Positive or negative correlation:


 If the increase in one variable causes the proportionate increase in the other variable,
then the variable is said to be positively correlated.
 If the increase in one variable causes the proportionate decrease in the other
variable, then the variable is said to be negatively correlated.

Simple, partial & multiple correlations
 When only two variables are studied, it is a problem of simple correlation.
 When three or more variable are studied it is a problem of either multiple or partial
correlation.

Linear and non linear correlation


 If the amount of change in one variable tend to bear constant ratio to the amount of
change in the other variable then the correlation is said to be Linear.
 If the amount of change in one variable does not bear constant ratio to the amount
of change in the other variable then the correlation is said to be non Linear.

6. Regression analysis:
Regression is the measure of the average relationship between 2 or more
variables in terms of the original units of the data.
E.g., if we know that advertising and sales are correlated, we can find out the expected
amount of sales for a given advertising expenditure, or the required amount of expenditure
for attaining a given amount of sales.

7. Difference between Correlation and Regression

1. Meaning: Correlation is a statistical measure which determines the co-relationship or
association of two variables, whereas regression describes how an independent variable is
numerically related to the dependent variable.
2. Usage: Correlation is used to represent a linear relationship between two variables, whereas
regression is used to fit a best line and estimate one variable on the basis of another variable.
3. Objective: Correlation finds a numerical value expressing the relationship between the
variables, whereas regression estimates values of a random variable on the basis of the values
of a fixed variable.
4. Dependent and independent variables: In correlation there is no such distinction, whereas in
regression the two variables are different (dependent and independent).

8. How should one forecast by linear regression?


Regression is the study of relationships among variables, a principal purpose of which is to
predict, or estimate the value of one variable from known or assumed values of other
variables related to it.

9. Types of Regression Analysis

Simple Linear Regression: A regression using only one predictor is called a simple
regression.

Multiple Regression: Where there are two or more predictors, multiple regression
analysis is employed.

10. What is regression analysis used for?

Regression analysis shows us how to determine both the nature and the strength of the
relationship between two variables.

11. What do you interpret if the r=0 and r=-1?


If r = 0, then the variables are uncorrelated.
If r = -1, then it is a perfect negative correlation.

12. What do you mean by error variation?


Sampling error or estimation error is the amount of inaccuracy in estimating some value that
is caused by using only a portion of a population rather than the whole population.

13. What is the purpose of correlation analysis?


Correlation analysis shows the extent to which two quantitative variables vary together,
including the strength and direction of the relationship. The strength of the relationship refers
to the extent to which one variable predicts the other.

14. Correlation analysis:-(-1<=r<=+1)


It is used to find out whether a statistical relationship exists between two variables or not.
Covariance:-
cov(x, y) = (1/n) [Σ xiyi - (1/n)(Σ xi)(Σ yi)]
15. Types of correlation:-
a) Positive correlation,
b) Negative correlation,
c) Linear correlation,
d) Perfectly linear correlation,
e) Perfect correlation,
f) Direct (or) perfect positive correlation,
g) Inverse (or) perfect negative correlation.

16. Methods of correlation:-

a) Karl Pearson's coefficient of correlation:-
r = Σ dxdy / √(Σ dx² × Σ dy²)

Short-cut method:-
r = [Σ dxdy - (Σ dx)(Σ dy)/n] / [√(Σ dx² - (Σ dx)²/n) × √(Σ dy² - (Σ dy)²/n)]

b) Spearman's rank correlation:-
r = 1 - 6 Σ D² / (n(n² - 1))
When ranks are repeated,
r = 1 - 6[Σ D² + (1/12)(m1³ - m1) + (1/12)(m2³ - m2) + …] / (n(n² - 1))

c) Coefficient of concurrent deviations:-
rc = ±√(±(2c - n)/n)   (short-term oscillation)

Probable error of the coefficient of correlation:-
P.E. = 0.6745 (1 - r²)/√n ≈ 2(1 - r²)/(3√n),  n = number of pairs of observations

d) Coefficient of determination:-
Coefficient of determination = r²
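Both coefficients can be computed with SciPy for a quick cross-check (a minimal sketch; the paired x, y data below are hypothetical):

```python
from scipy.stats import pearsonr, spearmanr

# Hypothetical paired observations
x = [10, 12, 15, 18, 20, 24]
y = [30, 34, 41, 45, 52, 60]

r, _ = pearsonr(x, y)       # Karl Pearson's coefficient of correlation
rho, _ = spearmanr(x, y)    # Spearman's rank correlation
print(r, r ** 2, rho)       # r, coefficient of determination r^2, rank correlation
```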

17. Regression analysis:-


Regression means to return or to go back. Regression analysis is a mathematical measure of
the average relationship between two or more variables in terms of the original units of the
data.

18. Types of Regression:-


a) Simple regression,
b) Multiple regression,
c) Linear regression,
d) Non-linear regression.
Lines of regression:-

Regression equation of x on y:
(x - x̄) = bxy (y - ȳ)
bxy = [Σxy - (Σx)(Σy)/n] / [Σy² - (Σy)²/n]
x̄ = Σx/n,  ȳ = Σy/n

Regression equation of y on x:
(y - ȳ) = byx (x - x̄)
byx = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]

Coefficient of correlation from the regression coefficients:-
r = √(byx × bxy)
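The regression line of y on x can be estimated in one call (a minimal sketch, assuming SciPy; the advertising and sales figures are hypothetical):

```python
from scipy.stats import linregress

# Hypothetical advertising expenditure (x) and sales (y)
x = [1, 2, 3, 4, 5, 6]
y = [4.1, 6.0, 7.9, 10.2, 11.8, 14.1]

res = linregress(x, y)
print(res.slope, res.intercept)   # b_yx and a in y = a + b_yx * x
print(res.rvalue ** 2)            # coefficient of determination r^2
y_hat = res.intercept + res.slope * 7   # estimated y for x = 7
print(y_hat)
```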

19. Method of least squares:-

The line of best fit, y = a + bx, is the line for which the sum of the squares of the deviations of
the observed points from the line is least.
The two normal equations are:
Σy = Na + b Σx
Σxy = a Σx + b Σx²
When the values of x are taken so that Σx = 0, these reduce to
a = Σy/N,  b = Σxy/Σx²
20. Second degree parabolic trend:-
y = a + bx + cx²
The three normal equations are:
Σy = Na + b Σx + c Σx²
Σxy = a Σx + b Σx² + c Σx³
Σx²y = a Σx² + b Σx³ + c Σx⁴

When the values of x are taken so that Σx = 0 and Σx³ = 0, these give:
a = [Σy - c Σx²]/N,
b = Σxy/Σx²,
c = [N Σx²y - (Σx²)(Σy)] / [N Σx⁴ - (Σx²)²]

21. Difference between correlation and regression:

1. Correlation measures the degree of relationship between two variables, whereas regression
is a mathematical measure of the average relationship between two or more variables.
2. Correlation cannot be used for grouped frequency distributions, whereas regression can be
used to estimate values in the original units of the data.