
Statistics Handout

1. The document discusses basic probability concepts like binomial distribution, discrete and continuous probability distributions, and the normal distribution. 2. It also covers Bayes' rule of probability and provides examples of how it can be used in decision making and game theory. 3. The key aspects of normal distribution are defined, including its probability density function and how it arises from the limiting case of the binomial distribution. Characteristics like the mean, variance, and normal curve shape are also outlined.

Uploaded by

Anish Garg

Statistics

Basic Probability

1. A class contains 8 boys and 7 girls. The teacher selects 3 of the children at random and without
replacement. Calculate the probability that the number of boys selected exceeds the number
of girls selected.
2. A and B throw a die in turn. The player who first throws a ‘two’ wins the game. If A starts
the game, find the probability that B wins.
3. What is the probability of having 53 Sundays in a leap year?
4. A problem of mathematics is given to the three students A, B and C, whose chances of solving
are 1/2, 1/3 and 1/4 respectively. What is the probability that the problem will be solved?

Bayes’ Rule of Probability

Bayes’ theorem establishes the relationship between two conditional probabilities.

If A1, A2, ..., An are mutually exclusive and exhaustive events of a sample space S, and B is any
arbitrary event of S with P(B) > 0, then

$$P(A_i \mid B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B \mid A_j)}$$

Solve the following question


1. There are three bags: first containing 1 white, 2 red, 3 green balls; second 2 white, 3
red, 1 green balls and third 3 white, 1 red, 2 green balls. Two balls are drawn from a
bag chosen at random. These are found to be one white and one red. Find the
probability that the balls so drawn came from the second bag.
2. Three machines A, B and C produce respectively 60%, 30% and 10% of the total
number of items of a factory. The percentages of defective output of these machines
are respectively 2%, 3% and 4%. An item is selected at random and is found to be
defective. Find the probability that the item was produced by machine C.
3. The chances that a politician, a businessman, or an academician will be appointed
as vice-chancellor (VC) of a public university are 0.5, 0.3, and 0.2, respectively,
and the probabilities that each will promote research, if appointed VC, are 0.3, 0.7
and 0.8, respectively. If research is promoted, what is the probability that the
VC is an academician?
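
The rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `bayes_posterior` is hypothetical, and the numbers are taken from question 2 under the assumption that 2%, 3% and 4% are the defect rates of machines A, B and C.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(A_i | B) for every i, given P(A_i) and P(B | A_i)."""
    # Numerators of Bayes' rule: P(A_i) * P(B | A_i)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # Denominator: P(B), by the law of total probability
    total = sum(joint)
    return [j / total for j in joint]

# Question 2 (assumed reading): shares of production and defect rates.
priors = [0.60, 0.30, 0.10]        # P(A), P(B), P(C)
defect_rates = [0.02, 0.03, 0.04]  # P(defective | machine)

posterior = bayes_posterior(priors, defect_rates)
for machine, p in zip("ABC", posterior):
    print(f"P({machine} | defective) = {p:.2f}")
```

The same function applies to the other questions once the priors and the conditional probabilities have been worked out by hand.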

Bayes’ Rule in decision making

Decision making under uncertainty is the subject of Bayesian decision theory. Without going into
depth, we can establish an analogy with the statement of Bayes’ theorem itself. Every decision-making
model operates under some states of the environment. These states are mutually exclusive and
exhaustive. We may treat them as the A’s, and the decision that we want to take as B. Our
objective is to determine the chance of decision B, that is, to calculate P(B) when P(A_i) and
P(B | A_i) are known.

Bayes’ Rule in game theory

Similarly, Bayes’ theorem is applicable in game theory, where we would like to find the
probability of a strategy being adopted by Player 1 given the different strategies available to
the other player.

Probability Distributions

Random Variables
A random variable is a function that maps each outcome of an experiment to a real value. For
example, if you toss two coins and count the heads, the number of heads is a random variable,
i.e. the value of the random variable will be 0, 1 or 2 heads.

Discrete Probability Distributions


If the values of a random variable are discrete in nature, the distribution is called a discrete
probability distribution. In the above case, the number of heads takes discrete values, therefore we
may consider it a discrete random variable. If we now check the probability of each value of the
random variable, the result is 0.25, 0.50 and 0.25 for 0, 1 and 2 heads respectively.
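
As a quick check of these probabilities, the sketch below enumerates all equally likely outcomes of tossing fair coins and counts the heads. The function name `heads_distribution` is an illustrative choice, and fairness of the coins is assumed.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def heads_distribution(n_coins):
    """Probability of each possible number of heads for n fair coins."""
    # Every sequence of H/T is equally likely for fair coins.
    outcomes = list(product("HT", repeat=n_coins))
    counts = Counter(outcome.count("H") for outcome in outcomes)
    return {k: Fraction(c, len(outcomes)) for k, c in sorted(counts.items())}

dist = heads_distribution(2)
for heads, prob in dist.items():
    print(heads, prob)

# The probabilities play the role of relative frequencies and add up to 1.
print(sum(dist.values()))
```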

You may establish analogy with frequency distribution. Here, frequency is probability of an
outcome and summation of all the probabilities is 1.

Recall frequency distribution and find expressions for mean, variance and moments for discrete
probability distribution.

Binomial Distribution

Consider the experiment of tossing 3 coins, where getting a head counts as a success and getting a
tail as a failure. It is clear that the probability of success on each toss is 0.5.

One way to get exactly 2 heads: HHT


What’s the probability of this exact arrangement?
P(heads) × P(heads) × P(tails) = $(1/2)^2 \times (1/2)$
Other ways to get exactly 2 heads: THH, HTH. Total ways: 3.
In all three ways the probability remains the same, therefore we may write:

Total probability of two heads in 3 tosses of a coin = $3 \times (1/2)^2 \times (1/2) = 3/8$

The same can be done for the occurrence of 0, 1 and 3 heads. In this example:
Total number of trials (3 tosses of a coin): 3
Total number of successes (2 heads): 2
Total number of ways of achieving 2 successes in 3 trials: $^{3}C_{2} = 3$

Hence we can deduce the probability that the random variable X takes the value x in n trials
or experiments as
$$P(X = x) = {}^{n}C_{x}\; p^{x}\,(1 - p)^{n-x}$$
where p is the probability of success in each trial.
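
The formula can be evaluated directly. The sketch below (the function name `binomial_pmf` is an illustrative choice) reproduces the three-coin example, where n = 3 and p = 0.5.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

# Three tosses of a fair coin: probabilities of 0, 1, 2 and 3 heads.
pmf = [binomial_pmf(x, 3, 0.5) for x in range(4)]
print(pmf)        # the entry for x = 2 is 3 * (1/2)^2 * (1/2) = 0.375
print(sum(pmf))   # the probabilities add up to 1
```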

Find the mean and variance of Binomial Distribution.

Continuous Probability Distributions

We know that the binomial and Poisson distributions are discrete probability distributions,
whereas the normal distribution is a continuous probability distribution.

Before we start with the normal distribution, we need to know what a continuous probability
distribution is and what its characteristics are.

Definitions:-
Continuous variate: A variate that is not discrete, i.e., one that can take an infinite number of
values in a given interval $a \le x \le b$, is called a continuous variate.

Probability density function: Let X be a continuous random variable and let the probability of X
falling in the interval $\left(x - \tfrac{1}{2}dx,\; x + \tfrac{1}{2}dx\right)$ be expressed by f(x)dx, where f(x) is a
continuous function of X that satisfies the following two conditions:

(i) $f(x) \ge 0$ for all $x \in R$, where R is the collection of all points in the entire range of the variable X.

(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$ if $-\infty < X < \infty$, and $\int_{a}^{b} f(x)\,dx = 1$ if $a \le X \le b$.

Then the function f(x) is called the probability density function and the continuous curve
y = f(x) is called the probability curve.

The probability that x falls in the interval $a \le x \le b$ is then written as

$$P(a \le x \le b) = \int_{a}^{b} f(x)\,dx$$

The integral also represents the area under the probability curve y = f(x), between the ordinates
x = a and x = b, and the x-axis, i.e., we may understand probability as the area under the
probability curve.

Characteristics of continuous probability distribution:


If f(x), $-\infty < X < \infty$, is the probability density function of a continuous probability
distribution, we define:

(i) Mean: $\bar{x} = \int_{-\infty}^{\infty} x\, f(x)\,dx$

(ii) Variance: $\sigma^2 = \int_{-\infty}^{\infty} (x - \bar{x})^2 f(x)\,dx$

(iii) Moments: The rth moment about x = a is given by

$$\mu_r' = \int_{-\infty}^{\infty} (x - a)^r f(x)\,dx$$

and the rth moment about the mean $\bar{x}$ is given by

$$\mu_r = \int_{-\infty}^{\infty} (x - \bar{x})^r f(x)\,dx .$$

Here, if r = 2, this is equal to the variance.

(iv) Mean deviation from the mean, for the above continuous probability distribution, is

$$\int_{-\infty}^{\infty} |x - \bar{x}|\, f(x)\,dx .$$

(v) Median $M_d$ is given by

$$\int_{-\infty}^{M_d} f(x)\,dx = \int_{M_d}^{\infty} f(x)\,dx = \frac{1}{2}$$

(vi) Mode is the value of the variate for which

$$\frac{d}{dx} f(x) = 0 \quad \text{and} \quad \frac{d^2}{dx^2} f(x) < 0$$

i.e., for which the probability density f(x) is maximum.
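
To make these definitions concrete, the sketch below evaluates the integrals numerically for a simple density chosen purely for illustration, f(x) = 2x on 0 ≤ x ≤ 1 (not from the handout). The midpoint-rule helper `integrate` is a hypothetical name; for this density the exact mean is 2/3 and the exact variance is 1/18.

```python
def integrate(g, a, b, n=10_000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

def f(x):
    """Illustrative density (an assumption): f(x) = 2x on [0, 1]."""
    return 2.0 * x

a, b = 0.0, 1.0
total = integrate(f, a, b)                                    # condition (ii)
mean = integrate(lambda x: x * f(x), a, b)                    # definition (i)
variance = integrate(lambda x: (x - mean) ** 2 * f(x), a, b)  # definition (ii)

print(total, mean, variance)
```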


Normal Distribution
The normal distribution can be derived from the binomial distribution in the limiting case
when n, the number of trials, is very large and p, the probability of success, is close to
1/2.
The probability density function for the normal distribution is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-(x-\mu)^2/(2\sigma^2)}, \qquad -\infty < x < \infty$$

where $\mu$ is the mean and $\sigma$ is the standard deviation of the normal distribution; $\mu$ and $\sigma$ are
also known as the parameters of the normal distribution.

The probability distribution with the density function given above is called the normal
distribution or the Gaussian distribution. x is called the normal variate with mean $\mu$ and
standard deviation $\sigma$ and is denoted by $x \sim N(\mu, \sigma)$.

Normal Curve:-

[Figure: the normal curve y = f(x), with f(x) on the vertical axis, X on the horizontal axis, origin O, and the peak of the curve at x = μ.]

We may summarize all its properties as below:

The graph of the normal distribution as shown above is called the normal curve. It is symmetrical
about the line $x = \mu$, where the ordinate has its maximum value. Also mean, median and mode
coincide in the normal curve. The line $x = \mu$ divides the area under the normal curve above the
x-axis into two equal parts. Thus the median also coincides with the mean and mode. The area under
the normal curve between any two given points $x = x_1$ and $x = x_2$ represents the probability of
values falling into the given interval. The total area under the normal curve above the x-axis is 1.

Standard Normal Variate: If x is a normal variate with mean $\mu$ and standard deviation $\sigma$, then
$z = \dfrac{x - \mu}{\sigma}$ is called the standard normal variate. It has mean 0 and standard deviation 1.
After putting these values of the parameters in the density function, we obtain

$$f(z) = \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2}z^2}, \qquad -\infty < z < \infty$$

Properties of the Normal Distribution:-

1. The area under the normal curve is unity.
2. Mean = Median = Mode = $\mu$
3. The normal distribution has points of inflexion at $x = \mu \pm \sigma$
4. The variance is $\sigma^2$
5. Mean deviation from the mean = $\frac{4}{5}\sigma$ (approximately)

The values of $\Phi(z)$, the area under the standard normal curve between 0 and z, are readily
available for different values of z in the form of tables and may be seen in all books of statistics.
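Since the curve is symmetric about z = 0, i.e. about $x = \mu$, as given above, therefore

$$P(-\infty < x \le \mu) = P(\mu \le x < \infty) = \frac{1}{2}$$

With $z = \dfrac{x - \mu}{\sigma}$,

$$P(-\infty < z < \infty) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}z^2}\,dz = 1,$$

whereas

$$P(z \le 0) = P(z \ge 0) = \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-\frac{1}{2}z^2}\,dz = \frac{1}{2}.$$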

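Suppose we wish to obtain the probability of x lying between $x_1$ and $x_2$. Then

$$P(x_1 \le x \le x_2) = \frac{1}{\sigma\sqrt{2\pi}} \int_{x_1}^{x_2} e^{-(x-\mu)^2/(2\sigma^2)}\,dx = \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-\frac{1}{2}z^2}\,dz$$

where $z_1 = \dfrac{x_1 - \mu}{\sigma}$ and $z_2 = \dfrac{x_2 - \mu}{\sigma}$.

Then

$$P(x_1 \le x \le x_2) = \frac{1}{\sqrt{2\pi}} \left[ \int_{0}^{z_2} e^{-\frac{1}{2}z^2}\,dz - \int_{0}^{z_1} e^{-\frac{1}{2}z^2}\,dz \right] = \Phi(z_2) - \Phi(z_1)$$

where $\Phi(z)$ denotes the area under the standard normal curve between 0 and z.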

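If $x_1$ lies on the right side of the line of the mean of the normal curve, then we may also
conclude that

$$P(x \ge x_1) = \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-\frac{1}{2}z^2}\,dz - \frac{1}{\sqrt{2\pi}} \int_{0}^{z_1} e^{-\frac{1}{2}z^2}\,dz = \frac{1}{2} - \Phi(z_1)$$

and

$$P(x \le x_1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} e^{-\frac{1}{2}z^2}\,dz + \frac{1}{\sqrt{2\pi}} \int_{0}^{z_1} e^{-\frac{1}{2}z^2}\,dz = \frac{1}{2} + \Phi(z_1)$$

where $z_1 = (x_1 - \mu)/\sigma$ as defined above and $\Phi(z)$ is the area under the standard normal
curve between 0 and z.

Also, $\Phi(-z_1) = -\Phi(z_1)$.
(Exercise: Prove all the above properties)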

Problems related to area under the curve:-


Problem 1: Let $\sigma = 10$, $\mu = 50$, then find $P(60 \le x \le 70)$
Solution: When x = 60, $z = \dfrac{60 - 50}{10} = 1$,
and similarly for x = 70, z = 2.
Then,
$P(60 \le x \le 70)$ = Area from z = 1 to z = 2
= (Area from z = 0 to z = 2) − (Area from z = 0 to z = 1)
= $\Phi(2) - \Phi(1)$ = 0.4772 − 0.3413 = 0.1359

Problem 2: Let $\sigma = 10$, $\mu = 50$, then find $P(40 \le x \le 60)$

Solution: When x = 40, $z = \dfrac{40 - 50}{10} = -1$,
and similarly for x = 60, z = 1.
Then,
$P(40 \le x \le 60)$ = Area from z = −1 to z = 1
= 2 (Area from z = 0 to z = 1) (by symmetry)
= $2\,\Phi(1)$ = 2 × 0.3413 = 0.6826

Problem 3: Let $\sigma = 10$, $\mu = 50$, then find $P(30 \le x \le 40)$

Solution: When x = 30, z = −2 and for x = 40, z = −1.
$P(30 \le x \le 40)$ = Area from z = −2 to z = −1
= (Area from z = 1 to z = 2) (by symmetry)
= 0.1359 (same as Problem 1)
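
The table look-ups in Problems 1 to 3 can be cross-checked numerically. The sketch below relies on the standard identity that the area under the standard normal curve between 0 and z equals erf(z/√2)/2; the function names are illustrative choices.

```python
from math import erf, sqrt

def area_0_to_z(z):
    """Area under the standard normal curve between 0 and z (negative if z < 0)."""
    return 0.5 * erf(z / sqrt(2.0))

def prob_between(x1, x2, mu, sigma):
    """P(x1 <= x <= x2) for a normal variate with mean mu and s.d. sigma."""
    z1 = (x1 - mu) / sigma
    z2 = (x2 - mu) / sigma
    return area_0_to_z(z2) - area_0_to_z(z1)

mu, sigma = 50.0, 10.0
p1 = prob_between(60, 70, mu, sigma)  # Problem 1
p2 = prob_between(40, 60, mu, sigma)  # Problem 2
p3 = prob_between(30, 40, mu, sigma)  # Problem 3

print(round(p1, 4), round(p2, 4), round(p3, 4))
```

Small differences in the last digit relative to the worked solutions come from the four-figure rounding of the table values.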

Problems of Normal Distribution


Question 1: If the heights of 300 students are normally distributed with mean 64.5
inches and standard deviation 3.3 inches, how many students have heights
i) Less than 5 feet, i.e., 60 inches,
ii) Between 5 feet and 5 feet 9 inches.
Also find the height below which 99% of the students lie.

Question 2: The distribution of weekly wages for 500 workers in a factory is
approximately normal with a mean of Rs. 75 and a standard deviation of Rs. 15. Find
the number of workers who receive weekly wages
i) More than Rs. 90
ii) Less than Rs. 45

Question 3: In a normal distribution, 31% of the items are under 45 and 8% are over 64.
Find the mean and standard deviation of the distribution.

Test of Hypothesis

Hypothesis Tests: An Introduction


A hypothesis test is used to test a certain theory or belief about a population parameter, such as
the mean, variance or proportion.

Types of Hypothesis
There are two types of hypothesis
Null Hypothesis
Alternative Hypothesis

A null hypothesis is a claim or statement about a population parameter that is assumed to be true
until it is declared false. An alternative hypothesis is a claim about a population parameter that
will be true if the null hypothesis is false.

Hypothesis Building

Example 1: In the past a machine has produced washers having a mean thickness of 0.050 inch.
To determine whether the machine is in proper working order a sample of 10 washers is chosen
for which the mean thickness is 0.053 inch and the standard deviation is 0.003 inch. Test the
hypothesis that the machine is in proper working order.

The null hypothesis states that a given claim about a population parameter is true. In the given
example the population parameter is the mean. The claim to be tested, that the machine is in proper
working order, may or may not be true. The claim is true when the mean is 0.050 inch. Therefore
the null hypothesis is that the mean is 0.050 inch, and the alternative hypothesis is that the mean
is not 0.050 inch, which we write as

H0: µ = 0.050

H1 or Ha: µ ≠ 0.050

Example 2: Consider the percentage of people who prefer a specific seat on a plane when they fly. A
survey shows that 61% of adults prefer a window seat, 38% prefer an aisle seat, and only 1%
prefer the middle seat. These results are based on a sample of 806 adults. Suppose that the results
were true for the population of such adults at the time of the survey and that we want to check if
the current percentage of all adults who prefer the window seat when they fly is still 61%.
Suppose we take a random sample of 1000 adults and ask them which seat is their favorite when
they fly. Of them, 640 say that they prefer a window seat.

Here population parameter is proportion i.e., p.

H0: p = 0.61

H1 or Ha: p ≠ 0.61

Example 3: The lapping process which is used to grind certain silicon wafers to the proper
thickness is acceptable only if σ, the population standard deviation of the thickness of dice cut
from the wafers, is at most 0.50 mil. Use the 0.05 level of significance to test the claim, if the
thickness of 15 dice cut from such wafers have a standard deviation of 0.64 mil.

Here the population parameter is the standard deviation, i.e., σ.

H0: σ = 0.50

H1 or Ha: σ > 0.50

Types of Errors
Type I Error: A type I error occurs when a true null hypothesis is rejected. This error is denoted
by ‘α’. The value of ‘α’ represents the probability of committing this error; that is

α = P (H0 is rejected | H0 is true). The value of α represents the significance level of the test.

Type II Error: A type II error occurs when a false null hypothesis is not rejected. The probability
of committing this error is denoted by ‘β’; that is, β = P (H0 is not rejected | H0 is false).
The value of 1 − β is called the power of the test. It represents the probability of not making
a Type II error.

Four Possible Outcomes for a Test of Hypothesis

                                  Actual Situation
Decision             H0 is True            H0 is False
Do not reject H0     Correct decision      Type II or β error
Reject H0            Type I or α error     Correct decision

Tails of the Test: A two-tailed test has rejection regions in both tails, a left-tailed test has the
rejection region in the left tail, and a right-tailed test has the rejection region in the right tail of the
distribution curve.
Few more questions

Example 1: In the past a machine has produced washers having a mean thickness of 0.050 inch.
To determine whether the machine is in proper working order a sample of 10 washers is chosen
for which the mean thickness is 0.053 inch and the standard deviation is 0.003 inch. Test the
hypothesis that the machine is in proper working order.

Example 2: The mayor of a large city claims that the average net worth of families living in this
city is at least $300,000. A random sample of 25 families selected from this city produced a mean
net worth of $288,000. Assume that the net worths of all families in this city have a normal
distribution with the population standard deviation of $80,000. Using the 2.5% significance level,
can you conclude that the mayor’s claim is false?

Example 3: A potential buyer of fluorescent lamps bought 50 lamps of each of two brands, viz.,
National lamps and Indian lamps. Upon testing these lamps, he found that the brand ‘National’
had a mean life of 1,282 hours with a standard deviation of 80 hours, whereas the brand ‘Indian’
had a mean life of 1,208 hours with a standard deviation of 94 hours. At the 5% level of
significance, can the buyer conclude that both brands have the same mean life?

Example 4: To compare two kinds of bumper guards, six of each kind were mounted on a certain
make of compact car. Then each car was run into a concrete wall at 5 miles per hour and the
following are the costs of the repairs.

Bumper Guard 1: 127 168 143 165 122 139

Bumper Guard 2: 154 135 132 171 153 149

Test at 0.01 level of significance whether the difference between the means of these two samples
is significant.
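
For small samples with unknown population variances, a comparison like the one in Example 4 is commonly made with the pooled two-sample t statistic, $t = (\bar{x}_1 - \bar{x}_2 - \delta) / (s_p\sqrt{1/n_1 + 1/n_2})$ with $n_1 + n_2 - 2$ degrees of freedom. The sketch below applies it to the bumper guard data. It assumes equal population variances and a two-tailed test, the function name `pooled_t` is hypothetical, and the critical value $t_{0.005,\,10} = 3.169$ is taken from standard t tables.

```python
from math import sqrt
from statistics import mean, variance

guard1 = [127, 168, 143, 165, 122, 139]
guard2 = [154, 135, 132, 171, 153, 149]

def pooled_t(sample1, sample2, delta=0.0):
    """Pooled two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance uses the n - 1 denominator (sample variance).
    s1_sq, s2_sq = variance(sample1), variance(sample2)
    sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
    t = (mean(sample1) - mean(sample2) - delta) / (sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

t_stat, dof = pooled_t(guard1, guard2)

T_CRIT = 3.169  # t_{0.005, 10} from tables: two-tailed test at the 0.01 level
reject_h0 = abs(t_stat) > T_CRIT
print(round(t_stat, 4), dof, reject_h0)
```

Under these assumptions |t| falls well below the critical value, so the difference between the sample means would not be judged significant.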

Process to test the hypothesis

Formulation
• Build the hypothesis
• Define the level of significance

Methodology
• Identify the test statistic
• Identify the criteria for rejection of the null hypothesis

Analysis and Decision
• Calculate the test statistic
• Make the decision based on the criteria of rejection
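
The three stages can be traced on the washer example (Example 1). The sketch below is one possible walk-through: the example does not state a level of significance, so α = 0.05 is an assumption, and the critical value $t_{0.025,\,9} = 2.262$ is taken from standard t tables.

```python
from math import sqrt

# Formulation: H0: mu = 0.050, Ha: mu != 0.050, alpha = 0.05 (assumed level).
mu0 = 0.050
alpha = 0.05

# Methodology: sigma is unknown and n < 30, so use t with n - 1 degrees of
# freedom; reject H0 if |t| > t_{alpha/2, n-1} (two-tailed test).
x_bar, s, n = 0.053, 0.003, 10
T_CRIT = 2.262  # t_{0.025, 9} from tables

# Analysis and decision: compute the statistic and compare with the criterion.
t_stat = (x_bar - mu0) / (s / sqrt(n))
reject_h0 = abs(t_stat) > T_CRIT
print(round(t_stat, 3), reject_h0)
```

With the assumed 5% level the statistic exceeds the critical value, so H0 would be rejected; a stricter level could change the decision.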

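Parametric Tests

For each problem type, the hypotheses, the test statistic and the criteria of rejection are listed.

Single mean
Hypothesis: H0: µ = µ0; Ha: µ > µ0 (right tailed), µ < µ0 (left tailed), µ ≠ µ0 (two tailed)
Test statistic, if the population variance is known: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Criteria of rejection: $z > z_{\alpha}$ (right tailed); $z < -z_{\alpha}$ (left tailed); $|z| > z_{\alpha/2}$ (two tailed)
Test statistic, if the population variance is unknown and n < 30: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, where $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
Criteria of rejection: $t > t_{\alpha,\,n-1}$ (right tailed); $t < -t_{\alpha,\,n-1}$ (left tailed); $|t| > t_{\alpha/2,\,n-1}$ (two tailed)

Two means
Hypothesis: H0: $\mu_1 - \mu_2 = \delta$; Ha: $\mu_1 - \mu_2 > \delta$ (right tailed), $\mu_1 - \mu_2 < \delta$ (left tailed), $\mu_1 - \mu_2 \ne \delta$ (two tailed)
Test statistic, if the population variances are known: $z = \dfrac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
Criteria of rejection: $z > z_{\alpha}$ (right tailed); $z < -z_{\alpha}$ (left tailed); $|z| > z_{\alpha/2}$ (two tailed)
Test statistic, if the population variances are unknown and n < 30: $t = \dfrac{\bar{x}_1 - \bar{x}_2 - \delta}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$ and $s_1^2$ and $s_2^2$ are the sample variances of samples 1 and 2 respectively. Here, $s_p^2 = \dfrac{s_1^2 + s_2^2}{2}$ if $n_1 = n_2$.
Criteria of rejection (the pooled statistic has $n_1 + n_2 - 2$ degrees of freedom): $t > t_{\alpha,\,n_1+n_2-2}$ (right tailed); $t < -t_{\alpha,\,n_1+n_2-2}$ (left tailed); $|t| > t_{\alpha/2,\,n_1+n_2-2}$ (two tailed)

Several means (ANOVA)
Hypothesis: H0: $\mu_1 = \mu_2 = \dots = \mu_k$ ($\alpha_i = 0$ for all i); Ha: the $\mu_i$ are not all equal ($\alpha_i \ne 0$ for at least one i)
Test statistic: $F = \dfrac{MS(Tr)}{MSE}$, with
$MS(Tr) = \dfrac{SS(Tr)}{k - 1}$, $\quad MSE = \dfrac{SSE}{k(n - 1)}$,
$SST = \sum_{i=1}^{k} \sum_{j=1}^{n} x_{ij}^2 - \dfrac{1}{N} T_{\cdot\cdot}^2$,
$SS(Tr) = \dfrac{1}{n} \sum_{i=1}^{k} T_{i\cdot}^2 - \dfrac{1}{N} T_{\cdot\cdot}^2$,
$SSE = SST - SS(Tr)$,
where $T_{i\cdot}$ is the total of the ith row, $T_{\cdot\cdot}$ is the grand total and N is the total number of observations.
Criteria of rejection: $F > F_{\alpha,\,k-1,\,N-k}$

Single proportion
Hypothesis: H0: p = p0; Ha: p > p0 (right tailed), p < p0 (left tailed), p ≠ p0 (two tailed)
Test statistic: $z = \dfrac{x - np_0}{\sqrt{np_0(1 - p_0)}}$
Criteria of rejection: $z > z_{\alpha}$ (right tailed); $z < -z_{\alpha}$ (left tailed); $|z| > z_{\alpha/2}$ (two tailed)

Two proportions
Hypothesis: H0: p1 = p2; Ha: p1 > p2 (right tailed), p1 < p2 (left tailed), p1 ≠ p2 (two tailed)
Test statistic: $z = \dfrac{p_1 - p_2}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and $q = 1 - p$
Criteria of rejection: $z > z_{\alpha}$ (right tailed); $z < -z_{\alpha}$ (left tailed); $|z| > z_{\alpha/2}$ (two tailed)

Several proportions
Hypothesis: H0: $p_{i1} = p_{i2} = p_{i3} = \dots = p_{ic}$; Ha: $p_{i1}, p_{i2}, p_{i3}, \dots, p_{ic}$ are not all equal
Test statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}}$, where $O_{ij}$ and $e_{ij}$ are the observed and expected frequencies respectively, and
$e_{ij} = \dfrac{(i\text{th row total}) \times (j\text{th column total})}{\text{Grand Total}}$
Criteria of rejection: $\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$, where r is the number of rows and c is the number of columns

Single variance
Hypothesis: H0: $\sigma^2 = \sigma_0^2$; Ha: $\sigma^2 > \sigma_0^2$ (right tailed), $\sigma^2 < \sigma_0^2$ (left tailed), $\sigma^2 \ne \sigma_0^2$ (two tailed)
Test statistic: $\chi^2 = \dfrac{(n - 1)s^2}{\sigma_0^2}$
Criteria of rejection: $\chi^2 > \chi^2_{\alpha,\,n-1}$ (right tailed); $\chi^2 < \chi^2_{1-\alpha,\,n-1}$ (left tailed); $\chi^2 > \chi^2_{\alpha/2,\,n-1}$ or $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$ (two tailed)

Two variances
Hypothesis: H0: $\sigma_1^2 = \sigma_2^2$; Ha: $\sigma_1^2 > \sigma_2^2$ (right tailed), $\sigma_1^2 < \sigma_2^2$ (left tailed), $\sigma_1^2 \ne \sigma_2^2$ (two tailed)
Test statistic: $F = \dfrac{S_i^2}{S_j^2}$, where the indices i and j are chosen as in the criteria of rejection (for the two-tailed test, $S_i^2$ is the larger sample variance)
Criteria of rejection: $F > F_{\alpha,\,n_1-1,\,n_2-1}$ (right tailed); $F > F_{\alpha,\,n_2-1,\,n_1-1}$ (left tailed); $F > F_{\alpha/2,\,n_i-1,\,n_j-1}$ (two tailed)

Data fitness to a probability distribution
Test statistic: $\chi^2 = \sum_{i} \dfrac{(O_i - e_i)^2}{e_i}$, where $O_i$ and $e_i$ are the observed and expected frequencies respectively
Criteria of rejection: $\chi^2 > \chi^2_{\alpha,\,\nu}$, where $\nu$ is the degrees of freedom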
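
As an illustration of the single-mean case with known population variance, the sketch below works through Example 2 from the "Few more questions" list (the mayor's claim about mean net worth). It treats the claim as H0: µ = 300,000 against the left-tailed alternative µ < 300,000, which is one reasonable reading of the example; the function name `z_statistic` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def z_statistic(x_bar, mu0, sigma, n):
    """z = (x_bar - mu0) / (sigma / sqrt(n)) for a known population s.d."""
    return (x_bar - mu0) / (sigma / sqrt(n))

alpha = 0.025
z = z_statistic(x_bar=288_000, mu0=300_000, sigma=80_000, n=25)

# Critical value z_alpha of the standard normal distribution (about 1.96).
z_alpha = NormalDist().inv_cdf(1 - alpha)

# Left-tailed test: reject H0 only if z < -z_alpha.
reject_h0 = z < -z_alpha
print(z, round(z_alpha, 2), reject_h0)
```

Under this reading the statistic does not fall in the rejection region, so the sample alone would not justify calling the claim false at the 2.5% level.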
Practice Questions
1. A sample of 1000 students from a university was taken and their average weight was found
to be 112 pounds with a S.D. of 20 pounds. Could the mean weight of students in the
population be 120 pounds?
2. The heights of college students in a city are normally distributed with a S.D. of 6 cms. A
sample of 1000 students has mean height 158 cms. Test the hypothesis that the mean height
of college students in the city is 160 cms.
3. Intelligence tests on two groups of boys and girls gave the following results. Examine if the
difference is significant.

          Mean   S.D.   Size
Girls      70     10     70
Boys       75     11    100
4. Two random samples of sizes 1000 and 2000 of farms gave an average yield of 2000 kg and
2050 kg respectively. The variance of wheat farms in the country may be taken as 100 kg.
Examine whether the two samples differ significantly in yield?
5. A sample of size of 600 persons selected at random from a large city shows that the percentage
of males in the sample is 53. It is believed that the ratio of males to the total population in the
city is 0.5. Test whether the belief is confirmed by the observation.
6. A random sample of 400 men and 600 women were asked whether they would like to have a
school near their residence. 200 men and 325 women were in favor of the proposal. Test the
hypothesis that the proportions of men and women in favor of the proposal are the same at the 5%
level of significance.
7. Use the 0.05 level of significance to test the null hypothesis that σ = 0.022 inch for the
diameters of certain wire rope against the alternative hypothesis that σ ≠ 0.022 inch, given
that a random sample of size 18 yielded $s^2 = 0.000324$.
8. From the following two sample values find out whether they have come from the same
population:
Sample 1: 17 27 18 25 27 29 27 23 17
Sample 2: 16 16 20 16 20 17 15 21

9. The results of polls conducted 2 weeks and 4 weeks before an election are shown in the
following table:

                   Two weeks before   Four weeks before   Total
For Candidate A           99                 112           211
For Candidate B          101                  88           189
Total                    200                 200           400

10. Fit a Poisson distribution to the following data and test the goodness of fit
x 0 1 2 3 4
f 112 73 30 4 1
11. As part of the investigation of the collapse of the roof of a building, a testing laboratory is
given all the available bolts that connected the steel structure at 3 different positions on the
roof. The forces required to shear each of these bolts (coded values) are as follows:
Position 1 90 82 79 98 83 91
Position 2 105 89 93 104 89 95 86
Position 3 83 89 80 94
Perform an analysis of variance to test at the 0.05 level of significance whether the
differences among the sample means at the 3 positions are significant.
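
As an illustration of the chi-square statistic for a contingency table, the sketch below uses the poll data of question 9. The question does not state what is to be tested or at which level, so the sketch assumes a test of whether support for the candidates is the same in both polls at the 0.05 level; the function name is hypothetical and the critical value $\chi^2_{0.05,\,1} = 3.841$ is taken from standard tables.

```python
observed = [
    [99, 112],   # For Candidate A: two weeks before, four weeks before
    [101, 88],   # For Candidate B
]

def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o_ij in enumerate(row):
            # Expected frequency: (row total x column total) / grand total
            e_ij = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o_ij - e_ij) ** 2 / e_ij
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

chi2, dof = chi_square_statistic(observed)
CHI2_CRIT = 3.841  # chi-square_{0.05, 1} from tables (assumed level)
reject_h0 = chi2 > CHI2_CRIT
print(round(chi2, 3), dof, reject_h0)
```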

Correlation and Regression


Recall the concepts of correlation and regression that you studied in 10+2!
So far you have studied functions of a single random variable. Let us extend this now!
If X and Y are two independent variables, they are not correlated, but the converse is not true.
Can you explain it?
Hint: $\text{Covariance}(X, Y) = E[(X - \bar{X})(Y - \bar{Y})]$
If X and Y are independent: $E(XY) = E(X)\,E(Y)$
Coefficient of correlation: $r = \dfrac{\text{Covariance}(X, Y)}{\sqrt{\text{Variance}(X)}\,\sqrt{\text{Variance}(Y)}}$
Sums of squares and products for samples of two random variables:
$$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \frac{1}{n}\left(\sum x_i\right)\left(\sum y_i\right)$$
$$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \frac{1}{n}\left(\sum x_i\right)^2$$
$$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \frac{1}{n}\left(\sum y_i\right)^2$$
Coefficient of correlation: $r = \dfrac{S_{xy}}{\sqrt{S_{xx}\, S_{yy}}}$
Now recall the least squares method of fitting a straight line to data, which leads to the
concept of the line of regression.
If y = a + bx is the line of regression of y on x and the given data is $(x_i, y_i)$, i = 1, 2, …, n,
can we write:
$$b = \frac{\text{Covariance}(X, Y)}{\sigma_x^2}; \qquad a = \bar{y} - \frac{\text{Covariance}(X, Y)}{\sigma_x^2}\,\bar{x}$$
The line of regression of y on x would be: $y - \bar{y} = \dfrac{\text{Covariance}(X, Y)}{\sigma_x^2}\,(x - \bar{x})$
What would be the line of regression of x on y?
$\beta_{YX} = \dfrac{\text{Covariance}(X, Y)}{\sigma_x^2}$ and $\beta_{XY} = \dfrac{\text{Covariance}(X, Y)}{\sigma_y^2}$ are called the regression coefficients of y on x and of x on y,
respectively.
a) Write these regression coefficients in terms of sample variances.
b) Establish the relationship between these regression coefficients and correlation
coefficient.
c) Find the permissible range of the correlation coefficient. What is the significance of
the intervals within this range?
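
The computations above can be sketched in Python. The example uses the tenure and hourly pay data from the second practice question later in this handout; the function names are illustrative choices, and the slope is computed as S_xy / S_xx, which equals Covariance(X, Y) / σ_x² because the common factor 1/n cancels.

```python
from math import sqrt

def sums_of_squares(xs, ys):
    """Return S_xy, S_xx, S_yy using the shortcut formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x**2 / n
    s_yy = sum(y * y for y in ys) - sum_y**2 / n
    return s_xy, s_xx, s_yy

def fit_line(xs, ys):
    """Intercept a, slope b of the regression line of y on x, and r."""
    s_xy, s_xx, s_yy = sums_of_squares(xs, ys)
    b = s_xy / s_xx
    a = sum(ys) / len(ys) - b * sum(xs) / len(xs)
    r = s_xy / sqrt(s_xx * s_yy)
    return a, b, r

years = [5, 3, 4, 10, 15]    # x: years spent at the company
pay = [25, 20, 21, 35, 38]   # y: hourly pay

a, b, r = fit_line(years, pay)
print(f"y = {a:.3f} + {b:.3f} x, r = {r:.4f}")
```

Swapping the roles of the two lists gives the line of regression of x on y.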
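Hypothesis testing

For each problem type, the hypotheses, the test statistic and the criteria of rejection are listed.

Population correlation coefficient
Hypothesis: H0: ρ = 0; Ha: ρ > 0 (right tailed), ρ < 0 (left tailed), ρ ≠ 0 (two tailed)
Test statistic (for small samples): $\Delta_r = \dfrac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$
Criteria of rejection: $\Delta_r > t_{\alpha,\,n-2}$ (right tailed); $\Delta_r < -t_{\alpha,\,n-2}$ (left tailed); $\Delta_r > t_{\alpha/2,\,n-2}$ or $\Delta_r < -t_{\alpha/2,\,n-2}$ (two tailed)
Test statistic (general method): $Z = \dfrac{\sqrt{n - 3}}{2} \cdot \ln\dfrac{1 + r}{1 - r}$, where $r = \dfrac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}}$
Criteria of rejection: $z > z_{\alpha}$ (right tailed); $z < -z_{\alpha}$ (left tailed); $|z| > z_{\alpha/2}$ (two tailed)

Population regression coefficient
Hypothesis: H0: $\beta = \beta_0$; Ha: $\beta > \beta_0$ (right tailed), $\beta < \beta_0$ (left tailed), $\beta \ne \beta_0$ (two tailed)
Test statistic: $t = \dfrac{\hat{\beta} - \beta_0}{\hat{\sigma}} \sqrt{\dfrac{(n - 2)\,S_{xx}}{n}}$, where $\hat{\beta} = \dfrac{S_{xy}}{S_{xx}}$ and $\hat{\sigma} = \sqrt{\dfrac{1}{n}\left(S_{yy} - \hat{\beta} \cdot S_{xy}\right)}$
Criteria of rejection: $t > t_{\alpha,\,n-2}$ (right tailed); $t < -t_{\alpha,\,n-2}$ (left tailed); $t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$ (two tailed)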
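
The two statistics for the correlation coefficient can be evaluated as in the sketch below. The values r = 0.6 and n = 27 are hypothetical and chosen only so that the arithmetic is easy to follow; the function names are illustrative, and the critical values $t_{0.025,\,25} = 2.060$ and $z_{0.025} = 1.96$ are taken from standard tables for a two-tailed test at the 0.05 level.

```python
from math import log, sqrt

def corr_t_statistic(r, n):
    """Small-sample statistic r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 d.o.f."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

def fisher_z_statistic(r, n):
    """General-method statistic (sqrt(n - 3) / 2) * ln((1 + r) / (1 - r))."""
    return sqrt(n - 3) / 2 * log((1 + r) / (1 - r))

r, n = 0.6, 27  # hypothetical sample correlation and sample size

delta_r = corr_t_statistic(r, n)
z = fisher_z_statistic(r, n)

# Two-tailed decisions at the 0.05 level (critical values from tables).
print(round(delta_r, 3), abs(delta_r) > 2.060)
print(round(z, 3), abs(z) > 1.96)
```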

Practice Questions
1. The table below shows the number of absences, x, in a Calculus course and the
final exam grade, y, for 7 students. Find the correlation coefficient and interpret
your result. Find the regression line of y on x.

x 1 0 2 6 4 3 3
y 85 80 70 55 90 90 95
2. The time x in years that an employee spent at a company and the employee’s
hourly pay, y, for 5 employees are listed in the table below. Calculate and interpret
the correlation coefficient r. Find the line of regression of y on x.

x 5 3 4 10 15
y 25 20 21 35 38
3. Consider x as the number of hours that 10 persons studied for a French test and y
as their scores on the test. Given Σx = 100, Σy = 564, Σx² = 1376, Σy² = 36562 and
Σxy = 6945, find the equation of the least squares line that approximates the regression
of the test scores on the number of hours studied. Also find the correlation
coefficient between these two.
