Statistics Handout

1. The document discusses basic probability concepts like binomial distribution, discrete and continuous probability distributions, and the normal distribution. 2. It also covers Bayes' rule of probability and provides examples of how it can be used in decision making and game theory. 3. The key aspects of normal distribution are defined, including its probability density function and how it arises from the limiting case of the binomial distribution. Characteristics like the mean, variance, and normal curve shape are also outlined.

Uploaded by

Anish Garg

Statistics

Basic Probability

1. A class contains 8 boys and 7 girls. The teacher selects 3 of the children at random and without
replacement. Calculate the probability that the number of boys selected exceeds the number
of girls selected.
2. A and B throw a die alternately. The player who first throws a 'two' wins the game. If A starts
the game, find the probability that B wins.
3. What is the probability of having 53 Sundays in a leap year?
4. A mathematics problem is given to three students A, B and C, whose chances of solving
it are 1/2, 1/3 and 1/4 respectively. What is the probability that the problem will be solved?

Bayes' Rule of Probability

Bayes' theorem establishes the relationship between a conditional probability and its reverse
conditional probability.

If A1, A2, ..., An are mutually exclusive and exhaustive events of a sample space S and B is any
arbitrary event of S with P(B) > 0, then

$$P(A_i \mid B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B \mid A_j)}$$

Solve the following questions:

1. There are three bags: first containing 1 white, 2 red, 3 green balls; second 2 white, 3
red, 1 green balls and third 3 white, 1 red, 2 green balls. Two balls are drawn from a
bag chosen at random. These are found to be one white and one red. Find the
probability that the balls so drawn came from the second bag.
2. Three machines A, B and C produce respectively 60%, 30% and 10% of the total
number of items of a factory. The percentages of defective output of these machines
are 2%, 3% and 4% respectively. An item is selected at random and is found to be
defective. Find the probability that the item was produced by machine C.
3. The chances that a politician, a businessman or an academician will be appointed
as vice-chancellor (VC) of a public university are 0.5, 0.3 and 0.2 respectively. The
probabilities that they will promote research, if appointed VC, are 0.3, 0.7 and 0.8
respectively. If research is promoted, what is the probability that the VC is an
academician?
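
The posterior probabilities in problems of this kind can be checked numerically. The following is a minimal Python sketch of Bayes' rule applied to the data of question 2 (machines A, B and C); the function name `bayes_posterior` is an illustrative choice, not part of the handout.

```python
def bayes_posterior(priors, likelihoods):
    """Return P(A_i | B) for every i, given P(A_i) and P(B | A_i)."""
    # Joint probabilities P(A_i) * P(B | A_i)
    joint = [p * l for p, l in zip(priors, likelihoods)]
    # P(B) by the law of total probability
    total = sum(joint)
    return [j / total for j in joint]

# Data of question 2: share of output and defect rate for machines A, B, C
priors = [0.60, 0.30, 0.10]
defect_rates = [0.02, 0.03, 0.04]

posterior = bayes_posterior(priors, defect_rates)
print([round(p, 2) for p in posterior])  # [0.48, 0.36, 0.16]
```

The last entry is the probability that a defective item came from machine C. The same function can be reused for the other questions once the priors and conditional probabilities have been worked out by hand.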

Bayes' Rule in decision making

Decision making under uncertainty is the subject of Bayesian decision theory. Without going into
depth, we can draw an analogy with the statement of Bayes' theorem itself. Every decision-making
model involves some conditions of the environment. These conditions are mutually exclusive and
exhaustive, so we may treat them as the A's, and the decision that we want to take may be treated
as B. Our objective is to determine the chances of decision B, that is, to calculate P(B) when
P(A_i) and P(B | A_i) are known; by the law of total probability, P(B) is the sum of the products
P(A_i) P(B | A_i) over all i.

Bayes' Rule in game theory

Similarly, Bayes' theorem is applicable in game theory, where we would like to find the
probability of a strategy being adopted by Player 1, conditional on the different strategies
available to the other player.

Probability Distributions

Random Variables
A random variable is a function that maps each outcome of an experiment to a real value. For
example, if you toss two coins and count the heads, the number of heads is a random variable,
i.e., the value of the random variable will be 0, 1 or 2 heads.

Discrete Probability Distributions

If the values of a random variable are discrete in nature, its distribution is called a discrete
probability distribution. In the above case, the number of heads takes discrete values, therefore we
may consider it a discrete random variable. Now, if we check the probability of each value of the
random variable, the result is 0.25, 0.50 and 0.25 for 0, 1 and 2 heads respectively.

You may draw an analogy with a frequency distribution. Here, the probability of an outcome plays
the role of the frequency, and the sum of all the probabilities is 1.
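
The probabilities quoted for the two-coin example can be reproduced by listing the sample space. The sketch below is only an illustration in Python; it assumes fair coins, so that the four outcomes HH, HT, TH and TT are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of two coin tosses: HH, HT, TH, TT (assumed equally likely)
outcomes = list(product("HT", repeat=2))

# Random variable X = number of heads in each outcome
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability of each value of X
pmf = {heads: Fraction(n, len(outcomes)) for heads, n in sorted(counts.items())}

for heads, prob in pmf.items():
    print(heads, prob)
```

Exact fractions are used so that the check that the probabilities add up to 1 does not depend on floating-point rounding.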

Recall the frequency distribution and find expressions for the mean, variance and moments of a
discrete probability distribution.

Binomial Distribution

Consider the experiment of tossing 3 coins, where getting a head is a success and getting a tail is
a failure. It is clear that the probability of success on each toss is 0.5.

One way to get exactly 2 heads: HHT

What is the probability of this exact arrangement?
P(heads) × P(heads) × P(tails) = (1/2)^2 × (1/2)
Other ways to get exactly 2 heads: THH, HTH; total ways: 3
In all three of these ways the probability remains the same, therefore we may write:

Total probability of two heads out of 3 tosses of coins = 3 × (1/2)^2 × (1/2) = 3/8

The same is done for the occurrence of 0, 1 and 3 heads. For the case above we have:
Total number of trials (3 tosses of a coin): 3
Total number of successes (2 heads): 2
Total number of ways of achieving 2 successes out of 3 trials: 3C2 = 3

Hence we can deduce the probability of a value x of the random variable X out of n independent
trials or experiments as
$$P(X = x) = {}^{n}C_{x}\; p^{x}\,(1 - p)^{n - x}$$
where p is the probability of success in each trial.
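
As a quick check of the formula, the following Python sketch evaluates the binomial probabilities for the 3-coin experiment above. The function name `binomial_pmf` is an illustrative choice; `math.comb` supplies the number of combinations nCx.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Three tosses of a fair coin: probabilities of 0, 1, 2 and 3 heads
pmf = [binomial_pmf(x, 3, 0.5) for x in range(4)]
print(pmf)  # [0.125, 0.375, 0.375, 0.125]
```

The value 0.375 for x = 2 agrees with 3 × (1/2)^2 × (1/2) obtained by counting arrangements.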

Find the mean and variance of Binomial Distribution.

Continuous Probability Distributions

We know that the binomial distribution and the Poisson distribution are discrete probability
distributions, whereas the normal distribution is a continuous probability distribution.

Before we start with the normal distribution, we need to know what a continuous probability
distribution is and what its characteristics are.

Definitions:-
Continuous variate: A variate that is not discrete, i.e., which can take an infinite number of
values in a given interval a ≤ x ≤ b, is called a continuous variate.

Probability Density function: Let X be a continuous random variable and let the probability of X
falling in the interval $\left(x - \tfrac{1}{2}dx,\; x + \tfrac{1}{2}dx\right)$ be expressed by
f(x)dx, where f(x) is a continuous function of X and satisfies the following two conditions:

(i) $f(x) \ge 0 \;\; \forall\, x \in R$, where R is the collection of all points in the entire range of the variable X.

(ii) $\int_{-\infty}^{\infty} f(x)\,dx = 1$ if $-\infty < X < \infty$, and $\int_{a}^{b} f(x)\,dx = 1$ if $a \le X \le b$.

Then the function f(x) is called the probability density function and the continuous curve
y = f(x) is called the probability curve.

The probability that x falls in the interval a ≤ x ≤ b is then written as

$$P(a \le x \le b) = \int_{a}^{b} f(x)\,dx$$

The integral also represents the area under the probability curve y = f(x), between the ordinates
x = a and x = b, and the x-axis, i.e., we may understand the concept of probability in relation to
the area under the probability curve.

Characteristics of continuous probability distribution:

If f(x), $-\infty < X < \infty$, is the probability density function of a continuous probability
distribution, we define:

(i) Mean: $\bar{x} = \int_{-\infty}^{\infty} x\, f(x)\,dx$

(ii) Variance: $\sigma^2 = \int_{-\infty}^{\infty} (x - \bar{x})^2 f(x)\,dx$

(iii) Moments: The rth moment about x = a is given by

$$\mu_r' = \int_{-\infty}^{\infty} (x - a)^r f(x)\,dx$$

and the rth moment about the mean $\bar{x}$ is given by

$$\mu_r = \int_{-\infty}^{\infty} (x - \bar{x})^r f(x)\,dx.$$

Here, if r = 2, this is equal to the variance.

(iv) Mean deviation from the mean, for the above continuous probability distribution, is

$$\int_{-\infty}^{\infty} |x - \bar{x}|\, f(x)\,dx.$$

(v) Median $M_d$ is given by

$$\int_{-\infty}^{M_d} f(x)\,dx = \int_{M_d}^{\infty} f(x)\,dx = \frac{1}{2}\int_{-\infty}^{\infty} f(x)\,dx = \frac{1}{2}$$

(vi) Mode is the value of the variate for which

$$\frac{d}{dx} f(x) = 0 \quad \text{and} \quad \frac{d^2}{dx^2} f(x) < 0$$

i.e., for which the probability density f(x) is maximum.

Normal Distribution
The normal distribution can be derived from the binomial distribution in the limiting case
when n, the number of trials, is very large and p, the probability of success, is close to
1/2.
The probability density function for the normal distribution is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\; e^{-(x - \mu)^2 / (2\sigma^2)}, \qquad -\infty < x < \infty$$

where μ is the mean and σ the standard deviation of the normal distribution; μ and σ are also
known as the parameters of the normal distribution.

The probability distribution with the density function given above is called the Normal
distribution or the Gaussian distribution. x is called the normal variate with mean μ and
standard deviation σ and is denoted by x ~ N(μ, σ).

Normal Curve:-

[Figure: the normal curve y = f(x), with X on the horizontal axis and f(x) on the vertical axis, peaking at x = μ]

We may summarize all its properties as below:

The graph of the normal distribution is called the normal curve. It is symmetrical about the line
x = μ, where the ordinate has its maximum value. The line x = μ divides the area under the normal
curve and above the x-axis into two equal parts, so the median coincides with the mean and the
mode. The area under the normal curve between any two given points x = x1 and x = x2 represents
the probability of values falling into the given interval. The total area under the normal curve
above the x-axis is 1.
Standard Normal Variate: If x is a normal variate with mean μ and standard deviation σ, then
$z = \dfrac{x - \mu}{\sigma}$ is called the standard normal variate. It has mean μ = 0 and standard
deviation σ = 1. After putting these values of the parameters in the density function, we obtain

$$f(z) = \frac{1}{\sqrt{2\pi}}\; e^{-\frac{1}{2} z^2}, \qquad -\infty < z < \infty$$

Properties of the Normal Distribution:-

1. Area under the normal curve is unity.
2. Mean = Median = Mode = μ
3. The normal distribution has points of inflexion at x = μ ± σ
4. The variance is σ²
5. Mean deviation from the mean ≈ (4/5)σ

The values of the area Φ(z) under the standard normal curve from 0 to z, for different values of z,
are readily available in the form of tables and may be seen in all books of statistics.
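Since the curve is symmetric about z = 0, i.e., about x = μ, as given above, therefore,

$$P(-\infty < z \le 0) = P(0 \le z < \infty) = \frac{1}{2}$$

With $z = \dfrac{x - \mu}{\sigma}$,

$$P(-\infty < z < \infty) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2} z^2}\,dz = 1,$$

whereas

$$P(z \le 0) = P(z \ge 0) = \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-\frac{1}{2} z^2}\,dz = \frac{1}{2}.$$

Suppose we wish to obtain the probability of x lying between x1 and x2; then

$$P(x_1 \le x \le x_2) = \frac{1}{\sigma\sqrt{2\pi}} \int_{x_1}^{x_2} e^{-(x - \mu)^2 / (2\sigma^2)}\,dx = \frac{1}{\sqrt{2\pi}} \int_{z_1}^{z_2} e^{-\frac{1}{2} z^2}\,dz$$

where $z_1 = \dfrac{x_1 - \mu}{\sigma}$ and $z_2 = \dfrac{x_2 - \mu}{\sigma}$.

Writing $\Phi(z) = \dfrac{1}{\sqrt{2\pi}} \int_{0}^{z} e^{-\frac{1}{2} t^2}\,dt$ for the area under the standard normal curve from 0 to z, we then have

$$P(x_1 \le x \le x_2) = \frac{1}{\sqrt{2\pi}} \left[ \int_{0}^{z_2} e^{-\frac{1}{2} z^2}\,dz - \int_{0}^{z_1} e^{-\frac{1}{2} z^2}\,dz \right] = \Phi(z_2) - \Phi(z_1)$$

If x1 lies on the right side of the line of mean of the normal curve, then we may also
conclude that

$$P(x \ge x_1) = \frac{1}{\sqrt{2\pi}} \int_{0}^{\infty} e^{-\frac{1}{2} z^2}\,dz - \frac{1}{\sqrt{2\pi}} \int_{0}^{z_1} e^{-\frac{1}{2} z^2}\,dz = \frac{1}{2} - \Phi(z_1)$$

and

$$P(x \le x_1) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} e^{-\frac{1}{2} z^2}\,dz + \frac{1}{\sqrt{2\pi}} \int_{0}^{z_1} e^{-\frac{1}{2} z^2}\,dz = \frac{1}{2} + \Phi(z_1)$$

where z1 and z2 are the same as defined above.

Also, Φ(−z1) = −Φ(z1).
(Exercise: Prove all the above properties)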
(Exercise: Prove all the above properties)

Problems related to area under the curve:-

In the solutions below, Φ(z) denotes the tabulated area under the standard normal curve from 0 to z.

Problem 1: Let σ = 10, μ = 50, then find P(60 ≤ x ≤ 70)
Solution: When x = 60, z = (60 − 50)/10 = 1,
and similarly for x = 70, z = 2.
Then,
P(60 ≤ x ≤ 70) = Area from z = 1 to z = 2
= (Area from z = 0 to z = 2) − (Area from z = 0 to z = 1)
= Φ(2) − Φ(1) = 0.4772 − 0.3413 = 0.1359

Problem 2: Let σ = 10, μ = 50, then find P(40 ≤ x ≤ 60)

Solution: When x = 40, z = (40 − 50)/10 = −1,
and similarly for x = 60, z = 1.
Then,
P(40 ≤ x ≤ 60) = Area from z = −1 to z = 1
= 2 (Area from z = 0 to z = 1) (By symmetry)
= 2 Φ(1) = 2 × 0.3413 = 0.6826

Problem 3: Let σ = 10, μ = 50, then find P(30 ≤ x ≤ 40)

Solution: When x = 30, z = −2 and for x = 40, z = −1.
P(30 ≤ x ≤ 40) = Area from z = −2 to z = −1
= (Area from z = 1 to z = 2) (By symmetry)
= 0.1359 (Same as Problem 1)
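
The three areas can also be checked without tables. The sketch below is a Python illustration that uses the error function from the standard library to evaluate the normal cumulative probability; the helper names `normal_cdf` and `normal_interval` are assumptions made for this example.

```python
from math import erf, sqrt

def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal variate with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma  # standard normal variate
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def normal_interval(x1, x2, mu, sigma):
    """P(x1 <= X <= x2) as a difference of cumulative probabilities."""
    return normal_cdf(x2, mu, sigma) - normal_cdf(x1, mu, sigma)

mu, sigma = 50, 10
p1 = normal_interval(60, 70, mu, sigma)  # Problem 1
p2 = normal_interval(40, 60, mu, sigma)  # Problem 2
p3 = normal_interval(30, 40, mu, sigma)  # Problem 3
print(round(p1, 4), round(p2, 4), round(p3, 4))  # 0.1359 0.6827 0.1359
```

The value for Problem 2 may differ from a table-based answer in the fourth decimal place, because table entries are themselves rounded to four decimals before being doubled.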

Problems of Normal Distribution

Question 1: If the heights of 300 students are normally distributed with mean 64.5
inches and standard deviation 3.3 inches, how many students have heights
i) Less than 5 feet, i.e., 60 inches,
ii) Between 5 feet and 5 feet 9 inches.
Also find the height below which 99% of the students lie.

Question 2: The distribution of weekly wages for 500 workers in a factory is
approximately normal with a mean and standard deviation of Rs. 75 and Rs. 15. Find
the number of workers who receive weekly wages
i) More than Rs. 90,
ii) Less than Rs. 45.

Question 3: In a normal distribution, 31% of the items are under 45 and 8% are over 64.
Find the mean and standard deviation of the distribution.

Test of Hypothesis

Hypothesis Tests: An Introduction

A hypothesis test is used to test a certain theory or belief about a population parameter, such as
the mean, variance or proportion.

Types of Hypothesis
There are two types of hypothesis:
Null Hypothesis
Alternative Hypothesis

A null hypothesis is a claim or statement about a population parameter that is assumed to be true
until it is declared false. An alternative hypothesis is a claim about a population parameter that
will be true if the null hypothesis is false.

Hypothesis Building

Example 1: In the past a machine has produced washers having a mean thickness of 0.050 inch.
To determine whether the machine is in proper working order a sample of 10 washers is chosen
for which the mean thickness is 0.053 inch and the standard deviation is 0.003 inch. Test the
hypothesis that the machine is in proper working order.

The null hypothesis states that a given claim about a population parameter is true. In the given
example the population parameter is the mean. The claim to be tested, that the machine is in
proper working order, may or may not be true. The claim is true when the mean is 0.050 inch.
Therefore the null hypothesis is that the mean is 0.050 inch, and the alternative hypothesis is
that the mean is not 0.050 inch. We write this as

H0: µ = 0.050

H1 or Ha: µ ≠ 0.050
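
For Example 1 the sample is small (n = 10) and only the sample standard deviation is given, so a t statistic with n − 1 degrees of freedom is the natural choice. The Python sketch below computes it. Example 1 does not state a significance level, so the 5% level used here is an assumption, and the critical value 2.262 for 9 degrees of freedom is hard-coded from standard t tables.

```python
from math import sqrt

def one_sample_t(sample_mean, mu_0, s, n):
    """t statistic for H0: mu = mu_0 when the population variance is unknown."""
    return (sample_mean - mu_0) / (s / sqrt(n))

# Data of Example 1 (washer thickness, in inches)
t = one_sample_t(sample_mean=0.053, mu_0=0.050, s=0.003, n=10)

# Assumption: two-tailed test at the 5% level; t_{0.025, 9} = 2.262 from tables
T_CRITICAL = 2.262
reject_h0 = abs(t) > T_CRITICAL

print(round(t, 3), reject_h0)  # 3.162 True
```

Under these assumptions |t| exceeds the critical value, so the null hypothesis would be rejected; a different significance level would need a different table value.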

Example 2: Consider the percentage of people who prefer a specific seat when they fly. A survey
shows that 61% of adults prefer a window seat, 38% prefer an aisle seat, and only 1%
prefer the middle seat. These results are based on a sample of 806 adults. Suppose that the results
were true for the population of such adults at the time of the survey and that we want to check
whether the current percentage of all adults who prefer the window seat when they fly is still 61%.
Suppose we take a random sample of 1000 adults and ask them which seat is their favorite when
they fly. Of them, 640 say that they prefer a window seat.

Here the population parameter is the proportion, i.e., p.

H0: p = 0.61

H1 or Ha: p ≠ 0.61
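
The z statistic for a single proportion, z = (x − n·p0)/√(n·p0·(1 − p0)), can be evaluated for Example 2 as in the following Python sketch. The example does not state a significance level; the 5% level and the corresponding two-tailed critical value 1.96 are assumptions made for illustration.

```python
from math import sqrt

def proportion_z(x, n, p0):
    """z statistic for H0: p = p0, where x successes are observed in n trials."""
    return (x - n * p0) / sqrt(n * p0 * (1 - p0))

# Data of Example 2: 640 of 1000 adults prefer a window seat, p0 = 0.61
z = proportion_z(x=640, n=1000, p0=0.61)

# Assumption: two-tailed test at the 5% level; z_{0.025} = 1.96
Z_CRITICAL = 1.96
reject_h0 = abs(z) > Z_CRITICAL

print(round(z, 3), reject_h0)  # 1.945 False
```

With these assumed settings the statistic falls just inside the acceptance region, so the decision is sensitive to the chosen level of significance.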

Example 3: The lapping process which is used to grind certain silicon wafers to the proper
thickness is acceptable only if σ, the population standard deviation of the thickness of dice cut
from the wafers, is at most 0.50 mil. Use the 0.05 level of significance to test the claim, if the
thickness of 15 dice cut from such wafers have a standard deviation of 0.64 mil.

Here the population parameter is the standard deviation, i.e., σ.

H0: σ = 0.50

H1 or Ha: σ > 0.50
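
For a test on a single variance, the statistic is χ² = (n − 1)s²/σ0² with n − 1 degrees of freedom. The Python sketch below evaluates it for Example 3 at the stated 0.05 level; the right-tail critical value 23.685 for 14 degrees of freedom is hard-coded from standard chi-square tables rather than computed.

```python
def variance_chi_square(s, sigma_0, n):
    """Chi-square statistic for H0: sigma = sigma_0 from a sample of size n."""
    return (n - 1) * s**2 / sigma_0**2

# Data of Example 3: s = 0.64 mil from n = 15 dice, sigma_0 = 0.50 mil
chi_square = variance_chi_square(s=0.64, sigma_0=0.50, n=15)

# Right-tailed test at the 0.05 level; chi-square_{0.05, 14} = 23.685 from tables
CHI_SQUARE_CRITICAL = 23.685
reject_h0 = chi_square > CHI_SQUARE_CRITICAL

print(round(chi_square, 3), reject_h0)  # 22.938 False
```

The statistic is slightly below the tabulated critical value, so on this calculation the null hypothesis would not be rejected.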

Types of Errors
Type I Error: A Type I error occurs when a true null hypothesis is rejected. The probability of
committing this error is denoted by α; that is,

α = P (H0 is rejected | H0 is true). The value of α represents the significance level of the test.

Type II Error: A Type II error occurs when a false null hypothesis is not rejected. The probability
of committing this error is denoted by β; that is, β = P (H0 is not rejected | H0 is false). The
value of 1 − β is called the power of the test. It represents the probability of not making a
Type II error.

Four Possible Outcomes for a Test of Hypothesis

                         Actual Situation
Decision                 H0 is True              H0 is False
Do not reject H0         Correct decision        Type II or β error
Reject H0                Type I or α error       Correct decision

Tails of the Test: A two-tailed test has rejection regions in both tails, a left-tailed test has the
rejection region in the left tail, and a right-tailed test has the rejection region in the right tail of the
distribution curve.

A few more questions

Example 2: The mayor of a large city claims that the average net worth of families living in this
city is at least $300,000. A random sample of 25 families selected from this city produced a mean
net worth of $288,000. Assume that the net worths of all families in this city have a normal
distribution with the population standard deviation of $80,000. Using the 2.5% significance level,
can you conclude that the mayor’s claim is false?

Example 3: A potential buyer of fluorescent lamps bought 50 lamps of each of two brands, viz.,
National lamps and Indian lamps. Upon testing these lamps, he found that the brand National
had a mean life of 1,282 hours with a standard deviation of 80 hours, whereas the brand Indian had
a mean life of 1,208 hours with a standard deviation of 94 hours. At the 5% level of significance,
can the buyer conclude that both brands have the same mean life?

Example 4: To compare two kinds of bumper guards, six of each kind were mounted on a certain
make of compact car. Then each car was run into a concrete wall at 5 miles per hour, and the
following are the costs of the repairs.

Bumper Guard 1: 127 168 143 165 122 139

Bumper Guard 2: 154 135 132 171 153 149

Test at 0.01 level of significance whether the difference between the means of these two samples
is significant.
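
One way to approach Example 4 is the pooled two-sample t test for small samples with unknown variances. The Python sketch below computes that statistic for the bumper guard data. It assumes that both populations are normal with equal variances, which the example does not state, and the two-tailed critical value 3.169 for 10 degrees of freedom at the 0.01 level is hard-coded from standard t tables.

```python
from math import sqrt
from statistics import mean, variance

guard_1 = [127, 168, 143, 165, 122, 139]
guard_2 = [154, 135, 132, 171, 153, 149]

def pooled_t(sample_1, sample_2, delta=0.0):
    """Pooled t statistic for H0: mu1 - mu2 = delta (equal variances assumed)."""
    n1, n2 = len(sample_1), len(sample_2)
    # Pooled variance from the two sample variances (divisor n - 1 in each)
    sp2 = ((n1 - 1) * variance(sample_1) + (n2 - 1) * variance(sample_2)) / (n1 + n2 - 2)
    return (mean(sample_1) - mean(sample_2) - delta) / sqrt(sp2 * (1 / n1 + 1 / n2))

t = pooled_t(guard_1, guard_2)

# Two-tailed test at the 0.01 level with 6 + 6 - 2 = 10 degrees of freedom
T_CRITICAL = 3.169
reject_h0 = abs(t) > T_CRITICAL

print(round(t, 3), reject_h0)  # -0.515 False
```

Under these assumptions the difference between the sample means (144 and 149) is well within the acceptance region.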

Process to test the hypothesis

Formulation
• Build the hypothesis
• Define the level of significance

Methodology
• Identify the test statistic
• Identify the criteria for rejection of the null hypothesis

Analysis and Decision
• Calculation of the test statistic
• Decision based on the criteria of rejection

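Parametric Tests

Problem type: Single mean
Hypothesis: $H_0: \mu = \mu_0$; $H_a$: $\mu > \mu_0$ (right tailed); $\mu < \mu_0$ (left tailed); $\mu \ne \mu_0$ (two tailed)
Test statistic, if the population variance is known: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Criteria of rejection: $z > z_{\alpha}$ (right tailed); $z < -z_{\alpha}$ (left tailed); $|z| > z_{\alpha/2}$ (two tailed)
Test statistic, if the population variance is unknown and n < 30: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, where $s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n - 1}$
Criteria of rejection: $t > t_{\alpha,\,n-1}$ (right tailed); $t < -t_{\alpha,\,n-1}$ (left tailed); $|t| > t_{\alpha/2,\,n-1}$ (two tailed)

Problem type: Two means
Hypothesis: $H_0: \mu_1 - \mu_2 = \delta$; $H_a$: $\mu_1 - \mu_2 > \delta$ (right tailed); $\mu_1 - \mu_2 < \delta$ (left tailed); $\mu_1 - \mu_2 \ne \delta$ (two tailed)
Test statistic, if the population variances are known: $z = \dfrac{\bar{x}_1 - \bar{x}_2 - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
Criteria of rejection: $z > z_{\alpha}$ (right tailed); $z < -z_{\alpha}$ (left tailed); $|z| > z_{\alpha/2}$ (two tailed)
Test statistic, if the population variances are unknown and n < 30: $t = \dfrac{\bar{x}_1 - \bar{x}_2 - \delta}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, where $s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$, and $s_1^2$ and $s_2^2$ are the sample variances of sample 1 and sample 2 respectively. Here, $s_p^2 = \dfrac{s_1^2 + s_2^2}{2}$ if $n_1 = n_2$.
Criteria of rejection: $t > t_{\alpha,\,n_1+n_2-2}$ (right tailed); $t < -t_{\alpha,\,n_1+n_2-2}$ (left tailed); $|t| > t_{\alpha/2,\,n_1+n_2-2}$ (two tailed)

Problem type: Several means (ANOVA)
Hypothesis: $H_0: \mu_1 = \mu_2 = \dots = \mu_k$ ($\alpha_i = 0$ for all i); $H_a$: not all of $\mu_1, \mu_2, \dots, \mu_k$ are equal ($\alpha_i \ne 0$ for at least one i)
Test statistic: $F = \dfrac{MS(Tr)}{MSE}$, where
$MS(Tr) = \dfrac{SS(Tr)}{k - 1}$, $\quad MSE = \dfrac{SSE}{k(n - 1)}$,
$SST = \sum_{i=1}^{k} \sum_{j=1}^{n} x_{ij}^2 - \dfrac{1}{N} T_{\cdot\cdot}^2$,
$SS(Tr) = \dfrac{1}{n} \sum_{i=1}^{k} T_{i\cdot}^2 - \dfrac{1}{N} T_{\cdot\cdot}^2$,
$SSE = SST - SS(Tr)$,
$T_{i\cdot}$ is the total of the ith row and $T_{\cdot\cdot}$ is the grand total.
Criteria of rejection: $F > F_{\alpha,\,k-1,\,N-k}$

Problem type: Single proportion
Hypothesis: $H_0: p = p_0$; $H_a$: $p > p_0$ (right tailed); $p < p_0$ (left tailed); $p \ne p_0$ (two tailed)
Test statistic: $z = \dfrac{x - n p_0}{\sqrt{n p_0 (1 - p_0)}}$
Criteria of rejection: $z > z_{\alpha}$ (right tailed); $z < -z_{\alpha}$ (left tailed); $|z| > z_{\alpha/2}$ (two tailed)

Problem type: Two proportions
Hypothesis: $H_0: p_1 = p_2$; $H_a$: $p_1 > p_2$ (right tailed); $p_1 < p_2$ (left tailed); $p_1 \ne p_2$ (two tailed)
Test statistic: $z = \dfrac{p_1 - p_2}{\sqrt{p q \left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $p = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$ and $q = 1 - p$
Criteria of rejection: $z > z_{\alpha}$ (right tailed); $z < -z_{\alpha}$ (left tailed); $|z| > z_{\alpha/2}$ (two tailed)

Problem type: Several proportions
Hypothesis: $H_0: p_{i1} = p_{i2} = p_{i3} = \dots = p_{ic}$; $H_a$: $p_{i1}, p_{i2}, p_{i3}, \dots, p_{ic}$ are not all equal
Test statistic: $\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \dfrac{(O_{ij} - e_{ij})^2}{e_{ij}}$, where $O_{ij}$ and $e_{ij}$ are the observed and expected frequencies respectively, and $e_{ij} = \dfrac{(i\text{th row total}) \times (j\text{th column total})}{\text{Grand total}}$
Criteria of rejection: $\chi^2 > \chi^2_{\alpha,\,(r-1)(c-1)}$, where r is the number of rows and c is the number of columns

Problem type: Single variance
Hypothesis: $H_0: \sigma^2 = \sigma_0^2$; $H_a$: $\sigma^2 > \sigma_0^2$ (right tailed); $\sigma^2 < \sigma_0^2$ (left tailed); $\sigma^2 \ne \sigma_0^2$ (two tailed)
Test statistic: $\chi^2 = \dfrac{(n - 1)s^2}{\sigma_0^2}$
Criteria of rejection: $\chi^2 > \chi^2_{\alpha,\,n-1}$ (right tailed); $\chi^2 < \chi^2_{1-\alpha,\,n-1}$ (left tailed); $\chi^2 > \chi^2_{\alpha/2,\,n-1}$ or $\chi^2 < \chi^2_{1-\alpha/2,\,n-1}$ (two tailed)

Problem type: Two variances
Hypothesis: $H_0: \sigma_1^2 = \sigma_2^2$; $H_a$: $\sigma_1^2 > \sigma_2^2$ (right tailed); $\sigma_1^2 < \sigma_2^2$ (left tailed); $\sigma_1^2 \ne \sigma_2^2$ (two tailed)
Test statistic: $F = \dfrac{S_i^2}{S_j^2}$, with the numerator chosen as shown in the criteria
Criteria of rejection: $F = \dfrac{S_1^2}{S_2^2} > F_{\alpha,\,n_1-1,\,n_2-1}$ (right tailed); $F = \dfrac{S_2^2}{S_1^2} > F_{\alpha,\,n_2-1,\,n_1-1}$ (left tailed); $F = \dfrac{S_i^2}{S_j^2} > F_{\alpha/2,\,n_i-1,\,n_j-1}$, where $S_i^2$ is the larger sample variance (two tailed)

Problem type: Data fitness to a probability distribution
Test statistic: $\chi^2 = \sum_{i} \dfrac{(O_i - e_i)^2}{e_i}$, where $O_i$ and $e_i$ are the observed and expected frequencies respectively
Criteria of rejection: $\chi^2 > \chi^2_{\alpha,\,\nu}$, where $\nu$ is the degree of freedom
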
Practice Questions
1. A sample of 1000 students from a university was taken and their average weight was found
to be 112 pounds with a S.D. of 20 pounds. Could the mean weight of students in the
population be 120 pounds?
2. The heights of college students in a city are normally distributed with a S.D. of 6 cms. A
sample of 1000 students has mean height 158 cms. Test the hypothesis that the mean height
of college students in the city is 160 cms.
3. Intelligence tests on two groups of boys and girls gave the following results. Examine if the
difference is significant.

         Mean    S.D.    Size
Girls    70      10      70
Boys     75      11      100
4. Two random samples of sizes 1000 and 2000 of farms gave an average yield of 2000 kg and
2050 kg respectively. The variance of wheat farms in the country may be taken as 100 kg.
Examine whether the two samples differ significantly in yield?
5. A sample of size of 600 persons selected at random from a large city shows that the percentage
of males in the sample is 53. It is believed that the ratio of males to the total population in the
city is 0.5. Test whether the belief is confirmed by the observation.
6. A random sample of 400 men and 600 women were asked whether they would like to have a
school near their residence. 200 men and 325 women were in favor of the proposal. Test the
hypothesis that the proportions of men and women in favor of the proposal are the same at the 5%
level of significance.
7. Use the 0.05 level of significance to test the null hypothesis that σ = 0.022 inch for the
diameters of certain wire rope against the alternative hypothesis that σ ≠ 0.022 inch, given
that a random sample of size 18 yielded s² = 0.000324.
8. From the following two sample values find out whether they have come from the same
population:
Sample 1: 17 27 18 25 27 29 27 23 17
Sample 2: 16 16 20 16 20 17 15 21

9. The results of polls conducted 2 weeks and 4 weeks before an election are shown in the
following table:

                   Two weeks before    Four weeks before    Total
For Candidate A    99                  112                  211
For Candidate B    101                 88                   189
Total              200                 200                  400

10. Fit a Poisson distribution to the following data and test the goodness of fit.
x    0      1     2     3    4
f    112    73    30    4    1
11. As part of the investigation of the collapse of the roof of a building, a testing laboratory is
given all the available bolts that connected the steel structure at 3 different positions on the
roof. The forces required to shear each of these bolts (coded values) are as follows:
Position 1 90 82 79 98 83 91
Position 2 105 89 93 104 89 95 86
Position 3 83 89 80 94
Perform an analysis of variance to test at the 0.05 level of significance whether the
differences among the sample means at the 3 positions are significant.

Correlation and Regression

Recall the concepts of correlation and regression that you studied in 10+2!
So far, you have studied functions of a single random variable. Let us extend this now!
If X and Y are two independent variables, they are uncorrelated, but the converse is not true. Can
you explain why?
Hint: $\text{Covariance}(X, Y) = E[(X - \bar{X})(Y - \bar{Y})]$
If X and Y are independent: $E(XY) = E(X) \cdot E(Y)$
Coefficient of correlation $r = \dfrac{\text{Covariance}(X, Y)}{\sqrt{\text{Variance}(X)}\,\sqrt{\text{Variance}(Y)}}$
Sample variances (sums of squares and products) for two random variables:
$S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y}) = \sum x_i y_i - \dfrac{1}{n} \left(\sum x_i\right)\left(\sum y_i\right)$
$S_{xx} = \sum (x_i - \bar{x})^2 = \sum x_i^2 - \dfrac{1}{n} \left(\sum x_i\right)^2$
$S_{yy} = \sum (y_i - \bar{y})^2 = \sum y_i^2 - \dfrac{1}{n} \left(\sum y_i\right)^2$
Coefficient of correlation $r = \dfrac{S_{xy}}{\sqrt{S_{xx} S_{yy}}}$
Now recall the least squares method of fitting a straight line to data, which leads to the concept
of the line of regression.
If y = a + bx is the line of regression of y on x and the given data is $(x_i, y_i)$, i = 1, 2, …, n,
can we write:
$b = \dfrac{\text{Covariance}(X, Y)}{\sigma_x^2}$; $\quad a = \bar{y} - \dfrac{\text{Covariance}(X, Y)}{\sigma_x^2}\, \bar{x}$
Line of regression of y on x would be: $y - \bar{y} = \dfrac{\text{Covariance}(X, Y)}{\sigma_x^2}\, (x - \bar{x})$
What would be the line of regression of x on y?
$\beta_{YX} = \dfrac{\text{Covariance}(X, Y)}{\sigma_x^2}$ and $\beta_{XY} = \dfrac{\text{Covariance}(X, Y)}{\sigma_y^2}$ are called the regression coefficients of y on x and x on y,
respectively.
a) Write these regression coefficients in terms of sample variances.
b) Establish the relationship between these regression coefficients and the correlation
coefficient.
c) Find the permissible range of the correlation coefficient. What is the significance of the
intervals within this range?
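
The quantities above can be computed directly from data. The Python sketch below is a minimal illustration: the function name `regression_summary` and the small data set are hypothetical, chosen so that the points lie exactly on the line y = 1 + 2x. It uses the fact that Covariance(X, Y)/σx² equals Sxy/Sxx, since the common factor 1/n cancels.

```python
from math import sqrt

def regression_summary(xs, ys):
    """Return (r, a, b): correlation coefficient and the line y = a + b*x of y on x."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    # Sums of squares and products in their computational form
    s_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    s_xx = sum(x * x for x in xs) - sum_x**2 / n
    s_yy = sum(y * y for y in ys) - sum_y**2 / n
    r = s_xy / sqrt(s_xx * s_yy)   # coefficient of correlation
    b = s_xy / s_xx                # regression coefficient of y on x
    a = sum_y / n - b * sum_x / n  # intercept: y-bar minus b times x-bar
    return r, a, b

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
r, a, b = regression_summary(xs, ys)
print(r, a, b)
```

For perfectly linear data the sketch returns r = 1 with intercept 1 and slope 2; real data would give |r| < 1.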
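Hypothesis testing

Problem type: Population correlation coefficient
Hypothesis: $H_0: \rho = 0$; $H_a$: $\rho > 0$ (right tailed); $\rho < 0$ (left tailed); $\rho \ne 0$ (two tailed)
Test statistic (for small samples): $\Delta r = \dfrac{r \sqrt{n - 2}}{\sqrt{1 - r^2}}$
Criteria of rejection: $\Delta r > t_{\alpha,\,n-2}$ (right tailed); $\Delta r < -t_{\alpha,\,n-2}$ (left tailed); $\Delta r > t_{\alpha/2,\,n-2}$ or $\Delta r < -t_{\alpha/2,\,n-2}$ (two tailed)
Test statistic (general method): $Z = \dfrac{\sqrt{n - 3}}{2} \cdot \ln \dfrac{1 + r}{1 - r}$, where $r = \dfrac{S_{xy}}{\sqrt{S_{xx} \cdot S_{yy}}}$
Criteria of rejection: $z > z_{\alpha}$ (right tailed); $z < -z_{\alpha}$ (left tailed); $|z| > z_{\alpha/2}$ (two tailed)

Problem type: Population regression coefficient
Hypothesis: $H_0: \beta = \beta_0$; $H_a$: $\beta > \beta_0$ (right tailed); $\beta < \beta_0$ (left tailed); $\beta \ne \beta_0$ (two tailed)
Test statistic: $t = \dfrac{\hat{\beta} - \beta_0}{\hat{\sigma}} \sqrt{\dfrac{(n - 2) S_{xx}}{n}}$, where $\hat{\beta} = \dfrac{S_{xy}}{S_{xx}}$ and $\hat{\sigma} = \sqrt{\dfrac{1}{n} \left(S_{yy} - \hat{\beta} \cdot S_{xy}\right)}$
Criteria of rejection: $t > t_{\alpha,\,n-2}$ (right tailed); $t < -t_{\alpha,\,n-2}$ (left tailed); $t > t_{\alpha/2,\,n-2}$ or $t < -t_{\alpha/2,\,n-2}$ (two tailed)
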
Practice Questions
1. The table below shows the number of absences, x, in a Calculus course and the
final exam grade, y, for 7 students. Find the correlation coefficient and interpret
your result. Find the regression line of y on x.

x 1 0 2 6 4 3 3
y 85 80 70 55 90 90 95
2. The time x in years that an employee spent at a company and the employee’s
hourly pay, y, for 5 employees are listed in the table below. Calculate and interpret
the correlation coefficient r. Find the line of regression of y on x.

x 5 3 4 10 15
y 25 20 21 35 38
3. Consider x as the number of hours that 10 persons studied for a French test and y
as their scores on the test. Given Σx = 100, Σy = 564, Σx² = 1376, Σy² = 36562 and
Σxy = 6945, find the equation of the least squares line that approximates the regression
of the test scores on the number of hours studied. Also find the correlation
coefficient between these two.

No ratings yet
Suzan S. Waryoba Vs Shija Dalawa
11 pages
LMS Content IVth Sem Module 3 PDF
No ratings yet
LMS Content IVth Sem Module 3 PDF
16 pages
Quantitative Techniques
No ratings yet
Quantitative Techniques
129 pages
CELTA Precourse Task New Key
No ratings yet
CELTA Precourse Task New Key
16 pages
Practice Sheet - Stats
No ratings yet
Practice Sheet - Stats
5 pages
Assignment 3
No ratings yet
Assignment 3
4 pages
BS UNIT 2 Note # 3
No ratings yet
BS UNIT 2 Note # 3
7 pages
Probability in A Nutshell
No ratings yet
Probability in A Nutshell
3 pages
Probability Theory: Much Inspired by The Presentation of Kren and Samuelsson
No ratings yet
Probability Theory: Much Inspired by The Presentation of Kren and Samuelsson
27 pages
Bank Nifty Weekly FnO Hedging Strategy
No ratings yet
Bank Nifty Weekly FnO Hedging Strategy
5 pages
Elgenfunction Expansions Associated with Second Order Differential Equations
From Everand
Elgenfunction Expansions Associated with Second Order Differential Equations
E. C. Titchmarsh
No ratings yet
Differential Forms
From Everand
Differential Forms
Henri Cartan
5/5 (2)
Worked Examples in Mathematics for Scientists and Engineers
From Everand
Worked Examples in Mathematics for Scientists and Engineers
G. Stephenson
No ratings yet

Statistics Handout

Uploaded by

Statistics Handout

Uploaded by

Statistics

Bayes’ Rule of Probability

Solve the following question

Bayes’ Rule in decision making

Bayes’ Rule in game theory

Discrete Probability Distributions

One way to get exactly 2 heads: HHT

Total probability of two heads out of 3 tosses of a coin = 3 × (1/2)^2 × (1/2) = 3/8
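The count of 3 favourable sequences (HHT, HTH, THH) and the resulting probability can be checked with a short sketch. It assumes a fair coin with independent tosses; the function names are illustrative and not part of the handout.

```python
from fractions import Fraction
from itertools import product
from math import comb


def prob_heads_by_enumeration(n_tosses, n_heads):
    """List every equally likely H/T sequence and count those with n_heads heads."""
    outcomes = list(product("HT", repeat=n_tosses))
    favourable = [o for o in outcomes if o.count("H") == n_heads]
    return Fraction(len(favourable), len(outcomes))


def prob_heads_by_formula(n_tosses, n_heads, p=Fraction(1, 2)):
    """Binomial probability C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n_tosses, n_heads) * p**n_heads * (1 - p) ** (n_tosses - n_heads)


print(prob_heads_by_enumeration(3, 2))  # 3/8
print(prob_heads_by_formula(3, 2))      # 3/8
```

Both routes give 3/8: enumeration counts sequences directly, while the formula multiplies the number of arrangements by the probability of any one of them.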

Find the mean and variance of Binomial Distribution.

Continuous Probability Distributions

(ii)  f ( x)dx  1 if -  X   and  f ( x)dx  1 if a  X  b.

The probability that x falls in the interval a  x  b is then written as

Characteristics of continuous probability distribution:

(iii) Moments: The rth moment about x = a is given by

and the rth moment about the mean x is given by

Here, if r = 2, this is equal to variance.

(v) Median Md is given by

(vi) Mode is the value of the variate for which

i.e., for which the probability f(x) is maximum.

Since the equation of normal curve is

We may summarize all its properties as below:

Properties of the Normal Distribution:-

1. Area under the normal curve is Unity.

Suppose we wish to obtain the probability of x lying between x1 and x2 then

where z1 and z2 are the same as defined above.

Problems related to area under the curve:-

Problem 2: Let σ = 10, μ = 50, then find P(40 ≤ x ≤ 60)

Problem 3: Let σ = 10, μ = 50, then find P(30 ≤ x ≤ 40)
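As a rough numerical check on the two area problems above, the sketch below evaluates the normal probabilities with the error function instead of a z-table. It assumes the intended parameters are mean 50 and standard deviation 10 (so the limits fall at z = ±1 and z = −2, −1); the function names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(x, mean, sd):
    """P(X <= x) for a normal variate, written in terms of the error function."""
    z = (x - mean) / sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def prob_between(lower, upper, mean, sd):
    """Area under the normal curve between lower and upper."""
    return normal_cdf(upper, mean, sd) - normal_cdf(lower, mean, sd)


# Assumed parameters: mean = 50, standard deviation = 10.
print(round(prob_between(40, 60, 50, 10), 4))  # z from -1 to 1, about 0.6827
print(round(prob_between(30, 40, 50, 10), 4))  # z from -2 to -1, about 0.1359
```

These values agree with the table-based route: P(−1 ≤ z ≤ 1) ≈ 0.6826 and P(−2 ≤ z ≤ −1) ≈ 0.4772 − 0.3413 = 0.1359.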

Problems of Normal Distribution

Question 2: The distribution of weekly wages for 500 workers in a factory is

Hypothesis Tests: An Introduction

Here population parameter is proportion i.e., p.

Here population parameter is standard deviation, i.e., σ.

H1 or Ha: σ > 0.50

Four Possible Outcomes for a Test of Hypothesis

Do not reject H0 Correct decision Type II or β error

Bumper Guard 1: 127 168 143 165 122 139

Bumper Guard 2: 154 135 132 171 153 149

Process to test the hypothesis

• Identify the test statistic

• Calculation of Test Statistic

Problem type Hypothesis Test statistic Criteria of rejection

Mean S.D. Size

Correlation and Regression

Population H0: ρ = 0; Test Statistic: t; Criteria of rejection: t > t(α, n−2) (Right Tailed);
