Aps QB Final

Uploaded by

s.sivaram2002

JERUSALEM COLLEGE OF ENGINEERING

(An Autonomous Institution, Affiliated to Anna University, Chennai)

QUESTION BANK
Subject code : JMA1405 Subject Name : Applied Probability & Statistics
Year / Semester : II / 4 Regulation : 2021
Branch : B.E/B.Tech. AI&ML, AI&DS
CS & CS&BS

Knowledge Level as per Bloom’s Taxonomy:


K1-Remember
K2-Understand
K3-Apply
K4-Analyse
K5-Evaluate
K6-Create
Unit I
One – Dimensional Random Variables

Course Objective – Cob1: To provide basic concepts of discrete, continuous random variables and
standard distributions.

Course Outcome – CO1: To understand random variables and use standard distributions in solving
real time problems.

Bloom’s Taxonomy Level: K1, K2, K3, K4


Part A

S.No. Question BTL*


1. [K1] Define discrete and continuous random variable.
If X is a random variable that takes a finite or countably infinite number of values, then X is said to be a discrete random variable.
Example: Marks obtained in a test.
If X is a random variable that takes all values in an interval (i.e., an uncountably infinite number of values), then X is said to be a continuous random variable.
Example: Duration of a telephone conversation.

2. [K2] Find the constant $A$ if $X$ is a continuous random variable with density function $f(x) = Ax^2$, $0 < x < 1$.
Solution:
W.K.T. $\int_{-\infty}^{\infty} f(x)\,dx = 1$
$\int_0^1 A x^2\,dx = 1$
$A \int_0^1 x^2\,dx = 1$
$A \left[\frac{x^3}{3}\right]_0^1 = 1$
$A \left[\frac{1}{3} - 0\right] = 1$
$A = 3$
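The normalisation step above can be checked numerically. The following is only a sketch: the helper names `normalising_constant` and `midpoint_integral` are illustrative and not part of the question bank, and the midpoint rule is just one convenient way to approximate the integral.

```python
from fractions import Fraction

def normalising_constant(power: int) -> Fraction:
    """Return A such that A * x**power integrates to 1 over (0, 1)."""
    # The integral of x**power over (0, 1) is 1 / (power + 1) for power >= 0.
    integral = Fraction(1, power + 1)
    return 1 / integral

def midpoint_integral(f, a, b, steps=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

A = normalising_constant(2)  # density f(x) = A x^2 on (0, 1)
total = midpoint_integral(lambda x: float(A) * x**2, 0.0, 1.0)
print(A, round(total, 6))
```

With $A = 3$ the numerical integral of the density comes out as 1 to within rounding, which agrees with the hand calculation.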

3. [K1] Define the $r^{th}$ moment of a random variable $X$ of any type.
If $X$ is a random variable of any type, $A$ is a constant and $r$ is a non-negative integer, then the $r^{th}$ moment of $X$ about $A$ is defined as follows:
$$\mu_r' \ (\text{about } X = A) = E[(X - A)^r] = \begin{cases} \sum_i (x_i - A)^r P(x_i), & \text{if } X \text{ is a discrete r.v.} \\ \int_{-\infty}^{\infty} (x - A)^r f(x)\,dx, & \text{if } X \text{ is a continuous r.v.} \end{cases}$$

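4. [K2] Find the standard deviation of a random variable $X$ whose moment generating function is $M_X(t) = \frac{3}{3-t}$.
Solution:
Given $M_X(t) = \frac{3}{3-t}$
$\Rightarrow M_X(t) = \frac{3}{3\left[1 - \frac{t}{3}\right]} = \left[1 - \frac{t}{3}\right]^{-1}$
$= 1 + \frac{t}{3} + \left[\frac{t}{3}\right]^2 + \dots + \left[\frac{t}{3}\right]^r + \dots$
$M_X(t) = 1 + \frac{t}{3} + \left[\frac{t}{3}\right]^2 + \dots + \frac{t^r}{r!} \cdot \frac{r!}{3^r} + \dots$
The coefficient of $\frac{t^r}{r!}$ in the above expression is $\frac{r!}{3^r}$
$\Rightarrow E[X^r] = \frac{r!}{3^r}$
Putting $r = 1$: $E[X] = \frac{1!}{3^1} = \frac{1}{3}$
Putting $r = 2$: $E[X^2] = \frac{2!}{3^2} = \frac{2}{9}$
$V[X] = E[X^2] - \{E[X]\}^2 = \frac{2}{9} - \left(\frac{1}{3}\right)^2 = \frac{1}{9}$
$\Rightarrow \sigma^2 = \frac{1}{9}$
$\Rightarrow \sigma = \frac{1}{3}$ [since the standard deviation is the positive square root of the variance]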
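As a quick check of the moment calculation above, the sketch below evaluates $E[X^r] = r!/3^r$ with exact fractions. The function name `raw_moment` is an illustrative choice, not something defined in the question bank.

```python
from fractions import Fraction
from math import factorial, sqrt

def raw_moment(r: int) -> Fraction:
    """E[X^r] = r!/3^r, the coefficient of t^r/r! in the series of 3/(3 - t)."""
    return Fraction(factorial(r), 3**r)

mean = raw_moment(1)                   # E[X]
variance = raw_moment(2) - mean**2     # E[X^2] - (E[X])^2
std_dev = sqrt(variance)               # positive square root of the variance
print(mean, variance, std_dev)
```

The exact values are mean 1/3 and variance 1/9, so the standard deviation is 1/3, matching the worked solution.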

5. [K2] Comment on the following: "mean of a binomial distribution is 3 and variance is 4".
Solution:
$X \sim B(n, p)$
Given Mean = 3 $\Rightarrow np = 3$
Variance = 4 $\Rightarrow npq = 4$
$\frac{npq}{np} = \frac{4}{3} \Rightarrow q = \frac{4}{3} > 1$ [not possible]
Since a probability cannot exceed 1, the given statement is false.
6. [K2] A random variable $X$ has a uniform distribution over $(-3, 3)$. Compute $P(|X| < 2)$.
Solution:
Given $X$ follows a uniform distribution in $(-3, 3)$,
$f(x) = \frac{1}{b-a} = \frac{1}{3-(-3)} = \frac{1}{6}$
$\Rightarrow f(x) = \frac{1}{6}$; $-3 < x < 3$
$P(|X| < 2) = P(-2 < X < 2)$
$= \int_{-2}^{2} f(x)\,dx$
$= \int_{-2}^{2} \frac{1}{6}\,dx = \frac{1}{6}[x]_{-2}^{2}$
$= \frac{4}{6} = \frac{2}{3}$
$P(|X| < 2) = \frac{2}{3}$
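7. [K2] If $X$ and $Y$ are two Poisson random variables such that $P[X = 1] = P[X = 2]$ and $P[Y = 2] = P[Y = 3]$, then find $V[X - 2Y]$.
Solution:
Let $X \sim P(\lambda_1)$ and $Y \sim P(\lambda_2)$.
W.K.T. for a Poisson random variable with parameter $\lambda$, $P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$
Given $P[X = 1] = P[X = 2]$
$\frac{e^{-\lambda_1}\lambda_1^1}{1!} = \frac{e^{-\lambda_1}\lambda_1^2}{2!}$
$2\lambda_1 = \lambda_1^2$
$\lambda_1^2 - 2\lambda_1 = 0$
$\Rightarrow \lambda_1 = 2$ (since $\lambda_1 > 0$)
$\Rightarrow$ Mean of $X$ = Var($X$) = $\lambda_1 = 2$
Given $P[Y = 2] = P[Y = 3]$
$\frac{e^{-\lambda_2}\lambda_2^2}{2!} = \frac{e^{-\lambda_2}\lambda_2^3}{3!}$
$\frac{1}{2} = \frac{\lambda_2}{3 \cdot 2}$
$\lambda_2 = 3$
$\Rightarrow$ Mean of $Y$ = Var($Y$) = $\lambda_2 = 3$
Assuming $X$ and $Y$ are independent,
$Var(X - 2Y) = Var(X) + 4\,Var(Y) = 2 + 4(3) = 14$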
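The two Poisson parameters found above can be verified directly from the probability mass function. This is a minimal sketch: the names `poisson_pmf`, `lam_x` and `lam_y` are illustrative, and the variance formula assumes that X and Y are independent, which the question does not state explicitly.

```python
from math import exp, factorial, isclose

def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**k / factorial(k)

lam_x, lam_y = 2.0, 3.0  # parameters obtained in the solution

# Check the two conditions given in the question.
same_x = isclose(poisson_pmf(1, lam_x), poisson_pmf(2, lam_x))
same_y = isclose(poisson_pmf(2, lam_y), poisson_pmf(3, lam_y))

# Var(X - 2Y) = Var(X) + 4 Var(Y) under the independence assumption,
# and the variance of a Poisson variable equals its mean.
var_x_minus_2y = lam_x + 4 * lam_y
print(same_x, same_y, var_x_minus_2y)
```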

8. [K1] Define exponential random variable and write its memoryless property.
A continuous random variable $X$ is said to follow an exponential distribution with parameter $\lambda > 0$ if its probability density function is
$f(x) = \lambda e^{-\lambda x}$; $x \ge 0$
Memoryless property:
If $X$ is exponentially distributed, then $P(X > s + t \mid X > s) = P(X > t)$ for any $s, t > 0$.
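The memoryless property can be illustrated numerically using the survival function $P(X > t) = e^{-\lambda t}$. In the sketch below the values of `lam`, `s` and `t` are arbitrary illustrative choices and the function name `survival` is hypothetical.

```python
from math import exp, isclose

def survival(t: float, lam: float) -> float:
    """P(X > t) for an exponential random variable with rate lam."""
    return exp(-lam * t)

lam, s, t = 0.5, 2.0, 3.0  # illustrative values only

# P(X > s + t | X > s) = P(X > s + t) / P(X > s), because {X > s + t} implies {X > s}.
conditional = survival(s + t, lam) / survival(s, lam)
unconditional = survival(t, lam)
print(conditional, unconditional, isclose(conditional, unconditional))
```

The conditional and unconditional probabilities agree, which is what the property states: the time already elapsed does not change the distribution of the remaining time.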

9. [K1] Write any two properties of normal distribution.
• The curve is bell shaped and symmetrical about the line $x = \mu$.
• Mean, median and mode of the distribution coincide.
• The X-axis is an asymptote to the curve.

10. [K2] If $X$ is a normal random variable with mean 3 and variance 4, find $P[X > -1]$.
Solution:
Given: Mean $= \mu = 3$
Variance $= \sigma^2 = 4 \Rightarrow \sigma = 2$
Let $Z = \frac{X - \mu}{\sigma} = \frac{X - 3}{2}$
$P(X > -1) = P\left(\frac{X - \mu}{\sigma} > \frac{-1 - \mu}{\sigma}\right)$
$= P\left(Z > \frac{-1 - 3}{2}\right)$
$= P(Z > -2)$
$= 0.5 + P(0 < Z < 2)$
$= 0.5 + 0.4772$ (table value)
$= 0.9772$
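The table lookup above can be cross-checked with Python's standard library, which provides the normal distribution as `statistics.NormalDist`. This is only a sketch of the same calculation; the variable names are illustrative.

```python
from statistics import NormalDist

# X ~ N(mean 3, variance 4), so the standard deviation is 2.
X = NormalDist(mu=3, sigma=2)

# P(X > -1) = 1 - P(X <= -1)
p = 1 - X.cdf(-1)
print(round(p, 4))
```

The result agrees with the table-based answer to four decimal places.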
Part B

S.No. Question BTL*

1. [K4]
(i) If $X$ is a random variable with the following probability distribution, find $a$, $P[X < 3]$, $P[X \ge 3]$, $P[0 < X < 5]$.

x      0    1    2    3    4    5     6     7     8
P(x)   a    3a   5a   7a   9a   11a   13a   15a   17a

Also find the smallest value of $x$ such that $P[X \le x] > 0.5$ and the cumulative distribution function of $X$.

(ii) If $X$ is a random variable with the following probability distribution, find $k$, $P[X < 2]$, $P[X \ge 2]$, $P[-2 < X < 2]$.

x      -2     -1    0      1     2      3
P(x)   1/10   k     2/10   2k    3/10   3k

Also find the smallest value of $x$ such that $P[X \le x] \ge 0.36$ and the cumulative distribution function of $X$.

(iii) Let $X$ be a random variable such that $P[X = -2] = P[X = -1] = P[X = 1] = P[X = 2]$ and $P[X < 0] = P[X = 0] = P[X > 0]$. Determine the probability function and distribution function of $X$.

(iv) Let $X$ be a random variable with probability function $P[X = x] = \frac{1}{2^x}$, $x = 1, 2, 3, 4, \dots$. Find $P[X \text{ is even}]$, $P[X \text{ is odd}]$, $P[X \ge 4]$, $P[X \text{ is a multiple of } 3]$.

2. [K4]
(i) The mileage $X$, in thousands of miles, of a particular type of car tyre is a random variable with probability density function $f(x) = \frac{1}{20} e^{-x/20}$, $x \ge 0$. Find the probability that the tyre will have a mileage of (a) at most 10000 miles (b) anywhere between 16000 and 24000 miles (c) at least 30000 miles.

(ii) The quantity of bread, in hundreds of kilograms, that a bakery is able to sell in a day is a random variable with density function
$$f(x) = \begin{cases} Ax, & 0 \le x < 5 \\ A(10 - x), & 5 \le x < 10 \\ 0, & \text{otherwise} \end{cases}$$
Find (a) $A$ (b) the probability that in a day sales are more than 500 kg (c) the probability that in a day sales are less than 500 kg (d) the probability that in a day sales are between 250 and 750 kg.

(iii) $X$ is a random variable with probability density function
$$f(x) = \begin{cases} kx, & 0 < x < 2 \\ 2k, & 2 < x < 4 \\ 6k - kx, & 4 < x < 6 \end{cases}$$
Find the cumulative distribution function of $X$.

(iv) If a random variable $X$ has probability density function given by
$$f(x) = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$
find (a) $P\left[X < \frac{1}{2}\right]$ (b) $P\left[\frac{1}{4} < X < \frac{1}{2}\right]$ (c) $P\left[X > \frac{3}{4} \mid X > \frac{1}{2}\right]$ (d) $P\left[X < \frac{3}{4} \mid X > \frac{1}{2}\right]$ (e) cdf of $X$.
3. [K4]
(i) Let $X$ be a random variable such that

x      -1    0     1     2
P(x)   0.3   0.1   0.4   0.2

Find (a) $E[X]$ (b) $E[X^2]$ (c) $V[X]$ (d) $E[2X + 1]$ (e) $V[2X + 1]$.

(ii) A continuous random variable $X$ is distributed over $[0, 1]$ with density function $ax^2 + bx$, where $a, b$ are constants. Find $a, b$ if the mean of $X$ is 0.5.

(iii) For a continuous random variable $X$ the relative frequency density is $f(x) = y_0\, x(2 - x)$, $0 \le x \le 2$. Find the $r^{th}$ moment about the origin and hence the mean and variance of $X$.

4. [K4] A random variable $X$ has probability function $P[X = x] = \frac{1}{2^x}$, $x = 1, 2, 3, 4, \dots$. Find its MGF and hence its mean and variance.
5. [K4]
(i) Define binomial random variable. Find its moment generating function and hence its mean and variance.
(ii) A man takes a step forward with probability 0.4 and a step backward with probability 0.6. Find the probability that at the end of 11 steps he is one step away from the starting point.
(iii) Out of 800 families with 4 children each, how many families would be expected to have (a) 2 boys, 2 girls (b) at least 1 boy (c) at most 2 girls (d) children of both genders?

(iv) Assume that half the population in a locality are vegetarians. If 100 investigators each take 10 individuals to see whether they are vegetarians, how many investigators are expected to report that 3 or fewer were vegetarians?
6. [K4]
(i) Define Poisson random variable. Find its moment generating function and hence its mean and variance.
(ii) If $X$ is a Poisson random variable such that $P[X = 2] = 9P[X = 4] + 90P[X = 6]$, find its mean.

(iii) In a certain factory manufacturing blades, there is a chance of 1 in 500 for a blade to be defective. The blades are sold in packets of 10. Using the Poisson distribution, find the approximate number of packets in a consignment of 10000 packets that will have (a) no defective (b) 1 defective (c) 2 defectives.

(iv) It is known from past experience that in a certain factory there are on average 4 industrial accidents in a year. Find the probability that in a given year there will be (a) less than 3 accidents (b) no accidents (c) more than half the average number of accidents.
7.
(i) Define uniform distribution. Find its moment generating function and $r^{th}$ moment about the origin, and hence find its mean and variance.

(ii) If $X$ is uniformly distributed over $(0, 10)$, find (a) $P(X < 2)$ (b) $P(X > 8)$ (c) $P(3 < X < 9)$.
8. [K4]
(i) Define exponential random variable. Find its moment generating function and hence its mean and variance.
(ii) State and prove the memoryless property of the exponential random variable.

(iii) The amount of time that a watch will run without having to be reset is a random variable having an exponential distribution with mean 120 days. Find the probability that such a watch will (a) have to be reset in less than 24 days (b) not have to be reset in at least 180 days.

(iv) The length of a shower on a tropical island during the rainy season has an exponential distribution with parameter 2, time being measured in minutes. What is the probability that a shower will last more than 3 minutes? If a shower has already lasted for 2 minutes, what is the probability that it will last for at least one more minute?

(v) The time required to repair a machine is exponentially distributed with parameter $\frac{1}{2}$, time being measured in hours. What is the probability that the repair exceeds 2 hours? What is the probability that the repair takes at least 10 hours given that the repair time exceeds 9 hours?

9. [K4]
(i) If $X$ is normally distributed with mean 12 and standard deviation 4, find (a) $P[X \le 20]$ (b) $P[X \ge 20]$ (c) $P[0 \le X \le 12]$.
(ii) The lifetime of a certain type of electronic device has a mean of 300 hours and a standard deviation of 25 hours. Assuming a normal distribution, (a) find the probability that a device has a lifetime beyond 350 hours (b) what percentage of devices will have a lifetime between 220 and 260 hours?

(iii) As a result of tests conducted on electric bulbs manufactured by a company, it was found that the lifetime of a bulb is normally distributed with an average life of 2040 hours and a standard deviation of 60 hours. Estimate the number of bulbs, out of 20000 produced in a day, expected to have (a) more than 2150 hours lifetime (b) less than 1960 hours lifetime.

(iv) Assume that the mean height of soldiers is 68.22 inches with a standard deviation of 10.8 inches. How many soldiers in a regiment of 1000 would you expect to be over 6 feet tall, assuming a normal distribution?

(v) The marks obtained by students in an exam in Mathematics, Physics and Chemistry are normally distributed with means 65, 75, 80 and standard deviations 11, 7, 5 respectively. If a student is selected at random, what is the probability that he/she has secured a total of (a) 250 or more (b) 200 or less (c) between 250 and 280?
Unit II
Two – Dimensional Random Variables

Course Objective – Cob2: To introduce two dimensional random variables, correlation and
regression.

Course Outcome – CO2: To use joint density functions to perform correlation and regression analysis.

Bloom’s Taxonomy Level: K1, K2, K3, K4


Part A

S.No. Question BTL*


1. [K1] Define marginal probability function of $X$ and $Y$, if $(X, Y)$ is a two-dimensional discrete random variable.
Let $p_{ij} = P(X = x_i, Y = y_j)$ be the joint probability function of $(X, Y)$.
Marginal probability function of X:
The marginal probability function of $X$ is defined as
$P_X(x_i) = P(X = x_i) = \sum_j p_{ij} = p_{i\cdot}$. The set of values $\{x_i, p_{i\cdot}\}$ is called the marginal distribution of $X$.
Marginal probability function of Y:
The marginal probability function of $Y$ is defined as
$P_Y(y_j) = P(Y = y_j) = \sum_i p_{ij} = p_{\cdot j}$. The set of values $\{y_j, p_{\cdot j}\}$ is called the marginal distribution of $Y$.

2. [K1] Define conditional probability function of $X$ given $Y$ and $Y$ given $X$ if $(X, Y)$ is a two-dimensional discrete random variable.
The conditional probability function of $X$ given $Y$ is defined as
$P(X = x_i \mid Y = y_j) = \frac{P[X = x_i \cap Y = y_j]}{P[Y = y_j]}$
The conditional probability function of $Y$ given $X$ is defined as
$P(Y = y_j \mid X = x_i) = \frac{P[X = x_i \cap Y = y_j]}{P[X = x_i]}$
3. [K1] Define marginal probability function of $X$ and $Y$, if $(X, Y)$ is a two-dimensional continuous random variable.
If $(X, Y)$ is a two-dimensional continuous random variable with joint PDF $f_{XY}(x, y)$, then the marginal probability density functions of $X$ and $Y$ are given by
$f_X(x) = \int_{R_Y} f_{XY}(x, y)\,dy$
$f_Y(y) = \int_{R_X} f_{XY}(x, y)\,dx$
where $R_X$ and $R_Y$ denote the ranges of $X$ and $Y$.

4. [K1] Define conditional probability function of $X$ given $Y$ and $Y$ given $X$ if $(X, Y)$ is a two-dimensional continuous random variable.
If $(X, Y)$ is a two-dimensional continuous random variable with joint PDF $f_{XY}(x, y)$, then the conditional PDF of $X$ given $Y$ is $f_{X \mid Y}(x \mid y) = \frac{f_{XY}(x, y)}{f_Y(y)}$.
Similarly, the conditional PDF of $Y$ given $X$ is $f_{Y \mid X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)}$.
5. [K2] Find the constant $k$ if $(X, Y)$ is a two-dimensional random variable with joint probability $P(x, y) = k(2x + 3y)$, $x = 0, 1, 2$ and $y = 1, 2, 3$.
Solution:

        X = 0   X = 1   X = 2
Y = 1   3k      5k      7k
Y = 2   6k      8k      10k
Y = 3   9k      11k     13k

W.K.T. $\sum_i \sum_j P(x_i, y_j) = 1$
$3k + 5k + 7k + 6k + 8k + 10k + 9k + 11k + 13k = 1$
$72k = 1$
$k = \frac{1}{72}$
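The table-based calculation above amounts to summing the weights $2x + 3y$ over all nine cells. A short sketch with exact fractions follows; the names `weights` and `pmf` are illustrative only.

```python
from fractions import Fraction

# Unnormalised joint weights 2x + 3y for x in {0, 1, 2} and y in {1, 2, 3}.
weights = {(x, y): 2 * x + 3 * y for x in (0, 1, 2) for y in (1, 2, 3)}

# The probabilities k * (2x + 3y) must add up to 1, so k = 1 / (sum of weights).
k = Fraction(1, sum(weights.values()))
pmf = {cell: k * w for cell, w in weights.items()}
print(k, sum(pmf.values()))
```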

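6. [K2] Find the constant $k$ if $(X, Y)$ is a two-dimensional random variable with joint density $f(x, y) = kx^2 y$, $0 < x, y < 1$.
Solution:
To find $k$:
W.K.T. $\int_{R_Y} \int_{R_X} f(x, y)\,dx\,dy = 1$
$\int_0^1 \int_0^1 kx^2 y\,dx\,dy = 1$
$k \int_0^1 \left[\frac{x^3}{3}\right]_0^1 y\,dy = 1$
$k \int_0^1 \left(\frac{1}{3}\right) y\,dy = 1$
$\frac{k}{3} \left[\frac{y^2}{2}\right]_0^1 = 1$
$\frac{k}{3} \left[\frac{1}{2}\right] = 1$
$k = 6$
$\therefore f(x, y) = 6x^2 y$, $0 < x, y < 1$.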
7. [K1] Define covariance and show that $Cov(X, Y) = 0$ if $X$ and $Y$ are independent.
(i) If $X$ and $Y$ are any two random variables, then the covariance between them is given by $Cov(X, Y) = E[XY] - E[X]E[Y]$.
(ii) If $X$ and $Y$ are independent, then $Cov(X, Y) = 0$.
Proof:
Given $X$ and $Y$ are independent, $E[XY] = E[X]E[Y]$ → (∗)
W.K.T. $Cov(X, Y) = E[XY] - E[X]E[Y]$
$\Rightarrow Cov(X, Y) = 0$, using (∗)

8. [K1] What is correlation analysis?
Correlation analysis deals with the association between two or more variables. It helps to determine the degree of relationship between the variables.
9. [K1] Distinguish between (i) positive and negative correlation (ii) linear and nonlinear correlation.
(i) Positive and negative correlation:
If an increase (or decrease) in one variable results in an increase (or decrease) in the other variable, then the correlation is said to be positive.
If an increase (or decrease) in one variable results in a decrease (or increase) in the other variable, then the correlation is said to be negative.
(ii) Linear and nonlinear correlation:
If the ratio of change between the two variables is constant, then we say that there is a linear correlation. Otherwise it is said to be a nonlinear or curvilinear correlation.
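10. [K2] Show that the correlation coefficient is the geometric mean of the regression coefficients.
W.K.T. $b_{XY} = \rho_{XY} \cdot \frac{\sigma_X}{\sigma_Y}$; $b_{YX} = \rho_{XY} \cdot \frac{\sigma_Y}{\sigma_X}$
$b_{XY} \cdot b_{YX} = \rho_{XY} \cdot \frac{\sigma_X}{\sigma_Y} \cdot \rho_{XY} \cdot \frac{\sigma_Y}{\sigma_X}$
$b_{XY} \cdot b_{YX} = \rho_{XY}^2$
$\rho_{XY} = \pm\sqrt{b_{XY} \cdot b_{YX}}$
$\Rightarrow \rho_{XY}$ is the geometric mean of $b_{XY}$ and $b_{YX}$, taken with the common sign of the two regression coefficients.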
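The identity $b_{XY} \cdot b_{YX} = \rho_{XY}^2$ can be checked on sample data. The sketch below uses a small made-up data set (the values in `xs` and `ys` are purely illustrative, not taken from the question bank) and a hypothetical helper `regression_and_correlation` built from the usual sums of squares and products.

```python
from math import sqrt, copysign, isclose

def regression_and_correlation(xs, ys):
    """Return (b_xy, b_yx, r) computed from paired observations."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    b_yx = sxy / sxx           # regression coefficient of Y on X
    b_xy = sxy / syy           # regression coefficient of X on Y
    r = sxy / sqrt(sxx * syy)  # correlation coefficient
    return b_xy, b_yx, r

xs = [1, 2, 3, 4, 5]  # illustrative data
ys = [2, 4, 5, 4, 5]
b_xy, b_yx, r = regression_and_correlation(xs, ys)

# Geometric mean of the regression coefficients, carrying their common sign.
r_from_b = copysign(sqrt(b_xy * b_yx), b_yx)
print(b_xy, b_yx, r, r_from_b)
```

Both regression coefficients always share the sign of the covariance, which is why the sign of the square root is taken from either one of them.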

Part B

S.No. Question BTL*

1. [K4]
(i) Let $(X, Y)$ be a two-dimensional random variable with the following joint mass function.

        Y = 1   Y = 2   Y = 3   Y = 4   Y = 5   Y = 6
X = 0   0       0       1/32    2/32    2/32    3/32
X = 1   1/16    1/16    1/8     1/8     1/8     1/8
X = 2   1/32    1/32    1/64    1/64    0       2/64

Find (a) $P[X \le 1]$ (b) $P[Y \le 3]$ (c) $P[X \le 1, Y \le 3]$ (d) $P[X \le 1 \mid Y \le 3]$ (e) $P[Y \le 3 \mid X \le 1]$ (f) $P[X + Y \le 4]$ (g) marginal functions of $X$ and $Y$ (h) conditional distribution of $X$ given $Y$ and $Y$ given $X$.

(ii) A two-dimensional random variable $(X, Y)$ has a bivariate distribution given by $f(x, y) = \frac{x^2 + y}{32}$, $x = 0, 1, 2, 3$ and $y = 0, 1$. Find (a) $P[X \ge 2]$ (b) $P[Y \le 1]$ (c) $P[X \ge 2, Y \le 1]$ (d) $P[X \ge 2 \mid Y \le 1]$ (e) $P[Y \le 1 \mid X \ge 2]$ (f) $P[X + 2Y \le 3]$ (g) marginal distributions of $X$ and $Y$ (h) conditional distribution of $X$ given $Y = 1$ (i) conditional distribution of $Y$ given $X = 2$.

(iii) The joint probability mass function of $(X, Y)$ is given by $P(x, y) = k(2x + 3y)$, $x = 0, 1, 2$ and $y = 1, 2, 3$. Find (a) marginal distributions of $X$ and $Y$ (b) conditional distribution of $X$ given $Y = 3$ (c) conditional distribution of $Y$ given $X = 1$ (d) probability distribution of $X + Y$ (e) $P[X + Y > 3]$.
2. [K4]
(i) If $X$ and $Y$ are two random variables having joint density
$$f(x, y) = \begin{cases} k(6 - x - y), & 0 \le x \le 2,\ 2 \le y \le 4 \\ 0, & \text{elsewhere} \end{cases}$$
find (a) $k$ (b) $P[X < 1 \cap Y < 3]$ (c) $P[X < 1 \mid Y < 3]$ (d) $P[X + Y < 3]$.

(ii) Suppose that a two-dimensional continuous random variable $(X, Y)$ has joint probability density function given by $f(x, y) = 6x^2 y$, $0 < x, y < 1$. Verify that $\int_{R_Y} \int_{R_X} f(x, y)\,dx\,dy = 1$. Also find
(a) $P\left[0 < X < \frac{3}{4},\ \frac{1}{3} < Y < 2\right]$ (b) $P[X + Y < 1]$ (c) $P[X > Y]$ (d) $P[X < 1 \mid Y < 2]$.

3.
(i) The joint probability density function of $(X, Y)$ is given by $f(x, y) = 4xy\,e^{-(x^2 + y^2)}$, $x \ge 0$, $y \ge 0$. Are $X$ and $Y$ independent? Also find the conditional densities of $X$ given $Y$ and $Y$ given $X$.
(ii) If the joint density of a two-dimensional continuous random variable is given by $f(x, y) = kxy$, $0 \le y \le x \le 1$, find (a) $k$ (b) marginal densities of $X$ and $Y$ (c) conditional densities of $X$ given $Y$ and $Y$ given $X$ (d) $P\left[X < \frac{1}{2} \cap Y < \frac{1}{4}\right]$. Are $X$ and $Y$ independent?

4. [K4]
(i) Find the correlation coefficient for $(X, Y)$ whose joint distribution is as given below.

        X = -1   X = 1
Y = 0   1/8      3/8
Y = 1   2/8      2/8

(ii) The following table gives the joint distribution of a two-dimensional random variable $(X, Y)$. Find $E[X]$, $E[Y]$ and $E[XY]$. Are $X$ and $Y$ correlated? If so, find the correlation coefficient.

        X = 0   X = 1   X = 2   X = 3
Y = 2   1/8     1/8     1/8     1/8
Y = 3   1/16    1/8     0       1/16
Y = 4   1/16    0       1/8     1/16
5.
(i) The joint probability density function of a two-dimensional random variable $(X, Y)$ is given by
$$f(x, y) = \begin{cases} x + y, & 0 < x, y < 1 \\ 0, & \text{otherwise} \end{cases}$$
Find the correlation coefficient between $X$ and $Y$.
(ii) If the joint density function of a two-dimensional random variable $(X, Y)$ is given by
$$f(x, y) = \begin{cases} 2, & 0 < x < y < 1 \\ 0, & \text{otherwise} \end{cases}$$
find the correlation coefficient between $X$ and $Y$.
6. [K4]
(i) Given the joint probability density function of a two-dimensional random variable $(X, Y)$ to be $f(x, y) = x e^{-x(y+1)}$, $x \ge 0$, $y \ge 0$, find the regression curve of $Y$ on $X$.
(ii) The joint probability density function of a two-dimensional random variable $(X, Y)$ is $f(x, y) = \frac{x + y}{3}$, $0 < x < 1$, $0 < y < 2$. Find the regression equations of $X$ on $Y$ and $Y$ on $X$.

7. The variables $X$ and $Y$ have joint probability density function given by
$$f(x, y) = \begin{cases} 6(1 - x - y), & x \ge 0,\ y \ge 0,\ x + y < 1 \\ 0, & \text{otherwise} \end{cases}$$
Find the regression equations of $X$ on $Y$ and $Y$ on $X$. Hence find the correlation coefficient.
8. [K4]
(i) If $X_1, X_2, X_3, \dots, X_{100}$ are independent and identically distributed random variables with mean 2 and variance $\frac{1}{4}$, find $P[192 < S_{100} < 210]$, where $S_{100} = X_1 + X_2 + \dots + X_{100}$.

(ii) If $X_1, X_2, X_3, \dots, X_n$ are independent and identically distributed random variables with mean 2 and variance 2, find $P[120 \le S_n \le 160]$ given $n = 75$, using the central limit theorem.

(iii) If $X_1, X_2, X_3, X_4$ are independent uniform random variables with mean 500 and variance 833.33, find $P[1900 \le X_1 + X_2 + X_3 + X_4 \le 2100]$ using the central limit theorem.

(iv) If $X_i$, $i = 1, 2, 3, \dots, 50$ are independent Poisson random variables with mean and variance 0.03, find $P[S_n \ge 3]$ using the central limit theorem, given $S_n = X_1 + X_2 + \dots + X_{50}$.
Unit III
Testing of Hypothesis

Course Objective – Cob3: To acquaint students with statistical testing of hypothesis and its
applications.

Course Outcome – CO3: To apply hypothesis testing for making statistical inferences in large and
small sample real life problems
Bloom’s Taxonomy Level: K1, K2, K3, K4
Part A

S.No. Question BTL*


1. [K1] Define sample and population.
Sample: A finite subset of statistical individuals in a population is called a sample.
Population: The group of individuals under study is called the population. The population may be finite or infinite.

2. [K1] Define parameter and statistic.
Parameter: A numerical measure of a population is called a population parameter or simply a parameter.
Statistic: A numerical measure of the sample is called a sample statistic or simply a statistic.

3. [K1] What is test of hypothesis?
A test of hypothesis is a statistical method to determine whether there is enough evidence in sample data to support or reject a specific statement about a population parameter.

4. [K1] Define null and alternative hypothesis.
Null Hypothesis ($H_0$): The assertion that there is no significant difference or effect.
Alternative Hypothesis ($H_1$): The statement suggesting a significant difference or effect.

5. [K1] What do you mean by type I and type II errors?
Type I error: If we reject a hypothesis when it should be accepted, we say that a type I error has been committed.
Type II error: If we accept a hypothesis when it should be rejected, we say that a type II error has been committed.

6. [K1] Define two tailed test.
A two tailed test is one where the hypothesis about the population parameter is rejected for values of the sample statistic falling into either tail of the sampling distribution.

7. [K1] Define critical region.
A region of the sample space, corresponding to a test statistic, which leads to the rejection of $H_0$ (null hypothesis) is called the critical region or region of rejection. The region complementary to the critical region is called the region of acceptance.

8. [K1] What do you mean by level of significance?
The probability $\alpha$ (the probability of making a type I error) that a random value of the test statistic belongs to the critical region is known as the level of significance. In other words, the level of significance is the size of the type I error. The levels of significance usually employed in testing of hypothesis are 5% and 1%.
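For large-sample tests based on the standard normal distribution, the 5% and 1% levels of significance mentioned above correspond to fixed critical values of $Z$. The sketch below computes them with `statistics.NormalDist` from the Python standard library; the two helper function names are illustrative and not part of the question bank.

```python
from statistics import NormalDist

def two_tailed_critical_value(alpha: float) -> float:
    """Z value that leaves alpha/2 in each tail of the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha / 2)

def one_tailed_critical_value(alpha: float) -> float:
    """Z value that leaves alpha in the upper tail of the standard normal curve."""
    return NormalDist().inv_cdf(1 - alpha)

for alpha in (0.05, 0.01):
    print(alpha,
          round(two_tailed_critical_value(alpha), 3),
          round(one_tailed_critical_value(alpha), 3))
```

In a two tailed test $H_0$ is rejected when $|Z|$ exceeds the two tailed value, so the critical region consists of both tails beyond that point.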
9. [K1] List out the applications of the t-distribution.
1. To test the significant difference between the means of two independent samples.
2. To test the significant difference between the means of two dependent samples or paired observations.
3. To test the significance of the mean of a random sample.
4. To test the significance of an observed correlation coefficient.

10. [K1] Explain the various uses of the Chi-square test.
1. Test of goodness of fit
2. Test of independence of attributes
3. Test of homogeneity of independent estimates of the population correlation coefficient

Part B

S.No. Question BTL*

1. [K4]
(i) A sample of 100 students was taken from a large population. The mean height of students in this sample was 160 cm. Can it be reasonably regarded that the mean height in the population is 165 cm? Assume that the population standard deviation is 10 cm. Use a 1% level of significance.
(ii) A manufacturer claims that the mean breaking strength of safety belts for air passengers produced in his factory is 1275 kg. A sample of 100 belts was tested and the mean breaking strength was found to be 1258 kg with a standard deviation of 90 kg. Test the manufacturer's claim at 1% level of significance.
2. [K4]
(i) A sample of heights of 6400 Englishmen has a mean of 170 cm and standard deviation of 6.4 cm, while a sample of heights of 1600 Americans has a mean of 172 cm with standard deviation 6.3 cm. Does the data indicate that Americans are taller than Englishmen at 5% level of significance?
(ii) Intelligence tests were given to two groups of boys and girls chosen from the same college and the following results were obtained.

        Size   Mean   Standard deviation
Boys    100    73     10
Girls   60     75     8

From the data given, can we conclude that girls are more intelligent than boys at 10% level of significance?
3. [K4] A sample of 100 bulbs of brand A had a mean lifetime of 1200 hrs and standard deviation of 70 hrs, while another sample of 120 bulbs of brand B gave a mean lifetime of 1150 hrs and standard deviation of 85 hrs. Can we conclude that brand A bulbs are superior to brand B bulbs at 5% level of significance, assuming
(A) 2 different manufacturers
(B) Same manufacturer
4. [K4]
(i) A random sample of 16 values from a normal population showed a mean of 103.75 cm, and the sum of squares of deviations from this mean is 843.75 cm². Is the assumption of a mean of 108.75 cm for the population reasonable at 1% level of significance?
(ii) The annual rainfall at a certain place is normally distributed with mean 30 cm. If the rainfall during the past 8 years is 31.1, 30.7, 24.3, 28.1, 27.9, 32.2, 25.4 and 29.1 cm, can we conclude that the average rainfall over the past 8 years is less than the normal rainfall?
5. [K4]
(i) Two independent samples contained the following values of marks obtained out of 25 in a test.

Sample 1   19   17   15   21   16   18   16   14
Sample 2   15   14   15   19   15   18   16

Is the difference between the sample means significant at 5% level of significance?
(ii) The following table shows the biological values of protein from cow's milk and buffalo's milk taken from a dairy farm.

Cow's milk       1.82   2.02   1.88   1.61   1.81   1.54
Buffalo's milk   2.00   1.83   1.86   2.03   2.19   1.88

From the above data, can we conclude that buffalo's milk is more protein rich than cow's milk at 5% level of significance?
6. [K4]
(i) The following data gives the marks obtained by 11 students in 2 tests. Test 1 is conducted at the beginning of the year and test 2 is conducted at the end of the year after intensive coaching. Does the data indicate that the students have benefitted from coaching?

Test 1   19   23   16   24   17   18   20   18   21   19   20
Test 2   17   24   20   24   20   22   20   20   18   22   19

(ii) 10 children were tested for memory skills and the scores are 8, 6, 5, 7, 6, 8, 7, 4, 5 and 6. A memory coach was hired and the children were trained by him for 3 months. In a test conducted after training the scores were 10, 6, 7, 8, 6, 9, 7, 6, 7 and 7. Was hiring the memory coach productive?

(iii) The following data represents marks obtained by 12 students in 2 tests, one before the football world cup and the other after the world cup. Does the data indicate that the football tournament has affected students' performance?

Test 1   63   70   70   81   54   29   21   38   32   50   70   80
Test 2   55   60   65   75   49   25   18   30   35   55   61   72

7. [K4]
(i) Two independent samples of 8 and 7 items respectively had the following values.

Sample 1   9    11   13   11   15   9    12   14
Sample 2   10   12   10   14   9    8    10

Do the population variances differ significantly at 5% level of significance?

(ii) Two random samples from two populations gave the following observations.

Sample I    20   16   26   27   23   22   18   24   25   19
Sample II   17   23   32   25   22   24   28   18   31   33   20   27

Test whether the two populations have the same variance at 1% level of significance.
8. [K4]
(i) The following data gives the number of accidents that occur at a junction during the working days of a week. Test whether the accidents are uniformly distributed over the week.

Days                  Monday   Tuesday   Wednesday   Thursday   Friday   Saturday
Number of accidents   15       19        13          12         16       15

(ii) Theory predicts that the proportion of rice cultivation in 4 zones A, B, C and D is 9:3:3:1. In an experiment, out of 1600 tons of rice, 882, 313, 287 and 118 tons were from the 4 zones respectively. Does the experiment support the theory?
9. [K4]
(i) A total of 3759 individuals were interviewed in a public opinion survey on the proposal of conducting central and state elections simultaneously. Of them, 1872 were men and the rest were women. 2257 individuals were in favour of the proposal and 917 were opposed to it. 243 men were undecided and 442 women were opposed to the proposal. Is there any association between gender and opinion?

(ii) Out of 1660 candidates who appeared for a competitive examination, 422 were successful. Out of these 256 attended coaching classes and 150 of them came out successful. Examine whether coaching and success are associated.

(iii) A survey of music listeners' preferences under various age groups is given below. Is music preference influenced by age?

                   Age group
Music preference   19 – 25   26 – 35   Above 36
Carnatic music     80        60        90
Film music         210       325       44
Folk music         16        45        135

10. [K4] The following data shows defective articles produced by 4 machines. Does the data indicate a significant difference in the performance of the machines?

Machine                      A    B    C    D
Production time (in hours)   1    1    2    3
Number of defectives         12   30   63   98
Unit IV

ESTIMATION THEORY

Course Objective – Cob4: To develop the ability to apply the concepts of Estimation theory in
problems.

Course Outcome – CO4: To use theory of estimation in practical applications and problem solving.

Bloom’s Taxonomy Level: K1, K2, K3, K4

Part A

S.No. Question BTL*


1. [K1] What are the characteristics of a good estimator?
Sol:
Consistency, unbiasedness, efficiency and sufficiency are the characteristics of a good estimator.

2. [K1] Define consistency of an estimator.
Sol:
$\hat{\theta}_n$ is a consistent estimator of $\theta$ if $\lim_{n \to \infty} P[|\hat{\theta}_n - \theta| > \epsilon] = 0$ for all $\epsilon > 0$.

3. [K2] Prove that in sampling from a $N(\mu, \sigma^2)$ population, the sample mean is a consistent estimator of the population mean $\mu$.
Sol:
In sampling from a normal population $N(\mu, \sigma^2)$, the sample mean $\bar{x}$ is normally distributed as $N(\mu, \sigma^2/n)$.
$\therefore E[\bar{x}] = \mu$; $V[\bar{x}] = \sigma^2/n$
$\therefore$ as $n \to \infty$, $E[\bar{x}] = \mu$ and $V[\bar{x}] \to 0$.
Hence $\bar{x}$ is a consistent estimator of $\mu$.

4. [K1] Define unbiasedness.
Sol:
An estimator $\hat{\theta}_n$ based on a sample of size $n$ is said to be an unbiased estimator of $\theta$ if $E(\hat{\theta}_n) = \theta$ for all $\theta$.

5. [K2] Show that if $T$ is an unbiased estimator of the parameter $\theta$, then $\lambda_1 T + \lambda_2$ is an unbiased estimator of $\lambda_1 \theta + \lambda_2$, where $\lambda_1, \lambda_2$ are known constants.
Sol:
Given $E[T] = \theta$
$E[\lambda_1 T + \lambda_2] = \lambda_1 E[T] + \lambda_2 = \lambda_1 \theta + \lambda_2$.
$\therefore \lambda_1 T + \lambda_2$ is an unbiased estimator of $\lambda_1 \theta + \lambda_2$.

6. [K2] If $T$ is an unbiased estimator for $\theta$, show that $T^2$ is a biased estimator for $\theta^2$.
Sol:
Given $T$ is an unbiased estimator for $\theta$, we have $E(T) = \theta$.
We know that $V(T) = E(T^2) - [E(T)]^2 = E(T^2) - \theta^2$
$\Rightarrow E(T^2) = \theta^2 + V(T)$
But $V(T) > 0$ (unless $T$ is a constant) $\Rightarrow E(T^2) \ne \theta^2$
$\therefore T^2$ is a biased estimator for $\theta^2$.
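The relation $E(T^2) = \theta^2 + V(T)$ can be illustrated with an exact calculation. The sketch below assumes, purely for illustration, that $T$ is the mean of $n$ independent Bernoulli($\theta$) observations; the function name `expectation_of_power` and the values of `theta` and `n` are hypothetical choices.

```python
from fractions import Fraction
from math import comb

def expectation_of_power(theta: Fraction, n: int, power: int) -> Fraction:
    """E[T^power], where T is the mean of n independent Bernoulli(theta) values."""
    # The number of successes k is Binomial(n, theta) and T = k / n.
    return sum(
        comb(n, k) * theta**k * (1 - theta) ** (n - k) * Fraction(k, n) ** power
        for k in range(n + 1)
    )

theta, n = Fraction(1, 3), 4       # illustrative values
e_t = expectation_of_power(theta, n, 1)
e_t2 = expectation_of_power(theta, n, 2)
bias = e_t2 - theta**2             # equals V(T) = theta (1 - theta) / n
print(e_t, e_t2, bias)
```

Here $T$ is unbiased for $\theta$, yet $E(T^2)$ exceeds $\theta^2$ by exactly the variance of $T$, which is the bias shown in the proof.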
7. [K1] Define efficiency of an estimator.
Sol:
An estimator $\hat{\theta}_1$ is said to be more efficient than $\hat{\theta}_2$ if $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased and $V[\hat{\theta}_1] < V[\hat{\theta}_2]$.

8. [K1] Define sufficiency of an estimator.
Sol:
An estimator is said to be a sufficient estimator for the parameter if it contains all the information in the sample regarding the parameter.

9. [K1] What is Maximum Likelihood Estimation?
Sol:
Let $x_1, x_2, \dots, x_n$ be a random sample of size $n$ from a population with density function $f(x; \theta)$. The likelihood function of the sample is
$L(x_1, x_2, \dots, x_n; \theta) = f(x_1; \theta) f(x_2; \theta) \dots f(x_n; \theta)$
The maximum likelihood estimator of $\theta$ is the value of $\theta$ that maximizes the likelihood function, i.e. the probability of the observed sample.
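To make the definition concrete, the sketch below maximises a likelihood numerically. It assumes, for illustration only, a Bernoulli model $f(x; \theta) = \theta^x (1 - \theta)^{1 - x}$ and a made-up sample; the names `log_likelihood`, `sample` and `grid` are hypothetical, and a simple grid search stands in for solving $\frac{d}{d\theta} \log L = 0$ analytically.

```python
from math import log

def log_likelihood(theta: float, sample) -> float:
    """log L(theta) for independent Bernoulli(theta) observations (0 or 1)."""
    return sum(log(theta) if x == 1 else log(1 - theta) for x in sample)

sample = [1, 0, 1, 1, 0, 1, 1, 0]          # illustrative data: 5 ones out of 8
grid = [i / 1000 for i in range(1, 1000)]  # candidate values of theta in (0, 1)

# The MLE is the candidate value that maximises the (log-)likelihood.
theta_hat = max(grid, key=lambda th: log_likelihood(th, sample))
print(theta_hat, sum(sample) / len(sample))
```

For this model the grid search lands on the sample proportion, which is the value obtained when the likelihood equation is solved by hand.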

10. [K1] Write any two properties of Maximum Likelihood Estimation.
Sol:
1. MLEs are consistent.
2. The maximum likelihood estimator may be biased. However, such a bias may be removed by multiplying by an appropriate constant.

Part B

S.No. Question BTL*

1. [K4] Show that $\frac{\sum x_i \left(\sum x_i - 1\right)}{n(n-1)}$ is an unbiased estimate of $\theta^2$, for the sample $x_1, x_2, \dots, x_n$ drawn on $X$ which takes the values 1 or 0 with respective probabilities $\theta$ and $1 - \theta$.

2. [K4] $X_1, X_2, X_3$ is a random sample of size 3 from a population with mean $\mu$ and variance $\sigma^2$. If $T_1 = X_1 + X_2 - X_3$, $T_2 = 2X_1 + 3X_3 - 4X_2$ and $T_3 = \frac{\lambda X_1 + X_2 + X_3}{3}$ are estimators of $\mu$, show that $T_1$ and $T_2$ are unbiased and find $\lambda$ so that $T_3$ is unbiased.

3. [K4] Let $X_1, X_2, X_3$ and $X_4$ be independent random variables such that $E(X_i) = \mu$ and $Var(X_i) = \sigma^2$ for $i = 1, 2, 3, 4$.
(i) If $Y = \frac{X_1 + X_2 + 2X_3 + X_4}{4}$, examine whether $Y$ is an unbiased estimator of $\mu$.
(ii) If $T = \frac{a(X_1 + X_2 + X_3)}{3}$, find $a$ such that $T$ is an unbiased estimator.
(iii) Which is the best estimator?

4. [K4] Suppose $X$ and $Y$ are independent random variables with the same unknown mean $\mu$ and the same variance 36, and $T = aX + bY$ is an estimator of $\mu$.
(i) Show that $T$ is an unbiased estimator of $\mu$ if $a + b = 1$.
(ii) If $a = \frac{1}{3}$, $b = \frac{2}{3}$, find the variance of $T$.
(iii) If $a = \frac{1}{2}$, $b = \frac{1}{2}$, find the variance of $T$.
If 𝑥1 , 𝑥2 , … … . 𝑥𝑛 is a random sample from a normal population 𝑁(µ, 1) ,
1
5 Show that 𝑡 = ∑𝑛𝑖=1 𝑥𝑖2 is an unbiased estimator of 𝜇 2 + 1
𝑛

6. Find MLE for β where 𝑓(𝑥; 𝛽) is an exponential distribution. K4


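7. Find MLE for $\theta$ where $f(x; \theta) = \theta^{x} (1 - \theta)^{1-x}$, $x = 0, 1$; $0 \le \theta \le 1$. K4
8. Find MLE for $\theta$ where $f(x; \theta) = \dfrac{1}{\theta} \exp\left(-\dfrac{x}{\theta}\right)$, $x \ge 0$, $\theta > 0$. K4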
9. Suppose that $X$ has a Weibull distribution with pdf
$f(x; \alpha) = (\lambda\alpha)\, x^{\alpha-1} e^{-\lambda x^{\alpha}}$, $x > 0$. Assuming $\alpha$ is known, find the MLE of $\lambda$
based on a sample of size $n$. K4
10. Suppose that the random sample has the normal distribution $N(\mu, \sigma^2)$. Find the MLE
(i) for $\mu$, when $\sigma^2 = 1$ and
(ii) for $\sigma^2$, when $\mu = 0$. K4
(16)
11. Estimate $\alpha$ and $\beta$ in the case of Pearson's Type III distribution by the method
of moments, given $f(x; \alpha, \beta) = \dfrac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha-1} e^{-\beta x}$, $0 \le x < \infty$. K4
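12. A random variable $X$ takes the values 0, 1, 2 with respective probabilities
$\dfrac{\theta}{4N} + \dfrac{1}{2}\left(1 - \dfrac{\theta}{N}\right)$, $\dfrac{\theta}{2N} + \dfrac{\alpha}{2}\left(1 - \dfrac{\theta}{N}\right)$ and $\dfrac{\theta}{4N} + \dfrac{1-\alpha}{2}\left(1 - \dfrac{\theta}{N}\right)$,
where $N$ is a known number and $\alpha$, $\theta$ are unknown parameters. If 75
independent observations on $X$ yielded the values 0, 1, 2 with frequencies
27, 38, 10 respectively, estimate $\theta$ and $\alpha$ by the method of moments. K4
(16)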
13. (i) Fit a straight line to the following data: K3

X    1     2     3     4     6     8
Y    2.4   3     3.6   4     5     6

(ii) Fit a straight line to the following data:

X    0     1     2     3     4
Y    1     1.8   3.3   4.5   6.3
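A minimal Python sketch of the least-squares straight-line fit $y = a + bx$ is given below. It solves the two normal equations $\sum y = na + b\sum x$ and $\sum xy = a\sum x + b\sum x^2$ in closed form; the function name `fit_line` and the variable names are hypothetical, and the data are the two tables from this question.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x using the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # slope
    a = (sy - b * sx) / n                          # intercept
    return a, b

# Data from part (i)
a1, b1 = fit_line([1, 2, 3, 4, 6, 8], [2.4, 3, 3.6, 4, 5, 6])
# Data from part (ii)
a2, b2 = fit_line([0, 1, 2, 3, 4], [1, 1.8, 3.3, 4.5, 6.3])

print(round(a1, 4), round(b1, 4))
print(round(a2, 4), round(b2, 4))
```

The returned pair is (intercept, slope), so the fitted line is read off as $y = a + bx$.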
14. (i) Fit a parabola of second degree to the following data: K4

X    0     1     2     3     4
Y    1     1.8   1.3   2.5   6.3

(ii) Fit a parabola of second degree to the following data:

X    1   2   3   4   5    6    7    8    9
Y    2   6   7   8   10   11   11   10   9
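The second-degree fit $y = a + bx + cx^2$ can be sketched in Python by building the three normal equations from the power sums $\sum x^k$ and $\sum x^k y$ and solving them by Gauss-Jordan elimination. This is one possible approach under those assumptions, using only the standard library; the function name `fit_parabola` is hypothetical, and the data are the two tables from this question.

```python
def fit_parabola(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2 via the three normal equations."""
    # Power sums S[k] = sum of x^k (k = 0..4) and T[k] = sum of x^k * y (k = 0..2)
    S = [sum(x**k for x in xs) for k in range(5)]
    T = [sum((x**k) * y for x, y in zip(xs, ys)) for k in range(3)]
    # Augmented matrix: row i is  S[i]*a + S[i+1]*b + S[i+2]*c = T[i]
    M = [[float(S[i + j]) for j in range(3)] + [float(T[i])] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(3):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [mr - factor * mc for mr, mc in zip(M[r], M[col])]
    return tuple(M[i][3] / M[i][i] for i in range(3))

# Data from part (i)
a1, b1, c1 = fit_parabola([0, 1, 2, 3, 4], [1, 1.8, 1.3, 2.5, 6.3])
# Data from part (ii)
a2, b2, c2 = fit_parabola(list(range(1, 10)), [2, 6, 7, 8, 10, 11, 11, 10, 9])

print(round(a1, 2), round(b1, 2), round(c1, 2))
print(round(a2, 2), round(b2, 2), round(c2, 2))
```

The returned triple is (a, b, c) in the order constant, linear and quadratic coefficient.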
Unit V

CORRELATION & REGRESSION

Course Objective – Cob5: To develop the ability to apply the concepts of Correlation & Regression
in problems.
Course Outcome – CO5: To understand the methods of finding the Correlation values between
variables and use Regression analysis for predicting values of variables.
Bloom’s Taxonomy Level: K1, K2, K3, K4

Part A

S.No. Question BTL*


1. Define Partial correlation and Partial regression. K1
Sol:
The correlation and regression between only two variates, eliminating the
linear effect of the other variables on them, are called partial correlation and partial
regression.
2. Define Multiple correlation and Multiple regression. K1
Sol:
The joint effect of a group of variables upon a variable not included in that
group is called multiple correlation and multiple regression.
3. Define Residual. K1
Sol:
$X_{1.23} = X_1 - b_{12.3} X_2 - b_{13.2} X_3$ is called the residual, where
$b_{12.3}$ and $b_{13.2}$ are the partial regression coefficients of $X_1$ on $X_2$ and of $X_1$ on $X_3$
respectively.
4. Define order of Residual. K1
Sol:
The order of the residual is defined as the number of its secondary subscripts.
5. Write any one property of Residual. K1
Sol:
1. The sum of the products of any residual of order zero with any other residual
of higher order is zero, provided the subscript of the former occurs among the
secondary subscripts of the latter.
2. The sum of the products of two residuals is zero if all the subscripts of the
one occur among the secondary subscripts of the other.
6. Write any two properties of multiple correlation coefficient. K1
Sol:
1. Multiple correlation coefficient is non-negative.
2. Multiple correlation coefficient is not less than any total correlation
coefficient.
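7. Prove that $R_{1.23} \ge r_{12}$, if $1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)$. K2
Sol:
Given $1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)$
$1 - R_{1.23}^2 \le (1 - r_{12}^2)$, since $0 \le r_{13.2}^2 \le 1$
$\Rightarrow -R_{1.23}^2 \le -r_{12}^2$
$\Rightarrow R_{1.23}^2 \ge r_{12}^2$
$\Rightarrow R_{1.23} \ge r_{12}$, since $R_{1.23}$ is non-negative.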
8. If $r_{12} = 0.86$, $r_{13} = 0.65$ and $r_{23} = 0.72$, find $r_{12.3}$. K2
Sol:
We know that $r_{12.3} = \dfrac{r_{12} - r_{13} r_{23}}{\sqrt{(1 - r_{13}^2)(1 - r_{23}^2)}} = \dfrac{0.86 - 0.65 \times 0.72}{\sqrt{(1 - (0.65)^2)(1 - (0.72)^2)}} = 0.74$
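The arithmetic in this solution can be checked with a few lines of Python that implement the first-order partial correlation formula used above. The function name `partial_corr` is a hypothetical helper introduced only for this check.

```python
import math

def partial_corr(r12, r13, r23):
    """First-order partial correlation r_{12.3} from the zero-order correlations."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13**2) * (1 - r23**2))

r12_3 = partial_corr(0.86, 0.65, 0.72)
print(round(r12_3, 2))  # 0.74
```

By symmetry of the formula, the same helper gives $r_{13.2}$ when called as `partial_corr(r13, r12, r23)`.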

9. For a trivariate distribution, give the relationship between the Multiple
correlation coefficient and the Total and Partial correlation coefficients. K2
Sol:
$1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)$,
where $R_{1.23}$ is the Multiple correlation coefficient of $X_1$ on $X_2$ and $X_3$,
$r_{12}$ is the Total correlation coefficient between $X_1$ and $X_2$,
$r_{13.2}$ is the partial correlation coefficient between $X_1$ and $X_3$.
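10. Prove that $R_{1.23}^2 = r_{12}^2 + r_{13}^2$, if $1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)$ and $r_{23} = 0$. K2
Sol:
Given $r_{23} = 0 \Rightarrow r_{32} = 0$ ... (1)
We know that $r_{13.2} = \dfrac{r_{13} - r_{12} r_{32}}{\sqrt{(1 - r_{12}^2)(1 - r_{32}^2)}} = \dfrac{r_{13} - r_{12}(0)}{\sqrt{(1 - r_{12}^2)(1 - 0)}}$, using (1)
$= \dfrac{r_{13}}{\sqrt{1 - r_{12}^2}}$ ... (2)
$1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)$
$\Rightarrow 1 - R_{1.23}^2 = (1 - r_{12}^2)\left[1 - \left(\dfrac{r_{13}}{\sqrt{1 - r_{12}^2}}\right)^2\right]$, using (2)
$\Rightarrow 1 - R_{1.23}^2 = 1 - r_{12}^2 - r_{13}^2$
$\Rightarrow -R_{1.23}^2 = -r_{12}^2 - r_{13}^2$
$\Rightarrow R_{1.23}^2 = r_{12}^2 + r_{13}^2$.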

Part B

S.No. Question BTL*

1. Derive the equation of the plane of regression $X_1 = b_{12.3} X_2 + b_{13.2} X_3$, where
$b_{12.3} = -\dfrac{\sigma_1 \omega_{12}}{\sigma_2 \omega_{11}}$, $b_{13.2} = -\dfrac{\sigma_1 \omega_{13}}{\sigma_3 \omega_{11}}$, using the Principle of Least Squares. K4

2. From the data relating to the yield of dry bark ($X_1$), height ($X_2$) and girth ($X_3$)
for 18 cinchona plants, the following correlation coefficients are obtained:
$r_{12} = 0.97$, $r_{13} = 0.82$ and $r_{23} = 0.62$. Find the partial correlation coefficient $r_{12.3}$ and the
Multiple correlation coefficient $R_{1.23}$. K4
3. Show that $1 - R_{1.23}^2 = (1 - r_{12}^2)(1 - r_{13.2}^2)$ and deduce that
(i) $X_1$ is uncorrelated with any of the other variables, i.e., $r_{12} = r_{13} = 0$, if $R_{1.23} = 0$.
(ii) $1 - R_{1.23}^2 = \dfrac{(1 - \rho)(1 + 2\rho)}{(1 + \rho)}$, provided all the coefficients of zero order are equal to $\rho$. K4
(16)

4. In a trivariate distribution, $\sigma_1 = 2$, $\sigma_2 = \sigma_3 = 3$, $r_{12} = 0.7$, $r_{23} = r_{31} = 0.5$. Find K4
(i) Multiple correlation coefficient $R_{1.23}$
(ii) partial regression coefficients $b_{12.3}$ and $b_{13.2}$
(iii) standard deviation $\sigma_{1.23}$
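The three quantities asked for here can be computed with a short Python sketch. The formulas below are the standard trivariate expressions in terms of zero-order correlations and are stated as assumptions, since this question bank does not list them: $R_{1.23}^2 = \dfrac{r_{12}^2 + r_{13}^2 - 2 r_{12} r_{13} r_{23}}{1 - r_{23}^2}$, $b_{12.3} = \dfrac{\sigma_1}{\sigma_2} \cdot \dfrac{r_{12} - r_{13} r_{23}}{1 - r_{23}^2}$ and $\sigma_{1.23} = \sigma_1 \sqrt{1 - R_{1.23}^2}$. The function names are hypothetical.

```python
import math

def multiple_corr(r12, r13, r23):
    """R_{1.23} from the zero-order correlations (standard trivariate formula)."""
    r_sq = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)
    return math.sqrt(r_sq)

def partial_regression(s1, s2, r12, r13, r23):
    """b_{12.3} = (s1 / s2) * (r12 - r13 * r23) / (1 - r23^2)."""
    return (s1 / s2) * (r12 - r13 * r23) / (1 - r23**2)

# Values given in the question
s1, s2, s3 = 2, 3, 3
r12, r23, r13 = 0.7, 0.5, 0.5

R = multiple_corr(r12, r13, r23)
b12_3 = partial_regression(s1, s2, r12, r13, r23)
b13_2 = partial_regression(s1, s3, r13, r12, r23)  # roles of X2 and X3 swapped
sigma_1_23 = s1 * math.sqrt(1 - R**2)

print(round(R, 4), round(b12_3, 4), round(b13_2, 4), round(sigma_1_23, 4))
```

Swapping the second and third arguments is what turns the $b_{12.3}$ helper into $b_{13.2}$, because the formula is symmetric in the roles of $X_2$ and $X_3$.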
5. Find the regression equation of $X_1$ on $X_2$ and $X_3$ given the following results: K4

Variable   Mean    S.D. ($\sigma$)   $r_{12}$   $r_{23}$   $r_{31}$
$X_1$      28.02   4.42              0.8        -          -
$X_2$      4.91    1.10              -          -0.56      -
$X_3$      594     85                -          -          -0.4

where $X_1$ = Seed per acre; $X_2$ = Rainfall in inches; $X_3$ = Accumulated
temperature above 42°F.
(16)
6. Show that the correlation coefficient between the residuals $X_{1.23}$ and $X_{2.13}$ is
equal and opposite to that between $X_{1.3}$ and $X_{2.3}$. K4
7. Show that if $X_3 = aX_1 + bX_2$, the three partial correlations are numerically equal to unity,
$r_{13.2}$ having the sign of $a$, $r_{23.1}$ having the sign of $b$ and $r_{12.3}$ the opposite sign of
$a/b$. K4
8. If $r_{12}$ and $r_{13}$ are given, show that $r_{23}$ must lie in the range
$r_{12} r_{13} \pm \left(1 - r_{12}^2 - r_{13}^2 + r_{12}^2 r_{13}^2\right)^{1/2}$. K4
Show that $r_{23}$ will lie between $-1$ and $1 - 2k^2$ if $r_{12} = k$ and $r_{13} = -k$.
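A small Python sketch can illustrate the stated range numerically for one value of $k$ (this checks a single case and does not replace the proof). The function name `r23_bounds` and the choice $k = 0.6$ are assumptions made for illustration; the radicand is written in the factored form $(1 - r_{12}^2)(1 - r_{13}^2)$, which is algebraically equal to $1 - r_{12}^2 - r_{13}^2 + r_{12}^2 r_{13}^2$.

```python
import math

def r23_bounds(r12, r13):
    """Lower and upper limits r12*r13 -/+ sqrt((1 - r12^2)(1 - r13^2)) for r23."""
    centre = r12 * r13
    half_width = math.sqrt((1 - r12**2) * (1 - r13**2))
    return centre - half_width, centre + half_width

k = 0.6  # illustrative value, not from the question bank
low, high = r23_bounds(k, -k)

print(low, high)       # limits for r23 when r12 = k and r13 = -k
print(-1, 1 - 2 * k**2)  # the limits claimed in the question
```

The two printed pairs agree up to floating-point rounding for this choice of $k$.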
