Unit 1 Statistics - 21MA41
UNIT-I
STATISTICS
➢ Expand knowledge of, and skill with, statistical concepts, and gain experience in meeting
the needs of statistical data analysis.
➢ Understand the Central Moments, Skewness and Kurtosis.
➢ Describe & evaluate the concept of correlation and regression coefficients.
➢ Investigate the strength and direction of a relationship between two variables by
collecting measurements and using appropriate statistical analysis.
➢ Model a linear relationship between a dependent variable and two or more
independent variables.
Introduction:
Many problems and experiments in applied mathematics and engineering involve two variables.
This chapter gives an elementary treatment of the mathematical theory of statistics: central
moments, mean, variance, the coefficients of skewness and kurtosis in terms of moments, curve
fitting, correlation and regression. In mathematics, a moment is a specific quantitative measure of the shape of a
function. It is used in both mechanics and statistics. If the function represents physical density,
then the zeroth moment is the total mass, the first moment divided by the total mass is the center
of mass, and the second moment is the rotational inertia. If the function is a probability
distribution, then the zeroth moment is the total probability (i.e. one), the first moment is
the mean, the second central moment is the variance, the third standardized moment is
the skewness, and the fourth standardized moment is the kurtosis.
Moments:
In mechanics, a moment refers to the turning or rotating effect of a force, whereas in statistics
moments describe the characteristics of a frequency distribution. Moments measure the central
tendency of a set of observations as well as the scatter, asymmetry and peakedness of the curve
of a distribution.
A moment is the arithmetic mean of the deviations of the observations from the mean, or from
some other value, raised to a given power. For values $x_i$ with frequencies $f_i$ and
$N = \sum f_i$, the $r$-th moment about an arbitrary point $A$ (raw moment) is
$\mu_r' = \frac{1}{N}\sum f_i (x_i - A)^r$, and the $r$-th moment about the mean (central moment) is
$\mu_r = \frac{1}{N}\sum f_i (x_i - \bar{x})^r$. Moments about the mean are the ones generally used in
statistics.
Relation between raw moments (about the origin or any point) and central moments
The central moments can be expressed in terms of the raw moments and vice versa. The moments
about the mean in terms of the moments about any point are
$\mu_2 = \mu_2' - (\mu_1')^2$, $\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$ and $\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$.
Conversely,
$\mu_2' = \mu_2 + (\mu_1')^2$, $\mu_3' = \mu_3 + 3\mu_2\mu_1' + (\mu_1')^3$ and $\mu_4' = \mu_4 + 4\mu_3\mu_1' + 6\mu_2(\mu_1')^2 + (\mu_1')^4$.
Example 1: The first four moments of a distribution about the value 4 of the variable are
-1.5, 17, -30 and 108. Find the moments about the mean.
Solution: Given $A = 4$, $\mu_1' = -1.5$, $\mu_2' = 17$, $\mu_3' = -30$ and $\mu_4' = 108$.
Moments about the mean:
$\mu_2 = \mu_2' - (\mu_1')^2 = 17 - (-1.5)^2 = 14.75$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = -30 - 3(17)(-1.5) + 2(-1.5)^3 = 39.75$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$
$= 108 - 4(-30)(-1.5) + 6(17)(-1.5)^2 - 3(-1.5)^4 = 142.3125$.
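The conversion from moments about an arbitrary point to moments about the mean can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function name `central_from_raw` is an illustrative assumption. It applies the relations for $\mu_2$, $\mu_3$ and $\mu_4$ to the data of Example 1.

```python
def central_from_raw(m1, m2, m3, m4):
    """Convert the first four moments about an arbitrary point A
    (m1..m4) into the central moments mu2, mu3, mu4."""
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    return mu2, mu3, mu4


# Example 1: moments about A = 4
A = 4
m1, m2, m3, m4 = -1.5, 17, -30, 108
mu2, mu3, mu4 = central_from_raw(m1, m2, m3, m4)
mean = A + m1  # the first raw moment is the offset of the mean from A
print(mean, mu2, mu3, mu4)  # 2.5 14.75 39.75 142.3125
```

The first raw moment also locates the mean: here the mean is $A + \mu_1' = 2.5$.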
Example 2: Calculate the first four moments of the following distribution about the mean.
x: 0 1 2 3 4 5 6 7 8
f: 1 8 28 56 70 56 28 8 1
Solution:
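One way to work Example 2 is to compute the mean first and then average the powers of the deviations from it, weighted by the frequencies. The sketch below is an illustration under that approach, not the notes' own worked solution; the function name `central_moments` is hypothetical.

```python
def central_moments(values, freqs, orders=(1, 2, 3, 4)):
    """Return the mean and the central moments of a frequency
    distribution, mu_r = (1/N) * sum f * (x - mean)**r."""
    n_total = sum(freqs)
    mean = sum(f * x for x, f in zip(values, freqs)) / n_total
    moments = {
        r: sum(f * (x - mean) ** r for x, f in zip(values, freqs)) / n_total
        for r in orders
    }
    return mean, moments


# Data of Example 2
x = list(range(9))
f = [1, 8, 28, 56, 70, 56, 28, 8, 1]
mean, mu = central_moments(x, f)
print(mean, mu)
```

For this symmetric distribution the mean is 4 and the moments about the mean are $\mu_1 = 0$, $\mu_2 = 2$, $\mu_3 = 0$ and $\mu_4 = 11$.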
Calculate the first four central moments of the following distribution.
Wages:  1.5 - 2.5   2.5 - 3.5   3.5 - 4.5   4.5 - 5.5   5.5 - 6.5
f:      1           3           7           3           3
Taking deviations d = x - 4 from the assumed mean A = 4:
Wages        f     Mid-point x   d = x - 4   fd    fd²   fd³   fd⁴
1.5 - 2.5    1     2             -2          -2    4     -8    16
2.5 - 3.5    3     3             -1          -3    3     -3    3
3.5 - 4.5    7     4             0           0     0     0     0
4.5 - 5.5    3     5             1           3     3     3     3
5.5 - 6.5    3     6             2           6     12    24    48
Total        17                              4     22    16    70
Curve of the type ‘A’ which is neither flat nor peaked is called the normal curve or
‘MESOKURTIC’ curve (β2 = 3). If items concentrate too much at the center (more peaked
than the normal curve), the curve of the type ‘C’ becomes ‘LEPTOKURTIC’ curve (β2 > 3).
If the concentration at the center is comparatively less (flatter than the normal curve), the curve
of the type ‘B’ becomes ‘PLATYKURTIC’ curve (β2 < 3).
Measures of Skewness:
Literally, skewness means ‘lack of symmetry’. A distribution is said to be skewed if
(i) Mean, Median and Mode fall at different points.
(ii) The curve drawn with the help of the given data is not symmetrical but stretched more to
one side than to the other.
Karl Pearson’s coefficient of skewness: This is the most frequently used measure of
skewness. It is given by
$S_k = \frac{\text{Mean} - \text{Mode}}{\sigma}$, where $\sigma$ is the standard deviation of the distribution.
Note:
From $\beta_1 = \frac{\mu_3^2}{\mu_2^3}$ …………………(*) we observe the following:
• $\mu_3^2$ is never negative, whether $\mu_3$ is positive or negative.
• $\mu_2^3$ is always positive, as $\mu_2$ is the variance.
∴ from (*), $\beta_1$ is never negative, so it cannot show the direction of skewness, which may be negative.
To overcome this, the measure of skewness is defined by
$\gamma_1 = \pm\sqrt{\beta_1}$
Here the sign of $\gamma_1$ is the sign of $\mu_3$.
Solution:
Wages    f     Mid-point x   d = (x - 17)/2   fd    fd²   fd³   fd⁴
10-12    1     11            -3               -3    9     -27   81
12-14    3     13            -2               -6    12    -24   48
14-16    7     15            -1               -7    7     -7    7
16-18    12    17            0                0     0     0     0
18-20    12    19            1                12    12    12    12
20-22    4     21            2                8     16    32    64
22-24    3     23            3                9     27    81    243
Total                                         13    83    67    455
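With $N = \sum f = 42$ and class width $h = 2$, the moments about the assumed mean 17 are
$\mu_1' = \frac{\sum fd}{N}\,h = 0.619$, $\mu_2' = \frac{\sum fd^2}{N}\,h^2 = 7.905$, $\mu_3' = \frac{\sum fd^3}{N}\,h^3 = 12.762$,
$\mu_4' = \frac{\sum fd^4}{N}\,h^4 = 173.333$.
Moments about mean (intermediate values carried at full precision):
$\mu_1 = 0$, $\mu_2 = \mu_2' - (\mu_1')^2 = 7.905 - 0.383 = 7.522$
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = 12.762 - 3(7.905)(0.619) + 2(0.619)^3 \approx -1.444$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4 = 173.333 - 4(12.762)(0.619) + 6(7.905)(0.619)^2 - 3(0.619)^4 \approx 159.467$.
So, we have $\beta_1 = \frac{\mu_3^2}{\mu_2^3} \approx 0.0049$, $\beta_2 = \frac{\mu_4}{\mu_2^2} \approx 2.819$.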
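The step-deviation calculation for grouped data can be sketched in Python as follows. This is an illustrative sketch rather than the notes' own procedure: the function name and the returned dictionary keys are assumptions, $N$ is taken as the sum of the frequencies printed in the table (42), and $\gamma_1$ is given the sign of $\mu_3$ as described in the Note on skewness.

```python
from math import copysign, sqrt


def step_deviation_moments(midpoints, freqs, assumed_mean, width):
    """Central moments and beta/gamma coefficients of grouped data
    using the step deviations d = (x - A) / h."""
    n = sum(freqs)
    d = [(x - assumed_mean) / width for x in midpoints]
    # Moments about the assumed mean A, scaled back by h**r
    m1, m2, m3, m4 = (
        width**r * sum(f * di**r for f, di in zip(freqs, d)) / n
        for r in (1, 2, 3, 4)
    )
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    beta1 = mu3**2 / mu2**3
    beta2 = mu4 / mu2**2
    gamma1 = copysign(sqrt(beta1), mu3)  # sign of gamma1 follows mu3
    return {"mu2": mu2, "mu3": mu3, "mu4": mu4,
            "beta1": beta1, "beta2": beta2,
            "gamma1": gamma1, "gamma2": beta2 - 3}


# Wages table: class mid-points and frequencies, A = 17, h = 2
midpoints = [11, 13, 15, 17, 19, 21, 23]
freqs = [1, 3, 7, 12, 12, 4, 3]
result = step_deviation_moments(midpoints, freqs, assumed_mean=17, width=2)
for name, value in result.items():
    print(f"{name} = {value:.4f}")
```

Because the central moments do not depend on the choice of the assumed mean, any convenient mid-point can be passed as `assumed_mean`.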
Exercise:
1. The first four raw moments of a distribution are 2, 136, 320 and 40,000. Find the
coefficients of skewness and kurtosis.
Ans. $\beta_1 = \frac{\mu_3^2}{\mu_2^3} = 0.1002$, $\beta_2 = \frac{\mu_4}{\mu_2^2} = 2.333$.
2. Find the second, third and fourth central moments of the frequency distribution given
below. Hence, find (i) a measure of skewness and (ii) a measure of kurtosis.
Class limits      Frequency
110.0 – 114.9     5
115.0 – 119.9     15
120.0 – 124.9     20
125.0 – 129.9     35
130.0 – 134.9     10
135.0 – 139.9     10
140.0 – 144.9     5
Ans. In units of the class width ($h = 5$):
$\mu_2 = 2.16$, $\mu_3 = 0.804$, $\mu_4 = 12.5232$,
$\gamma_1 = \sqrt{\beta_1} = 0.2533$; $\gamma_2 = \beta_2 - 3 = -0.3158$.
3. Find the second, third and fourth central moments of the frequency distribution
given below. Hence, find (i) a measure of skewness and (ii) a measure of kurtosis.
x: 5 10 15 20 25 30 35
f: 4 10 20 36 16 12 2
Ans.
$\mu_2 = 44.41$, $\mu_3 = -12.504$, $\mu_4 = 5423.5057$, $\beta_1 = 0.001785$,
$\beta_2 = 2.7499$, $\gamma_1 = -\sqrt{\beta_1} = -0.0422$ (negative since $\mu_3 < 0$); $\gamma_2 = \beta_2 - 3 = -0.2501$.
4. Compute the first four moments about mean from the following data. Hence, find (i) a
measure of skewness and (ii) a measure of kurtosis.
Class Intervals: 0 – 10   10 – 20   20 – 30   30 – 40
Frequency:       1        3         4         2
Ans.
$\mu_1 = 0$, $\mu_2 = 81$, $\mu_3 = -144$, $\mu_4 = 14817$, $\beta_1 = 0.03902$,
$\beta_2 = 2.2583$, $\gamma_1 = -\sqrt{\beta_1} = -0.1975$ (negative since $\mu_3 < 0$); $\gamma_2 = \beta_2 - 3 = -0.7417$.
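$\bar{x} = \frac{\sum x}{n}$ → mean of the x series, $\bar{y} = \frac{\sum y}{n}$ → mean of the y series.
Writing $a_i = x_i - \bar{x}$ and $b_i = y_i - \bar{y}$ for the deviations from the means,
$r = \frac{\frac{1}{n}\sum a_i b_i}{\sqrt{\frac{1}{n}\sum a_i^2}\,\sqrt{\frac{1}{n}\sum b_i^2}}$, $r^2 = \frac{(\sum a_i b_i)^2}{\sum a_i^2 \sum b_i^2}$. (1)
By the Schwarz inequality, if $a_i$, $b_i$, $i = 1, 2, \dots, n$ are real quantities then
$(\sum a_i b_i)^2 \le \sum a_i^2 \sum b_i^2$, so (1) gives $r^2 \le 1$, i.e. $-1 \le r \le 1$.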
RANK CORRELATION
In many practical situations the characteristics of interest cannot be measured numerically.
They are qualitative, and individuals or items can only be ranked in order of
merit. This situation arises in the study of qualities such as honesty,
beauty or voice. For example, contestants in a singing competition may be ranked by a judge
according to their performance, and students may be ranked in different
subjects according to their performance in tests.
Arrangement of individuals or items in order of merit or proficiency in the possession of a
certain characteristic is called ranking and the number indicating the position of individuals or
items is known as rank.
If ranks of individuals or items are available for two characteristics then correlation between
ranks of these two characteristics is known as rank correlation.
Rank correlation measures the association between two qualitative
characteristics. Karl Pearson’s correlation coefficient gives the strength
of the linear relationship between two measured variables, whereas Spearman’s rank correlation
coefficient measures the strength of association between two ranked variables.
The formula for Spearman’s rank correlation coefficient is given in the following
section.
RANK CORRELATION COEFFICIENT FORMULA
Suppose we have a group of n individuals and let $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$ be the
ranks of the n individuals in characteristics A and B respectively. Then the rank correlation
coefficient $r_s$ is given by
$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$,
where $d_i = x_i - y_i$ is the difference between the two ranks of the i-th individual.
Examples:
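1. If r is the correlation coefficient between x and y and z = ax + by, show that
$r = \frac{\sigma_z^2 - (a^2\sigma_x^2 + b^2\sigma_y^2)}{2ab\,\sigma_x\sigma_y}$.
Solution:
Let $z = ax + by$ ⇒ $\frac{1}{n}\sum z = \frac{a}{n}\sum x + \frac{b}{n}\sum y$ ⇒ $\bar{z} = a\bar{x} + b\bar{y}$,
so that $z - \bar{z} = a(x - \bar{x}) + b(y - \bar{y})$. Squaring and averaging,
$\frac{1}{n}\sum(z - \bar{z})^2 = a^2\,\frac{1}{n}\sum(x - \bar{x})^2 + b^2\,\frac{1}{n}\sum(y - \bar{y})^2 + 2ab\,\frac{1}{n}\sum(x - \bar{x})(y - \bar{y})$,
⇒ $\sigma_z^2 = a^2\sigma_x^2 + b^2\sigma_y^2 + 2ab\,r\,\sigma_x\sigma_y$,
⇒ $r = \frac{\sigma_z^2 - (a^2\sigma_x^2 + b^2\sigma_y^2)}{2ab\,\sigma_x\sigma_y}$.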
n = 25,
$r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{\{n\sum x^2 - (\sum x)^2\}\{n\sum y^2 - (\sum y)^2\}}} = 0.51912$.
Age of husband (x): 23 27 28 29 30 31 33 35 36 39
Age of wife (y):    18 22 23 24 25 26 28 29 30 32
Solution:
Here n = 10.
We find $\bar{x} = \frac{1}{n}\sum x_i = \frac{311}{10} = 31.1$ and $\bar{y} = \frac{1}{n}\sum y_i = \frac{257}{10} = 25.7$.
With the deviations $X_i = x_i - \bar{x}$ and $Y_i = y_i - \bar{y}$, the column totals are
$\sum X_i^2 = 202.9$, $\sum Y_i^2 = 158.10$, $\sum X_i Y_i = 178.3$.
$r = \frac{\sum X_i Y_i}{\sqrt{\sum X_i^2 \sum Y_i^2}} = 0.9955 \approx 1$,
i.e., the ages of husbands and wives are almost perfectly correlated.
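The deviation form of Karl Pearson's coefficient can be checked with a few lines of Python. This is a minimal sketch; the function name `pearson_r` and the variable names are illustrative assumptions, and the data are the ages tabulated above.

```python
from math import sqrt


def pearson_r(x, y):
    """Correlation coefficient r = sum(XY) / sqrt(sum(X^2) * sum(Y^2)),
    where X and Y are deviations from the respective means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


husband = [23, 27, 28, 29, 30, 31, 33, 35, 36, 39]
wife = [18, 22, 23, 24, 25, 26, 28, 29, 30, 32]
r = pearson_r(husband, wife)
print(round(r, 4))  # 0.9955
```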
Rank R1   Rank R2   d = R1 − R2   d²
1         2         −1            1
2         4         −2            4
3         1         2             4
4         5         −1            1
5         3         2             4
6         8         −2            4
7         7         0             0
8         6         2             4
$\sum_{i=1}^{8} d_i^2 = 22$
Here, n = number of paired observations = 8, so
$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 22}{8 \times 63} = 0.74$
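Spearman's formula is simple to evaluate once the two sets of ranks are known. The sketch below assumes there are no tied ranks; the function name `spearman_rs` is illustrative, and the ranks are those of the table above.

```python
def spearman_rs(rank_x, rank_y):
    """Spearman's rank correlation for untied ranks:
    r_s = 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


rank_1 = [1, 2, 3, 4, 5, 6, 7, 8]
rank_2 = [2, 4, 1, 5, 3, 8, 7, 6]
rs = spearman_rs(rank_1, rank_2)
print(round(rs, 2))  # 0.74
```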
Rank in Computer:   2 4 5 1 3
Rank in Physics:    5 1 2 3 4
Rank in Statistics: 2 3 5 4 1
Solution: In this problem, we want to see which two subjects have same trend i.e.,
which two subjects have the positive rank correlation coefficient.
Here we have to calculate three rank correlation coefficients
𝑟12𝑠 = Rank correlation coefficient between the ranks of Computer and Physics
𝑟23𝑠 = Rank correlation coefficient between the ranks of Physics and Statistics
𝑟13𝑠 = Rank correlation coefficient between the ranks of Computer and Statistics
Let 𝑅1 , 𝑅 2 and 𝑅3 be the ranks of students in Computer, Physics and
Statistics respectively.
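R1 (Computer)   R2 (Physics)   R3 (Statistics)   d12 = R1 − R2   d12²   d23 = R2 − R3   d23²   d13 = R1 − R3   d13²
2               5              2                 −3              9      3               9      0               0
4               1              3                 3               9      −2              4      1               1
5               2              5                 3               9      −3              9      0               0
1               3              4                 −2              4      −1              1      −3              9
3               4              1                 −1              1      3               9      2               4
Total                                                            32                     32                     14
With n = 5,
$r_{12s} = 1 - \frac{6\sum d_{12}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 32}{5 \times 24} = -0.6$
$r_{23s} = 1 - \frac{6\sum d_{23}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 32}{5 \times 24} = -0.6$
$r_{13s} = 1 - \frac{6\sum d_{13}^2}{n(n^2 - 1)} = 1 - \frac{6 \times 14}{5 \times 24} = 0.3$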
$r_{12s}$ is negative, which indicates that Computer and Physics have opposite trends. Similarly, the
negative rank correlation $r_{23s}$ shows opposite trends in Physics and Statistics. $r_{13s} = 0.3$
indicates that Computer and Statistics have the same trend.
Sometimes the ranks are not given but the actual values of the variables are available. To find the
rank correlation coefficient in this case, we first assign ranks to the given values, as in the
following problem.
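x: 78 89 97 69 59 79 68
y: 125 137 156 112 107 136 124
Assigning rank 1 to the largest value in each series:
Rank of x: 4 2 1 5 7 3 6
Rank of y: 4 2 1 6 7 3 5
d:         0 0 0 −1 0 0 1
$\sum_{i=1}^{7} d_i^2 = 2$
Spearman’s rank correlation formula is
$r_s = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 2}{7(49 - 1)} = 0.96$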
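When only the raw values are available, the ranking step can be automated. The following sketch assumes all values in a series are distinct and gives rank 1 to the largest value; the helper name `ranks_descending` is hypothetical. The data are the x and y values of the problem above.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes no repeated values."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


x = [78, 89, 97, 69, 59, 79, 68]
y = [125, 137, 156, 112, 107, 136, 124]

rank_x = ranks_descending(x)
rank_y = ranks_descending(y)
n = len(x)
sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
rs = 1 - 6 * sum_d2 / (n * (n * n - 1))

print(rank_x)         # [4, 2, 1, 5, 7, 3, 6]
print(rank_y)         # [4, 2, 1, 6, 7, 3, 5]
print(sum_d2)         # 2
print(round(rs, 2))   # 0.96
```

Ranking from the smallest value instead would change every rank but leave the differences, and hence $r_s$, unchanged.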
$\sum_{i=1}^{n} d_i^2 = 159.50$
$r_s = 1 - \frac{6\left[\sum_{i=1}^{n} d_i^2 + \frac{1}{12}\{(m_1^3 - m_1) + (m_2^3 - m_2)\}\right]}{n(n^2 - 1)}$
where $m_1 = 2$ (two items of x have the same value, i.e. 73) and $m_2 = 3$ (three items of y
have the value 18).
$r_s = 1 - \frac{6\left[159.50 + \frac{1}{12}\{(8 - 2) + (27 - 3)\}\right]}{8(64 - 1)} = -0.9286$
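The tie correction can be written as a small function that takes the sum of squared rank differences and the sizes of the tied groups. This is an illustrative sketch; the function name `spearman_tied` and its arguments are assumptions, and only the totals quoted above (not the underlying data, which are not shown) are used.

```python
def spearman_tied(sum_d2, n, tie_sizes):
    """Spearman's coefficient with the correction (m^3 - m) / 12
    added to sum(d^2) for every group of m tied items."""
    correction = sum(m**3 - m for m in tie_sizes) / 12
    return 1 - 6 * (sum_d2 + correction) / (n * (n * n - 1))


# sum(d^2) = 159.50, n = 8, one tie of 2 items in x and one of 3 items in y
rs = spearman_tied(159.50, 8, tie_sizes=[2, 3])
print(round(rs, 4))  # -0.9286
```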
Regression:
Correlation describes the strength of an association between two variables and is completely
symmetrical: the correlation between A and B is the same as the correlation between B and A.
However, if the two variables are related, then when one changes by a certain amount
the other changes, on average, by a certain amount. This relationship can be represented by a
simple equation called the regression equation. In this context "regression" (the term is a
historical anomaly) simply means that the average value of y is a "function" of x, that is, it
changes with x.
Regression analysis is a mathematical measure of the average relationship between two or more
variables in terms of the original units of the data.
Line of regression:
Line of regression is the line which gives the best estimate to the value of one variable for any
specific value of the other variable. So the line of regression is the line of best fit.
Method of Least squares:
Suppose we are given n values of x1, x2, x3,….., xn of an independent variable x and the
corresponding values y1, y2, y3,….., yn of a variable y depending on x. Then the pairs (x1, y1),
(x2, y2),........, (xn, yn) give us n- points in the xy-plane. Generally, it is not possible to find the
actual curve y = f(x) that passes through these points. Hence, we try to find a curve that serves
as best approximation to the curve y = f(x). Such a curve is referred to as the curve of best fit.
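The process of determining a curve of best fit is called curve fitting. A standard way of finding the
curve of best fit is the method of least squares, which chooses the curve that minimises the sum of
the squared deviations of the observed y values from the curve.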
Regression line of y on x:
Let regression line of y on x be y = a + bx.
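The normal equations given by the method of least squares are
$\sum y = na + b\sum x$,
$\sum xy = a\sum x + b\sum x^2$.
Dividing the first equation by n,
$\frac{1}{n}\sum y = a + \frac{b}{n}\sum x$, i.e. $\bar{y} = a + b\bar{x}$,
so the line passes through $(\bar{x}, \bar{y})$ and can be written as
$y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$ ⟹ $Y = b_{yx}X$, where $X = x - \bar{x}$ and $Y = y - \bar{y}$. This is the regression line of y on x.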
Regression line of x on y:
$x - \bar{x} = r\frac{\sigma_x}{\sigma_y}(y - \bar{y})$ ⟹ $X = b_{xy}Y$
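Note:
1. Regression coefficient of y on x
$b_{yx} = \frac{\sum(x - \bar{x})(y - \bar{y})}{\sum(x - \bar{x})^2} = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2} = r\frac{\sigma_y}{\sigma_x}$.
2. Regression coefficient of x on y
$b_{xy} = \frac{\sum(x - \bar{x})(y - \bar{y})}{\sum(y - \bar{y})^2} = \frac{n\sum xy - \sum x\sum y}{n\sum y^2 - (\sum y)^2} = r\frac{\sigma_x}{\sigma_y}$.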
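The two regression coefficients can be computed directly from paired data. The sketch below is illustrative only: the function and variable names are assumptions, and the data reuse the husband and wife ages from the correlation example. It also shows that the product of the two regression coefficients equals $r^2$, which follows from $b_{yx} = r\sigma_y/\sigma_x$ and $b_{xy} = r\sigma_x/\sigma_y$.

```python
def regression_coefficients(x, y):
    """Return the means and the regression coefficients
    b_yx (y on x) and b_xy (x on y) in deviation form."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return x_bar, y_bar, sxy / sxx, sxy / syy


husband = [23, 27, 28, 29, 30, 31, 33, 35, 36, 39]
wife = [18, 22, 23, 24, 25, 26, 28, 29, 30, 32]
x_bar, y_bar, b_yx, b_xy = regression_coefficients(husband, wife)


def predict_y(x_value):
    """Regression line of y on x: y - y_bar = b_yx * (x - x_bar)."""
    return y_bar + b_yx * (x_value - x_bar)


r_squared = b_yx * b_xy  # product of the regression coefficients
print(round(b_yx, 4), round(b_xy, 4), round(r_squared, 4))
```

Both lines pass through $(\bar{x}, \bar{y})$, so `predict_y(x_bar)` returns `y_bar`.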
Examples:
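1. If the two regression equations of the variables x and y are x = 19.13 − 0.87y and y = 11.64 − 0.5x,
find
(a) the mean of x,
(b) the mean of y,
(c) the correlation coefficient between x and y.
Soln:
Since $(\bar{x}, \bar{y})$ lies on both regression lines,
$\bar{x} = 19.13 - 0.87\bar{y}$, $\bar{y} = 11.64 - 0.5\bar{x}$.
Solving, we get $\bar{x} = 15.93$, $\bar{y} = 3.67$.
$b_{yx} = -0.5$, $b_{xy} = -0.87$, so $r = -\sqrt{(-0.5)(-0.87)} = -0.66$; the negative root is taken because both regression coefficients are negative.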
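The steps of this example can be sketched in Python: the means are the intersection of the two lines, and r is the square root of the product of the regression coefficients, carrying their common sign. The function name and argument names are illustrative assumptions; the constants 19.13 and 11.64 are the ones used in the solution lines above.

```python
from math import copysign, sqrt


def from_regression_lines(c_x, b_xy, c_y, b_yx):
    """Given x = c_x + b_xy * y and y = c_y + b_yx * x, return
    (x_bar, y_bar, r). Assumes 0 < b_xy * b_yx <= 1."""
    # Substitute the second line into the first and solve for x_bar
    x_bar = (c_x + b_xy * c_y) / (1 - b_xy * b_yx)
    y_bar = c_y + b_yx * x_bar
    # r has the same sign as the regression coefficients
    r = copysign(sqrt(b_xy * b_yx), b_yx)
    return x_bar, y_bar, r


x_bar, y_bar, r = from_regression_lines(19.13, -0.87, 11.64, -0.5)
print(round(x_bar, 2), round(y_bar, 2), round(r, 2))
```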
2. The following table shows the test scores made by salesmen on an intelligence
test and their weekly sales.
Test scores(x) 1 2 3 4 5 6 7 8 9 10
sales(y) 2.5 6 4.5 5 4.5 2 5.5 3 4.5 3
Calculate the regression line of sales on test scores and estimate the most probable weekly
sales volume if a salesman scores 70.
Soln:
$\bar{x} = 60$, $\bar{y} = 4.05$. The regression line of y on x is $y - \bar{y} = r\frac{\sigma_y}{\sigma_x}(x - \bar{x})$, which gives
y = 0.06x + 0.45.
When x = 70, y = 4.65.
When several independent variables are used to estimate the value of the
dependent variable, the method is called multiple regression. The multiple linear regression model is just
an extension of the simple linear regression model. In simple linear regression we used a single “x”;
with k independent variables the model for the i-th observation is
$y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij}$.
Using the least squares principle, we obtain one normal equation for each of the coefficients $\beta_0, \beta_1, \dots, \beta_k$.
Example:
1. A company produces two different items A and B. The data below show the sales of these items in
one day and the profit made by the company on that day.
$x_1$ (Sales of item A): 8 11 9 8 6 10 7
$x_2$ (Sales of item B): 6 4 5 7 1 1 0
Profit (y):              93.26 89.76 60.78 79.34 28.23 75.83 32.74
Fit the best multilinear model that represents the relationship between sales of A and B and
the profit.
x₁     x₂    y        y²        x₁²   x₂²   x₁·y      x₂·y      x₁·x₂
8      6     93.26    8697.43   64    36    746.08    559.56    48
11     4     89.76    8056.86   121   16    987.36    359.04    44
9      5     60.78    3694.21   81    25    547.02    303.9     45
8      7     79.34    6294.84   64    49    634.72    555.38    56
6      1     28.23    796.933   36    1     169.38    28.23     6
10     1     75.83    5750.19   100   1     758.3     75.83     10
7      0     32.74    1071.91   49    0     229.18    0         0
∑ = 59 24    459.94   34362.4   515   128   4072.04   1881.94   209
2. A set of experimental runs was made to determine a way of predicting cooking time 𝑦 at
various values of oven width 𝑥1 and flue temperature 𝑥2 . The coded data were recorded
as follows:
𝑦 6.40 15.05 18.75 30.25 44.85 48.94 51.55 61.50 100.44 111.42
𝑥1 1.32 2.69 3.56 4.41 5.35 6.20 7.12 8.87 9.80 10.65
𝑥2 1.15 3.40 4.10 8.75 14.82 15.15 15.32 18.18 35.19 40.40
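Estimate the multiple linear regression equation $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$.
Solution:
The normal equations corresponding to the regression equation $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
are:
$n\beta_0 + \beta_1\sum x_1 + \beta_2\sum x_2 = \sum y$
$\beta_0\sum x_1 + \beta_1\sum x_1^2 + \beta_2\sum x_1 x_2 = \sum y x_1$
$\beta_0\sum x_2 + \beta_1\sum x_1 x_2 + \beta_2\sum x_2^2 = \sum y x_2$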
For the given data
$n = 10$, $\sum x_1 = 59.97$, $\sum x_1^2 = 446.9965$, $\sum y = 489.15$, $\sum y x_1 = 3875.9365$,
$\sum x_2 = 156.46$, $\sum x_2^2 = 3991.1208$, $\sum y x_2 = 11749.8781$, $\sum x_1 x_2 = 1282.5215$.
Substituting these values in the above normal equations and solving, we get
$\beta_0 = 0.5800$, $\beta_1 = 2.7122$, $\beta_2 = 2.0497$.
Hence the required multiple linear regression equation is
$y = 0.5800 + 2.7122x_1 + 2.0497x_2$
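The normal equations for two independent variables form a 3 × 3 linear system, which can be built from the raw data and solved numerically. The sketch below is one possible way to do this in plain Python, not the notes' own method: the helper names `solve_linear` and `fit_two_predictors` are assumptions, the sums are recomputed from the tabulated data rather than taken from the text, and the system is solved by Gaussian elimination with partial pivoting.

```python
def solve_linear(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_two_predictors(x1, x2, y):
    """Least squares fit of y = b0 + b1*x1 + b2*x2 via the normal equations."""
    ones = [1.0] * len(y)
    cols = [ones, x1, x2]

    def dot(u, v):
        return sum(p * q for p, q in zip(u, v))

    a = [[dot(ci, cj) for cj in cols] for ci in cols]  # sums such as sum(x1*x2)
    b = [dot(ci, y) for ci in cols]                    # sums such as sum(y*x1)
    return solve_linear(a, b)


# Data of Example 2
y = [6.40, 15.05, 18.75, 30.25, 44.85, 48.94, 51.55, 61.50, 100.44, 111.42]
x1 = [1.32, 2.69, 3.56, 4.41, 5.35, 6.20, 7.12, 8.87, 9.80, 10.65]
x2 = [1.15, 3.40, 4.10, 8.75, 14.82, 15.15, 15.32, 18.18, 35.19, 40.40]

b0, b1, b2 = fit_two_predictors(x1, x2, y)
print(f"y = {b0:.4f} + {b1:.4f} x1 + {b2:.4f} x2")
```

The same function can be applied to the sales and profit data of Example 1 by passing its three columns.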
Exercise:
1. If the coefficient of correlation between the variables x and y is 0.5 and the acute angle
between their lines of regression is $\tan^{-1}\left(\frac{3}{5}\right)$, find the ratio of the standard deviations of
x and y.
Ans. $\frac{\sigma_x}{\sigma_y} = \frac{1}{2}$ or $\frac{\sigma_x}{\sigma_y} = 2$.
2. Prove the following formulas for the coefficient of correlation r (in the usual notation):
a) $r = 1 - \frac{1}{2n}\sum\left(\frac{X_i}{\sigma_x} - \frac{Y_i}{\sigma_y}\right)^2$, $r = -1 + \frac{1}{2n}\sum\left(\frac{X_i}{\sigma_x} + \frac{Y_i}{\sigma_y}\right)^2$.
Y 6 4 9 8 1 2 3 10 5 7
Age (x) 56 42 72 36 63 47 55 49 38 42 68 60
Blood Pressure (y) 147 125 160 118 149 128 150 145 115 140 152 155
Calculate the coefficient of correlation between x and y. Estimate the blood pressure of
a person whose age is 45 years.
Ans. r = 0.8961, y = 80.78 + 1.138 x , when x = 45, y = 132.
6. The height (inches) and weight (pounds) of baseball players are given below:
(76, 212), (76, 224), (72, 180), (74, 210), (75, 215), (71, 200), (77, 235), (78, 235),
(77, 194), (76, 185).
(i) Estimate the coefficient of correlation between weight and height of baseball
players.
(ii) Find the regression line between weight and height. Use the regression equation to
find the weight of a baseball player that is 68 inches tall.
Ans. r = 0.5529, y = 4.737 x – 147.227, x = 0.064 y + 61.712, when x = 68, y = 174.89.
7. The equations of the regression lines of two variables x and y are 4x – 5y + 33 = 0 and
20x – 9y = 107. Find the correlation coefficient and the means of x and y.
Ans. r = 0.6, Mean of x = 13 and Mean of y = 17.
8. If the tangent of the angle between the lines of regression of y on x and x on y is 0.6
and the standard deviation of y is twice the standard deviation of x, find the coefficient
of correlation between x and y.
Ans. r = 0.5.
9. The chemistry grade, intelligence test score and number of classes missed data of 12
students are given.
Chemistry grade (y): 85 74 76 90 85 87 94 98 81 91 76 74
Test score ($x_1$):    65 50 55 65 55 70 65 70 55 70 50 55
Resources:
1. https://nptel.ac.in/courses/111105042/
2. http://www.nptelvideos.in/2012/12/regression-analysis.html
3. https://nptel.ac.in/courses/111104074/