Relationship Between Variables: Fitting An Equation or Curve The Meaning of Regression The Population Regression Function (PRF)

1. The document discusses the relationship between variables and fitting an equation or curve to explore this relationship. 2. It introduces the population regression function (PRF) and sample regression function (SRF), explaining the stochastic error term in the SRF. 3. The combination of the PRF and SRF models is shown, with the SRF depicting the sample data and estimated regression line, and the residuals representing the difference from the PRF.

Uploaded by

Akshit

Session 2: Class Outline

Relationship Between Variables: Fitting an Equation or curve

The Meaning of Regression

The Population Regression Function (PRF)


Stochastic Specification of the PRF

The Sample Regression Function (SRF)


The Nature of the Stochastic Error Term

Combination of Sample and Population Regression Line.

Ref.: Chapters 1 and 2 of the Gujarati textbook


Measures of Association between Variables
What kind of relationship do you see here: +ve, -ve, or ambiguous?

• The salary offered in the final placement and CQPI of the students

• Aggregate net investment in India during a given year and GDP in that year.

• The amount of hair on the head of a male professor and the age of that professor.

• The growth rate of GDP in a year and the average hair length in that year.

• Volume of sales of a product and the total cost of advertisement during a year

• The quantity demanded of two-wheelers and the price of two-wheelers in a year.

• Number of trees grown in India per year and US Gross Domestic Product
Measures of Association between Variables

Y = f(X)

1. Scatter Plot and Curve Fitting
2. Covariance
3. Correlation
4. Regression: Yi=β1+β2Xi+ui

X    Y
10   10
20   20
30   30
40   40
50   50

Examples:
1. Compensation and Revenue
2. Dividend and Profit
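As a minimal sketch of the covariance and correlation measures listed above, the following computes both for the five (X, Y) pairs in the slide's table. The variable names are illustrative, and the sample (n - 1) divisor is an assumption, since the slide does not say which divisor it uses.

```python
# X and Y values from the slide's table
X = [10, 20, 30, 40, 50]
Y = [10, 20, 30, 40, 50]

n = len(X)
x_bar = sum(X) / n
y_bar = sum(Y) / n

# Sample covariance and variances (assumed divisor: n - 1)
cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y)) / (n - 1)
var_x = sum((x - x_bar) ** 2 for x in X) / (n - 1)
var_y = sum((y - y_bar) ** 2 for y in Y) / (n - 1)

# Pearson correlation: covariance scaled by both standard deviations
corr_xy = cov_xy / (var_x ** 0.5 * var_y ** 0.5)

print("covariance :", cov_xy)
print("correlation:", corr_xy)
```

Because Y equals X for every pair, the points lie exactly on a straight line and the correlation is 1, whatever divisor is chosen.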
Regression Models
# Historical Origin of Regression
• The term REGRESSION was introduced by Francis Galton
• Tendency for tall parents to have tall children and for short parents to have short
children, but the average height of children born from parents of a given height
tended to move (or regress) toward the average height in the population as a whole
(F. Galton, “Family Likeness in Stature”)

• Galton’s law was confirmed by Karl Pearson: the average height of sons of a
group of tall fathers was less than their fathers’ height, and the average height of
sons of a group of short fathers was greater than their fathers’ height, thus
“regressing” tall and short sons alike toward the average height of all men.
(K. Pearson and A. Lee, “On the Laws of Inheritance”)
Regression Models
# Modern Way of Interpreting Regression
“Regression analysis is concerned with the study of the dependence of one variable
(dependent variable) upon another (independent variable) for
estimating/predicting the (population) mean value of the former in terms of the
known (fixed in repeated sampling) values of the latter”.

Dependent Variable Y; Explanatory Variable Xs

• Y = Son’s Height; X = Father’s Height


• Y = Height of boys; X = Age of boys
• Y = Personal Consumption Expenditure; X = Personal Disposable Income
• Y = Demand; X = Price
• Y = Rate of Change of Wages , X = Unemployment Rate
• Y = Money/Income; X = Inflation Rate
• Y = % Change in Demand; X = % Change in the advertising budget
• Y = Crop yield; Xs = temperature, rainfall, sunshine, fertilizer
Terminology

Dependent Variable     Independent Variable(s)
Explained Variable     Explanatory Variable(s)
Predictand             Predictor(s)
Regressand             Regressor(s)
Response               Stimulus or control variable(s)
Endogenous             Exogenous
Regression Models

Types of Regression:

➢ Univariate: one dependent variable
➢ Multivariate: more than one quantitative dependent variable
➢ Simple: one independent variable
➢ Multiple: more than one independent variable
➢ Linear: straight line
➢ Non-linear: curve
➢ Limited dependent variable regression: qualitative dependent variable
➢ ANOVA: all independent variables are qualitative
➢ ANCOVA: independent variables are a mix of qualitative and quantitative

List of Different Regressions:

• Linear Regression
• Non-linear Regression
• Logistic Regression
• Polynomial Regression
• Spline Regression
• Stepwise Regression
• Ridge Regression
• Lasso (Least Absolute Shrinkage and Selection Operator) Regression
• ElasticNet Regression
• Quantile Regression


Regression Models
Deterministic Regression Model: Yi= β1 + β2 Xi
Probabilistic Regression Model: Yi= β1 + β2 Xi+ui

• Why probabilistic? (ui)


1. Omission of explanatory variables
2. Aggregation of variables
3. Model specification
4. Functional misspecification
5. Measurement errors
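The difference between the two models can be sketched with a small simulation. The coefficient values, the normal distribution for ui, its standard deviation, and the seed are all assumptions chosen for illustration; none of them come from the slide.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical parameter values, for illustration only
beta1, beta2 = 17.0, 0.6
X = [80, 100, 120, 140, 160]

# Deterministic model: Y is fully determined by X
y_det = [beta1 + beta2 * x for x in X]

# Probabilistic model: the same line plus a random disturbance ui
# (assumed here to be normal with mean 0 and standard deviation 5)
u = [random.gauss(0, 5) for _ in X]
y_prob = [yd + ui for yd, ui in zip(y_det, u)]

for x, yd, yp in zip(X, y_det, y_prob):
    print(f"X={x:4d}  deterministic Y={yd:7.2f}  probabilistic Y={yp:7.2f}")
```

Rerunning with a different seed changes only the probabilistic column, which is the sense in which ui makes Y a random variable for a given X.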
Example: Weekly family income & expenditure

X = Weekly Family Income, $


Y = Weekly Family Consumption Expenditure, $

Weekly family income and consumption expenditure, n = 60 families

X      Y (one value per family)              Total   Conditional mean of Y
80     55   60   65   70   75                 325     65
100    65   70   74   80   85   88            462     77
120    79   84   90   94   98                 445     89
140    80   93   95  103  108  113  115       707    101
160   102  107  110  116  118  125            678    113
180   110  115  120  130  135  140            750    125
200   120  136  140  144  145                 685    137
220   135  137  140  152  157  160  162      1043    149
240   137  145  155  165  175  189            966    161
260   150  152  175  178  180  185  191      1211    173



Conditional Mean : E(Y|X=Xi)
Unconditional Mean : E(Y)=7272/60=121.20
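The conditional and unconditional means can be reproduced directly from the income and expenditure table. This is a minimal sketch; the dictionary layout and variable names are illustrative choices, not part of the slides.

```python
# Weekly consumption expenditure (Y) of each family, grouped by weekly income (X)
population = {
    80:  [55, 60, 65, 70, 75],
    100: [65, 70, 74, 80, 85, 88],
    120: [79, 84, 90, 94, 98],
    140: [80, 93, 95, 103, 108, 113, 115],
    160: [102, 107, 110, 116, 118, 125],
    180: [110, 115, 120, 130, 135, 140],
    200: [120, 136, 140, 144, 145],
    220: [135, 137, 140, 152, 157, 160, 162],
    240: [137, 145, 155, 165, 175, 189],
    260: [150, 152, 175, 178, 180, 185, 191],
}

# Conditional mean E(Y | X = Xi): average of Y within each income group
cond_means = {x: sum(ys) / len(ys) for x, ys in population.items()}

# Unconditional mean E(Y): average of Y over all families
all_y = [y for ys in population.values() for y in ys]
uncond_mean = sum(all_y) / len(all_y)

for x, m in cond_means.items():
    print(f"E(Y | X={x}) = {m:.0f}")
print(f"E(Y) = {sum(all_y)}/{len(all_y)} = {uncond_mean:.2f}")
```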
Graphically,

[Figure: scatter plot of Weekly Consumption Expenditure (vertical axis) against Weekly Income (horizontal axis), with the conditional means labelled E(Y|X).]

A. Population Regression Function: Error Term

Estimated PRF: E(Y |X=Xi)=f(Xi) =17.00+0.60Xi
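One way to see where 17.00 and 0.60 come from is to fit a straight line through the ten conditional means of the income table. The sketch below does this with the usual least-squares formulas; the variable names are illustrative, and whether the slides obtained the line this way is an assumption.

```python
# Income levels and the conditional means of Y from the income table
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
cond_mean = [65, 77, 89, 101, 113, 125, 137, 149, 161, 173]

n = len(X)
x_bar = sum(X) / n
m_bar = sum(cond_mean) / n

# Least-squares slope and intercept of the line through the conditional means
slope = (sum((x - x_bar) * (m - m_bar) for x, m in zip(X, cond_mean))
         / sum((x - x_bar) ** 2 for x in X))
intercept = m_bar - slope * x_bar

print(f"E(Y | X = Xi) = {intercept:.2f} + {slope:.2f} Xi")
```

The conditional means rise by 12 for every 20-unit increase in income, so they lie exactly on a line with slope 12/20 = 0.60.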


A. Population Regression Function: Error terms

E(Y |X=Xi)=17.00+0.60Xi
[Figure: the PRF line in the (X, Y) plane, with observations y1 to y4 at x1 to x4 and the error terms u1 to u4 shown as vertical deviations from the line.]
Stochastic Specification of the PRF

Yi = E(Y|Xi) + ui

This specification has two main parts:

1. Systematic or deterministic component: E(Y|Xi)
2. Nonsystematic (random) component: ui

If we take the expected value of both sides, conditional on Xi, we obtain
the following:

E(Yi|Xi) = E[E(Y|Xi)] + E(ui|Xi)
E(Yi|Xi) = E(Y|Xi) + E(ui|Xi)
Since E(Yi|Xi) = E(Y|Xi), then
E(ui|Xi) = 0
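The result E(ui|Xi) = 0 can be checked numerically on a single income group. The sketch below uses the five families with X = 80 from the income table; the variable names are illustrative.

```python
# Consumption expenditure of the five families with weekly income X = 80
y_group = [55, 60, 65, 70, 75]

# Conditional mean E(Y | X = 80)
cond_mean = sum(y_group) / len(y_group)

# Disturbance of each family: ui = Yi - E(Y | Xi)
u = [y - cond_mean for y in y_group]

# Average disturbance within the group
mean_u = sum(u) / len(u)

print("E(Y | X=80) =", cond_mean)
print("u values    =", u)
print("mean of u   =", mean_u)
```

The deviations cancel by construction, because each ui is measured from the mean of its own group.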
Why Does the Stochastic Disturbance Term (ui) Exist?

The error term captures all the factors that affect Y but are not explicitly included in the model.

1. Vagueness of theory

2. Unavailability of data

3. Core variables versus peripheral variables

4. Intrinsic randomness in human behavior

5. Poor proxy variables

6. Principle of parsimony

7. Wrong functional form


B. Sample Regression Model: Example

SRF Model: Yi = β̂1 + β̂2 Xi + ûi

• Two samples from our population

Sample 1        Sample 2
Y     X         Y     X
70    80        55    80
65    100       88    100
90    120       90    120
95    140       80    140
110   160       118   160
115   180       120   180
120   200       145   200
140   220       135   220
155   240       145   240
150   260       175   260
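Fitting a line to each of the two samples shows that different samples from the same population give different estimates. This is a minimal sketch using the standard least-squares formulas in deviation form; the helper name `fit_srf` is hypothetical.

```python
def fit_srf(X, Y):
    """Return (intercept, slope) of the least-squares line Y = b1 + b2*X."""
    n = len(X)
    x_bar = sum(X) / n
    y_bar = sum(Y) / n
    # Slope: sum of cross-deviations over sum of squared X deviations
    b2 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(X, Y))
          / sum((x - x_bar) ** 2 for x in X))
    # Intercept: the fitted line passes through the point of means
    b1 = y_bar - b2 * x_bar
    return b1, b2

X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
Y_first = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]    # first (Y, X) sample
Y_second = [55, 88, 90, 80, 118, 120, 145, 135, 145, 175]   # second (Y, X) sample

b1_first, b2_first = fit_srf(X, Y_first)
b1_second, b2_second = fit_srf(X, Y_second)

print(f"First sample : Y-hat = {b1_first:.3f} + {b2_first:.4f} X")
print(f"Second sample: Y-hat = {b1_second:.3f} + {b2_second:.4f} X")
```

Neither fitted line coincides with the population line 17.00 + 0.60Xi, which is why each SRF is only an estimate of the PRF.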
Sample Regression Function (SRF)
B. Sample Regression Model: Residuals

SRF: Ŷi = β̂1 + β̂2 Xi

[Figure: the SRF line in the (X, Y) plane, with observations y1 to y4 at x1 to x4 and the residuals û1 to û4 shown as vertical deviations from the line.]
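Combination of Both Models: PRF and SRF

SRF1: Ŷi = β̂1 + β̂2 Xi
SRF2: Ŷi = β̂1 + β̂2 Xi
PRF: E(Y|Xi) = β1 + β2 Xi

[Figure: two sample regression lines (SRF1 and SRF2) and the PRF plotted together. At a given Xi the figure marks the observed Yi, the fitted value Ŷi, and the conditional mean E(Y|Xi); ui is the deviation of Yi from the PRF and ûi is its deviation from the SRF.]
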
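Note: lowercase x and y are deviations from the sample means (x = X - Xbar, y = Y - Ybar); Y^ is the fitted value and ε^ = Y - Y^ is the residual.
obs Y X x y x^2 y^2 xy Y^ ε^ ε^2 X^2 Y^ε^ Xε^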
1 70 80 -90 -41 8100 1681 3690 65.18 4.82 23.21 6400 314.1 385.5
2 65 100 -70 -46 4900 2116 3220 75.36 -10.36 107.41 10000 -781.0 -1036.4
3 90 120 -50 -21 2500 441 1050 85.55 4.45 19.84 14400 381.1 534.5
4 95 140 -30 -16 900 256 480 95.73 -0.73 0.53 19600 -69.6 -101.8
5 110 160 -10 -1 100 1 10 105.91 4.09 16.74 25600 433.3 654.5
6 115 180 10 4 100 16 40 116.09 -1.09 1.19 32400 -126.6 -196.4
7 120 200 30 9 900 81 270 126.27 -6.27 39.35 40000 -792.1 -1254.6
8 140 220 50 29 2500 841 1450 136.45 3.55 12.57 48400 483.8 780.0
9 155 240 70 44 4900 1936 3080 146.64 8.36 69.95 57600 1226.4 2007.3
10 150 260 90 39 8100 1521 3510 156.82 -6.82 46.49 67600 -1069.2 -1772.7
Sum 1110 1700 0 0 33000 8890 16800 1110 0.00 337.3 322000 -0.02 -0.04
Average 111 170 0 0 3300 889 1680 111 0.00 33.7 32200 0.00 0.00

Estimates (n = 10, k = 2):

β̂2 = Σxiyi/Σxi^2 = 0.509
β̂1 = Ybar - β̂2*Xbar = 24.455

Var(β̂2) = σ̂^2/Σxi^2 = 0.0012
SE(β̂2) = sqrt(Var(β̂2)) = 0.035
Var(β̂1) = (ΣXi^2/(n*Σxi^2))*σ̂^2 = 41.137
SE(β̂1) = sqrt(Var(β̂1)) = 6.413
Cov(β̂1, β̂2) = -(σ̂^2/n)*(ΣXi/Σxi^2) = -0.217, or equivalently -Xbar*Var(β̂2) = -0.217

TSS = Σyi^2 = 8890
ESS = β̂2*Σxiyi = 8552.727
RSS = TSS - ESS = 337.273
R^2 = ESS/TSS = 0.962

σ̂^2 = RSS/(n-k) = 42.159
σ̂ = sqrt(σ̂^2) = 6.493
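The estimates above can be reproduced from the ten (Y, X) observations in the computation table. The sketch below follows the deviation-form formulas on the slide; the variable names are illustrative, and k = 2 is taken as the number of estimated coefficients.

```python
import math

# The ten observations from the computation table
Y = [70, 65, 90, 95, 110, 115, 120, 140, 155, 150]
X = [80, 100, 120, 140, 160, 180, 200, 220, 240, 260]
n, k = len(X), 2  # k = number of estimated coefficients (intercept and slope)

x_bar = sum(X) / n
y_bar = sum(Y) / n
x = [xi - x_bar for xi in X]  # deviations of X from its mean
y = [yi - y_bar for yi in Y]  # deviations of Y from its mean

sxx = sum(v * v for v in x)               # sum of x^2
sxy = sum(a * b for a, b in zip(x, y))    # sum of xy
tss = sum(v * v for v in y)               # total sum of squares (sum of y^2)

b2 = sxy / sxx                 # slope estimate
b1 = y_bar - b2 * x_bar        # intercept estimate
ess = b2 * sxy                 # explained sum of squares
rss = tss - ess                # residual sum of squares
r2 = ess / tss                 # coefficient of determination

sigma2 = rss / (n - k)         # estimated error variance
var_b2 = sigma2 / sxx
var_b1 = sum(v * v for v in X) / (n * sxx) * sigma2
cov_b1_b2 = -x_bar * var_b2

print(f"b1 = {b1:.3f}, b2 = {b2:.3f}")
print(f"ESS = {ess:.3f}, RSS = {rss:.3f}, TSS = {tss:.0f}, R^2 = {r2:.3f}")
print(f"sigma^2 = {sigma2:.3f}, sigma = {math.sqrt(sigma2):.3f}")
print(f"Var(b1) = {var_b1:.3f}, SE(b1) = {math.sqrt(var_b1):.4f}")
print(f"Var(b2) = {var_b2:.5f}, SE(b2) = {math.sqrt(var_b2):.4f}")
print(f"Cov(b1, b2) = {cov_b1_b2:.3f}")
```

Printing extra decimal places makes it easy to see how the displayed figures for the variances and standard errors were rounded or truncated.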
Thank You
