
Lecture 2

The simplest form of an econometric model is linear (a straight line):

Simple (single) linear regression model: 1 dependent variable and 1 independent variable

Y=b1+b2*X+u

Multiple linear regression model: more than one independent variable

Y=b1+b2*X1+b3*X2+u

Data in economics can be qualitative (categorical) or quantitative (numerical).

Another classification of data in economics is by how the data are collected:

- Cross-sectional data: data collected at one point in time across many units (locations): the GPA of the students in our class in semester 3 (45 students = 45 observations), the GDP of the 63 provinces of Vietnam in 2021 → independent observations (the case considered in this course)
- Time-series data: data collected for one subject (location) over time: the GDP of Vietnam from 2000 to 2021 (22 observations) → dependent observations (time-series analysis: chapters 21, 22)
- Combining cross-sectional and time-series data → panel data (panel data analysis)

Regression model: Y = b1 + b2*X + u → E(Y|X) = b1 + b2*X (since we assume E(u) = 0)

b1: the intercept; it represents the average of Y when X = 0

Note: b1 does not always have an economic meaning, which is why we usually do not put much emphasis on interpreting b1 in the model.

Example: Consider the regression of Salary on Experience (years of work): Salary = b1 + b2*Exp + u. Here b1 has the meaning of the starting salary. Another example: cost on quantity produced (b1: fixed cost).

In another case, consider the price of a house regressed on its area: b1 does not make sense because no house has zero area (no need to interpret b1).

b2: the slope of the line: the rate of change of the average of Y with respect to X: when X changes by 1 unit, the average of Y changes by b2 units.

 Two types of regression models:


- Population Regression Function (PRF): the theoretical (true) function

Simple regression: E(Y|Xi) = f(Xi) = β1 + β2 Xi (1)

The difference between the observation Yi and its conditional mean E(Y|Xi) is the error term ui.

(1) can be written in the form Yi = E(Y|Xi) + ui, or Yi = β1 + β2 Xi + ui (2)


β1 is the intercept, β2 is the slope parameter.
- The PRF is theoretical: we would only know the true PRF if we observed the whole population. In practice we do not observe the whole population, so we estimate the population parameters from a random sample. The estimate of the PRF obtained from this sample is the Sample Regression Function (SRF), which we write as follows:

Y^i = b^1 + b^2*Xi (3), in which Y^i is the estimator of E(Y|Xi), and b^1 and b^2 are the estimators of β1 and β2. The difference between Yi and Y^i is called the residual, denoted u^i:
Yi − Y^i = u^i → Yi = b^1 + b^2*Xi + u^i (4)

 Method for estimating the SRF: Ordinary Least Squares (OLS)


Assume that we have a sample and we want to estimate the sample function:
Yi = b^1 + b^2*Xi + u^i
That is, we have to find b^1 and b^2.
Idea (principle) of the OLS method: we look for the sample regression line (Y^i) that is as close as possible to the actual observations (Yi), i.e. we want the residuals u^i to be as small as possible. Since residuals can be positive or negative, we minimize the sum of their squares, ∑u^i².
So we have to find b^1 and b^2 such that the sum of squared residuals (RSS) is minimum.

RSS = ∑u^i² = ∑(Yi − b^1 − b^2*Xi)²

Since Yi = b^1 + b^2*Xi + u^i → u^i = Yi − (b^1 + b^2*Xi), we minimize RSS with respect to b^1 and b^2 by setting the two partial derivatives of RSS to zero.

The final solution for b^1 and b^2 is:

b^2 = ∑(xi*yi) / ∑xi²    and    b^1 = Ȳ − b^2*X̄

in which xi = Xi − X̄ and yi = Yi − Ȳ are the deviations of Xi and Yi from their sample means.
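As a quick illustration (not from the lecture), these two formulas can be computed directly in R for any pair of numeric vectors; the function name ols_simple is just an illustrative choice:

ols_simple <- function(X, Y) {
  x <- X - mean(X)               # deviations of X from its mean
  y <- Y - mean(Y)               # deviations of Y from its mean
  b2 <- sum(x * y) / sum(x^2)    # slope: sum(x*y) / sum(x^2)
  b1 <- mean(Y) - b2 * mean(X)   # intercept: Ybar - b2*Xbar
  c(b1 = b1, b2 = b2)
}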

Example: Consider the dependence of Weight (W, kg) on Height (H, cm); the population is the students in class EPMP 6. Take a sample of 6 students; the data are collected in the Excel file.

Using the linear model, the population function is: Wi= β1+ β2Hi+ui (1)
Student    W      H          w = W−W̄    h = H−H̄       w·h          h²
1          75     180          13.5        8.6667     117.0000      75.1111
2          61     174          -0.5        2.6667      -1.3333       7.1111
3          78     175          16.5        3.6667      60.5000      13.4444
4          57     169          -4.5       -2.3333      10.5000       5.4444
5          51     170         -10.5       -1.3333      14.0000       1.7778
6          47     160         -14.5      -11.3333     164.3333     128.4444

Mean       61.5   171.3333                      Sum   365.0000     231.3333

β^2 = ∑wh / ∑h² = 365 / 231.3333 = 1.5778
β^1 = W̄ − β^2*H̄ = 61.5 − 1.5778 × 171.3333 = −208.831
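These numbers can be reproduced in R with the six observations from the table (a sketch; the vectors W and H are typed in directly rather than read from the Excel file):

W <- c(75, 61, 78, 57, 51, 47)           # weights (kg)
H <- c(180, 174, 175, 169, 170, 160)     # heights (cm)
w <- W - mean(W); h <- H - mean(H)       # deviations from the sample means
b2 <- sum(w * h) / sum(h^2)              # 365 / 231.3333 = 1.5778
b1 <- mean(W) - b2 * mean(H)             # 61.5 - 1.5778*171.3333 = -208.83
c(b1, b2)
coef(lm(W ~ H))                          # lm() returns the same estimates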

The sample regression function is:


W^=-208.83+1.58H
Wi=-208.83+1.58Hi+ ui^
The intercept has no economic meaning here.
The slope is 1.58 > 0, which conforms with reality: the taller a person is, the heavier they tend to be.
The value 1.58 means that if Height increases by 1 cm, the average Weight increases by 1.58 kg.
Prediction: Huyen has a Height of 165 cm; plugging this into the sample function, her predicted Weight is 51.8 kg, while her real weight is 65 kg. The residual (prediction error) is 65 − 51.8 = 13.2 kg.

Residual standard error: 7.409


Residual standard error: σ^ = sqrt(RSS/(n − k)) = sqrt(RSS/(n − 2)), where k is the number of estimated coefficients (here k = 2).
model <- lm(W ~ H)   # fit the simple regression of Weight on Height
summary(model)       # coefficients, residual standard error, R-squared
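The prediction for Huyen can also be done with predict() (a sketch continuing from the model object above; 165 cm is the height from the example):

new_student <- data.frame(H = 165)              # Huyen's height in cm
W_hat <- predict(model, newdata = new_student)  # about 51.5 kg (51.8 with the rounded coefficients above)
65 - W_hat                                      # residual: actual weight minus predicted weight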

Lecture 3
The coefficient of determination R² measures the goodness of fit of the model (how well the model fits the data): R² tells us what percentage of the variation of Y can be explained by the model (i.e. by the independent variable).
Y = Y^ + u^ → Y − Ȳ = (Y^ − Ȳ) + u^, and summing the squares over the sample gives:
TSS = ESS + RSS
TSS: Total Sum of Squares: the total variation of Y, ∑(Yi − Ȳ)²
ESS: Explained Sum of Squares: the variation explained by the model, ∑(Y^i − Ȳ)²
RSS: Residual Sum of Squares: the variation due to the error, ∑u^i²
Define R² = ESS/TSS = 1 − RSS/TSS: the proportion (percentage) of the variation of Y explained by the model.

In RStudio, the output line "Multiple R-squared: 0.7239" means R² = 0.7239: 72.39% of the variation of Weight is explained by the model (by Height).
More than 80%: very good
From 60% to under 80%: good
From 50% to under 60%: average
Under 50%: Weak
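A short R sketch (assuming the model object for W on H from Lecture 2 is still in memory) that verifies the decomposition TSS = ESS + RSS and the reported R²:

TSS <- sum((W - mean(W))^2)            # total variation of W
RSS <- sum(residuals(model)^2)         # variation left in the residuals
ESS <- TSS - RSS                       # variation explained by the model
c(R2_manual = ESS / TSS,               # should be about 0.7239
  R2_summary = summary(model)$r.squared)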
Example: Use an external data set (an Excel data file), load it into RStudio, estimate the model, interpret the meaning of the coefficients and R², and find RSS, TSS and ESS.
The data set is Table 2_6: data for studying the dependence of hourly wages (wages) on years of education (Years).
Model: Wagesi = β1+ β2*Yearsi + ui (1)
Hypothesis (expectation): positive relationship between Wages and Years of Education, β2 > 0
Load the data into R and estimate the model using the "lm" command.

Loading the data into R:
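One possible way to load the Excel file (a sketch: the file name "Table_2_6.xlsx" and the use of the readxl package are assumptions, not given in the lecture):

library(readxl)                              # readxl reads .xlsx files (install it first if needed)
Table_2_6 <- read_excel("Table_2_6.xlsx")    # adjust the path to where the file is saved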


attach(Table_2_6)
Produce the scatter plot and correlation:
plot(Years,wages)

Comment: the relationship between wages and Years is strongly positive.
Calculate the correlation:
cor(Years,wages)
[1] 0.9527809
Estimate model (1) using the lm command:
lm(wages~Years)

Estimated model is:


Wages^=-0.0144+0.7241*Years
The coefficient of Years is positive, showing that the relationship between Years and wages is
positive (as expected).
The value of 0.7241 means that when years of education increase by 1 year, the average hourly wage increases by 0.7241 $.
We can predict the hourly wage for a person with a high school degree (12 years of education): −0.0144 + 0.7241*12 = 8.67 $/hour.
R² = 0.9078, meaning that 90.78% of the variation of hourly wages is explained by years of education (by the model).

Find RSS:
From the output, the residual standard error is 0.9387.
The formula: residual standard error σ^ = sqrt(RSS/(n − k)) = sqrt(RSS/(n − 2)) = 0.9387
→ RSS = 0.9387² × (n − 2) = 0.8812 × (13 − 2) ≈ 9.69
R² = ESS/TSS = 1 − RSS/TSS = 0.9078 → TSS = RSS/(1 − R²) = 9.69/0.0922 ≈ 105.1 → ESS = TSS − RSS ≈ 95.4
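The same quantities can be obtained directly from the fitted model in R (a sketch; wage_model is an illustrative name for the fitted object):

wage_model <- lm(wages ~ Years)
RSS <- sum(residuals(wage_model)^2)        # residual sum of squares
TSS <- sum((wages - mean(wages))^2)        # total sum of squares
ESS <- TSS - RSS                           # explained sum of squares
c(RSS = RSS, TSS = TSS, ESS = ESS, R2 = ESS / TSS)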

 Precision of the estimators: to assess how precise the estimators β^1 and β^2 are, we use their standard errors. From the output, se(β^1) = 0.8746 and se(β^2) = 0.06958.
Var(β^1) = (se(β^1))²;  Var(β^2) = (se(β^2))²
 Precision of the model (estimated model): σ^
The variance of u is assumed to be σ²; we estimate σ² by σ^², using the formula σ^² = RSS/(n − k).
To compare the precision of two models: we use the Coefficient of Variation (CV):

CV = (σ^ / Ȳ) × 100 (%), where Ȳ is the sample mean of the dependent variable; the model with the smaller CV is considered more precise.
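A small R sketch for the CV of the wages model (continuing with the illustrative wage_model object from above):

sigma_hat <- summary(wage_model)$sigma     # residual standard error, sigma-hat
CV <- sigma_hat / mean(wages) * 100        # coefficient of variation, in percent
CV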

We normally do not compare R² to decide which model is better, because the dependent variables of the two models can be different. We only compare R² when the two models have the same dependent variable and the same number of independent variables.
Example: Y = β1 + β2*X1 + u and Y = α1 + α2*X2 + v
 Assumption of the distribution of error ui
We assume that ui follows a normal distribution with mean 0 and variance σ²:
ui ~ N(0, σ²) → β^1 and β^2 are also normally distributed.
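A quick simulation sketch (not from the lecture; all numbers are illustrative) showing that the OLS slope estimate is approximately normally distributed when the errors are normal:

set.seed(1)
b2_hat <- replicate(2000, {
  X <- runif(50, 0, 10)                # an illustrative regressor
  u <- rnorm(50, mean = 0, sd = 2)     # normal errors, sigma = 2
  Y <- 1 + 0.5 * X + u                 # true beta1 = 1, beta2 = 0.5
  coef(lm(Y ~ X))[2]                   # OLS slope estimate for this sample
})
hist(b2_hat, breaks = 40, main = "Sampling distribution of beta2-hat")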

Exercises:
1. Redo the estimation of the coefficients for the simple model using RStudio.
2. A training manager wondered whether the length of time his trainees revised for an examination had any effect on the marks they scored in the examination. Before the exam, he asked a random sample of 10 of them to estimate honestly how long, to the nearest hour, they had spent revising. After the examination he investigated the relationship between the two variables. The sample data provided observations on the following:
Yi = exam mark for individual i (%)
Xi = revision time by individual i (hours)
The manager believes that the relationship between revision hours and exam mark takes
the form:
Yi = β1 + β2Xi + ui
where β1 and β2 are unknown parameters. The stochastic disturbance terms, ui, are assumed to be normally and independently distributed with zero mean and constant variance σ².
Preliminary analysis of the sample data produces the following sample information:

n=10

Lower-case letters indicate that the variables are measured as deviations from their respective sample means, i.e. xi = Xi − X̄.
Use the above sample information to answer all the following questions. Show
explicitly all calculations.

a) Calculate estimates for the unknown coefficients, β^1 and β^2. [6 marks]

b) Present the resulting regression equation and interpret the meaning of the
estimated coefficients in this problem. [6 marks]

c) Calculate the estimated standard errors of the estimated coefficients. [5 marks]

install.packages("MASS")
library(MASS)
