Bus 173 - Lecture 5

Linear regression with one regressor can be used to model the relationship between a dependent variable (Y) and a single independent variable (X). It assumes a linear relationship between X and Y, where the slope represents the effect of a one-unit change in X on Y. The population regression line defines this true relationship, but the parameters are unknown and must be estimated from a sample of data. Ordinary least squares (OLS) regression estimates the intercept and slope by finding the line that minimizes the sum of squared errors between predicted and actual Y values. It provides unbiased, efficient estimates if its assumptions are met. An example analyzed the relationship between test scores (Y) and student-teacher ratios (X) using OLS.


Lecture 5

Linear regression with one regressor

Niza Talukder
Introduction

• A state implements tough new penalties on drunk drivers. What is the effect
on highway fatalities?
• A school district cuts the sizes of its elementary classes: What is the effect on its students'
standardized test scores?
• You successfully complete one more year of college classes: What is the effect on your future
earnings?

All of these questions concern the unknown effect of changing one variable, X, on another variable, Y.

• This model postulates a linear relationship between X and Y: the slope of the line relating X and Y is the effect of a one-unit change in X on Y. Just as the mean of Y is an unknown characteristic of the population distribution of Y, the slope of the line relating X and Y is an unknown characteristic of the population joint distribution of X and Y. The econometric problem is to estimate this slope, that is, to estimate the effect on Y of a unit change in X, using a sample of data on these two variables.
• Consumption function

Y = β0 + β1X

Y = consumption (dependent variable)

X = income (independent or explanatory variable)

β0 and β1 are parameters of the model:

β0 = the intercept

β1 = the slope coefficient

Relationships between variables are generally inexact. To allow for this, econometricians modify the above equation as

Y = β0 + β1X + µ

µ = disturbance or error term. This is a random (stochastic) variable that has well-defined probabilistic properties. The error term accounts for all the factors that affect consumption but are not considered explicitly in the model.
• This is an example of a linear regression model, which hypothesizes that the dependent variable Y (consumption) is linearly related to the explanatory variable X (income), but that the relation between them is inexact and subject to individual variation.

• Regression versus correlation

The correlation coefficient measures the strength of association between two variables. In regression, by contrast, we try to estimate or predict the value of one variable on the basis of the fixed value of another variable.

• Regression versus causation

A regression describes association; by itself it does not establish that X causes Y. A causal interpretation requires additional assumptions.

 Population Regression Model

Y = β0 + β1X + µ

β0 + β1X is the population regression line or population regression function. This is the relationship that holds between X and Y on average over the entire population. Thus, if you know the value of X, then according to this population regression you could predict that the value of the dependent variable Y is β0 + β1X. Note that the intercept β0 often has only statistical meaning and no real-world meaning.
Different names for the dependent and independent variable in applied statistics:

Dependent variable      Independent variable
Explained variable      Explanatory variable
Predictand              Predictor
Regressand              Regressor
Response                Stimulus
Endogenous              Exogenous
Controlled variable     Control variable
Outcome                 Covariate
 Estimating the coefficients of a linear regression model

Topic: Analysis of the relationship between class size and performance of students

Test score = β0 + β1 × class size + other factors

Do we know the population values of β0 and β1? No, they are unknown. How do we estimate them?

Is it possible to consider the entire population to test whether test scores are affected by class size? No.
We need to use a random sample of data drawn from the population.

Let’s use the data set named copy of caschool, available in Google Classroom. This dataset contains data on test performance, school characteristics and student demographic backgrounds. The data used here are from all 420 K-6 and K-8 districts in California with data available for 1998 and 1999. Test scores are the average of the reading and math scores on the Stanford 9 standardized test administered to 5th-grade students.
Variables of interest for a simple regression:
• Test score: district-wide average of reading and math scores for fifth graders
• Class size: number of students divided by number of teachers, i.e. the student-teacher ratio

• Before doing regression, we first need to describe the data through descriptive statistics. We also
analyze the graphical presentation of the data.
• Figure: scatter plot of test score versus student-teacher ratio
Correlation: -0.23
At the 10th percentile, the student-teacher ratio is 17.3, meaning 10% of districts have a student-teacher ratio below 17.3.
• The scatter plot reflects a weak negative relationship. Although larger classes tend to have lower test scores, there are other factors affecting test scores that keep the observations from falling perfectly along a straight line.

• If we want to draw the best-fit line through these data, then the slope of this line would be an estimate of β1 based on these data. The problem is that different people will draw different estimated lines. How do we choose among the many possible lines? The most common method is to choose the line that produces the ‘least squares’ fit to these data, i.e. to use the ordinary least squares (OLS) estimator.

• OLS: Developed by Gauss in 1795. The OLS estimator chooses the regression coefficients so that the estimated regression line is as close as possible to the observed data, where closeness is measured by the sum of the squared mistakes made in predicting Y given X.
OLS

• We want to draw a line that approximates the data points. The error is the vertical distance between the actual data point and the line. What we want to do is minimize the squared errors to get the best approximating line.
 • Suppose there are four observations on X and Y from a data set. You need to find estimates of the intercept and slope: denote the estimate of the intercept by b0 and the estimate of the slope coefficient by b1. The fitted line will be written as

Ŷ = b0 + b1X (equation 1)

Please note: the caret above Y simply indicates that it is a fitted or predicted value of Y corresponding to X, not the actual value. In class, I have denoted this with a small letter, y.
 We discussed before how drawing a freehand line leaves a lot of room for subjective judgment. Hence, we need a good method to calculate efficient estimates of b0 and b1.
Algebraically, the first step is to understand the residual, the estimate of the error term. It is the difference between the actual value Y and the fitted value Ŷ given by the regression line. It will be denoted by û:

û = Y - Ŷ (equation 2)

Substituting equation 1 into 2 gives

û = Y - b0 - b1X (equation 3)

The residual of each observation depends on our choice of b0 and b1. We want to choose the estimates in such a way that the residuals are as small as possible. The way to do this is to minimize the residual sum of squares (RSS), the sum of the squared residuals over all observations. The smaller the RSS, the better the fit.

Note: (a - b - c)2 = a2 + b2 + c2 - 2ab - 2ac + 2bc, which is useful when expanding the squared residuals.
OLS technique: the workout is shown in chapter 2 of the book by Dougherty.
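The minimization described above has a closed-form solution: the slope estimate is the sample covariance of X and Y divided by the sample variance of X, and the intercept makes the fitted line pass through the point of means. A minimal pure-Python sketch (the four observations are made up for illustration):

```python
# Closed-form OLS for one regressor:
# b1 = sum((x - xbar) * (y - ybar)) / sum((x - xbar)^2)
# b0 = ybar - b1 * xbar

def ols_fit(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: the "covariance over variance" form of the least squares estimator
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1

# Four made-up observations, as in the four-observation example above
x = [1, 2, 3, 4]
y = [3, 5, 7, 10]
b0, b1 = ols_fit(x, y)
print(round(b0, 10), round(b1, 10))  # 0.5 2.3
```

Any other choice of intercept and slope for these four points gives a larger RSS than (0.5, 2.3), which is exactly what "least squares" means.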
 The Least Squares Assumptions:

1) The error term has a conditional mean of zero given the independent variable. In short, the expected value of the error term is zero:

E(µ | X) = 0
E(µ) = 0

It simply means that the error term has no relationship with the independent variable X. In other words, the other factors affecting the dependent variable are not correlated with X.

2) The observations of X and Y are independently and identically distributed.

If the observations are drawn by simple random sampling from a single large population, then (Xi, Yi), i = 1, 2, …, n are i.i.d. (independently and identically distributed).
Example: Recall the age and income example discussed in class.
3) Large outliers are unlikely: observations with values of X and Y far outside the usual range of the data are unlikely. Large outliers can lead to misleading results, so it is important to examine any outliers and make sure they are correctly recorded.
 If the assumptions hold, then the estimates will be unbiased and efficient.

• Efficiency: how reliable your estimates are; an efficient estimator has the smallest variance among unbiased estimators.

• Bias: the difference between the expected value of the estimator and the parameter it estimates:
Bias(β̂) = E(β̂) - β
It follows that the bias of an unbiased estimator is 0.

• Biased estimator: an estimator whose expectation, or sampling mean, is different from the population value it is supposed to be estimating.
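Unbiasedness can be seen in a small simulation (a sketch with a made-up true model Y = 2 + 3X + u): across many random samples, the OLS slope estimates should average out to the true slope.

```python
import random

# Monte Carlo sketch of unbiasedness: under the least squares assumptions,
# the average of the OLS slope estimates across repeated samples should be
# close to the true slope (3 in this made-up model Y = 2 + 3X + u).
random.seed(0)

def ols_slope(x, y):
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    return sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
           sum((xi - x_bar) ** 2 for xi in x)

n, reps = 50, 500
x = [i / n for i in range(n)]  # fixed regressor values in [0, 1)
slopes = []
for _ in range(reps):
    # the error u has mean zero and is unrelated to x (assumption 1)
    y = [2 + 3 * xi + random.uniform(-1, 1) for xi in x]
    slopes.append(ols_slope(x, y))

mean_slope = sum(slopes) / reps
print(round(mean_slope, 2))  # close to 3
```

Each individual estimate misses the true slope because of the error term, but the misses average out: that is what "the bias is 0" means.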

Why least squares? Two useful characteristics of least squares:

- It has desirable properties: subject to some assumptions, it is efficient, consistent and unbiased.
- The algebra is comparatively straightforward.
 
After using the OLS technique to find the estimates of the population intercept and population slope, the regression equation can be written as:

Ŷ = b0 + b1X

where

b0 is the estimate of β0

b1 is the estimate of β1

The residual is the estimate of the error term. We cannot observe the error term, but we can observe the residual.

The test score and STR equation will now look like this:

TestScore-hat = b0 + b1 × STR
 
OLS estimates of the relationship between test scores and student teacher ratio:
When OLS is used to estimate the line relating the student-teacher ratio to test scores using the 420 observations, the estimated slope is -2.28 and the estimated intercept is 698.9. Accordingly, the OLS regression line for these 420 observations is

TestScore-hat = 698.9 - 2.28 × STR

where test score is the average test score in the district and STR is the student-teacher ratio. The caret (^) indicates that this is a predicted value based on the OLS regression line. The scatter plot below shows the OLS regression line superimposed over it.
Interpretation:

The slope of -2.28 means that an increase in the student-teacher ratio of one student per class is, on average, associated with a decline in district-wide test scores of 2.28 points on the test. A decrease in STR of 2 students per class is, on average, associated with an increase in test scores of 4.56 points:

(-2) × (-2.28) = 4.56

The negative slope indicates that more students per teacher, i.e. a larger class, is associated with poorer performance on the test.

Now, given the student-teacher ratio, it is possible to predict the district-wide test score.

Question: what happens if STR increases by 20 students per class? (Predicted test scores fall by 2.28 × 20 = 45.6 points.)

What is the predicted score for a district with 20 students per teacher?
• Predicted test score: 698.9 - 2.28 × 20 = 653.3

However, this prediction will not always be right because of the other factors that determine the district's performance. But the regression line gives a prediction (the OLS prediction) of what test scores would be for that district based on its STR, absent those other factors.
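The prediction above is just an evaluation of the estimated line; a one-line sketch using the coefficients reported in the text:

```python
# Prediction from the estimated OLS line reported in the text:
# TestScore-hat = 698.9 - 2.28 * STR
def predict_test_score(str_ratio):
    return 698.9 - 2.28 * str_ratio

print(round(predict_test_score(20), 1))  # 653.3, matching the worked example
# A decrease in STR of 2 students raises the predicted score by 2 * 2.28 points
print(round(predict_test_score(18) - predict_test_score(20), 2))  # 4.56
```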
 Measures of Goodness of Fit

Having estimated a linear regression, you might want to find out how well the regression line describes the data. Does the regressor account for much or little of the variation in the dependent variable? Are the observations tightly clustered around the regression line, or are they spread out?

R², also known as the coefficient of determination, measures the fraction of the variance in Y that is explained by X. It shows how well the OLS regression fits the data.
 Properties of R²

0 ≤ R² ≤ 1

• R² = 1 means a perfect fit: all data points fall exactly on the regression line.

• R² = 0 means X has no explanatory power for Y whatsoever.

• Bigger values of R² imply that X has more explanatory power for Y.

• R² is equal to the square of the correlation between X and Y.

 Y = Ŷ + û

• In this notation, R² is the ratio of the sample variance of Ŷ to the sample variance of Y. Mathematically, R² can be written as the ratio of the explained sum of squares to the total sum of squares.

• ESS (explained variation) = Σ(Ŷi - Ȳ)²

• TSS (total variation) = Σ(Yi - Ȳ)²

• R² = ESS / TSS

• TSS = ESS + SSR, where SSR = Σûi² is the sum of squared residuals. Thus, R² can also be expressed as 1 - SSR/TSS.
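The identity TSS = ESS + SSR, and the equivalence of the two R² formulas, can be checked numerically. A sketch with made-up data, fitting the line by the closed-form OLS formulas:

```python
# R^2 two ways: ESS/TSS and 1 - SSR/TSS. For a simple regression it also
# equals the squared correlation between X and Y. Data below are made up.
def r_squared(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
         sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]                     # fitted values
    ess = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained sum of squares
    ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # sum of squared residuals
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    assert abs(tss - (ess + ssr)) < 1e-9                   # TSS = ESS + SSR
    return ess / tss                                       # same as 1 - ssr / tss

x = [1, 2, 3, 4]
y = [3, 5, 7, 10]
print(round(r_squared(x, y), 4))  # 0.9888
```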


 • Another measure of goodness of fit is the adjusted R².

SER: A measure of accuracy

The standard error of the regression (SER) is an estimator of the standard deviation of the regression error. The SER provides a measure of the precision of the estimates: it tells you about the spread of the observations around the regression line, measured in units of the dependent variable. Smaller values are better because they indicate that the observations are closer to the fitted line.

If the units of the dependent variable are dollars, then the SER measures the magnitude of a typical deviation from the regression line, that is, the magnitude of a typical regression error in dollars.
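Under the standard textbook convention, the SER is computed from the residuals as the square root of SSR/(n - 2); a minimal sketch (the residuals below are made up):

```python
import math

# SER sketch: standard error of the regression, sqrt(SSR / (n - 2));
# the divisor n - 2 adjusts for the two estimated coefficients.
def ser(residuals):
    n = len(residuals)
    ssr = sum(e ** 2 for e in residuals)
    return math.sqrt(ssr / (n - 2))

# Residuals are in the units of the dependent variable, so the SER is too
print(round(ser([0.2, -0.1, -0.4, 0.3]), 4))
```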
 Application to the test score data:

• Using the California test score data, the estimated R² is 0.051, or 5.1%. This means that STR explains 5.1% of the variance of the dependent variable, test score; that is, it accounts for about 5% of the variability in test scores.

Only 5%. What happened to the rest, i.e. roughly 95%? What explains the rest of the variability in test scores?

• A SER of 18.6 means that the standard deviation of the regression residuals is 18.6. Because the standard deviation is a measure of spread, a SER of 18.6 means that there is a large spread of the scatter plot around the regression line, measured in points on the test. The large spread means that predictions of test scores made using only the student-teacher ratio for a district will often be wrong by a large amount.

• What should we make of this low R² and high SER?


Example: Analyzing the relationship between educational attainment and earnings

• Data used for the analysis of earnings (measured in dollars) and years of schooling: data on earnings and years of schooling collected from a survey in 1992. The following is another example of a simple regression; a simple regression is one with a single independent variable.
 The first step in interpretation is to write the equation (discussed in class several times).

• Adjusted R-squared adjusts the R² statistic for the number of independent variables in the model.

Testing hypotheses about the slope: a test for the significance of regression coefficients

Now suppose I challenge your theory. I claim that the level of education has no effect on earnings, meaning that the regression line is flat, i.e. the coefficient on years of schooling is 0. I ask you if there is any evidence that the slope is zero. Can you reject my hypothesis that the coefficient on years of schooling is 0? Or should you accept it?

We will now discuss tests of hypotheses about the slope and intercept of the population regression line.

H0: β1 = 0
H1: β1 ≠ 0

Three steps to test the two-sided hypothesis:

1) Compute the standard error of the estimated slope.
2) Compute the t statistic:

t = (estimator - hypothesized value) / standard error

Check whether its absolute value exceeds the critical value or not.

3) Compute the p-value, which is the smallest significance level at which the null hypothesis can be rejected, given the test statistic actually observed.
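The steps above can be sketched directly. The coefficient and standard error below are hypothetical numbers for illustration, not the values from the regression output in the slides:

```python
# Two-sided test of H0: beta1 = 0 at the 5% significance level.
# With a large sample, the t statistic is compared with the critical
# value 1.96 from the standard normal distribution.
def t_statistic(estimate, hypothesized, std_error):
    return (estimate - hypothesized) / std_error

b1_hat, se_b1 = 0.081, 0.010       # hypothetical estimate and standard error
t = t_statistic(b1_hat, 0.0, se_b1)
reject_null = abs(t) > 1.96        # reject H0 when |t| exceeds the critical value
print(round(t, 2), reject_null)    # 8.1 True
```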
 Interpretation:

If the regression output shows that the t-statistic is greater than the critical value for a given significance level, you reject the null. Let's take the regression output in slide 27 as an example:

A t-statistic of 8.12 is greater than the critical value of 1.96 at the 5% significance level. So we have enough evidence to reject the null: the coefficient is significantly different from 0. Thus, years of education has explanatory power over earnings, meaning it is a significant factor that should be included in the regression model. The p-value of (approximately) 0 further shows evidence that years of schooling affects the earnings of individuals.

An R² of 0.1036 suggests that schooling explains 10.36% of the variability in earnings.


Standard error of a regression coefficient:

- Refer to slide 27.

- Beside the coefficients column, you can see the 'std. error' column. Here, the standard error is an estimate of the standard deviation of the coefficient. It can be thought of as a measure of the precision with which the regression coefficient is measured.

- The output also reports the standard deviation of the residuals in the model. In STATA output, this is shown by the Root MSE.
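Under homoskedasticity, the classical textbook formula for the standard error of the slope is the SER divided by the square root of the variation in X. A sketch with made-up values (note this is the homoskedasticity-only formula, not the robust standard error that STATA can report):

```python
import math

# Homoskedasticity-only standard error of the slope coefficient:
# SE(b1) = SER / sqrt(sum((x - xbar)^2)), with SER = sqrt(SSR / (n - 2)).
def slope_std_error(x, residuals):
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    ser = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    return ser / math.sqrt(s_xx)

# Made-up x values and residuals for illustration
print(round(slope_std_error([1, 2, 3, 4], [0.2, -0.1, -0.4, 0.3]), 4))
```

The formula makes intuitive sense: the slope is estimated more precisely (smaller SE) when the residuals are small or when X varies a lot.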
 Regression when X is a binary variable:

• The discussion so far has focused on the case where the independent variable is continuous. Regression analysis can also be used when the regressor is binary, that is, when it takes only two values, 0 or 1. For example, X might be a worker's gender (= 1 if female, = 0 if male) or whether a class is small or large (= 1 if small, = 0 if large). Such binary regressors are called dummy variables; they let us introduce qualitative variables into a regression analysis that uses numerical data.

• Interpretation:
The interpretation of the slope changes when a binary variable is the independent variable. Suppose we have a variable D that equals either 0 or 1, depending on whether the student-teacher ratio is less than 20:

D = 1 if STR in the district < 20

D = 0 if STR in the district ≥ 20

What is the average test score when STR is greater than or equal to 20? β0
What is the average test score when STR is less than 20? β0 + β1
β1 is simply the difference in sample average test scores between the two groups.
 • With a dummy/binary variable, it is not useful to think of β1 as a slope; indeed, because D can take only two values, there is no line, so it does not make sense to talk about a slope. Instead, we will simply refer to β1 as the coefficient multiplying D in this regression, or the coefficient on D.
Female and bachelor are examples of dummy variables:

earning     bachelor   female
34.61538    1          0
19.23077    1          1
13.73626    0          1
19.23077    1          1
19.23077    1          0
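The group-means interpretation above can be checked with a sketch: regressing Y on a binary D by OLS gives an intercept equal to the D = 0 group mean and a coefficient on D equal to the difference in group means. The district scores below are toy numbers, not the textbook data:

```python
# With a binary regressor D, the OLS intercept is the mean of Y in the
# D = 0 group, and the coefficient on D is the difference in group means.
def binary_regression(d, y):
    y0 = [yi for di, yi in zip(d, y) if di == 0]
    y1 = [yi for di, yi in zip(d, y) if di == 1]
    b0 = sum(y0) / len(y0)       # average Y when D = 0
    b1 = sum(y1) / len(y1) - b0  # difference in sample averages
    return b0, b1

# Toy data: D = 1 if STR < 20 in the district, else 0
d = [1, 1, 0, 1, 0]
y = [660, 662, 650, 664, 652]    # made-up district test scores
b0, b1 = binary_regression(d, y)
print(b0, b1)  # 651.0 11.0
```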
Heteroskedasticity and Homoskedasticity:

• Homoskedasticity means that the variance of the errors is the same across all levels of the independent variable; in particular, the variance of the error does not depend on the independent variable. An important assumption of OLS is that the variance of the residuals is homoskedastic, or constant: the residuals must not vary systematically for lower or higher values of the independent variable.
• When the variance of the errors differs at different values of the independent variable, heteroskedasticity is indicated. A scatterplot of these variables will often show a cone-like shape, as the scatter (or variability) of the dependent variable widens or narrows as the value of the independent variable increases. When heteroskedasticity is marked, it can lead to serious distortion of findings and seriously weaken the analysis, thus increasing the possibility of a Type I error.
