
Chapter two: Simple Linear Regression Model

2.1 Introduction to Regression Analysis


Learning Outcomes: In this chapter, you will learn to
1. Understand the nature of regression analysis
2. Define regression analysis as a statistical technique
3. Recognize the distinction between correlation and regression in terms of causality and prediction
4. Recognize the distinction between statistical and deterministic relationships
What Does a Regression Look Like?
Figure 2.1 is one way to present a regression.

1. Origin of the Term “Regression”1


The term regression was introduced by Francis Galton. In a famous paper, Galton
found that, although there was a tendency for tall parents to have tall children and for short
parents to have short children, the average height of children born of parents of a given
height tended to move or “regress” toward the average height in the population as a whole.
In other words, the height of the children of unusually tall or unusually short parents
tends to move toward the average height of the population. Galton’s law of universal
regression was confirmed by his friend Karl Pearson, who collected more than a thousand
records of heights of members of family groups.

1 Damodar N. Gujarati and Dawn C. Porter (2009). Basic Econometrics. 5th edition. McGraw-Hill Companies, Inc. Boston. P15.
University of Algiers 3 Subject: Statistical Modeling
Lecture 2.1: Introduction to Regression Analysis
He found that the average height of sons of a group of tall fathers was less than their
fathers’ height and the average height of sons of a group of short fathers was greater than
their fathers’ height, thus “regressing” tall and short sons alike toward the average height
of all men. In the words of Galton, this was “regression to mediocrity.”
2. The Nature of Regression Analysis
Much of applied statistical analysis begins with the following premise: y and x are two
variables, representing some population, and we are interested in “explaining y in terms of
x,” or in “studying how y varies with changes in x.” For example: y is soybean crop yield and
x is amount of fertilizer; y is hourly wage and x is years of education; and y is a community
crime rate and x is number of police officers.
In writing down a model that will “explain y in terms of x,” we must confront three
issues. First, since there is never an exact relationship between two variables, how do we
allow for other factors to affect y? Second, what is the functional relationship between y
and x? And third, how can we be sure we are capturing a ceteris paribus2 relationship
between y and x (if that is a desired goal)?
We can resolve these ambiguities by writing down an equation relating y to x. A simple
equation is:
y = β0 + β1x + u      (1)
Equation (1), which is assumed to hold in the population of interest, defines the simple
linear regression model. It is also called the two-variable linear regression model or
bivariate linear regression model because it relates the two variables x and y.
The variables y and x have several different names, used interchangeably: y is called the dependent variable, the explained variable, the response variable, or the regressand; x is called the independent variable, the explanatory variable, the predictor variable, or the regressor.
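To make equation (1) concrete, here is a minimal sketch (an illustration added for this section, not part of the lecture) that simulates data from a population model with assumed values β0 = 2 and β1 = 0.5, and then recovers those values with the standard ordinary least squares formulas:

```python
import random

random.seed(42)

# Assumed population parameters (hypothetical, for illustration only)
b0_true, b1_true = 2.0, 0.5

# Simulate the population model y = b0 + b1*x + u
n = 1000
x = [random.uniform(0, 10) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]   # error term u: other factors affecting y
y = [b0_true + b1_true * xi + ui for xi, ui in zip(x, u)]

# Ordinary least squares estimates of b0 and b1
x_bar = sum(x) / n
y_bar = sum(y) / n
b1_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / \
         sum((xi - x_bar) ** 2 for xi in x)
b0_hat = y_bar - b1_hat * x_bar

print(f"b0_hat = {b0_hat:.3f}, b1_hat = {b1_hat:.3f}")  # close to 2 and 0.5
```

Because the error term u varies across observations, the estimates only approximate the true parameters; they get closer as the sample grows.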

3. The Definition of Regression Analysis

2 The term ceteris paribus is a Latin phrase meaning “all other things being equal” or “holding other things constant.” It is often used in economics and other social sciences to isolate the effect of one variable while assuming that other relevant factors remain unchanged. For example, when analyzing the effect of price on demand, one might say “ceteris paribus, an increase in price leads to a decrease in demand,” assuming no other factors like consumer income or preferences change during the analysis.

Regression analysis is a statistical technique that attempts to “explain” movements in
one variable, the dependent variable, as a function of movements in a set of other variables,
called the independent variables, through the quantification of a single equation.
Regression analysis is a powerful tool for making predictions, understanding cause-
and-effect relationships, and identifying trends in data.
In simpler terms, regression analysis helps you answer questions like:
• How does one variable change in response to changes in another variable?
• Can we predict the value of one variable based on the values of other variables?
• What is the strength of the relationship between two or more variables?
There are several types of regression analysis, including:
1. Simple linear regression: This involves modeling the relationship between a single
dependent variable and a single independent variable using a linear equation.
2. Multiple linear regression: This extends simple linear regression to include multiple
independent variables.
3. Logistic regression: Used when the dependent variable is categorical (e.g., yes/no,
pass/fail).
4. Non-linear regression: Used when the relationship between the variables is not
linear.
Regression analysis is widely used in various fields, such as economics, finance,
marketing, social sciences, and engineering. It is a valuable tool for researchers and
analysts who need to understand and predict relationships between variables.
3.1. Regression Analysis and Economic Relationships3
Econometricians use regression analysis to make quantitative estimates of economic
relationships that previously have been completely theoretical in nature. After all, anybody
can claim that the quantity of PCs demanded will increase if the price of those PCs
decreases (holding everything else constant), but not many people can put specific numbers
into an equation and estimate by how many PCs the quantity demanded will increase for
each dollar that price decreases.
To predict the direction of the change, you need a knowledge of economic theory and
the general characteristics of the product in question. To predict the amount of the change,
though, you need a sample of data, and you need a way to estimate the relationship. The
most frequently used method to estimate such a relationship in econometrics is regression
analysis.4

3 Jeffrey M. Wooldridge (2013). Introductory Econometrics. 5th edition. South-Western. Mason. USA. P22-23.
4 A.H. Studenmund (2014). Using Econometrics: A Practical Guide. 6th edition. Pearson Education Limited. England. P5.

Much of economics and business is concerned with cause-and-effect propositions. If
the price of a good increases by one unit, then the quantity demanded decreases on average
by a certain amount, depending on the price elasticity of demand (defined as the percentage
change in the quantity demanded that is caused by a one percent increase in price).
Similarly, if the quantity of capital employed increases by one unit, then output
increases by a certain amount, called the marginal productivity of capital. Propositions such
as these pose an if-then, or causal, relationship that logically postulates that a dependent
variable’s movements are determined by movements in a number of specific independent
variables.
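As a worked illustration of the elasticity definition above, with hypothetical numbers:

```python
# Price elasticity of demand: percentage change in quantity demanded
# divided by the percentage change in price (hypothetical numbers).
p0, p1 = 100.0, 101.0   # price rises by 1%
q0, q1 = 500.0, 490.0   # quantity demanded falls by 2%

pct_change_q = (q1 - q0) / q0 * 100   # -2.0
pct_change_p = (p1 - p0) / p0 * 100   #  1.0
elasticity = pct_change_q / pct_change_p

print(elasticity)  # -2.0: a 1% price increase cuts quantity demanded by 2%
```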
3.2. Regression versus Correlation5
The correlation between two variables measures the degree of linear association
between them. If it is stated that y and x are correlated, it means that y and x are being
treated in a completely symmetrical way. Thus, it is not implied that changes in x cause
changes in y, or indeed that changes in y cause changes in x. Rather, it is simply stated that
there is evidence for a linear relationship between the two variables, and that movements
in the two are on average related to an extent given by the correlation coefficient.
In regression, the dependent variable (y) and the independent variable(s) (xs) are
treated very differently. The y variable is assumed to be random or ‘stochastic’ in some way,
i.e. to have a probability distribution. The x variables are, however, assumed to have fixed
(‘non-stochastic’) values in repeated samples. Regression as a tool is more flexible and more powerful than correlation.
In short, correlation measures the strength and direction of the linear relationship
between two variables, showing how closely they move together. It doesn't imply causality
and ranges from -1 (perfect negative) to +1 (perfect positive).
Regression goes beyond measuring relationship strength and models how one variable
(dependent) changes in response to changes in another (independent). It implies a cause-
and-effect relationship, predicting the dependent variable based on the independent
variables.
Thus, while correlation quantifies association, regression explains and predicts.
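The symmetry of correlation and the asymmetry of regression can be checked numerically. The sketch below (a hypothetical illustration in plain Python) shows that the correlation coefficient is unchanged when x and y swap roles, while the regression slope is not:

```python
import random

random.seed(0)

# Hypothetical data in which y depends on x, plus noise
x = [random.uniform(0, 10) for _ in range(500)]
y = [3.0 + 0.8 * xi + random.gauss(0, 3) for xi in x]

def mean(v):
    return sum(v) / len(v)

def corr(a, b):
    """Pearson correlation coefficient: symmetric in its arguments."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5

def slope(a, b):
    """OLS slope from regressing b on a: not symmetric."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return cov / sum((ai - ma) ** 2 for ai in a)

print(corr(x, y) == corr(y, x))  # True: correlation treats x and y symmetrically
print(slope(x, y), slope(y, x))  # two different numbers: regression does not
```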

5 Chris Brooks (2008). Introductory Econometrics for Finance. 2nd edition. Cambridge University Press. New York. USA. P28.

4. Statistical versus Deterministic Relationships
4.1. Deterministic Relationships
A deterministic relationship occurs when one variable precisely determines another. For example, in physics, the speed of an object in uniform motion is exactly determined by the distance traveled and the time taken (speed = distance/time). There is no room for variation, and the relationship holds with certainty.
Precise and Predictable: In a deterministic relationship, the value of one variable is
completely determined by the value of another variable. There's a fixed, predictable rule
governing their relationship.
No Randomness: There's no room for chance or randomness. Given the value of one
variable, you can always accurately predict the value of the other.
Examples:
4.1.1. The relationship between the circumference of a circle and its diameter.

The circumference of a circle is the distance around its edge. It is exactly determined by the diameter:

C = πd = 2πr

where:

• C is the circumference
• π (pi) is a mathematical constant approximately equal to 3.14159
• d is the diameter and r is the radius (the distance from the center to any point on the edge)
4.1.2. Area of a Square:

The area of a square is the amount of space it occupies. The formula is:

A = s²
where:

• A is the area
• s is the side length of the square
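These deterministic formulas translate directly into code: given the same input, the output is always the same, with no error term involved (a small illustration added here for contrast with the statistical case below):

```python
import math

def circumference(diameter):
    """Circumference of a circle: exactly determined by its diameter."""
    return math.pi * diameter

def square_area(side):
    """Area of a square: exactly determined by its side length."""
    return side ** 2

# Same input always yields the same output -- no randomness involved.
print(circumference(2.0))   # 6.283185307179586
print(square_area(3.0))     # 9.0
```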

4.2. Statistical Relationships


In contrast, a statistical relationship allows for variability. In economics or social
sciences, for instance, one variable (like income) influences another (like consumption), but
not perfectly. Factors like randomness or unmeasured variables introduce variation,
meaning the relationship only holds on average, not exactly.

Regression analysis often deals with statistical relationships to estimate trends and
predict outcomes.
Probabilistic: Statistical relationships involve a degree of uncertainty or randomness. The
value of one variable is not perfectly determined by the value of another, but rather, there's
a probability that a certain value of one variable will occur given a specific value of the
other.
Influenced by Factors: Other factors besides the independent variable can influence the
dependent variable.
The general form for a regression model that could be used to analyze these relationships
might look like this:
Y = β0 + β1X1 + β2X2 + ... + βnXn + ε

Where:

• Y: The dependent variable (e.g., weight, income)


• β0: The intercept (the value of Y when all independent variables are 0)
• β1, β2, ..., βn: The regression coefficients representing the impact of each independent variable
(e.g., height, education) on Y
• X1, X2, ..., Xn: The independent variables
• ε: The error term, representing the unexplained variation in Y

Examples:
4.2.1. The relationship between height and weight in humans: While there's a general
correlation, there are variations due to genetics, diet, and other factors. To model the
relationship between height and weight, you might use:
Weight = β0 + β1Height + ε

To model the relationship between education and income, you might use:

Income = β0 + β1Education + β2Experience + ε

In these models, the regression coefficients (β1, β2, etc.) would indicate the expected change
in the dependent variable (weight or income) for a one-unit increase in the corresponding
independent variable (height or education), while controlling for the other variables.
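The income equation above can be estimated on simulated data. The sketch below (hypothetical numbers, using NumPy's least-squares solver rather than any method from the lecture) recovers assumed coefficients β0 = 10, β1 = 2.5, β2 = 0.8:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate a hypothetical population: Income = b0 + b1*Education + b2*Experience + e
n = 2000
education = rng.uniform(8, 20, n)    # years of schooling
experience = rng.uniform(0, 30, n)   # years of work experience
e = rng.normal(0, 5, n)              # error term: unmeasured factors
income = 10 + 2.5 * education + 0.8 * experience + e

# Design matrix with a column of ones for the intercept b0
X = np.column_stack([np.ones(n), education, experience])
beta_hat, *_ = np.linalg.lstsq(X, income, rcond=None)

print(beta_hat.round(2))  # approximately [10. , 2.5, 0.8]
```

Each estimated coefficient is the expected change in income for a one-unit increase in that variable, holding the other variable fixed, which is exactly the ceteris paribus interpretation discussed earlier.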

In summary, deterministic relationships are characterized by a fixed, predictable connection between variables, while statistical relationships involve a degree of uncertainty and are influenced by multiple factors. Understanding the distinction between these two types of relationships is crucial for accurately interpreting and analyzing data.

Feature                   Statistical Relationship              Deterministic Relationship
Nature of relationship    Probabilistic                         Precise and predictable
Role of randomness        Significant                           Minimal or none
Examples                  Height and weight; education          Circumference and diameter;
                          and income                            area of a square
