CHAPTER II - Nature of Regression Analysis

1) Regression analysis examines the relationship between a dependent variable and one or more independent variables. It aims to predict the average value of the dependent variable based on the known values of the independent variables. 2) Some key types of relationships are statistical relationships versus deterministic relationships, regression versus causation, and regression versus correlation. Regression does not necessarily imply causation. 3) Terminology used in regression analysis includes dependent variable, explanatory variable, predictor, predictand, regressor, and regressand.

CHAPTER II: THE NATURE OF REGRESSION ANALYSIS
Thi Phuong Mai VU- FIE- FTU
OUTLINE

1. Historical origin of the term Regression
2. Regression Analysis
3. Different types of relationships
4. Terminology and notation
5. The nature and sources of data for economic analysis
6. Summary and conclusions

1. Historical origin of the term Regression
• The term regression was introduced by Francis Galton.

• Galton found that, although there was a tendency for tall parents to
have tall children and for short parents to have short children, the
average height of children born of parents of a given height tended to
move or “regress” toward the average height in the population as a
whole.

• In other words, the height of the children of unusually tall or
unusually short parents tends to move toward the average height of the
population.

• Galton’s law of universal regression was confirmed by Karl Pearson,
who found that “the average height of sons of a group of tall fathers
was less than their fathers’ height and the average height of sons of a
group of short fathers was greater than their fathers’ height, thus
“regressing” tall and short sons alike toward the average height of all
men.”

→ In Galton’s words, this was “regression to mediocrity.”

2. Regression analysis

• Reconsider Galton’s law of universal regression. Galton was
interested in finding out why there was a stability in the
distribution of heights in a population.

• But in the modern view our concern is not with this explanation
but rather with finding out how the average height of sons
changes, given the fathers’ height.

• In other words, our concern is with predicting the average
height of sons knowing the height of their fathers.

[Figure: scatter of sons’ heights (vertical axis, inches, 60 to 75) against fathers’ heights (horizontal axis, inches, 60 to 75) for a hypothetical population. Circled crosses mark the mean value of sons’ height at each father’s height, and a line connects these means.]

• The figure shows the distribution of heights of sons in a
hypothetical population corresponding to the given or fixed values of
the father’s height.
• Despite the variability of the height of sons for a given value of
father’s height, the average height of sons increases as the height of
the father increases.
• The circled crosses in the figure indicate the average height of
sons corresponding to a given height of the father.
• Connecting these averages, we obtain the line shown in the figure.
This line is known as the regression line. It shows how the average
height of sons increases with the father’s height.
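
The idea of averaging sons' heights at each fixed father's height can be sketched in a few lines of Python. This is only an illustration: the height pairs below are invented, not taken from Galton's or Pearson's records, and the names `pairs`, `groups`, and `cond_means` are hypothetical.

```python
from collections import defaultdict

# Hypothetical (father_height, son_height) pairs in inches; invented for illustration.
pairs = [
    (60, 61), (60, 63), (60, 65),
    (65, 64), (65, 66), (65, 68),
    (70, 67), (70, 69), (70, 71),
    (75, 70), (75, 72), (75, 74),
]

# Group the sons' heights by the fixed value of the father's height.
groups = defaultdict(list)
for father, son in pairs:
    groups[father].append(son)

# Conditional mean of sons' height for each father's height
# (the role played by the circled crosses in the figure).
cond_means = {f: sum(s) / len(s) for f, s in sorted(groups.items())}

for father, mean_son in cond_means.items():
    print(f"father = {father} in, average son height = {mean_son:.1f} in")
```

In this made-up sample the averages rise with the father's height, yet sons of the tallest fathers average less than 75 inches and sons of the shortest fathers average more than 60 inches, which is the "regression" toward the overall mean described earlier.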

• Regression analysis is concerned with the study of the dependence of
one variable, the dependent variable, on one or more other
variables, the explanatory variables, with a view to estimating
and/or predicting the (population) mean or average value of the
former in terms of the known or fixed (in repeated sampling) values
of the latter.
• Examples:
✓ The dependence of the quantity demanded on the price of a good.
✓ The dependence of personal consumption on personal income.
✓ The dependence of output on labor, capital and technology.

DEPENDENT VARIABLE (Y) | INDEPENDENT VARIABLE(S) (X)
Explained variable | Explanatory variable
Endogenous variable | Exogenous variable
Regressand | Regressor
Response | Stimulus
Examples: | Examples:
Quantity demanded | Price, income, other goods’ prices
Crop yield | Temperature, rainfall, sunshine, fertilizer
Consumption | Income, wealth, personal needs, age

3. Different types of relationships

• Statistical vs. Deterministic Relationships
• Regression vs. Causation
• Regression vs. Correlation

Statistical vs. Deterministic relationships

Deterministic (functional) relationship | Statistical relationship
Variables are not stochastic | Deals with stochastic variables
For one value of X, there is only one corresponding value of Y | For one value of X, there are many possible values of Y
Example: y = 3x | Example: With an income of 8 million dong per month, an individual’s expenditure can be 4, 5, 7, or 10 million dong.

The relationships between economic variables are statistical.
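
A small Python sketch may make the contrast concrete. It is a minimal illustration under stated assumptions: the deterministic rule is the y = 3x example from above, while the statistical version adds a random disturbance whose Uniform(-2, 2) distribution is chosen here purely for illustration. The function names are hypothetical.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same draws

def deterministic(x):
    """Functional relationship: each x maps to exactly one y."""
    return 3 * x

def statistical(x):
    """Statistical relationship: y = 3x plus a random disturbance.
    The Uniform(-2, 2) disturbance is an assumption made for this sketch."""
    return 3 * x + random.uniform(-2, 2)

x = 8
det_values = {deterministic(x) for _ in range(5)}   # set of distinct outcomes
stat_values = [statistical(x) for _ in range(5)]

print(det_values)   # {24}: the same x always gives the same y
print(stat_values)  # several different values scattered around 24
```

Repeating the deterministic rule at the same X always returns one value, whereas the statistical rule returns a whole distribution of Y values for that X. Regression analysis targets the average of that distribution.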

Regression vs. causation

• Regression does not imply causation: a statistical relationship,
however strong, cannot by itself establish the direction of cause and
effect.
• Example:
• Your score in Econometrics depends on your Math scores at high school.
However, the dependence runs one way: a high score in Econometrics does
not make your Math scores high.
• Crop yield depends on rainfall, but rainfall does not depend on crop yield.

Regression vs. correlation

Correlation | Regression
The primary objective is to measure the strength or degree of linear association between two variables. | The objective is to estimate or predict the average value of one variable on the basis of the fixed values of other variables.
Ex: the correlation (coefficient) between scores on statistics and mathematics examinations. | Ex: predicting the average score on a statistics examination from a student’s score on a mathematics examination.
Correlation is concerned with the strength of the linear association between two variables. | Regression is concerned with the dependence of the dependent variable on the independent variables.
Variables are treated symmetrically. | There is an asymmetry in the way the dependent and explanatory variables are treated.
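
The symmetry of correlation and the asymmetry of regression can be checked numerically. The sketch below uses invented exam scores and hand-written helper functions (`mean`, `cov`, `correlation`, `slope` are hypothetical names, not from any library). It assumes the usual least-squares slope, cov(x, y) / var(x), as the regression estimate.

```python
from math import sqrt

# Hypothetical exam scores for five students; invented for illustration.
math_scores = [50, 60, 70, 80, 90]
stat_scores = [55, 58, 72, 75, 90]

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Covariance of a and b (dividing by n); cov(a, a) is the variance."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)

def correlation(a, b):
    return cov(a, b) / sqrt(cov(a, a) * cov(b, b))

def slope(y, x):
    """Least-squares slope from regressing y on x."""
    return cov(x, y) / cov(x, x)

r_xy = correlation(math_scores, stat_scores)
r_yx = correlation(stat_scores, math_scores)
b_stat_on_math = slope(stat_scores, math_scores)
b_math_on_stat = slope(math_scores, stat_scores)

print(f"corr(math, stat) = {r_xy:.4f}, corr(stat, math) = {r_yx:.4f}")
print(f"slope of stat on math = {b_stat_on_math:.4f}")
print(f"slope of math on stat = {b_math_on_stat:.4f}")
```

Swapping the two variables leaves the correlation coefficient unchanged, but the two regression slopes differ, because regression treats one variable as dependent and the other as explanatory.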
4. Terminology and notation

Dependent variable | Explanatory variable
Explained variable | Independent variable
Predictand | Predictor
Regressand | Regressor
Response | Stimulus
Endogenous | Exogenous
Outcome | Covariate
Controlled variable | Control variable
• If we are studying the dependence of a variable on only a single
explanatory variable, such as that of consumption expenditure on
real income, such a study is known as simple, or two-variable,
regression analysis.
• If we are studying the dependence of one variable on more than one
explanatory variable, as in the crop-yield, rainfall, temperature,
sunshine, and fertilizer examples, it is known as multiple regression
analysis.
• In other words, in two-variable regression there is only one
explanatory variable, whereas in multiple regression there is more
than one.
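
In symbols, the distinction might be written as follows. The notation is an assumption borrowed from common textbook usage rather than from these slides: the \beta's denote unknown coefficients, u_i a random disturbance, and i indexes observations.

```latex
% Simple (two-variable) regression: one explanatory variable X
Y_i = \beta_1 + \beta_2 X_i + u_i

% Multiple regression: k - 1 explanatory variables X_2, ..., X_k
Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \cdots + \beta_k X_{ki} + u_i
```

For the crop-yield example, Y would be the yield and X_2, X_3, ... would stand for rainfall, temperature, sunshine, and fertilizer.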

5. The nature and sources of data

• The success of any econometric analysis ultimately depends on the
availability of the appropriate data.

• It is therefore essential that we spend some time discussing the
nature, sources, and limitations of the data that one may encounter
in empirical analysis.

Types of data
• Cross-sectional data
• Time series data
• Pooled data (panel)

Sources of data
• Internet
• Experimental – non-experimental

Accuracy of data
• Observation errors, measurement approximations.
• Various sampling methods make comparing results obtained from
various samples difficult.
• Economic data are generally available at a highly aggregate level.
• Confidential data

Types of data: Time series data

• Time Series Data: A time series is a set of observations on the values
that a variable takes at different times.
• Such data may be collected at regular time intervals, such as daily
(e.g., stock prices, weather reports), weekly (e.g., money supply
figures), monthly [e.g., the unemployment rate, the Consumer Price
Index (CPI)], quarterly (e.g., GDP), annually (e.g., government
budgets)…
• The observation subscript t will be used for time series data (i.e.,
data collected over a period of time).

Types of data: Cross-section data

• Cross-section data are data on one or more variables collected at the
same point in time.
• Ex: the census of population conducted by the U.S. Census Bureau every
10 years (for example, in 2000), the surveys of consumer
expenditures conducted by the University of Michigan…
• The observation subscript i will be used for cross-sectional data (i.e.,
data collected at one point in time).

Types of data: Pooled data

• Pooled, or combined, data contain elements of both time series and
cross-section data.
• Panel, longitudinal, or micro-panel data: This is a special type of
pooled data in which the same cross-sectional unit (say, a family or a
firm) is surveyed over time.
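
The three data types differ mainly in how observations are indexed. The following sketch shows one possible way to lay them out in Python. All numbers, years, and family labels are invented, and the combined subscript "it" for panel data is a notational assumption that extends the t and i subscripts used above.

```python
# Hypothetical monthly income figures (million dong); all values are invented.

# Time series: one unit observed at several times t  ->  Y_t
time_series = {2021: 8.0, 2022: 8.5, 2023: 9.1}

# Cross-section: several units i observed at the same point in time  ->  Y_i
cross_section = {"family_A": 8.0, "family_B": 12.5, "family_C": 6.3}

# Panel: the same units i followed over several times t  ->  Y_it
panel = {
    ("family_A", 2021): 8.0, ("family_A", 2022): 8.5,
    ("family_B", 2021): 12.5, ("family_B", 2022): 13.0,
}

# Recover the cross-sectional units and the time periods covered by the panel.
units = sorted({i for i, _ in panel})
periods = sorted({t for _, t in panel})

print("units followed over time:", units)
print("periods observed:", periods)
```

Fixing one period in `panel` gives a cross-section, and fixing one family gives a time series, which is why pooled data are said to combine elements of both.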

The sources of data

• The data used in empirical analysis may be collected by a
governmental agency (e.g., the Department of Commerce), an
international agency (e.g., the International Monetary Fund (IMF) or
the World Bank), a private organization (e.g., the Standard & Poor’s
Corporation), or an individual.
• The Internet: The Internet has revolutionized data gathering.
If you just “surf the net” with a keyword (e.g., exchange rates), you
will be swamped with all kinds of data sources.

• The data collected by various agencies may be experimental or
non-experimental.
• In experimental data, often collected in the natural sciences, the
investigator may want to collect data while holding certain factors
constant in order to assess the impact of some factors on a given
phenomenon.
• For instance, in assessing the impact of obesity on blood pressure,
the researcher would want to collect data while holding constant the
eating, smoking, and drinking habits of the people in order to
minimize the influence of these variables on blood pressure.

• In the social sciences, the data that one generally encounters are
non-experimental in nature, that is, not subject to the control of the
researcher.
• Ex: the data on GNP, unemployment, stock prices, etc., are not directly under
the control of the investigator.
• This lack of control often creates special problems for
the researcher in pinning down the exact cause or
causes affecting a particular situation.
• Ex: is it the money supply that determines the (nominal) GDP or is it the
other way round?

Accuracy of data

• Although plenty of data are available for economic research, the
quality of the data is often not that good. There are several reasons
for that:
• First, most social science data are non-experimental in nature, so
there is the possibility of observational errors, either of omission or
commission.
• Second, even in experimentally collected data, errors of measurement
arise from approximations and roundoffs.
• Third, in questionnaire-type surveys, the problem of nonresponse can be
serious. Analysis based on such partial response may not truly reflect the
behavior of those who did not respond, leading to (sample)
selectivity bias.

• Fourth, the sampling methods used in obtaining the data may vary so
widely that it is often difficult to compare the results obtained from
the various samples.
• Fifth, economic data are generally available at a highly aggregate
level. Such highly aggregated data may not tell us much about the
individual or micro units that may be the ultimate object of study.
• Sixth, because of confidentiality, certain data can be published only in
highly aggregate form.
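
The selectivity bias mentioned in the third point can be illustrated with a toy survey. This is a sketch under an explicit assumption: the figures are invented, and high spenders are assumed to be the ones who do not respond. The names `population` and `respondents` are hypothetical.

```python
# Hypothetical survey population: (monthly expenditure in million dong, responded?).
# All values are invented; high spenders are assumed less likely to respond.
population = [
    (4, True), (5, True), (5, True), (6, True), (7, True),
    (9, False), (10, False), (12, False), (7, True), (15, False),
]

all_values = [y for y, _ in population]
respondents = [y for y, responded in population if responded]

# Mean over the whole population versus mean over those who answered.
true_mean = sum(all_values) / len(all_values)
respondent_mean = sum(respondents) / len(respondents)

print(f"mean over everyone:    {true_mean:.2f}")
print(f"mean over respondents: {respondent_mean:.2f}")
```

Because the non-respondents differ systematically from the respondents in this example, the average computed from the returned questionnaires understates the average for the whole group.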

• Because of all these and many other problems, the researcher should
always keep in mind that the results of research are only as good as
the quality of the data.
• Therefore, if in given situations researchers find that the results of
the research are “unsatisfactory,” the cause may be not that they
used the wrong model but that the quality of the data was poor.

Summary and conclusions

• The key idea behind regression analysis is the statistical dependence
of one variable, the dependent variable, on one or more other
variables, the explanatory variables.
• The objective of such analysis is to estimate and/or predict the mean
or average value of the dependent variable on the basis of the known
or fixed values of the explanatory variables.
• In practice the success of regression analysis depends on the
availability of the appropriate data. This chapter discussed the
nature, sources, and limitations of the data that are generally
available for research, especially in the social sciences.

• In any research, the researcher should clearly state the sources of the
data used in the analysis, their definitions, their methods of
collection, and any gaps or omissions in the data as well as any
revisions in the data. Keep in mind that the macroeconomic data
published by the government are often revised.
• Since the reader may not have the time, energy, or resources to track
down the data, the reader has the right to presume that the data
used by the researcher are properly gathered and that the
computations and analysis are correct.
