By: Damodar N. Gujarati: Prof. M. El-Sakka

405: ECONOMETRICS
Chapter # 1: THE NATURE OF REGRESSION ANALYSIS

By: Damodar N. Gujarati
Prof. M. El-Sakka
Dept of Economics: Kuwait University
1.2 THE MODERN INTERPRETATION OF REGRESSION
• Regression analysis is concerned with the study of the dependence of
one variable, the dependent variable, on one or more other variables,
the explanatory variables, with a view to estimating and/or predicting
the (population) mean or average value of the former in terms of the
known or fixed (in repeated sampling) values of the latter.
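The definition above treats regression as estimating the mean of the dependent variable at each fixed value of the explanatory variable. A minimal sketch of that idea follows, using made-up (X, Y) pairs; the data and the helper name `conditional_means` are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (x, y) observations: three y values at each fixed x.
observations = [
    (65, 66), (65, 67), (65, 68),
    (70, 69), (70, 70), (70, 71),
    (75, 71), (75, 72), (75, 73),
]

def conditional_means(pairs):
    """Return {x: average of y over all pairs sharing that x}."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: mean(ys) for x, ys in sorted(groups.items())}

mean_y_given_x = conditional_means(observations)
for x, avg_y in mean_y_given_x.items():
    print(f"X = {x}: average Y = {avg_y}")
```

Each printed value is the sample counterpart of the (population) mean of Y for a given X; joining these averages across X traces out a regression line.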
Examples
1. Consider Galton’s law of universal regression. Our concern is finding
out how the average height of sons changes, given the fathers’ height.
To see how this can be done, consider Figure 1.1, which is a scatter
diagram, or scattergram.
2. Consider the scattergram in Figure 1.2, which gives the distribution in a
hypothetical population of heights of boys measured at fixed ages.
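3. An economist may be interested in studying the dependence of personal
consumption expenditure on after-tax or disposable real personal income.
Such an analysis may be helpful in estimating the marginal propensity to
consume (MPC), that is, the average change in consumption expenditure for a
one-unit change in real income.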
4. A monopolist who can fix the price or output (but not both) may want to
find out the response of the demand for a product to changes in price. Such
an experiment may enable the estimation of the price elasticity of the
demand for the product and may help determine the most profitable price.
5. We may want to study the rate of change of money wages in relation to the
unemployment rate. The curve in Figure 1.3 is an example of the Phillips
curve. Such a scattergram may enable the labor economist to predict the
average change in money wages given a certain unemployment rate.
6. From monetary economics it is known that, other things remaining the same,
the higher the rate of inflation π, the lower the proportion (k) of their
income that people would want to hold in the form of money (see Figure 1.4).
7. The marketing director of a company may want to know how the
demand for the company’s product is related to, say, advertising
expenditure. Such a study will be of considerable help in finding out the
elasticity of demand with respect to advertising expenditure. This
knowledge may be helpful in determining the “optimum” advertising
budget.
8. Finally, an agronomist may be interested in studying the dependence of
crop yield, say, of wheat, on temperature, rainfall, amount of sunshine, and
fertilizer. Such a dependence analysis may enable the prediction of the
average crop yield, given information about the explanatory variables.
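Several of the examples above come down to estimating a slope, such as the MPC in example 3. The sketch below fits a straight line by ordinary least squares to invented income and consumption figures; the data, the function name `ols_fit`, and the choice of a two-variable linear model are all assumptions made for illustration, not results from the text.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for the line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares, both in deviation form.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical disposable income and consumption expenditure (same units).
income = [100, 120, 140, 160, 180, 200]
consumption = [85, 98, 115, 127, 143, 155]

intercept, mpc = ols_fit(income, consumption)
print(f"estimated intercept = {intercept:.2f}, estimated MPC = {mpc:.2f}")

# Predicted average consumption at a chosen income level.
predicted = intercept + mpc * 150
print(f"predicted average consumption at income 150 = {predicted:.2f}")
```

The slope is read as the average change in consumption per unit change in income, and the fitted line gives a prediction of average consumption at a given income, which is the sense of "estimating and/or predicting" used in the definition.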
1.3 STATISTICAL VERSUS DETERMINISTIC RELATIONSHIPS
• In statistical relationships among variables we essentially deal with random
or stochastic variables, that is, variables that have probability distributions.
In functional or deterministic dependency, on the other hand, we also deal
with variables, but these variables are not random or stochastic.
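• The dependence of crop yield on temperature, rainfall, sunshine, and
fertilizer, for example, is statistical in nature: these explanatory variables,
although important, do not allow the agronomist to predict crop yield
exactly, because of measurement errors and a host of other factors.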
• In deterministic phenomena, we deal with relationships of the type, say,
exhibited by Newton’s law of gravity, which states: Every particle in the
universe attracts every other particle with a force directly proportional to
the product of their masses and inversely proportional to the square of the
distance between them. Symbolically, $F = k(m_1 m_2 / r^2)$, where $F$ = force,
$m_1$ and $m_2$ are the masses of the two particles, $r$ = distance, and $k$ =
constant of proportionality. In this text we are not concerned with such
deterministic relationships.
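The contrast can be sketched in a few lines: a deterministic relationship returns the same output for the same inputs every time, while a statistical one adds a random disturbance. The crop-yield equation, its coefficients, and the noise level below are hypothetical; the default `k` is the gravitational constant in SI units.

```python
import random

def gravitational_force(m1, m2, r, k=6.674e-11):
    """Deterministic: F = k * m1 * m2 / r**2, identical on every call."""
    return k * m1 * m2 / r ** 2

def crop_yield(rainfall, rng, intercept=2.0, slope=0.05, noise_sd=0.3):
    """Statistical: a systematic part plus a random disturbance (assumed form)."""
    return intercept + slope * rainfall + rng.gauss(0.0, noise_sd)

rng = random.Random(0)  # seeded so the run is reproducible

forces = [gravitational_force(5.0, 10.0, 2.0) for _ in range(3)]
yields = [crop_yield(40.0, rng) for _ in range(3)]

print("force, same inputs three times:", forces)
print("yield, same rainfall three times:", yields)
```

The three force values coincide, whereas the three yields differ even though rainfall is held at the same level, which is why only the average yield can be predicted.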
1.4 REGRESSION VERSUS CAUSATION
• Although regression analysis deals with the dependence of one variable on
other variables, it does not necessarily imply causation. In the crop-yield
example cited previously, there is no statistical reason to assume that rainfall
does not depend on crop yield. The fact that we treat crop yield as dependent
on rainfall (among other things) is due to non-statistical considerations:
Common sense suggests that the relationship cannot be reversed, for we
cannot control rainfall by varying crop yield. A statistical relationship in
itself cannot logically imply causation. To ascribe causality, one must appeal
to a priori or theoretical considerations.
1.5 REGRESSION VERSUS CORRELATION
• In correlation analysis, the primary objective is to measure the strength or
degree of linear association between two variables. For example, we may be
interested in the correlation between smoking and lung cancer, or between
scores on statistics and mathematics examinations. In regression analysis,
we try to estimate or predict the average value of
one variable on the basis of the fixed values of other variables.
• Regression and correlation have some fundamental differences. In
regression analysis there is an asymmetry in the way the dependent and
explanatory variables are treated.
• In correlation analysis, we treat any (two) variables symmetrically; there is
no distinction between the dependent and explanatory variables. The
correlation between scores on mathematics and statistics examinations is
the same as that between scores on statistics and mathematics
examinations. Moreover, both variables are assumed to be random, whereas
most of the regression theory to be dealt with here is conditional
upon the assumption that the dependent variable is stochastic but the
explanatory variables are fixed or nonstochastic.
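The symmetry of correlation and the asymmetry of regression can be checked numerically. The sketch below uses invented examination scores; the helper names `correlation` and `regression_slope` are illustrative, not from the text.

```python
from math import sqrt

def _deviation_sums(x, y):
    """Return (sxy, sxx, syy) computed from deviations about the means."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy, sxx, syy

def correlation(x, y):
    """Sample correlation coefficient; swapping x and y leaves it unchanged."""
    sxy, sxx, syy = _deviation_sums(x, y)
    return sxy / sqrt(sxx * syy)

def regression_slope(dependent, explanatory):
    """OLS slope from regressing `dependent` on `explanatory`."""
    sxy, sxx, _ = _deviation_sums(explanatory, dependent)
    return sxy / sxx

# Hypothetical examination scores for five students.
math_scores = [60, 70, 80, 90, 100]
stat_scores = [65, 68, 82, 85, 95]

r_ms = correlation(math_scores, stat_scores)
r_sm = correlation(stat_scores, math_scores)
b_stat_on_math = regression_slope(stat_scores, math_scores)
b_math_on_stat = regression_slope(math_scores, stat_scores)

print(f"r(math, stat) = {r_ms:.4f}, r(stat, math) = {r_sm:.4f}")
print(f"slope of stat on math = {b_stat_on_math:.4f}")
print(f"slope of math on stat = {b_math_on_stat:.4f}")
```

The two correlations are identical, but the two slopes differ because each regression treats a different variable as dependent; their product equals the squared correlation.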
1.6 TERMINOLOGY AND NOTATION
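• In the literature the terms dependent variable and explanatory variable are
described variously. A representative list is:
Dependent variable: explained variable, predictand, regressand, response, endogenous, outcome, controlled variable.
Explanatory variable: independent variable, predictor, regressor, stimulus, exogenous, covariate, control variable.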
• We will use the dependent variable/explanatory variable terminology or
the more neutral regressand/regressor terminology.
• The term random is a synonym for the term stochastic. A random or
stochastic variable is a variable that can take on any set of values, positive
or negative, with a given probability.
1.7 THE NATURE AND SOURCES OF DATA FOR ECONOMIC ANALYSIS
• Types of Data
• There are three types of data: time series, cross-section, and pooled (i.e.,
combination of time series and cross-section) data.
• A time series is a set of observations on the values that a variable takes at
different times. It is collected at regular time intervals, such as daily,
weekly, monthly, quarterly, annually, quinquennially, that is, every 5 years
(e.g., the census of manufactures), or decennially (e.g., the census of
population).
• Most empirical work based on time series data assumes that the underlying
time series is stationary. Loosely speaking, a time series is stationary if its
mean and variance do not vary systematically over time.
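The loose definition of stationarity suggests a crude first check: compare the mean and variance of the first half of a series with those of the second half. The sketch below does that on two invented series; it is an informal diagnostic under stated assumptions, not a formal stationarity test, and the function name `half_sample_moments` is made up for illustration.

```python
from statistics import mean, pvariance

def half_sample_moments(series):
    """Return ((mean, variance) of first half, (mean, variance) of second half)."""
    mid = len(series) // 2
    first, second = series[:mid], series[mid:]
    return (mean(first), pvariance(first)), (mean(second), pvariance(second))

# Hypothetical series: one fluctuating around a fixed level, one trending upward.
stable = [10, 12, 9, 11, 10, 12, 9, 11]
trending = [1, 2, 3, 4, 5, 6, 7, 8]

stable_first, stable_second = half_sample_moments(stable)
trend_first, trend_second = half_sample_moments(trending)

print("stable:  ", stable_first, stable_second)
print("trending:", trend_first, trend_second)
```

For the stable series both halves share the same mean and variance, while the trending series shows a mean that shifts between halves, the kind of systematic change over time that the definition rules out.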
• Cross-Section Data. Cross-section data are data on one or more variables
collected at the same point in time, such as the census of population
conducted by the Census Bureau every 10 years. An example of cross-sectional
data is given in Table 1.1. For each year the data on the 50 states are
cross-sectional data. Just as time series data create their own special
problems because of the stationarity issue, cross-sectional data too
have their own problems, specifically the problem of heterogeneity.
• From Table 1.1 we see that we have some states that produce huge amounts
of eggs (e.g., Pennsylvania) and some that produce very little (e.g., Alaska).
When we include such heterogeneous units in a statistical analysis, the size
or scale effect must be taken into account. To see this clearly, we plot in
Figure 1.6 the data on eggs produced and their prices in 50 states for the
year 1990. This figure shows how widely scattered the observations are.
• Pooled Data. Pooled, or combined, data contain elements of both time series
and cross-section data. The data in Table 1.1 are an example of pooled data.
For each year we have 50 cross-sectional observations and for each state we
have two time series observations on prices and output of eggs, a total of
100 pooled (or combined) observations.
• Panel, Longitudinal, or Micropanel Data. This is a special type of pooled
data in which the same cross-sectional unit (say, a family or a firm) is
surveyed over time.
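The three data types can be seen as different slices of one set of records. The sketch below builds a pooled data set of 50 units observed in two years and then extracts a cross-section and a time series from it. The state identifiers are placeholders, the second year (1991) is an assumption, and no measured values are included because none are reproduced here.

```python
from itertools import product

# Placeholder identifiers standing in for the 50 states.
states = [f"state_{i:02d}" for i in range(1, 51)]
# Two observation years; taking 1991 as the second year is an assumption.
years = [1990, 1991]

# Pooled data: one record per (state, year) combination.
pooled = [{"state": s, "year": y} for s, y in product(states, years)]

def cross_section(records, year):
    """All units observed at one point in time."""
    return [r for r in records if r["year"] == year]

def time_series(records, state):
    """One unit followed over time, in chronological order."""
    return sorted((r for r in records if r["state"] == state),
                  key=lambda r: r["year"])

print(len(pooled))                           # pooled observations
print(len(cross_section(pooled, 1990)))      # units in one cross-section
print(len(time_series(pooled, "state_01")))  # observations per unit
```

Because the same units appear in every year, this layout also has the structure of a panel: each state can be followed across the survey years.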
• The Sources of Data
• The data used in empirical analysis may be collected by a governmental
agency (e.g., the Department of Commerce), an international agency (e.g.,
the International Monetary Fund (IMF) or the World Bank), a private
organization (e.g., the Standard & Poor’s Corporation), or an individual.
Literally, there are thousands of such agencies collecting data for one
purpose or another.
• The Accuracy of Data
• The quality of the data is often poor, for several reasons:
• First, as noted, most social science data are nonexperimental in nature.
Therefore, there is the possibility of observational errors, either of omission
or commission.
• Second, even in experimentally collected data, errors of measurement arise
from approximations and roundoffs.
• Third, in questionnaire-type surveys, the problem of nonresponse can be
serious; a researcher is lucky to get a 40% response to a questionnaire.
• Fourth, the sampling methods used in obtaining the data may vary so
widely that it is often difficult to compare the results obtained from the
various samples.
• Fifth, economic data are generally available at a highly aggregate level. For
example, most macrodata (e.g., GNP, inflation, unemployment) are available
for the economy as a whole or, at most, for some broad geographical regions.
• The researcher should always keep in mind that the results of research are
only as good as the quality of the data.
A Note on the Measurement Scales of Variables.
• The variables that we will generally encounter fall into four broad
categories: ratio scale, interval scale, ordinal scale, and nominal scale. It is
important that we understand each.
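• Ratio Scale. For a variable X taking two values, X1 and X2, the ratio X1/X2
and the distance (X2 − X1) are meaningful quantities, and the values have a
natural ordering. Comparisons such as X2 ≤ X1 or X2 ≥ X1 are therefore meaningful.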
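• Interval Scale. For an interval scale variable, distances and ordering are
meaningful but ratios are not. For example, the distance between two time
periods, say (2000–1995), is meaningful, but not the ratio of two time
periods (2000/1995).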
• Ordinal Scale. Examples are grading systems (A, B, C grades) or income
class (upper, middle, lower). For these variables the ordering exists but the
distances between the categories cannot be quantified.
• Nominal Scale. Variables such as gender and marital status simply denote
categories. Such variables cannot be expressed on the ratio, interval, or
ordinal scales.
• Econometric techniques that may be suitable for ratio scale variables may
not be suitable for nominal scale variables. Therefore, it is important to
bear in mind the distinctions among the four types.
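One way to keep the four scales apart is to ask which operations give a meaningful answer on each. The sketch below illustrates this with invented values; the variable names, the grade ranking, and the helper functions are assumptions for illustration.

```python
# Ratio scale: ratios, distances, and ordering are all meaningful.
income_a, income_b = 40_000, 20_000
income_ratio = income_a / income_b   # "twice as much" is a valid statement
income_gap = income_a - income_b

# Interval scale: distances and ordering are meaningful, ratios are not.
year_gap = 2000 - 1995               # five years apart is a valid statement
# 2000 / 1995 can be computed but has no useful interpretation.

# Ordinal scale: only the ordering is meaningful.
GRADE_RANK = {"C": 1, "B": 2, "A": 3}  # ranks encode order, not distance

def outranks(grade_1, grade_2):
    """True if grade_1 is ranked above grade_2."""
    return GRADE_RANK[grade_1] > GRADE_RANK[grade_2]

# Nominal scale: only equality of categories is meaningful.
def same_category(label_1, label_2):
    """True if the two labels denote the same category."""
    return label_1 == label_2

print(income_ratio, income_gap, year_gap)
print(outranks("A", "B"), same_category("married", "single"))
```

The numeric ranks attached to the grades are arbitrary labels for order, so arithmetic on them (for example, averaging) is not justified in the way it is for a ratio scale variable such as income.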
