
Regression Analysis

Uploaded by aditidocmoc
Copyright © All Rights Reserved

Study Notes

Regression Analysis

Regression

• Regression is the analysis of the dependence of the dependent variable on one or more independent variables, with the objective of predicting the average value of the dependent variable given specific values of the independent variables.
• It describes a statistical relationship among variables, i.e., it deals with variables that have probability distributions (random or stochastic variables).
• It does not necessarily imply causation. The researcher specifies Y as the dependent variable based on prior knowledge or existing theory; the analysis itself tells you nothing about causation. Even after estimation, a statistically significant $\hat{\beta}$ does not by itself establish a causal relationship.
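Regression aims at the average value of Y for given values of X, which is the conditional mean $E(Y \mid X)$. The minimal Python sketch below makes this concrete. It assumes a small, made-up data set in which several Y values are observed at each X value. The function name `conditional_means` and the income/consumption figures are hypothetical.

```python
from collections import defaultdict


def conditional_means(x, y):
    """Average y for each distinct x value: a sample estimate of E(Y | X = x)."""
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)
    # Sort by x so the result reads like a table of conditional averages.
    return {xi: sum(ys) / len(ys) for xi, ys in sorted(groups.items())}


# Hypothetical data: weekly family income (x) and consumption expenditure (y).
income = [80, 80, 80, 100, 100, 100, 120, 120, 120]
consumption = [55, 60, 65, 65, 70, 75, 79, 84, 89]

print(conditional_means(income, consumption))  # {80: 60.0, 100: 70.0, 120: 84.0}
```

Individual Y values scatter around these averages. A regression equation summarizes how the averages change as X changes.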


Types of Regression

• Simple regression: the analysis is confined to 2 variables at a time.
• Multiple regression: the analysis involves more than 2 variables at a time.
• Linear regression: linear in the parameters, the β's, i.e., the parameters are raised to the first power only. The model may or may not be linear in the explanatory variables, the X's.
• Non-linear (curvilinear) regression: the regression equation has parameters raised to a power higher than 1 or combined with one another, involving terms of the type $\beta^2$, $\beta^3$, $\beta_1 \beta_2$, $\beta_1/\beta_2$, etc.
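"Linear" refers to the parameters, so a model such as $Y = \beta_1 + \beta_2 X^2$ can still be fitted by ordinary least squares after substituting $Z = X^2$. The model is non-linear in X but linear in the β's. The sketch below assumes this particular model and noise-free, made-up data, so the fit should recover the coefficients exactly. A model containing terms such as $\beta_1 \beta_2$ or $\beta_1/\beta_2$ would generally need non-linear estimation methods instead.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2


# Noise-free data from Y = 2 + 0.5 * X^2 (non-linear in X, linear in the betas).
x = [0, 1, 2, 3, 4]
y = [2 + 0.5 * xi ** 2 for xi in x]

# Transform the regressor, then fit an ordinary linear regression on Z = X^2.
z = [xi ** 2 for xi in x]
beta1_hat, beta2_hat = fit_simple_ols(z, y)
print(beta1_hat, beta2_hat)  # 2.0 0.5
```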

Dependent Variable / Independent Variable (alternative names)

• Effect / Cause
• Explained variable / Explanatory variable
• Predictand / Predictor
• Regressand / Regressor
• Response / Stimulus
• Endogenous / Exogenous
• Outcome / Covariate
• Controlled variable / Control variable


Historical Background

• Traditionally, regression meant tending towards the average.
• The term was first introduced by Sir Francis Galton in his study of heredity.
• Galton found that, although tall parents tended to have tall children and short parents short children, the average height of children born to parents of a given height tended to move towards the average height of the population. He called this "regression to mediocrity".
• Karl Pearson confirmed this in his study, where he found that the "average height of sons of tall fathers was less than their fathers' height and average height of sons of short fathers was greater than their fathers' height, thus 'regressing' tall and short sons toward average height of all men".

How do you proceed?

• Consider two statements:
o S1: the model generates the data, or
o S2: the data generate the model.
• S1 is the correct view.
• Broadly, the model can be thought of as existing in nature but unknown to the experimenter.
• When values of the explanatory variables are provided, the values of the output (study) variable are generated accordingly. The result depends on the form of the function f relating them and on the nature of the phenomenon.
• So, ideally, the pre-existing model gives rise to the data. Our objective is to determine the functional form of this model.
• We therefore move in the backward direction. We first collect data on the study and explanatory variables, then apply statistical techniques to these data to learn the form of the function f.
• Equivalently, the data from the model are recorded first and then used to determine the parameters of the model.
• Thus, regression analysis literally means moving in the backward direction: from the recorded data to the unknown parameters of the model.
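A small simulation can illustrate the two directions. It assumes a single-regressor model $Y = \beta_1 + \beta_2 X + \varepsilon$ with hypothetical "true" values $\beta_1 = 3$, $\beta_2 = 2$ and normal errors. First the known model generates data (S1). Then least squares works backward from the data alone to estimates of the parameters. The seed, sample size and error spread are arbitrary choices for illustration.

```python
import random


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2


# Forward direction (S1): a "true" model, unknown in practice, generates the data.
TRUE_BETA1, TRUE_BETA2 = 3.0, 2.0  # hypothetical values chosen for the simulation
rng = random.Random(42)            # fixed seed so the run is reproducible
x = [rng.uniform(0, 10) for _ in range(500)]
y = [TRUE_BETA1 + TRUE_BETA2 * xi + rng.gauss(0, 1) for xi in x]

# Backward direction: only the recorded (x, y) pairs are used to estimate the parameters.
beta1_hat, beta2_hat = fit_simple_ols(x, y)
print(f"beta1_hat = {beta1_hat:.3f}, beta2_hat = {beta2_hat:.3f}")
```

The estimates land close to 3 and 2 but not exactly on them, because the random errors are part of the generated data.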

Example

• Suppose the yield of a crop (Y) depends linearly on two explanatory variables, viz., the quantity of fertilizer ($X_1$) and the level of irrigation ($X_2$), as

$$Y = \beta_1 X_1 + \beta_2 X_2 + \varepsilon$$

where $\varepsilon$ is a random error term.
• The true values of $\beta_1$ and $\beta_2$ exist in nature but are unknown to the experimenter.
• Some values of Y are recorded by providing different values to $X_1$ and $X_2$. Some relationship between Y and $X_1$, $X_2$ gives rise to systematically behaved data on Y, $X_1$ and $X_2$. This relationship is unknown to the experimenter.


• To determine the model, we move in the backward direction, in the sense that the collected data are used to determine the unknown parameters $\beta_1$ and $\beta_2$ of the model.
• In this sense, such an approach is termed regression analysis.
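The crop-yield model has no intercept, so its least-squares estimates solve two "normal equations" in $\beta_1$ and $\beta_2$. The sketch below plays both roles. "Nature" generates yields from hypothetical true values ($\beta_1 = 0.8$, $\beta_2 = 1.5$, normal errors), and the experimenter then recovers the parameters from the recorded data alone. The fertilizer and irrigation ranges, the function name and the seed are all assumptions for illustration.

```python
import random


def fit_two_regressors_no_intercept(x1, x2, y):
    """Least-squares b1, b2 for y = b1*x1 + b2*x2 + e (no intercept term).

    Solves the 2x2 normal equations
        [s11 s12] [b1]   [s1y]
        [s12 s22] [b2] = [s2y]
    with Cramer's rule.
    """
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return b1, b2


# "Nature": hypothetical true parameters, unknown to the experimenter.
TRUE_B1, TRUE_B2 = 0.8, 1.5

rng = random.Random(0)
fertilizer = [rng.uniform(10, 50) for _ in range(200)]  # X1, hypothetical units
irrigation = [rng.uniform(1, 10) for _ in range(200)]   # X2, hypothetical units
crop_yield = [TRUE_B1 * a + TRUE_B2 * b + rng.gauss(0, 0.5)
              for a, b in zip(fertilizer, irrigation)]

# Backward direction: recover the parameters from the recorded data alone.
b1_hat, b2_hat = fit_two_regressors_no_intercept(fertilizer, irrigation, crop_yield)
print(f"b1_hat = {b1_hat:.3f}, b2_hat = {b2_hat:.3f}")
```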

Steps in Regression Analysis

• Statement of the problem under consideration
For example, the height and weight of children are related. There can then be two issues to address:
(i) determination of height for a given weight, or
(ii) determination of weight for a given height.
The choice decides which variable is treated as the dependent variable.
• Choice of relevant variables
For example, in an agricultural experiment, the yield depends on explanatory variables such as the quantity of fertilizer, rainfall, irrigation and temperature. These variables are denoted by $X_1, X_2, X_3, X_4, \ldots, X_k$, a set of k explanatory variables.
• Collection of data on relevant variables
For example, suppose we want to collect data on age. It is important first to decide how age will be recorded. Either the date of birth can be recorded, which provides the exact age on any specific date, or the age in completed years as on a specific date can be recorded. It is also important to decide whether the data are to be collected as quantitative or qualitative variables. For example, if the ages (in years) are 15, 17, 19, 21, 23, these are quantitative values. Suppose instead that age is defined by a variable that takes the value 1 if the age is less than 18 years and 0 if it is 18 years or more. The earlier recorded data are then converted to 1, 1, 0, 0, 0.
o If the study variable is binary, then logistic and probit regressions etc. are used.
o If all explanatory variables are qualitative, then the analysis of variance technique is used.
o If some explanatory variables are qualitative and others are quantitative, then the analysis of covariance technique is used.
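The recoding of age described above can be sketched in Python. The function name `under_18_indicator` and the choice to place an age of exactly 18 in the 0 group are assumptions made for illustration. The conversion discards information: once coded, 19 and 23 can no longer be told apart.

```python
def under_18_indicator(age_years, cutoff=18):
    """Hypothetical qualitative coding: 1 if age is below the cutoff, else 0."""
    return 1 if age_years < cutoff else 0


ages = [15, 17, 19, 21, 23]  # quantitative values (completed years)
coded = [under_18_indicator(a) for a in ages]
print(coded)  # [1, 1, 0, 0, 0]
```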
• Specification of model
Only the form of the tentative model can be ascertained, and it will depend on some unknown parameters. For example, a general form is

$$Y = f(X_1, X_2, \ldots, X_k; \beta_1, \beta_2, \ldots, \beta_k) + \varepsilon$$

where $\varepsilon$ is the random error. It reflects mainly the difference between the observed value of Y and the value of Y obtained through the model. The form of $f(X_1, X_2, \ldots, X_k; \beta_1, \beta_2, \ldots, \beta_k)$ can be linear or non-linear, depending on the form of the parameters $\beta_1, \beta_2, \ldots, \beta_k$. A model is said to be linear if it is linear in the parameters.

• Choice of method for fitting the data
After the model has been defined and the data have been collected, the next task is to estimate the parameters of the model from the collected data. This is also referred to as parameter estimation or model fitting. The most commonly used method of estimation is the least-squares method; under certain assumptions, it produces estimators with desirable properties. Other estimation methods include the maximum likelihood method (which needs knowledge of the distribution of Y), the method of moments, the ridge method and the principal components method.
• Fitting of model
Estimating the unknown parameters by an appropriate method provides their values. Substituting these values into the equation gives a usable model; this is termed model fitting. The estimates of the parameters $\beta_1, \beta_2, \ldots, \beta_k$ in the model $Y = f(X_1, X_2, \ldots, X_k; \beta_1, \beta_2, \ldots, \beta_k) + \varepsilon$ are denoted $\hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k$. They give the fitted model

$$Y = f(X_1, X_2, \ldots, X_k; \hat{\beta}_1, \hat{\beta}_2, \ldots, \hat{\beta}_k).$$

When the value of Y is obtained from the fitted model for given values of $X_1, X_2, \ldots, X_k$, it is denoted $\hat{Y}$ and called the fitted value.
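Once estimates exist, computing fitted values is mechanical: each observed X is substituted into the fitted equation. The minimal sketch below assumes the single-regressor linear case $Y = \beta_1 + \beta_2 X + \varepsilon$ (the text's f is general) and uses made-up data. When the model contains an intercept, the least-squares residuals $Y - \hat{Y}$ sum to zero up to rounding.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2


def fitted_values(x, b1_hat, b2_hat):
    """Y-hat for each observed x under the fitted straight line."""
    return [b1_hat + b2_hat * xi for xi in x]


# Made-up data for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b1_hat, b2_hat = fit_simple_ols(x, y)
y_hat = fitted_values(x, b1_hat, b2_hat)
residuals = [yi - fi for yi, fi in zip(y, y_hat)]
print("fitted values:", [round(v, 3) for v in y_hat])
print("sum of residuals:", round(sum(residuals), 10))
```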

• Model validation and criticism
The validity of the model's assumptions must be checked before drawing any statistical conclusions. Regression analysis is an iterative process in which the outputs are used to diagnose, validate, criticize and modify the inputs.
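A common diagnostic in this iterative loop is to inspect the residuals $Y - \hat{Y}$, not only a summary measure such as $R^2 = 1 - \text{SSE}/\text{SST}$. The sketch below deliberately fits a straight line to made-up data that are exactly quadratic in X, so the numbers are purely illustrative.

```python
def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b1 + b2 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b2 = s_xy / s_xx
    return y_bar - b2 * x_bar, b2


def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    sst = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / sst


# Made-up data that are actually quadratic in x.
x = [1, 2, 3, 4, 5, 6, 7]
y = [xi ** 2 for xi in x]

b1_hat, b2_hat = fit_simple_ols(x, y)  # deliberately misspecified straight line
y_hat = [b1_hat + b2_hat * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, y_hat)]

print(round(r_squared(y, y_hat), 3))  # 0.955
print(residuals)                      # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]
```

The $R^2$ is high, yet the residuals are positive at both ends and negative in the middle. That systematic pattern suggests revising the straight-line specification, for example by adding an $X^2$ term, and then refitting.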

• Using the chosen model(s) for the solution of the posed problem
o The determination of the explicit form of the regression equation is the ultimate objective of regression analysis.
o The fitted model can be used to determine the role of any explanatory variable in the joint relationship, for example in policy formulation.
o It can also be used to forecast the values of the response variable for a given set of values of the explanatory variables.
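Forecasting with a fitted model amounts to substituting new values of the explanatory variables into the fitted equation. The result estimates the average response at those values. The sketch assumes the crop-yield form $Y = \beta_1 X_1 + \beta_2 X_2 + \varepsilon$ from the example earlier in these notes. It uses hypothetical estimates $\hat{\beta}_1 = 0.8$ and $\hat{\beta}_2 = 1.5$, which do not come from any real data. The function name `forecast` is also hypothetical.

```python
def forecast(beta_hat, x_new):
    """Point forecast of the average response for a no-intercept linear model."""
    return sum(b * x for b, x in zip(beta_hat, x_new))


# Hypothetical estimates for fertilizer (X1) and irrigation (X2).
beta_hat = (0.8, 1.5)

# Hypothetical new settings of (X1, X2) for which a forecast is wanted.
new_plots = [(20.0, 4.0), (35.0, 6.0)]
for x1, x2 in new_plots:
    print(f"X1={x1}, X2={x2}: forecast Y = {forecast(beta_hat, (x1, x2)):.2f}")
```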

Regression v/s Correlation

• Purpose
o Regression: predicts the average value of one variable on the basis of fixed values of other variables.
o Correlation: measures the direction and strength (degree) of linear association between two variables.
• Usage
o Regression: the dependent and explanatory variables are treated asymmetrically. The dependent variable is assumed to be statistical, random or stochastic (i.e., to have a probability distribution); the explanatory variables are assumed to have fixed values (in repeated sampling).
o Correlation: the variables are treated symmetrically, i.e., there is no distinction between dependent and explanatory variables. Both variables are assumed to be random.
• Coefficient
o Regression: represented by b.
o Correlation: represented by r.
• Value
o Regression: at most one of the two regression coefficients ($b_{yx}$ of Y on X and $b_{xy}$ of X on Y) can be greater than one in absolute value, since $b_{yx} b_{xy} = r^2 \le 1$.
o Correlation: lies between -1 and 1.
• Change of origin and scale
o Regression: regression coefficients are independent of change of origin but not of scale.
o Correlation: the correlation coefficient is independent of both change of origin and scale (up to sign, if a scale factor is negative).
• Cause and effect
o Regression: can be used to study a cause-and-effect relationship specified by theory but, as noted earlier, does not by itself establish causation.
o Correlation: does not establish a cause-and-effect relationship.
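Several of the properties in this comparison can be checked numerically. Regressing weight on height ($b_{yx}$) and height on weight ($b_{xy}$) gives two different slopes, matching the two issues in the height-and-weight example. Their product equals $r^2$, and only the regression coefficients react to a change of scale. The heights and weights below are hypothetical, and the function name is an assumption.

```python
import math


def regression_and_correlation(x, y):
    """Return (b_yx, b_xy, r) computed from centred sums of squares and products."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    s_yy = sum((b - y_bar) ** 2 for b in y)
    b_yx = s_xy / s_xx                 # slope of the regression of y on x
    b_xy = s_xy / s_yy                 # slope of the regression of x on y
    r = s_xy / math.sqrt(s_xx * s_yy)  # correlation coefficient
    return b_yx, b_xy, r


# Hypothetical heights (cm) and weights (kg) of six children.
heights = [150, 155, 160, 165, 170, 175]
weights = [50, 53, 58, 60, 66, 68]

b_yx, b_xy, r = regression_and_correlation(heights, weights)
print(f"b_yx={b_yx:.3f}, b_xy={b_xy:.3f}, r={r:.3f}")
print(f"b_yx * b_xy = {b_yx * b_xy:.4f}, r^2 = {r ** 2:.4f}")

# Change of origin only (heights measured from 150 cm): nothing changes.
shifted = regression_and_correlation([h - 150 for h in heights], weights)

# Change of origin and scale (metres above 1.5 m): the b's change, r does not.
rescaled = regression_and_correlation([(h - 150) / 100 for h in heights], weights)
print("origin only:", [round(v, 3) for v in shifted])
print("origin and scale:", [round(v, 3) for v in rescaled])
```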
