Basic Econometrics - I
N Senthil Kumar
Member of Faculty, RBSC, Chennai
The goal is to turn data into
information, and information into
insight.
Carly Fiorina, Former CEO of HP
• Null Hypothesis
1. Neutral Position,
2. Known Position,
3. Established Ideas,
4. No Difference Position,
5. Conservative Position
• Notations
a. Null Hypothesis is denoted by H0
b. Alternative Hypothesis is denoted by H1
A null hypothesis is rejected if the observed data are significantly unlikely to have
occurred if the null hypothesis were true.
1. God Exists
2. 118th Grade B Batch is the best.
3. The Investment Portfolio returns 8%
4. The additives in the food packet cause
cancer
5. Rejection Rate in CVPS machine has
increased from the last year
6. The Drug is effective
7. Higher Interest Rate leads to high risk
credit portfolio
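As a minimal sketch of how a statement such as item 3 could be treated as a null hypothesis and tested, the Python snippet below runs a one-sample t-test by hand. The choice of a t-test, the 5% significance level, and the return figures are all assumptions made for illustration; none of them come from the slides.

```python
import math
import statistics

# Hypothetical annual returns in percent (illustrative values only).
returns = [8.2, 7.9, 8.5, 7.6, 8.1, 8.4, 7.8, 8.0, 8.3, 7.7]

H0_MEAN = 8.0        # H0: the portfolio's mean return is 8%
T_CRITICAL = 2.262   # two-sided 5% critical value of Student's t, 9 degrees of freedom

n = len(returns)
sample_mean = statistics.mean(returns)
sample_sd = statistics.stdev(returns)        # sample standard deviation (n - 1 divisor)
standard_error = sample_sd / math.sqrt(n)

# t statistic: how many standard errors the sample mean lies from the H0 value.
t_stat = (sample_mean - H0_MEAN) / standard_error

# Reject H0 only if data this extreme would be sufficiently unlikely under H0.
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

With these made-up figures the sample mean sits well within the range expected under H0, so the null hypothesis is not rejected. Failing to reject is not the same as proving that the return is exactly 8%.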
Key Ideas
• Statistical versus Deterministic Relationship
• In statistical relationships, we deal with random variables, while in deterministic
relationships, we deal with variables that are not random.
• In a statistical relationship, the dependent variable is not completely explained by
the explanatory variables, while in a deterministic relationship, the dependent
variable is completely explained by the explanatory variables.
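The contrast can be sketched in a few lines of Python. The functional form `1 + 2x` and the standard normal disturbance are hypothetical choices made only for this illustration.

```python
import random

def deterministic_y(x):
    """Deterministic relationship: x fully determines y."""
    return 1.0 + 2.0 * x

def statistical_y(x, rng):
    """Statistical relationship: y varies randomly around 1 + 2x."""
    return 1.0 + 2.0 * x + rng.gauss(0.0, 1.0)

rng = random.Random(0)   # seeded so the run is repeatable
x = 3.0

# Evaluate both relationships five times at the same x.
deterministic_draws = [deterministic_y(x) for _ in range(5)]
statistical_draws = [statistical_y(x, rng) for _ in range(5)]

print(deterministic_draws)   # the same value every time
print(statistical_draws)     # a different value on each draw
```

For a fixed `x`, the deterministic function always returns the same value, whereas the statistical one returns a different draw each time because part of `y` is not explained by `x`.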
• Regression versus Causation
• A Statistical Relationship however strong and however suggestive, can never establish
causal connection. Our ideas of causation must come from outside statistics, ultimately
from some theory or other – Kendall and Stuart
• Regression versus Correlation
• Correlation measures the degree of linear association between two variables. In
regression, we try to estimate the average value of the dependent variable based on a
fixed value of the explanatory variable.
• Regression treats the two variables differently: the dependent variable is random and
the explanatory variable is fixed. Correlation treats both symmetrically, as random.
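The asymmetry can be checked numerically. The sketch below uses a small hypothetical data set and hand-written helpers (`correlation` and `slope` are illustrative names, not library functions). The slope uses the usual least-squares formula for a simple regression.

```python
import math

def mean(values):
    return sum(values) / len(values)

def cross_deviation_sum(a, b):
    """Sum of (a_i - mean(a)) * (b_i - mean(b))."""
    a_bar, b_bar = mean(a), mean(b)
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))

def correlation(a, b):
    """Pearson correlation coefficient; symmetric in its arguments."""
    return cross_deviation_sum(a, b) / math.sqrt(
        cross_deviation_sum(a, a) * cross_deviation_sum(b, b)
    )

def slope(dependent, explanatory):
    """Least-squares slope from regressing `dependent` on `explanatory`."""
    return cross_deviation_sum(explanatory, dependent) / cross_deviation_sum(
        explanatory, explanatory
    )

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = correlation(x, y)
r_yx = correlation(y, x)
slope_y_on_x = slope(y, x)   # regress y on x
slope_x_on_y = slope(x, y)   # regress x on y

print(r_xy, r_yx, slope_y_on_x, slope_x_on_y)
```

Swapping the variables leaves the correlation unchanged but gives a different regression slope, because regression distinguishes the dependent variable from the explanatory one.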
Data and Variables
• Key Terminologies
1. Dependent / Independent
2. Explained / Explanatory
3. Regressand / Regressor
4. Endogenous / Exogenous
5. Random == Stochastic
6. Simple Regression Analysis Vs Multiple Regression Analysis
• Nature of Data
1. Time Series Data
• An aspect of a unit is measured over time
2. Cross Sectional Data
• At a particular point in time, measurements are taken on one or more aspects of the units
3. Pooled Data
• Both time series and cross sectional data are combined
4. Panel Data
• Measurements are taken on the same units over time
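One way to picture these data types is as different slices of the same set of records. In the sketch below, the unit names, years, and values are hypothetical, and `is_panel` checks the stricter case where every unit is observed in every period.

```python
# Hypothetical records: (unit, year, value). Names and numbers are illustrative.
observations = [
    ("Bank A", 2021, 5.1), ("Bank A", 2022, 5.4),
    ("Bank B", 2021, 4.8), ("Bank B", 2022, 4.9),
]

def time_series(records, unit):
    """One unit followed over time."""
    return [(year, value) for u, year, value in records if u == unit]

def cross_section(records, year):
    """Several units observed at a single point in time."""
    return [(u, value) for u, y, value in records if y == year]

def is_panel(records):
    """True if the same units appear in every period."""
    units = {u for u, _, _ in records}
    years = {y for _, y, _ in records}
    seen = {(u, y) for u, y, _ in records}
    return all((u, y) in seen for u in units for y in years)

print(time_series(observations, "Bank A"))
print(cross_section(observations, 2021))
print(is_panel(observations))
```

The full list is pooled data (time series and cross sections combined). It also qualifies as panel data here because the same two units are measured in both years.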
Regression Defined
• The main objective of regression is prediction: we want to
estimate the dependent variable for a given fixed value of
the independent variable.
• Prediction is done by establishing a relationship between
the dependent and independent variables. Unlike a
deterministic relationship, regression relies on a
statistical relationship between the dependent and
independent variables.
• In a statistical relationship, we deal with random
variables. We therefore recognize that the dependent
variable is a random variable following a distinct
distribution for each given fixed value of the
independent variable.
• It is then natural to pick the mean of that distribution
as a candidate estimator of the dependent variable for a
given value of the independent variable (we assume the
distribution is symmetric, which is why the mean is chosen).
• Therefore, regression is defined as the conditional
mean of the dependent variable, i.e., $E(Y \mid X = x_i)$.
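To make the idea of a conditional mean concrete, the sketch below groups hypothetical (x, y) observations by their x value and averages y within each group. This is only the sample analogue of the population conditional mean, and the data are invented for illustration.

```python
from collections import defaultdict

# Hypothetical observations: several y values for each fixed x.
data = [(1, 2.0), (1, 4.0), (2, 5.0), (2, 7.0), (3, 8.0), (3, 10.0)]

def conditional_means(pairs):
    """Average of y within each distinct value of x."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    return {x: sum(ys) / len(ys) for x, ys in groups.items()}

regression = conditional_means(data)
print(regression)
```

Each entry maps a fixed value of x to the average of the y values observed at that x. The regression function summarizes how this average changes as x changes.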
Linear Regression Defined
• We can see that this conditional mean $E(Y \mid X = x_i)$ is therefore a function of $x_i$
and other parameters. In other words, $E(Y \mid X = x_i) = f(x_i)$.
• If this function is linear, $f(x_i) = \alpha + \beta x_i$, we call it the
Linear Regression Function of the population.
• Practically, if we limit our requirements to prediction only, our
problem reduces to estimating the two parameters $\alpha$ and $\beta$ from
the available data. Let these estimators be denoted $\hat{\alpha}$ and $\hat{\beta}$.
• Once we have the estimators $\hat{\alpha}$ and $\hat{\beta}$, we will have the predicted value
$\hat{Y}_i = \hat{\alpha} + \hat{\beta} x_i$ for a given $x_i$.
• The actual relationship between Y and X can therefore be modelled as
$Y_i = \alpha + \beta X_i + \epsilon_i$, where $\epsilon_i$ is a random variable accounting for all
variation not due to X.
• But, based on the data, we would have estimated the relationship as
$Y_i = \hat{\alpha} + \hat{\beta} X_i + \hat{\epsilon}_i$.
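The split of each observation into a fitted part and a residual can be sketched as follows. The data and the two estimates (2.2 and 0.6) are hypothetical values assumed for illustration; how such estimates are obtained is a separate question.

```python
# Hypothetical data and hypothetical estimates (assumed, not derived here).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
alpha_hat = 2.2
beta_hat = 0.6

# Predicted value for each x_i: alpha_hat + beta_hat * x_i
fitted = [alpha_hat + beta_hat * x for x in xs]

# Residual for each observation: actual value minus predicted value
residuals = [y - f for y, f in zip(ys, fitted)]

# Each observation equals its fitted value plus its residual.
for y, f, e in zip(ys, fitted, residuals):
    print(f"Y = {y}, fitted = {f:.2f}, residual = {e:+.2f}")
```

The residual is the estimated counterpart of the unobserved error term: it collects whatever part of each observed Y the fitted line does not account for.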
Estimating the Parameters – Ordinary Least Squares (OLS) Method
• We are essentially going to estimate the Population Regression Function
(PRF) $Y_i = \alpha + \beta X_i + \epsilon_i$ through the Sample Regression Function (SRF)
$Y_i = \hat{\alpha} + \hat{\beta} X_i + \hat{\epsilon}_i$.
• We could choose $\hat{\alpha}$ and $\hat{\beta}$ so that $\sum \hat{\epsilon}_i$ is minimal, or so that
$\sum \hat{\epsilon}_i^2$ is minimal.
• We choose $\sum \hat{\epsilon}_i^2$ because, in the first case, positive and negative errors
cancel each other.
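The cancellation problem is easy to see with a tiny example. In this sketch the three data points and the two candidate lines are hypothetical: one line fits the points exactly, and the other is a flat line through the mean of Y.

```python
def residuals(alpha_hat, beta_hat, xs, ys):
    """Residuals of the line alpha_hat + beta_hat * x."""
    return [y - (alpha_hat + beta_hat * x) for x, y in zip(xs, ys)]

# Hypothetical data lying exactly on the line Y = X.
xs = [1, 2, 3]
ys = [1, 2, 3]

flat_line = residuals(2.0, 0.0, xs, ys)    # poor fit: horizontal line at Y = 2
exact_line = residuals(0.0, 1.0, xs, ys)   # perfect fit: Y = X

sum_flat = sum(flat_line)
sum_exact = sum(exact_line)
sse_flat = sum(e * e for e in flat_line)
sse_exact = sum(e * e for e in exact_line)

print(sum_flat, sum_exact)   # the plain sums cannot tell the two lines apart
print(sse_flat, sse_exact)   # the sums of squares can
```

Both lines have residuals that sum to zero, so the plain sum cannot rank them. The sum of squared residuals is zero only for the line that actually fits.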
Estimating the Parameters – Ordinary Least Squares Method
• We need to minimize $\sum \hat{\epsilon}_i^2 = \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$ by choosing
appropriate $\hat{\alpha}$ and $\hat{\beta}$.
• This can be done by differentiating the above expression partially with
respect to $\hat{\alpha}$ and $\hat{\beta}$ and setting each result equal to zero.
• Thus we get two equations in terms of $\hat{\alpha}$ and $\hat{\beta}$, which we can solve
to get $\hat{\alpha}$ and $\hat{\beta}$ as below.
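Setting the two partial derivatives to zero gives the normal equations $\sum (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0$ and $\sum X_i (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0$. Their standard closed-form solution is $\hat{\beta} = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2$ and $\hat{\alpha} = \bar{Y} - \hat{\beta} \bar{X}$. A minimal Python sketch of that computation follows. The function name `ols_fit` and the data are hypothetical, and the code assumes a simple regression with one explanatory variable.

```python
def ols_fit(xs, ys):
    """Closed-form OLS estimates (alpha_hat, beta_hat) for Y = alpha + beta * X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("X must take at least two distinct values")
    beta_hat = s_xy / s_xx
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

alpha_hat, beta_hat = ols_fit(xs, ys)
ols_residuals = [y - (alpha_hat + beta_hat * x) for x, y in zip(xs, ys)]

# The two normal equations, evaluated at the estimates (both close to zero).
first_condition = sum(ols_residuals)
second_condition = sum(x * e for x, e in zip(xs, ols_residuals))

print(alpha_hat, beta_hat, first_condition, second_condition)
```

Both conditions come out at zero up to floating-point rounding. That is exactly what the two first-order conditions require of the fitted line.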