Chapter Two Part One
Covariance
is a statistical measure that indicates the extent to which two random variables change
together. Specifically, it measures whether increases in one variable correspond with
increases or decreases in another variable. Here's a breakdown of the concept:
Positive Covariance: If the covariance between two variables is positive, it means that as
one variable increases, the other variable tends to increase as well. Conversely, when one
variable decreases, the other tends to decrease.
Negative Covariance: If the covariance is negative, the two variables tend to move in
opposite directions: as one increases, the other tends to decrease.
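To make this concrete, here is a minimal sketch (made-up numbers, using NumPy, which the notes themselves do not reference) computing a sample covariance both from its definition and with np.cov:

```python
import numpy as np

# Hypothetical paired observations (made-up data for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Sample covariance from the definition: sum of products of deviations
# from the means, divided by (n - 1).
cov_manual = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is Cov(x, y).
cov_numpy = np.cov(x, y)[0, 1]

print(cov_manual, cov_numpy)  # both positive: x and y tend to move together
```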
Correlation
is a statistical measure that describes the strength and direction of a linear relationship
between two random variables. Unlike covariance, correlation is standardized, meaning its
value ranges from -1 to 1, making it easier to interpret. Here are the key points about
correlation:
Types of Correlation
1. Positive Correlation: A positive correlation (close to +1) indicates that as one variable
increases, the other variable also tends to increase. For example, height and weight might
have a positive correlation in a population.
2. Negative Correlation: A negative correlation (close to -1) indicates that as one variable
increases, the other tends to decrease. For example, the amount of time spent watching TV
and academic performance might have a negative correlation in students.
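As an illustrative sketch (again with made-up data), correlation is simply covariance rescaled by the two standard deviations, which is why it always lands between -1 and 1:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([9.8, 8.1, 6.2, 3.9, 2.1])  # moves opposite to x

# Correlation = covariance standardized by both standard deviations.
r_manual = np.cov(x, y)[0, 1] / (np.std(x, ddof=1) * np.std(y, ddof=1))

# np.corrcoef returns the correlation matrix; the off-diagonal entry is r.
r_numpy = np.corrcoef(x, y)[0, 1]

print(r_manual, r_numpy)  # close to -1: a strong negative correlation
```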
Key Concepts
1. Dependent Variable: The variable you are trying to predict or explain, often denoted as Y.
2. Independent Variable(s): The variable(s) used to make predictions or explain the dependent
variable, often denoted as X (or $X_1, X_2, \dots, X_n$ in the case of multiple predictors).
3. Regression Coefficient: Represents the change in the dependent variable for a one-unit change in an
independent variable, holding other variables constant.
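A hedged sketch of the "holding other variables constant" reading of a regression coefficient, using simulated data with known coefficients (the names x1 and x2 are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Two hypothetical predictors and a response built from known coefficients.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.1, size=n)

# Design matrix with an intercept column; estimate by least squares.
X = np.column_stack([np.ones(n), x1, x2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta[1] is close to 2.0: a one-unit increase in x1, holding x2 constant,
# is associated with about a 2-unit increase in y.
print(beta)
```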
Simple linear regression function
Simple linear regression is the simplest form of regression analysis: it has a single
explanatory variable related to the dependent variable in linear form.
# Simple: SLR involves only two variables (one dependent and one independent variable)
- Example: $Y_i = \beta_1 + \beta_2 X_i + u_i$ (simple)
- $Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + u_i$ (multiple)
# Linear
- "Linear" regression will always mean a regression that is linear in the parameters, the β's
(that is, the parameters are raised to the first power only).
- $Y_i = \beta_1 + \beta_2 X_i^2 + u_i$ (linear in the parameters)
- $Y_i = \beta_1 + \beta_2^2 X_i + u_i$ (non-linear in the parameters)
- For instance, $Y$ might be a stock price and $X$ the dividend per share.
# General form of SLR
$Y_i = \beta_1 + \beta_2 X_i + u_i$
Variation in Y = Systematic variation + Random variation
Where:
- $Y$ is the dependent variable
- $X$ is the independent variable
- $u_i$ is the error (disturbance) term
- $i = 1, 2, \dots, n$, where $n$ is the number of cases or observations
- The disturbance term $u_i$ is a surrogate for all those variables that are omitted from the model but
that collectively affect Y.
- The error term is added to a regression model to capture all the variation in Y that can't be
explained by the Xs. It is a proxy for all variables that are not included in the regression model but
that may collectively affect Y.
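To make the decomposition "Variation in Y = Systematic variation + Random variation" concrete, the following simulation (assumed data, not from the notes) fits the SLR by ordinary least squares and splits each observation into a fitted part and a residual, the residual serving as the empirical stand-in for $u_i$:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100

# Simulated data: true beta1 = 3.0, beta2 = 1.5, plus a random disturbance.
x = rng.uniform(0, 10, size=n)
u = rng.normal(0, 1, size=n)
y = 3.0 + 1.5 * x + u

# Estimate beta1 and beta2 by least squares (a degree-1 polynomial fit).
beta2_hat, beta1_hat = np.polyfit(x, y, 1)

fitted = beta1_hat + beta2_hat * x  # systematic variation in Y
residuals = y - fitted              # random variation (proxy for u)

print(beta1_hat, beta2_hat)                # close to 3.0 and 1.5
print(np.allclose(y, fitted + residuals))  # Y = systematic + random
```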
A model is a simplification of reality; it is not always possible to include all relevant variables in a
functional form. For instance, we may construct a model relating ROE to capital structure. But
ROE is influenced not only by capital structure: operating leverage, asset size and several
other variables also influence it. The omission of these variables from the model introduces an
error. In addition, we may omit variables for the following reasons.
1. Lack of data and limited knowledge: we may not have information about some relevant variables.
2. Principle of parsimony: we would like to keep our regression model as simple as possible. If we
can explain the behavior of Y "substantially" with two or three explanatory variables, and if our
theory is not strong enough to suggest what other variables might be included, why introduce
more variables? Let $u_i$ represent all other variables.
3. Errors of measurement: errors in measuring the variables are inevitable, given the methods
of collecting and processing statistical information.
# Assumptions of Classical Linear Regression Model
The classical econometricians made important assumptions in their analysis of regression. The most
important of these assumptions are discussed below.
Assumption 1: The model is linear in the parameters
The classicals assumed that the model should be linear in the parameters, regardless of whether the
explanatory and dependent variables are linear or not. If the parameters enter the model
non-linearly, they are difficult to estimate.
Assumption 2: The mean value of the random variable ($u$) in any particular period is
zero

$E(u_i) = 0$ for all $i$
This means that for each value of X, the random variable $u$ may assume various values, some
greater than zero and some smaller than zero; but if we considered all the possible positive and
negative values of $u$ for any given value of X, they would have an average value equal to zero. In
other words, the positive and negative values of $u$ cancel each other out.
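A tiny simulation, offered only as an illustration of this cancelling-out idea (the draws are synthetic):

```python
import numpy as np

rng = np.random.default_rng(7)

# Many disturbances from a zero-mean distribution: some positive,
# some negative, averaging out to (approximately) zero.
u = rng.normal(loc=0.0, scale=2.0, size=100_000)

print(u.mean())        # close to 0
print((u > 0).mean())  # about half the draws are positive
```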
Assumption 3: The variance of the error term is constant (Homoscedasticity)
For all values of X, the $u$'s will show the same dispersion around their mean; that is, the error
terms should be homoscedastic. Homoscedasticity describes a situation in which the error term
(that is, the "noise" or random disturbance in the relationship between the independent variables
and the dependent variable) is the same across all values of the independent variables.

$\operatorname{Var}(u_i) = E(u_i^2) = \sigma^2$ (a constant for all $i$)
Put simply, the variation around the regression line (which is the line of average relationship
between Y and X) is the same across the X values; it neither increases nor decreases as X varies.
If this condition is not fulfilled, i.e. if the variance of the error terms varies as the sample size
changes or as the value of the explanatory variables changes, this leads to the
heteroscedasticity problem.
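The contrast can be simulated. In this sketch (synthetic data; scaling $u$ by $x$ is just one hypothetical way to induce heteroscedasticity), the homoscedastic errors have the same spread everywhere, while the heteroscedastic ones fan out as X grows:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000
x = np.linspace(1, 10, n)

# Homoscedastic errors: the same spread around the line for every x.
u_homo = rng.normal(0, 1.0, size=n)

# Heteroscedastic errors: the spread grows with x, violating the assumption.
u_hetero = rng.normal(0, 1.0, size=n) * x

# Compare the error dispersion for small vs large x values.
lo, hi = x < 5, x >= 5
print(u_homo[lo].std(), u_homo[hi].std())      # roughly equal
print(u_hetero[lo].std(), u_hetero[hi].std())  # clearly different
```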
Assumption 4: The error terms are normally distributed
This means the values of $u$ (for each X) have a bell-shaped, symmetrical distribution about their
zero mean and constant variance, i.e. $u_i \sim N(0, \sigma^2)$.
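A quick numerical check of the bell-shape claim (simulated draws, not real data): under normality roughly 68% and 95% of the $u$'s fall within one and two standard deviations of their zero mean:

```python
import numpy as np

rng = np.random.default_rng(5)
u = rng.normal(loc=0.0, scale=2.0, size=100_000)

# Under a normal distribution, about 68% / 95% of values lie within
# 1 / 2 standard deviations of the mean.
sigma = u.std()
print((np.abs(u) < 1 * sigma).mean())  # ~0.68
print((np.abs(u) < 2 * sigma).mean())  # ~0.95
```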
Assumption 5: No autocorrelation among the error terms
This means the value which the random term assumed in one period does not depend on the
value which it assumed in any other period.

$\operatorname{Cov}(u_i, u_j) = E(u_i u_j) = 0$ for $i \neq j$
- The error terms across observations are not correlated with each other.
- The error term in one time period never affects the error term in the next.
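One way to see a violation is to simulate an autocorrelated (AR(1)) error process and compare its lag-1 correlation with that of independent errors; the 0.8 persistence parameter below is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 10_000

# Independent errors: today's u carries no information about tomorrow's.
u_indep = rng.normal(size=n)

# AR(1) errors: each u depends on the previous one, which violates
# the no-autocorrelation assumption.
u_ar = np.empty(n)
u_ar[0] = rng.normal()
for t in range(1, n):
    u_ar[t] = 0.8 * u_ar[t - 1] + rng.normal()

def lag1_corr(u):
    # Correlation between u_t and u_{t-1}.
    return np.corrcoef(u[:-1], u[1:])[0, 1]

print(lag1_corr(u_indep))  # close to 0
print(lag1_corr(u_ar))     # close to 0.8
```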
- Covariance and correlation measure the relationship and the dependency between two
variables. Covariance indicates the direction of the linear relationship between variables
while correlation measures both the strength and direction of the linear relationship between
two variables.
Assumption 6: The explanatory variable $X_i$ is fixed in repeated samples.
Each value of $X_i$ does not vary, for instance owing to a change in sample size. This means the
explanatory variables are non-random.
Assumption 7: Zero covariance between $u_i$ and $X_i$

$\operatorname{Cov}(u_i, X_i) = E(u_i X_i) = 0$

There is no correlation between the regressors and the error terms.
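A minimal sketch of this assumption with simulated data: when X is generated independently of $u$, their sample covariance is near zero, and OLS residuals are uncorrelated with X by construction:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 5_000

# X drawn independently of the disturbance term, as the assumption requires.
x = rng.uniform(0, 10, size=n)
u = rng.normal(0, 1, size=n)
y = 2.0 + 0.5 * x + u

# Sample covariance between the regressor and the disturbance: near zero.
print(np.cov(x, u)[0, 1])

# OLS residuals are uncorrelated with x by construction.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)
print(np.cov(x, resid)[0, 1])  # ~0 up to floating-point error
```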
Assumption 8: The X values in a given sample must not all be the same (within a sample); X must vary, so that $\operatorname{Var}(X) \neq 0$.
Assumption 9: Randomness of $u_i$
The error terms $u_i$ are randomly distributed; $u_i$ is a random real variable. The value which $u_i$
may assume in any period depends on chance: it may be positive, negative or zero.
Assumption 10: Explanatory variables should not be perfectly or highly linearly
correlated.
Using explanatory variables which are highly or perfectly correlated in a regression function makes
it impossible to isolate the separate effect of each variable on Y; this is the multicollinearity problem.
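A short sketch of how near-perfect correlation between regressors shows up in practice (x2 is deliberately constructed as almost a copy of x1; the data are synthetic): the correlation is close to 1, and the cross-product matrix $X'X$ becomes nearly singular:

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200

x1 = rng.normal(size=n)
x2 = 2.0 * x1 + rng.normal(scale=0.01, size=n)  # almost a copy of x1

# With nearly collinear columns, X'X is close to singular, so the
# coefficient estimates become unstable (huge condition number).
X = np.column_stack([np.ones(n), x1, x2])
print(np.corrcoef(x1, x2)[0, 1])  # ~1: near-perfect correlation
print(np.linalg.cond(X.T @ X))    # very large: ill-conditioned
```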
Assumption 11: The variables are measured without error (the data are error-free).
Since wrong data lead to wrong conclusions, it is important to make sure that our data are free from
any type of error.