DA Unit-3
The Gauss Markov theorem tells us that if a certain set of assumptions are met, the
ordinary least squares estimate for regression coefficients gives you the Best Linear
Unbiased Estimate (BLUE) possible.
Linearity:
o The parameters we are estimating using the OLS method must themselves be linear.
Random:
o Our data must have been randomly sampled from the population.
Non-Collinearity:
o The regressors being calculated aren’t perfectly correlated with each other.
Exogeneity:
o The regressors aren’t correlated with the error term.
Homoscedasticity:
o No matter what the values of our regressors might be, the variance of the error term is constant.
Checking how well our data matches these assumptions is an important part of estimating
regression coefficients.
When you know where these conditions are violated, you may be able to plan ways to
change your experiment setup to help your situation fit the ideal Gauss Markov situation
more closely.
In practice, the Gauss Markov assumptions are rarely all met perfectly, but they are still
useful as a benchmark, and because they show us what ‘ideal’ conditions would be.
They also allow us to pinpoint problem areas that might cause our estimated regression
coefficients to be inaccurate or even unusable.
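As a minimal sketch of what such a check might look like in practice (assuming NumPy and Matplotlib are available; the data here is synthetic and purely illustrative), a common diagnostic is to fit the model and plot residuals against fitted values: a roughly constant spread around zero is consistent with homoscedasticity, while a funnel shape or a clear trend is a warning sign.

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic, purely illustrative data: y depends linearly on x plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 2.0 + 0.5 * x + rng.normal(0, 1.0, size=200)

# Fit a simple linear model by ordinary least squares.
X = np.column_stack([np.ones_like(x), x])      # design matrix [1, x]
w, *_ = np.linalg.lstsq(X, y, rcond=None)      # OLS coefficients
fitted = X @ w
residuals = y - fitted

# Residual-vs-fitted plot: look for constant spread around zero.
plt.scatter(fitted, residuals, s=10)
plt.axhline(0, color="red", linewidth=1)
plt.xlabel("Fitted values")
plt.ylabel("Residuals")
plt.title("Residuals vs. fitted values")
plt.show()
```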
The Gauss-Markov Assumptions in Algebra
We can summarize the Gauss-Markov assumptions succinctly in algebra by saying that a linear regression model represented by

y = Xβ + ε

and estimated by ordinary least squares gives the best linear unbiased estimate (BLUE) possible if

1. E[ε] = 0
2. the regressors are not perfectly collinear (X has full column rank)
3. Cov(X, ε) = 0
4. Var(ε) = σ²I

The first of these assumptions can be read as "the expected value of the error term is zero." The second is non-collinearity, the third is exogeneity, and the fourth is homoscedasticity.
Regression Concepts
Regression
Each xi corresponds to the set of attributes of the ith observation (known as explanatory
variables) and yi corresponds to the target (or response) variable.
The explanatory attributes of a regression task can be either discrete or continuous.
Regression (Definition)
Regression is the task of learning a target function f that maps each attribute set x into a
continuous-valued output y.
The goal of regression is to find a target function that fits the input data with minimum error.
The error function for a regression task can be expressed in terms of the sum of absolute or squared errors:

E = Σ_i |y_i − f(x_i)|    or    E = Σ_i (y_i − f(x_i))^2
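As a small illustration (a sketch only, using made-up numbers and a hypothetical candidate function f), both error measures can be computed directly:

```python
import numpy as np

# Hypothetical observations (x_i, y_i) and a candidate target function f.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.9, 4.1, 6.2, 7.8])
f = lambda x: 2.0 * x                      # candidate model f(x) = 2x

abs_error = np.sum(np.abs(y - f(x)))       # sum of absolute errors
sq_error = np.sum((y - f(x)) ** 2)         # sum of squared errors (SSE)
print(abs_error, sq_error)
```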
Simple Linear Regression
Suppose we wish to fit the following linear model to the observed data:

f(x) = w1 x + w0

where w0 and w1 are parameters of the model and are called the regression coefficients.
A standard approach for doing this is to apply the method of least squares, which attempts to find the parameters (w0, w1) that minimize the sum of squared errors

SSE = Σ_i [y_i − f(x_i)]^2 = Σ_i [y_i − w1 x_i − w0]^2

Setting the partial derivatives of the SSE with respect to w0 and w1 to zero yields two linear equations in the unknowns. These equations can be summarized by the following matrix equation, which is also known as the normal equation:

[ N          Σ_i x_i    ] [w0]   =   [ Σ_i y_i      ]
[ Σ_i x_i    Σ_i x_i^2  ] [w1]       [ Σ_i x_i y_i  ]
Since the quantities N, Σ_i x_i, Σ_i y_i, Σ_i x_i^2, and Σ_i x_i y_i can all be computed from the observed data, the normal equations can be solved to obtain the following estimates for the parameters:

w1 = (N Σ_i x_i y_i − Σ_i x_i Σ_i y_i) / (N Σ_i x_i^2 − (Σ_i x_i)^2)
w0 = (Σ_i y_i − w1 Σ_i x_i) / N

Thus, the linear model that best fits the data in terms of minimizing the SSE is f(x) = w1 x + w0 with these estimated coefficients.
More generally, we can show that the solution to the normal equations can be expressed as follows:

w = (X^T X)^(-1) X^T y

where X is the matrix of explanatory variables (with a column of ones appended for the intercept) and y is the vector of observed responses. Thus, the linear model that results in the minimum squared error is given by f(x) = x^T w.
In summary, the least squares method is a systematic approach to fitting a linear model to the response variable y by minimizing the squared error between the true and estimated values of y.
Although the model is relatively simple, it seems to provide a reasonably accurate
approximation because a linear model is the first-order Taylor series approximation for
any function with continuous derivatives.
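The closed-form estimates above translate directly into code. The following sketch (assuming NumPy; the data is again synthetic and for illustration only) computes w1 and w0 from the summary statistics and cross-checks them against numpy.polyfit:

```python
import numpy as np

# Synthetic data for illustration.
rng = np.random.default_rng(1)
x = rng.uniform(0, 5, size=50)
y = 1.5 + 0.8 * x + rng.normal(0, 0.3, size=50)

n = len(x)
# Least squares estimates obtained from the normal equations.
w1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x ** 2) - np.sum(x) ** 2)
w0 = (np.sum(y) - w1 * np.sum(x)) / n

# Cross-check with NumPy's degree-1 polynomial fit (returns [slope, intercept]).
w1_np, w0_np = np.polyfit(x, y, deg=1)
print(w0, w1)        # estimates from the closed-form solution
print(w0_np, w1_np)  # should agree up to floating-point error
```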
Logistic Regression
Consider a procedure in which individuals are selected on the basis of their scores in a
battery of tests.
After five years the candidates are classified as "good" or "poor".
We are interested in examining the ability of the tests to predict the job performance of
the candidates.
Here the response variable, performance, is dichotomous.
We can code "good" as 1 and "poor" as 0, for example.
The predictor variables are the scores in the tests.
In a study to determine the risk factors for cancer, health records of several people were
studied.
Data were collected on several variables, such as age, gender, smoking, diet, and the
family's medical history.
The response variable was whether the person had cancer (Y = 1) or did not have cancer (Y = 0).
The relationship between the probability π and X can often be represented by a logistic
response function.
It resembles an S-shaped curve.
The probability π initially increases slowly as X increases, then the increase accelerates, and finally it stabilizes without ever exceeding 1.
Intuitively this makes sense.
Consider the probability of a questionnaire being returned as a function of cash reward,
or the probability of passing a test as a function of the time put in studying for it.
The shape of the S-curve can be reproduced if we model the probabilities as follows:

π(x) = e^(β0 + β1 x) / (1 + e^(β0 + β1 x))
A sigmoid function is a bounded differentiable real function that is defined for all real
input values and has a positive derivative at each point.
Modeling the response probabilities by the logistic distribution and estimating the parameters β0 and β1 of the model given above constitutes fitting a logistic regression.
In logistic regression the fitting is carried out by working with the logits. The logit transformation,

logit(π) = log(π / (1 − π)) = β0 + β1 x

produces a model that is linear in the parameters.
The method of estimation used is the maximum likelihood method.
The maximum likelihood estimates are obtained numerically, using an iterative
procedure.
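As a hedged sketch of what this fitting looks like in practice (assuming NumPy and statsmodels are installed; the test scores and good/poor labels below are synthetic, not real data), the iterative maximum likelihood estimation is handled by the library:

```python
import numpy as np
import statsmodels.api as sm

# Synthetic example: one test score per candidate, label 1 = "good", 0 = "poor".
rng = np.random.default_rng(2)
scores = rng.uniform(40, 100, size=300)
# Hypothetical relationship used only to generate illustrative labels.
p_good = 1.0 / (1.0 + np.exp(-(-12.0 + 0.18 * scores)))
labels = rng.binomial(1, p_good)

# Fit pi(x) = exp(b0 + b1*x) / (1 + exp(b0 + b1*x)) by maximum likelihood.
X = sm.add_constant(scores)          # adds the intercept column
result = sm.Logit(labels, X).fit()   # iterative maximum likelihood fitting
print(result.params)                 # estimated [b0, b1]
```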
OLS:
Ordinary least squares, or OLS, is also called linear least squares.
It is a method for estimating the unknown parameters of a linear regression model.
The OLS estimates are obtained by minimizing the sum of the squared vertical distances between the observed responses in the dataset and the responses predicted by the linear approximation.
When there is a single regressor on the right-hand side of the linear regression model, the resulting estimator can be expressed by a simple formula.
For example, suppose you have a system with more equations than unknown parameters.
You may use the ordinary least squares method because it is the standard approach for finding an approximate solution to such an overdetermined system.
In other words, it is the solution that minimizes the sum of the squared errors of the equations.
Data fitting is the most common application. The best fit in the ordinary least squares sense is the one that minimizes the sum of squared residuals.
A "residual" is the difference between an observed value and the fitted value provided by the model.
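As a brief sketch of the overdetermined-system view (assuming NumPy; the numbers are illustrative), np.linalg.lstsq returns the parameter vector that minimizes the sum of squared residuals:

```python
import numpy as np

# An overdetermined system: 5 equations, 2 unknown parameters (w0, w1).
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])                  # design matrix [1, x]
b = np.array([2.1, 2.9, 4.2, 4.8, 6.1])     # observed responses

# Least squares solution: minimizes ||A w - b||^2.
w, _, _, _ = np.linalg.lstsq(A, b, rcond=None)
residuals = b - A @ w                       # observed minus fitted values
print("parameters:", w)
print("sum of squared residuals:", np.sum(residuals ** 2))
```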
MLE:
Maximum likelihood estimation, or MLE, is a method for estimating the parameters of a statistical model and for fitting a statistical model to data.
If you want to find the height measurement of every basketball player in a specific
location, you can use the maximum likelihood estimation.
Normally, you would encounter problems such as cost and time constraints.
If you could not afford to measure all of the basketball players’ heights, the maximum
likelihood estimation would be very handy.
Using the maximum likelihood estimation, you can estimate the mean and variance of the
height of your subjects.
The MLE treats the mean and variance as parameters of the model and finds the specific parameter values that make the observed data most likely under the assumed model.
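A minimal sketch of that idea (assuming NumPy; the height values below are hypothetical): for normally distributed data, the maximum likelihood estimates of the mean and variance have closed forms, namely the sample mean and the average squared deviation (note the divisor n rather than n − 1):

```python
import numpy as np

# Hypothetical sample of player heights in centimetres.
heights = np.array([198.0, 201.5, 185.0, 210.2, 192.3, 205.1, 188.7, 199.4])

n = len(heights)
mu_mle = heights.mean()                          # MLE of the mean
var_mle = np.sum((heights - mu_mle) ** 2) / n    # MLE of the variance (divides by n, not n-1)

# These estimates maximise the Gaussian log-likelihood of the sample.
log_lik = -0.5 * n * np.log(2 * np.pi * var_mle) - np.sum((heights - mu_mle) ** 2) / (2 * var_mle)
print(mu_mle, var_mle, log_lik)
```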
When the response has k categories, a separate logit equation is modeled for j = 1, 2, ..., (k − 1), each relative to a baseline category. The model parameters are estimated by the method of maximum likelihood, and statistical software is available to do this fitting.