DA Unit-3

Data analytics unit 3 notes

BLUE Property assumptions

 The Gauss-Markov theorem tells us that if a certain set of assumptions is met, the
ordinary least squares estimate of the regression coefficients gives you the Best Linear
Unbiased Estimate (BLUE) possible.

 There are five Gauss Markov assumptions (also called conditions):

 Linearity:
o The parameters we are estimating using the OLS method must themselves be
linear.
 Random:
o Our data must have been randomly sampled from the population.
 Non-Collinearity:
o The regressors being calculated aren’t perfectly correlated with each other.
 Exogeneity:
o The regressors aren’t correlated with the error term.
 Homoscedasticity:
o No matter what the values of our regressors might be, the variance of the error
term is constant.

Purpose of the Assumptions


 The Gauss Markov assumptions guarantee the validity of ordinary least squares for
estimating regression coefficients.

 Checking how well our data matches these assumptions is an important part of estimating
regression coefficients.

 When you know where these conditions are violated, you may be able to plan ways to
change your experimental setup so that your situation fits the ideal Gauss-Markov
conditions more closely.

 In practice, the Gauss Markov assumptions are rarely all met perfectly, but they are still
useful as a benchmark, and because they show us what ‘ideal’ conditions would be.

 They also allow us to pinpoint problem areas that might cause our estimated regression
coefficients to be inaccurate or even unusable.
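As a rough illustration of how some of these conditions can be checked in practice, here is a minimal Python sketch (assuming the numpy and statsmodels packages are available; the data are synthetic and purely illustrative). It fits an OLS model, uses the Breusch-Pagan test to probe homoscedasticity, and computes variance inflation factors to flag near-collinearity among the regressors.

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic data that satisfies the assumptions by construction
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=1.0, size=200)

X = sm.add_constant(np.column_stack([x1, x2]))   # design matrix with intercept column
model = sm.OLS(y, X).fit()                       # ordinary least squares fit

# Homoscedasticity: Breusch-Pagan test on the residuals
bp_stat, bp_pvalue, _, _ = het_breuschpagan(model.resid, X)
print("Breusch-Pagan p-value:", bp_pvalue)       # a small p-value suggests heteroscedasticity

# Non-collinearity: variance inflation factor for each regressor (skip the constant)
for i in range(1, X.shape[1]):
    print("VIF for regressor", i, "=", variance_inflation_factor(X, i))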
The Gauss-Markov Assumptions in Algebra
 We can summarize the Gauss-Markov assumptions succinctly in algebra, by saying that
a linear regression model represented by

y_i = β_0 + β_1*x_i1 + ... + β_p*x_ip + ε_i,  i = 1, ..., N

 and generated by the ordinary least squares estimate is the best linear unbiased estimate
(BLUE) possible if

1. E(ε_i) = 0 for every i
2. the regressors are not perfectly collinear
3. Cov(x_ij, ε_i) = 0 for every i and j (the regressors are uncorrelated with the error term)
4. Var(ε_i) = σ² for every i

 The first of these assumptions can be read as “The expected value of the error term is
zero.” The second assumption is non-collinearity, the third is exogeneity, and the fourth is
homoscedasticity.
Regression Concepts

Regression

 It is a predictive modeling technique where the target variable to be estimated is
continuous.

Examples of applications of regression

 predicting a stock market index using other economic indicators


 forecasting the amount of precipitation in a region based on characteristics of the jet
stream
 projecting the total sales of a company based on the amount spent for advertising
 estimating the age of a fossil according to the amount of carbon-14 left in the organic
material.

 Let D denote a data set that contains N observations:

D = {(x_i, y_i) | i = 1, 2, ..., N}

 Each x_i corresponds to the attribute set of the i-th observation (known as the explanatory
variables) and y_i corresponds to the target (or response) variable.
 The explanatory attributes of a regression task can be either discrete or continuous.

Regression (Definition)

 Regression is the task of learning a target function f that maps each attribute set x into a
continuous-valued output y.

The goal of regression

 To find a target function that can fit the input data with minimum error.
 The error function for a regression task can be expressed in terms of the sum of absolute
or squared error:

Absolute Error = Σ_i |y_i − f(x_i)|
Squared Error = Σ_i (y_i − f(x_i))²
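As a small illustration, both error functions can be computed directly; the numbers below are arbitrary and serve only the example (Python, assuming numpy).

import numpy as np

y_true = np.array([3.0, 5.0, 7.5, 9.0])   # observed target values (example numbers)
y_pred = np.array([2.8, 5.4, 7.0, 9.3])   # values predicted by some target function f

absolute_error = np.sum(np.abs(y_true - y_pred))   # sum of absolute errors
squared_error = np.sum((y_true - y_pred) ** 2)     # sum of squared errors
print(absolute_error, squared_error)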
Simple Linear Regression

 Consider the physiological data shown in Figure D.1.


 The data corresponds to measurements of heat flux and skin temperature of a person
during sleep.
 Suppose we are interested in predicting the skin temperature of a person based on the
heat flux measurements generated by a heat sensor.
 The two-dimensional scatter plot shows that there is a strong linear relationship between
the two variables.
Least Squares Estimation or Least Squares Method

 Suppose we wish to fit the following linear model to the observed data:

f(x) = w1*x + w0

 where w0 and w1 are parameters of the model and are called the regression coefficients.
 A standard approach for doing this is to apply the method of least squares, which
attempts to find the parameters (w0, w1) that minimize the sum of the squared error

SSE = Σ_{i=1}^{N} [y_i − f(x_i)]² = Σ_{i=1}^{N} [y_i − w1*x_i − w0]²

 which is also known as the residual sum of squares.


 This optimization problem can be solved by taking the partial derivatives of the SSE with
respect to w0 and w1, setting them to zero, and solving the corresponding system of linear
equations.

 These equations can be summarized by the following matrix equation, which is also
known as the normal equation:

[ N        Σ x_i   ] [ w0 ]   [ Σ y_i     ]
[ Σ x_i    Σ x_i²  ] [ w1 ] = [ Σ x_i y_i ]

 Substituting the summary statistics of the observed data (N, Σ x_i, Σ x_i², Σ y_i, and
Σ x_i y_i) into this equation, the normal equations can be solved to obtain estimates for the
parameters.
 Thus, the linear model that best fits the data in terms of minimizing the SSE is the line
obtained by plugging these estimates back into f(x) = w1*x + w0.
 Figure D.2 shows the line corresponding to this model.
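As an illustrative sketch (the actual measurements behind Figure D.1 are not reproduced here, so synthetic data are used), the 2x2 normal equation above can be built and solved numerically with numpy:

import numpy as np

# Synthetic (x, y) pairs standing in for the heat flux / skin temperature data
rng = np.random.default_rng(5)
x = rng.uniform(5, 15, size=40)
y = 30.0 - 0.4 * x + rng.normal(scale=0.2, size=40)

# Build and solve the 2x2 normal equation for (w0, w1)
A = np.array([[len(x),  x.sum()],
              [x.sum(), (x ** 2).sum()]])
b = np.array([y.sum(), (x * y).sum()])
w0, w1 = np.linalg.solve(A, b)
print("w0 =", w0, "w1 =", w1)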

 We can show that the general solution to the normal equations given in D.6 can be
expressed as follows:

w1 = Σ_i (x_i − x̄)(y_i − ȳ) / Σ_i (x_i − x̄)²
w0 = ȳ − w1*x̄

 where x̄ and ȳ denote the means of the x_i and y_i, respectively.
 Thus, the linear model that results in the minimum squared error is given by

f(x) = ȳ + w1*(x − x̄)

 In summary, the least squares method is a systematic approach to fit a linear model to the
response variable y by minimizing the squared error between the true and estimated values
of y.
 Although the model is relatively simple, it seems to provide a reasonably accurate
approximation because a linear model is the first-order Taylor series approximation for
any function with continuous derivatives.
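The closed-form estimates above translate directly into code. The following minimal Python sketch (assuming numpy) again uses synthetic data in place of the heat flux and skin temperature measurements:

import numpy as np

# Synthetic stand-ins for the heat flux (x) and skin temperature (y) measurements
rng = np.random.default_rng(1)
x = rng.uniform(5, 15, size=50)
y = 30.0 - 0.4 * x + rng.normal(scale=0.2, size=50)

x_bar, y_bar = x.mean(), y.mean()
w1 = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # slope estimate
w0 = y_bar - w1 * x_bar                                            # intercept estimate

sse = np.sum((y - (w0 + w1 * x)) ** 2)   # residual sum of squares of the fitted line
print("w0 =", w0, "w1 =", w1, "SSE =", sse)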
Logistic Regression

Logistic regression, or Logit regression, or Logit model

o is a regression model where the dependent variable (DV) is categorical.


o was developed by statistician David Cox in 1958.

 So far, the response variable Y has been regarded as a continuous quantitative variable.

 There are situations, however, where the response variable is qualitative.
 The predictor variables, on the other hand, have been both quantitative and qualitative.
 Indicator variables fall into the latter category.

 Consider a procedure in which individuals are selected on the basis of their scores in a
battery of tests.
 After five years the candidates are classified as “good” or “poor”.
 We are interested in examining the ability of the tests to predict the job performance of
the candidates.
 Here the response variable, performance, is dichotomous.
 We can code "good" as 1 and "poor" as 0, for example.
 The predictor variables are the scores in the tests.
 In a study to determine the risk factors for cancer, health records of several people were
studied.
 Data were collected on several variables, such as age, gender, smoking, diet, and the
family's medical history.
 The response variable was whether the person had cancer (Y = 1) or did not have cancer (Y = 0).

 The relationship between the probability π and X can often be represented by a logistic
response function.
 It resembles an S-shaped curve.
 The probability π initially increases slowly as X increases, then the increase accelerates,
and finally it stabilizes without ever exceeding 1.
 Intuitively this makes sense.
 Consider the probability of a questionnaire being returned as a function of cash reward,
or the probability of passing a test as a function of the time put in studying for it.
 The shape of the S-curve can be reproduced if we model the probabilities as follows:

π = exp(β_0 + β_1*X) / (1 + exp(β_0 + β_1*X))

 A sigmoid function is a bounded differentiable real function that is defined for all real
input values and has a positive derivative at each point.

 It has an “S” shape. It is defined by the function:

σ(x) = 1 / (1 + e^(−x))
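A direct translation of this definition into Python (assuming numpy):

import numpy as np

def sigmoid(x):
    # Bounded, differentiable, strictly increasing S-shaped function
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # approaches 0, equals 0.5, approaches 1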


 The process of linearizing the logistic regression function is called the Logit
Transformation:

logit(π) = ln( π / (1 − π) ) = β_0 + β_1*X

 Modeling the response probabilities by the logistic distribution and estimating the
parameters of this model constitutes fitting a logistic regression.
 In logistic regression the fitting is carried out by working with the logits.
 The Logit transformation produces a model that is linear in the parameters.
 The method of estimation used is the maximum likelihood method.
 The maximum likelihood estimates are obtained numerically, using an iterative
procedure.
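A minimal sketch of such a fit, assuming the statsmodels package is available and using synthetic test scores and good/poor outcomes in place of the selection data described above:

import numpy as np
import statsmodels.api as sm

# Synthetic example: a single test score and a good (1) / poor (0) outcome
rng = np.random.default_rng(2)
score = rng.normal(60, 10, size=300)
prob = 1.0 / (1.0 + np.exp(-(0.15 * (score - 60))))   # true logistic relationship
outcome = rng.binomial(1, prob)

X = sm.add_constant(score)            # design matrix with intercept
logit_model = sm.Logit(outcome, X)    # model the logits
result = logit_model.fit()            # maximum likelihood, iterative (Newton-type)
print(result.params)                  # estimated beta_0 and beta_1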

OLS:

 Ordinary least squares (OLS) is also called linear least squares.
 It is a method for estimating the unknown parameters in a linear regression model.
 The OLS estimates are obtained by minimizing the total of the squared vertical distances
between the observed responses in the dataset and the responses predicted by the linear
approximation.
 When there is a single regressor on the right-hand side of the linear regression model, the
resulting estimator can be expressed by a simple formula.
 For example, suppose you have a system consisting of more equations than unknown
parameters, i.e., an overdetermined system.
 Ordinary least squares is the standard approach for finding an approximate solution to
such overdetermined systems.
 In other words, it is the solution that minimizes the sum of the squared errors in the
equations.
 Data fitting is its most common application: the best fit in the least-squares sense is the
one that minimizes the sum of squared residuals.
 A “residual” is the difference between an observed value and the fitted value provided by
the model.
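A small numeric sketch of this idea: an overdetermined system (more equations than unknowns, with arbitrary example numbers) solved in the least-squares sense with numpy:

import numpy as np

# Overdetermined system: 5 equations, 2 unknown parameters
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0],
              [1.0, 4.0],
              [1.0, 5.0]])
b = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# The least-squares solution minimizes || A w - b ||^2, i.e. the sum of squared residuals
w, residual_ss, rank, _ = np.linalg.lstsq(A, b, rcond=None)
print("parameters:", w)
print("sum of squared residuals:", residual_ss)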
Maximum likelihood estimation, or MLE,

 is a method for estimating the parameters of a statistical model and for fitting a
statistical model to data.
 Suppose you want to estimate the height distribution of every basketball player in a
specific location.
 Normally, you would encounter problems such as cost and time constraints, and you
could not afford to measure all of the players’ heights.
 In such a situation, maximum likelihood estimation is very handy: you measure only a
sample of players.
 Using maximum likelihood estimation on the sample, you can estimate the mean and
variance of the heights of your subjects.
 The MLE treats the mean and variance as the parameters of an assumed model (for
example, a normal distribution) and chooses the specific parameter values that make the
observed data most likely.
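For the height example, if we assume the heights follow a normal distribution, the maximum likelihood estimates of the mean and variance have a simple closed form (the sample mean, and the sample variance with divisor n). A minimal Python sketch with synthetic data:

import numpy as np

# Synthetic sample of player heights in centimetres (illustrative numbers only)
rng = np.random.default_rng(3)
heights = rng.normal(loc=190.0, scale=8.0, size=100)

n = len(heights)
mu_mle = heights.sum() / n                        # MLE of the mean
var_mle = np.sum((heights - mu_mle) ** 2) / n     # MLE of the variance (divisor n, not n-1)
print("estimated mean:", mu_mle, "estimated variance:", var_mle)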

Multinomial Logistic Regression

 We have n independent observations with p explanatory variables.


 The qualitative response variable has k categories.
 To construct the logits in the multinomial case one of the categories is considered the
base level and all the logits are constructed relative to it. Any category can be taken as the
base level.
 We will take category k as the base level in our description of the method.
 Since there is no ordering, it is apparent that any category may be labeled k. Let π_j
denote the multinomial probability of an observation falling in the j-th category.
 We want to find the relationship between this probability and the p explanatory variables,
X_1, X_2, ..., X_p. The multiple logistic regression model then is

ln( π_j / π_k ) = β_0j + β_1j*X_1 + ... + β_pj*X_p,  j = 1, 2, ..., (k − 1)

 Since all the π’s add to unity, this reduces to

ln( π_j / (1 − π_1 − π_2 − ... − π_{k−1}) ) = β_0j + β_1j*X_1 + ... + β_pj*X_p

 for j = 1, 2, ..., (k − 1). The model parameters are estimated by the method of maximum
likelihood. Statistical software is available to do this fitting.
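A minimal sketch of fitting such a model, assuming the statsmodels package is available; its MNLogit class uses the lowest-numbered response category as the base level, and the data below are synthetic:

import numpy as np
import statsmodels.api as sm

# Synthetic example: p = 2 explanatory variables, k = 3 response categories (0, 1, 2)
rng = np.random.default_rng(4)
n = 500
X = rng.normal(size=(n, 2))
linpred = np.column_stack([np.zeros(n),                    # base category
                           0.8 * X[:, 0] - 0.3 * X[:, 1],
                           -0.5 * X[:, 0] + 1.0 * X[:, 1]])
probs = np.exp(linpred) / np.exp(linpred).sum(axis=1, keepdims=True)
y = np.array([rng.choice(3, p=p) for p in probs])

model = sm.MNLogit(y, sm.add_constant(X))   # category 0 plays the role of the base level
result = model.fit()                        # maximum likelihood, iterative
print(result.params)                        # one column of coefficients per non-base category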
