Unit III Regression

The document discusses the Gauss-Markov theorem and its assumptions necessary for Ordinary Least Squares (OLS) to provide the best linear unbiased estimates in regression analysis. It explains the concept of regression, including linear and logistic regression, and highlights the importance of checking assumptions for accurate coefficient estimation. Additionally, it covers methods like Maximum Likelihood Estimation (MLE) and Multinomial Logistic Regression for modeling relationships among variables.


UNIT-III

• Regression

BLUE Property Assumptions
• The Gauss-Markov theorem is based on five assumptions, or conditions, that must be met for the Ordinary Least Squares (OLS) estimate to be the best linear unbiased estimate (BLUE):
• Linearity: The parameters being estimated must be linear.
• Random sampling: The data must be randomly sampled from the population.
• No perfect multicollinearity: The independent variables should not be perfectly correlated with each other.
• Exogeneity: The regressors should not be correlated with the error term.
• Homoscedasticity: The error variance should be constant regardless of the values of the regressors.
Gauss–Markov Theorem
• In statistics, the Gauss–Markov theorem states that the ordinary least squares (OLS) estimator has the lowest sampling variance within the class of linear unbiased estimators, provided the errors in the linear regression model are uncorrelated, have equal variances, and have an expectation value of zero.
• The errors do not need to be normal, nor do they need to be independent and identically distributed (only uncorrelated with mean zero and homoscedastic with finite variance).
• The requirement that the estimator be unbiased cannot be dropped, since biased estimators with lower variance exist; see, for example, the James–Stein estimator (which also drops linearity), ridge regression, or simply any degenerate estimator.
Purpose of the Assumptions
• The Gauss-Markov assumptions guarantee the validity of ordinary least squares for estimating regression coefficients.
• Checking how well our data matches these assumptions is an important part of estimating regression coefficients.
• When you know where these conditions are violated, you may be able to plan ways to change your experimental setup so that your situation fits the ideal Gauss-Markov situation more closely.
• In practice, the Gauss-Markov assumptions are rarely all met perfectly, but they are still useful as a benchmark because they show us what 'ideal' conditions would be.
• They also allow us to pinpoint problem areas that might cause our estimated regression coefficients to be inaccurate or even unusable.
The Gauss-Markov Assumptions in Algebra
Suppose we have, in matrix notation, the linear relationship shown below.
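In the standard matrix notation, the model and the Gauss-Markov conditions are usually written as:

$$ y = X\beta + \varepsilon, \qquad \mathbb{E}[\varepsilon] = 0, \qquad \operatorname{Var}(\varepsilon) = \sigma^{2} I_n $$

where $y$ is the $n \times 1$ vector of responses, $X$ is the $n \times (p+1)$ matrix of regressors (including a column of ones for the intercept), $\beta$ is the vector of unknown coefficients, and $\varepsilon$ is the vector of random errors.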
What is Regression?
• A statistical procedure used to find relationships among a set of variables.
• In regression analysis, there is a dependent variable, which is the one you are trying to explain, and one or more independent variables that are related to it.
• You can express the relationship as a linear equation, such as:
  y = a + bx
  • y is the dependent variable
  • x is the independent variable
  • a is a constant
  • b is the slope of the line
• For every increase of 1 in x, y changes by an amount equal to b.
• Some relationships are perfectly linear and fit this equation exactly.
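A quick numerical illustration (the values a = 2 and b = 0.5 are chosen only for this example): with x = 10 we get y = 2 + 0.5 × 10 = 7, and with x = 11 we get y = 7.5, an increase of exactly b = 0.5.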
Regression
• It is a predictive modeling technique where the target variable to be estimated is continuous.
Examples of applications of regression
• Predicting a stock market index using other economic indicators.
• Forecasting the amount of precipitation in a region based on characteristics of the jet stream.
• Projecting the total sales of a company based on the amount spent on advertising.
• Estimating the age of a fossil according to the amount of carbon-14 left in the organic material.
Least Square Estimation or Least Square Method
Regression (Definition)
• Regression is the task of learning a target function f that maps each attribute set x into a
continuous-valued output y.
The goal of regression:
To find a target function that can fit the input data with minimum error.
• The error function for a regression task can be expressed in terms of the sum of absolute or squared errors.
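Written out, the two error measures mentioned above are (with $f$ the learned target function and $(x_i, y_i)$, $i = 1, \ldots, n$, the training records):

$$ E_{\text{abs}} = \sum_{i=1}^{n} \lvert y_i - f(x_i) \rvert, \qquad E_{\text{sq}} = \sum_{i=1}^{n} \bigl( y_i - f(x_i) \bigr)^2 $$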
Simple Linear Regression (example)
• Consider the physiological data shown in Figure D.1.
• The data corresponds to measurements of heat flux and skin temperature of a person during sleep.
• Suppose we are interested in predicting the skin temperature of a person based on the heat flux measurements
generated by a heat sensor.
• The two-dimensional scatter plot shows that there is a strong linear relationship between the two variables.
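A minimal sketch of how such a fit could be carried out with ordinary least squares in Python; the heat-flux and skin-temperature values below are made-up placeholders, not the data from Figure D.1:

```python
import numpy as np

# Placeholder measurements (illustrative values only, not the Figure D.1 data).
heat_flux = np.array([10.8, 9.7, 8.5, 7.9, 6.4, 5.2])       # predictor x
skin_temp = np.array([31.0, 31.2, 31.5, 31.8, 32.1, 32.4])  # response y

# Least-squares fit of y = a + b*x: design matrix with an intercept column.
X = np.column_stack([np.ones_like(heat_flux), heat_flux])
(a, b), *_ = np.linalg.lstsq(X, skin_temp, rcond=None)

print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
print(f"predicted skin temperature at heat flux 7.0: {a + b * 7.0:.3f}")
```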
Logistic Regression
• Logistic regression (also called Logit regression or the Logit model) is a regression model where the dependent variable (DV) is categorical.
• It was developed by statistician David Cox in 1958.
• So far, the response variable Y has been regarded as a continuous quantitative variable.
• There are situations, however, where the response variable is qualitative.
• The predictor variables have been both quantitative and qualitative.
• Indicator variables fall into the second category.
Example:
• Consider a procedure in which individuals are selected on the basis of their scores in a battery of tests.
• After five years the candidates are classified as "good" or "poor.”
• We are interested in examining the ability of the tests to predict the job performance of the candidates.
• Here the response variable, performance, is dichotomous.
• We can code "good" as 1 and "poor" as 0, for example.
• The predictor variables are the scores in the tests.
• In a study to determine the risk factors for cancer, health records of several people were studied.
• Data were collected on several variables, such as age, gender, smoking, diet, and the family's medical
history.
• The response variable was whether the person had cancer (Y = 1) or did not have cancer (Y = 0).
• The relationship between the probability π and X can often be represented by a logistic response function.
• It resembles an S-shaped curve.
• The probability π initially increases slowly with increase in X; then the increase accelerates, and finally it stabilizes, but it does not go beyond 1.
• Intuitively this makes sense.
• Consider the probability of a questionnaire being returned as a function of cash reward, or the probability of
passing a test as a function of the time put in studying for it.
• The shape of the S-curve can be reproduced if we model the probabilities as shown below.
• A sigmoid function is a bounded, differentiable real function that is defined for all real input values and has a positive derivative at each point.
• It has an "S" shape. It is defined by the function below.
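In their standard form, the logistic response function for a single predictor and the underlying sigmoid are usually written as:

$$ \pi(x) = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}} = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x)}}, \qquad \sigma(z) = \frac{1}{1 + e^{-z}} $$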

• The process of linearization of the logistic regression function is called the Logit Transformation.
 Modeling the response probabilities by the logistic distribution and estimating the parameters of the model constitutes fitting a logistic regression.
 In logistic regression the fitting is carried out by working with the logits, as shown below.
 The Logit transformation produces a model that is linear in the parameters.
 The method of estimation used is the maximum likelihood method.
 The maximum likelihood estimates are obtained numerically, using an iterative procedure.
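In its standard form, the logit transformation and the resulting model that is linear in the parameters are usually written as:

$$ \operatorname{logit}(\pi) = \ln\!\left( \frac{\pi}{1 - \pi} \right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p $$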
OLS: Ordinary Least Squares
 Ordinary least squares, or OLS, is also called linear least squares.
 This is a method for approximately determining the unknown parameters in a linear regression model.
 According to statistics textbooks and other online sources, the ordinary least squares estimate is obtained by minimizing the total of the squared vertical distances between the observed responses in the dataset and the responses predicted by the linear approximation.
 The resulting estimator can be expressed by a simple formula, especially in the case of a single regressor on the right-hand side of the linear regression model.
 For example, you may have a system consisting of several equations with unknown parameters.
 You may use the ordinary least squares method because it is the most standard approach to finding an approximate solution to such overdetermined systems.
 In other words, it is your overall solution for minimizing the sum of the squares of the errors in your equations.
 Data fitting is its most typical application. Online sources state that the fit that is best in the ordinary least squares sense minimizes the sum of squared residuals.
 A "residual" is the difference between an observed value and the fitted value provided by a model.
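In matrix form, the OLS estimate minimizes the sum of squared residuals and, when $X^{\mathsf T}X$ is invertible, has the closed-form solution:

$$ \hat{\beta} = \arg\min_{\beta} \sum_{i=1}^{n} \bigl( y_i - x_i^{\mathsf T}\beta \bigr)^2 = (X^{\mathsf T}X)^{-1} X^{\mathsf T} y $$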
Maximum Likelihood Estimation, or MLE
 MLE is a method for estimating the parameters of a statistical model and for fitting a statistical model to data.
 Suppose you want to find the height of every basketball player in a specific location; you can use maximum likelihood estimation.
 Normally, you would encounter problems such as cost and time constraints.
 If you cannot afford to measure all of the basketball players' heights, maximum likelihood estimation is very handy.
 Using maximum likelihood estimation, you can estimate the mean and variance of the heights of your subjects from a sample.
• MLE treats the mean and variance as the parameters whose specific values are to be determined for the given model.
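For example, if the heights are assumed to be normally distributed, maximizing the likelihood over a sample $x_1, \ldots, x_n$ gives the familiar estimates:

$$ \hat{\mu} = \frac{1}{n} \sum_{i=1}^{n} x_i, \qquad \hat{\sigma}^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \hat{\mu})^2 $$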
Multinomial Logistic Regression
 We have n independent observations with p explanatory variables.
 The qualitative response variable has k categories.
 To construct the logits in the multinomial case, one of the categories is considered the base level and all the logits are constructed relative to it. Any category can be taken as the base level.
 We will take category k as the base level in our description of the method.
 Since there is no ordering, it is apparent that any category may be labeled k. Let πj denote the multinomial probability of an observation falling in the jth category.
 We want to find the relationship between this probability and the p explanatory variables X1, X2, ..., Xp. The multiple logistic regression model, and the form it takes once we use the fact that all the π's add to unity, are sketched below for j = 1, 2, ..., (k − 1).
• The model parameters are estimated by the method of maximum likelihood. Statistical software is available to do this fitting.
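With category k as the base level, the model described above is usually written as:

$$ \ln\!\left( \frac{\pi_j(x)}{\pi_k(x)} \right) = \beta_{0j} + \beta_{1j} x_1 + \cdots + \beta_{pj} x_p, \qquad j = 1, 2, \ldots, k-1, $$

and, since all the π's add to unity, this reduces to

$$ \pi_j(x) = \frac{ e^{\beta_{0j} + \beta_{1j} x_1 + \cdots + \beta_{pj} x_p} }{ 1 + \sum_{l=1}^{k-1} e^{\beta_{0l} + \beta_{1l} x_1 + \cdots + \beta_{pl} x_p} }, \qquad \pi_k(x) = \frac{1}{ 1 + \sum_{l=1}^{k-1} e^{\beta_{0l} + \beta_{1l} x_1 + \cdots + \beta_{pl} x_p} }. $$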
