History of Regression: Dr. Deepak Mehta, Associate Professor, AIT CSE
Deepak Mehta
Associate Professor
AIT CSE
History of Regression
• The term regression was introduced by Francis Galton in 1886 in his research paper “Family Likeness in Stature”.
• He observed that tall parents tend to have tall sons and short parents tend to have short sons.
• He came up with the view that the heights of the sons “regress” (revert back) towards the average height of the population.
History of Regression
Galton’s law of universal regression was confirmed by his friend Karl Pearson, who collected data
from more than 1000 families.
He found that the average height of the sons of tall fathers is less than their fathers’ height,
and the average height of the sons of short fathers is greater than their fathers’ height.
Research paper of Karl Pearson:
“On the Laws of Inheritance in Man”, Biometrika, 1903
Modern Regression
Regression investigates the dependence of one variable, conventionally called the dependent variable,
on one or more other variables, called independent variables, and provides an equation for estimating
or predicting the average value of the dependent variable from known values of the independent
variables.
Modern Regression
The dependent variable is the variable whose variation we try to explain, while an independent
variable is one used to explain the variation in the dependent variable.
When the dependence of a variable on a single independent variable is studied, it is called simple
linear regression or two-variable regression.
When the dependence on two or more independent variables is studied, it is called multiple regression.
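A minimal sketch of simple (two-variable) linear regression, assuming ordinary least squares as the fitting method (the slides do not name one). The helper `fit_simple` and the father/son heights are hypothetical, made up only to show the arithmetic: slope b = Sxy / Sxx and intercept a = ȳ − b·x̄.

```python
# Simple (two-variable) linear regression fitted by ordinary least squares.
# fit_simple and the sample data below are hypothetical illustrations.

def fit_simple(xs, ys):
    """Return (a, b) for the line y = a + b*x that minimises the squared error."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sxy: sum of cross-deviations; Sxx: sum of squared x-deviations
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("x values must not all be equal")
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept: the fitted line passes through (x_bar, y_bar)
    return a, b

# Hypothetical father and son heights in inches
fathers = [64, 66, 68, 70, 72]
sons = [66, 67, 68, 69, 70]
a, b = fit_simple(fathers, sons)
print(f"son_height = {a:.1f} + {b:.2f} * father_height")  # son_height = 34.0 + 0.50 * father_height
```

In this made-up data the slope is below 1, so sons of tall fathers are predicted to be shorter than their fathers and sons of short fathers taller, which is the pattern Galton called regression.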
Deterministic vs Probabilistic
If there is an exact relationship between variables, such that knowing the value of one variable
gives a unique value of the other, the relationship is called a deterministic (or mathematical)
relationship.
e.g. consider y = a + bx. Substituting a value of x gives a unique value of y.
Examples of such relationships are F = 32 + (9/5)C, the relationship between Fahrenheit and Celsius,
and the area of a circle, A = pi*r*r.
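A small sketch of the two deterministic examples above, written as plain Python functions (the names `fahrenheit` and `circle_area` are hypothetical). There is no error term, so every call with the same input returns exactly the same output.

```python
import math

# Deterministic (mathematical) relationships: one input gives exactly one output,
# with no random error term.

def fahrenheit(celsius):
    """F = 32 + (9/5) * C"""
    return 32 + (9 / 5) * celsius

def circle_area(r):
    """Area of a circle: pi * r * r"""
    return math.pi * r * r

print(fahrenheit(100))                     # 212.0
print(fahrenheit(100) == fahrenheit(100))  # True: repeating the call cannot change the result
print(circle_area(2))
```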
Deterministic vs Probabilistic
If the relationship between variables is not exact, we cannot obtain the precise value of one variable
by substituting the value of the other; such a relationship is called a probabilistic, stochastic, or
statistical relationship.
e.g. consider y = a + bx + e, where e is an unknown random error. Substituting a value of x does not
give a unique value of y unless e is known.
For example, consider the relationship between a person’s height and weight: people of the same height
do not all have the same weight.
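To make the probabilistic model y = a + bx + e concrete, here is a minimal simulation under stated assumptions: the values a = −100, b = 1 and a normally distributed error with standard deviation 5 kg are hypothetical choices, not estimates from real data. Five people of the same height (170 cm) receive five different simulated weights, while the average weight stays near a + b·170 = 70 kg.

```python
import random

# Probabilistic (stochastic) relationship: y = a + b*x + e, where e is an
# unobserved random error. A, B and SIGMA are hypothetical values for illustration.
A, B, SIGMA = -100.0, 1.0, 5.0

def simulate_weight(height_cm, rng):
    """Return one simulated weight (kg) for the given height (cm)."""
    e = rng.gauss(0.0, SIGMA)  # random error term, assumed normal with mean 0
    return A + B * height_cm + e

rng = random.Random(42)  # fixed seed so the run is reproducible
weights = [simulate_weight(170, rng) for _ in range(5)]
print([round(w, 1) for w in weights])  # same height, different weights
```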
In a deterministic model, all ordered pairs (x, y) of the two variables fall on the line of
relationship.
In a probabilistic model, by contrast, the ordered pairs (x, y) do not all fall on the line of
relationship; they scatter around it.
Dependent Variable
The dependent variable is a random variable, also called:
◦ Explained variable
◦ Predictand
◦ Regressand
◦ Response
◦ Endogenous
◦ Outcome
◦ Controlled variable
Independent Variable
The independent variable is treated as a fixed (non-random) variable, also called:
◦ Explanatory variable
◦ Predictor
◦ Regressor
◦ Stimulus
◦ Exogenous
◦ Covariate
◦ Control variable
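Assumptions of Linear Regression
Linear relationship between the dependent and independent variables
Little or no multicollinearity among the independent variables
No heteroscedasticity (the errors have constant variance)
No autocorrelation between the errors
Normally distributed errors
All observations are independent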
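Two of these assumptions can be checked roughly with numbers. The sketch below is an illustration under assumptions, not a formal test: it uses the variance inflation factor (VIF = 1 / (1 − r²) when there are exactly two predictors) as a multicollinearity heuristic and the Durbin–Watson statistic on residuals for autocorrelation. The helper names and sample numbers are hypothetical; the functions assume non-constant inputs.

```python
import math

# Rough numeric checks for multicollinearity and autocorrelation of errors.
# These are common heuristics; helper names and data are hypothetical.

def pearson_r(u, v):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    denom = math.sqrt(suu * svv)
    if denom == 0:
        raise ValueError("inputs must not be constant")
    return cov / denom

def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    if 1.0 - r * r < 1e-12:
        return math.inf  # perfectly collinear predictors
    return 1.0 / (1.0 - r * r)

def durbin_watson(residuals):
    """Durbin-Watson statistic: near 2 suggests no first-order autocorrelation."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den

x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 5]
print(round(vif_two_predictors(x1, x2), 2))  # larger values mean stronger collinearity
resid = [0.5, -0.4, 0.3, -0.6, 0.2]
print(round(durbin_watson(resid), 2))        # alternating signs push the value above 2
```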
Regression Analysis
• It shows whether there is a significant relationship between the label (dependent variable) and the
features (independent variables)
• It indicates the strength of the impact of multiple independent variables on a dependent variable
• A regression model is used to predict a continuous value
• Linear regression (LR) establishes a relationship between the dependent variable (y) and one or more
independent variables (X) using a best-fit straight line, also known as the regression line (a plane or
hyperplane when there are several independent variables)
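A minimal sketch of fitting the best-fit line (or plane) for one or more independent variables, assuming ordinary least squares and that NumPy is available; it uses `numpy.linalg.lstsq`. The helpers `fit_linear` and `predict` and the synthetic data generated from y = 1 + 2·x1 − 3·x2 plus small noise are hypothetical.

```python
import numpy as np

# Linear regression y = b0 + b1*x1 + ... + bk*xk fitted by ordinary least squares.
# fit_linear, predict and the synthetic data are hypothetical illustrations.

def fit_linear(X, y):
    """Return coefficients [b0, b1, ..., bk] of the least-squares fit."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)                          # single predictor as one column
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])    # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict(coef, X):
    """Predict continuous values of y for new rows of X."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X.reshape(-1, 1)
    return coef[0] + X @ coef[1:]

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 2))
y = 1 + 2 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 0.1, size=50)
coef = fit_linear(X, y)
print(np.round(coef, 2))             # close to [1, 2, -3]
print(predict(coef, [[4.0, 1.0]]))   # about 6 for x1 = 4, x2 = 1
```

With one predictor the same code fits a straight line; with two it fits a plane, which is why the coefficient vector grows with the number of independent variables.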