Regression Modelling
[Figure: scatter plot of data points with fitted regression line]
The graph above shows the linear relationship between the output (y) variable
and the predictor (X) variable. The blue line is referred to as the best-fit straight
line: based on the given data points, we attempt to plot the line that fits the points
best.
Linear Regression
To calculate the best-fit line, linear regression uses the traditional slope-intercept
form given below:
Yi = β0 + β1Xi
where:
Yi = dependent variable for a given value of the independent variable.
β0 = constant/intercept (the predicted value of y when x is 0).
β1 = slope or regression coefficient (how much we expect y to change as x increases).
Xi = independent variable (the variable we expect to influence the dependent variable
y).
This algorithm explains the linear relationship between the dependent(output) variable
y and the independent(predictor) variable X using a straight line.
But how does linear regression find the best-fit line?
The goal of the linear regression algorithm is to find the values of β0 and β1 that give
the best-fit line. The best-fit line is the line with the least error, meaning the error
between predicted values and actual values should be minimal.
You can use simple linear regression when you want to know:
1. How strong the relationship between two variables is.
2. The value of the dependent variable at a certain value of the independent variable.
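As a minimal sketch of how the best-fit values are obtained, the least-squares closed-form estimates for β0 and β1 can be computed directly. The data points below are hypothetical, chosen only for illustration:

```python
# Least-squares fit for simple linear regression, y = b0 + b1*x.
# b1 = covariance(x, y) / variance(x); b0 = mean(y) - b1 * mean(x).
def fit_simple_linear(x, y):
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope: expected change in y for a unit increase in x.
    b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) \
         / sum((xi - mean_x) ** 2 for xi in x)
    # Intercept: predicted y when x = 0.
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = fit_simple_linear(x, y)
print(b0, b1)  # 2.2 0.6, so the best-fit line is y = 2.2 + 0.6x
```

These estimates minimize the sum of squared errors between the predicted and actual y values, which is exactly the "least error" criterion described above.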
Assumptions of Linear Regression
1. Homogeneity of variance: The size of the error in our predictions does not change
significantly across the values of the independent variable.
2. Independence of observations: The observations in the dataset were collected using
statistically valid sampling methods, and there are no hidden relationships among
variables.
3. Normality: The data follow a normal distribution.
The goodness of fit is measured by the coefficient of determination:
R² = Σ(ŷᵢ − ȳ)² / Σ(yᵢ − ȳ)²
where ŷᵢ are the predicted values, yᵢ the observed values, and ȳ the mean of the
observed values.
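The R² formula above can be sketched directly in code. The observed and fitted values below are hypothetical (the fitted values come from a line ŷ = 2.2 + 0.6x, assumed here only for illustration):

```python
# R² = (explained variation) / (total variation), per the formula above.
def r_squared(y, y_pred):
    mean_y = sum(y) / len(y)
    ss_reg = sum((yp - mean_y) ** 2 for yp in y_pred)  # Σ(ŷᵢ − ȳ)²
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)       # Σ(yᵢ − ȳ)²
    return ss_reg / ss_tot

# Hypothetical observed values and the corresponding fitted values.
y_obs = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]  # from the assumed line ŷ = 2.2 + 0.6x
r2 = r_squared(y_obs, y_hat)
print(r2)  # 0.6: 60% of the variation in y is explained by the line
```

An R² close to 1 means the line explains most of the variation in y; close to 0 means it explains almost none.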
Linear Regression Solved Numerical
Multiple Linear Regression
Multiple linear regression (MLR), also known simply as multiple
regression, is a statistical technique that uses several explanatory variables
to predict the outcome of a response variable. The goal of multiple linear
regression is to model the linear relationship between the explanatory
(independent) variables and response (dependent) variables.
yi = β0 + β1xi1 + β2xi2 + ... + βpxip + ϵi
where, for i = 1, …, n observations:
yi = dependent variable
xi1, …, xip = explanatory (independent) variables
β0 = y-intercept (constant term)
β1, …, βp = slope coefficients for each explanatory variable
ϵi = the model's error term (also known as the residual)
As the number of independent variables increases to 2, the graph becomes
3D. The added third dimension represents the second independent variable.
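The MLR model above can be fitted by least squares in a few lines. This is a minimal sketch with hypothetical data generated from y = 1 + 2x1 + 3x2, so the recovered coefficients should match exactly:

```python
import numpy as np

# Hypothetical data: two explanatory variables x1, x2 per observation.
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
# Responses generated (for illustration) from y = 1 + 2*x1 + 3*x2.
y = np.array([9.0, 8.0, 19.0, 18.0, 26.0])

# Prepend a column of ones so the first coefficient plays the role of β0.
X1 = np.column_stack([np.ones(len(X)), X])

# Solve the least-squares problem min ||X1 @ beta - y||².
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
print(beta)  # [1. 2. 3.] → β0 = 1, β1 = 2, β2 = 3
```

With more observations and noisy responses, the same call returns the coefficients that minimize the sum of squared residuals rather than an exact recovery.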
Multiple Linear Regression Numerical
Logistic Regression
Logistic Regression is a supervised machine learning algorithm that can be
used to model the probability of a certain class or event. It is used when the data are
linearly separable and the outcome is binary in nature.
This means logistic regression is usually used for binary classification problems.
Logistic Regression: P(y = 1 | x) = 1 / (1 + e^−(β0 + β1x))
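The logistic (sigmoid) function maps the linear score β0 + β1x to a probability between 0 and 1. A minimal sketch, with hypothetical coefficient values chosen only for illustration:

```python
import math

def sigmoid(z):
    # Maps any real-valued score to a probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(x, b0, b1):
    # P(y = 1 | x) under the logistic model with coefficients b0, b1.
    return sigmoid(b0 + b1 * x)

# Hypothetical coefficients: b0 = 0, b1 = 1.
p = predict_proba(0.0, 0.0, 1.0)
print(p)  # 0.5: a score of exactly 0 gives a 50/50 prediction
```

Classifying the outcome as 1 when the probability exceeds 0.5 recovers the usual binary decision rule.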