Machine Learning with Python
Linear Regression in Machine Learning
Linear regression is one of the simplest and most popular Machine Learning
algorithms. It is a statistical method used for predictive analysis.
The linear regression algorithm models a linear relationship between a dependent
variable (y) and one or more independent variables (x), hence the name linear regression.
The linear regression model provides a sloped straight line representing the
relationship between the variables.
y = a0 + a1x + ε

Here,
y  = dependent (target) variable
x  = independent (predictor) variable
a0 = intercept of the line
a1 = linear regression coefficient (slope of the line)
ε  = random error
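As a quick sketch, this model can be fitted with scikit-learn's LinearRegression; the sample data and values below are illustrative, not from the text:

# Minimal sketch: fitting y = a0 + a1*x with scikit-learn
# (the data points here are made-up illustrative values).
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])  # independent variable
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])            # dependent variable

model = LinearRegression().fit(x, y)
print("a0 (intercept):", model.intercept_)
print("a1 (slope):", model.coef_[0])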
When working with linear regression, our main goal is to find the best fit line,
which means the error between the predicted values and the actual values should be
minimized. The best fit line will have the least error.
Different values for the weights or coefficients of the line (a0, a1) give
different regression lines, so we need to calculate the best values for a0 and
a1 to find the best fit line. To calculate this, we use the cost function.
Cost function:
Different values for the weights or coefficients of the line (a0, a1) give different
regression lines, and the cost function is used to estimate the values of the
coefficients for the best fit line.
The cost function optimizes the regression coefficients or weights. It measures how
well a linear regression model is performing.
We can use the cost function to find the accuracy of the mapping function,
which maps the input variable to the output variable. This mapping function is
also known as the Hypothesis function.
For linear regression, we use the Mean Squared Error (MSE) cost function, which is the average
of the squared errors between the predicted values and the actual values. For the above linear
equation, MSE can be calculated as:

MSE = (1/N) Σi=1..N (yi − (a1xi + a0))²

Where,
N = total number of observations
yi = actual value
(a1xi + a0) = predicted value
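Translating the formula directly into NumPy (the data and the candidate a0, a1 values are illustrative):

# MSE = (1/N) * sum((y_i - (a1*x_i + a0))**2), computed with NumPy.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])
a0, a1 = 0.2, 1.96                  # candidate intercept and slope

y_pred = a1 * x + a0                # predicted values
mse = np.mean((y - y_pred) ** 2)    # average of squared errors
print("MSE:", mse)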
Cost function-
Residuals: The distance between the actual values and the predicted values is
called the residual.
If the observed points are far from the regression line, the residuals will be
high, and so the cost function will be high.
If the scatter points are close to the regression line, the residuals will be
small, and hence the cost function will be small.
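A tiny numerical illustration (the arrays are made-up values): the residuals are just the element-wise gaps, and squaring and averaging them gives the MSE cost.

# Residuals are the gaps between actual and predicted values;
# larger residuals mean a larger MSE cost.
import numpy as np

y_actual = np.array([2.1, 4.3, 6.2, 8.1, 9.9])
y_pred = np.array([2.16, 4.12, 6.08, 8.04, 10.0])

residuals = y_actual - y_pred
print("residuals:", residuals)
print("cost (MSE):", np.mean(residuals ** 2))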
Model Performance:
The goodness of fit determines how well the regression line fits the set of
observations. The process of finding the best model out of various candidate
models is called optimization.
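The text does not name a particular optimizer; gradient descent is one common choice for minimizing the MSE cost. A minimal sketch, with an illustrative learning rate and iteration count:

# Gradient descent on the MSE cost to find a0 and a1.
# Learning rate and iteration count are illustrative choices.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

a0, a1 = 0.0, 0.0          # start from an arbitrary line
lr = 0.01                  # learning rate
for _ in range(5000):
    error = (a1 * x + a0) - y
    # Partial derivatives of MSE with respect to a0 and a1
    grad_a0 = 2 * np.mean(error)
    grad_a1 = 2 * np.mean(error * x)
    a0 -= lr * grad_a0
    a1 -= lr * grad_a1

print("a0:", a0, "a1:", a1)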
Assumptions of Linear Regression
These are some formal checks to perform while building a linear regression model,
which ensure we get the best possible result from the given dataset.
Linear relationship between the features and target:
Linear regression assumes a linear relationship between the dependent and
independent variables.
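One simple way to eyeball this assumption is a scatter plot of each feature against the target; a minimal matplotlib sketch with illustrative data:

# Visual linearity check: look for a roughly straight trend.
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

plt.scatter(x, y)
plt.xlabel("feature (x)")
plt.ylabel("target (y)")
plt.title("Linearity check")
plt.show()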
Small or no multicollinearity between the features:
Multicollinearity means high correlation between the independent variables. Due to
multicollinearity, it may be difficult to find the true relationship between the predictors
and the target variable; in other words, it is difficult to determine which predictor
variable is affecting the target variable and which is not. So, the model assumes
either little or no multicollinearity between the features or independent variables.
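A common way to check this (not mentioned in the text, so treat it as one option) is the variance inflation factor (VIF) from statsmodels; values well above roughly 5-10 are usually read as a warning sign. The DataFrame below is illustrative:

# VIF check for multicollinearity; x2 is nearly 2 * x1,
# so it should produce very high VIF values.
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

X = pd.DataFrame({
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 4.1, 5.9, 8.2, 9.8],
})
X_const = add_constant(X)
for i, col in enumerate(X_const.columns):
    if col != "const":
        print(col, variance_inflation_factor(X_const.values, i))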
Homoscedasticity Assumption:
Homoscedasticity is the situation in which the variance of the error term is the same for
all values of the independent variables. With homoscedasticity, there should be no clear
pattern in the distribution of data in the scatter plot.
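A standard visual check is to plot the residuals against the predicted values and look for a patternless horizontal band; a minimal sketch with made-up numbers:

# Residuals-vs-fitted plot: a flat, patternless band suggests
# constant error variance (homoscedasticity).
import numpy as np
import matplotlib.pyplot as plt

y_pred = np.array([2.16, 4.12, 6.08, 8.04, 10.0])
residuals = np.array([-0.06, 0.18, 0.12, 0.06, -0.1])

plt.scatter(y_pred, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("predicted values")
plt.ylabel("residuals")
plt.title("Residuals vs. fitted")
plt.show()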
Normal distribution of error terms:
Linear regression assumes that the error terms follow a normal distribution. If the
error terms are not normally distributed, confidence intervals will become either too wide
or too narrow, which may cause difficulties in estimating the coefficients.
This can be checked using a q-q plot. If the plot shows a straight line without major
deviations, the errors are normally distributed.
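The q-q plot mentioned above can be drawn with scipy; a minimal sketch using illustrative residuals:

# Q-Q plot of residuals against the normal distribution:
# points close to the straight line suggest normal errors.
import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt

residuals = np.array([-0.06, 0.18, 0.12, 0.06, -0.1])
stats.probplot(residuals, dist="norm", plot=plt)
plt.title("Q-Q plot of residuals")
plt.show()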
No autocorrelation:
The linear regression model assumes no autocorrelation in the error terms. If there is any
correlation in the error terms, it will drastically reduce the accuracy of the model.
Autocorrelation usually occurs when there is a dependency between residual errors.
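One widely used numerical check for this (an assumption here, since the text names none) is the Durbin-Watson statistic from statsmodels; values near 2 suggest no autocorrelation, while values near 0 or 4 suggest positive or negative autocorrelation:

# Durbin-Watson test on the residuals (illustrative values).
import numpy as np
from statsmodels.stats.stattools import durbin_watson

residuals = np.array([-0.06, 0.18, 0.12, 0.06, -0.1])
print("Durbin-Watson:", durbin_watson(residuals))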
Logistic Regression in Machine Learning
On the basis of the categories, Logistic Regression can be classified into three
types:
Binomial: In binomial logistic regression, there can be only two possible types
of the dependent variable, such as 0 or 1, Pass or Fail, etc.
Multinomial: In multinomial logistic regression, there can be 3 or more
possible unordered types of the dependent variable, such as "cat", "dog", or
"sheep".
Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered
types of the dependent variable, such as "Low", "Medium", or "High".
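As a minimal sketch, the binomial and multinomial cases can both be fitted with scikit-learn's LogisticRegression (the tiny datasets below are illustrative; ordinal logistic regression is not covered by scikit-learn directly and would need a library such as statsmodels):

# Binomial vs. multinomial logistic regression with scikit-learn.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Binomial: two classes (0 = Fail, 1 = Pass)
X_bin = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y_bin = np.array([0, 0, 0, 1, 1, 1])
clf_bin = LogisticRegression().fit(X_bin, y_bin)
print(clf_bin.predict([[3.5]]))

# Multinomial: three unordered classes ("cat", "dog", "sheep")
X_multi = np.array([[1.0], [2.0], [5.0], [6.0], [9.0], [10.0]])
y_multi = np.array(["cat", "cat", "dog", "dog", "sheep", "sheep"])
clf_multi = LogisticRegression().fit(X_multi, y_multi)
print(clf_multi.predict([[5.5]]))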
Thank You!!