
Lecture-6
Machine Learning with Python
Linear Regression in Machine Learning
Linear regression is one of the simplest and most popular machine learning algorithms. It is a statistical method used for predictive analysis.
A linear regression algorithm models a linear relationship between a dependent variable (y) and one or more independent variables (x), hence the name linear regression.
The linear regression model provides a sloped straight line representing the relationship between the variables.
Linear Regression in Machine Learning

y = a0 + a1x + ε

Here,
y = Dependent variable (target variable)
x = Independent variable (predictor variable)
a0 = Intercept of the line (gives an additional degree of freedom)
a1 = Linear regression coefficient (scale factor applied to each input value)
ε = Random error
The values of the x and y variables form the training dataset used to fit the Linear Regression model.
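As a minimal sketch of how this looks in Python (the toy data and variable names are illustrative, not from the lecture), scikit-learn's LinearRegression estimates a0 and a1 from training data:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    # Toy training data (illustrative): x is the predictor, y the target
    x = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
    y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

    model = LinearRegression()
    model.fit(x, y)

    print("a0 (intercept):", model.intercept_)
    print("a1 (coefficient):", model.coef_[0])
    print("prediction for x = 6:", model.predict([[6.0]])[0])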
Linear Regression in Machine Learning
Finding the best fit line:

When working with linear regression, our main goal is to find the best fit line, meaning the error between the predicted values and the actual values should be minimized. The best fit line is the one with the least error.
Different values of the weights or line coefficients (a0, a1) give different regression lines, so we need to calculate the best values for a0 and a1 to find the best fit line. To calculate this, we use a cost function.
Cost function
Different values of the weights or line coefficients (a0, a1) give different regression lines, and the cost function is used to estimate the values of the coefficients for the best fit line.
The cost function optimizes the regression coefficients or weights and measures how well a linear regression model is performing.
We can use the cost function to find the accuracy of the mapping function, which maps the input variable to the output variable. This mapping function is also known as the Hypothesis function.
Cost function

For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values. For the above linear equation, MSE can be calculated as:

MSE = (1/N) Σ (yi − (a1xi + a0))²

Where,
N = Total number of observations
yi = Actual value
(a1xi + a0) = Predicted value
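As a small illustration (the data and coefficient values are made up for the example), MSE can be computed directly with NumPy:

    import numpy as np

    # Illustrative data and assumed fitted coefficients
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
    a0, a1 = 0.2, 1.95  # hypothetical intercept and coefficient

    y_pred = a1 * x + a0              # predicted values
    mse = np.mean((y - y_pred) ** 2)  # average of squared errors
    print("MSE:", mse)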
Cost function

Residuals: The distance between an actual value and the corresponding predicted value is called a residual.
If the observed points are far from the regression line, the residuals will be large, and so the cost function will be high.
If the scatter points are close to the regression line, the residuals will be small, and hence the cost function will be small as well.
Model Performance:
Goodness of fit determines how well the regression line fits the set of observations. The process of finding the best model out of various candidate models is called optimization; a minimal sketch of this idea follows below.
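The lecture does not prescribe a specific optimization method, so the following is an assumed illustration using gradient descent on the MSE cost (the learning rate, iteration count, and data are arbitrary choices):

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

    a0, a1 = 0.0, 0.0  # start from zero coefficients
    lr = 0.01          # learning rate (arbitrary choice)

    for _ in range(5000):
        y_pred = a1 * x + a0
        error = y_pred - y
        # Gradients of MSE with respect to a0 and a1
        grad_a0 = 2 * np.mean(error)
        grad_a1 = 2 * np.mean(error * x)
        a0 -= lr * grad_a0
        a1 -= lr * grad_a1

    print("a0:", a0, "a1:", a1)

Each step moves the coefficients in the direction that reduces the MSE, so the line gradually converges toward the best fit.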
Assumptions of Linear Regression

These are some formal checks to perform while building a Linear Regression model, which help ensure the best possible result from the given dataset.
Linear relationship between the features and target:
Linear regression assumes a linear relationship between the dependent and independent variables.
Little or no multicollinearity between the features:
Multicollinearity means high correlation between the independent variables. Due to multicollinearity, it may be difficult to find the true relationship between the predictors and the target variable. In other words, it is difficult to determine which predictor variable is affecting the target variable and which is not. So, the model assumes either little or no multicollinearity between the features or independent variables.
Assumptions of Linear Regression
Homoscedasticity assumption:
Homoscedasticity is the situation in which the error term is the same for all values of the independent variables. With homoscedasticity, there should be no clear pattern in the distribution of data in the scatter plot.
Normal distribution of error terms:
Linear regression assumes that the error terms follow a normal distribution. If the error terms are not normally distributed, confidence intervals will become either too wide or too narrow, which may cause difficulties in estimating the coefficients.
This can be checked using a q-q plot (see the sketch after this section). If the plot shows a straight line without major deviation, the errors are normally distributed.
No autocorrelation:
The linear regression model assumes no autocorrelation in the error terms. If there is any correlation in the error terms, it will drastically reduce the accuracy of the model. Autocorrelation usually occurs when there is a dependency between residual errors.
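As an illustration of the q-q check mentioned above (the residuals array is assumed to come from a fitted model such as the one sketched earlier), SciPy and Matplotlib can draw the plot:

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    # Assumed residuals from a fitted model: actual minus predicted values
    residuals = np.array([0.05, -0.10, 0.08, -0.02, -0.01])

    # Q-Q plot: points close to the reference line suggest normal errors
    stats.probplot(residuals, dist="norm", plot=plt)
    plt.title("Q-Q plot of residuals")
    plt.show()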
Logistic Regression in Machine Learning

Logistic regression is a supervised machine learning algorithm mainly used for classification tasks, where the goal is to predict the probability that an instance belongs to a given class or not.
It is a statistical algorithm that analyzes the relationship between a set of independent variables and a dependent binary variable.
It is a powerful tool for decision-making, for example classifying an email as spam or not spam.
Logistic Regression in Machine Learning

It is referred to as regression because it takes the output of the linear regression function as input and uses a sigmoid function to estimate the probability for the given class.
The difference between linear regression and logistic regression is that linear regression outputs a continuous value that can be anything, while logistic regression predicts the probability that an instance belongs to a given class or not.
**NOTE**
Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value.
It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.
Logistic Regression is very similar to Linear Regression except in how the two are used: Linear Regression is used for solving regression problems, whereas Logistic Regression is used for solving classification problems.
**NOTE**
In logistic regression, instead of fitting a straight regression line, we fit an "S"-shaped logistic function, which predicts two maximum values (0 or 1).
The curve from the logistic function indicates the likelihood of something, such as whether cells are cancerous or not, or whether a mouse is obese or not based on its weight.
Logistic Regression is a significant machine learning algorithm because it has the ability to provide probabilities and to classify new data using both continuous and discrete datasets.
Logistic Function (Sigmoid Function):
The sigmoid function is a mathematical function used to map predicted values to probabilities. It maps any real value to another value within the range 0 to 1:

σ(z) = 1 / (1 + e^(−z))

The output of logistic regression must be between 0 and 1 and cannot go beyond this limit, so it forms a curve like the "S" shape. This S-shaped curve is called the sigmoid function or the logistic function.
In logistic regression, we use the concept of a threshold value, which defines the boundary between predicting 0 or 1: values above the threshold tend to 1, and values below the threshold tend to 0.
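A minimal sketch of the sigmoid and the threshold rule in NumPy (the threshold of 0.5 is a common default, assumed here rather than stated in the lecture):

    import numpy as np

    def sigmoid(z):
        # Maps any real value into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-z))

    # Outputs of a linear function a0 + a1*x for some illustrative inputs
    z = np.array([-3.0, -0.5, 0.0, 0.8, 4.0])
    probs = sigmoid(z)

    # Threshold rule: probabilities at or above 0.5 are classified as 1, else 0
    labels = (probs >= 0.5).astype(int)
    print(probs)
    print(labels)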
Assumptions for Logistic Regression:

The dependent variable must be categorical in nature.
The independent variables should not exhibit multicollinearity.
Types of Logistic Regression:

On the basis of the categories, Logistic Regression can be classified into three types (a fitting sketch follows this list):
Binomial: In binomial logistic regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
Multinomial: In multinomial logistic regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
Ordinal: In ordinal logistic regression, there can be 3 or more possible ordered types of the dependent variable, such as "low", "medium", or "high".
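As an assumed end-to-end illustration in Python (the toy data is invented; scikit-learn's LogisticRegression handles the binomial case shown here by default):

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # Toy binary classification data (illustrative): one feature, labels 0/1
    X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
    y = np.array([0, 0, 0, 1, 1, 1])

    clf = LogisticRegression()
    clf.fit(X, y)

    # predict_proba returns P(class 0) and P(class 1) for each input
    print(clf.predict_proba([[4.5]]))
    print(clf.predict([[4.5]]))  # class label after thresholding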
Thank You!!
