(Unit-04) Part-01 - ML Algo

This document provides an overview of machine learning algorithms, specifically regression. It defines regression as constructing a model to predict dependent variables from independent variables. Regression uses continuous output variables such as salary or weight. Example applications of regression include sales forecasting, price analysis, and risk assessment. Simple linear regression fits a linear relationship between one dependent and one independent variable: it finds the slope and y-intercept that minimize the sum of squared errors between the predicted and actual values, using the least squares method. The goodness of fit is measured by R-squared, which indicates the percentage of variation explained by the model. Logistic regression is also introduced as a technique for predicting categorical dependent variables.


Artificial Intelligence

(CSE3007)
Unit – 04 (Part-I)

Machine Learning Algorithms

Dr. Susant Kumar Panigrahi


Assistant Professor
School of Electrical & Electronics Engineering
What is Regression?

• The main goal of regression is to construct an efficient model that predicts the dependent attribute from a set of independent attributes. A regression problem is one where the output variable is a real or continuous value, e.g. salary, weight, or area.

• We can also define regression as a statistical method, used in applications such as housing and investing, that models the relationship between a dependent variable and a set of independent variables.
Examples – Applications of Regression

1. Evaluating Trends and Sales Estimates

• Linear regression can be used in business to evaluate trends and make estimates or forecasts.
• For example, if a company's sales have increased steadily every month for the past few years, a linear analysis of the sales data, with monthly sales on the y-axis and time on the x-axis, would produce a line that depicts the upward trend in sales. After creating the trend line, the company could use the slope of the line to forecast sales in future months.
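As a minimal sketch of this workflow (the monthly figures below are invented for illustration; NumPy's polyfit is one simple way to fit the trend line):

```python
import numpy as np

# Hypothetical monthly sales figures (in thousands) for 12 months.
months = np.arange(1, 13)
sales = np.array([50, 52, 55, 53, 58, 60, 62, 61, 65, 68, 70, 73])

# Fit a straight line (degree-1 polynomial): sales = slope * month + intercept.
slope, intercept = np.polyfit(months, sales, deg=1)

# Use the fitted line to forecast sales for the next three months.
for m in range(13, 16):
    print(f"Month {m}: forecast = {slope * m + intercept:.1f}")
```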
2. Analyzing the Impact of Price Changes

• Linear regression can also be used to analyze the effect of pricing on consumer behavior.
• For example, if a company changes the price of a certain product several times, it can record the quantity sold at each price level and then perform a linear regression with quantity sold as the dependent variable and price as the explanatory variable. The result would be a line that depicts the extent to which consumers reduce their consumption of the product as prices increase, which could help guide future pricing decisions.
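A similar sketch for the pricing case, with invented price/quantity observations:

```python
import numpy as np

# Hypothetical (price, quantity sold) observations from several price changes.
price = np.array([10.0, 12.0, 14.0, 16.0, 18.0])
quantity = np.array([200, 180, 150, 130, 100])

# Regress quantity (dependent variable) on price (explanatory variable).
slope, intercept = np.polyfit(price, quantity, deg=1)

# A negative slope quantifies how much demand falls per unit price increase.
print(f"quantity ≈ {slope:.1f} * price + {intercept:.1f}")
```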
3. Assessing Risk
Simple Linear Regression

• One of the most common regression techniques is simple linear regression. Here we predict the outcome of a dependent variable from a single independent variable, and the relationship between the variables is linear; hence the name linear regression.

• Simple linear regression is a regression technique in which the independent variable has a linear relationship with the dependent variable. The straight line in the diagram is the best fit line.

• The main goal of simple linear regression is to take the given data points and plot the best fit line, so that the model fits the data as well as possible.
The Main Idea of Least Square and Linear Regression

[Figure: scatter plot of observed data points, with the dependent variable on the y-axis and the independent variable on the x-axis, and several candidate lines drawn through them.]

But which among these lines best fits the data for future prediction?
The Main Idea of Least Square and Linear Regression

Let's measure how well a candidate line fits the data, starting with a worst-case scenario (say, a horizontal line). For each data point, we take the vertical distance between the point and the line: the residual.
The Main Idea of Least Square and Linear Regression
Finally, to make the cost positive and more mathematically meaningful, each difference term is squared and the squares are added together to measure the fit:

    Sum of squared residuals = Σ (observed_i − predicted_i)² = 24.62

This measure indicates how well the line fits the data: the smaller the sum, the better the fit.
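A short sketch of this measure, with made-up data points (the slides' actual points are not shown, so the numbers differ from 24.62), assuming the worst-case line is horizontal:

```python
import numpy as np

# Made-up observed data points (x, y) for illustration only.
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 2, 4, 5])

def sum_of_squared_residuals(m, c):
    """Sum of squared vertical distances between the data and the line y = m*x + c."""
    predicted = m * x + c
    return np.sum((y - predicted) ** 2)

# Worst-case horizontal line through the mean of y (slope = 0).
print(sum_of_squared_residuals(0.0, y.mean()))
```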
The Main Idea of Least Square and Linear Regression

Rotate the line a little and check how well it fits: sum of squared residuals = 18.72.
Rotate the line a little more: sum of squared residuals = 14.05.
Rotate the line a whole lot, and the fit gets worse again: sum of squared residuals = 31.71.
The Main Idea of Least Square and Linear Regression
There is a sweet spot between the horizontal line and the last case of the "whole lot rotated" line, at which we get the optimal value of the fit.

The generic line equation for this linear regression is:

    y = m x + c

where m is the slope and c is the y-intercept.

We need to find the optimum values of m and c that minimize the sum of squared residuals:

    Sum of squared residuals = Σ (y_i − (m x_i + c))²

Because we are looking for the values of m and c that give the smallest sum of squared residuals, the method is called "Least Squares".
The Main Idea of Least Square and Linear Regression

How do we find the optimal rotation? "We take the derivative of this function."

The derivative tells us the slope of the function at every point.

Notice: the slope at the best point (the "Least Squares" solution) is zero.

Different rotations correspond to different values of the slope m and the y-intercept c.


The big concepts

• We want to minimize the squares of the distances between the observed values and the line.
• We do this by taking the derivative of the sum of squared residuals and finding the values of the slope and y-intercept where it is equal to zero.
• The final line minimizes the sum of squares ("least squares") between it and the real data; a numerical sketch of this idea follows below.
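To make this concrete, here is a minimal numerical sketch with made-up data points (not from the slides). It uses the centroid property from the next slide (the best fit line passes through (x̄, ȳ), so c = ȳ − m·x̄) and sweeps the slope: the cost falls and then rises again past the optimum, which is exactly where the derivative is zero.

```python
import numpy as np

# Made-up data points for illustration only (not from the slides).
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 2, 4, 5])

def ssr(m, c):
    """Sum of squared residuals for the line y = m*x + c."""
    return np.sum((y - (m * x + c)) ** 2)

# Constrain the line to pass through the centroid, so c = ȳ − m·x̄,
# and sweep the slope m to watch the cost dip at the optimum.
x_bar, y_bar = x.mean(), y.mean()
for m in [0.0, 0.2, 0.4, 0.6, 0.8]:
    c = y_bar - m * x_bar
    print(f"m = {m:.1f} -> SSR = {ssr(m, c):.2f}")
# The minimum lies at m = 0.4, where the derivative of SSR w.r.t. m is zero.
```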
Understanding Linear Regression Algorithm

x̄ = mean of the x values, ȳ = mean of the y values

Centroid = (x̄, ȳ)

The best fit regression line must pass through the centroid.

So we need to find the equation of the line that passes through the centroid, using the least squares approach.
Finding the equation of the line .....

The generic line equation for this linear regression is:

    y = m x + c,  with  c = ȳ − m x̄  (since the line passes through the centroid)

The least squares slope and intercept are:

    m = Σ (x_i − x̄)(y_i − ȳ) / Σ (x_i − x̄)² = 4 / 10 = 0.4

    c = ȳ − m x̄ = 3.6 − 0.4 × 3 = 2.4
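Below is a minimal sketch of this computation. The slide's original data points are not shown, so the arrays here are assumed values chosen to be consistent with the worked numbers (x̄ = 3, ȳ = 3.6, m = 0.4, c = 2.4):

```python
import numpy as np

# Assumed data consistent with the worked numbers on the slide
# (the actual data points are not shown in the source).
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 2, 4, 5])

x_bar, y_bar = x.mean(), y.mean()  # centroid (3, 3.6)
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)  # 4 / 10 = 0.4
c = y_bar - m * x_bar  # 3.6 - 0.4 * 3 = 2.4

print(f"y = {m:.1f}x + {c:.1f}")  # y = 0.4x + 2.4
```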
The Predicted Line: y = 0.4x + 2.4
Goodness of Fit – R²

WHAT IS R-SQUARED?

• R-squared is a statistical measure of how close the data are to the fitted regression line.
• It is also known as the coefficient of determination, or the coefficient of multiple determination for multiple regression.
• The definition of R-squared is fairly straightforward: it is the percentage of the response variable variation that is explained by a linear model.
• R-squared = Explained variation / Total variation
Calculation of R²

    R² = (variation around the mean − variation around the fitted line) / (variation around the mean)

For the example above, R² ≈ 0.3.
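A short sketch of this calculation, again using the assumed data points from earlier (which happen to reproduce R² ≈ 0.3):

```python
import numpy as np

# Same assumed data and fitted line as above (not from the slides).
x = np.array([1, 2, 3, 4, 5])
y = np.array([3, 4, 2, 4, 5])
predicted = 0.4 * x + 2.4

ss_total = np.sum((y - y.mean()) ** 2)      # variation around the mean
ss_residual = np.sum((y - predicted) ** 2)  # variation around the fitted line
r_squared = 1 - ss_residual / ss_total

print(f"R² = {r_squared:.2f}")  # ≈ 0.31
```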
Interpretation of values of R²

R² = 1: the regression line is a perfect fit to the actual values.

R² = 0: there is a large distance between the actual and predicted values; the model explains none of the variation.
Advantages And Disadvantages

Advantages:
• Linear regression performs exceptionally well for linearly separable data.
• It is easy to implement and interpret, and efficient to train.
• It handles overfitting pretty well using dimensionality reduction techniques, regularization, and cross-validation.
• It allows extrapolation beyond a specific data set.

Disadvantages:
• It assumes linearity between the dependent and independent variables.
• It is often quite prone to noise and overfitting.
• It is quite sensitive to outliers.
• It is prone to multicollinearity.
Solve it
• Use least-squares regression to fit a straight line to the given data.

• Also find the goodness of fit and analyze the result.


Logistic Regression
What is Regression?

• Regression analysis is a powerful statistical technique in which the values of the independent variables in a data set are used to predict a dependent variable of interest.
• We come across regression in an intuitive way all the time, like predicting the weather using a data set of past weather conditions.
• It uses many techniques to analyze and predict the outcome, but the emphasis is mainly on the relationship between the dependent variable and one or more independent variables.
• Logistic regression analysis predicts the outcome in a binary variable, which has only two possible outcomes.
What Is Logistic Regression?

• Logistic regression is a classification algorithm, used when the value of the target variable is categorical in nature. It is most commonly used when the data in question has binary output, i.e. when it belongs to one class or another, or is either a 0 or a 1.

• Remember that classification tasks have discrete categories, unlike regression tasks.

• Logistic Regression is a Machine Learning algorithm used for classification problems; it is a predictive analysis algorithm based on the concept of probability.
Logistic Regression

• It is a technique to analyze a data set that has a dependent variable and one or more independent variables, and to predict the outcome in a binary variable, meaning one with only two outcomes.

• The dependent variable is categorical in nature. The dependent variable is also referred to as the target variable, and the independent variables are called the predictors.

• Logistic regression adapts linear regression to the case where we predict the outcome in a categorical variable. It predicts the probability of the event using the logit (log-odds) function.

• We use the Sigmoid function/curve to predict the categorical value. A threshold value decides the outcome (e.g. win/lose).
• We can view Logistic Regression as a Linear Regression model that uses a more complex function for its predictions: the 'Sigmoid function', also known as the 'logistic function', instead of a linear function.

• The hypothesis of logistic regression limits the output to between 0 and 1. A linear function fails to represent this, as it can take values greater than 1 or less than 0, which is not possible under the hypothesis of logistic regression.
What is the Sigmoid Function?

• In order to map predicted values to probabilities, we use the Sigmoid function. The function maps any real value to a value between 0 and 1. In machine learning, we use the sigmoid to map predictions to probabilities.

• The sigmoid function/logistic function resembles an "S"-shaped curve when plotted on a graph. It takes any real value and "squishes" it towards the margins at the top (1) and bottom (0).

• The equation for the Sigmoid function is:

    S(y) = 1 / (1 + e^(−y))

• What is e in this instance? The e is the base of the exponential function (Euler's number), and it has a value of approximately 2.71828.
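A minimal implementation of the function, using NumPy (the input values are illustrative, not from the slides):

```python
import numpy as np

def sigmoid(y):
    """Map any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-y))

# Large negative inputs squish towards 0; large positive inputs towards 1.
for value in [-6.0, -2.0, 0.0, 2.0, 6.0]:
    print(f"sigmoid({value:+.1f}) = {sigmoid(value):.4f}")
```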
Example:

Age groups of people who either bought insurance or not:

Have_insurance = 1 (bought insurance)
Have_insurance = 0 (no insurance)

Applying Linear Regression → Thresholding

With a threshold on the fitted line (say 0.5), people whose predicted value is above the threshold are labeled "likely to buy insurance".

Applying Linear Regression → Thresholding (let's assume we have another extreme value)

A single extreme value tilts the fitted line, so some people who actually bought insurance now fall below the threshold and are labeled "unlikely to buy insurance". The new predictions are more erroneous, which is why we replace the straight line with the sigmoid.
Sigmoid or Logit Function

    S(y) = 1 / (1 + e^(−y))
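A sketch of the insurance example fitted with logistic regression instead of a thresholded straight line. The ages and labels are made up for illustration, and scikit-learn's LogisticRegression is an assumed choice, since the slides name no library:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up age / bought-insurance data in the spirit of the example above.
ages = np.array([[22], [25], [28], [35], [40], [47], [52], [58], [62], [70]])
have_insurance = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])

model = LogisticRegression()
model.fit(ages, have_insurance)

# The sigmoid output is a probability; a 0.5 threshold decides the class.
for age in [30, 45, 65]:
    prob = model.predict_proba([[age]])[0, 1]
    print(f"Age {age}: P(buys insurance) = {prob:.2f}")
```

Unlike the thresholded straight line, adding an extreme age barely moves the decision boundary, because the sigmoid saturates at 0 and 1.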
Linear Regression and Logistic Regression Relationship

1. Definition: Linear regression predicts a continuous dependent variable from the values of the independent variables; logistic regression predicts a categorical dependent variable.
2. Variable type: Continuous dependent variable vs. categorical dependent variable.
3. Estimation method: Least squares estimation vs. maximum likelihood estimation.
4. Equation: y = a0 + a1x vs. log(p / (1 − p)) = a0 + a1x1 + a2x2 + … + anxn
5. Best fit: Straight line vs. S-shaped curve.
6. Relationship between dependent and independent variables: Linear vs. non-linear.
7. Output: Predicted continuous value vs. predicted binary value (0/1).
Types Of Logistic Regression

• Binary logistic regression – has only two possible outcomes. Example: yes or no.
• Multinomial logistic regression – has three or more nominal categories. Example: cat, dog, elephant. (A sketch follows below.)
• Ordinal logistic regression – has three or more ordinal categories, ordinal meaning that the categories have an order. Example: user ratings (1–5).
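As a small illustration of the multinomial case, here is a sketch using scikit-learn's LogisticRegression, which handles three or more nominal classes; the two features and all values are invented for the example:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented, pre-scaled two-feature samples for three nominal classes:
# 0 = cat, 1 = dog, 2 = elephant.
X = np.array([[0.20, 0.30], [0.25, 0.35],
              [1.00, 0.80], [1.10, 0.90],
              [3.00, 2.50], [3.20, 2.60]])
y = np.array([0, 0, 1, 1, 2, 2])

# scikit-learn's LogisticRegression handles the multinomial case directly.
model = LogisticRegression()
model.fit(X, y)

print(model.predict([[1.05, 0.85]]))  # expected: [1] (dog)
```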
