Ex.No: 03
Simple Linear, Multiple Linear and Logistic Regression
Date: 12.01.24
Aim:
To perform linear, multiple linear and logistic regression for the given dataset.
Description:
Regression:
Regression is used to find the relationship between variables. It is a
supervised learning task in which the output is a continuous value. The goal is to predict a
value as close as possible to the actual output value, and the model is evaluated by
calculating the error. The smaller the error, the better the fit of the regression model.
Regression Analysis:
Regression analysis is a statistical method for modelling the relationship between a dependent (target)
variable and one or more independent (predictor) variables. More specifically,
it helps us understand how the value of the dependent variable changes
with one independent variable when the other independent variables are held fixed. It
predicts continuous (real) values such as temperature, age, salary, price, etc.
Dependent Variable: The variable you want to predict is called the dependent variable.
Independent Variables: The variables you use to predict the dependent variable's value are called
the independent variables.
Regression analysis is a form of predictive modelling technique which
investigates the relationship between a dependent (target) variable and one or more independent (predictor) variables.
This technique is used for forecasting, time series modelling and studying cause-and-effect
relationships between variables. For example, the relationship between rash driving and the
number of road accidents by a driver can be studied through regression. Regression analysis is an
important tool for modelling and analyzing data. Here, we fit a curve or line to the data points in
such a manner that the distances of the data points from the curve or line are
minimized. There are multiple benefits of using regression analysis. They are as follows:
20CS2032L - Machine Learning Lab URK21CS1097
1. It indicates the significant relationships between a dependent variable and the independent
variable.
2. It indicates the strength of the impact of multiple independent variables on a dependent variable.
There are various kinds of regression techniques available to make predictions. These techniques
are mostly driven by three factors: the number of independent variables, the type
of dependent variable and the shape of the regression line.
Linear Regression:
Linear Regression establishes a relationship between the dependent variable (Y) and one or more
independent variables (X) using a best-fit straight line, also known as the regression line. It is used
to estimate real values (cost of houses, number of calls, total sales etc.) based on one or more
continuous variables. For a single independent variable, the regression line is represented by the linear
equation Y = a*X + b, where a is the slope of the line and b is the intercept.
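The line Y = a*X + b can be fitted by ordinary least squares. The following is a minimal sketch, assuming NumPy is available; the function name fit_simple_linear and the demo data are hypothetical and are not taken from the lab dataset.

```python
import numpy as np

def fit_simple_linear(x, y):
    """Return slope a and intercept b of the least-squares line Y = a*X + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: co-variation of x and y divided by the variation of x.
    a = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    b = y_mean - a * x_mean
    return a, b

# Hypothetical data generated from y = 2x + 1 (not the lab dataset).
x_demo = [1, 2, 3, 4, 5]
y_demo = [3, 5, 7, 9, 11]
a, b = fit_simple_linear(x_demo, y_demo)
print(f"slope a = {a:.2f}, intercept b = {b:.2f}")
```

Because the demo points lie exactly on a line, the fit recovers the slope and intercept used to generate them; real data would leave non-zero residuals.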
Multiple Linear Regression attempts to model the relationship between two or more features
and a response by fitting a linear equation to observed data. The steps to perform multiple
linear regression are almost the same as those of simple linear regression. The difference lies
in the evaluation. We can use it to find out which factor has the highest impact on the
predicted output and how different variables relate to each other.
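A multiple linear fit can be sketched in the same way. This is an illustrative example only, assuming NumPy and using np.linalg.lstsq to solve the least-squares problem; the function name fit_multiple_linear and the demo data are hypothetical.

```python
import numpy as np

def fit_multiple_linear(X, y):
    """Return (intercept, weights) of the least-squares fit y = b + w1*x1 + w2*x2 + ..."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient acts as the intercept.
    X_design = np.column_stack([np.ones(len(X)), X])
    coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return coeffs[0], coeffs[1:]

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (not the lab dataset).
X_demo = [[1, 1], [2, 1], [3, 2], [4, 3], [5, 5]]
y_demo = [6, 8, 13, 18, 26]
intercept, weights = fit_multiple_linear(X_demo, y_demo)
print("intercept:", round(float(intercept), 3))
print("weights:", np.round(weights, 3))
```

The fitted weights indicate how much the prediction changes per unit change in each feature. Comparing their sizes to judge impact is only meaningful when the features are on comparable scales.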
Logistic Regression:
It is a classification algorithm used when the target variable is categorical. The
main objective of Logistic Regression is to determine the relationship between the features and
the probability of a particular outcome. For example, when we need to predict whether a student
passes or fails an exam given the number of hours spent studying as a feature, the target variable
takes two values, i.e. pass and fail.
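The pass/fail example can be illustrated with the logistic (sigmoid) function, which maps a linear score to a probability between 0 and 1. This is a minimal sketch: the weight and bias below are hypothetical values chosen for illustration, not parameters learned from any dataset, and the 0.5 threshold is an assumed convention.

```python
import math

def sigmoid(z):
    """Map any real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_pass_probability(hours, weight, bias):
    """Probability of passing given hours studied, for a one-feature logistic model."""
    return sigmoid(weight * hours + bias)

# Hypothetical parameters (assumed for illustration, not fitted to data).
weight_demo, bias_demo = 1.5, -6.0
for hours in (2, 4, 6):
    p = predict_pass_probability(hours, weight_demo, bias_demo)
    label = "pass" if p >= 0.5 else "fail"  # assumed decision threshold of 0.5
    print(f"{hours} hours -> P(pass) = {p:.3f} -> {label}")
```

With these assumed parameters the decision boundary sits at 4 hours, where the linear score is zero and the predicted probability is exactly 0.5.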
Performance Metrics:
• Root Mean Squared Error (RMSE)
• R-squared or Coefficient of Determination (R²)
• Mean Absolute Error
• Mean Square Error
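The four metrics listed above can be computed directly from the actual and predicted values. The sketch below assumes NumPy; the function name regression_metrics and the sample values are hypothetical.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R-squared for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    mse = np.mean(residuals ** 2)              # Mean Square Error
    rmse = np.sqrt(mse)                        # Root Mean Squared Error
    mae = np.mean(np.abs(residuals))           # Mean Absolute Error
    ss_res = np.sum(residuals ** 2)            # sum of squared errors (SSE)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot                 # coefficient of determination
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}

# Hypothetical actual and predicted values (not from the lab dataset).
y_true_demo = [3, 5, 7, 9]
y_pred_demo = [2, 5, 8, 9]
metrics = regression_metrics(y_true_demo, y_pred_demo)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Lower MSE, RMSE and MAE indicate a closer fit, while R-squared approaches 1 as the model explains more of the variation in the target.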
2. Do necessary preprocessing
3. Choose independent variable (X) and dependent variable (Y) from given dataset
2. Do necessary preprocessing
3. Choose independent variables (X1, X2, ...) and dependent variable (Y) from given dataset
6. Calculate the SSE (Sum of Squared Errors) and RMSE (Root Mean Square Error) values
3. Choose independent variables (X1, X2, ...) and dependent variable (Y) from given dataset
6. Calculate the cost or error and reduce the cost or error using gradient descent.
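The cost calculation and gradient descent update in the last step can be sketched as follows. This is one possible implementation under stated assumptions: the cost is taken to be the average log loss (binary cross-entropy), the learning rate and iteration count are arbitrary illustrative choices, and the function names and hours-studied data are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(y, p, eps=1e-12):
    """Average binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def train_logistic(X, y, learning_rate=0.1, n_iters=5000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    costs = []
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)              # predicted probabilities
        costs.append(log_loss(y, p))        # cost before this update
        error = p - y                       # gradient of the loss w.r.t. the score
        w -= learning_rate * (X.T @ error) / n_samples
        b -= learning_rate * np.mean(error)
    return w, b, costs

# Hypothetical data: hours studied -> fail (0) or pass (1).
X_demo = [[1], [2], [3], [4], [5], [6]]
y_demo = [0, 0, 0, 1, 1, 1]
w, b, costs = train_logistic(X_demo, y_demo)
predictions = (sigmoid(np.asarray(X_demo, dtype=float) @ w + b) >= 0.5).astype(int)
print("initial cost:", round(costs[0], 4), "final cost:", round(costs[-1], 4))
print("predictions:", predictions.tolist())
```

Tracking the cost at every iteration makes it easy to check that gradient descent is actually reducing the error; a cost that rises or oscillates usually suggests the learning rate is too large.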
Result:
Simple Linear, Multiple Linear and Logistic Regression were performed successfully for the given dataset.