Exp No 03

The document discusses performing simple linear, multiple linear, and logistic regression on a dataset. It provides descriptions of regression analysis, linear regression, multiple linear regression, and logistic regression. It also lists questions to help understand how to implement each technique and evaluate the results.

20CS2032L - Machine Learning Lab URK21CS1097

Ex.No: 03
Simple Linear, Multiple Linear and Logistic Regression
Date: 12.01.24

Aim:
To perform linear, multiple linear and logistic regression for the given dataset.

Description:
Regression:
The term regression is used when you try to find the relationship between variables. It is a
supervised learning task in which the output is a continuous value. The goal is to predict a
value as close to the actual output as the model can; the model is then evaluated by
calculating the error. The smaller the error, the more accurate the regression model.

Regression Analysis:
Regression analysis is a statistical method for modelling the relationship between a dependent
(target) variable and one or more independent (predictor) variables. More specifically, it helps
us understand how the value of the dependent variable changes with one independent variable
when the other independent variables are held fixed. It predicts continuous (real) values such
as temperature, age, salary and price.
Dependent Variable: The variable you want to predict is called the dependent variable.
Independent Variables: The variables you use to predict the dependent variable's value are
called independent variables.
Regression analysis is a predictive modelling technique used for forecasting, time series
modelling and investigating cause-and-effect relationships between variables. For example, the
relationship between rash driving and the number of road accidents by a driver can be studied
through regression. Regression analysis is an important tool for modelling and analyzing data.
Here, we fit a curve or line to the data points in such a manner that the distances of the data
points from the curve or line are minimized. There are multiple benefits of using regression
analysis. They are as follows:
1. It indicates the significant relationships between a dependent variable and the independent
variables.
2. It indicates the strength of the impact of multiple independent variables on a dependent variable.

There are various kinds of regression techniques available to make predictions. These techniques
are mostly distinguished by three factors: the number of independent variables, the type
of dependent variable and the shape of the regression line.

Linear Regression:
Linear Regression establishes a relationship between the dependent variable (Y) and one or more
independent variables (X) using a best-fit straight line (also known as the regression line). It is used
to estimate real values (cost of houses, number of calls, total sales etc.) based on one or more
continuous variables. The regression line is represented by the linear equation Y = a*X + b.

In this equation:
Y – Dependent variable
a – Slope
X – Independent variable
b – Intercept

The coefficients a and b are derived by minimizing the sum of the squared differences
between the data points and the regression line.
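
For reference, the minimization described above has a standard closed-form solution. The block below is a sketch in notation introduced here for illustration: n is the number of data points, (x_i, y_i) are the observations, and \bar{x}, \bar{y} are their sample means. It assumes the x values are not all identical.

```latex
% Sum of squared errors for the line Y = a*X + b
\mathrm{SSE}(a, b) = \sum_{i=1}^{n} \bigl( y_i - (a x_i + b) \bigr)^2

% Setting both partial derivatives to zero gives the least-squares estimates
a = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
b = \bar{y} - a\,\bar{x}
```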

Multiple Linear Regression (MLR):
MLR, also known simply as multiple regression, is a statistical technique that uses several
explanatory variables to predict the outcome of a response variable. The goal of multiple linear
regression (MLR) is to model the linear relationship between the explanatory (independent)
variables and the response (dependent) variable. In essence, multiple regression is the extension
of ordinary least-squares (OLS) regression to more than one explanatory variable.
The multiple regression model is based on the following assumptions:
• There is a linear relationship between the dependent variable and the independent variables.
• The independent variables are not too highly correlated with each other (absence of
multicollinearity).
• Observations are selected independently and randomly from the population.
• Residuals are normally distributed with a mean of 0 and a constant variance σ².

Multiple Linear Regression attempts to model the relationship between two or more features
and a response by fitting a linear equation to observed data. The steps to perform multiple
linear regression are almost the same as those for simple linear regression; the difference lies
in the evaluation. We can use it to find out which factor has the highest impact on the
predicted output and how the different variables relate to each other.
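
As a compact reference, the linear equation being fitted and its ordinary least-squares solution can be written as below. The symbols are introduced here for illustration: p is the number of features, X is the design matrix with a leading column of ones, and the closed form assumes X^T X is invertible (that is, no perfect multicollinearity).

```latex
% Model: one intercept plus one coefficient per feature
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon

% OLS estimate of the coefficient vector (normal equations)
\hat{\boldsymbol{\beta}} = (X^{\top} X)^{-1} X^{\top} \mathbf{y}
```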

Logistic Regression:
Logistic regression is a classification algorithm used when the target variable is categorical.
Its main objective is to determine the relationship between the features and the probability of
a particular outcome. For example, when we need to predict whether a student passes or fails
an exam given the number of hours spent studying as a feature, the target variable takes two
values: pass and fail.
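
The pass/fail example can be sketched in a few lines of Python. The intercept and slope below are made-up values chosen only for illustration; they are not fitted to any dataset, and the function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Map any real-valued score to a probability in (0, 1), avoiding overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# Hypothetical coefficients (assumed for illustration, not learned from data).
INTERCEPT = -4.0
SLOPE_PER_HOUR = 1.0


def pass_probability(hours_studied: float) -> float:
    """Probability of passing under the assumed coefficients."""
    return sigmoid(INTERCEPT + SLOPE_PER_HOUR * hours_studied)


def predict_label(hours_studied: float, threshold: float = 0.5) -> str:
    """Turn the probability into one of the two categories."""
    return "pass" if pass_probability(hours_studied) >= threshold else "fail"


for hours in (2, 4, 6):
    print(hours, round(pass_probability(hours), 3), predict_label(hours))
```

The model outputs a probability first; the categorical label only appears after a threshold (0.5 here, an arbitrary but common choice) is applied.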

Performance Metrics:
• Root Mean Squared Error (RMSE)
• R-squared or coefficient of determination (R²)
• Mean Absolute Error (MAE)
• Mean Squared Error (MSE)
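
The four metrics can be computed directly from the actual and predicted values. The sketch below uses only the standard library; the sample values are invented for illustration, and it assumes the actual values are not all identical (otherwise R² is undefined).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, RMSE and R-squared for two equal-length sequences."""
    n = len(y_true)
    errors = [actual - predicted for actual, predicted in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((actual - mean_y) ** 2 for actual in y_true)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": r2}


# Made-up values for illustration only.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]
metrics = regression_metrics(y_true, y_pred)
print(metrics)
```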

Questions for Linear Regression:

1. Read CSV data into a pandas DataFrame object
2. Do the necessary preprocessing
3. Choose the independent variable (X) and dependent variable (Y) from the given dataset
4. Find the b0 and b1 values to get Ypred
5. After getting Ypred, calculate the SSE (sum of squared errors)
6. Calculate the RMSE (Root Mean Square Error) value
7. Calculate the coefficient of determination (R-squared, r2)
8. Plot the regression line along with the given data points
9. Predict the output for a given input value
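
The nine steps above could be sketched as follows. This is one possible approach, not the lab's reference solution: the CSV content, the column names `hours` and `score`, and the helper names are all assumptions made for illustration, and dropping incomplete rows is only one of several reasonable preprocessing choices.

```python
import io

import numpy as np
import pandas as pd

# Made-up CSV content standing in for the lab dataset (assumption).
CSV_TEXT = """hours,score
1,52
2,55
3,
4,65
5,68
6,74
"""

# 1. Read CSV data into a DataFrame (pass a file path for a real file).
df = pd.read_csv(io.StringIO(CSV_TEXT))

# 2. Preprocessing: here, simply drop rows with missing values.
df = df.dropna()

# 3. Choose the independent (X) and dependent (Y) variables.
x = df["hours"].to_numpy(dtype=float)
y = df["score"].to_numpy(dtype=float)

# 4. Least-squares slope (b1) and intercept (b0), then Ypred.
x_mean, y_mean = x.mean(), y.mean()
b1 = ((x - x_mean) * (y - y_mean)).sum() / ((x - x_mean) ** 2).sum()
b0 = y_mean - b1 * x_mean
y_pred = b0 + b1 * x

# 5. Sum of squared errors.
sse = float(((y - y_pred) ** 2).sum())

# 6. Root mean square error.
rmse = float(np.sqrt(sse / len(y)))

# 7. Coefficient of determination.
sst = float(((y - y_mean) ** 2).sum())
r2 = 1.0 - sse / sst


# 8. Plot the regression line with the data points (requires matplotlib).
def plot_regression_line(x, y, b0, b1):
    import matplotlib.pyplot as plt  # imported lazily so the rest runs without it

    plt.scatter(x, y, label="data points")
    xs = np.linspace(x.min(), x.max(), 100)
    plt.plot(xs, b0 + b1 * xs, color="red", label="regression line")
    plt.xlabel("hours")
    plt.ylabel("score")
    plt.legend()
    plt.show()


# 9. Predict the output for a given input value.
def predict(x_new):
    return b0 + b1 * x_new


print("b0:", b0, "b1:", b1)
print("SSE:", sse, "RMSE:", rmse, "r2:", r2)
print("prediction for 7 hours:", predict(7.0))
```

Calling `plot_regression_line(x, y, b0, b1)` draws the scatter plot and fitted line; it is left uncalled here so the sketch runs in environments without a display.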

Questions for Multi-Linear Regression:

1. Read CSV data into a pandas DataFrame object
2. Do the necessary preprocessing
3. Choose the independent variables (X1, X2, ...) and dependent variable (Y) from the given dataset
4. Print the values of the y-intercept and the independent variable coefficients
5. Find the Ypred
6. Calculate the SSE (sum of squared errors) and RMSE (Root Mean Square Error) value
7. Calculate the coefficient of determination (R-squared, r2)
8. Predict the output for the given input values
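
A minimal sketch of these steps, assuming a small invented dataset with columns `area`, `rooms` and `price` (none of which come from the lab material). It fits the coefficients with NumPy's least-squares solver; a library estimator could be used instead.

```python
import io

import numpy as np
import pandas as pd

# Made-up CSV content standing in for the lab dataset (assumption).
CSV_TEXT = """area,rooms,price
50,2,152
60,2,168
80,3,231
100,3,272
120,4,329
70,3,209
"""

# 1-2. Read the data and drop incomplete rows (none in this toy data).
df = pd.read_csv(io.StringIO(CSV_TEXT)).dropna()

# 3. Choose the independent variables and the dependent variable.
feature_cols = ["area", "rooms"]
X = df[feature_cols].to_numpy(dtype=float)
y = df["price"].to_numpy(dtype=float)

# Add a column of ones so the first coefficient is the intercept.
X_design = np.column_stack([np.ones(len(X)), X])

# 4. Solve the least-squares problem and print intercept and coefficients.
coeffs, *_ = np.linalg.lstsq(X_design, y, rcond=None)
intercept, slopes = coeffs[0], coeffs[1:]
print("intercept:", intercept)
for name, value in zip(feature_cols, slopes):
    print(f"coefficient for {name}:", value)

# 5. Predicted values.
y_pred = X_design @ coeffs

# 6. SSE and RMSE.
sse = float(((y - y_pred) ** 2).sum())
rmse = float(np.sqrt(sse / len(y)))

# 7. Coefficient of determination.
sst = float(((y - y.mean()) ** 2).sum())
r2 = 1.0 - sse / sst
print("SSE:", sse, "RMSE:", rmse, "r2:", r2)


# 8. Predict the output for given input values.
def predict(area, rooms):
    return float(intercept + slopes @ np.array([area, rooms], dtype=float))


print("prediction for area=90, rooms=3:", predict(90, 3))
```

Note that raw coefficients are only comparable across features when the features share a scale; standardizing the columns first is one way to compare their impact.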

Questions for Logistic Regression:

1. Read CSV data into a pandas DataFrame object
2. Do the necessary preprocessing [impute null values, use encoding techniques to convert categorical data to numerical]
3. Choose the independent variables (X1, X2, ...) and dependent variable (Y) from the given dataset
4. Find the regression line
5. Convert the regression line into a sigmoid curve
6. Calculate the cost (error) and reduce it using gradient descent
7. Plot the graph of cost against iteration
8. Find the accuracy of the model
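
These steps could be sketched as below. Everything specific is an assumption for illustration: the toy data, the column names, mode imputation, the 0/1 encoding, feature standardization, the learning rate and the iteration count. Accuracy is measured on the training data itself, which overstates how well a model generalizes.

```python
import io

import numpy as np
import pandas as pd

# Made-up CSV content standing in for the lab dataset (assumption).
CSV_TEXT = """hours,attended,result
1,no,fail
2,no,fail
3,yes,fail
4,,fail
5.5,yes,fail
4.5,no,pass
6,yes,pass
7,yes,pass
8,no,pass
9,yes,pass
"""

# 1. Read the data.
df = pd.read_csv(io.StringIO(CSV_TEXT))

# 2. Impute the missing category with the mode, then encode text as numbers.
df["attended"] = df["attended"].fillna(df["attended"].mode()[0])
df["attended"] = df["attended"].map({"no": 0, "yes": 1})
df["passed"] = df["result"].map({"fail": 0, "pass": 1})

# 3. Choose features and target; standardize features to help gradient descent.
X = df[["hours", "attended"]].to_numpy(dtype=float)
y = df["passed"].to_numpy(dtype=float)
X = (X - X.mean(axis=0)) / X.std(axis=0)
Xb = np.column_stack([np.ones(len(X)), X])  # leading ones column = intercept


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


weights = np.zeros(Xb.shape[1])
learning_rate = 0.5   # assumed value
n_iterations = 1000   # assumed value
eps = 1e-12           # guards against log(0)
costs = []

for _ in range(n_iterations):
    z = Xb @ weights                      # 4. linear "regression line"
    p = sigmoid(z)                        # 5. squash it into a sigmoid curve
    # 6. Binary cross-entropy cost, then one gradient descent update.
    cost = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
    costs.append(float(cost))
    gradient = Xb.T @ (p - y) / len(y)
    weights -= learning_rate * gradient


# 7. Plot cost against iteration (requires matplotlib).
def plot_cost(costs):
    import matplotlib.pyplot as plt  # imported lazily so the rest runs without it

    plt.plot(range(len(costs)), costs)
    plt.xlabel("iteration")
    plt.ylabel("cost")
    plt.show()


# 8. Training accuracy with a 0.5 probability threshold.
predictions = (sigmoid(Xb @ weights) >= 0.5).astype(float)
accuracy = float((predictions == y).mean())
print("final cost:", costs[-1], "accuracy:", accuracy)
```

Calling `plot_cost(costs)` shows the cost falling as the iterations proceed; it is left uncalled so the sketch runs without a display.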

Result:
Linear, multiple linear and logistic regression were performed successfully for the given dataset.
