LECTURE Regression
Regression:
Regression is a statistical approach used to analyze the relationship
between a dependent variable (target variable) and one or more
independent variables (predictor variables). The objective is to
determine the most suitable function that characterizes the
connection between these variables.
Regression in Machine Learning
• It is a supervised machine learning technique, used to predict the
value of the dependent variable for new, unseen data. It models the
relationship between the input features and the target variable,
allowing for the estimation or prediction of numerical values.
• Regression analysis applies when the output variable is a real or
continuous value, such as “salary” or “weight”. Many different models
can be used; the simplest is linear regression, which tries to fit the data
with the best hyperplane passing through the points (see the sketch after this list).
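To make this concrete, here is a minimal sketch, assuming scikit-learn and a tiny synthetic dataset (the numbers are invented purely for illustration), that fits a linear regression and predicts the target for a new, unseen input:

# Minimal sketch: fit a linear regression on synthetic data and predict
# the continuous target for a new, unseen input (illustrative values only).
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])   # single input feature
y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])             # continuous target

model = LinearRegression()
model.fit(X, y)

print(model.coef_, model.intercept_)   # learned slope and intercept
print(model.predict([[6.0]]))          # estimated value for an unseen X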
Terminologies Related to Regression Analysis in Machine Learning
• Response Variable: The primary factor to predict or understand in
regression, also known as the dependent variable or target variable.
• Predictor Variable: Factors influencing the response variable, used to
predict its values; also called independent variables.
• Outliers: Observations with significantly low or high values compared to
others, potentially impacting results and best avoided.
• Multicollinearity: High correlation among independent variables, which can
complicate the ranking of influential variables.
• Underfitting and Overfitting: Overfitting occurs when an algorithm
performs well on training but poorly on testing, while underfitting indicates
poor performance on both datasets.
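As a hedged illustration of the underfitting/overfitting distinction, the sketch below, assuming scikit-learn and an invented synthetic dataset, varies the polynomial degree and compares training and test errors: a very low degree tends to underfit (both errors high), while a very high degree tends to overfit (low training error, higher test error).

# Sketch: underfitting vs. overfitting by varying polynomial degree
# (synthetic data; exact numbers are illustrative assumptions).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(60, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):   # too simple, reasonable, very flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    print(degree, round(train_err, 3), round(test_err, 3))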
Regression Types:
There are two main types of regression:
Linear Regression
Relationship between the dependent variable and independent variable(s) follows a linear
pattern.
Simple Regression
Used to predict a continuous dependent variable based on a single independent variable.
Simple linear regression should be used when there is only a single independent variable.
Multiple Regression
Used to predict a continuous dependent variable based on multiple independent
variables.
Multiple linear regression should be used when there are multiple independent variables.
Nonlinear Regression
Relationship between the dependent variable and independent variable(s) follows a
nonlinear pattern.
Provides flexibility in modeling a wide range of functional forms.
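In conventional notation (a standard convention, not stated in the slides), the two linear forms above can be written as

$$y = \beta_0 + \beta_1 x + \varepsilon \qquad \text{(simple linear regression)}$$
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon \qquad \text{(multiple linear regression)}$$

where $\beta_0$ is the intercept, $\beta_1, \dots, \beta_p$ are the coefficients of the independent variables, and $\varepsilon$ is the error term; nonlinear regression instead lets $y$ depend on a nonlinear function of the parameters.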
Regression Algorithms
• Linear Regression
• Linear regression is one of the simplest and most widely used statistical models. This assumes that there is a linear relationship
between the independent and dependent variables. This means that the change in the dependent variable is proportional to the
change in the independent variables.
• Polynomial Regression
• Polynomial regression is used to model nonlinear relationships between the dependent variable and the independent variables. It
adds polynomial terms to the linear regression model to capture more complex relationships.
• Support Vector Regression (SVR)
• Support vector regression (SVR) is a regression algorithm based on the support vector machine (SVM). SVM is primarily used for classification tasks, but the same ideas extend to regression. Rather than minimizing squared residuals directly, SVR fits a function that keeps the errors for most training points within a tolerance margin (epsilon), penalizing only the points that fall outside that margin.
• Decision Tree Regression
• Decision tree regression is a type of regression algorithm that builds a decision tree to predict the target value. A decision tree is a
tree-like structure that consists of nodes and branches. Each node represents a decision, and each branch represents the outcome of
that decision. The goal of decision tree regression is to build a tree that can accurately predict the target value for new data points.
• Random Forest Regression
• Random forest regression is an ensemble method that combines multiple decision trees to predict the target value. Ensemble
methods are a type of machine learning algorithm that combines multiple models to improve the performance of the overall model.
Random forest regression works by building a large number of decision trees, each of which is trained on a different subset of the
training data. The final prediction is made by averaging the predictions of all of the trees.
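To make the differences between these algorithms concrete, here is a hedged sketch, assuming scikit-learn and an invented synthetic dataset, that fits several of the regressors named above on the same data and compares their test-set R² scores:

# Sketch: comparing several regression algorithms on one synthetic dataset
# (data, hyperparameters, and metric choice are illustrative assumptions).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(200, 2))
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(scale=0.3, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

models = {
    "Linear": LinearRegression(),
    "SVR": SVR(kernel="rbf", C=10.0, epsilon=0.1),
    "Decision tree": DecisionTreeRegressor(max_depth=5, random_state=42),
    "Random forest": RandomForestRegressor(n_estimators=200, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, round(r2_score(y_test, model.predict(X_test)), 3))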
Applications of Regression:
• Predicting prices: For example, a regression model could be used to
predict the price of a house based on its size, location, and other
features.
• Forecasting trends: For example, a regression model could be used to
forecast the sales of a product based on historical sales data and
economic indicators.
• Identifying risk factors: For example, a regression model could be
used to identify risk factors for heart disease based on patient data.
• Making decisions: For example, a regression model could be used to
recommend which investment to buy based on market data.
Advantages of Regression
In regression, a set of records with X and Y values is available, and these
values are used to learn a function; when Y must be predicted for a new,
unseen X, this learned function can be applied.
Since the task is to find the value of Y, a function is required that
predicts a continuous Y given X as the independent feature(s).
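In standard notation (an assumed convention, not taken from the slides), this amounts to learning parameters $\theta$ of a function $f_\theta$ from the training records and then predicting

$$\hat{y} = f_\theta(x), \qquad \theta^{*} = \arg\min_{\theta} \sum_{i=1}^{n} \bigl(y_i - f_\theta(x_i)\bigr)^{2},$$

i.e. the parameters are chosen to minimize the squared error between predicted and observed Y values over the training data; for linear regression, $f_\theta(x)$ is the best fit line discussed next.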
What is the best Fit Line?