Regression

Uploaded by sahdevsneha
Copyright © All Rights Reserved

Definition
• Regression is a statistical approach to analyzing the relationship
between a dependent variable (target variable) and one or more
independent variables (predictor variables). The objective is to
determine the function that best characterizes this relationship.
• It is a supervised machine learning technique used to predict the
value of the dependent variable for new, unseen data. By modeling the
relationship between the input features and the target variable, it
allows numerical values to be estimated or predicted.
Terminology of Regression Analysis

• Response Variable: The primary factor to predict or understand in
regression, also known as the dependent variable or target variable.
• Predictor Variable: Factors influencing the response variable, used to
predict its values; also called independent variables.
• Outliers: Observations with unusually low or high values compared
to the rest of the data; they can distort the fitted model, so they
should be identified and handled carefully.
• Multicollinearity: High correlation among independent variables,
which can complicate the ranking of influential variables.
• Underfitting and Overfitting: Overfitting occurs when a model performs
well on the training data but poorly on the test data, while underfitting
means the model performs poorly on both.
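The overfitting/underfitting distinction can be illustrated with a small Python sketch. It assumes synthetic data (a sine curve plus Gaussian noise, with a fixed random seed) and uses NumPy's `Polynomial.fit` to fit polynomials of increasing degree, then compares the MSE on the training points with the MSE on held-out test points. The data, the degrees, and the seed are illustrative choices, not part of the original material.

```python
import numpy as np

# Synthetic data (illustrative assumption): y = sin(2*pi*x) + Gaussian noise.
rng = np.random.default_rng(0)

def make_data(n):
    x = np.sort(rng.uniform(0.0, 1.0, n))
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))

x_train, y_train = make_data(15)
x_test, y_test = make_data(50)

results = {}
for degree in (1, 3, 9):
    # Least-squares polynomial fit; Polynomial.fit maps x onto [-1, 1] internally for stability.
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_err = mse(y_train, model(x_train))
    test_err = mse(y_test, model(x_test))
    results[degree] = (train_err, test_err)
    print(f"degree {degree}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")
```

Training MSE can only decrease as the degree grows, because each higher-degree model contains the lower-degree ones. In a run like this one would typically expect the degree-1 line to have high error on both sets (underfitting) and the degree-9 fit to have the lowest training error but a larger gap between its training and test error (overfitting); the exact numbers depend on the seed.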
Regression Types

• Simple Regression
  • Used to predict a continuous dependent variable based on a single independent variable.
  • Simple linear regression should be used when there is only one independent variable.
• Multiple Regression
  • Used to predict a continuous dependent variable based on multiple independent variables.
  • Multiple linear regression should be used when there are two or more independent variables.
• Nonlinear (Polynomial) Regression
  • The relationship between the dependent variable and the independent variable(s) follows a nonlinear pattern.
  • Provides flexibility in modeling a wide range of functional forms.
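The three regression types differ mainly in which predictor columns enter the model. As a minimal sketch (Python/NumPy, with toy data generated from known coefficients as an illustrative assumption), all three can be fitted by ordinary least squares with `np.linalg.lstsq`. Polynomial regression is nonlinear in x but still linear in its coefficients, so it only needs an extra x² column.

```python
import numpy as np

# Toy inputs (hypothetical); targets are generated from known coefficients
# so each fit should recover them up to floating-point error.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0, 2.0])
ones = np.ones_like(x)

# Simple linear regression: one predictor, y = 3 + 2x
y_simple = 3 + 2 * x
coef_simple, *_ = np.linalg.lstsq(np.column_stack([ones, x]), y_simple, rcond=None)

# Multiple linear regression: two predictors, y = 1 + 2x - 0.5*x2
y_multi = 1 + 2 * x - 0.5 * x2
coef_multi, *_ = np.linalg.lstsq(np.column_stack([ones, x, x2]), y_multi, rcond=None)

# Polynomial regression: y = 1 - x + 0.5*x^2, fitted on the columns 1, x, x^2
y_poly = 1 - x + 0.5 * x ** 2
coef_poly, *_ = np.linalg.lstsq(np.column_stack([ones, x, x ** 2]), y_poly, rcond=None)

print("simple:", coef_simple)      # close to [3, 2]
print("multiple:", coef_multi)     # close to [1, 2, -0.5]
print("polynomial:", coef_poly)    # close to [1, -1, 0.5]
```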
Characteristics of Regression

• Continuous Target Variable: Regression deals with predicting continuous target
variables that represent numerical values. Examples include predicting house prices,
forecasting sales figures, or estimating patient recovery times.
• Error Measurement: Regression models are evaluated based on their ability to minimize
the error between the predicted and actual values of the target variable. Common error
metrics include mean absolute error (MAE), mean squared error (MSE), and root mean
squared error (RMSE).
• Model Complexity: Regression models range from simple linear models to more
complex nonlinear models. The choice of model complexity depends on the complexity
of the relationship between the input features and the target variable.
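• Overfitting and Underfitting: Regression models are susceptible to both. A model that is
too complex for the data (e.g., a high-degree polynomial) can fit the noise in the training
set, while a model that is too simple (e.g., a straight line for a curved relationship)
misses the underlying pattern.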
• Interpretability: The interpretability of regression models varies depending on the
algorithm used. Simple linear models are highly interpretable, while more complex
models may be more difficult to interpret.
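The three error metrics named above can be written directly from their definitions. This plain-Python sketch uses made-up values for `y_true` and `y_pred`; in practice, scikit-learn's `sklearn.metrics` module provides `mean_absolute_error` and `mean_squared_error`.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error: average of |y - y_hat|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: average of (y - y_hat)^2."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: square root of the MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))

y_true = [3.0, 5.0, 2.0, 7.0]  # hypothetical actual values
y_pred = [2.5, 5.0, 4.0, 8.0]  # hypothetical predictions

print(mae(y_true, y_pred))             # 0.875
print(mse(y_true, y_pred))             # 1.3125
print(round(rmse(y_true, y_pred), 4))  # 1.1456
```

MSE and RMSE penalize the single large error (2.0) more heavily than MAE does, which is why they are more sensitive to outliers.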
Linear Regression
Linear regression is one of the simplest statistical regression
techniques used for predictive analysis in machine learning. It
models a linear relationship between the independent (predictor)
variable, plotted on the X-axis, and the dependent (output) variable,
plotted on the Y-axis. When there is a single input variable X
(independent variable), it is called simple linear regression.
Linear Regression
The goal of the linear regression algorithm is to find
the best values for B0 (intercept) and B1 (slope) of the
best-fit line. The best-fit line is the line with the least
error, meaning the error between the predicted values and
the actual values is minimal.
In linear regression, the Mean Squared Error (MSE) cost
function is generally used; it is the average of the
squared errors between the predicted values and the
actual values:

$$\text{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2, \qquad \hat{y}_i = B_0 + B_1 x_i$$
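For simple linear regression, the B0 and B1 that minimize the MSE have a well-known closed-form (ordinary least squares) solution: B1 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)² and B0 = ȳ − B1·x̄. The sketch below implements it in plain Python; the data points are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (b0, b1) minimizing the MSE of y_hat = b0 + b1 * x (ordinary least squares)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x (the 1/n factors cancel).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the best-fit line passes through the point (x_mean, y_mean).
    b0 = y_mean - b1 * x_mean
    return b0, b1

xs = [1, 2, 3, 4, 5]              # hypothetical inputs
ys = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical noisy outputs
b0, b1 = fit_simple_linear_regression(xs, ys)
print(round(b0, 4), round(b1, 4))  # 0.05 1.99
```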
Gradient Descent for Linear Regression
• Gradient descent is an optimization algorithm that iteratively
minimizes a cost (objective) function to reach an optimal, minimal
solution.
• To find the optimal solution, we need to minimize the cost function
(MSE) over all data points.
• This is done by updating the values of the slope
coefficient (B1) and the constant coefficient (B0)
iteratively until we get an optimal solution for the linear
function.
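A minimal gradient descent sketch for the same model, assuming a fixed learning rate α (0.05 here) and a fixed number of iterations (2000) instead of a convergence test. Each step applies B0 ← B0 − α·∂MSE/∂B0 and B1 ← B1 − α·∂MSE/∂B1, where ∂MSE/∂B0 = (2/n)Σ(ŷi − yi) and ∂MSE/∂B1 = (2/n)Σ(ŷi − yi)·xi. The training data are generated from y = 1 + 2x so that the result can be checked.

```python
def gradient_descent(xs, ys, learning_rate=0.05, n_iters=2000):
    """Fit y_hat = b0 + b1 * x by gradient descent on the MSE cost."""
    n = len(xs)
    b0, b1 = 0.0, 0.0  # arbitrary starting point
    for _ in range(n_iters):
        # Residuals (y_hat - y) for the current coefficients.
        errors = [(b0 + b1 * x) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(errors^2).
        grad_b0 = (2 / n) * sum(errors)
        grad_b1 = (2 / n) * sum(e * x for e, x in zip(errors, xs))
        # Move both coefficients a small step against the gradient.
        b0 -= learning_rate * grad_b0
        b1 -= learning_rate * grad_b1
    return b0, b1

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # generated from y = 1 + 2x
b0, b1 = gradient_descent(xs, ys)
print(round(b0, 3), round(b1, 3))  # 1.0 2.0
```

The learning rate is the key design choice: if it is too large the updates overshoot and diverge, and if it is too small convergence becomes very slow.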
Logistic Regression
• Logistic regression is a supervised machine learning algorithm used
for classification tasks where the goal is to predict the probability that an
instance belongs to a given class or not.
• Logistic regression is used for binary classification: a sigmoid function takes
a linear combination of the independent variables as input and produces a
probability value between 0 and 1.
• Logistic regression predicts the output of a categorical dependent variable.
Therefore, the outcome must be a categorical or discrete value.
• It can be Yes or No, 0 or 1, True or False, etc., but instead of giving the
exact value 0 or 1, it gives probabilistic values that lie between 0
and 1.
• In logistic regression, instead of fitting a regression line, we fit an “S”-shaped
logistic function whose output approaches two limiting values (0 and 1).
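The slides do not say how the logistic regression parameters are learned; a common choice, assumed here, is gradient descent on the average log loss (binary cross-entropy). This one-feature plain-Python sketch uses hypothetical data, learning rate, and iteration count.

```python
import math

def sigmoid(z):
    # Numerically stable form of 1 / (1 + e^(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(xs, ys, learning_rate=0.1, n_iters=10000):
    """Fit P(y = 1 | x) = sigmoid(w * x + b) by gradient descent on the average log loss."""
    n = len(xs)
    w, b = 0.0, 0.0
    for _ in range(n_iters):
        # Predicted probabilities under the current parameters.
        probs = [sigmoid(w * x + b) for x in xs]
        # Gradient of the average log loss: mean((p - y) * x) and mean(p - y).
        grad_w = sum((p - y) * x for p, y, x in zip(probs, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(probs, ys)) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical data: one feature, binary labels that overlap in the middle.
xs = [1, 2, 3, 4, 5, 6]
ys = [0, 0, 1, 0, 1, 1]
w, b = train_logistic_regression(xs, ys)
for x in xs:
    p = sigmoid(w * x + b)
    print(x, round(p, 3), 1 if p >= 0.5 else 0)
```

Because the labels overlap (x = 3 is labeled 1 while x = 4 is labeled 0), no single increasing S-curve can classify every point correctly; the model still outputs a probability for each point, which is then turned into a label by a threshold.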
Sigmoid Function
• Sigmoid function is a mathematical function that has an “S”-shaped
curve (sigmoid curve). It is widely used in various fields, including
machine learning, statistics, and artificial intelligence, particularly for its
smooth and bounded nature.
• In logistic regression, a threshold value (commonly 0.5) converts the
predicted probability into a class label: values above the threshold
are mapped to 1, and values below it are mapped to 0.
• The logistic regression model transforms the continuous output of a
linear regression function into a categorical output using the sigmoid
function, which maps any real-valued input computed from the independent
variables to a value between 0 and 1.
Sigmoid function:

$$\sigma(z) = \frac{1}{1 + e^{-z}}$$

where $z = w \cdot X + b$.
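The sigmoid and threshold steps can be sketched in plain Python as follows. The weights `w`, bias `b`, and input vector are hypothetical values, and the 0.5 threshold (with a probability exactly at the threshold mapped to 1) is an assumed convention.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # for z < 0, e^z is small and cannot overflow
    return e / (1.0 + e)

def predict(x, w, b, threshold=0.5):
    """Return (probability, class label) for one input vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # z = w . X + b
    p = sigmoid(z)
    label = 1 if p >= threshold else 0  # apply the decision threshold
    return p, label

w = [0.5, -0.25]  # hypothetical weights
b = -0.25         # hypothetical bias

print(sigmoid(0.0))  # 0.5
p, label = predict([2.0, 1.0], w, b)
print(round(p, 4), label)  # 0.6225 1
```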
