
Module 4

Introduction to Machine Learning - 4CS1201
Regression

• Regression is a supervised learning technique used to predict continuous numerical values based on input features.
• It aims to establish a functional relationship between independent variables and a dependent variable, such as predicting house prices based on features like size, bedrooms, and location.
• The goal is to minimize the difference between predicted and actual values.
Example
• The dataset contains data related to CO2 emissions from different cars.
• It includes attributes such as car engine size, number of cylinders, fuel consumption, and CO2 emission for various automobile models.
• The dataset contains historical data from different cars.
• To estimate the approximate CO2 emission from a new car model after its production, we can use a machine learning regression model.
• In regression, there are two types of variables: a dependent variable
and one or more independent variables.
• The dependent variable is the "state", "target" or "final goal" we
study and try to predict, and the independent variables, also known as
explanatory variables, are the causes of those "states".
• The independent variables are shown conventionally by X, and the
dependent variable is denoted by Y. A regression model relates Y, or
the dependent variable, to a function of X, i.e., the independent
variables.
• The key point in regression is that the dependent variable value is
continuous.
• However, the independent variable or variables can be measured on
either a categorical or continuous measurement scale.
• In our hypothetical example of estimating the approximate CO2
emission from a new car model after its production, we can use the
historical data of some cars, using one or more of their features, to
make a machine learning model.
• We can use regression to build such a regression/estimation model.
• The model can then be used to predict the expected CO2 emission
for a new or unknown car model.
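As a minimal sketch of this idea (the numbers below are hypothetical, since the actual dataset is not reproduced here), such an estimation model could be trained like this:

Python

from sklearn.linear_model import LinearRegression

# Hypothetical historical data: [engine size (L), cylinders, fuel consumption (L/100 km)]
X = [[2.0, 4, 8.5],
     [3.5, 6, 11.1],
     [1.6, 4, 7.0],
     [5.0, 8, 14.7]]
y = [196, 255, 160, 338]  # CO2 emission (g/km) for each car

model = LinearRegression()
model.fit(X, y)

# Estimate the CO2 emission for a new car model
new_car = [[2.4, 4, 9.2]]
print(model.predict(new_car))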
Real-World Applications of Regression
• Essentially, we use regression when we want to estimate a continuous value.
1. One of the applications of regression analysis could be in the area of sales forecasting. We can try to predict a salesperson's total yearly sales from independent variables such as age, education, and years of experience in the sales profession.
2. We can use regression analysis to predict the price of a house in an area, based on its size, number of bedrooms, and so on.
3. We can even use regression analysis to predict employment income from independent variables such as hours of work, education, occupation, sex, age, years of experience, and so on.
Linear Regression
• Linear regression is a type of supervised machine learning algorithm that computes the linear relationship between a dependent variable and one or more independent features.
• When there is only one independent feature, it is known as univariate linear regression.
• In the case of more than one feature, it is known as multivariate linear regression.
The formula for linear regression is:
y = θx + b
where:
• θ – the model weights or parameters
• b – the bias (intercept)
Python

from sklearn.linear_model import LinearRegression

# Example training data (hypothetical): one feature per sample
X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.1, 4.2, 6.1, 8.3]

# Create a linear regression model
model = LinearRegression()

# Fit the model to the data
model.fit(X, y)

# Predict the response for a new data point
X_new = [[5.0]]
y_pred = model.predict(X_new)
Applications
• Economics and Finance: Linear regression is extensively used in economics and
finance for modeling relationships between variables. It can be used to analyze the
impact of interest rates on consumer spending, predict stock prices based on
historical data, assess the relationship between supply and demand, etc.
• Marketing and Business Analysis: Linear regression helps businesses
understand the relationship between marketing expenditures and sales. It can be
used to predict sales volumes based on advertising budgets, pricing strategies, or
market demographics. Companies often use regression analysis to optimize
marketing campaigns and maximize returns on investment.
• Healthcare: Linear regression is employed in healthcare to analyze the
relationship between medical treatments and patient outcomes. It can be used to
predict patient recovery times based on treatment protocols, assess the
effectiveness of new drugs, or forecast disease prevalence based on demographic
factors.
• Environmental Science: In environmental science, linear regression is used to model relationships between environmental variables. For example, it can be applied to predict the impact of pollution levels on public health, analyze the relationship between temperature and precipitation, or forecast changes in environmental conditions over time.
• Social Sciences: Linear regression is widely used in social sciences to study
various phenomena such as crime rates, educational outcomes, and demographic
trends. Researchers use regression analysis to identify factors influencing social
behaviors, predict voting patterns, or understand the determinants of income
inequality.
• Engineering: In engineering disciplines, linear regression is used for modeling
and optimization tasks. It can be applied to predict equipment failure rates based
on usage patterns, analyze the relationship between process parameters and
product quality, or optimize the design of mechanical systems.
• Sports Analytics: Linear regression is increasingly used in sports analytics to
analyze player performance, predict game outcomes, and optimize team strategies.
Analysts use regression models to identify key performance indicators, assess
player value, and inform decision-making in drafting, trading, and player
development.
• Forecasting: Linear regression is commonly used for time series forecasting in
various domains such as sales forecasting, demand forecasting, and financial
forecasting. By analyzing historical data trends, regression models can provide
valuable insights into future outcomes, helping businesses make informed
decisions and plan for the future.
Simple Linear Regression
• Simple Linear Regression is a type of regression algorithm that models the relationship between a dependent variable and a single independent variable.
• The relationship shown by a Simple Linear Regression model is linear, i.e., a sloped straight line, hence it is called Simple Linear Regression.
• The key point in Simple Linear Regression is that the dependent variable must be a continuous/real value.
• The independent variable can be measured on a continuous or categorical scale.
The Simple Linear Regression algorithm has mainly two objectives:
• Model the relationship between the two variables, such as the relationship between income and expenditure, or experience and salary.
• Forecast new observations, such as forecasting weather according to temperature, or a company's revenue according to its investments.
• Simple linear regression is a statistical method you can use to
understand the relationship between two variables, x and y.
• x, is known as the predictor variable.
• y, is known as the response variable.

y = β0 + β1X
where:
• y is the dependent variable
• X is the independent variable
• β0 is the intercept
• β1 is the slope
• For example, suppose we have a dataset with the weight and height of seven individuals.
• Let weight be the predictor variable and let height be the response variable.
• If we graph these two variables using a scatterplot, with weight on the x-axis and height on the y-axis, we can clearly see that as weight increases, height tends to increase as well. But to actually quantify this relationship between weight and height, we need to use linear regression.
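A minimal sketch of this fit, using hypothetical weight/height values (the original table from the slide is not reproduced here), via NumPy's least-squares polyfit:

Python

import numpy as np

# Hypothetical data: weight (kg) as predictor, height (cm) as response
weight = np.array([62, 64, 69, 73, 74, 81, 83])
height = np.array([157, 160, 164, 168, 170, 174, 176])

# Fit height = b0 + b1 * weight by ordinary least squares
b1, b0 = np.polyfit(weight, height, deg=1)
print(f"height ≈ {b0:.2f} + {b1:.2f} * weight")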
Implementation
• Simple Linear Regression in Machine learning - Javatpoint
Multiple Linear Regression
• Multiple linear regression is used to estimate the relationship
between two or more independent variables and one dependent
variable. You can use multiple linear regression when you want to
know:
1.How strong the relationship is between two or more
independent variables and one dependent variable (e.g. how rainfall,
temperature, and amount of fertilizer added affect crop growth).
2.The value of the dependent variable at a certain value of the
independent variables (e.g. the expected yield of a crop at certain
levels of rainfall, temperature, and fertilizer addition).
Multiple Linear Regression
The multiple linear regression equation is:

Y = B0 + B1X1 + B2X2 + … + BnXn + e

• Below is a list of what each variable represents:
• Y = the dependent or response variable. This is the variable you are looking to predict.
• B0 = the y-intercept, which is the value of Y when all other parameters (independent variables and error term) are set to 0.
• B1X1 = B1 is the coefficient of the first independent variable (X1) in your model. It can be interpreted as the effect that changing the value of that independent variable has on the predicted Y value, holding all else equal: when X1 goes up by one unit, the predicted Y goes up by B1.
• "…" = the additional variables you have in your model.
• e = the model error term. It accounts for the variation in Y that the model does not explain.
Steps Involved in any Multiple Linear Regression Model
• Step #1: Data Pre-Processing
1. Importing the libraries.
2. Importing the dataset.
3. Encoding the categorical data.
4. Avoiding the dummy variable trap.
5. Splitting the dataset into a training set and a test set.
• Step #2: Fitting Multiple Linear Regression to the training set.
• Step #3: Predicting the test set results (a sketch of these steps follows below).
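A minimal sketch of these steps with scikit-learn, on a small hypothetical dataset (the column names and values are illustrative, not from the slides). OneHotEncoder(drop="first") handles the dummy variable trap by dropping one category level:

Python

import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: [rainfall (mm), temperature (°C), fertilizer type]
X = np.array([[100, 22, "A"], [80, 25, "B"], [120, 20, "A"],
              [90, 24, "B"], [110, 21, "A"], [95, 23, "B"]], dtype=object)
y = np.array([3.1, 2.5, 3.6, 2.8, 3.4, 2.9])  # crop yield (t/ha)

# Encode the categorical column; drop="first" avoids the dummy variable trap
preprocess = ColumnTransformer(
    [("cat", OneHotEncoder(drop="first"), [2])],
    remainder="passthrough",
)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# Fit multiple linear regression to the training set, then predict the test set
model = make_pipeline(preprocess, LinearRegression())
model.fit(X_train, y_train)
y_pred = model.predict(X_test)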
Implementation
• Multiple Linear Regression With scikit-learn – GeeksforGeeks

• Introduction to Multiple Linear Regression – Statology


• Multiple Linear Regression in Machine learning - Javatpoint
Polynomial Regression
• Polynomial Regression is a regression algorithm that models the relationship between a dependent variable y and an independent variable (or vector of independent variables) x as an nth-degree polynomial. The Polynomial Regression equation is given below:

y = b0 + b1x + b2x² + … + bnxⁿ

• It is also called a special case of Multiple Linear Regression in ML, because we add some polynomial terms to the Multiple Linear Regression equation to convert it into Polynomial Regression.
• It is a linear model with some modification in order to increase the accuracy.
• It makes use of a linear regression model to fit complicated, non-linear functions and datasets.
• The dataset used in Polynomial Regression for training is of a non-linear nature.
• Hence, "In Polynomial Regression, the original features are converted into polynomial features of the required degree (2, 3, …, n) and then modeled using a linear model."
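A minimal sketch of this feature conversion with scikit-learn's PolynomialFeatures (the input values are illustrative):

Python

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0], [3.0]])  # original single feature x

# Convert x into polynomial features [1, x, x^2]
poly = PolynomialFeatures(degree=2)
X_poly = poly.fit_transform(X)
print(X_poly)
# [[1. 2. 4.]
#  [1. 3. 9.]]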
Need for Polynomial Regression:
• If we apply a linear model to a linear dataset, it gives us a good result, as we have seen in Simple Linear Regression. But if we apply the same model, without any modification, to a non-linear dataset, the results are drastically worse: the loss function increases, the error rate is high, and accuracy decreases.
• So for such cases, where data points are arranged in a non-linear fashion, we need the Polynomial Regression model.
• We can understand this better by comparing a linear dataset and a non-linear dataset. [Comparison diagram shown on slide.]
• For a dataset arranged non-linearly, if we try to cover it with a linear model, we can clearly see that it hardly covers any data point. On the other hand, a curve, which is what the polynomial model provides, is suitable to cover most of the data points.
• Hence, if the data are arranged in a non-linear fashion, we should use the Polynomial Regression model instead of Simple Linear Regression.
• Steps :
• Data Preparation: Like any machine learning task, you need to prepare your
dataset. This involves cleaning the data, handling missing values, and splitting it
into training and testing sets.
• Feature Engineering: In polynomial regression, you might need to create
additional features by raising the original features (independent variables) to
different powers. For example, if you have a feature x, you might create new
features up to the desired degree.
• Model Selection: Choose the degree of the polynomial that best fits your data.
• Model Fitting: Once you've chosen the degree of the polynomial, fit the
polynomial regression model to your training data. This involves estimating the
coefficients of the polynomial terms that minimize the error between the
predicted and actual values.
• Model Evaluation: Evaluate the performance of your model using metrics like Mean Squared Error (MSE), Root Mean Squared Error (RMSE), R-squared, etc., on the testing dataset.
• Prediction: Use the trained model to make predictions on new, unseen data.
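A minimal sketch of these steps, fitting a degree-2 polynomial with a scikit-learn pipeline on hypothetical non-linear data:

Python

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical non-linear data: y ≈ 1 + 2x + 3x² plus noise
rng = np.random.default_rng(0)
X = np.linspace(-3, 3, 60).reshape(-1, 1)
y = 1 + 2 * X.ravel() + 3 * X.ravel() ** 2 + rng.normal(0, 1, 60)

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Feature engineering (polynomial features) + model fitting in one pipeline
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)

# Model evaluation on the test set, then prediction on new data
y_pred = model.predict(X_test)
print("MSE:", mean_squared_error(y_test, y_pred))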
Applications of polynomial regression:
• Curve Fitting: Polynomial regression is often used in curve fitting applications
where the relationship between variables is non-linear. For example, in physics,
it can be used to fit data to equations describing physical phenomena such as
projectile motion or the behavior of a spring.
• Finance: In finance, polynomial regression can be used to model the relationship
between factors such as interest rates, economic indicators, and stock prices.
For instance, it can be employed to analyze the behavior of stock prices over time,
taking into account non-linear patterns.
• Economics: Polynomial regression can be applied in economics to study the
relationship between economic variables like GDP, inflation rates,
unemployment rates, and other factors affecting economic growth. It allows
economists to capture non-linear trends in the data.
• Environmental Science: In environmental science, polynomial regression can be
used to analyze trends in environmental data such as temperature changes,
pollution levels, or species population dynamics. It helps researchers understand
the complex relationships between various environmental factors.
• Medicine and Biology: Polynomial regression is used in medical research and biology to model the relationship
between variables such as dosage and response in drug trials, growth patterns of organisms, or disease
progression. It enables researchers to identify non-linear relationships in biological data.
• Marketing and Sales: In marketing and sales, polynomial regression can be employed to analyze consumer
behavior, sales trends, and market demand. It helps businesses understand the non-linear relationship
between factors like advertising expenditure and sales revenue.
• Signal Processing: Polynomial regression can be used in signal processing applications such as noise filtering,
audio and image processing, and signal reconstruction. It helps in capturing non-linear patterns in signals and
extracting meaningful information.
• Geology and Geophysics: In geology and geophysics, polynomial regression can be utilized to analyze
geological data such as seismic measurements, rock properties, or soil composition. It assists in
understanding the non-linear relationships between geological variables.
• Quality Control and Manufacturing: Polynomial regression can be applied in manufacturing processes for
quality control and process optimization. It helps in modeling the relationship between process parameters
and product quality, identifying non-linear patterns affecting manufacturing outcomes.
• Astronomy and Astrophysics: Polynomial regression is used in astronomy and astrophysics to analyze
observational data, model celestial phenomena, and predict astronomical events. It helps researchers
understand the complex relationships between astronomical variables.
Implementation

• Implementation of Polynomial Regression - GeeksforGeeks


• Machine learning Polynomial Regression - Javatpoint
Logistic Regression
• Logistic regression is a statistical method used for
binary classification tasks, where the target variable
(or dependent variable) is categorical and has only two
possible outcomes, typically represented as 0 and 1.
• It's named "logistic" because it's based on the logistic
function, also known as the sigmoid function.
• The logistic regression model estimates the
probability that a given input belongs to a particular
category. Unlike linear regression, which predicts
continuous outcomes, logistic regression predicts the
probability of a binary outcome.
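The sigmoid function mentioned above, σ(z) = 1 / (1 + e^(−z)), maps any real number into the interval (0, 1), which is what lets the model output a probability. A minimal sketch with scikit-learn, on hypothetical data (hours studied vs. pass/fail):

Python

import numpy as np
from sklearn.linear_model import LogisticRegression

# Sigmoid: sigma(z) = 1 / (1 + e^(-z))
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical binary data: hours studied -> pass (1) / fail (0)
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

# Predicted probabilities of each class for a new input
print(clf.predict_proba([[3.5]]))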
Applications of logistic regression
• Medical Diagnosis: Predicting whether a patient has a certain disease based on
symptoms and medical test results.
• Credit Scoring: Determining the likelihood of a customer defaulting on a loan
based on their credit history and financial information.
• Marketing: Predicting whether a customer will respond positively to a
marketing campaign or purchase a product.
• Fraud Detection: Identifying fraudulent transactions based on transaction
patterns and customer behavior.
• Customer Churn Prediction: Predicting whether a customer will stop using a
service or unsubscribe based on their usage patterns and demographics.
• Sentiment Analysis: Classifying text data (e.g., customer reviews, social media
posts) as positive or negative sentiment.
• Click-Through Rate Prediction: Predicting the likelihood of a user clicking on
an online advertisement based on user demographics and ad features.
Types of Logistic Regression
On the basis of the categories, Logistic Regression can be classified into three types:
• Binomial: In binomial Logistic Regression, there can be only two possible types of the dependent variable, such as 0 or 1, Pass or Fail, etc.
• Multinomial: In multinomial Logistic Regression, there can be 3 or more possible unordered types of the dependent variable, such as "cat", "dog", or "sheep".
• Ordinal: In ordinal Logistic Regression, there can be 3 or more possible ordered types of the dependent variable, such as "low", "medium", or "high".
Implementation
• Logistic Regression in Machine Learning - Javatpoint

• Logistic Regression in Machine Learning - GeeksforGeeks


Metrics for Evaluating Regression Model Performance
• Mean Absolute Error (MAE)
In the fields of statistics and machine learning, the Mean Absolute Error (MAE) is a frequently employed metric. It is a measurement of the typical absolute discrepancies between a dataset's actual values and predicted values.

The formula to calculate MAE for data with "n" data points is:

MAE = (1/n) Σ |xi − yi|   (sum over i = 1 … n)

where:
• xi represents the actual or observed value for the i-th data point.
• yi represents the predicted value for the i-th data point.
• Mean Squared Error (MSE)
A popular metric in statistics and machine learning is the Mean Squared Error (MSE). It measures the average of the squared discrepancies between a dataset's actual values and predicted values. MSE is frequently utilized in regression problems and is used to assess how well predictive models work.

For a dataset containing "n" data points, the MSE calculation formula is:

MSE = (1/n) Σ (xi − yi)²   (sum over i = 1 … n)

where:
• xi represents the actual or observed value for the i-th data point.
• yi represents the predicted value for the i-th data point.
• R-squared (R²) Score
A statistical metric frequently used to assess the goodness of fit of a regression model is the R-squared (R²) score, also referred to as the coefficient of determination. It quantifies the percentage of the dependent variable's variation that the model's independent variables explain. R² is a useful statistic for evaluating the overall effectiveness and explanatory power of a regression model.

The formula to calculate the R-squared score is as follows:

R² = 1 − (SSR / SST)

where:
• R² is the R-squared score.
• SSR represents the sum of squared residuals between the predicted values and actual values.
• SST represents the total sum of squares, which measures the total variance in the dependent variable.
• Root Mean Squared Error (RMSE)
RMSE stands for Root Mean Squared Error. It is a commonly used metric in regression analysis and machine learning to measure the accuracy or goodness of fit of a predictive model, especially when the predictions are continuous numerical values.

The RMSE quantifies how well the predicted values from a model align with the actual observed values in the dataset.

The formula for RMSE for data with "n" data points is as follows:

RMSE = √( (1/n) Σ (xi − yi)² )   (sum over i = 1 … n)

where:
• RMSE is the Root Mean Squared Error.
• xi represents the actual or observed value for the i-th data point.
• yi represents the predicted value for the i-th data point.
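A minimal sketch computing all four metrics with scikit-learn, on hypothetical actual and predicted values:

Python

import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Hypothetical actual (xi) and predicted (yi) values
x_actual = np.array([3.0, 5.0, 2.5, 7.0, 4.5])
y_pred = np.array([2.8, 5.4, 2.1, 6.7, 4.9])

mae = mean_absolute_error(x_actual, y_pred)
mse = mean_squared_error(x_actual, y_pred)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE
r2 = r2_score(x_actual, y_pred)

print(f"MAE={mae:.3f}  MSE={mse:.3f}  RMSE={rmse:.3f}  R²={r2:.3f}")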
Thank you