7. Machine Learning - Develop machine learning model - Regression

The document outlines the process of developing a machine learning model, focusing on regression techniques to predict continuous target variables like vehicle prices. It discusses various regression methods, including Linear Regression, Random Forest Regressor, Decision Tree Regressor, and Support Vector Regressor, along with evaluation metrics such as RMSE and R-squared. Additionally, it covers the importance of data preparation, model training, and testing, as well as the advantages and applications of different regression models.

Develop machine learning model

 Before developing a machine learning model:


 Define the target variable

 You split your dataset into training and testing sets using the train_test_split function from Scikit-
learn.
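A minimal sketch of this split, assuming a pandas DataFrame df that already contains the cleaned data and a selling_price column (the variable names are assumptions for illustration):

from sklearn.model_selection import train_test_split

X = df.drop(columns=["selling_price"])   # features (independent variables)
y = df["selling_price"]                  # target variable (dependent variable)

# 80% of the rows go to training, 20% to testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)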
Develop machine learning model

 Why Do We Use random_state?


 Consistent reproducibility
 Collaboration
 Etc.
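A small illustration of the reproducibility point: with the same random_state the split is identical on every run (a sketch with toy data):

import numpy as np
from sklearn.model_selection import train_test_split

data = np.arange(10).reshape(-1, 1)

# Same random_state -> the two splits contain exactly the same rows.
a_train, a_test = train_test_split(data, test_size=0.3, random_state=42)
b_train, b_test = train_test_split(data, test_size=0.3, random_state=42)
print((a_test == b_test).all())   # True: reproducible split

# Omitting random_state (or changing it) generally gives a different split each run.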
Develop machine learning model

 Model Building
 Since our target variable (selling_price) is continuous, we will use regression
techniques, i.e.:
 Linear Regression,
 Random Forest Regressor,
 Decision Tree Regressor, and
 Support Vector Regressor (SVR).
 We train each model using the training data and then test how well they could predict vehicle
prices using both the training and testing data.
 We used metrics like Root Mean Squared Error (RMSE) and R-squared values to see how
accurate each model is.
 This helps us understand which method works best for predicting vehicle prices accurately.
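A hedged sketch of this workflow, assuming the X_train/X_test/y_train/y_test split from the earlier sketch and scikit-learn 1.4+ for root_mean_squared_error (older versions use mean_squared_error(..., squared=False)):

from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.svm import SVR
from sklearn.metrics import root_mean_squared_error, r2_score

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regressor": RandomForestRegressor(random_state=42),
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=42),
    "Support Vector Regressor": SVR(),
}

for name, model in models.items():
    model.fit(X_train, y_train)                      # train on the training data
    for split, X_, y_ in [("train", X_train, y_train), ("test", X_test, y_test)]:
        pred = model.predict(X_)
        rmse = root_mean_squared_error(y_, pred)     # error in the target's units
        r2 = r2_score(y_, pred)                      # closer to 1 = better fit
        print(f"{name} ({split}): RMSE={rmse:.2f}, R2={r2:.3f}")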
Develop machine learning model

Why do we use metrics like Root Mean Squared Error (RMSE) and R-squared to see how accurate each model is?
Develop machine learning model

Why is RMSE used?


 Measures Prediction Error.
 RMSE represents the average deviation of the model's predictions from the actual values.
 Smaller RMSE values indicate better model performance.
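For reference, RMSE is the square root of the mean squared difference between predictions and actual values. A tiny check with made-up numbers (the values are hypothetical; root_mean_squared_error assumes scikit-learn 1.4+):

import numpy as np
from sklearn.metrics import root_mean_squared_error

y_true = np.array([250, 300, 410])    # actual prices (hypothetical)
y_pred = np.array([260, 290, 400])    # model predictions (hypothetical)

rmse_manual = np.sqrt(np.mean((y_true - y_pred) ** 2))
print(rmse_manual)                                  # 10.0
print(root_mean_squared_error(y_true, y_pred))      # same value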
Develop machine learning model

Why is RMSE used?


 Punishes Larger Errors
 RMSE squares the differences between predicted and actual values, making large errors more
significant than smaller ones.
 This is useful if large deviations are especially undesirable in your problem.
Develop machine learning model

Why is RMSE used?


 Interpretable in Original Units
 RMSE has the same units as the target variable, making it easier to interpret in a real-world context.
 For example, if the target variable is in dollars, RMSE tells you the average error in dollars.
Develop machine learning model

RMSE Limitations
 RMSE is sensitive to outliers since errors are squared.
 A single large error can disproportionately affect the RMSE.
Develop machine learning model

Why R-squared (R²)?

 R-squared indicates how well the independent variables explain the variability of the
dependent variable, with values closer to 1 suggesting a better fit.
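R-squared can be read as 1 minus the ratio of the unexplained (residual) variation to the total variation. A minimal check with the same kind of toy numbers (hypothetical values):

import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([250, 300, 410])   # actual values (hypothetical)
y_pred = np.array([260, 290, 400])   # predictions (hypothetical)

ss_res = np.sum((y_true - y_pred) ** 2)          # unexplained variation
ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total variation
print(1 - ss_res / ss_tot)                        # manual R-squared
print(r2_score(y_true, y_pred))                   # same value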
Develop machine learning model

Regression Analysis In ML
 Regression analysis is a statistical technique that predicts continuous numeric values based on the
relationship between independent and dependent variables.
 The main goal of regression analysis is to plot a line or curve that best fits the data and to estimate
how one variable affects another.
 Regression analysis is a fundamental concept in machine learning and it is used in many
applications such as forecasting, predictive analytics, etc.
 Regression models use the input data features (independent variables) and their corresponding
continuous numeric output values (dependent or outcome variables) to learn specific associations
between inputs and corresponding outputs.
Develop machine learning model

Terminologies in Regression Analysis


 Independent Variables: Predictors or features used to estimate the dependent variable.
 Dependent Variables: Target variables whose values are predicted.
 Regression Line: A line or curve that best fits the data points.
 Overfitting: Occurs when a model performs well on training data but poorly on test data (high
variance).
 Underfitting: Happens when the model fails to capture patterns in training data (high bias).
 Outliers: Extreme values that deviate significantly from the rest of the data.
 Multicollinearity: When independent variables are highly correlated with each other.
Develop machine learning model

Types of Regression in Machine Learning


Generally, the classification of regression methods is done based on the three metrics:
1. the number of independent variables,
2. type of dependent variables, and
3. shape of the regression line.
Develop machine learning model

Types of Regression in Machine Learning


Based on these criteria, commonly used regression methods in machine learning include:
 Linear Regression
 Logistic Regression
 Polynomial Regression
 Lasso Regression
 Ridge Regression
 Decision Tree Regression
 Random Forest Regression
 Support Vector Regression
Develop machine learning model

Linear Regression
 Linear Regression is a supervised learning algorithm used for predicting a continuous target
variable based on one or more input variables (features).
 It assumes a linear relationship between the dependent and independent variables and uses a linear
equation to model this relationship.
Develop machine learning model

What is Linear Regression?


 Linear regression is a statistical technique that estimates the linear relationship
between a dependent and one or more independent variables.
 In machine learning, linear regression is implemented as a supervised learning
approach.
 In machine learning, labeled datasets contain input data (features) and output labels
(target values).
 For linear regression in machine learning, we represent features as independent
variables and target values as the dependent variable.
Develop machine learning model

Linear Regression
 Linear regression is the most commonly used regression model in machine learning.
 It may be defined as the statistical model that analyses the linear relationship between a dependent
variable and a given set of independent variables.
 A linear relationship between variables means that when the value of one or more independent variables
changes (increase or decrease), the value of the dependent variable will also change accordingly (increase
or decrease).
 Linear regression is further divided into two subcategories:
1. simple linear regression and
2. multiple linear regression (also known as multivariate linear regression).
Develop machine learning model

Simple Linear Regression


 In simple linear regression, a single independent variable (or predictor) is used to predict the dependent
variable.
 Mathematically, simple linear regression can be represented as follows: Y = mX + b
Where,
 Y: is the dependent variable we are trying to predict.
 X: is the independent variable we are using to make predictions.
 m: is the slope of the regression line, which represents the effect X has on Y
 b: is a constant known as the Y-intercept. If X=0, Y would be equal to b
Develop machine learning model

Simple Linear Regression (Single feature and single target)

Square Feet (X) House Price (Y)


1300 240
1500 320
1700 330
1830 295
1550 256
2350 409
1450 319
Develop machine learning model
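A minimal sketch of fitting a simple linear regression to the square-feet / price pairs in the table above (values copied from the slide; the 2000 sq ft query is an illustrative example):

import numpy as np
from sklearn.linear_model import LinearRegression

# Square footage (X) and house price (Y) from the table above.
X = np.array([[1300], [1500], [1700], [1830], [1550], [2350], [1450]])
y = np.array([240, 320, 330, 295, 256, 409, 319])

model = LinearRegression().fit(X, y)
print(model.coef_[0], model.intercept_)   # slope m and intercept b of Y = mX + b
print(model.predict([[2000]]))            # predicted price for a 2000 sq ft house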

Simple Linear Regression

Y = w0 + w1X + ε, where:

• Y is the dependent variable (target).
• X is the independent variable (feature).
• w0 is the y-intercept of the line.
• w1 is the slope of the line, representing the effect of X on Y.
• ε is the error term, capturing the variability in Y not explained by X.
Develop machine learning model

Simple Linear Regression


Develop machine learning model
Exercises (Simple Linear Regression)

1. Perform Data Preparation
2. Check the correlation in data
3. Check if the dataset is linear or not (check data dispersion)
4. Split the dataset into training and testing sets
5. Perform Model Training (fitting the Simple Linear Regression to the Training Set)
6. Perform Model Testing
7. Perform Model Evaluation (root_mean_squared_error, mean_absolute_error and r2_score)
8. Visualize the Training Set Results (with Regression Line)
9. Visualize the Test Set Results (with Regression Line)
10. Predict for new values
11. Find the intercept and slope

Dataset:

Kilometers_driven   Selling_price
1.1                 39343
1.3                 46205
1.5                 37731
2                   43525
2.2                 39891
2.9                 56642
3                   60150
3.2                 54445
3.2                 64445
3.7                 57189
3.9                 63218
4                   55794
4                   56957
4.1                 57081
4.5                 61111
4.9                 67938
5.1                 66029
5.3                 83088
5.9                 81363
6                   93940
6.8                 91738
7.1                 98273
7.9                 101302
8.2                 113812
8.7                 109431
9                   105582
9.5                 116969
9.6                 112635
10.3                122391
10.5                121872
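A hedged sketch of the exercise workflow (steps 4–7, 10 and 11), assuming the table above has been saved to a CSV file; the file name and column names are assumptions:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import (root_mean_squared_error,
                             mean_absolute_error, r2_score)

df = pd.read_csv("vehicles.csv")          # assumed file name
X = df[["Kilometers_driven"]]
y = df["Selling_price"]

# 4. Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# 5. Model training (fit simple linear regression to the training set)
model = LinearRegression().fit(X_train, y_train)

# 6.-7. Model testing and evaluation
y_pred = model.predict(X_test)
print("RMSE:", root_mean_squared_error(y_test, y_pred))
print("MAE :", mean_absolute_error(y_test, y_pred))
print("R2  :", r2_score(y_test, y_pred))

# 10. Predict for a new value
print(model.predict(pd.DataFrame({"Kilometers_driven": [5.0]})))

# 11. Intercept and slope
print("intercept:", model.intercept_, "slope:", model.coef_[0])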
Develop machine learning model

Multiple Linear Regression


Multiple linear regression is basically the extension of simple linear regression that predicts a
response using two or more features.
Develop machine learning model

Exercises (Multiple Linear Regression )


1. Perform Data Preparation
2. Check the correlation in data
3. Check if the dataset is linear or not(check data dispersion)
4. Split the dataset into training and testing sets
5. Perform Model Training (Fitting the Multiple Linear Regression to Training Set)
6. Perform Model Testing
7. Perform Model Evaluation(root_mean_squared_error, mean_absolute_error and
r2_score)
8. Predict for new values
9. Find the intercept and slope
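A minimal multiple linear regression sketch with more than one feature; the file name and feature columns below are assumptions for illustration:

import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import root_mean_squared_error, r2_score

df = pd.read_csv("vehicles.csv")                            # assumed file name
features = ["Kilometers_driven", "year", "engine_size"]     # assumed feature columns
X = df[features]
y = df["Selling_price"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("RMSE:", root_mean_squared_error(y_test, y_pred))
print("R2  :", r2_score(y_test, y_pred))
print("intercept:", model.intercept_)
print("slopes   :", dict(zip(features, model.coef_)))   # one coefficient per feature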
Develop machine learning model

Random Forest Regressor


 A random forest is an ensemble learning method that combines the predictions from multiple
decision trees to produce a more accurate and stable prediction.
 It is used for predicting numerical values.
 It is a type of supervised learning algorithm that can be used for both classification and regression
tasks.
 It predicts continuous values by averaging the results of multiple decision trees.
Develop machine learning model

Working of Random Forest Regression


 Random Forest Regression works by creating multiple decision trees, each trained on a random subset of the
data.
 After the trees are trained, each tree makes a prediction, and the final prediction for regression tasks is the
average of all the individual tree predictions; this process is called Aggregation.
 This approach is beneficial because individual decision trees may have high variance and are prone to
overfitting, especially with complex data.
 However, by averaging the predictions from multiple decision trees, Random Forest minimizes this variance,
leading to more accurate and stable predictions and hence improving the generalization of the model.
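A brief sketch of this in practice with scikit-learn's RandomForestRegressor, reusing the train/test split from the earlier sketches; n_estimators is the number of trees whose predictions are averaged:

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import root_mean_squared_error, r2_score

# 100 trees, each trained on a bootstrap sample of the training data;
# the forest's prediction is the average of the individual tree predictions.
forest = RandomForestRegressor(n_estimators=100, random_state=42)
forest.fit(X_train, y_train)

y_pred = forest.predict(X_test)
print("RMSE:", root_mean_squared_error(y_test, y_pred))
print("R2  :", r2_score(y_test, y_pred))

# The individual fitted trees are available in forest.estimators_;
# averaging their outputs reproduces the forest's prediction.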
Develop machine learning model

Applications of Random Forest Regression


 Predicting continuous numerical values: Predicting house prices, stock prices or customer lifetime value.
 Identifying risk factors: Detecting risk factors for diseases, financial crises or other negative events.
 Handling high-dimensional data: Analysing datasets with a large number of
input features.
 Capturing complex relationships: Modeling complex relationships between input features and the target
variable.
Develop machine learning model

Advantages of Random Forest Regression


 Handles Non-Linearity: It can capture complex, non-linear relationships in the data
that other models might miss.
 Reduces Overfitting: By combining multiple decision trees and averaging predictions
it reduces the risk of overfitting compared to a single decision tree.
 Robust to Outliers: Random Forest is less sensitive to outliers as it aggregates the
predictions from multiple trees.
 Works Well with Large Datasets: It can efficiently handle large datasets and high-
dimensional data without a significant loss in performance.
 Handles Missing Data: Random Forest can handle missing values by using surrogate
splits and maintaining high accuracy even with incomplete data.
 No Need for Feature Scaling: Unlike many other algorithms Random Forest does not
require normalization or scaling of the data.
Develop machine learning model

Quiz
1. What is the difference between random forest and regression?
2. Why is random forest better than regression?
Develop machine learning model

Decision Tree Regressor


 Unlike traditional linear regression, which assumes a straight-line relationship between input
features and the target variable, Decision Tree Regression is a non-linear regression method that can
handle complex datasets with intricate patterns.
 It uses a tree-like model to make predictions, making it both flexible and easy to interpret.
 Decision Tree Regression predicts continuous values.
 It does this by splitting the data into smaller subsets based on decision rules derived from the input
features.
 At each leaf node of the tree, the model predicts a continuous value, which is typically the average of the
target values in that node.
Develop machine learning model

How It Works (Step-by-Step)


 Choose the Best Feature to Split
 The algorithm selects a feature that best splits the data into two or more subsets.
 It minimizes the variance within each subset to ensure better predictions.
 Split the Data Recursively
 The process repeats at each node, creating smaller subgroups.
 Each split aims to reduce the prediction error.
 Stop Splitting (Stopping Criteria)

 The tree stops growing when:

✅ A maximum depth is reached

✅ A minimum number of samples per leaf is met

✅ Further splitting does not improve predictions


 Make Predictions
 Each leaf node contains a numerical value (the average of training samples in that node).
 Given a new input, the model follows the decision path and returns the leaf value.
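A short sketch with scikit-learn's DecisionTreeRegressor, reusing the earlier train/test split; max_depth and min_samples_leaf correspond to the stopping criteria listed above (the values here are illustrative):

from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import root_mean_squared_error, r2_score

# Stopping criteria: stop at depth 5, and require at least 10 samples per leaf.
tree = DecisionTreeRegressor(max_depth=5, min_samples_leaf=10, random_state=42)
tree.fit(X_train, y_train)

y_pred = tree.predict(X_test)   # each prediction is the mean target value of a leaf
print("RMSE:", root_mean_squared_error(y_test, y_pred))
print("R2  :", r2_score(y_test, y_pred))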
Develop machine learning model

Support Vector Regressor (SVR)


 Support vector regression (SVR) is a type of support vector machine (SVM) that is used for
regression tasks.
 It tries to find a function that best predicts the continuous output value for a given input value.
 SVR can use both linear and non-linear kernels.
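A minimal SVR sketch, reusing the earlier train/test split. SVR is sensitive to feature scale, so the features are standardized in a pipeline first; the kernel choice below is illustrative:

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.metrics import root_mean_squared_error, r2_score

# kernel="linear" fits a linear function; kernel="rbf" allows a non-linear fit.
svr = make_pipeline(StandardScaler(), SVR(kernel="rbf"))
svr.fit(X_train, y_train)

y_pred = svr.predict(X_test)
print("RMSE:", root_mean_squared_error(y_test, y_pred))
print("R2  :", r2_score(y_test, y_pred))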
End!
