Unit 3
Exponential and Logarithmic Regressions
Example data:
x   y
3   8
9   6
5   4
3   2
• Exponential regression produces an equation of the form y = a·b^x
• y - response variable; x - predictor variable; a and b are the regression coefficients that describe the relationship between y and x.
• The relative predictive power of an exponential model is denoted by R²; its value varies between 0 and 1.
• Like exponential regression, logarithmic regression is used to model processes where
growth or decay accelerates rapidly at first and then slows over time. This regression
produces an equation of the form y = a + b·ln(x)
• y - response variable; x - predictor variable; a and b are the regression coefficients that describe the relationship between y and x.
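Both models can be fitted by linearizing them and applying ordinary least squares; a minimal NumPy sketch, reusing the small (x, y) table above as purely illustrative data:

```python
import numpy as np

# Example data from the table above (illustrative only)
x = np.array([3.0, 9.0, 5.0, 3.0])
y = np.array([8.0, 6.0, 4.0, 2.0])

# Exponential model y = a * b**x: linearize as ln(y) = ln(a) + x*ln(b)
slope, intercept = np.polyfit(x, np.log(y), 1)
a, b = np.exp(intercept), np.exp(slope)
print(f"exponential fit: y = {a:.3f} * {b:.3f}**x")

# Logarithmic model y = a + b*ln(x): already linear in ln(x)
b_log, a_log = np.polyfit(np.log(x), y, 1)
print(f"logarithmic fit: y = {a_log:.3f} + {b_log:.3f}*ln(x)")
```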
Polynomial and Response Surface Regressions
• In polynomial regression, the relationship between the independent variable x and the dependent variable y is modelled as an nth-degree polynomial in x.
• This regression produces an equation of the form y = a₀ + a₁x + a₂x² + … + aₙx^n
• aᵢ are the coefficients of the polynomial terms, and n is the degree of the polynomial function. a₀ is typically referred to as the intercept.
• Polynomial regression is a special case of linear regression, since we fit the polynomial equation to data with a curvilinear relationship between the dependent and independent variables.
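A minimal sketch of a degree-2 polynomial fit with NumPy's polyfit, on synthetic curvilinear data:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 4, 40)
y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(0, 0.3, x.size)  # curvilinear data

# Fit y = a0 + a1*x + a2*x^2 by least squares
coeffs = np.polyfit(x, y, 2)   # coefficients returned highest degree first
poly = np.poly1d(coeffs)
print(poly)                    # fitted polynomial
print(poly(2.5))               # prediction at x = 2.5
```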
• Response surface regression (RSR) explores the relationship between several independent variables and one or more response (dependent) variables.
• RSR produces a polynomial regression model with cross-product terms of variables denoting the interaction between them. For instance, a response variable y, which depends on the variables x₁, x₂, and x₃, can be modelled using an RSR model with an equation of the form
y = a₀ + a₁x₁ + a₂x₂ + a₃x₃ + a₁₂x₁x₂ + a₁₃x₁x₃ + a₂₃x₂x₃
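One way to build such a model is to expand the inputs into squared and cross-product terms and fit a linear regression on them; a sketch using scikit-learn's PolynomialFeatures on synthetic data (the variable names and true coefficients are illustrative assumptions):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(100, 3))                          # x1, x2, x3
y = 2 + X[:, 0] + 3 * X[:, 1] * X[:, 2] + rng.normal(0, 0.1, 100)

# degree=2 expands to linear, squared, and cross-product (interaction) terms
features = PolynomialFeatures(degree=2, include_bias=False)
Xp = features.fit_transform(X)

model = LinearRegression().fit(Xp, y)
names = features.get_feature_names_out(["x1", "x2", "x3"])
print(dict(zip(names, model.coef_.round(2))))  # x2*x3 term should dominate
```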
Splines
• A spline is a function defined piecewise by polynomials. Spline regression is a non-parametric regression technique in which the dataset is divided into bins at points called knots.
• Splines are polynomial segments strung together, joining at knots. This approach allows smooth interpolation between knots.
• An efficient way to implement splines is to place more knots where we believe the
function might vary most quickly, and place fewer knots in the stable region.
• Nevertheless, in practice, it is common to place knots uniformly.
• This is done by specifying the desired degrees of freedom; the software then places the knots at uniform quantiles of the data.
• The desired degrees of freedom are typically chosen (for example, by cross-validation) to minimize the residual sum of squares on held-out data.
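A sketch of this knot-placement idea with SciPy's LSQUnivariateSpline, placing four interior knots at uniform quantiles of x (synthetic data):

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 10, 200))
y = np.sin(x) + rng.normal(0, 0.2, x.size)

# Place interior knots at uniform quantiles of the data
knots = np.quantile(x, [0.2, 0.4, 0.6, 0.8])
spline = LSQUnivariateSpline(x, y, t=knots, k=3)  # cubic polynomial segments

print(spline.get_residual())   # residual sum of squares of the fit
print(spline(5.0))             # smooth prediction between knots
```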
Kriging
• Kriging is a spatial interpolation method; it uses a limited set of sampled data points to
estimate the value of a variable by interpolation over a continuous spatial field.
• For instance, the average monthly carbon dioxide concentration over a city varies across a random spatial field. Kriging differs from simpler methods like linear regression or splines in that it uses the spatial correlation between sampled points to estimate the variable’s value through interpolation in the spatial field.
• Kriging weights are estimated such that points close to the location of interest have
more weight than those located farther away.
• The Kriging procedure is performed in two steps:
• first, the spatial covariance structure of the sample points is fitted in a variogram;
• second, weights derived from this structure are used for interpolation in the spatial field.
• Covariance measures the direction of the relationship between two variables; thus, a
positive covariance indicates that both variables tend to be high or low simultaneously,
while a negative covariance means the opposite.
• A variogram is a visual representation of the covariance between each
pair of sampled data points.
• For each pair of points, the gamma value (half the mean squared difference between their values, i.e., the semivariance) is plotted against the distance (lag) between them. We can choose between different variogram models; the best-fitting model is selected using approaches such as least squares, maximum likelihood, or Bayesian methods.
• Kriging assumes (i) stationarity, which means that the joint probability
distribution does not vary across the space; and (ii) isotropy, or
uniformity in all directions.
• The Kriging interpolator is sensitive to the variogram model; moreover, the method performs poorly when the sampled data are sparse or limited in spatial scope.
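The two-step procedure above maps directly onto the third-party pykrige package (assumed installed); a minimal ordinary-kriging sketch on synthetic samples:

```python
import numpy as np
from pykrige.ok import OrdinaryKriging  # assumes the pykrige package

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, 50)                                   # sample locations
y = rng.uniform(0, 10, 50)
z = np.sin(x / 2) + np.cos(y / 2) + rng.normal(0, 0.05, 50)  # sampled values

# Step 1: fit a variogram model to the spatial covariance structure
ok = OrdinaryKriging(x, y, z, variogram_model="spherical")

# Step 2: use the variogram-derived weights to interpolate on a grid
gridx = np.linspace(0, 10, 25)
gridy = np.linspace(0, 10, 25)
z_pred, z_var = ok.execute("grid", gridx, gridy)  # estimates + kriging variances
print(z_pred.shape, z_var.shape)
```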
Non-linear Regression Models
• In simple linear regression, the linear model has two variables, x, the independent variable, and y, the dependent variable, and the parameters m and b:
y = m·x + b
• We estimate the parameters of the model by minimizing a criterion function, the residual sum of squares:
S = Σᵢ (yᵢ − ŷᵢ)²
where ŷᵢ are the estimated values of the dependent variable, and yᵢ are the measured values of the dependent variable. Here, we assumed that all the observations are equally reliable;
otherwise, a weighted (w) sum of squares may be minimized:
S = Σᵢ wᵢ (yᵢ − ŷᵢ)²
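A minimal sketch of both criteria using SciPy's curve_fit on synthetic data; curve_fit's sigma argument supplies the per-observation reliabilities (weights wᵢ = 1/σᵢ²):

```python
import numpy as np
from scipy.optimize import curve_fit

def model(x, m, b):
    return m * x + b                       # simple linear model y = m*x + b

rng = np.random.default_rng(4)
x = np.linspace(0, 10, 30)
sigma = np.where(x > 5, 1.0, 0.2)          # unequal measurement reliability
y = 2.0 * x + 1.0 + rng.normal(0, sigma)

# Ordinary least squares: minimizes sum_i (y_i - yhat_i)^2
params_ols, _ = curve_fit(model, x, y)

# Weighted least squares: minimizes sum_i ((y_i - yhat_i) / sigma_i)^2
params_wls, _ = curve_fit(model, x, y, sigma=sigma, absolute_sigma=True)
print(params_ols, params_wls)
```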
Assumptions in Non-Linear Regression
• These assumptions are similar to those in linear regression but may
have nuanced interpretations due to the nonlinearity of the model.
Here are the key assumptions in nonlinear regression:
• Functional Form: The chosen nonlinear model correctly represents the true
relationship between the dependent and independent variables.
• Independence: Observations are assumed to be independent of each other.
• Homoscedasticity: The variance of the residuals (the differences between
observed and predicted values) is constant across all levels of the
independent variable.
• Normality: Residuals are assumed to be normally distributed.
• Multicollinearity: Independent variables are not perfectly correlated.
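The normality and homoscedasticity assumptions can be checked on the residuals of a fitted model; a rough sketch with SciPy, where the model and data are purely illustrative:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = np.linspace(1, 10, 60)
yhat = 2.0 * np.exp(0.2 * x)             # predictions from some fitted model
y = yhat + rng.normal(0, 0.5, x.size)    # observed values
resid = y - yhat

# Normality: Shapiro-Wilk test (small p suggests non-normal residuals)
stat, p_norm = stats.shapiro(resid)

# Homoscedasticity (rough check): |residuals| should not trend with x
rho, p_trend = stats.spearmanr(np.abs(resid), x)
print(f"normality p = {p_norm:.3f}, |resid| trend p = {p_trend:.3f}")
```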
Types of Non-Linear Regression
• There are two main types of non-linear regression in machine learning:
• Parametric non-linear regression
• assumes that the relationship between the dependent and independent variables can be
modeled using a specific mathematical function.
• For example, the relationship between the population of a country and time can be modeled
using an exponential function.
• Some common parametric non-linear regression models include: Polynomial regression,
Logistic regression, Exponential regression, Power regression etc.
• Non-parametric non-linear regression
• does not assume that the relationship between the dependent and independent variables can
be modeled using a specific mathematical function.
• Instead, it uses machine learning algorithms to learn the relationship from the data.
• Some common non-parametric non-linear regression algorithms include: Kernel smoothing,
Local polynomial regression, Nearest neighbor regression etc.
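For the non-parametric branch, a kernel-smoothing sketch using statsmodels' KernelReg (assumed available): no functional form is specified, and the relationship is learned from the data:

```python
import numpy as np
from statsmodels.nonparametric.kernel_regression import KernelReg

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 2 * np.pi, 150))
y = np.sin(x) + rng.normal(0, 0.3, x.size)

# "c" marks the regressor as continuous; the bandwidth is chosen by
# cross-validated least squares by default
kr = KernelReg(endog=y, exog=x, var_type="c")
y_smooth, _ = kr.fit(x)       # smoothed estimate of E[y|x] at each x
print(y_smooth[:5])
```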
Non-Linear Regression Algorithms
• Nonlinear regression encompasses various types of models that capture relationships between variables in a nonlinear manner.
• Polynomial Regression
• Polynomial regression is a type of nonlinear regression that fits a polynomial function to
the data. The general form of a polynomial regression model is:
• y = β₀ + β₁X + β₂X² + … + βₙX^n
• where,
• y : dependent variable
• X : independent variable
• β₀, β₁, …, βₙ : parameters of the model
• n : degree of the polynomial
• Exponential Regression
• Exponential regression is a type of nonlinear regression that fits an exponential function to the
data. The general form of an exponential regression model is:
• y = α·e^(βx)
• where,
• y – dependent variable
• X – independent variable
• α and β – parameters of the model
• Logarithmic Regression
• Logarithmic regression is a type of nonlinear regression that fits a logarithmic function to the
data. The general form of a logarithmic regression model is:
• y = α + β·ln(x)
• where,
• y – dependent variable
• X – independent variable
• α and β – parameters of the model
• Power Regression
• Power regression is a type of nonlinear regression that fits a power
function to the data. The general form of a power regression model is:
• y = α·x^β
• where,
• y – dependent variable
• X – independent variable
• α and β – parameters of the model
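Like the exponential model, the power model can be fitted by linearizing: taking logs of both sides gives ln(y) = ln(α) + β·ln(x). A short NumPy sketch on synthetic data:

```python
import numpy as np

rng = np.random.default_rng(7)
x = np.linspace(1, 10, 50)
y = 3.0 * x**1.5 * rng.lognormal(0, 0.05, x.size)   # noisy power-law data

# Power model y = alpha * x**beta, linear in log-log space
beta, log_alpha = np.polyfit(np.log(x), np.log(y), 1)
alpha = np.exp(log_alpha)
print(f"power fit: y = {alpha:.2f} * x**{beta:.2f}")
```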
Generalized Additive Models (GAMs)