
Unit 3
Data-Based Modelling for Prediction

Modelling for prediction: introduction; simple regression models;
non-linear regression models; non-linear machine learning algorithms;
distribution models; model performance and validation; correlation and
causality.
Regression and Models
• Regression is a statistical analysis that allows us to infer whether there is a
relationship between two or more variables;
• the regression model is the function that describes this relationship.
Regression analysis is a predictive modelling technique.
• Simple Regression Models
• Simple regression models include simple linear regression, which is the most
common form of regression analysis,
• multivariate linear regression,
• splines,
• multivariate adaptive regression splines,
• other functions (such as exponential, logarithmic, and polynomial functions),
• response surface regression,
• Kriging
Simple Linear Regression
• A simple linear regression model has an equation of the form y = mx + b,
• with an explanatory or independent variable x and a dependent variable y.
• Simple linear regression models are often fitted using the linear least squares (LLS)
approach; the equation of the best line fitting the pairs (x1, y1), (x2, y2), …, (xn, yn)
is obtained by following these steps:
• Calculate the mean of the x-values and the y-values:
  x̄ = (1/n) ∑xi and ȳ = (1/n) ∑yi
• Calculate the slope of the best line:
  m = ∑(xi − x̄)(yi − ȳ) / ∑(xi − x̄)² = (n∑xiyi − ∑xi∑yi) / (n∑xi² − (∑xi)²)
• Calculate the y-intercept of the line:
  b = ȳ − m·x̄


Linear Regression Question 1: Find the linear regression equation for the given data:

  x | y
  3 | 8
  9 | 6
  5 | 4
  3 | 2

Calculate the following:

  x  |  y  |  x²  |  xy
  3  |  8  |   9  |  24
  9  |  6  |  81  |  54
  5  |  4  |  25  |  20
  3  |  2  |   9  |   6
∑x = 20 | ∑y = 20 | ∑x² = 124 | ∑xy = 104

With n = 4:
m = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²) = (4·104 − 20·20) / (4·124 − 20²) = 16/96 ≈ 0.167
b = ȳ − m·x̄ = 5 − 0.167·5 ≈ 4.17
Regression equation: y ≈ 0.167x + 4.17
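As a quick check, here is a minimal NumPy sketch that applies the least-squares formulas above to the data from Question 1:

```python
import numpy as np

# Data from Question 1
x = np.array([3.0, 9.0, 5.0, 3.0])
y = np.array([8.0, 6.0, 4.0, 2.0])

# Slope and intercept from the step-by-step least-squares formulas
x_bar, y_bar = x.mean(), y.mean()
m = np.sum((x - x_bar) * (y - y_bar)) / np.sum((x - x_bar) ** 2)
b = y_bar - m * x_bar

print(f"y = {m:.3f}x + {b:.3f}")  # y = 0.167x + 4.167
```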


Multiple Linear Regression and Multivariate Linear Regression
• Multiple linear regression (MLR) is a statistical technique that uses two or
more independent variables to predict the outcome of one dependent
variable. The MLR model has the form
  y = β0 + β1x1 + β2x2 + … + βpxp + ε
where x1, …, xp are the independent variables, β0, …, βp are the regression
coefficients, and ε is the error term.
• The multivariate (multiple) linear regression (MvLR) model generalizes MLR to
several dependent variables and has the matrix form
  Y = XB + E
where Y is the matrix of responses, X the matrix of predictors, B the matrix of
coefficients, and E the matrix of errors.
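A minimal sketch of fitting an MLR model by ordinary least squares with NumPy; the data values are made up purely for illustration:

```python
import numpy as np

# Toy data: two predictors, one response (illustrative values only)
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([3.1, 3.9, 7.2, 8.1, 10.0])

# Add an intercept column and solve the least-squares problem y = Xβ
X1 = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X1, y, rcond=None)

print("intercept:", beta[0], "coefficients:", beta[1:])
```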
Exponential and Logarithmic Regression

• An exponential regression refers to best fitting a dataset to an exponential function
of the form
  y = a·e^(bx)
• y - response variable, x - predictor variable; a and b are the regression coefficients that
describe the relationship between y and x.
• The relative predictive power of an exponential model is denoted by R²; its value varies
between 0 and 1.
• Like exponential regression, logarithmic regression is used to model processes where
growth or decay is rapid at first and then slows over time. This regression
produces an equation of the form
  y = a + b·ln(x)
• y - response variable, x - predictor variable; a and b are the regression coefficients
that describe the relationship between y and x.
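A short sketch, assuming SciPy is available, of fitting both forms; the data is synthetic and purely illustrative (for brevity both models are fitted to the same sample):

```python
import numpy as np
from scipy.optimize import curve_fit

np.random.seed(0)
# Synthetic data roughly following y = 2·e^(0.3x) (illustrative only)
x = np.linspace(1, 10, 20)
y = 2.0 * np.exp(0.3 * x) + np.random.normal(0, 0.5, x.size)

# Exponential model y = a·e^(bx), fitted by non-linear least squares
a_exp, b_exp = curve_fit(lambda x, a, b: a * np.exp(b * x), x, y, p0=(1.0, 0.1))[0]

# Logarithmic model y = a + b·ln(x) is linear in ln(x), so polyfit suffices
b_log, a_log = np.polyfit(np.log(x), y, 1)

print(f"exponential: y = {a_exp:.2f}·e^({b_exp:.2f}x)")
print(f"logarithmic: y = {a_log:.2f} + {b_log:.2f}·ln(x)")
```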
Polynomial and Response Surface Regressions

• The relationship between the independent variable x and the dependent variable y is modelled as an nth-
degree polynomial in x.
• This regression produces an equation of the form
  y = a0 + a1x + a2x² + … + anxⁿ
• ai are the coefficients of the polynomial terms, and n is the degree of the polynomial function. a0 is typically
referred to as the intercept.
• Polynomial regression is a special linear regression case, since we fit the polynomial equation to data
with a curvilinear relationship between the dependent and independent variables.
• The response surface regression (RSR) explores and finds the relationship between several independent
variables and one or more response or dependent variables.
• RSR produces a polynomial regression model with cross-product terms of variables denoting the
interaction between them. For instance, a response variable y, which depends on the variables x1, x2, and x3,
can be modelled using an RSR model with an equation of the form
  y = a0 + a1x1 + a2x2 + a3x3 + a12x1x2 + a13x1x3 + a23x2x3 + a11x1² + a22x2² + a33x3²
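A minimal sketch of polynomial regression with NumPy's polyfit; the synthetic data and chosen degree are illustrative assumptions:

```python
import numpy as np

np.random.seed(0)
# Synthetic curvilinear data (illustrative): y ≈ 1 + 2x − 0.5x²
x = np.linspace(-3, 3, 30)
y = 1 + 2 * x - 0.5 * x**2 + np.random.normal(0, 0.2, x.size)

# Fit a degree-2 polynomial; polyfit returns coefficients highest power first
coeffs = np.polyfit(x, y, deg=2)
print("a2, a1, a0 =", coeffs)

# Evaluate the fitted polynomial at the sample points
y_hat = np.polyval(coeffs, x)
```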
Splines
• A spline is a function defined piecewise by polynomials. Spline regression is a non-
parametric regression technique in which the dataset is divided into bins at intervals
called knots.
• Splines are polynomial segments strung together, joining at knots. This approach
allows smooth interpolation between knots.
• An efficient way to implement splines is to place more knots where we believe the
function might vary most quickly, and fewer knots in stable regions.
• Nevertheless, in practice, it is common to place knots uniformly.
• This is done by specifying the desired degrees of freedom; the
software then places the knots at uniform quantiles of the data.
• The desired degrees of freedom are set to minimize the residual sum of squares.
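A minimal sketch using SciPy's LSQUnivariateSpline, which fits a least-squares cubic spline with explicitly placed interior knots; the data and knot positions are illustrative assumptions:

```python
import numpy as np
from scipy.interpolate import LSQUnivariateSpline

np.random.seed(0)
# Noisy samples of a smooth function (illustrative)
x = np.linspace(0, 10, 100)
y = np.sin(x) + np.random.normal(0, 0.1, x.size)

# Fit a cubic regression spline with uniformly placed interior knots
knots = np.linspace(1, 9, 5)           # interior knots only
spline = LSQUnivariateSpline(x, y, knots, k=3)

y_hat = spline(x)                      # smooth predictions between knots
print("residual sum of squares:", float(np.sum((y - y_hat) ** 2)))
```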
Multivariate Adaptive Regression Splines

• MARS is an algorithm designed for multivariate non-linear regression problems.
• Regression problems are those where a model must predict a numerical value;
multivariate means there is more than one input variable.
• The algorithm discovers a set of simple piecewise linear functions that characterize the data and
aggregates them to make a prediction.
• The model is an ensemble of linear functions.
• MARS captures the non-linear relationships in the data by assessing knots or cut-points. The algorithm
assesses each data point for each predictor as a candidate knot and creates a linear regression model.
• The MARS algorithm first looks for the single point across the range of x values where two different
linear relationships between y and x achieve the smallest error, resulting in a hinge function h(x − a),
where a is the cut-point value.
• This procedure continues until all the knots are found, producing a non-linear prediction equation.
• Knots that do not significantly contribute to the model's predictive accuracy are removed; this process is
known as pruning.
• A hinge function takes the form
  h(x) = max(0, x − c) or h(x) = max(0, c − x)
where c is a constant (knot).
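A minimal plain-NumPy sketch of the single-knot search step described above (a full MARS implementation adds and prunes many such terms); the synthetic data, with a bend placed at x = 4, is an illustrative assumption:

```python
import numpy as np

def hinge(x, c):
    """Hinge pair at knot c: h+ = max(0, x - c), h- = max(0, c - x)."""
    return np.maximum(0, x - c), np.maximum(0, c - x)

np.random.seed(0)
# Synthetic data with a bend at x = 4 (illustrative)
x = np.linspace(0, 10, 100)
y = np.where(x < 4, 2 * x, 8 + 0.5 * (x - 4)) + np.random.normal(0, 0.3, x.size)

# Scan every data point as a candidate knot; keep the one with least error
best_c, best_sse = None, np.inf
for c in x[1:-1]:
    hp, hm = hinge(x, c)
    A = np.column_stack([np.ones_like(x), hp, hm])   # basis: 1, h+, h-
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    sse = np.sum((y - A @ coef) ** 2)
    if sse < best_sse:
        best_c, best_sse = c, sse

print(f"best knot ≈ {best_c:.2f}")    # should land near 4
```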
[Figure: blue lines show predicted y values as a function of x for alternative approaches to modelling explicit non-linear patterns. (A) A traditional linear regression does not capture any non-linearity unless the predictor or response is transformed (e.g. log transformation); (B) degree-2 polynomial; (C) degree-3 polynomial; (D) step function cutting x into six categorical levels.]
Kriging

• Kriging is a spatial interpolation method; it uses a limited set of sampled data points to
estimate the value of a variable by interpolation over a continuous spatial field.
• For instance, the average monthly carbon dioxide concentration over a city varies across
a random spatial field. It differs from other simple methods like linear regression or
splines since it uses the spatial correlation between sampled points to estimate the
variable’s value through interpolation in the spatial field.
• Kriging weights are estimated such that points close to the location of interest have
more weight than those located farther away.
• The Kriging procedure is performed in two steps:
• first, the spatial covariance structure of the sample points is fitted in a variogram;
• second, weights derived from this structure are used for interpolation in the spatial field.
• Covariance measures the direction of the relationship between two variables; thus, a
positive covariance indicates that both variables tend to be high or low simultaneously,
while a negative covariance means the opposite.
• A variogram is a visual representation of the covariance between each
pair of sampled data points.
• The gamma value γ (half the mean squared difference between the values of a
pair of points) is plotted against the distance (lag) between them
for each pair of points. We can choose between different variogram
models; the best-fitting model is selected using approaches such as least
squares, maximum likelihood, and Bayesian methods.
• Kriging assumes (i) stationarity, which means that the joint probability
distribution does not vary across space; and (ii) isotropy, or
uniformity in all directions.
• The Kriging interpolator is sensitive to the variogram model; moreover,
this regression method performs poorly when the data are sparse or limited in
spatial scope.
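A heavily simplified sketch of the two-step idea in plain NumPy: here a Gaussian covariance model is simply assumed rather than fitted to an empirical variogram, and a simple-kriging estimate (known mean) is computed at one location; real workflows fit the variogram first and typically use ordinary kriging:

```python
import numpy as np

def cov(h, sill=1.0, rng=3.0):
    # Assumed Gaussian covariance model: high covariance at short lags
    return sill * np.exp(-(h / rng) ** 2)

# Sampled locations and values (illustrative 1-D spatial field)
xs = np.array([0.0, 1.0, 2.5, 4.0, 6.0])
zs = np.array([1.2, 1.8, 2.6, 2.1, 1.0])
x0 = 3.0                                   # location to interpolate

# Step 1: covariance structure between all pairs of sampled points
K = cov(np.abs(xs[:, None] - xs[None, :]))
# Step 2: solve K·w = k0, so points near x0 receive larger weights
k0 = cov(np.abs(xs - x0))
w = np.linalg.solve(K, k0)

z_mean = zs.mean()
z0 = z_mean + w @ (zs - z_mean)            # simple-kriging estimate
print(f"kriged estimate at x = {x0}: {z0:.2f}")
```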
Non-linear Regression Models
• In simple linear regression, the linear model has two variables, x, the independent
variable, and y, the dependent variable, and the parameters m and b.
• We use a specific method to estimate the parameters of the model and apply a certain
criterion function, such as the sum of squared residuals:
  S = ∑i (yi − ŷi)²
where ŷi are the estimated values of the dependent variable and yi are the
measured values of the dependent variable. Here, we assumed that all the observations
are equally reliable;
otherwise, a weighted (w) sum of squares may be minimized:
  S = ∑i wi (yi − ŷi)²
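A short sketch of both criteria using SciPy's curve_fit; the exponential model form, data, and weights are illustrative assumptions (curve_fit weights residuals by 1/sigma, so sigma = 1/√wi yields the weighted sum of squares):

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical non-linear model y = a·e^(bx), fitted by least squares
def model(x, a, b):
    return a * np.exp(b * x)

np.random.seed(0)
x = np.linspace(0, 5, 25)
y = model(x, 2.0, 0.4) + np.random.normal(0, 0.3, x.size)

# Unweighted fit: minimizes S = Σ (yi − ŷi)²
popt, _ = curve_fit(model, x, y, p0=(1.0, 0.1))

# Weighted fit: minimizes S = Σ wi (yi − ŷi)²
w = np.linspace(1.0, 2.0, x.size)          # illustrative reliability weights
popt_w, _ = curve_fit(model, x, y, p0=(1.0, 0.1), sigma=1.0 / np.sqrt(w))

print("unweighted:", popt, "weighted:", popt_w)
```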
Assumptions in Non-Linear Regression
• These assumptions are similar to those in linear regression but may
have nuanced interpretations due to the nonlinearity of the model.
Here are the key assumptions in nonlinear regression:
• Functional Form: The chosen nonlinear model correctly represents the true
relationship between the dependent and independent variables.
• Independence: Observations are assumed to be independent of each other.
• Homoscedasticity: The variance of the residuals (the differences between
observed and predicted values) is constant across all levels of the
independent variable.
• Normality: Residuals are assumed to be normally distributed.
• Multicollinearity: Independent variables are not perfectly correlated.
Types of Non-Linear Regression

• There are two main types of non-linear regression in machine learning:
• Parametric non-linear regression
• assumes that the relationship between the dependent and independent variables can be
modeled using a specific mathematical function.
• For example, the relationship between the population of a country and time can be modeled
using an exponential function.
• Some common parametric non-linear regression models include: Polynomial regression,
Logistic regression, Exponential regression, Power regression etc.
• Non-parametric non-linear regression
• does not assume that the relationship between the dependent and independent variables can
be modeled using a specific mathematical function.
• Instead, it uses machine learning algorithms to learn the relationship from the data.
• Some common non-parametric non-linear regression algorithms include: kernel smoothing,
local polynomial regression, nearest neighbor regression, etc.; a minimal kernel-smoothing sketch follows below.
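A minimal sketch of kernel smoothing (a Nadaraya-Watson estimator with a Gaussian kernel); the data and bandwidth are illustrative assumptions:

```python
import numpy as np

def kernel_smooth(x_train, y_train, x_query, bandwidth=0.5):
    # Gaussian kernel weight of every training point for each query point
    d = (x_query[:, None] - x_train[None, :]) / bandwidth
    w = np.exp(-0.5 * d**2)
    # Weighted average of training responses (Nadaraya-Watson estimate)
    return (w @ y_train) / w.sum(axis=1)

np.random.seed(0)
x = np.linspace(0, 2 * np.pi, 50)
y = np.sin(x) + np.random.normal(0, 0.1, x.size)

x_new = np.linspace(0, 2 * np.pi, 200)
y_hat = kernel_smooth(x, y, x_new, bandwidth=0.4)
```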
Non-Linear Regression Algorithms
• Nonlinear regression encompasses various types of models that capture relationships
between variables in a nonlinear manner.
• Polynomial Regression
• Polynomial regression is a type of nonlinear regression that fits a polynomial function to
the data. The general form of a polynomial regression model is:
• y = β0 + β1X + β2X² + … + βnXⁿ
• where,
• y : dependent variable
• X : independent variable
• β0, β1, …, βn : parameters of the model
• n : degree of the polynomial
• Exponential Regression
• Exponential regression is a type of nonlinear regression that fits an exponential function to the
data. The general form of an exponential regression model is:
• y = αe^(βx)
• where,
• y – dependent variable
• x – independent variable
• α and β – parameters of the model
• Logarithmic Regression
• Logarithmic regression is a type of nonlinear regression that fits a logarithmic function to the
data. The general form of a logarithmic regression model is:
• y = α + β ln(x)
• where,
• y – dependent variable
• x – independent variable
• α and β – parameters of the model
• Power Regression
• Power regression is a type of nonlinear regression that fits a power
function to the data. The general form of a power regression model is:
• y = αx^β
• where,
• y – dependent variable
• x – independent variable
• α and β – parameters of the model
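A short sketch showing that the power model can be fitted by linearizing with logarithms, since ln(y) = ln(α) + β·ln(x); the data values are illustrative assumptions:

```python
import numpy as np

np.random.seed(0)
# Synthetic data roughly following y = 3·x^1.5 (illustrative only)
x = np.linspace(1, 10, 20)
y = 3.0 * x**1.5 + np.random.normal(0, 0.5, x.size)

# Fit a line to (ln x, ln y): slope is β, intercept is ln(α)
beta, log_alpha = np.polyfit(np.log(x), np.log(y), 1)
alpha = np.exp(log_alpha)
print(f"y ≈ {alpha:.2f}·x^{beta:.2f}")
```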
Generalized Additive Models (GAMs)