
Advanced Regression

Probabilistic interpretation of LR. Classification algorithms for regression
Contents
1. Bayesian explanation of regularized regression
2. Classification algorithms for regression
Bayesian regression
Classical probabilistic view on linear regression
Consider we have n points Y drawn i.i.d. from a normal distribution. The
probability of observing those points defines the likelihood function, which
is just the product of their densities at each point. A good estimate of the
mean maximizes that likelihood.
Classical probabilistic view on linear regression
● Assume our mean is a function of the predictors X
● Thus our target is distributed normally around that mean
● The regression parameters are then estimated by maximizing the resulting
likelihood (see the sketch below)

A probabilistic interpretation of regularization
● Using Bayes' theorem we can estimate the probability distribution of the
parameters θ given the data Y that we observe
● That gives us the opportunity to set a prior distribution on the model
parameters
● Compare that with the classical method, where we instead try to find the
parameters that maximize the likelihood of the data given the parameters
A probabilistic interpretation of regularization
● We maximize the posterior probability of the parameters using Bayes'
theorem (Maximum A Posteriori estimation)
● Compare that to the MLE estimate (see the sketch below)
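Sketched side by side (standard form, using the notation above):

```latex
% MAP: the prior p(theta) enters the objective
\hat{\theta}_{\mathrm{MAP}} = \arg\max_{\theta}\, p(\theta \mid Y)
  = \arg\max_{\theta}\, p(Y \mid \theta)\, p(\theta)
% MLE: only the likelihood of the data given theta
\hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta}\, p(Y \mid \theta)
```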


A probabilistic interpretation of L2 regularization
● Assume our model parameters are zero-mean normally distributed with
variance τ² (prior knowledge)
● A small variance τ² (large λ) strongly shrinks the coefficients; a large
variance (small λ) leaves them almost unaffected (see the sketch below)
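A sketch of how the Gaussian prior turns into the L2 penalty, assuming noise variance σ² (so that λ = σ²/τ²):

```latex
% Prior on each coefficient: theta_j ~ N(0, tau^2)
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta}\Big[\log p(Y \mid \theta) + \log p(\theta)\Big]
  = \arg\min_{\theta}\Big[\sum_{i=1}^{n}(y_i - \theta^{\top}x_i)^2
    + \frac{\sigma^2}{\tau^2}\,\lVert\theta\rVert_2^2\Big]
% i.e. ridge regression with lambda = sigma^2 / tau^2
```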
A probabilistic interpretation of L1 regularization
● The Laplace distribution with mean μ and scale (diversity) b is defined by
its probability density function
● Compare it to the normal density (both are written out below)
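The two densities written out for comparison (standard forms, since the slide formulas are not reproduced here):

```latex
% Laplace density: sharp peak at the mean
f(x \mid \mu, b) = \frac{1}{2b}\exp\!\left(-\frac{|x-\mu|}{b}\right)
% Normal density: smooth at the mean
f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi}\,\sigma}
  \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
```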
A probabilistic interpretation of L1 regularization
● Assume our model parameters are zero-mean Laplace-distributed with scale b
● L1 regularization promotes sparsity, in contrast to the "just shrink the
coefficients" behaviour of L2. That makes sense if you look at the Laplace
density, which has a sharp peak at x = μ (see the sketch below).
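By analogy with the L2 case, a sketch of the MAP objective under the Laplace prior (again assuming noise variance σ²):

```latex
% Prior on each coefficient: theta_j ~ Laplace(0, b)
\hat{\theta}_{\mathrm{MAP}}
  = \arg\min_{\theta}\Big[\sum_{i=1}^{n}(y_i - \theta^{\top}x_i)^2
    + \frac{2\sigma^2}{b}\,\lVert\theta\rVert_1\Big]
% i.e. lasso regression with lambda = 2*sigma^2 / b
```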
Why does L1 zero out coefficients whereas L2 does not?
Why does L1 zero out coefficients whereas L2 does not?
● Laplace distribution (sharp peak at x = μ) vs. normal
distribution (smooth at the mean)
● Intuitive understanding through gradient descent:
https://developers.google.com/machine-learning/crash-course/regularization-for-sparsity/l1-regularization
● Intuitive understanding through visualization in the 2D case:
https://explained.ai/regularization/L1vsL2.html
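One quick way to see the difference in practice is to compare coefficient sparsity; a minimal sketch using scikit-learn (dataset and penalty strengths are arbitrary choices):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# Regression problem where only a few of the 50 features are informative
X, y = make_regression(n_samples=200, n_features=50, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty

# L1 drives most coefficients exactly to zero; L2 only shrinks them
print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0.0))
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0.0))
```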
Generalized linear models
● What if we change the hypothesis about how the target is distributed and
how its mean is linked to the predictors?
● https://scikit-learn.org/stable/modules/linear_model.html#generalized-linear-regression
Real world examples
● Insurance cost (Tweedie distribution)
● Number of calls arriving in a call center per hour (Poisson distribution)
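As an illustration of the call-center example, a minimal sketch with synthetic count data (the features and coefficients are made up):

```python
import numpy as np
from sklearn.linear_model import PoissonRegressor

rng = np.random.default_rng(0)

# Synthetic features: hour of day and number of active marketing campaigns
X = np.column_stack([rng.integers(0, 24, 500), rng.integers(0, 5, 500)])
# Synthetic Poisson-distributed call counts whose rate depends on the features
rate = np.exp(0.05 * X[:, 0] + 0.3 * X[:, 1])
y = rng.poisson(rate)

# GLM with a log link and Poisson deviance loss
model = PoissonRegressor(alpha=1e-3).fit(X, y)
print(model.predict([[12, 2]]))  # expected calls at noon with 2 campaigns
```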
Classification algorithms for regression
KNN regressor
How do we calculate a continuous target with KNN?
KNN regressor
How do we calculate a continuous target with KNN?
● Intuitive – each object in the training set has a known target value
● We find k neighbors for a prediction – let's average their target values!
● We can also weight the neighbors by distance, so closer neighbors
contribute more (see the sketch below)
Pros:
● Simple, not many changes from the classifier
Cons:
● All the cons of KNN
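A minimal sketch of both options in scikit-learn (uniform averaging vs. distance weighting); the data here is synthetic:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 100)

# Plain averaging of the k nearest targets
knn_uniform = KNeighborsRegressor(n_neighbors=5, weights="uniform").fit(X, y)
# Closer neighbors get larger weights (inverse distance)
knn_weighted = KNeighborsRegressor(n_neighbors=5, weights="distance").fit(X, y)

X_test = [[2.5]]
print(knn_uniform.predict(X_test), knn_weighted.predict(X_test))
```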
Decision tree regressor
How can we change a decision tree to solve regression tasks?
Decision tree regressor
How can we change a decision tree to solve regression tasks?
● Every leaf now contains a set of objects; the average of their target
values is the prediction we are looking for.
● We have to use other, continuous measures of split quality in place of
information gain:
○ Variance (standard deviation) reduction (see the sketch below)
Pros:
● Simplicity and interpretability of decision trees
Cons:
● Limited set of predicted values (only the leaf averages)
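A small sketch of the variance-based split criterion the bullet refers to (my own helper function, not taken from the slides):

```python
import numpy as np

def variance_reduction(y_parent, y_left, y_right):
    """Decrease in target variance achieved by a candidate split.

    The best split maximizes this value, playing the role that
    information gain plays in classification trees.
    """
    n = len(y_parent)
    weighted_child_var = (len(y_left) / n) * np.var(y_left) \
                       + (len(y_right) / n) * np.var(y_right)
    return np.var(y_parent) - weighted_child_var

# Toy example: splitting [1, 2, 10, 11] into [1, 2] and [10, 11]
y = np.array([1.0, 2.0, 10.0, 11.0])
print(variance_reduction(y, y[:2], y[2:]))  # large reduction -> good split
```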
Random forest regressor
How can we change a random forest to solve regression tasks?
Random forest regressor
How can we change a random forest to solve regression tasks?
● Nothing really changes: take the decision tree regressor as the base
learner and average the results across the estimators (see the sketch below)
Pros & cons:
● Everything is the same as in the random forest classifier
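For completeness, a minimal sketch in scikit-learn (synthetic data, arbitrary hyperparameters):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

# Each tree is a decision tree regressor; the forest averages their predictions
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print(forest.predict(X[:3]))
```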


Support Vector Machine
How is this going to work?
Support Vector Machine
How is this going to work?
● We reverse the SVM task: we build a tube around the regression plane, as
narrow as possible, that keeps as many points as possible inside it
● The objective ("Minimize") and the constraints are sketched below
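The standard ε-insensitive SVR formulation (textbook form; the slide's own formulas are not reproduced here):

```latex
% Minimize: a flat function plus a penalty for points outside the epsilon-tube
\min_{w,\,b,\,\xi,\,\xi^*}\;
  \frac{1}{2}\lVert w\rVert^2 + C\sum_{i=1}^{n}(\xi_i + \xi_i^*)
% Constraints: every point lies within epsilon of the prediction,
% up to slack variables xi, xi*
\text{s.t.}\quad
  y_i - w^{\top}x_i - b \le \varepsilon + \xi_i,\qquad
  w^{\top}x_i + b - y_i \le \varepsilon + \xi_i^*,\qquad
  \xi_i,\,\xi_i^* \ge 0
```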
Gradient boosting
How do we use GB for regression tasks?
Gradient boosting
How do we use GB for regression tasks?

● Every new learner is fitted on the gradient of the loss with respect to
the predictions of the ensemble of previous learners
● With squared error loss, that means we fit every new tree on the residuals
from the previous step (see the sketch below)
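A minimal from-scratch sketch for squared error loss (the learning rate, tree depth, and number of rounds are arbitrary choices of mine):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=5, noise=5.0, random_state=0)

learning_rate = 0.1
prediction = np.full_like(y, y.mean(), dtype=float)  # start from the mean
trees = []

for _ in range(100):
    residuals = y - prediction              # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    prediction += learning_rate * tree.predict(X)
    trees.append(tree)

print("train MSE:", np.mean((y - prediction) ** 2))
```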
Advanced Hyperparameter Tuning
Advanced Hyperparameter Tuning
Which techniques do you already know?
Advanced Hyperparameter Tuning
Which techniques do you already know?

● Blind pick
● Grid Search
● Random Search
Advanced Hyperparameter Tuning
● HyperOpt (http://hyperopt.github.io/hyperopt/). The idea behind it can be
explained through Bayesian optimization
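A minimal sketch of HyperOpt's TPE search wrapped around cross-validation (the model and search space are my own illustrative choices):

```python
import numpy as np
from hyperopt import fmin, tpe, hp, Trials
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=0)

def objective(params):
    model = GradientBoostingRegressor(
        max_depth=int(params["max_depth"]),
        learning_rate=params["learning_rate"],
        random_state=0,
    )
    # hyperopt minimizes, so return the negative cross-validated R^2
    return -cross_val_score(model, X, y, cv=3).mean()

space = {
    "max_depth": hp.quniform("max_depth", 2, 8, 1),
    "learning_rate": hp.loguniform("learning_rate", np.log(0.01), np.log(0.3)),
}

best = fmin(fn=objective, space=space, algo=tpe.suggest,
            max_evals=30, trials=Trials())
print(best)
```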
