MI - Unit 5

Unit 5 of the Machine Intelligence course covers resampling techniques such as cross-validation and bootstrapping for model selection and assessment. It discusses various methods including the validation set approach, leave-one-out cross-validation, and k-fold cross-validation, highlighting their advantages and disadvantages. Additionally, it addresses linear model selection strategies like subset selection, shrinkage methods, and dimension reduction to improve prediction accuracy and interpretability.


19CSCN1602 – Machine Intelligence

Unit –5
Prabhu K, AP(SS)/CSE
Unit -V

Resampling and Model Selection

Resampling: Cross Validation, Bootstrapping – Linear Model Selection: Subset selection, Shrinkage methods, Dimension reduction methods – High Dimensional data.
Resampling: Cross Validation
Resampling

• Resampling involves repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model.
• It involves fitting the same statistical method multiple times using different subsets of the training data.
• It can be computationally expensive, because the same statistical method must be fit multiple times on different subsets of the training data.

• The process of evaluating a model’s performance is known as model assessment.

• The process of selecting the proper level of flexibility for a model is known as model selection.
Resampling
• Two common methods of Resampling are
• Cross Validation
• The Validation Set Approach
• K-Fold cross validation
• Leave one out cross validation
• Bootstrapping
Cross-Validation

• Cross-validation is used to estimate the test error associated with a model, in order to evaluate its performance.
• Different methods under Cross Validation are
• The Validation Set Approach or Holdout cross-validation
• Leave one out cross validation
• K-Fold cross validation
The Validation Set Approach or Holdout cross-
validation
• It involves randomly dividing the available set of observations into two
parts, a training set and a validation set or hold-out set
• The model is fit on the training set and the fitted model is used to make
predictions on the validation set.
• The resulting validation set error provides an estimate of the test error, assessed using the MSE in the case of a quantitative response and the misclassification rate in the case of a qualitative (discrete) response.
Example: automobile data

• We want to compare linear vs higher-order polynomial terms in a linear regression.

• We randomly split the 392 observations into two sets, a training set
containing 196 of the data points, and a validation set containing the
remaining 196 observations.
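A minimal sketch of this split in Python, assuming scikit-learn is available; the synthetic horsepower/mpg-style data below merely stands in for the Auto data set, which is not bundled with these notes:

```python
# Validation set approach: random 196/196 split, compare polynomial degrees by validation MSE.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(10, 50, size=(392, 1))                                  # stand-in for horsepower
y = 60 - 1.5 * X[:, 0] + 0.02 * X[:, 0] ** 2 + rng.normal(0, 3, 392)    # stand-in for mpg

X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=196, random_state=1)

for degree in (1, 2, 3):
    poly = PolynomialFeatures(degree)
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    mse = mean_squared_error(y_val, model.predict(poly.transform(X_val)))
    print(f"degree {degree}: validation MSE = {mse:.2f}")
```

Rerunning with a different random_state changes which observations land in each half, which is exactly the variability discussed on the next slide.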
The Validation Set Approach or Holdout
cross-validation
• Advantages
• Very simple and easy to implement
• We can simply choose the method with the best validation score
• Disadvantages
• The validation estimate of the test error rate can be highly variable, depending on precisely which observations are included in the training set and which are included in the validation set.
• Only a subset of the observations — those included in the training set rather than in the validation set — are used to fit the model, so the validation set error rate may tend to overestimate the test error rate for the model fit on the entire data set.
Leave-one-out-cross-validation(LOOCV)

• LOOCV is a better option than the validation set approach.
• Instead of splitting the entire dataset into two halves, only one observation is used for validation and the rest is used to fit the model.
• The statistical learning method is fit on the n − 1 training observations {(x2, y2), ..., (xn, yn)}.
• A prediction ŷ1 is made for the excluded observation using its value x1, and MSE1 = (y1 − ŷ1)² is computed.
• Though MSE1 is approximately unbiased for the test error, it is a poor estimate because it is highly variable, since it is based upon a single observation (x1, y1).
• We repeat the procedure by selecting (x2, y2) for the validation data, training the statistical learning procedure on the n − 1 observations {(x1, y1), (x3, y3), ..., (xn, yn)}, and computing MSE2 = (y2 − ŷ2)².
• Repeating this approach n times produces n squared errors, MSE1, ..., MSEn.
• The LOOCV estimate for the test MSE is the average of these n test error estimates:

  CV(n) = (1/n) ∑i=1..n MSEi
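A short LOOCV sketch in Python, assuming scikit-learn; the simulated data and the linear model are illustrative placeholders:

```python
# LOOCV: fit the model n times, each time holding out one observation,
# and average the n squared errors MSE_1, ..., MSE_n.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(0, 0.5, 50)

errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    y_hat = model.predict(X[test_idx])
    errors.append((y[test_idx][0] - y_hat[0]) ** 2)   # MSE_i for the single held-out point

cv_n = np.mean(errors)                                # LOOCV estimate of the test MSE
print(f"LOOCV estimate of test MSE: {cv_n:.3f}")
```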
Leave-one-out-cross-validation
• Advantages
• it has far less bias
• the LOOCV approach tends not to overestimate the test error rate
• performing LOOCV multiple times will always yield the same results
• Disadvantage
• LOOCV is expensive to implement since the model has to be fit n times
Hold out or Validation set approach Vs LOOCV

Method                            | Advantage          | Disadvantage
Holdout / Validation set approach | Cheap              | Variance: unreliable estimate of future performance
Leave one out                     | Doesn’t waste data | Expensive

k-fold cross-validation
• This approach involves randomly dividing the set of observations into k folds of
nearly equal size.
• The first fold is treated as a validation set and the model is fit on the remaining
folds.
• The procedure is then repeated k times, where a different group each time is
treated as the validation set.
• The k-fold CV estimate is computed by averaging these k values: CV(k) = (1/k) ∑i=1..k MSEi
• LOOCV is a special case of k-fold CV in which k is set to equal n
• Computational feasibility is the advantage of using k = 5 or k = 10 rather than k = n
• Performing 10-fold CV requires fitting the learning procedure only ten times, which may be much more feasible
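A possible k-fold CV sketch, assuming scikit-learn's KFold; the synthetic regression data is only for illustration:

```python
# k-fold CV with k = 5 and k = 10: fit the model k times and average the per-fold MSEs.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.5, 0.0, -2.0]) + rng.normal(0, 1.0, 200)

for k in (5, 10):
    fold_mses = []
    for train_idx, test_idx in KFold(n_splits=k, shuffle=True, random_state=1).split(X):
        model = LinearRegression().fit(X[train_idx], y[train_idx])
        resid = y[test_idx] - model.predict(X[test_idx])
        fold_mses.append(np.mean(resid ** 2))          # MSE on this fold
    print(f"{k}-fold CV estimate of test MSE: {np.mean(fold_mses):.3f}")
```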
How many folds are needed?
• With a large number of folds
• The bias of the true error rate estimator will be small
• The variance of the true error rate estimator will be large
• The computational time will be very large
• With a small number of folds
• The number of experiments and computation time are reduced
• The variance of the estimator will be small
• The bias of the estimator will be large
• In practice the choice of the number of folds depends on the size of the dataset
• For larger datasets, even 3-fold cross validation will be quite accurate
• For very sparse datasets, we may have to use leave one out in order to train on as many examples as possible
• A common choice for K-fold cross validation is K=10
Bias-Variance Trade-Off for k-Fold Cross-
Validation
• Validation set approach
• Overestimates the test error rate as training set contains only half
the observations of the entire data set
• LOOCV
• unbiased estimates of the test error as each training set contains n −
1 observations
• k-fold CV
• k = 5 or k = 10 intermediate level of bias as each training set
contains (k − 1)n/k observations
Bias-Variance Trade-Off for k-Fold Cross-
Validation
• LOOCV
• averaging the outputs of n fitted models, each of which is trained on an
almost identical set of observations therefore, outputs are highly
(positively) correlated with each other
• k-fold CV with k<n
• averaging the outputs of k fitted models that are somewhat less correlated
with each other , since the overlap between the training sets in each model is
smaller
• The mean of many highly correlated quantities has higher variance than does the
mean of many quantities that are not as highly correlated
• LOOCV therefore has higher variance than k-fold CV.
• k-fold cross-validation with k = 5 or k = 10 suffers neither from excessively high bias nor from very high variance.
Bias-Variance Trade-Off for k-Fold Cross-
Validation
Method                  | Bias                                             | Variance
Validation set approach | Overestimates the test error rate                | –
LOOCV                   | Approximately unbiased estimate of the test error | Higher variance
k-fold CV (k = 5 or 10) | Intermediate level of bias                       | Neither high bias nor high variance
Cross-Validation on Classification
Problems

In classification, the LOOCV error rate takes the form

  CV(n) = (1/n) ∑i=1..n Erri,  where Erri = I(yi ≠ ŷi)
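A sketch of this classification version, assuming scikit-learn; the logistic regression model and the simulated labels are illustrative choices rather than part of the original material:

```python
# LOOCV error rate for a classifier: Err_i = I(y_i != y_hat_i), averaged over all n points.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, 100) > 0).astype(int)

errs = []
for train_idx, test_idx in LeaveOneOut().split(X):
    clf = LogisticRegression().fit(X[train_idx], y[train_idx])
    errs.append(int(clf.predict(X[test_idx])[0] != y[test_idx][0]))   # Err_i

print(f"LOOCV error rate: {np.mean(errs):.3f}")
```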
Bootstrapping
The Bootstrap

• The bootstrap is a flexible and powerful statistical tool that can be used to
quantify the uncertainty associated with a given estimator or statistical
learning method.

• For example, it can be used to estimate the standard errors of the


coefficients from a linear regression fit
Bootstrap
• The bootstrap is a resampling technique with replacement
• From a dataset with N examples:
• Randomly select (with replacement) N examples and use this set for training
• The remaining examples that were not selected for training are used for testing
• The number of such left-out examples is likely to change from fold to fold
• Repeat this process for a specified number of folds (k)
• The true error is estimated as the average error rate on the test data
Example

• Assume a small dataset x = {3, 5, 2, 1, 7} and we want to compute the bias and variance of the sample mean α̂ = 3.6
• We generate a number of bootstrap samples (three in this case)
• Assume that the first bootstrap yields the dataset {7, 3, 2, 3, 1}
• Compute its sample mean α̂*1 = 3.2
• The second bootstrap yields the dataset {5, 1, 1, 3, 7}
• Compute its sample mean α̂*2 = 3.4
• The third bootstrap yields the dataset {2, 2, 7, 1, 3}
• Compute its sample mean α̂*3 = 3.0
• Average these estimates to obtain the bootstrap mean α̂* = 3.2
• Bias(α̂) = 3.2 − 3.6 = −0.4
• Var(α̂) = 1/2 [(3.2 − 3.2)² + (3.4 − 3.2)² + (3.0 − 3.2)²] = 0.04
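The same calculation can be sketched in NumPy. Note that freshly drawn bootstrap samples will generally differ from the three listed above, so the printed numbers will not match the slide exactly:

```python
# Bootstrap estimate of the bias and variance of the sample mean of x = {3, 5, 2, 1, 7}.
import numpy as np

rng = np.random.default_rng(0)
x = np.array([3, 5, 2, 1, 7])
alpha_hat = x.mean()                                   # 3.6

B = 3                                                  # three bootstrap samples, as in the slide
boot_means = np.array([rng.choice(x, size=len(x), replace=True).mean() for _ in range(B)])

bias = boot_means.mean() - alpha_hat
var = np.sum((boot_means - boot_means.mean()) ** 2) / (B - 1)
print(f"bootstrap means: {boot_means}, bias = {bias:.2f}, variance = {var:.3f}")
```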
A graphical illustration of the bootstrap approach on a small sample containing n = 3
observations. Each bootstrap data set contains n observations, sampled with
replacement from the original data set. Each bootstrap data set is used to obtain an
estimate of α
A simple example

• To invest a fixed sum of money in two financial assets that yield returns of X
and Y , respectively, where X and Y are random quantities.
• We will invest a fraction α of our money in X, and will invest the remaining
1 − α in Y .
• choose α to minimize the total risk, or variance, of our investment. i.e
minimize Var(αX + (1 − α)Y ).
• The value that minimizes the risk is given by
𝞼 2 𝑌 − 𝞼 𝑋𝑌
𝞪=
𝞼 2 𝑋+𝞼 2 𝑌 − 2𝞼 𝑋𝑌

where σ 2 X = Var(X), σ2 Y = Var(Y ), and σXY = Cov(X, Y ).


Example Contd

• But the values of σ²X, σ²Y, and σXY are unknown.

• We can compute estimates for these quantities, σ̂²X, σ̂²Y, and σ̂XY, using a data set that contains measurements for X and Y.

• We can then estimate the value of α that minimizes the variance of our investment using

  α̂ = (σ̂²Y − σ̂XY) / (σ̂²X + σ̂²Y − 2σ̂XY)
Each panel displays 100 simulated returns for investments X and Y . From left
to right and top to bottom, the resulting estimates for α are 0.576, 0.532,
0.657, and 0.651.
Example Contd

• To estimate the standard deviation of α̂, we repeated the process of simulating 100 paired observations of X and Y, and estimating α, 1,000 times.

• We thereby obtained 1,000 estimates for α, which we can call α̂1, α̂2, ..., α̂1000.

• For these simulations the parameters were set to σ²X = 1, σ²Y = 1.25, and σXY = 0.5, and so we know that the true value of α is 0.6.


Example Contd

• The mean over all 1,000 estimates for α is very close to α = 0.6, and the standard deviation of the estimates is

  √( 1/(1000 − 1) ∑r=1..1000 (α̂r − ᾱ)² ) = 0.083

  where ᾱ denotes the mean of the 1,000 estimates.

• This gives us a very good idea of the accuracy of α̂: SE(α̂) ≈ 0.083.

• So roughly speaking, for a random sample from the population, we would expect α̂ to differ from α by approximately 0.08, on average.
Left: A histogram of the estimates of α obtained by generating 1,000 simulated
data sets from the true population. Center: A histogram of the estimates of α
obtained from 1,000 bootstrap samples from a single data set. Right: The estimates
of α displayed in the left and center panels are shown as boxplots. In each panel,
the pink line indicates the true value of α
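A rough NumPy sketch of this experiment run as a bootstrap on a single simulated data set; the population parameters follow the slide, while the function and variable names are made up for illustration:

```python
# Bootstrap estimate of SE(alpha_hat): resample the n paired observations of (X, Y)
# with replacement and recompute alpha_hat on each bootstrap data set.
import numpy as np

rng = np.random.default_rng(0)
n, B = 100, 1000
cov = np.array([[1.0, 0.5], [0.5, 1.25]])     # sigma2_X = 1, sigma2_Y = 1.25, sigma_XY = 0.5
data = rng.multivariate_normal([0.0, 0.0], cov, size=n)

def alpha_hat(d):
    s = np.cov(d[:, 0], d[:, 1])              # 2x2 sample covariance matrix
    return (s[1, 1] - s[0, 1]) / (s[0, 0] + s[1, 1] - 2 * s[0, 1])

boot = np.array([alpha_hat(data[rng.integers(0, n, size=n)]) for _ in range(B)])
print(f"alpha_hat = {alpha_hat(data):.3f}, bootstrap SE = {boot.std(ddof=1):.3f}")
# The bootstrap SE should come out in the neighbourhood of the simulated value 0.083.
```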
Linear Model Selection
Linear Model Selection

• In the regression setting, the standard linear model

• Y = β0 + β1X1 + ··· + βpXp + ℇ

• describes the relationship between a response Y and a set of variables

X1, X2,...,Xp

• This simple linear model can be improved, by replacing plain least squares fitting

with some alternative fitting procedures.

• Alternative fitting procedures can yield better prediction accuracy and model

interpretability
Linear Model Selection
Prediction Accuracy
• If the true relationship between the response and the predictors is approximately
linear, the least squares estimates will have low bias
• If n >> p
• where n is the number of observations and p is the number of variables
• least squares estimates have low variance and will perform well on test observations
• If n is not much larger than p
• the least squares fit can have a lot of variability, resulting in overfitting
• and consequently poor predictions on future observations not used in model training
• If p > n
• there is no longer a unique least squares coefficient estimate
• the variance is infinite
Linear Model Selection
• Model Interpretability
• some or many of the variables used in a multiple regression model are not
associated with the response
• Including such irrelevant variables leads to unnecessary complexity in the
resulting model
• remove these variables by setting the corresponding coefficient estimates to
zero
• least squares is extremely unlikely to yield any coefficient estimates that are exactly zero
• We need approaches for automatically performing feature selection or variable selection, that is, for excluding irrelevant variables from a multiple regression model
• Alternatives to using least squares to fit are
• Subset Selection
• This approach involves identifying a subset of the p predictors that we believe
to be related to the response
• Fit a model using least squares on the reduced set of variables.
• Shrinkage or regularization
• This approach involves fitting a model involving all p predictors
• the estimated coefficients are shrunken towards zero relative to the least
squares estimates
• Depending on the type of shrinkage performed, it can also perform variable selection by setting some coefficients exactly to zero
• Dimension Reduction.
• involves projecting the p predictors into a M-dimensional subspace where M <p
• Achieved by computing M different linear combinations, or projections, of the
variables
• these M projections are used as predictors to fit a linear regression model by
least squares
Subset Selection
• Involves identifying a subset of the p predictors that we believe to be related to the response

• Then fit a model using least squares on the reduced set of variables

• Best subset selection
• Fit a separate least squares regression for each possible combination of the p predictors
• Fit all p models that contain exactly one predictor, all (p choose 2) = p(p − 1)/2 models that contain exactly two predictors, and so forth
• Then look at all of the resulting models, with the goal of identifying the one that is best
Algorithm-Best subset selection

• 1. Let M0 denote the null model, which contains no predictors. This model
simply predicts the sample mean for each observation.

• 2. For k = 1, 2, ..., p:

• (a) Fit all (p choose k) models that contain exactly k predictors.

• (b) Pick the best among these models, and call it Mk. Here best is defined as having the smallest RSS, or equivalently the largest R2.

• 3. Select a single best model from among M0, ..., Mp using cross-validated prediction error, Cp (AIC), BIC, or adjusted R2.


For each possible model containing a subset of the ten predictors in the
Credit data set, the RSS and R2 are displayed. The red frontier tracks the
best model for a given number of predictors, according to RSS and R2.
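A small illustration of best subset selection, assuming scikit-learn and a synthetic data set with p = 5 predictors (with this few predictors the full 2^p enumeration is still cheap):

```python
# Best subset selection: for each k, fit all C(p, k) least-squares models and keep the
# one with the smallest RSS; Cp, BIC, adjusted R2, or CV would then compare across k.
import itertools
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 100, 5
X = rng.normal(size=(n, p))
y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(0, 1.0, n)   # only predictors 0 and 3 matter

for k in range(1, p + 1):
    best_rss, best_subset = np.inf, None
    for subset in itertools.combinations(range(p), k):
        cols = list(subset)
        model = LinearRegression().fit(X[:, cols], y)
        rss = np.sum((y - model.predict(X[:, cols])) ** 2)
        if rss < best_rss:
            best_rss, best_subset = rss, subset
    print(f"k = {k}: best subset {best_subset}, RSS = {best_rss:.1f}")
```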
Drawback of Subset Selection

• Computational limitations
• The number of possible models that must be considered grows rapidly as
p increases
• There are 2^p models that involve subsets of p predictors
• If p = 10, there are approximately 1,000 possible models to be considered
• If p = 20, then there are over one million possibilities
• Computationally infeasible for values of p greater than around 40
Stepwise Selection
• For computational reasons, best subset selection cannot be applied with
very large p
• The larger the search space, the higher the chance of finding models that
look good on the training data
• an enormous search space can lead to overfitting and high variance of the
coefficient estimates.
• For both of these reasons, stepwise methods, which explore a far more
restricted set of models, are attractive alternatives to best subset selection.
• Stepwise Selection methods
• Forward Stepwise Selection
• Backward Stepwise Selection
Forward Stepwise Selection

• Forward stepwise selection is a computationally efficient alternative to


best subset selection

• Forward stepwise selection begins with a model containing no predictors

• then adds predictors to the model, one-at-a-time, until all of the


predictors are in the model

• at each step the variable that gives the greatest additional improvement
to the fit is added to the model
Algorithm - Forward stepwise selection

• 1. Let M0 denote the null model, which contains no predictors.

• 2. For k = 0,...,p − 1:

• (a) Consider all p − k models that augment the predictors in Mk with


one additional predictor.

• (b) Choose the best among these p − k models, and call it Mk+1.Here
best is defined as having smallest RSS or highest R2.

• 3. Select a single best model from among M0, ..., Mp using cross-validated prediction error, Cp (AIC), BIC, or adjusted R2.
Forward Stepwise Selection

• It involves fitting one null model, along with p − k models in the kth iteration, for k = 0, ..., p − 1
• This amounts to a total of 1 + ∑k=0..p−1 (p − k) = 1 + p(p + 1)/2 models
• When p = 20, best subset selection requires fitting 2^20 = 1,048,576 models, whereas forward stepwise selection requires fitting only 211 models
• It is not guaranteed to find the best possible model out of all 2^p models containing subsets of the p predictors
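A compact sketch of forward stepwise selection on synthetic data (scikit-learn assumed; the data and variable names are illustrative):

```python
# Forward stepwise selection: start from the null model and, at each step, add the single
# predictor giving the largest drop in RSS, so only p - k candidate models are fit per step.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 100, 6
X = rng.normal(size=(n, p))
y = 4 * X[:, 1] + 2 * X[:, 4] + rng.normal(0, 1.0, n)

selected, remaining = [], list(range(p))
while remaining:
    rss_for = {}
    for j in remaining:
        cols = selected + [j]
        model = LinearRegression().fit(X[:, cols], y)
        rss_for[j] = np.sum((y - model.predict(X[:, cols])) ** 2)
    best_j = min(rss_for, key=rss_for.get)              # greedy choice at this step
    selected.append(best_j)
    remaining.remove(best_j)
    print(f"step {len(selected)}: added X{best_j}, RSS = {rss_for[best_j]:.1f}")
# A criterion such as Cp, BIC, adjusted R2, or CV error would then pick among M_1, ..., M_p.
```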
Example

The first four selected models for best subset selection


and forward stepwise selection on the Credit data set.
Backward Stepwise Selection

• It begins with the full least squares model containing all p predictors, and
then iteratively removes the least useful predictor one-at-a-time

• The backward selection approach searches through only 1+p(p+ 1)/2


models

• Backward stepwise selection is not guaranteed to yield the best model


containing a subset of the p predictors
Algorithm - Backward stepwise selection

• 1. Let Mp denote the full model, which contains all p predictors.

• 2. For k = p, p − 1,..., 1:

• (a) Consider all k models that contain all but one of the predictors in
Mk, for a total of k − 1 predictors.

• (b) Choose the best among these k models, and call it Mk−1. Here best is
defined as having smallest RSS or highest R2.

• 3. Select a single best model from among M0, ..., Mp using cross-validated prediction error, Cp (AIC), BIC, or adjusted R2.
Forward stepwise selection Vs Backward
stepwise selection

• Backward selection requires that the number of samples n is larger


than the number of variables p

• Forward stepwise can be used even when n<p


Hybrid Approaches

• Variables are added to the model sequentially, in analogy to forward


selection

• After adding each new variable, the method may also remove any
variables that no longer provide an improvement in the model fit
Choosing the Optimal Model

• Two common approaches to select the best model with respect to test
error are
• Indirectly estimate test error by making an adjustment to the training
error to account for the bias due to overfitting.
• Directly estimate the test error, using either a validation set approach
or a cross-validation approach
Choosing the Optimal Model
• Cp
• AIC(Akaike information criterion)
• BIC(Bayesian information criterion)
• Adjusted R2

• For a least squares model containing d predictors, Cp = (1/n)(RSS + 2dσ̂²), where σ̂² is an estimate of the variance of the error ℇ associated with each response measurement in Y = β0 + β1X1 + ··· + βpXp + ℇ
• The Cp statistic adds a penalty of 2dσ̂² to the training RSS in order to adjust for the fact that the training error tends to underestimate the test error
• The penalty increases as the number of predictors in the model increases; this is intended to adjust for the corresponding decrease in training RSS
• Choose the model with the lowest Cp value
Choosing the Optimal Model

• The AIC criterion is defined for a large class of models fit by maximum likelihood. For least squares models it is given (up to irrelevant constants) by AIC = (1/(nσ̂²))(RSS + 2dσ̂²)

• For least squares models, Cp and AIC are proportional to each other

• BIC is derived from a Bayesian point of view, but ends up looking similar to Cp and AIC.
• For the least squares model with d predictors, BIC = (1/n)(RSS + log(n) dσ̂²)

• BIC will tend to take on a small value for a model with a low test error, so we select the model that has the lowest BIC value
Choosing the Optimal Model
• For a least squares model with d variables, the adjusted R2 statistic is calculated as

  Adjusted R2 = 1 − (RSS/(n − d − 1)) / (TSS/(n − 1))

• Unlike Cp, AIC, and BIC, a large value of adjusted R2 indicates a model with a small test error
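A sketch that computes these criteria for a sequence of nested least squares fits, using the ISLR-style formulas quoted above; σ̂² is estimated from the full model, and the nested models (first d predictors) are chosen only for illustration:

```python
# Cp, BIC, and adjusted R^2 for d-predictor least-squares fits on synthetic data.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))
y = 2 * X[:, 0] - 3 * X[:, 2] + rng.normal(0, 1.0, n)

full = LinearRegression().fit(X, y)
sigma2_hat = np.sum((y - full.predict(X)) ** 2) / (n - p - 1)   # error-variance estimate
tss = np.sum((y - y.mean()) ** 2)

for d in range(1, p + 1):
    cols = list(range(d))                                       # first d predictors, for illustration
    model = LinearRegression().fit(X[:, cols], y)
    rss = np.sum((y - model.predict(X[:, cols])) ** 2)
    cp = (rss + 2 * d * sigma2_hat) / n
    bic = (rss + np.log(n) * d * sigma2_hat) / n
    adj_r2 = 1 - (rss / (n - d - 1)) / (tss / (n - 1))
    print(f"d = {d}: Cp = {cp:.3f}, BIC = {bic:.3f}, adjusted R2 = {adj_r2:.3f}")
```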
Validation and Cross-Validation
• compute the validation set error or the cross-validation error for each model
under consideration, and then select the model for which the resulting estimated
test error is smallest
• it provides a direct estimate of the test error, and makes fewer assumptions about
the true underlying model
Shrinkage Methods
Shrinkage Methods

• The subset selection methods involve using least squares to fit a linear
model that contains a subset of the predictors

• As an alternative, we can fit a model containing all p predictors using a


technique that constrains or regularizes the coefficient estimates , or
equivalently, that shrinks the coefficient estimates towards zero

• The two best-known techniques for shrinking the regression coefficients


towards zero are
• ridge regression

• lasso
Ridge Regression
• Ridge regression is very similar to least squares, except that the
coefficients are estimated by minimizing a slightly different quantity
• The ridge regression coefficient estimates β̂R are the values that minimize

  RSS + λ ∑j=1..p βj²

• where λ ≥ 0 is a tuning parameter
• the second term, λ ∑j=1..p βj², called a shrinkage penalty, is small when β1, ..., βp are close to zero
• it has the effect of shrinking the estimates of βj towards zero


Ridge Regression
  RSS + λ ∑j=1..p βj²

• The tuning parameter λ serves to control the relative impact of these two
terms on the regression coefficient estimates.
• When λ = 0, the penalty term has no effect, and ridge regression will
produce the least squares estimates
• λ→ ∞, the impact of the shrinkage penalty grows, and the ridge
regression coefficient estimates will approach zero
• ridge regression will produce a different set of coefficient estimates, β̂Rλ, for each value of λ
The standardized ridge regression coefficients are displayed for the Credit data set, as a function of λ and ‖β̂Rλ‖2 / ‖β̂‖2.
Why Does Ridge Regression Improve Over Least
Squares?

• As λ increases, the flexibility of the ridge regression fit decreases, leading


to decreased variance but increased bias

• When λ = 0, the variance is high but there is no bias
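A brief ridge sketch, assuming scikit-learn (where the tuning parameter λ is called alpha); the data is synthetic and the λ grid is arbitrary:

```python
# Ridge regression over a grid of lambda values: coefficients shrink as lambda grows.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 100, 4
X = StandardScaler().fit_transform(rng.normal(size=(n, p)))     # standardize predictors first
y = 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 1.0, n)

for lam in (0.01, 1.0, 10.0, 100.0, 1000.0):
    coefs = Ridge(alpha=lam).fit(X, y).coef_
    print(f"lambda = {lam:7.2f}: coefficients = {np.round(coefs, 3)}")
# Near-zero lambda essentially reproduces the least squares fit; large lambda shrinks every
# coefficient towards zero, but none of them is set exactly to zero.
```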


The Lasso
• Disadvantage of Ridge regression

• ridge regression will include all p predictors in the final model

• The penalty λ ∑j βj² will shrink all of the coefficients towards zero, but it will not set any of them exactly to zero

• it can create a challenge in model interpretation in settings ,in


which the number of variables p is quite large

• Increasing the value of λ will tend to reduce the magnitudes of


the coefficients, but will not result in exclusion of any of the
variables.
The Lasso
• The lasso is a relatively recent alternative to ridge regression
that over comes this disadvantage.
• The lasso coefficients, β̂L, minimize the quantity

  RSS + λ ∑j=1..p |βj|
The Lasso

• In the case of the lasso, the ℓ1 penalty has the effect of


forcing some of the coefficient estimates to be exactly
equal to zero when the tuning parameter λ is
sufficiently large.
• like best subset selection, the lasso performs variable
selection
• depending on the value of λ, the lasso can produce a
model involving any number of variables
The standardized lasso coefficients on the Credit data set are shown as a function of λ and ‖β̂Lλ‖1 / ‖β̂‖1.
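A matching lasso sketch under the same assumptions (scikit-learn, synthetic data, arbitrary λ grid), showing coefficients being set exactly to zero:

```python
# Lasso over a grid of lambda values: the l1 penalty zeroes out some coefficients,
# so the lasso performs variable selection.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p = 100, 6
X = StandardScaler().fit_transform(rng.normal(size=(n, p)))
y = 4 * X[:, 0] - 2 * X[:, 3] + rng.normal(0, 1.0, n)           # only X0 and X3 matter

for lam in (0.01, 0.1, 0.5, 1.0, 2.0):
    coefs = Lasso(alpha=lam).fit(X, y).coef_
    n_zero = int(np.sum(coefs == 0.0))
    print(f"lambda = {lam:4.2f}: {n_zero} coefficients exactly zero, coefs = {np.round(coefs, 3)}")
```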
Dimension Reduction Methods
Dimension Reduction Methods

• Transform the predictors and then fit a least squares model using the transformed variables
• Let Z1, Z2, ..., ZM represent M < p linear combinations of the original p predictors:

  Zm = ∑j=1..p ϕjm Xj

• for some constants ϕ1m, ϕ2m, ..., ϕpm, m = 1, ..., M

• then fit the linear regression model

  yi = θ0 + ∑m=1..M θm zim + ϵi,  i = 1, ..., n


Dimension Reduction Methods

• The term dimension reduction comes from the fact that this approach reduces the problem of estimating the p + 1 coefficients β0, β1, ..., βp to the simpler problem of estimating the M + 1 coefficients θ0, θ1, ..., θM, where M < p
• the dimension of the problem has been reduced from p + 1 to M + 1
• In situations where p is large relative to n, selecting M << p can significantly reduce the variance of the fitted coefficients
• If M = p and all the Zm are linearly independent, no dimension reduction occurs
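One way to sketch this is principal components regression, assuming scikit-learn; M = 3 and the synthetic data are arbitrary choices:

```python
# Dimension reduction: build M < p linear combinations Z_1, ..., Z_M of the predictors
# with PCA, then regress y on them by least squares (principal components regression).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n, p, M = 150, 10, 3
X = StandardScaler().fit_transform(rng.normal(size=(n, p)))
y = X @ rng.normal(size=p) + rng.normal(0, 1.0, n)

pca = PCA(n_components=M)
Z = pca.fit_transform(X)                       # the M projections Z_1, ..., Z_M
pcr = LinearRegression().fit(Z, y)             # estimates theta_0, theta_1, ..., theta_M
print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 3))
print("theta estimates:", np.round(np.r_[pcr.intercept_, pcr.coef_], 3))
```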
Dimensionality reduction
• Principal components analysis
• PCA; also called the Karhunen-Loeve, or K-L, method
• Given N data vectors from k-dimensions, find c ≤ k orthogonal vectors (Principal
components) that can be best used to represent data
• Steps
• Normalize input data: Each attribute falls within the same range
• Compute c orthonormal (unit) vectors, i.e., principal components
• Each input data (vector) is a linear combination of the c principal component
vectors
• The principal components are sorted in order of decreasing “significance” or
strength
• Since the components are sorted, the size of the data can be reduced by
eliminating the weak components, i.e., those with low variance. (i.e., using the
strongest principal components, it is possible to reconstruct a good
approximation of the original data)
• Works for numeric data only
• Used for handling sparse data
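A from-scratch NumPy sketch of these PCA steps; the simulated data, the choice c = 2, and the variable names are illustrative:

```python
# PCA steps from the slide: normalize the data, compute orthonormal component directions
# (eigenvectors of the covariance matrix), sort them by decreasing variance, and keep the
# strongest c components to approximate the original data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))        # correlated numeric data
X_std = (X - X.mean(axis=0)) / X.std(axis=0)                   # 1. normalize each attribute

cov = np.cov(X_std, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)                         # 2. orthonormal eigenvectors
order = np.argsort(eigvals)[::-1]                              # 3. sort by decreasing variance
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

c = 2                                                          # 4. keep the c strongest components
scores = X_std @ eigvecs[:, :c]                                # each row: c linear combinations
X_approx = scores @ eigvecs[:, :c].T                           # reconstruct an approximation
print("variance explained by first", c, "components:",
      round(eigvals[:c].sum() / eigvals.sum(), 3))
```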
High Dimensional data

High Dimensional data

Data sets containing more features than observations are often referred to as high-dimensional.

Reference(s):

• James G, Witten D, Hastie T and Tibshirani R, “An Introduction to Statistical Learning with Applications in R”, Springer, 2013.
