Model Validation
Model validation is the process carried out after model training, in which the trained model is evaluated on a testing data set. The testing data may or may not be drawn from the same data set as the training data. In short, model validation is the set of processes and activities intended to verify that models are performing as expected.
Model validation checks the model that has been built by gathering, preprocessing, and feeding appropriate data to a machine learning algorithm. We cannot simply feed the data to the model, train it, and deploy it; it is essential to validate the model's performance and confirm that it behaves as we expect. There are multiple model validation techniques, used to evaluate and validate models according to their type and behaviour.
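As a minimal sketch of this idea, the example below trains a model on one part of the data and validates it on a held-out test set. It uses scikit-learn with the Iris data and logistic regression purely as illustrative stand-ins; any model and data set could take their place.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Split once into training and testing data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)  # train on the training split only

# Validate: compare predictions on unseen test data with the true labels.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {test_accuracy:.3f}")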
Why Model Validation?
The goal of a model is to make predictions about data, and model validation determines whether the trained model is trustworthy. It also helps reduce costs, discover more errors, improve scalability and flexibility, and enhance the overall quality of the model.
Leave-one-out is a variant of the K-fold cross-validation technique in which K is set to n, where n is the number of samples (data observations) in our dataset. The model is trained and tested on every data sample: each sample in turn serves as the testing set while all the others form the training set.
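A minimal sketch of leave-one-out cross-validation is shown below, assuming scikit-learn's LeaveOneOut splitter; the Iris data and logistic regression model are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

loo = LeaveOneOut()                            # each sample becomes a test set once
scores = cross_val_score(model, X, y, cv=loo)  # n fits, one score per sample

print(f"Number of folds: {len(scores)}")       # equals the number of samples
print(f"Mean accuracy: {scores.mean():.3f}")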
Although this method is not widely used, the hold-out and K-fold approaches solve most of the issues related to model validation.
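For reference, here is a minimal sketch of the K-fold approach with scikit-learn; the choice of K = 5 and the dataset/model are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)
model = LogisticRegression(max_iter=1000)

# Split the data into 5 folds; each fold is used once as the test set.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=kf)   # one accuracy score per fold

print("Fold accuracies:", scores.round(3))
print(f"Mean accuracy: {scores.mean():.3f}")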
The hold-out approach is very similar to the train-test split method, except that we make one additional split of the data. With only a train-test split, the data is divided into just two parts, and information from the test set can leak into model selection, which can lead to overfitting. To overcome this issue, we split off one more part of the data, called the hold-out or validation split.
So, basically, we train our model on the larger training set and then test it on the testing set. Once the model performs well on both the training and testing sets, we try it on the final validation split to get an idea of how the model behaves on unseen data.
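A minimal sketch of such a hold-out (train / test / validation) split follows, built from two calls to scikit-learn's train_test_split; the 60/20/20 proportions and the dataset/model are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# First split off the final hold-out (validation) set (20% of the data).
X_rest, X_holdout, y_rest, y_holdout = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
# Then split the remainder into training and testing sets (60% / 20% overall).
X_train, X_test, y_train, y_test = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42, stratify=y_rest
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

print(f"Train accuracy:    {model.score(X_train, y_train):.3f}")
print(f"Test accuracy:     {model.score(X_test, y_test):.3f}")
# Only after the model looks good on train and test do we check the hold-out split.
print(f"Hold-out accuracy: {model.score(X_holdout, y_holdout):.3f}")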
How do we choose a model validation technique?
In practice, no single technique can be used in all scenarios, and we should be quite familiar with our data. Here are some suggestions from Sebastian's Blog that may give us some ideas.