Week 4 R Programming Model Validation
Trine University
Decision trees are a widely used type of model that illustrates decision-making
processes by considering several possible outcomes depending on defined conditions.
Their effectiveness, however, depends on how they are configured as well as the characteristics of
the data they process. A well-tuned decision tree can thus achieve good predictive accuracy
without having to grow too deep in order to fit the data.
When decision trees are deep, they are prone to capturing noise instead of valuable
information and thus underperform on unseen data. On the other hand, trees that are too shallow
might underfit, that is, they neglect important patterns, giving an overly simple model which performs
poorly.
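The trade-off between shallow and deep trees can be illustrated with a minimal sketch. It assumes the `rpart` package and the built-in `iris` dataset, neither of which is named in this paper; the 70/30 split, the seed, and the depths 1, 3, and 10 are arbitrary choices for illustration only.

```r
library(rpart)  # assumption: rpart is the tree package in use

set.seed(42)  # arbitrary seed, only for reproducibility
n <- nrow(iris)
train_idx <- sample(n, size = round(0.7 * n))
train <- iris[train_idx, ]
test  <- iris[-train_idx, ]

# Fit a classification tree limited to a given depth and return its
# accuracy on the training set and on the testing set.
accuracy_at_depth <- function(depth) {
  fit <- rpart(Species ~ ., data = train, method = "class",
               control = rpart.control(maxdepth = depth, cp = 0, minsplit = 2))
  acc <- function(d) mean(predict(fit, newdata = d, type = "class") == d$Species)
  c(depth = depth, train = acc(train), test = acc(test))
}

# One row per depth, with columns depth, train, and test.
depth_results <- as.data.frame(t(sapply(c(1, 3, 10), accuracy_at_depth)))
print(depth_results)
```

A widening gap between the `train` and `test` columns as depth increases would suggest overfitting, while low values in both columns at depth 1 would suggest underfitting.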
In order to confirm the stability of a decision tree, it is recommended to carry
out the same analysis several times using different divisions between the training and testing
datasets. This technique, known as cross-validation, helps detect
overfitting, since a model that generalizes well should also display good results on other subsets. If model accuracy
varies greatly between different splits, overfitting is likely. On the other
hand, when performance is constant across splits, this implies that the results obtained can
be expected to generalize.
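A minimal sketch of this repeated-split check follows. It again assumes `rpart` and the `iris` dataset, and the helper name `split_accuracy`, the 20 repetitions, and the 70/30 split are hypothetical choices. Strictly speaking, this is repeated random holdout (sometimes called Monte Carlo cross-validation) rather than k-fold cross-validation.

```r
library(rpart)  # assumption: rpart is the tree package in use

# Fit a tree on one random train/test split and return the test accuracy.
split_accuracy <- function(seed, train_frac = 0.7) {
  set.seed(seed)
  idx <- sample(nrow(iris), size = round(train_frac * nrow(iris)))
  fit <- rpart(Species ~ ., data = iris[idx, ], method = "class")
  pred <- predict(fit, newdata = iris[-idx, ], type = "class")
  mean(pred == iris$Species[-idx])
}

# Repeat the analysis over 20 different splits.
split_acc <- sapply(1:20, split_accuracy)
cat("mean accuracy:", round(mean(split_acc), 3), "\n")
cat("sd across splits:", round(sd(split_acc), 3), "\n")
```

A small standard deviation relative to the mean indicates stable performance across splits; a large one is a warning sign that the tree may be fitting noise in particular training sets.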
Classification accuracy metrics or error measures such as RMSE are used for model
comparison, for instance when comparing models based on different splitting criteria, such as the Gini
index versus entropy, or comparing trees fitted using ANOVA with trees fitted using Poisson
regression. A model that performs better across all splits of the data is normally regarded as better
than one that does not. However, the comparison does not stop at deciding which model is ‘better’ based
on the performance on V, W, and X; it also assesses how well the model generalises to new
observations, how resistant it is to overfitting, and the interpretability of the model for
stakeholders’ consumption.
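The comparisons described above might be sketched as follows, assuming the `rpart` package and the built-in `iris` and `warpbreaks` datasets as stand-ins for the course data. The helper names `criterion_accuracy` and `rmse` are hypothetical, and the seeds and split proportions are arbitrary. In `rpart`, the entropy criterion is requested with `split = "information"`.

```r
library(rpart)  # assumption: rpart is the tree package in use

# Test accuracy of a classification tree for one split criterion and seed.
# Reusing the same seed gives both criteria the same train/test split.
criterion_accuracy <- function(criterion, seed) {
  set.seed(seed)
  idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
  fit <- rpart(Species ~ ., data = iris[idx, ], method = "class",
               parms = list(split = criterion))
  mean(predict(fit, iris[-idx, ], type = "class") == iris$Species[-idx])
}

seeds <- 1:20
criteria_acc <- data.frame(
  gini    = sapply(seeds, function(s) criterion_accuracy("gini", s)),
  entropy = sapply(seeds, function(s) criterion_accuracy("information", s))
)
print(colMeans(criteria_acc))  # average test accuracy per criterion

# Root mean squared error between observed and predicted values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Compare an ANOVA tree with a Poisson tree on a count response.
set.seed(1)
idx <- sample(nrow(warpbreaks), size = round(0.7 * nrow(warpbreaks)))
train <- warpbreaks[idx, ]
test  <- warpbreaks[-idx, ]
fit_anova   <- rpart(breaks ~ wool + tension, data = train, method = "anova")
fit_poisson <- rpart(breaks ~ wool + tension, data = train, method = "poisson")
rmse_by_method <- c(anova   = rmse(test$breaks, predict(fit_anova, test)),
                    poisson = rmse(test$breaks, predict(fit_poisson, test)))
print(rmse_by_method)
```

Because each seed yields the same split for both criteria, the accuracy columns can be compared row by row. A single split, as used for the RMSE comparison here, is only a starting point; repeating it over several splits would give a fairer picture.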