

Week 4: R Programming Model Validation

Murtuza Hussain Ghouri

Trine University

Data Science and Big Data

October 18, 2024



Week 4: R Programming Model Validation



Overview of Decision Trees:

Decision trees are a widely used type of model that represents decision-making as a sequence of conditions, each leading to one of several possible outcomes. Their effectiveness, however, depends on how they are configured (for example, how deep they are allowed to grow) as well as on the characteristics of the data they process. A well-optimized decision tree therefore achieves good accuracy without growing so deep that it merely fits the training data. Deep trees are prone to capturing noise instead of meaningful structure, so they underperform on unseen data (overfitting). Conversely, trees that are too shallow may underfit: they miss important patterns and produce an overly simple model that performs poorly even on the training data.
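
The depth trade-off described above can be illustrated with a small experiment. The assignment itself works in R. What follows is instead a hypothetical, dependency-free Python sketch, not the author's code. The names `fit_tree`, `predict`, `mse` and `make_data`, the step-shaped synthetic data, and the depths tried are all assumptions. The sketch grows a one-dimensional regression tree greedily up to a chosen `max_depth`. It then reports mean squared error on the training data and on a larger held-out sample.

```python
import random
from statistics import mean

# Hypothetical sketch: a greedy 1-D regression tree whose only setting is max_depth.

def sse(ys):
    """Sum of squared errors of ys around their own mean."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)

def fit_tree(xs, ys, max_depth):
    """Return a leaf value (a float) or a (threshold, left, right) tuple."""
    # Stop when the depth budget is used up or no split is possible.
    if max_depth == 0 or len(set(xs)) < 2:
        return mean(ys)
    best_cost, best_t = None, None
    # Candidate thresholds: every distinct x except the largest, so both sides are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = sse(left) + sse(right)
        if best_cost is None or cost < best_cost:
            best_cost, best_t = cost, t
    lx, ly = zip(*[(x, y) for x, y in zip(xs, ys) if x <= best_t])
    rx, ry = zip(*[(x, y) for x, y in zip(xs, ys) if x > best_t])
    return (best_t, fit_tree(lx, ly, max_depth - 1), fit_tree(rx, ry, max_depth - 1))

def predict(tree, x):
    # Walk down the tree until a leaf (a plain number) is reached.
    while isinstance(tree, tuple):
        t, left, right = tree
        tree = left if x <= t else right
    return tree

def mse(tree, xs, ys):
    return mean((predict(tree, x) - y) ** 2 for x, y in zip(xs, ys))

# Assumed synthetic data: a step at x = 5 plus Gaussian noise.
rng = random.Random(0)

def make_data(n):
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [(1.0 if x > 5 else 0.0) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

train_x, train_y = make_data(40)
test_x, test_y = make_data(200)

for depth in (0, 1, 3, 8):
    tree = fit_tree(train_x, train_y, depth)
    print(f"max_depth={depth}: train MSE={mse(tree, train_x, train_y):.3f}, "
          f"held-out MSE={mse(tree, test_x, test_y):.3f}")
```

Training error can only fall as `max_depth` grows, because each extra level only adds splits that reduce the within-leaf error. With a setup like this, one would typically expect depth 0 to underfit, with high error on both sets. Very deep trees, by contrast, drive the training error toward zero without a matching improvement on the held-out data.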

Re-evaluation with Different Data Splits:

To confirm the stability of the various decision trees, it is recommended to repeat the same analysis several times using different divisions of the data into training and testing sets. This technique, known as cross-validation, helps detect overfitting because the model has to perform well on every held-out subset rather than on a single favorable split. If model accuracy varies greatly between splits, the model may be overfitting. Conversely, when performance is consistent across splits, the results are more likely to generalize to new data, which indicates that the model is dependable.
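
The repeated-split procedure can be sketched as k-fold cross-validation. This is a hypothetical Python illustration, not the author's R workflow. The names `k_fold_indices`, `fit_stump`, `accuracy` and `cross_validate`, the choice of k = 5, and the synthetic data are all assumptions. To keep the example self-contained, a depth-1 decision tree (a "stump") stands in for the full model. The spread of the per-fold accuracies, measured by their standard deviation, is the stability signal the paragraph describes.

```python
import random
from statistics import mean, stdev

def k_fold_indices(n, k, seed=0):
    """Shuffle the row indices once, then deal them into k folds.

    Returns a list of (train_indices, test_indices) pairs, one per fold.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(fold)), sorted(fold)) for fold in folds]

def fit_stump(xs, ys):
    """Stand-in model: a depth-1 classification tree for labels 0/1 that picks
    the threshold and leaf labels with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for left_label in (0, 1):
            errors = sum((left_label if x <= t else 1 - left_label) != y
                         for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left_label)
    _, t, left_label = best
    return lambda x: left_label if x <= t else 1 - left_label

def accuracy(model, xs, ys):
    return mean(1.0 if model(x) == y else 0.0 for x, y in zip(xs, ys))

def cross_validate(xs, ys, k=5, seed=0):
    """Fit on k-1 folds and score on the held-out fold, once per fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k, seed):
        model = fit_stump([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(accuracy(model,
                               [xs[i] for i in test_idx],
                               [ys[i] for i in test_idx]))
    return scores

# Assumed synthetic data: the label is 1 when x plus noise exceeds 5.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [1 if x + rng.gauss(0, 1) > 5 else 0 for x in xs]

scores = cross_validate(xs, ys, k=5)
print("per-fold accuracy:", [round(s, 2) for s in scores])
print(f"mean = {mean(scores):.3f}, standard deviation = {stdev(scores):.3f}")
```

Every row lands in exactly one test fold, so each observation is scored once by a model that never saw it during fitting. A standard deviation that is large relative to the mean accuracy would suggest the kind of split-to-split instability described above.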

Comparative Analysis of Model Performance:

Classification accuracy (for classification trees) and error measures such as RMSE (for regression trees) are used to compare models. For instance, one might compare trees built with the Gini index against trees built with entropy as the splitting criterion, or trees fitted with the ANOVA method against trees fitted with Poisson regression. A model that performs better across all data splits is normally regarded as better than one that does not. However, the comparison does not stop at which model is ‘better’ based on the performance on V, W, and X. It also assesses how well each model generalizes to new observations, how resistant it is to overfitting, and how interpretable it is for stakeholders.
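
The split criteria and error measures mentioned above can be written out as small functions. This is a hypothetical Python sketch rather than the author's R code. The function names (`gini`, `entropy`, `split_impurity`, `accuracy`, `rmse`) and the toy labels are illustrative assumptions. Gini impurity is 1 - sum(p_k^2) and entropy is -sum(p_k * log2(p_k)), where p_k is the share of class k in a node. A tree prefers the split whose children have the lowest size-weighted impurity.

```python
import math
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """Entropy in bits: minus the sum of p * log2(p) over the classes present."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_impurity(left, right, criterion):
    """Size-weighted impurity of the two child nodes produced by a split."""
    n = len(left) + len(right)
    return len(left) / n * criterion(left) + len(right) / n * criterion(right)

def accuracy(predicted, actual):
    """Share of predictions that match the true class."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

def rmse(predicted, actual):
    """Root mean squared error for numeric (regression) predictions."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

# Toy node with an even class mix, and one candidate split of it.
node = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"]
right = ["yes"] + ["no"] * 4

print(gini(node), entropy(node))  # 0.5 1.0
for name, criterion in (("Gini", gini), ("entropy", entropy)):
    print(f"{name}: before split = {criterion(node):.3f}, "
          f"after split = {split_impurity(left, right, criterion):.3f}")
print(accuracy(["yes", "no"], ["yes", "yes"]))  # 0.5
print(rmse([1.0, 2.0, 3.0, 4.0], [3.0, 0.0, 5.0, 2.0]))  # 2.0
```

Both criteria score this split as purer than the unsplit node. Gini and entropy frequently agree on the best split. For that reason, as the paragraph notes, the more informative comparison is usually how the resulting trees perform on held-out data.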
