
Hyperparameter Tuning with R

Last Updated : 12 Sep, 2024

In R, several techniques and packages can be used to optimize these hyperparameters, leading to better, more reliable models. In this article, we discuss the main techniques and packages for hyperparameter tuning with R.

What are Hyperparameters?

Hyperparameters are the settings that control how a machine-learning model learns from data. Examples include the learning rate in neural networks, the number of trees in a random forest, or the number of neighbors in a k-nearest neighbors (k-NN) algorithm. Choosing the correct hyperparameters can make the difference between a model that generalizes well to new data and one that overfits or underfits the training data. Unlike model parameters, which are adjusted during training to fit the data, hyperparameters must be set before training begins.
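
To make the distinction concrete, here is a minimal sketch using the class package and the built-in iris dataset (both chosen purely for illustration, not part of this article's main walkthrough): the number of neighbors k in k-NN is a hyperparameter fixed before training, while the model itself simply stores the training points.

R
# A minimal sketch: 'k' in k-NN is a hyperparameter chosen before training.
# (Illustrative example using the 'class' package and the iris dataset.)
library(class)

set.seed(1)
train_idx <- sample(nrow(iris), 100)

pred <- knn(train = iris[train_idx, 1:4],    # training features
            test  = iris[-train_idx, 1:4],   # test features
            cl    = iris$Species[train_idx], # training labels
            k     = 5)                       # k is set by hand, not learned
head(pred)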

Why Hyperparameter Tuning is Important?

Correctly tuning hyperparameters can improve a model's performance. Poor settings may cause:

  • Underfitting: The model is too simple to capture the underlying patterns.
  • Overfitting: The model is too complex and captures noise rather than useful patterns, leading to poor generalization.
  • Slow Convergence: Poorly tuned models, such as those with too small a learning rate, can take excessively long to converge or may not converge at all.

Techniques for Hyperparameter Tuning

Here are some of the main techniques for hyperparameter tuning.

  • Grid Search: Grid search is an exhaustive search method that systematically evaluates all possible combinations of a predefined set of hyperparameters. For example, when tuning a random forest model, we may create a grid of values for parameters such as mtry (the number of variables randomly sampled for each split) and ntree (the number of trees in the forest).
  • Random Search: Random search selects random combinations of hyperparameters from predefined distributions. Unlike grid search, it does not evaluate every possible combination, making it more efficient, especially when the search space is large (a minimal sketch follows this list).
  • Bayesian Optimization: Bayesian optimization is an advanced technique that models the relationship between hyperparameters and model performance. It uses this model to predict which hyperparameters will lead to better results, refining its predictions as it gathers more data.
  • Cross-Validation: Cross-validation is commonly used alongside the above methods to evaluate model performance. By splitting the data into multiple subsets, training the model on some subsets and testing on others, we ensure that the selected hyperparameters lead to a model that generalizes well to unseen data.
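
As a quick illustration of random search, here is a minimal sketch using caret's built-in search = "random" option; the iris dataset and the tuneLength value are assumptions made only for this example (the main walkthrough below uses mtcars with grid search).

R
# A minimal random-search sketch with caret (assumes the iris dataset)
library(caret)

set.seed(42)
rs_control <- trainControl(method = "cv",
                           number = 5,
                           search = "random")  # sample hyperparameters randomly

rs_model <- train(Species ~ ., data = iris,
                  method = "rf",
                  trControl = rs_control,
                  tuneLength = 5)  # evaluate 5 randomly drawn 'mtry' values
print(rs_model$bestTune)

For Bayesian optimization, CRAN packages such as rBayesianOptimization provide ready-made R implementations.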

Now we implement hyperparameter tuning step by step with the R programming language.

Step 1: Load the required libraries and dataset

Load the randomForest and caret libraries and the built-in mtcars dataset.

R
library(randomForest)
library(caret)

# Load the built-in mtcars dataset
data(mtcars)

Step 2: Data Preparation

Convert the am column (which represents the type of transmission) to a factor with two levels: "Automatic" and "Manual".

R
# Convert 'am' (Transmission) to a factor for classification
mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

Step 3: Feature Selection

A subset of features (mpg, cyl, hp, wt) is selected for modeling; these are the same predictors used in the model formula in Step 6.

R
# Subset of features for modeling
features <- mtcars[, c("mpg", "cyl", "hp", "wt")]

Step 4: Define Hyperparameter Grid

Define a grid of hyperparameter values for mtry, which controls the number of features randomly selected at each split in the Random Forest algorithm.

R
# Define a grid for the 'mtry' parameter in Random Forest
tuneGrid <- expand.grid(mtry = c(1, 2))

Step 5: Cross-Validation Setup

The data is split into 5 folds; in each of the 5 rounds, 4 folds are used for training and the remaining fold for validation, so every fold serves as the validation set exactly once.

R
# Cross-Validation Setup
# Note: sampling = "smote" needs a SMOTE implementation installed
# (caret relies on the DMwR package for this option)
control <- trainControl(method = "cv", 
                        number = 5,
                        summaryFunction = defaultSummary,
                        savePredictions = TRUE,
                        classProbs = FALSE,
                        sampling = "smote")  # Handle class imbalance in small datasets

Step 6: Model Training

Now train the Random Forest model with grid search over the mtry values defined above.

R
# Model Training with Hyperparameter Tuning
# Train the Random Forest model using the 'caret' package and grid search
set.seed(123)  # for reproducible resampling
model <- train(am ~ mpg + cyl + hp + wt, 
               data = mtcars, 
               method = "rf", 
               metric = "Accuracy",   # Set the metric explicitly for classification
               trControl = control, 
               tuneGrid = tuneGrid)
# Note: parallelism is controlled by trainControl(allowParallel = TRUE),
# which is the default; allowParallel is not an argument to train()

Step 7: Print the results

Now print the best tuning parameters and the full model summary.

R
# Print the Best Tuned Model
print(model$bestTune)   # Output the best 'mtry' value
print(model)            # Full summary across resamples

Output:

  mtry
2    2

Random Forest

32 samples
4 predictor
2 classes: 'Automatic', 'Manual'

No pre-processing
Resampling: Cross-Validated (5 fold)
Summary of sample sizes: 25, 26, 26, 26, 25
Addtional sampling using SMOTE

Resampling results across tuning parameters:

  mtry  Accuracy   Kappa
  1     0.7714286  0.5357971
  2     0.9047619  0.8057971

Accuracy was used to select the optimal model using the largest value.
The final value used for the model was mtry = 2.
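
With the tuned model in hand, we can generate predictions. The short sketch below predicts on the training data purely for illustration; in practice you would evaluate on a held-out test set.

R
# Illustrative only: predict with the tuned model on the training data
preds <- predict(model, newdata = mtcars)
confusionMatrix(preds, mtcars$am)  # caret's confusion matrix summary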

Step 8: Visualize the Tuning Results

Now we visualize the tuning results by plotting model accuracy against the tested hyperparameter values.

R
# Visualize the Tuning Results
# Plot the performance of different hyperparameter values
plot(model)

Output:

[Plot: Hyperparameter Tuning with R, showing accuracy across values of mtry]

The plot shows accuracy for the tested values of mtry. Accuracy improves as mtry increases from 1 to 2, peaking at mtry = 2. Visualizing performance across different hyperparameter values helps identify the optimal settings.
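
If you prefer ggplot2 graphics, caret also provides a ggplot() method for train objects; here is a minimal sketch, assuming ggplot2 is installed.

R
# Equivalent plot of the tuning results with ggplot2
library(ggplot2)
ggplot(model) + labs(title = "Accuracy vs. mtry")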

Conclusion

Hyperparameter tuning is a crucial step in refining machine learning models to achieve better performance. By carefully selecting and adjusting hyperparameters, such as those in neural networks or random forests, the model's ability to generalize to new data improves, reducing the risk of overfitting or underfitting. Techniques like grid search, random search, and Bayesian optimization, especially when combined with cross-validation, provide powerful ways to identify the optimal settings. Implementing these methods in R can significantly enhance model reliability and accuracy.

