Optimized hyperparameter tuning of multi-class classification algorithms
Cons of each algorithm:
• Random Forest: no interpretability; overfitting can easily occur; need to choose the number of trees manually.
• Gradient Boosting (GB): sensitive to outliers.
• Decision Tree (DT): poor results on very small datasets; overfitting can easily occur.
• K-nearest neighbors (KNN): need to manually choose the number of neighbours ‘k’.
• Naïve Bayes: based on the assumption that the features have the same statistical relevance.
What number should each hyperparameter take? Values such as min_samples_leaf, min_samples_split and bootstrap all have to be chosen, and there is no time to calculate every combination by hand.
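A small sketch makes the combinatorial problem concrete (the candidate values below are illustrative assumptions, not taken from the slides):

from itertools import product

# Illustrative candidate values for three RandomForestClassifier hyperparameters
# (the specific numbers are assumptions chosen for the example).
param_grid = {
    "min_samples_leaf": [1, 2, 5, 10],
    "min_samples_split": [2, 3, 5, 10, 20],
    "bootstrap": [True, False],
}

# A full manual sweep means fitting one model per combination.
combinations = list(product(*param_grid.values()))
print(len(combinations))  # 4 * 5 * 2 = 40 fits, before adding cross-validation folds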
GridSearchCV:
sklearn.model_selection.GridSearchCV(estimator, param_grid, scoring=None, n_jobs=None, refit=True, cv=None, return_train_score=False)

RandomizedSearchCV:
sklearn.model_selection.RandomizedSearchCV(estimator, param_distributions, n_iter=10, scoring=None, n_jobs=None, refit=True, cv=None, verbose=0, pre_dispatch='2*n_jobs', random_state=None, error_score=nan, return_train_score=False)
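As a minimal usage sketch (the iris dataset and the decision tree estimator are illustrative assumptions), both classes follow the same scikit-learn estimator API: construct with a base estimator and a search space, call fit, then read the result attributes:

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Construct the searcher with an estimator and a search space, then fit it
# like any other scikit-learn estimator.
search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=0),
    param_grid={"max_depth": [2, 3, 5, 10, 20]},
    cv=5,
)
search.fit(X, y)

print(search.best_params_)           # best combination found
print(search.best_score_)            # its mean cross-validated score
best_model = search.best_estimator_  # refit on the full data because refit=True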
GridSearch & RandomSearch
• GridSearchCV and RandomizedSearchCV are classes provided by sklearn's model_selection module.
Grid Search starts with defining a search space grid. The grid consists of selected hyperparameter names and candidate values, and grid search exhaustively evaluates every combination of these values to find the best one, as in the sketch below.
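For example, a grid for a random forest might look like the following sketch (the estimator, dataset and value choices are illustrative assumptions):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# The search space grid: grid search evaluates every combination of these values
# (5 * 2 * 5 = 50 candidates, each cross-validated).
param_grid = {
    "n_estimators": [5, 10, 20, 50, 100],
    "criterion": ["gini", "entropy"],
    "min_samples_split": [2, 3, 5, 10, 20],
}

grid_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid=param_grid,
    cv=5,
    n_jobs=-1,
)
grid_search.fit(X, y)
print(grid_search.best_params_)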
In random search, we define a distribution for each hyperparameter, which can be uniform or use another sampling method. The key difference from grid search is that random search does not test every value: the values that are tested are sampled at random (see the sketch below).
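A corresponding sketch with RandomizedSearchCV (the distributions below, drawn from scipy.stats, are illustrative assumptions):

from scipy.stats import randint, uniform
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Distributions (or lists) to sample from; only n_iter combinations are tried.
param_distributions = {
    "n_estimators": randint(5, 100),       # uniform integers in [5, 100)
    "min_samples_split": randint(2, 20),   # uniform integers in [2, 20)
    "max_features": uniform(0.1, 0.9),     # uniform floats in [0.1, 1.0)
}

random_search = RandomizedSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=10,        # number of randomly sampled combinations
    cv=5,
    random_state=0,
    n_jobs=-1,
)
random_search.fit(X, y)
print(random_search.best_params_)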
Grid Search vs Random Search
Note
• We don’t necessarily need to customize the ranges of values in GridSearch and RandomSearch for every problem.
Example reusable value ranges:
• [2, 3, 5, 10, 20]
• [5, 10, 20, 50, 100]
• ["gini", "entropy"]
• Alpha: [2, 3, 5, 10, 20]
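As a hedged illustration of the note above, the same generic ranges can be dropped into a grid and reused across estimators; the mapping of each range to a parameter name below is an assumption made for the example, not something the slides specify:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Generic grid reused as-is across problems; the parameter names are illustrative.
generic_grid = {
    "min_samples_split": [2, 3, 5, 10, 20],
    "n_estimators": [5, 10, 20, 50, 100],
    "criterion": ["gini", "entropy"],
}

rf_search = GridSearchCV(RandomForestClassifier(random_state=0), generic_grid, cv=5)

# The same ranges work for other estimators; drop the parameters they do not have.
tree_grid = {k: v for k, v in generic_grid.items() if k != "n_estimators"}
tree_search = GridSearchCV(DecisionTreeClassifier(random_state=0), tree_grid, cv=5)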