
hyperparameter tuning for XGBoost and RFC

Booster: this parameter lets us choose which booster we are going to use:
gbtree: the default booster; uses a tree-based model
dart: like gbtree it uses a tree-based model, but it drops trees during training
gblinear: uses a linear model

nthread:
determines the number of CPU cores used for running the model

learning_rate (default = 0.3): varies from 0 to 1; lower values make the boosting more conservative but require more trees


min_child_weight: defines the minimum sum of instance weights required in a child.
It helps in reducing overfitting, because higher values stop the model from learning
relations that are too specific to individual samples,
but setting it too high may cause underfitting.
max_depth: number of levels in the trees (default = 6), usually ranging from 3 to 10;
more depth may cause overfitting,
less depth may cause underfitting.
max_leaf_nodes can be used in place of max_depth too.
gamma (default = 0): used for pruning trees; a node is split only when the split gives
a loss reduction of at least gamma. It varies from 0 to positive infinity;
the higher the gamma, the more conservative the model.

alpha (default = 0): L1 (lasso) regularization; it can be used with
high-dimensional data to make the model run faster.
lambda (default = 1): L2 (ridge) regularization; increasing the value helps reduce
overfitting.
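
A minimal sketch of how these XGBoost parameters might be searched with scikit-learn's GridSearchCV; the toy dataset and the grid values are illustrative assumptions, not recommendations:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from xgboost import XGBClassifier

    # toy dataset standing in for real data (assumption)
    X, y = make_classification(n_samples=500, n_features=20, random_state=42)

    # grid over the parameters discussed above; values are example choices
    param_grid = {
        "learning_rate": [0.05, 0.1, 0.3],
        "max_depth": [3, 6, 10],
        "min_child_weight": [1, 5],
        "gamma": [0, 1],
        "reg_alpha": [0, 0.1],    # L1 (lasso) regularization, i.e. alpha
        "reg_lambda": [1, 10],    # L2 (ridge) regularization, i.e. lambda
    }

    search = GridSearchCV(
        XGBClassifier(booster="gbtree", n_jobs=4),  # n_jobs plays the role of nthread
        param_grid,
        cv=3,
    )
    search.fit(X, y)
    print(search.best_params_, search.best_score_)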

RFC hyperparameter tuning


n_estimators: number of trees in the forest
criterion: the measure used to judge split quality at each node, either gini or entropy
max_depth: depth of the trees;
by decreasing depth we can control overfitting,
by increasing depth we can control underfitting

min_samples_split: the minimum number of samples required to split an internal node.

The value is usually between 2 and 40; very low values tend
to lead to overfitting and very high values tend to lead to underfitting.

min_samples_leaf: the minimum number of samples required to be at a leaf node.

A split will only be considered if it leaves at least this many samples
in each branch.
It is used for smoothing the model, especially in regression;
typical values are between 1 and 20.

With the default value of 1, a split can create branches that hold just one sample each.


For lower values the tree learns the training data in detail, so it performs well on
training data but may overfit;
for higher values it cannot extract enough information from the data, leading to underfitting.

min_impurity_decrease:
a node will be split if the split induces a decrease of impurity
greater than or equal to this value;
it helps in determining how deep the tree grows.

max_features: the number of features to consider when looking for the best split;
one of auto, sqrt, log2.
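
A comparable sketch for the random forest, here with RandomizedSearchCV so only a sample of the grid is tried; the data and the value ranges are illustrative assumptions:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import RandomizedSearchCV

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    # candidate values for the parameters described above (example ranges)
    param_dist = {
        "n_estimators": [100, 200, 500],
        "criterion": ["gini", "entropy"],
        "max_depth": [None, 5, 10, 20],
        "min_samples_split": [2, 10, 40],
        "min_samples_leaf": [1, 5, 20],
        "max_features": ["sqrt", "log2"],
        "min_impurity_decrease": [0.0, 0.01],
    }

    search = RandomizedSearchCV(
        RandomForestClassifier(random_state=0),
        param_dist,
        n_iter=20,   # try 20 random combinations instead of the full grid
        cv=3,
        random_state=0,
    )
    search.fit(X, y)
    print(search.best_params_)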

hyperparameter tuning for SVC


C: float, default = 1.0
Regularization parameter. The strength of the regularization is inversely
proportional to C. It must be strictly positive; the penalty is a squared L2 penalty.
The C value is the trade-off between a smooth decision boundary and classifying the
training points correctly;
higher C values may lead to overfitting.

kernel: {‘linear’, ‘poly’, ‘rbf’, ‘sigmoid’, ‘precomputed’} or callable, default = ‘rbf’
Specifies the kernel type to be used in the algorithm. If none is given, ‘rbf’ will
be used. If a callable is given, it is used to pre-compute the kernel matrix from
the data matrices; that matrix should be an array of shape (n_samples, n_samples).
The kernel determines the type of hyperplane used to separate the data:
linear uses a linear hyperplane,
while poly and rbf use non-linear decision boundaries.
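
For ‘precomputed’, the classifier receives the kernel (Gram) matrix directly instead of the raw features. A minimal sketch with a hand-built linear kernel; the dataset and split are illustrative assumptions:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=10, random_state=1)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

    # linear kernel matrices: K[i, j] = <x_i, x_j>
    K_train = X_train @ X_train.T   # shape (n_train, n_train)
    K_test = X_test @ X_train.T     # rows are test samples, columns are train samples

    clf = SVC(kernel="precomputed")
    clf.fit(K_train, y_train)
    print(clf.score(K_test, y_test))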

degree: int, default = 3
Degree of the polynomial kernel function (‘poly’) used to build the separating
hyperplane; ignored by all other kernels.

gamma: {‘scale’, ‘auto’} or float, default = ‘scale’
Kernel coefficient for ‘rbf’, ‘poly’ and ‘sigmoid’;
it is used with non-linear hyperplanes and controls how closely the decision
boundary fits the training data.
If gamma = ‘scale’ (default) is passed, then it uses 1 / (n_features * X.var()) as the
value of gamma;
if ‘auto’, it uses 1 / n_features.
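
The ‘scale’ default can be reproduced by hand, and gamma can then be searched together with C, kernel and degree; the dataset and grid below are illustrative assumptions:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=300, n_features=10, random_state=2)

    # what gamma='scale' resolves to: 1 / (n_features * X.var())
    gamma_scale = 1.0 / (X.shape[1] * X.var())
    print(gamma_scale)

    param_grid = {
        "C": [0.1, 1, 10],                    # higher C -> risk of overfitting
        "kernel": ["linear", "poly", "rbf"],
        "degree": [2, 3],                     # only used by the 'poly' kernel
        "gamma": ["scale", "auto", gamma_scale],
    }
    search = GridSearchCV(SVC(), param_grid, cv=3)
    search.fit(X, y)
    print(search.best_params_)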
