CatBoost - An In-Depth Guide Python
import pandas as pd
import numpy as np
import warnings
warnings.filterwarnings("ignore")
pd.set_option("display.max_columns", 50)

import catboost
from catboost import CatBoost, CatBoostRegressor, CatBoostClassifier, Pool
from catboost.utils import eval_metric

import sklearn
from sklearn.datasets import load_boston, load_breast_cancer, load_wine
from sklearn.model_selection import train_test_split
Load Datasets
We'll be using the three datasets mentioned below, all available from sklearn, for explanation purposes in this tutorial.
Boston Housing Dataset: A regression dataset with information about various attributes of houses in Boston and their prices in dollars. It'll be used for explaining regression tasks.
Breast Cancer Dataset: A classification dataset with information about two different types of tumors. It'll be used for explaining binary classification tasks.
Wine Dataset: A classification dataset with information about ingredients used in three different types of wine. It'll be used for explaining multi-class classification tasks.
We have loaded all three datasets one by one below. We have printed descriptions of the datasets which give us an overview of their features and sizes. We have also loaded each dataset as a pandas dataframe and displayed the first few samples.
boston = load_boston()
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)
boston_df["Price"] = boston.target
boston_df.head()
**Data Set Characteristics:**
:Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.
CRIM ZN INDUS CHAS NOX RM AGE DIS RAD TAX PTRATIO B LSTAT Price
0 0.00632 18.0 2.31 0.0 0.538 6.575 65.2 4.0900 1.0 296.0 15.3 396.90 4.98 24.0
1 0.02731 0.0 7.07 0.0 0.469 6.421 78.9 4.9671 2.0 242.0 17.8 396.90 9.14 21.6
2 0.02729 0.0 7.07 0.0 0.469 7.185 61.1 4.9671 2.0 242.0 17.8 392.83 4.03 34.7
3 0.03237 0.0 2.18 0.0 0.458 6.998 45.8 6.0622 3.0 222.0 18.7 394.63 2.94 33.4
4 0.06905 0.0 2.18 0.0 0.458 7.147 54.2 6.0622 3.0 222.0 18.7 396.90 5.33 36.2
breast_cancer = load_breast_cancer()
breast_cancer_df = pd.DataFrame(breast_cancer.data, columns=breast_cancer.feature_names)
breast_cancer_df.head()
:Attribute Information:
- radius (mean of distances from center to points on the perimeter)
- texture (standard deviation of gray-scale values)
- perimeter
- area
- smoothness (local variation in radius lengths)
- compactness (perimeter^2 / area - 1.0)
- concavity (severity of concave portions of the contour)
- concave points (number of concave portions of the contour)
- symmetry
- fractal dimension ("coastline approximation" - 1)
The mean, standard error, and "worst" or largest (mean of the three
largest values) of these features were computed for each image,
resulting in 30 features. For instance, field 3 is Mean Radius, field
13 is Radius SE, field 23 is Worst Radius.
- class:
- WDBC-Malignant
- WDBC-Benign
   mean radius  mean texture  mean perimeter  mean area  mean smoothness  mean compactness  mean concavity  mean concave points  mean symmetry  mean fractal dimension  radius error  texture error
0 17.99 10.38 122.80 1001.0 0.11840 0.27760 0.3001 0.14710 0.2419 0.07871 1.0950 0.9053
1 20.57 17.77 132.90 1326.0 0.08474 0.07864 0.0869 0.07017 0.1812 0.05667 0.5435 0.7339
2 19.69 21.25 130.00 1203.0 0.10960 0.15990 0.1974 0.12790 0.2069 0.05999 0.7456 0.7869
3 11.42 20.38 77.58 386.1 0.14250 0.28390 0.2414 0.10520 0.2597 0.09744 0.4956 1.1560
4 20.29 14.34 135.10 1297.0 0.10030 0.13280 0.1980 0.10430 0.1809 0.05883 0.7572 0.7813
Wine Dataset
wine = load_wine()
wine_df = pd.DataFrame(wine.data, columns=wine.feature_names)
wine_df.head()
- class:
- class_0
- class_1
- class_2
CatBoost provides three different estimators to perform classification and regression tasks.
CatBoost - It's a universal estimator which can handle both classification and regression datasets depending on the parameters given to it.
CatBoostRegressor - It's designed to work with regression datasets.
CatBoostClassifier - It's designed to work with classification datasets.
We'll now explain how to use each one with simple examples.
CatBoost: Regression
The simplest way to train a model in Catboost is to initialize a CatBoost estimator. The CatBoost constructor accepts only one parameter named params, which is a dictionary of parameters used to create the estimator. Its main entry is loss_function; based on its value, catboost determines whether the task is regression or classification. We can create a CatBoost estimator without passing any parameters, and it'll use root mean squared error as the loss function, which is suitable for regression tasks. All the parameters have defined default values, which we'll list at the end of this section. By default, the CatBoost estimator trains for 1000 iterations, creating 1000 trees. The iterations parameter is an alias of the n_estimators parameter, which limits the number of trees.
Below we have created our first CatBoost estimator using the RMSE loss function. We have passed an iteration value of 100 to train it for 100
iterations. The verbose value of 10 will print results at every 10 iterations. The training process will create an ensemble of 100 trees.
In the next cell, we have divided the Boston housing dataset into the train (90%) and test (10%) sets using scikit-learn's train_test_split()
function.
booster = CatBoost({'iterations':100, 'verbose':10, 'loss_function':'RMSE'})
booster

<catboost.core.CatBoost at 0x7fa71fbd9978>
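A minimal sketch of that split is given below (the train_size value comes from the text above; random_state is an assumption added for reproducibility).

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target,
                                                    train_size=0.9, random_state=123)
print("Train/Test Sizes : ", X_train.shape, X_test.shape)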
We are now training our gradient boosting estimator created from previous steps by calling the fit() method on it passing it train data and
labels. The fit() method accepts many other parameters which we'll explain as we go ahead with the tutorial. We have then called the
set_feature_names() method which can be used to set feature names for each column of data.
booster.fit(X_train, Y_train)
booster.set_feature_names(boston.feature_names)
The CatBoost estimator provides the method predict() which accepts feature values and returns model predictions. We have below
calculated predictions for train and test datasets.
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
test_preds[:5], train_preds[:5]
We can evaluate model performance using the eval_metric() method available from the utils module of catboost. The method accepts actual labels, predictions, and a list of metrics to evaluate. We'll list the available metrics later. We have evaluated the
R2 metric on both train and test sets below.
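A minimal sketch of those evaluation calls (eval_metric comes from catboost.utils, imported earlier), producing the output that follows:

print("Test R2 : %.2f"%eval_metric(Y_test, test_preds, "R2")[0])    ## R2 on the held-out test set
print("Train R2 : %.2f"%eval_metric(Y_train, train_preds, "R2")[0]) ## R2 on the training set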
Test R2 : 0.83
Train R2 : 0.99
We'll now list down important attributes and methods of the CatBoost estimator. Please make a note that this is not a list of all possible attributes and methods; there are many more methods which we'll cover later as well. A few of these attributes are printed right after the list below.
Attributes
best_score_ - It returns the best score of the model.
classes_ - It returns the list of classes for a classification problem.
feature_names_ - It returns the list of feature names.
feature_importances_ - It returns the importance of each feature as computed by the algorithm.
learning_rate_ - It returns the learning rate of the algorithm.
random_seed_ - It returns the random seed from which initial model weights were assigned.
tree_count_ - It returns the number of trees in the ensemble.
n_features_in_ - It returns the number of features used to train the model.
evals_result_ - It returns a dictionary of evaluation results. If we have provided an evaluation set then evaluation results for it will be included.
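A short sketch that prints a few of these attributes (the exact print statements are assumptions; the values below come from the model trained above):

print("Data Feature Names : ", booster.feature_names_)
print("Random Seed : ", booster.random_seed_)
print("Number of Features : ", booster.n_features_in_)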
Data Feature Names : ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT']
Random Seed : 0
Number of Features : 13
Methods
leaf_indices = booster.calc_leaf_indexes(X_train)
leaf_indices[:2]
array([[33, 16, 4, 8, 35, 9, 10, 9, 42, 35, 21, 6, 7, 20, 55, 26,
22, 33, 16, 10, 18, 1, 54, 27, 17, 4, 20, 2, 45, 9, 12, 19,
11, 32, 21, 35, 36, 33, 25, 8, 1, 45, 1, 21, 23, 27, 40, 13,
17, 5, 26, 21, 4, 16, 8, 13, 37, 17, 47, 9, 5, 4, 12, 50,
36, 61, 33, 5, 51, 4, 9, 24, 17, 37, 33, 5, 33, 8, 33, 0,
45, 2, 48, 14, 22, 10, 61, 14, 58, 15, 10, 13, 12, 32, 38, 11,
0, 5, 36, 33],
[36, 34, 20, 28, 51, 25, 34, 41, 42, 27, 53, 6, 35, 4, 55, 26,
54, 23, 4, 10, 31, 43, 46, 27, 5, 3, 61, 2, 39, 9, 8, 25,
11, 9, 21, 43, 6, 45, 25, 9, 49, 45, 35, 23, 23, 27, 63, 13,
17, 53, 58, 21, 12, 51, 8, 29, 37, 17, 51, 43, 53, 24, 15, 50,
36, 61, 33, 37, 35, 45, 9, 24, 49, 61, 33, 61, 35, 8, 33, 56,
61, 37, 53, 9, 62, 58, 61, 14, 57, 15, 10, 13, 36, 32, 46, 63,
2, 37, 36, 33]], dtype=uint32)
print("Parameters Passed When Creating Model : ",booster.get_params())
Parameters Passed When Creating Model : {'iterations': 100, 'verbose': 10, 'loss_function': 'RMSE'}
All Model Parameters : {'nan_mode': 'Min', 'eval_metric': 'RMSE', 'iterations': 100, 'sampling_frequency':
'PerTree', 'leaf_estimation_method': 'Newton', 'grow_policy': 'SymmetricTree', 'penalties_coefficient': 1, 'boosting_type':
'Plain', 'model_shrink_mode': 'Constant', 'feature_border_type': 'GreedyLogSum', 'bayesian_matrix_reg': 0.10000000149011612,
'l2_leaf_reg': 3, 'random_strength': 1, 'rsm': 1, 'boost_from_average': True, 'model_size_reg': 0.5, 'subsample':
0.800000011920929, 'use_best_model': False, 'random_seed': 0, 'depth': 6, 'posterior_sampling': False, 'border_count': 254,
'classes_count': 0, 'auto_class_weights': 'None', 'sparse_features_conflict_fraction': 0, 'leaf_estimation_backtracking':
'AnyImprovement', 'best_model_min_trees': 1, 'model_shrink_rate': 0, 'min_data_in_leaf': 1, 'loss_function': 'RMSE',
'learning_rate': 0.19522100687026975, 'score_function': 'Cosine', 'task_type': 'CPU', 'leaf_estimation_iterations': 1,
'bootstrap_type': 'MVS', 'max_leaves': 64}
Below we have explained how we can use the shrink() method. We have reduced our original ensemble from 100 to 50 trees. We have then
evaluated the R2 metric on the train and test sets. We can notice a visible change in the R2 score by decreasing the number of trees in the
ensemble.
booster.shrink(ntree_end=50)
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Test R2 : 0.81
Train R2 : 0.96
The CatBoost estimator lets us perform grid search as well using the grid_search() method of the estimator. In order to do a grid search, we
need to create an estimator without setting parameters that we want to try. We then call the grid_search() method on the estimator
instance by giving it parameters dictionary and data to try different parameter combinations.
grid_search()
param_grid - It accepts a dictionary of parameter names and a list of values to try for those parameters.
X - It accepts numpy array, pandas dataframe, catboost.Pool data structure which has feature values.
y - It accepts target labels of data. If we are using the catboost.Pool data structure which has labels info then we don't need to pass
this parameter value.
cv - It accepts integer or sklearn data splitter classes (KFold, StratifiedKFold, ShuffleSplit, StratifiedShuffleSplit). If we give integer as
input then that many folds of data will be created for training. The default value of the parameter is 3.
calc_cv_statistics - It accepts boolean value specifying whether to calculate cross validation statistics. The default is True.
refit - It accepts boolean value specifying whether to train a model using the best parameter setting found using cross-validation. The
default is True.
stratified - It performs stratified partition of the dataset so that class proportion is maintained in sets. The default is True.
The method returns a dictionary with two keys: params, which holds the best parameter combination found, and cv_results, which holds the cross-validation results.
Below we are explaining how we can perform grid search with an example. We are trying different values of the parameters iterations, learning_rate and bootstrap_type. We are using training data created from the Boston dataset earlier. We have then evaluated the performance of the estimator with the best setting by calculating the R2 score on the train and test datasets.
booster = CatBoost()

params = {
    'iterations':[10,50],
    'learning_rate':[0.01, 0.1],
    'bootstrap_type':['Bayesian', 'Bernoulli', 'No']
}

search_results = booster.grid_search(params, X_train, Y_train)  ## grid search call reconstructed; default cv is used
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
bestTest = 21.55239476
bestIteration = 9
bestTest = 21.55702996
bestIteration = 9
bestTest = 21.55128989
bestIteration = 9
bestTest = 10.92908795
bestIteration = 9
bestTest = 10.95387453
bestIteration = 9
bestTest = 10.7457616
bestIteration = 9
bestTest = 15.92112385
bestIteration = 49
bestTest = 15.8017972
bestIteration = 49
bestTest = 15.81644278
bestIteration = 49
bestTest = 3.788368224
bestIteration = 49
bestTest = 3.654791242
bestIteration = 49
10: loss: 3.6547912 best: 3.6547912 (10) total: 335ms remaining: 30.5ms
bestTest = 3.452184786
bestIteration = 49
11: loss: 3.4521848 best: 3.4521848 (11) total: 386ms remaining: 0us
Estimating final quality...
Test R2 : 0.81
Train R2 : 0.93
cv_results = pd.DataFrame(search_results["cv_results"])
cv_results.head()
Hyperparameters Tuning: Randomized Search
CatBoost also lets us perform a randomized search, which is faster than grid search because it tries only a few random parameter combinations rather than all possible combinations. We can perform a randomized search using the randomized_search() method of the CatBoost estimator. The randomized_search() method has the same API as the grid_search() method with one extra parameter named n_iter, which accepts an integer value specifying how many random combinations of parameters to try. The default value of this parameter is 10.
If you are interested in learning about grid search and randomized search from scikit-learn then please feel free to check our tutorial on the
same.
Below we are explaining how we can perform a randomized search with an example. We are trying different values of the parameters iterations, learning_rate and bootstrap_type. We are using training data created from the Boston dataset earlier. We have then evaluated the performance of the estimator with the best setting by calculating the R2 score on the train and test datasets.
booster = CatBoost()

params = {
    'iterations':[5,10,50,100],
    'learning_rate':[0.01, 0.03, 0.1, 1.0],
    'bootstrap_type':['Bayesian', 'Bernoulli', 'MVS', 'No']
}

search_results = booster.randomized_search(params, X_train, Y_train, n_iter=8)  ## call reconstructed; n_iter assumed from the 8 runs logged below
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
bestTest = 15.6936908
bestIteration = 4
bestTest = 8.345542698
bestIteration = 49
bestTest = 8.147224095
bestIteration = 49
bestTest = 3.452184786
bestIteration = 49
bestTest = 4.563786442
bestIteration = 13
bestTest = 5.044662741
bestIteration = 29
bestTest = 3.425305309
bestIteration = 99
bestTest = 4.563786442
bestIteration = 13
Test R2 : 0.80
Train R2 : 0.96
cv_results = pd.DataFrame(search_results["cv_results"])
cv_results.head()
Below we have listed down important parameters of the gradient boosting algorithm which we can pass to the CatBoost constructor in a dictionary when creating an estimator. These parameters are also available in the CatBoostRegressor and CatBoostClassifier constructors. A small example combining a few of them follows the list.
loss_function - It accepts string specifying metric used during training. The gradient boosting algorithm will try to minimize/maximize
loss function output depending on the situation. Below we have given some commonly used loss functions.
RMSE
MAE
Logloss
CrossEntropy
MultiClass
MultiClassOneVsAll
Other Available Loss Functions
custom_metric - It’s the same as the above parameter and the output of the function specified here will be printed during training. We
can specify a single metric or even a list of metrics.
eval_metric - It accepts string specifying metric to evaluate on evaluation set given during training. It has the same options as that of
loss_function .
iterations - It accepts integer specifying the number of trees to train. The default is 1000.
learning_rate - It specifies the learning rate during the training process. The default is 0.03.
l2_leaf_reg - It accepts float specifying coefficient of L2 regularization of a loss function. The default value is 3.
bootstrap_type - It accepts string specifying bootstrap type. Below is a list of possible values.
Bayesian
Bernoulli
MVS
Poisson - Only works when training on GPU
No
class_names - It accepts a list of string specifying class names for classification tasks.
classes_count - It accepts integer specifying the number of classes in target for multi-class classification problem.
depth/max_depth - It accepts integer specifying maximum allowed tree depth in an ensemble. The default is 6.
min_data_in_leaf - It accepts integer specifying a minimum number of training samples per leaf of a tree. The default is 1.
max_leaves - It accepts integer specifying the maximum number of leaves in a tree. The default is 31.
leaf_estimation_method - It accepts the string specifying method used to calculate values in leaves. Below is a list of possible options.
Newton
Gradient
Exact
monotone_constraints - It accepts list of integers of length n_features . Each entry in the list has a value either 1,0 or -1 specifying
increasing, none, or decreasing monotone relation of a feature with the target. We can even give a list of strings or a dictionary of
mapping from feature names to relation type.
early_stopping_rounds - It accepts an integer which instructs the algorithm to stop training if the last evaluation set in the list has
not improved for that many rounds.
thread_count - It accepts integer specifying the number of threads to use during training. The default is -1 which means to use all cores
on the system.
used_ram_limit - It accepts string specifying the size of RAM to use when training. It accepts value in KB, MB, and GB.
gpu_ram_part - It accepts float between 0-1 specifying how much GPU ram to use. The default is 0.95 which means 95% of RAM.
task_type - It accepts one of the below options specifying whether to run the task on CPU or GPU.
CPU
GPU
devices - It accepts string specifying IDs of GPUs to use for training. Below are possible options
Single GPU - <id1> - It'll use GPU with id1 for training.
List of GPUs - <id1>:<id3>:<id5> - It'll use GPU with id1, id3 and id5 for training.
Range of GPUs - <id1>-<id3> - It'll use GPUs with id1, id2 and id3 for training.
train_dir - It accepts string specifying where to store info generated during training. The default is catboost_info .
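As a quick illustration of how a few of these parameters fit together, the below sketch (hypothetical values, not tuned for any of our datasets) creates an estimator with an explicit loss function, tree count, depth and regularization:

params = {
    'loss_function': 'RMSE',   ## regression objective
    'iterations': 200,         ## number of trees in the ensemble
    'learning_rate': 0.05,
    'depth': 4,                ## maximum tree depth
    'l2_leaf_reg': 5,          ## L2 regularization coefficient
}

booster = CatBoost(params)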
Please make a note that the above-mentioned list does not cover all possible parameters available in CatBoost; it includes the important parameters which are generally tuned for good performance. A list of all possible parameters is available in the official catboost documentation.
fit()
The fit() method of the estimators accepts the below-mentioned important parameters.
cat_features - It accepts a list of integer specifying indices of data that has categorical features.
text_features - It accepts a list of integer specifying indices of data that has text features.
embedding_features - It accepts a list of integer specifying indices of data that has embedding features.
eval_set - It accepts a list of below options as input to be used as an evaluation set.
catboost.Pool
pandas dataframe
tuple of numpy arrays (features, target labels).
early_stopping_rounds - It accepts an integer which instructs the algorithm to stop training if the last evaluation set in the list has
not improved for that many rounds.
plot - It accepts boolean value specifying whether to generate a plot of training results.
save_snapshot - It accepts boolean value specifying whether to store a snapshot of training at a specified interval so that interrupted training can be resumed later from that point rather than from the beginning.
snapshot_file - It accepts string specifying the file name where to store snapshots during training.
snapshot_interval - It accepts integer specifying interval in seconds at which snapshots are saved.
The Pool is an internal data structure of catboost that wraps our data and target values. It can make training faster.
Pool
data - It accepts numpy array, pandas dataframe, or list which has features values.
label - It accepts numpy array, pandas dataframe, or list which has target labels.
cat_features - It accepts a list of integer specifying indices of data that has categorical features.
text_features -It accepts a list of integer specifying indices of data that has text features.
Below we have explained how we can use the Pool data structure with the train() method to generate the CatBoost estimator.
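A minimal sketch of building the Pool objects used below, wrapping the train and test arrays from the earlier Boston split:

train_data = Pool(X_train, Y_train)  ## wraps training features and labels
test_data = Pool(X_test, Y_test)     ## wraps test features and labels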
booster = catboost.train(pool=train_data,params={'iterations':100,
'verbose':10,
'loss_function':'RMSE',
})
print()
print(booster)
booster.set_feature_names(boston.feature_names)
test_preds = booster.predict(test_data)
train_preds = booster.predict(train_data)
Learning rate set to 0.195221
0: learn: 7.9013401 total: 2.71ms remaining: 269ms
10: learn: 3.5505266 total: 9.11ms remaining: 73.7ms
20: learn: 2.5639279 total: 14.9ms remaining: 56.1ms
30: learn: 2.1352590 total: 20.6ms remaining: 45.9ms
40: learn: 1.8986418 total: 26.3ms remaining: 37.9ms
50: learn: 1.7054125 total: 32ms remaining: 30.7ms
60: learn: 1.5124150 total: 37.6ms remaining: 24.1ms
70: learn: 1.3810154 total: 43.3ms remaining: 17.7ms
80: learn: 1.2817508 total: 49.1ms remaining: 11.5ms
90: learn: 1.1909646 total: 54.8ms remaining: 5.41ms
99: learn: 1.0946543 total: 59.8ms remaining: 0us
Test R2 : 0.83
Train R2 : 0.99
Below we have given another example where we have explained how we can give an evaluation set that will be evaluated during training.
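A sketch of one way to do that (the exact cell was not preserved; the parameter values are assumptions based on the earlier example). The test Pool is passed as eval_set to fit() so that it is evaluated at every iteration:

booster = CatBoost({'iterations':100, 'verbose':False, 'loss_function':'RMSE'})
booster.fit(train_data, eval_set=test_data)
booster.set_feature_names(boston.feature_names)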
test_preds = booster.predict(test_data)
train_preds = booster.predict(train_data)
bestTest = 4.538962725
bestIteration = 74
Test R2 : 0.82
Train R2 : 0.97
Catboost has a method named to_regressor() which takes a CatBoost instance and converts it to a CatBoostRegressor instance.
catboost.to_regressor(booster)
<catboost.core.CatBoostRegressor at 0x7f21dfac4358>
Please make a note that the predict() method has a parameter named prediction_type which accepts values such as RawFormulaVal, Class, and Probability to generate different kinds of predictions.
X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target, train_size=0.9,
stratify=breast_cancer.target,
random_state=123)
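The training cell that produced the log below was not preserved; a plausible sketch (parameter values assumed) trains the universal CatBoost estimator with the Logloss objective on this split:

booster = CatBoost({'iterations':100, 'verbose':10, 'loss_function':'Logloss'})
booster.fit(X_train, Y_train, eval_set=(X_test, Y_test))
booster.set_feature_names(breast_cancer.feature_names)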
bestTest = 0.04134545311
bestIteration = 99
Below we have explained how we can generate probabilities with the predict() function.
booster.predict(X_test, prediction_type="Probability")[:5]
array([[9.98702599e-01, 1.29740117e-03],
[7.61900642e-03, 9.92380994e-01],
[9.99418319e-01, 5.81681361e-04],
[4.00919216e-03, 9.95990808e-01],
[1.43614433e-03, 9.98563856e-01]])
Catboost has a method named to_classifier() which takes a CatBoost instance and converts it to a CatBoostClassifier instance.
catboost.to_classifier(booster)
<catboost.core.CatBoostClassifier at 0x7f21de936d30>
Learning rate set to 0.250848
0: learn: 0.8810930 test: 0.9177696 best: 0.9177696 (0) total: 4.67ms remaining: 463ms
10: learn: 0.2073324 test: 0.3310180 best: 0.3310180 (10) total: 31.5ms remaining: 255ms
20: learn: 0.0968205 test: 0.2342860 best: 0.2342860 (20) total: 56.1ms remaining: 211ms
30: learn: 0.0585815 test: 0.1975512 best: 0.1969462 (29) total: 79ms remaining: 176ms
40: learn: 0.0403924 test: 0.1881050 best: 0.1881050 (40) total: 97.5ms remaining: 140ms
50: learn: 0.0295660 test: 0.1741516 best: 0.1741516 (50) total: 110ms remaining: 106ms
60: learn: 0.0231273 test: 0.1707485 best: 0.1683297 (58) total: 121ms remaining: 77.3ms
70: learn: 0.0188628 test: 0.1692754 best: 0.1683297 (58) total: 130ms remaining: 53.1ms
80: learn: 0.0156153 test: 0.1676627 best: 0.1658585 (78) total: 138ms remaining: 32.4ms
90: learn: 0.0135755 test: 0.1636182 best: 0.1635633 (87) total: 145ms remaining: 14.4ms
99: learn: 0.0121369 test: 0.1616905 best: 0.1616905 (99) total: 152ms remaining: 0us
bestTest = 0.1616905315
bestIteration = 99
booster.predict(X_test, prediction_type="Probability")[:5]
CatBoostRegressor
The catboost library provides an estimator named CatBoostRegressor which can be used directly for regression problems. It accepts, directly as keyword arguments, the same parameters that were given to CatBoost as a dictionary. Below we have explained how we can use it with a simple example using the Boston dataset, sketched in the next cell.
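A minimal sketch of that example (the exact cell was not preserved; the split and estimator settings are assumptions consistent with the earlier runs):

X_train, X_test, Y_train, Y_test = train_test_split(boston.data, boston.target,
                                                    train_size=0.9, random_state=123)

booster = CatBoostRegressor(iterations=100, verbose=False, loss_function='RMSE')
booster.fit(X_train, Y_train, eval_set=(X_test, Y_test))  ## eval_set produces the bestTest line below
booster.set_feature_names(boston.feature_names)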
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
bestTest = 4.538962725
bestIteration = 74
Test R2 : 0.82
Train R2 : 0.97
It has the same attributes and methods which are available with the CatBoost estimator.
The CatBoostRegressor also has a grid_search() method which can be used to perform grid search with it. We have explained it below
with a simple example.
booster = CatBoostRegressor()

params = {
    'iterations':[10,50],
    'learning_rate':[0.01, 0.1],
    'bootstrap_type':['Bayesian', 'No']
}

search_results = booster.grid_search(params, X_train, Y_train)  ## grid search call reconstructed; default cv is used
bestTest = 21.55239476
bestIteration = 9
bestTest = 21.55128989
bestIteration = 9
bestTest = 10.92908795
bestIteration = 9
bestTest = 10.7457616
bestIteration = 9
bestTest = 15.92112385
bestIteration = 49
bestTest = 15.81644278
bestIteration = 49
bestTest = 3.788368224
bestIteration = 49
bestTest = 3.452184786
bestIteration = 49
The CatBoostRegressor also has a randomized_search() method which can be used to perform a randomized search with it. We have
explained it below with a simple example.
booster = CatBoostRegressor()

params = {
    'iterations':[5,50],
    'learning_rate':[0.01, 0.1],
    'bootstrap_type':['Bayesian', 'Bernoulli', 'MVS']
}

search_results = booster.randomized_search(params, X_train, Y_train, n_iter=8)  ## call reconstructed; n_iter assumed from the 8 runs logged below
bestTest = 22.43710864
bestIteration = 4
bestTest = 22.43544339
bestIteration = 4
bestTest = 22.43002952
bestIteration = 4
bestTest = 15.67950464
bestIteration = 4
bestTest = 15.6064376
bestIteration = 4
bestTest = 15.6936908
bestIteration = 4
bestTest = 15.92112385
bestIteration = 49
bestTest = 3.654791242
bestIteration = 49
CatBoostClassifier
The catboost library provides an estimator named CatBoostClassifier which can be used directly for classification problems. It accepts, directly as keyword arguments, the same parameters that were given to CatBoost as a dictionary.
Binary Classification
Below we have explained how we can perform binary classification using CatBoostClassifier .
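A sketch of that example (the training cell was not preserved; parameter values are assumptions consistent with the log below):

X_train, X_test, Y_train, Y_test = train_test_split(breast_cancer.data, breast_cancer.target,
                                                    train_size=0.9, stratify=breast_cancer.target,
                                                    random_state=123)

booster = CatBoostClassifier(iterations=100, verbose=10)
booster.fit(X_train, Y_train, eval_set=(X_test, Y_test))
booster.set_feature_names(breast_cancer.feature_names)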
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Learning rate set to 0.073131
0: learn: 0.5730087 test: 0.5715369 best: 0.5715369 (0) total: 7.87ms remaining: 779ms
10: learn: 0.1564599 test: 0.1748713 best: 0.1748713 (10) total: 104ms remaining: 842ms
20: learn: 0.0811663 test: 0.1047480 best: 0.1047480 (20) total: 146ms remaining: 548ms
30: learn: 0.0522608 test: 0.0798318 best: 0.0798318 (30) total: 170ms remaining: 379ms
40: learn: 0.0391529 test: 0.0681539 best: 0.0681539 (40) total: 194ms remaining: 279ms
50: learn: 0.0296856 test: 0.0594379 best: 0.0590864 (49) total: 234ms remaining: 225ms
60: learn: 0.0242369 test: 0.0543715 best: 0.0543715 (60) total: 263ms remaining: 168ms
70: learn: 0.0188321 test: 0.0501147 best: 0.0501147 (70) total: 287ms remaining: 117ms
80: learn: 0.0160430 test: 0.0495876 best: 0.0484613 (77) total: 316ms remaining: 74.1ms
90: learn: 0.0133933 test: 0.0439943 best: 0.0439943 (90) total: 345ms remaining: 34.1ms
99: learn: 0.0115672 test: 0.0413455 best: 0.0413455 (99) total: 379ms remaining: 0us
bestTest = 0.04134545311
bestIteration = 99
The CatBoostClassifier provides a method named predict_proba() which can be used to generate output as a list of probabilities.
booster.predict_proba(X_test)[:5]
array([[9.98702599e-01, 1.29740117e-03],
[7.61900642e-03, 9.92380994e-01],
[9.99418319e-01, 5.81681361e-04],
[4.00919216e-03, 9.95990808e-01],
[1.43614433e-03, 9.98563856e-01]])
The predict_log_proba() method returns log probabilities instead.

booster.predict_log_proba(X_test)[:5]
array([[-1.29824353e-03, -6.64739211e+00],
[-4.87710931e+00, -7.64817932e-03],
[-5.81850604e-04, -7.44958775e+00],
[-5.51916551e+00, -4.01725051e-03],
[-6.54579330e+00, -1.43717657e-03]])
It has the same attributes and methods which are available with the CatBoost estimator.
The CatBoostClassifier also has grid_search() and randomized_search() methods which work exactly the same way as for CatBoost and CatBoostRegressor, hence we have not repeated the code again to explain them.
Multi-Class Classification
Below we have given a simple example which explains how CatBoostClassifier can be used for multi-class classification problems.
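A sketch of that example (the training cell was not preserved; parameter values are assumptions). With three classes in the target, CatBoostClassifier switches to the MultiClass loss by itself:

X_train, X_test, Y_train, Y_test = train_test_split(wine.data, wine.target,
                                                    train_size=0.9, stratify=wine.target,
                                                    random_state=123)

booster = CatBoostClassifier(iterations=100, verbose=10)
booster.fit(X_train, Y_train, eval_set=(X_test, Y_test))
booster.set_feature_names(wine.feature_names)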
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Learning rate set to 0.250848
0: learn: 0.8810930 test: 0.9177696 best: 0.9177696 (0) total: 3.75ms remaining: 371ms
10: learn: 0.2073324 test: 0.3310180 best: 0.3310180 (10) total: 30.7ms remaining: 249ms
20: learn: 0.0968205 test: 0.2342860 best: 0.2342860 (20) total: 55ms remaining: 207ms
30: learn: 0.0585815 test: 0.1975512 best: 0.1969462 (29) total: 78ms remaining: 174ms
40: learn: 0.0403924 test: 0.1881050 best: 0.1881050 (40) total: 99.7ms remaining: 144ms
50: learn: 0.0295660 test: 0.1741516 best: 0.1741516 (50) total: 121ms remaining: 117ms
60: learn: 0.0231273 test: 0.1707485 best: 0.1683297 (58) total: 137ms remaining: 87.5ms
70: learn: 0.0188628 test: 0.1692754 best: 0.1683297 (58) total: 148ms remaining: 60.6ms
80: learn: 0.0156153 test: 0.1676627 best: 0.1658585 (78) total: 158ms remaining: 37.1ms
90: learn: 0.0135755 test: 0.1636182 best: 0.1635633 (87) total: 166ms remaining: 16.5ms
99: learn: 0.0121369 test: 0.1616905 best: 0.1616905 (99) total: 174ms remaining: 0us
bestTest = 0.1616905315
bestIteration = 99
booster.predict_proba(X_test)[:5]
Cross Validation
Catboost provides a method named cv() which can be used to perform cross-validation on data. Below we have explained with a few examples how we can perform cross-validation in catboost.
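A minimal sketch of such a call (parameter values assumed): cv() takes a Pool, a params dictionary and a fold count, and returns per-iteration results as a pandas dataframe.

from catboost import cv

cv_results = cv(pool=Pool(boston.data, boston.target),
                params={'iterations': 10, 'loss_function': 'RMSE'},
                fold_count=5)
cv_results.head()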
0: learn: 23.7392457 test: 23.7214209 best: 23.7214209 (0) total: 1.45s remaining: 13.1s
1: learn: 23.1657012 test: 23.1529352 best: 23.1529352 (1) total: 3.09s remaining: 12.4s
2: learn: 22.5952486 test: 22.5993739 best: 22.5993739 (2) total: 4.65s remaining: 10.9s
3: learn: 22.0752979 test: 22.0892861 best: 22.0892861 (3) total: 6.2s remaining: 9.3s
4: learn: 21.5334643 test: 21.5535213 best: 21.5535213 (4) total: 7.73s remaining: 7.73s
5: learn: 21.0432351 test: 21.0794070 best: 21.0794070 (5) total: 9.42s remaining: 6.28s
6: learn: 20.5481253 test: 20.5972173 best: 20.5972173 (6) total: 10.9s remaining: 4.66s
7: learn: 20.0763160 test: 20.1390519 best: 20.1390519 (7) total: 12.4s remaining: 3.1s
8: learn: 19.6266408 test: 19.6996881 best: 19.6996881 (8) total: 13.9s remaining: 1.55s
9: learn: 19.1627362 test: 19.2500695 best: 19.2500695 (9) total: 15.3s remaining: 0us
0: learn: 0.6443925 test: 0.6464482 best: 0.6464482 (0) total: 1.79s remaining: 16.1s
1: learn: 0.5999487 test: 0.6040791 best: 0.6040791 (1) total: 3.35s remaining: 13.4s
2: learn: 0.5593831 test: 0.5650989 best: 0.5650989 (2) total: 5.01s remaining: 11.7s
3: learn: 0.5240421 test: 0.5317291 best: 0.5317291 (3) total: 6.56s remaining: 9.85s
4: learn: 0.4873987 test: 0.4971399 best: 0.4971399 (4) total: 8.14s remaining: 8.14s
5: learn: 0.4602370 test: 0.4709405 best: 0.4709405 (5) total: 9.72s remaining: 6.48s
6: learn: 0.4290721 test: 0.4404948 best: 0.4404948 (6) total: 11.3s remaining: 4.83s
7: learn: 0.4036246 test: 0.4172114 best: 0.4172114 (7) total: 12.8s remaining: 3.2s
8: learn: 0.3767254 test: 0.3908014 best: 0.3908014 (8) total: 14.4s remaining: 1.59s
9: learn: 0.3558123 test: 0.3714315 best: 0.3714315 (9) total: 15.9s remaining: 0us
Below we have explained with a simple example how we can save and load a catboost model. We have even evaluated the loaded model again to verify it.
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
bestTest = 4.538962725
bestIteration = 74
Test R2 : 0.82
Train R2 : 0.97
booster.save_model("catboost_regressor.model")
loaded_booster = CatBoost()
loaded_booster.load_model("catboost_regressor.model")
<catboost.core.CatBoost at 0x7f21dfa427f0>
test_preds = loaded_booster.predict(X_test)
train_preds = loaded_booster.predict(X_train)
Test R2 : 0.82
Train R2 : 0.97
The grid_search() and randomized_search() methods of the estimators also have a plot parameter which, if set to True, will generate a plot for them.
#test_preds = booster.predict(X_test)
#train_preds = booster.predict(X_train)
The cv() method of catboost also has a plot parameter which if set to True will generate a plot for cross-validation. We have explained the
usage of the same below.
plot_tree()
Each catboost estimator has a method named plot_tree() which accepts an integer and plots the tree with that index from the ensemble of trees. Below we have plotted the 2nd tree. The output of the method is a graphviz graph which we have saved to a file in PNG format.

out = booster.plot_tree(1)
out.render('tree', format="png")
out
calc_feature_statistics()
The calc_feature_statistics() method of the estimator takes as input data, target labels, and feature names. It then generates a chart showing statistics of the feature using the trained model, dataset, and target labels.
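A sketch of such a call, assuming the Boston regressor and split from the earlier cells (the feature name is an arbitrary choice):

booster.calc_feature_statistics(X_train, Y_train, feature="LSTAT", plot=True)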
plot_predictions()
The plot_predictions() method takes as input a dataset and a list of feature names/feature indices. It then sequentially varies the value of the given features and calculates predictions.
booster.plot_predictions(X_test[:5], features_to_change=["LSTAT"]);
plot_partial_dependence()
The plot_partial_dependence() method plots the partial dependence of model predictions on the given features, using a Pool of data.

booster.plot_partial_dependence(Pool(X_train, Y_train), features=["LSTAT"]);
Compare Models
The catboost estimators have a method named compare() which takes as input another estimator and a list of metrics, and compares the performance of both models on the specified metrics. Below we have explained how we can compare the performance of two different catboost estimators using the compare() method, as sketched in the next cell. We have used the R2, RMSE, and MAE metrics for it.
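A sketch of such a comparison (the two model configurations here are assumptions; compare() needs an evaluation Pool and renders an interactive chart):

booster1 = CatBoostRegressor(iterations=50, verbose=False)
booster1.fit(X_train, Y_train)

booster2 = CatBoostRegressor(iterations=500, verbose=False)
booster2.fit(X_train, Y_train)

booster1.compare(booster2, data=Pool(X_test, Y_test), metrics=["R2", "RMSE", "MAE"])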
test_preds = booster1.predict(X_test)
train_preds = booster1.predict(X_train)
print("Model-1")
print("Test R2 : %.2f"%eval_metric(Y_test, test_preds, "R2")[0])
print("Train R2 : %.2f"%eval_metric(Y_train, train_preds, "R2")[0])
test_preds = booster2.predict(X_test)
train_preds = booster2.predict(X_train)
print("\nModel-2")
print("Test R2 : %.2f"%eval_metric(Y_test, test_preds, "R2")[0])
print("Train R2 : %.2f"%eval_metric(Y_train, train_preds, "R2")[0])
Model-1
Test R2 : 0.80
Train R2 : 0.96
Model-2
Test R2 : 0.87
Train R2 : 0.99
Recovering Interrupted Training using Snapshots
Catboost provides support for saving the training process and recovering it if it was interrupted. It provides us with parameters named save_snapshot, snapshot_file, and snapshot_interval which save training results at a particular interval. We can then rerun training from the interrupted part rather than from the beginning using these parameters.
Below we are training CatBoostRegressor for 15000 iterations so that it takes time to complete. We are then interrupting training after a few seconds. We have set the save_snapshot parameter of the fit() method to True so that it takes snapshots during training. The snapshot_interval is set to 1 so that snapshots are taken every second. The snapshots are saved in a file named catboost_snapshots.temp.
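A sketch of that training call (the estimator configuration is an assumption; the snapshot parameters follow the text above):

booster = CatBoostRegressor(iterations=15000, verbose=1000)
booster.fit(X_train, Y_train, eval_set=(X_test, Y_test),
            save_snapshot=True,
            snapshot_file="catboost_snapshots.temp",
            snapshot_interval=1)  ## a snapshot is written every second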
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
---------------------------------------------------------------------------
KeyboardInterrupt Traceback (most recent call last)
~/anaconda3/lib/python3.7/site-packages/catboost/core.py in _fit(self, X, y, cat_features, text_features, embedding_features,
pairs, sample_weight, group_id, group_weight, subgroup_id, pairs_weight, baseline, use_best_model, eval_set, verbose,
logging_level, plot, column_description, verbose_eval, metric_period, silent, early_stopping_rounds, save_snapshot, snapshot_file,
snapshot_interval, init_model)
1808 allow_clear_pool,
-> 1809 train_params["init_model"]
1810 )
_catboost.pyx in _catboost._CatBoost._train()
_catboost.pyx in _catboost._CatBoost._train()
KeyboardInterrupt:
KeyboardInterrupt:
Below we are starting the training process again after the interruption, and we can notice that it resumes from where the last snapshot was taken.
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Test R2 : 0.86
Train R2 : 1.00
Early Stop Training to Avoid Overfitting
Catboost provides a parameter named early_stopping_rounds in the fit() method of all estimators which can be set to some integer. The training process will stop if the metric on the last evaluation set has not improved for that many rounds (specified using the early_stopping_rounds parameter).
Below we have explained with a simple example how we can use early_stopping_rounds, sketched in the next cell. We can notice that training stops when the loss does not improve for 5 consecutive rounds.
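A sketch of such a call, assuming the Boston split from earlier (the round count follows the text above):

booster = CatBoostRegressor(iterations=100, verbose=10)
booster.fit(X_train, Y_train, eval_set=(X_test, Y_test),
            early_stopping_rounds=5)  ## stop if the eval set loss does not improve for 5 rounds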
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
bestTest = 4.538962725
bestIteration = 74
Test R2 : 0.82
Train R2 : 0.97
Monotonic Constraints
The monotonic constraints let us specify the increasing, decreasing, or no monotone relation of a feature with a target. We can specify a
monotone value of 1,0 or -1 for each feature to show the increasing, none, and decreasing relation of the feature with the target by setting the
monotone_constraints parameter. Below we have explained the usage of monotonic constraints for regression task using the Boston
dataset.
Please make a note that the below estimator is not giving a good R2 score because we have randomly set monotonic constraints values just for
explanation purposes.
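A sketch of what that configuration could look like (the constraint values below are arbitrary, mirroring the note above; the Boston data has 13 features, so the list has 13 entries):

## 1 = increasing, -1 = decreasing, 0 = no constraint (values chosen arbitrarily)
constraints = [1, -1, 0, 0, -1, 1, 0, 0, 0, 0, 0, 0, -1]

booster = CatBoostRegressor(iterations=100, verbose=10, monotone_constraints=constraints)
booster.fit(X_train, Y_train, eval_set=(X_test, Y_test))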
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Learning rate set to 0.166668
0: learn: 8.4082824 test: 10.6123467 best: 10.6123467 (0) total: 4.76ms remaining: 472ms
10: learn: 6.1844246 test: 9.1819333 best: 9.1819333 (10) total: 42.9ms remaining: 347ms
20: learn: 5.4849113 test: 8.7248760 best: 8.7248760 (20) total: 80.1ms remaining: 301ms
30: learn: 5.0819403 test: 8.4086976 best: 8.4086976 (30) total: 118ms remaining: 262ms
40: learn: 4.7705217 test: 8.0945726 best: 8.0945726 (40) total: 155ms remaining: 223ms
50: learn: 4.4499139 test: 8.0709417 best: 8.0296328 (42) total: 192ms remaining: 184ms
60: learn: 4.1723597 test: 8.0705261 best: 8.0296328 (42) total: 245ms remaining: 157ms
70: learn: 3.9376692 test: 7.9487736 best: 7.9395332 (69) total: 284ms remaining: 116ms
80: learn: 3.7828106 test: 7.9343475 best: 7.8989299 (78) total: 321ms remaining: 75.4ms
90: learn: 3.6612582 test: 7.8562065 best: 7.8512619 (89) total: 359ms remaining: 35.6ms
99: learn: 3.4974324 test: 7.7380102 best: 7.7302591 (98) total: 394ms remaining: 0us
bestTest = 7.73025906
bestIteration = 98
Test R2 : 0.48
Train R2 : 0.85
Custom Evaluation Metric
Catboost lets us define a custom evaluation metric by creating a class that implements the below-mentioned three methods.
is_max_optimal() - This method returns True if we want to maximize the metric, else False.
evaluate() - This method returns an error and total weight for a list of predictions and target labels. The logic of calculating error for a list of values should be included here.
get_final_error() - This method returns the actual metric value based on total error and total weight. The logic of calculating the final error based on weights should be included here.
Below we have created a simple mean absolute error metric. We have then given the same metric to the eval_metric parameter of CatBoostRegressor. We can notice from the training results that it prints the mean absolute error at every 10 iterations for the evaluation dataset, which is the test set in our case.
class MeanAbsoluteError(object):
    def is_max_optimal(self):
        ## Return True if we want to maximize the metric, else False.
        ## We return False because we want to minimize mean absolute error.
        return False

    def evaluate(self, approxes, target, weight):
        ## approxes is a list with one entry per dimension; approxes[0] holds the predictions.
        error, weight_sum = 0.0, 0.0
        for i in range(len(approxes[0])):
            w = 1.0 if weight is None else weight[i]
            error += w * abs(target[i] - approxes[0][i])
            weight_sum += w
        return error, weight_sum

    def get_final_error(self, error, weight):
        ## Final metric is the weighted average of per-sample errors.
        return error / (weight + 1e-38)
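A sketch of wiring this metric into training (the estimator configuration is an assumption; an instance of the class is passed to eval_metric):

booster = CatBoostRegressor(iterations=100, verbose=10, eval_metric=MeanAbsoluteError())
booster.fit(X_train, Y_train, eval_set=(X_test, Y_test))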
booster.set_feature_names(boston.feature_names)
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Learning rate set to 0.166668
0: learn: 24.2600634 test: 136.0999662 best: 136.0999662 (0) total: 3.17ms remaining: 314ms
10: learn: 97.7562849 test: 68.6856874 best: 68.6856874 (10) total: 29.2ms remaining: 236ms
20: learn: 52.6252278 test: 30.4887092 best: 30.4887092 (20) total: 44.4ms remaining: 167ms
30: learn: 18.7700885 test: 18.4446486 best: 18.4446486 (30) total: 53.9ms remaining: 120ms
40: learn: 11.3055473 test: 9.4733398 best: 9.4733398 (40) total: 62.9ms remaining: 90.5ms
50: learn: 6.2744321 test: 5.7102511 best: 5.2957340 (49) total: 71.2ms remaining: 68.4ms
60: learn: 2.7441574 test: 3.7928502 best: 3.7928502 (60) total: 89.9ms remaining: 57.5ms
70: learn: 2.1414438 test: 2.3408031 best: 2.3408031 (70) total: 97.3ms remaining: 39.7ms
80: learn: 2.1668637 test: 0.0583625 best: 0.0583625 (80) total: 105ms remaining: 24.6ms
90: learn: -1.0705412 test: -1.2272650 best: -1.2272650 (90) total: 112ms remaining: 11.1ms
99: learn: 2.3224568 test: -1.3544445 best: -1.3544445 (99) total: 119ms remaining: 0us
bestTest = -1.354444507
bestIteration = 99
Test R2 : 0.82
Train R2 : 0.98
Custom Objective/Loss Function
Catboost also lets us define a custom objective (loss) function by creating a class that implements the calc_ders_range() method, which returns the first and second derivatives of the loss for a batch of predictions. We can then pass an instance of this class to the loss_function parameter of estimators. Below we have created a simple mean squared error loss function and explained its usage with a simple example in the next cell.
class MeanSquaredErrorLoss(object):
def calc_ders_range(self, approxes, targets, weights):
# This function should return a list of pairs (der1, der2), where
# der1 is the first derivative of the loss function with respect
# to the predicted value, and der2 is the second derivative.
result = []
for index in range(len(targets)):
der1 = 2*(targets[index] - approxes[index]) ## First Derivative of Loss Function
der2 = -1 ## Second Derivative of Loss Function
result.append((der1, der2))
return result
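A sketch of using this custom objective (the configuration is an assumption; a user-defined loss object also needs an explicit eval_metric, and the values in the log below suggest R2 was used):

booster = CatBoostRegressor(iterations=100, verbose=10,
                            loss_function=MeanSquaredErrorLoss(),
                            eval_metric="R2")
booster.fit(X_train, Y_train, eval_set=(X_test, Y_test))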
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
0: learn: -5.4810607 test: -4.9178567 best: -4.9178567 (0) total: 1.06ms remaining: 105ms
10: learn: -1.5021091 test: -1.6297951 best: -1.6297951 (10) total: 14.7ms remaining: 119ms
20: learn: -0.0371187 test: -0.2945356 best: -0.2945356 (20) total: 23.8ms remaining: 89.6ms
30: learn: 0.5014831 test: 0.2272055 best: 0.2272055 (30) total: 32.1ms remaining: 71.5ms
40: learn: 0.7173740 test: 0.4787453 best: 0.4787453 (40) total: 41.1ms remaining: 59.2ms
50: learn: 0.8195089 test: 0.6193570 best: 0.6193570 (50) total: 50.3ms remaining: 48.3ms
60: learn: 0.8656855 test: 0.6928399 best: 0.6928399 (60) total: 59.9ms remaining: 38.3ms
70: learn: 0.8906273 test: 0.7310830 best: 0.7310830 (70) total: 69ms remaining: 28.2ms
80: learn: 0.9081062 test: 0.7554555 best: 0.7554555 (80) total: 78.1ms remaining: 18.3ms
90: learn: 0.9210548 test: 0.7773323 best: 0.7773323 (90) total: 87.3ms remaining: 8.64ms
99: learn: 0.9290782 test: 0.7905164 best: 0.7905164 (99) total: 95.8ms remaining: 0us
bestTest = 0.7905163648
bestIteration = 99
Test R2 : 0.79
Train R2 : 0.93
!wget https://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip
!unzip smsspamcollection.zip
Archive: smsspamcollection.zip
inflating: SMSSpamCollection
inflating: readme
import collections
with open('SMSSpamCollection') as f:
data = [line.strip().split('\t') for line in f.readlines()]
y, text = zip(*data)
collections.Counter(y)
Example 1
The simplest way to work with text data is to let catboost handle the column with text data by itself. Catboost will tokenize the data and convert it to a float array by itself. We'll be using catboost's internal Pool data structure for this purpose. The Pool data structure has an argument named text_features which accepts a list of indices in the data that hold text data. As our dataset only has text data, we have given 0 in that list. We have then normally created the CatBoostClassifier instance and trained it on train data, as sketched below. We have later evaluated it on test data, and the accuracy is very impressive.
The catboost estimators have a parameter named text_processing which takes some default JSON values that are responsible for handling text data; the default value is documented in the official catboost documentation.
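A sketch of the data preparation that the fit() call below relies on (split proportions, random_state, and the accuracy eval metric are assumptions inferred from the log):

X_train, X_test, Y_train, Y_test = train_test_split(list(text), list(y), train_size=0.9,
                                                    stratify=list(y), random_state=123)

## Pool needs 2-D data, so the messages are wrapped into a single text column (index 0).
train_data = Pool(pd.DataFrame({"Message": X_train}), Y_train, text_features=[0])
test_data = Pool(pd.DataFrame({"Message": X_test}), Y_test, text_features=[0])

booster = CatBoostClassifier(iterations=10, eval_metric="Accuracy")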
booster.fit(train_data, eval_set=test_data)
0: learn: 0.9742823 test: 0.9856631 best: 0.9856631 (0) total: 55.2ms remaining: 497ms
1: learn: 0.9742823 test: 0.9856631 best: 0.9856631 (0) total: 99.8ms remaining: 399ms
2: learn: 0.9748804 test: 0.9892473 best: 0.9892473 (2) total: 146ms remaining: 340ms
3: learn: 0.9744817 test: 0.9856631 best: 0.9892473 (2) total: 191ms remaining: 287ms
4: learn: 0.9742823 test: 0.9856631 best: 0.9892473 (2) total: 237ms remaining: 237ms
5: learn: 0.9744817 test: 0.9856631 best: 0.9892473 (2) total: 307ms remaining: 205ms
6: learn: 0.9752791 test: 0.9892473 best: 0.9892473 (2) total: 358ms remaining: 153ms
7: learn: 0.9750797 test: 0.9874552 best: 0.9892473 (2) total: 403ms remaining: 101ms
8: learn: 0.9750797 test: 0.9874552 best: 0.9892473 (2) total: 448ms remaining: 49.7ms
9: learn: 0.9750797 test: 0.9874552 best: 0.9892473 (2) total: 493ms remaining: 0us
bestTest = 0.9892473118
bestIteration = 2
Example 2
As a part of our second example, we have explained how we can specify the value for the text_processing parameter if we want to impose a
different way of handling text data.
booster.fit(train_data, eval_set=test_data)
0: learn: 0.9742823 test: 0.9856631 best: 0.9856631 (0) total: 45.3ms remaining: 408ms
1: learn: 0.9742823 test: 0.9856631 best: 0.9856631 (0) total: 91.1ms remaining: 364ms
2: learn: 0.9748804 test: 0.9892473 best: 0.9892473 (2) total: 138ms remaining: 322ms
3: learn: 0.9744817 test: 0.9856631 best: 0.9892473 (2) total: 184ms remaining: 276ms
4: learn: 0.9742823 test: 0.9856631 best: 0.9892473 (2) total: 230ms remaining: 230ms
5: learn: 0.9744817 test: 0.9856631 best: 0.9892473 (2) total: 302ms remaining: 201ms
6: learn: 0.9752791 test: 0.9892473 best: 0.9892473 (2) total: 353ms remaining: 151ms
7: learn: 0.9750797 test: 0.9874552 best: 0.9892473 (2) total: 399ms remaining: 99.6ms
8: learn: 0.9750797 test: 0.9874552 best: 0.9892473 (2) total: 444ms remaining: 49.4ms
9: learn: 0.9750797 test: 0.9874552 best: 0.9892473 (2) total: 490ms remaining: 0us
bestTest = 0.9892473118
bestIteration = 2
Example 3
Our third example for text data gives us another example of handling text data with the text_processing parameter.
booster.fit(train_data, eval_set=test_data)
0: learn: 0.9742823 test: 0.9856631 best: 0.9856631 (0) total: 46.2ms remaining: 416ms
1: learn: 0.9742823 test: 0.9856631 best: 0.9856631 (0) total: 92.9ms remaining: 371ms
2: learn: 0.9748804 test: 0.9892473 best: 0.9892473 (2) total: 139ms remaining: 324ms
3: learn: 0.9744817 test: 0.9856631 best: 0.9892473 (2) total: 185ms remaining: 277ms
4: learn: 0.9742823 test: 0.9856631 best: 0.9892473 (2) total: 231ms remaining: 231ms
5: learn: 0.9744817 test: 0.9856631 best: 0.9892473 (2) total: 293ms remaining: 195ms
6: learn: 0.9752791 test: 0.9892473 best: 0.9892473 (2) total: 352ms remaining: 151ms
7: learn: 0.9750797 test: 0.9874552 best: 0.9892473 (2) total: 399ms remaining: 99.7ms
8: learn: 0.9750797 test: 0.9874552 best: 0.9892473 (2) total: 445ms remaining: 49.4ms
9: learn: 0.9750797 test: 0.9874552 best: 0.9892473 (2) total: 490ms remaining: 0us
bestTest = 0.9892473118
bestIteration = 2
Example 4
As a part of our fourth example for handling text data, we have used the TF-IDF vectorizer available from scikit-learn to transform our text data
to float. All other parts of the code are almost the same as previous examples with the only difference that we are using TF-IDF transformed
arrays for training and evaluation now.
If you are interested in learning about feature extraction from text data using scikit-learn then please feel free to check our tutorial on the same
to get an in-depth idea about it.
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(max_features=500)

X_train_vect = vectorizer.fit_transform(X_train)
X_test_vect = vectorizer.transform(X_test)

## toarray() is added to prevent catboost from failing on sparse arrays
train_data = Pool(X_train_vect.toarray(), Y_train)
test_data = Pool(X_test_vect.toarray(), Y_test)

booster = CatBoostClassifier(iterations=10)
booster.fit(train_data, eval_set=test_data)
bestTest = 0.1258048004
bestIteration = 9
GPU Support
Catboost lets us run the training process on GPU, either on a single GPU or on multiple GPUs in parallel.
In order to run training on GPU, we need to set the task_type parameter of estimators to GPU . We can provide the devices parameter with one of the below-mentioned formats to run the training process on single/multiple GPUs.
Single GPU - <id1> - e.g : "1" - It'll use GPU with id 1 for training.
List of GPUs - <id1>:<id3>:<id5> - e.g : "1:3:5" - It'll use GPUs with id 1, 3 and 5 for training.
Range of GPUs - <id1>:<id3> - e.g : "1-3" - It'll use GPUs with id 1, 2 and 3 for training.
We can get the count of GPUs present on the system using the get_gpu_device_count() method of the utils module.
from catboost import utils

gpu_cnt = utils.get_gpu_device_count()
Below we have explained with simple examples of how we can use GPU for the training process.
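A sketch of such a run (device id and estimator settings are assumptions; this only works on a machine with a CUDA-capable GPU):

booster = CatBoostRegressor(iterations=1000, verbose=100,
                            task_type="GPU", devices="0")  ## train on the GPU with id 0
booster.fit(X_train, Y_train, eval_set=(X_test, Y_test))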
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Test R2 : 0.86
Train R2 : 0.99
test_preds = booster.predict(X_test)
train_preds = booster.predict(X_train)
Test R2 : 0.86
Train R2 : 0.98
This ends our small tutorial explaining various functionalities available through the API of catboost. Please feel free to let us know your views
in the comments section.
Sunny Solanki