
Session 1 On CatBoost Practical Introduction Notes

CatBoost is an open-source gradient boosting algorithm developed by Yandex, designed for various applications such as search and recommendation systems. It offers advantages like ease of use, high accuracy without extensive tuning, and robust support for categorical and textual features. The algorithm includes advanced technical aspects such as handling missing values, built-in overfitting detection, and integration with popular libraries.

Uploaded by

akhilesh

Plan of Attack

20 April 2024 19:59

Session 24 - CatBoost Page 1


Introduction
19 April 2024 10:08

CatBoost is an algorithm for gradient boosting on decision trees. It is developed by
Yandex researchers and engineers and is used for search, recommendation
systems, personal assistants, self-driving cars, weather prediction, and many other
tasks at Yandex and at other companies, including CERN, Cloudflare, and Careem taxi.
It is open source and can be used by anyone.

Advantages

1. Easy to use
2. Great results without hyperparameter tuning
3. Improved accuracy (see benchmarks)
4. Sophisticated categorical feature support, plus textual features
5. Speed: fast GPU and CPU training and fast prediction (see benchmarks)
6. Model analysis tools

Technical Aspects

1. Handles categorical variables using ordered target encoding; text features are also supported
2. Uses symmetric (oblivious) trees
3. Uses a technique like Newton-Raphson to calculate the output values of the leaves
4. Can choose the learning rate automatically
5. Snapshotting capability
6. Native integration with libraries such as SHAP and Plotly
7. Uses purpose-built data structures such as 'Pool'
8. Built-in support for cross-validation and hyperparameter tuning
9. Handles missing values out of the box
10. Built-in overfitting detector
11. Supports custom loss functions and metrics
12. Multi-threading and GPU support
13. Regularization
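
Ordered target encoding (item 1 above) can be sketched in plain Python. The idea is that each row's category is replaced by a target statistic computed only from rows that come before it in a random permutation, so the row's own label never leaks into its encoding. This is a simplified illustration, not CatBoost's implementation: the function name `ordered_target_encode`, the `order` argument, the `prior` value, and the sample data are all assumptions made for this example, and the real library uses several permutations and further details.

```python
import random
from collections import defaultdict


def ordered_target_encode(categories, targets, prior=0.5, order=None, seed=0):
    """Simplified ordered target statistic (hypothetical helper, not a CatBoost API).

    Each row is encoded as (sum of earlier targets in its category + prior)
    divided by (number of earlier rows in its category + 1), where "earlier"
    is defined by a permutation of the rows.
    """
    if order is None:
        # Assumption: a single random permutation; CatBoost uses several.
        order = list(range(len(categories)))
        random.Random(seed).shuffle(order)

    target_sum = defaultdict(float)  # running sum of targets per category
    count = defaultdict(int)         # running number of rows per category
    encoded = [0.0] * len(categories)

    for i in order:
        cat = categories[i]
        # Only rows already visited contribute, so targets[i] is not used here.
        encoded[i] = (target_sum[cat] + prior) / (count[cat] + 1)
        # Update the running statistics after encoding the row.
        target_sum[cat] += targets[i]
        count[cat] += 1
    return encoded


# Made-up sample data: a categorical feature and a binary target.
colors = ["red", "blue", "red", "red", "blue"]
clicked = [1, 0, 1, 0, 1]

# Fixed order (0, 1, 2, 3, 4) so the result is easy to follow by hand.
print(ordered_target_encode(colors, clicked, order=[0, 1, 2, 3, 4]))
```

With the fixed order, the first "red" and the first "blue" row have no history and are encoded as the prior (0.5); later rows of the same category shift toward the mean of the targets seen before them.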
