L5 Slides
XGBoost: A Scalable Tree Boosting System
Tianqi Chen and Carlos Guestrin, University of Washington
Regularized objective function
Given a dataset (n examples, m features)
Objective
2nd-order approximation
Remove constants
Scoring function to evaluate the quality of a tree structure
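For reference, the equations these steps refer to, as derived in the paper (g_i and h_i are the first- and second-order gradients of the loss at iteration t, T the number of leaves, I_j the instance set of leaf j):

    \mathcal{L}(\phi) = \sum_{i} l(\hat{y}_i, y_i) + \sum_{k} \Omega(f_k),
        \qquad \Omega(f) = \gamma T + \tfrac{1}{2} \lambda \lVert w \rVert^2

    \mathcal{L}^{(t)} \simeq \sum_{i=1}^{n} \left[ l(y_i, \hat{y}_i^{(t-1)}) + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)

    \tilde{\mathcal{L}}^{(t)} = \sum_{i=1}^{n} \left[ g_i f_t(x_i) + \tfrac{1}{2} h_i f_t^2(x_i) \right] + \Omega(f_t)

    w_j^* = -\frac{\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda},
        \qquad \tilde{\mathcal{L}}^{(t)}(q) = -\tfrac{1}{2} \sum_{j=1}^{T} \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T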
Split-finding algorithms
Exact greedy
Enumerates all possible splits for continuous features
Computationally demanding
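A minimal sketch of the exact greedy scan over one feature (an illustrative helper, not the library's implementation): sort by feature value once, then evaluate every split boundary from running gradient sums.

    import numpy as np

    def best_split_exact(x, g, h, lam=1.0, gamma=0.0):
        """Enumerate all splits on one feature (exact greedy).
        x: feature values; g, h: first/second-order gradients."""
        order = np.argsort(x)
        x, g, h = x[order], g[order], h[order]
        G, H = g.sum(), h.sum()            # gradient totals for the node
        gl = hl = 0.0                      # running left-side sums
        best_gain, best_thresh = 0.0, None
        for i in range(len(x) - 1):
            gl += g[i]
            hl += h[i]
            if x[i] == x[i + 1]:           # split only between distinct values
                continue
            gr, hr = G - gl, H - hl
            gain = 0.5 * (gl**2 / (hl + lam) + gr**2 / (hr + lam)
                          - G**2 / (H + lam)) - gamma
            if gain > best_gain:
                best_gain, best_thresh = gain, (x[i] + x[i + 1]) / 2
        return best_gain, best_thresh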
Approximate
Algorithm proposes candidate splits according to percentiles of feature distributions
Maps continuous features to buckets split by candidate points
Aggregates statistics per bucket and finds the best split among the proposals
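A corresponding sketch of the approximate variant, using plain percentiles as candidate proposals (the paper actually uses a weighted quantile sketch with the h_i as weights; percentiles are shown here for brevity):

    import numpy as np

    def best_split_approx(x, g, h, n_candidates=32, lam=1.0, gamma=0.0):
        # Propose candidate split points at feature percentiles
        qs = np.unique(np.percentile(x, np.linspace(0, 100, n_candidates + 1)[1:-1]))
        # Map each example to a bucket and aggregate gradient statistics
        bucket = np.searchsorted(qs, x)
        Gb = np.bincount(bucket, weights=g, minlength=len(qs) + 1)
        Hb = np.bincount(bucket, weights=h, minlength=len(qs) + 1)
        G, H = Gb.sum(), Hb.sum()
        best_gain, best_thresh = 0.0, None
        gl = hl = 0.0
        for j in range(len(qs)):           # boundaries between buckets
            gl += Gb[j]
            hl += Hb[j]
            gr, hr = G - gl, H - hl
            gain = 0.5 * (gl**2 / (hl + lam) + gr**2 / (hr + lam)
                          - G**2 / (H + lam)) - gamma
            if gain > best_gain:
                best_gain, best_thresh = gain, qs[j]
        return best_gain, best_thresh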
Comparison of split-finding
Two variants of candidate proposal
Global: candidates proposed once per tree and reused at every level; needs more candidate points to reach the same accuracy
Local: candidates re-proposed after each split; requires fewer candidates and suits deeper trees
Shrinkage and column subsampling
Shrinkage
Scales newly added weights by a factor η
Reduces influence of each individual tree
Leaves space for future trees to improve model
Similar to learning rate in stochastic optimization
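In symbols, with shrinkage factor η (0 < η ≤ 1):

    \hat{y}_i^{(t)} = \hat{y}_i^{(t-1)} + \eta \, f_t(x_i)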
Column subsampling
Subsample features
Also used in Random Forests
According to user feedback cited in the paper, prevents overfitting even more effectively than traditional row subsampling
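Both knobs are exposed as parameters in the xgboost library; a minimal sketch (the values are illustrative, and dtrain is assumed to be an xgb.DMatrix built elsewhere):

    import xgboost as xgb

    params = {
        "eta": 0.1,               # shrinkage: scales each new tree's weights
        "colsample_bytree": 0.8,  # column subsampling: fraction of features per tree
        "subsample": 0.8,         # row subsampling, for comparison
        "lambda": 1.0,            # L2 regularization on leaf weights
        "gamma": 0.0,             # minimum loss reduction required to split
    }
    # booster = xgb.train(params, dtrain, num_boost_round=100)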
Sparsity-aware split finding
Each node learns a default direction for missing values; only non-missing entries are enumerated, so the cost is linear in the number of non-missing entries
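A sketch of the idea under the same assumptions as the earlier helpers (hypothetical function; missing values encoded as NaN): scan only the non-missing entries, trying both default directions for the missing ones.

    import numpy as np

    def best_split_sparse(x, g, h, lam=1.0):
        present = ~np.isnan(x)             # enumerate non-missing entries only
        G, H = g.sum(), h.sum()            # totals include missing entries
        order = np.argsort(x[present])
        xs = x[present][order]
        gs, hs = g[present][order], h[present][order]
        best = (0.0, None, None)           # (gain, threshold, default direction)
        for default in ("left", "right"):
            # Missing entries contribute to the left side iff default is left
            gl = G - gs.sum() if default == "left" else 0.0
            hl = H - hs.sum() if default == "left" else 0.0
            for i in range(len(xs) - 1):
                gl += gs[i]
                hl += hs[i]
                if xs[i] == xs[i + 1]:
                    continue
                gr, hr = G - gl, H - hl
                gain = 0.5 * (gl**2 / (hl + lam) + gr**2 / (hr + lam)
                              - G**2 / (H + lam))
                if gain > best[0]:
                    best = (gain, (xs[i] + xs[i + 1]) / 2, default)
        return best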
Out-of-core computation
Data is divided into blocks, and the blocks are stored on disk
An independent thread pre-fetches blocks into an in-memory buffer, so computation and disk reads can overlap
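A minimal in-memory sketch of the block layout: each feature column is stored once in sorted order together with the row index of each value, so split finding can scan linearly instead of re-sorting at every node (the real blocks use the compressed column (CSC) format and live on disk):

    import numpy as np

    def build_column_blocks(X):
        """Pre-sort each feature once; return (sorted values, row ids) pairs."""
        blocks = []
        for j in range(X.shape[1]):
            order = np.argsort(X[:, j])
            blocks.append((X[order, j], order))
        return blocks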
Block Compression
Each column is compressed before being written to disk and decompressed on the fly by an independent thread when read back into the pre-fetched buffer
Trades some decompression computation for reduced disk I/O
Block Sharding
Data is sharded across multiple disks in an alternating manner
A pre-fetcher thread is assigned to each disk to read data into an in-memory buffer; the training thread alternately reads from each buffer, increasing the effective throughput of disk reading
Cache-aware access
Exact greedy: each thread allocates an internal buffer, fetches gradient statistics into it, and accumulates in mini-batches, turning non-contiguous memory reads into cache-friendly ones
Approximate: handled by the choice of block size; the paper finds 2^16 examples per block balances cache pressure and parallelism
https://arogozhnikov.github.io/2016/06/24/gradient_boosting_explained.html
Conclusions