This facies classification problem was originally introduced in The Leading Edge by Brendon Hall in October 2016 (Hall, 2016), and it has since evolved into the first machine learning contest held by the SEG. Three of the facies classes, with their label abbreviations and adjacent facies, are:

Facies   Description                  Label   Adjacent facies
4        Marine siltstone and shale   SiSh    5
5        Mudstone                     MS      4, 6
6        Wackestone                   WS      5, 7
Methodology

Generally speaking, there are three types of machine learning algorithms: supervised learning, unsupervised learning, and reinforcement learning. The application in this paper belongs to the category of supervised learning. This type of algorithm involves a target/outcome variable (or dependent variable) that is to be predicted from a given set of predictors (independent variables, usually called features). Using these feature variables, a function that maps inputs to desired outputs is generated, and the training process continues until the model achieves a satisfactory level of accuracy on the training data. Examples of supervised learning include regression, decision trees, random forests, KNN, and logistic regression.
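As a minimal sketch of this workflow (not from the paper; the synthetic data and the choice of a decision tree are assumptions for illustration):

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Toy stand-ins for log measurements (features) and facies labels (classes).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))             # 200 samples, 5 features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic two-class target

# Fit a mapping from features to labels; check accuracy on held-out data.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))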
The algorithm adopted here is called XGBoost (eXtreme Gradient Boosting), an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework and provides a parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way. XGBoost was created and developed by Tianqi Chen, a Ph.D. student at the University of Washington. More details about XGBoost can be found at http://dmlc.cs.washington.edu/xgboost.html.
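As a sketch of how such a classifier is trained with the xgboost Python package (the toy data and parameter values are illustrative assumptions, not the settings used in this study):

import numpy as np
from xgboost import XGBClassifier

# Toy data: 200 samples, 5 features, 3 facies-like classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 3, size=200)

clf = XGBClassifier(
    n_estimators=100,    # number of boosted trees
    max_depth=3,         # maximum depth of each tree
    learning_rate=0.1,   # shrinkage applied to each new tree
)
clf.fit(X, y)
predicted = clf.predict(X)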
the dataframe structure for further manipulation.
The basic idea of boosting is to combine hundreds of simple
trees with low accuracy to build a more accurate model.
Every iteration will generate a new tree for the model. When
it comes to how a new tree is created, there are thousands of
In XGBoost, the complexity of a tree $f$ with $T$ leaves and leaf weights $w_j$ is defined as

\Omega(f) = \gamma T + \frac{1}{2}\lambda \sum_{j=1}^{T} w_j^2    (2)

There is, of course, more than one way to define the complexity, and this particular one works well in practice. The objective function in XGBoost is then defined as

\mathrm{obj} = \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2}\left( H_j + \lambda \right) w_j^2 \right] + \gamma T    (3)

where $G_j$ and $H_j$ are the sums of the first- and second-order gradients of the loss over the instances in leaf $j$. More details about the notation can be found at http://xgboost.readthedocs.io/en/latest/model.html.
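A bridging step, standard in the XGBoost derivation though not spelled out here: minimizing (3) with respect to each leaf weight $w_j$ (setting $G_j + (H_j + \lambda) w_j = 0$) gives the optimal weight and the corresponding objective value,

w_j^{\ast} = -\frac{G_j}{H_j + \lambda},
\qquad
\mathrm{obj}^{\ast} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^{2}}{H_j + \lambda} + \gamma T,

which is the score used to evaluate candidate tree structures.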
Data Analysis and Model Selection

Before building any machine learning model, it is necessary to perform some exploratory analysis and cleanup. First, we examine the data that will be used to train the classifier. The data consist of 5 wireline log measurements, 2 indicator variables, and 1 facies label at half-foot intervals. In machine learning terminology, each log measurement is a feature vector that maps a set of 'features' (the log measurements) to a class (the facies type).

The pandas library in Python is a great tool for loading the data into a dataframe structure for further manipulation.
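As a sketch of this loading step (the file name and column name are assumptions for illustration, not given in the text above):

import pandas as pd

# Load the training data into a DataFrame for inspection and cleanup.
df = pd.read_csv("facies_training_data.csv")    # assumed file name
print(df.shape)                                 # samples x columns
print(df["Facies"].value_counts())              # assumed label column
df = df.dropna()                                # drop rows with missing values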
Algorithm parameter tuning is a critical process in applying the model to another two blind test wells. The best accuracy (F1 score) we have achieved so far is 0.564, ranked 5th in the contest.
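For reference, a sketch of how such a multi-class F1 score can be computed with scikit-learn (the toy labels and the micro-averaging choice are assumptions, not the contest's exact scoring code):

from sklearn.metrics import f1_score

# Toy true vs. predicted facies labels for a multi-class F1 score.
y_true = [1, 2, 2, 3, 1, 3]
y_pred = [1, 2, 3, 3, 1, 2]
print(f1_score(y_true, y_pred, average="micro"))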