
Machine Learning in Rock Facies Classification: An Application of XGBoost


Licheng Zhang, Cheng Zhan


Summary

Big data analysis has drawn much attention across different industries. Geoscientists, meanwhile, have been analyzing voluminous data for many years without even bragging about how big it is. In this paper, we present an application of machine learning, more specifically the gradient boosting method, to rock facies classification based on certain geological features and constraints. Gradient boosting is a popular and effective approach to classification that produces a prediction model as an ensemble of weak models, typically decision trees. The key to making gradient boosting work successfully lies in introducing a customized objective function and tuning the parameters iteratively based on cross-validation. Our model achieves a rather high F1 score when evaluated on data from two test wells.

Introduction and Background

Machine learning is emerging as a very promising area and should make the work of future geoscientists more fun and less tedious. Furthermore, with maturing neural network technology, geological interpretation could become more automatic and accurate; for example, in the Gulf of Mexico region, salt body characterization (challenging for the velocity model) might be elevated to the next level of higher-quality seismic images.

There are a few decision-tree-based algorithms for handling classification problems. One is the random forest, which operates by constructing multiple decision trees to reduce the possible variance error in each model. Another widely used technique is gradient boosting, which has been applied successfully in many Kaggle competitions. This method focuses on where the model performs poorly and improves those areas by introducing a new learner to compensate for the existing model.

This facies classification problem was originally introduced in The Leading Edge by Brendon Hall in October 2016 (Hall, 2016). It has evolved into the first machine learning contest in the SEG; more information can be found at https://github.com/seg/2016-ml-contest. At the time we submitted this paper, our ranking was 5th on the leaderboard.

The data are from the Council Grove gas reservoir in southwest Kansas. The Panoma Council Grove Field is predominantly a carbonate gas reservoir encompassing 2700 square miles in southwestern Kansas. The dataset consists of training data from ten wells (4149 examples), each example carrying seven predictor variables and a rock facies (class) label, and validation (test) data (830 examples from two wells) with the same seven predictor variables in the feature vector. Facies are based on the examination of cores from nine wells taken vertically at half-foot intervals. The predictor variables include five wireline log measurements and two geologic constraining variables derived from geologic knowledge; these are essentially continuous variables sampled at a half-foot rate.

The seven predictor variables are:

Five wireline log measurements:
- Gamma ray (GR)
- Resistivity logging (ILD_log10)
- Photoelectric effect (PE)
- Neutron-density porosity difference (DeltaPHI)
- Average neutron-density porosity (PHIND)

Two geologic constraints:
- Nonmarine-marine indicator (NM_M)
- Relative position (RELPOS)

The nine discrete facies (classes of rocks), their abbreviated labels, and the corresponding adjacent facies are listed in Table 1. The facies gradually blend into one another, and some neighboring facies are rather close, so mislabeling between neighbors can occur.

Table 1:

Class of rocks                 Facies   Label   Adjacent facies
Nonmarine sandstone            1        SS      2
Nonmarine coarse siltstone     2        CSiS    1,3
Nonmarine fine siltstone       3        FSiS    2
Marine siltstone and shale     4        SiSh    5
Mudstone                       5        MS      4,6
Wackestone                     6        WS      5,7
Dolomite                       7        D       6,8
Packstone-grainstone           8        PS      6,7,9
Phylloid-algal bafflestone     9        BS      7,8


Methodology

Generally speaking, there are three types of machine learning algorithms: supervised learning, unsupervised learning, and reinforcement learning. The application in this paper belongs to the category of supervised learning. This type of algorithm involves a target/outcome variable (or dependent variable) that is to be predicted from a given set of predictors (independent variables, usually called features). Using these feature variables, a function that maps inputs to the desired outputs is generated, and the training process continues until the model achieves a satisfactory level of accuracy on the training data. Examples of supervised learning include regression, decision trees, random forests, KNN, logistic regression, etc.

The algorithm adopted here is XGBoost (eXtreme Gradient Boosting), an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the gradient boosting framework. XGBoost provides parallel tree boosting (also known as GBDT or GBM) that solves many data science problems in a fast and accurate way. It was created and developed by Tianqi Chen, a Ph.D. student at the University of Washington. More details about XGBoost can be found at http://dmlc.cs.washington.edu/xgboost.html.

The basic idea of boosting is to combine hundreds of simple trees with low accuracy into a more accurate model. Every iteration generates a new tree for the model. There are many ways to create a new tree; a famous one is the gradient boosting machine proposed by Friedman (Friedman, 2001), which uses gradient descent to generate the new tree based on all previous trees, driving the objective function toward its minimum.

An objective function usually consists of two parts, a training loss and a regularization term:

    Obj(\theta) = L(\theta) + \Omega(\theta)    (1)

where L is the training loss function and \Omega is the regularization term. The training loss measures how well the model performs on the training data, while the regularization term controls the complexity of the model, which helps prevent overfitting. The complexity of each tree is defined as follows:

    \Omega(f) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2    (2)

There is, of course, more than one way to define the complexity, and this particular one works well in practice. The objective function in XGBoost is then defined as:

    obj = \sum_{j=1}^{T} \left[ G_j w_j + \frac{1}{2} (H_j + \lambda) w_j^2 \right] + \gamma T    (3)

More details about the notation can be found at http://xgboost.readthedocs.io/en/latest/model.html.
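Here T is the number of leaves in a tree, w_j is the score on leaf j, \gamma and \lambda are regularization constants, and G_j and H_j are the sums of the first- and second-order gradients of the loss over the examples assigned to leaf j. Following the standard XGBoost derivation, minimizing (3) with respect to each leaf weight gives the optimal score and the best achievable objective for a fixed tree structure:

    w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad obj^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T

This latter quantity is the score XGBoost uses to compare candidate tree structures when deciding how to split.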


Data Analysis and Model Selection

Before building any machine learning model, it is necessary to perform some exploratory analysis and cleanup. First, we examine the data that will be used to train the classifier. The data consist of 5 wireline log measures, 2 indicator variables, and 1 facies label at half-foot intervals. In machine learning terminology, each log measurement is a feature vector that maps a set of 'features' (the log measures) to a class (the facies type).

The pandas library in Python is a great tool for loading the data into a dataframe structure for further manipulation.

Some basic statistical analyses are then produced, for example, the distribution of each class (Figure 1a), a heatmap of the features (Figure 1b), which gives a correlation plot for observing relationships between variables, and log plots for wells (Figure 1c). These figures are the initial building blocks for exploring the data; the visualization libraries are seaborn and matplotlib.
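As a rough sketch of this step in Python (the CSV file name and the 'Facies' column name are assumptions; the feature mnemonics follow the predictor table above):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the training data into a dataframe (file name assumed).
data = pd.read_csv('facies_vectors.csv')

# Distribution of the nine facies classes (cf. Figure 1a).
sns.countplot(x='Facies', data=data)
plt.show()

# Correlation heatmap of the predictor variables (cf. Figure 1b).
features = ['GR', 'ILD_log10', 'DeltaPHI', 'PHIND', 'PE', 'NM_M', 'RELPOS']
sns.heatmap(data[features].corr(), annot=True, cmap='coolwarm')
plt.show()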


The next step is data preparation and model selection. The goal is to build a reliable model that predicts the Y values (Facies) based on the X values (the seven predictor variables).

To enhance XGBoost's speed over many iterations, we create a DMatrix format. This process sorts the data up front to optimize XGBoost's tree building and correspondingly reduces the runtime, which is especially helpful when learning with a large number of training examples.
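A minimal sketch of this conversion (X_train, y_train, and X_test are assumed to be the feature matrices and facies labels prepared from the dataframe):

import xgboost as xgb

# Pack the features and labels into XGBoost's optimized DMatrix structure;
# the data is pre-sorted internally, which speeds up repeated tree construction.
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test)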

On the other hand, in order to quantify the quality of the models, certain metrics are needed. We use accuracy metrics for judging the models. A simple and easy guide to the terminology (e.g., accuracy, precision, recall) can be found at http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/.
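For instance, such metrics might be computed with scikit-learn as sketched below (y_true and y_pred are assumed to be the true and predicted facies labels):

from sklearn.metrics import accuracy_score, f1_score, classification_report

# Overall accuracy and a micro-averaged F1 score across the nine facies.
print('Accuracy:', accuracy_score(y_true, y_pred))
print('F1 (micro):', f1_score(y_true, y_pred, average='micro'))

# Per-facies precision, recall, and F1.
print(classification_report(y_true, y_pred))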

There are several main parameters to be tuned to get a good model for this rock facies classification problem; they are summarized in Table 2.

Table 2: main parameters

Learning rate: Step size shrinkage employed to prevent overfitting. It shrinks the feature weights to make the boosting process more conservative.
N_estimators: The number of trees.
Max_depth: Maximum depth of a tree; increasing this value makes the model more complex (and more likely to overfit).
Min_child_weight: Minimum sum of instance weight needed in a child.
Gamma: Minimum loss reduction required to make a further partition on a leaf node of the tree.
Subsample: Subsample ratio of the training instances.
Colsample_bytree: Subsample ratio of features when constructing each tree.
Objective: 'multi:softmax': Sets XGBoost to perform multiclass classification using the softmax objective.
nthread: Number of parallel threads used to run XGBoost.

Figure 1: (a) Distribution of facies. (b) Heatmap of features. (c) Log plots for wells SHRIMPLIN and SHANKLE.
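As an illustration of how these parameters map onto XGBoost's scikit-learn wrapper (the values shown are generic starting points, not the tuned settings of the final model):

import xgboost as xgb

clf = xgb.XGBClassifier(
    learning_rate=0.1,          # step size shrinkage
    n_estimators=200,           # number of trees
    max_depth=5,                # maximum depth of each tree
    min_child_weight=1,         # minimum sum of instance weight in a child
    gamma=0,                    # minimum loss reduction for a further split
    subsample=0.8,              # row subsampling ratio
    colsample_bytree=0.8,       # feature subsampling ratio per tree
    objective='multi:softmax',  # multiclass classification via softmax
    nthread=4,                  # number of parallel threads
)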


Algorithm parameter tuning is a critical process in achieving the optimal performance of a given algorithm, and it needs to be carefully justified before moving into production. Our workflow for optimizing the parameters is as follows:

- Pick initial parameters (e.g., default values)
- Tune the tree-based parameters (e.g., adjust max_depth and min_child_weight simultaneously)
- Calibrate gamma, subsample and colsample_bytree
- Balance the regularization parameters
- Reduce the learning rate and update the number of trees

The reason we adopt such a flow lies in the nature of the XGBoost algorithm, which is robust enough not to overfit as trees are added, whereas too high a learning rate can degrade its ability to predict new test data. As we reduce the learning rate and increase the number of trees, the computation becomes expensive and can take a long time on standard personal computers.

Grid search is a typical approach to parameter tuning that methodically builds and evaluates a model for each combination of parameters in a specified grid. For instance, the code below examines different combinations of 'max_depth' and 'min_child_weight'.
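A minimal sketch of such a search, assuming scikit-learn's GridSearchCV wrapped around an XGBClassifier (the candidate values and the X_train/y_train variables are illustrative):

from sklearn.model_selection import GridSearchCV
import xgboost as xgb

param_grid = {
    'max_depth': [3, 5, 7, 9],
    'min_child_weight': [1, 3, 5],
}

# Evaluate every combination in the grid with cross-validation.
search = GridSearchCV(
    estimator=xgb.XGBClassifier(learning_rate=0.1, n_estimators=200,
                                objective='multi:softmax', nthread=4),
    param_grid=param_grid,
    scoring='f1_micro',
    cv=5,
)
search.fit(X_train, y_train)
print(search.best_params_, search.best_score_)
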
Another way to tailor the parameters is random search, which complements the predefined grid search procedure currently being exploited. In this case, we did not find that random search improved the final results much.

After several iterations, the final model is built. Cross-validation is conducted to assess its performance before applying it to the two blind test wells. The best accuracy (F1 score) we have obtained so far is 0.564, ranked 5th in the contest. We also examine the feature importance plot of the model. Importance provides a score that indicates how useful or valuable each feature was in the construction of the boosted decision trees within the model. The more an attribute is used to make key decisions within the decision trees, the higher its relative importance.
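Such a plot can be produced with XGBoost's built-in helper, roughly as follows (clf is assumed to be the trained booster or classifier):

import matplotlib.pyplot as plt
from xgboost import plot_importance

# Rank the seven predictors by how often they are used in splits of the boosted trees.
plot_importance(clf)
plt.show()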


Conclusions

We have successfully applied the gradient boosting method to a rock facies classification problem. A potential application of such predictions could be to validate the velocity model for seismic data. This could be viewed as a commencing endeavor for more machine learning applications in the oil and gas sector in the near future.

Acknowledgments

The authors would like to thank Ted Petrou, Aiqun Huang and Zhongyang Dong for discussion. We also thank Yan Xu for reviewing the manuscript.

References

Chen, T. & Guestrin, C., 2016. XGBoost: A scalable tree boosting system. arXiv preprint arXiv:1603.02754.

Friedman, J. H., 2001. Greedy function approximation: a gradient boosting machine. Annals of Statistics, pp. 1189-1232.

Hall, B., 2016. Facies classification using machine learning. The Leading Edge, 35(10), pp. 906-909.

Natekin, A. & Knoll, A., 2013. Gradient boosting machines, a tutorial. Frontiers in Neurorobotics, 7.

