0% found this document useful (0 votes)

11 views15 pages

Machin Well Log

Uploaded by

اسلام الشتيوي

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views15 pages

Machin Well Log

Uploaded by

اسلام الشتيوي

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 15

sustainability

Article
An Approach for the Classification of Rock Types Using
Machine Learning of Core and Log Data
Yihan Xing 1 , Huiting Yang 2 and Wei Yu 3, *

1 School of Statistics, Capital University of Economics and Business, Beijing 100070, China;
[email protected]
2 School of Geosciences &Technology, Southwest Petroleum University, Chengdu 610500, China;
[email protected]
3 SimTech LLC, Houston, TX 77494, USA
* Correspondence: [email protected]

Abstract: Classifying rocks based on core data is the most common method used by geologists.
However, due to factors such as drilling costs, it is impossible to obtain core samples from all wells,
which poses challenges for the accurate identification of rocks. In this study, the authors demonstrated
the application of an explainable machine-learning workflow using core and log data to identify
rock types. The rock type is determined utilizing the flow zone index (FZI) method using core data
first, and then based on the collection, collation, and cleaning of well log data, four supervised
learning techniques were used to correlate well log data with rock types, and learning and prediction
models were constructed. The optimal machine learning algorithm for the classification of rocks
is selected based on a 10-fold cross-test and a comparison of AUC (area under curve) values. The
accuracy rate of the results indicates that the proposed method can greatly improve the accuracy of
the classification of rocks. SHapley Additive exPlanations (SHAP) was used to rank the importance of
the various well logs used as input variables for the prediction of rock types and provides both local
and global sensitivities, enabling the interpretation of prediction models and solving the “black box”
problem with associated machine learning algorithms. The results of this study demonstrated that the
proposed method can reliably predict rock types based on well log data and can solve hard problems
in geological research. Furthermore, the method can provide consistent well log interpretation arising
Citation: Xing, Y.; Yang, H.; Yu, W.
from the lack of core data while providing a powerful tool for well trajectory optimization. Finally,
An Approach for the Classification of the system can aid with the selection of intervals to be completed and/or perforated.
Rock Types Using Machine Learning
of Core and Log Data. Sustainability Keywords: rock type; flow zone index; supervised learning; SHAP value; AUC value
2023, 15, 8868. https://doi.org/
10.3390/su15118868

Academic Editors: Jizhou Tang,

1. Introduction
Yuwei Li and Xiao Ouyang
The classification of rocks is a research topic of common interest among geologists.
Received: 29 April 2023 The accurate classification of rocks can help geologists and petrophysicists determine the
Revised: 22 May 2023
sedimentary environments to improve the accuracy of well log interpretation. With the
Accepted: 28 May 2023
rapid development of electronics and information technology in recent years, researchers
Published: 31 May 2023
have started using machine learning techniques to investigate the relationship between
well log data, rock types, and established methods for predicting rock types. Machine
learning uses various algorithms to build a predictive model on the basis of available data.
Copyright: © 2023 by the authors.
The advantage of this method is that it can evaluate the effect of multiple parameters on
Licensee MDPI, Basel, Switzerland. output simultaneously, which is difficult to study manually. Therefore, machine learn-
This article is an open access article ing is especially effective for high-dimensional problems such as rock type classification.
distributed under the terms and These techniques can be classified into supervised and unsupervised learning techniques.
conditions of the Creative Commons Supervised learning techniques use machine learning for model training and prediction
Attribution (CC BY) license (https:// based on rock types identified by geologists. Hall [1] established a lithology identification
creativecommons.org/licenses/by/ method based on support vector machines. Nishitsuji et al. [2] believed that deep learning
4.0/). has greater potential in lithology identification. Yang Xiao et al. [3] used a decision tree

Sustainability 2023, 15, 8868. https://doi.org/10.3390/su15118868 https://www.mdpi.com/journal/sustainability

Sustainability 2023, 15, 8868 2 of 15

learning algorithm to classify volcanic rocks. Valentín et al. [4] identified rock types using
a deep residual network based on acoustic image logs and micro-resistivity image logs.
Unsupervised learning techniques use training samples of unknown categories (unlabeled
training samples) to solve various problems in pattern recognition. Commonly used unsu-
pervised learning algorithms include principal component analysis (PCA) and clustering
algorithms. Ding Ning [5] carried out lithology identification by means of cluster analysis
based on density attributes. Ju Wu et al. [6] identified coarse-grained sandstone, fine-
grained sandstone, and mudstone using a Bayes stepwise discriminant analysis method
with an accuracy of 82%. Duan Youxiang et al. [7] improved the accuracy of sandstone iden-
tification and classification to a level higher than that of methods based on single-machine
learning. Ma Longfei et al. [8] built a model based on a gradient-boosted decision tree
(GBDT) that can improve the accuracy of lithology identification. Most of these methods
use mathematical models for lithology identification based on manually determined rock
types and involve great uncertainties because experts may adopt different criteria for the
classification of rocks. Moreover, these methods mainly focus on sandstone reservoirs; they
only use a certain type of algorithm for lithology identification and do not consider the
optimization of models adequately. Therefore, it is difficult to interpret the final models of
these methods with geological knowledge. Tang et al. [9] used machine learning to find the
optimum profile in shale formations. Zhao et al. [10] used machine learning methods to
study the dynamic characteristics of fractures in different shale fabric facies, which showed
that machine learning can solve more complex problems, such as shale rock fabric and
fracture characteristics. In this paper, a method combining FZI and machine learning is
proposed for the first time to realize the classification of rock types in the study area. The
rock type is determined through the FZI method using core data, then by comparing the
accuracy levels of four machine learning algorithms and selecting the optimal algorithm to
identify rock types in uncored wells. This method can be used to identify rocks in various
hydrocarbon reservoirs and improve the efficiency and accuracy of well log interpretation
and other geological interpretations. It provides a new idea for lithology identification and
is of great significance for intelligent reservoir evaluation.

2. Geological Settings
The study area is located in the northeastern part of the Amu Darya basin in Turk-
menistan, near the juncture with Uzbekistan. The formation of interest is composed of the
Callovian–Oxfordian carbonate deposits, with an estimated thickness of 350 m, consisting
of the following units from top to bottom: XVac, XVp, XVm, XVhp, XVa1, Z, XVa2, and
XVI [11] (Figure 1).
The area under study in the Callovian period is a carbonate gentle slope sedimen-
tary system composed of an inner ramp, a mid-ramp, an outer ramp, and basin facies
belts. In the early Oxfordian period, under regional transgression, the outer zone of the
mid ramp and outer ramp in the Callovian period were gradually submerged, and the
inner ramp—mid-ramp gradually developed into an edged shelf-type carbonate platform.
The water body in the outer zone is highly energetic, and high-energy shoals or reef–shoal
complexes were developed. The top of the reservoir starts at a depth of about 2300 m.
The main production zones are XVac, XVp, and XVm. The main rock types are various
limestones, where the average matrix porosity is 11.1% and the geometric mean of perme-
ability is 53 mD. The reservoir space can be summarized into three types: pore, vug, and
fracture. The reservoir quality varies significantly vertically and laterally due to different
depositional settings and diagenesis.
023, 15, x FOR PEER REVIEW 3 of 16
Sustainability 2023, 15, 8868 3 of 15

Figure 1. Location of the study area and the column for target intervals of Callovian–Oxfordian.

The area under study in the Callovian period is a carbonate gentle slope sedimentary
system composed of an inner ramp, a mid-ramp, an outer ramp, and basin facies belts. In
the early Oxfordian period, under regional transgression, the outer zone of the mid ramp
and outer ramp in the Callovian period were gradually submerged, and the inner ramp—
mid-ramp gradually developed into an edged shelf-type carbonate platform. The water
body in the outer zone is highly energetic, and high-energy shoals or reef–shoal complexes
were developed. The top of the reservoir starts at a depth of about 2300 m. The main pro-
duction zones are XVac, XVp, and XVm. The main rock types are various limestones,
where the average matrix porosity is 11.1% and the geometric mean of permeability is 53
mD. The reservoir space can be summarized into three types: pore, vug, and fracture. The
reservoir quality varies significantly vertically and laterally due to diﬀerent depositional
settings and diagenesis.
Figure 1. Location of the study area and the column for target intervals of Callovian–Oxfordian.
Figure 1. Location of the study area and the column for target intervals of Callovian–Oxfordian.
3. Data
3. Data and
and Methodology
Methodology
The area under study
The in the Callovian
The schematic
schematic of the
of period used
the workflow
workflow is a carbonate
used inthis
in gentle
thiswork
work slope
isisshown
shown insedimentary
in Figure2.2.
Figure
system composed of an inner ramp, a mid-ramp, an outer ramp, and basin facies belts. In
the early Oxfordian period, under regional transgression, the outer zone of the mid ramp
and outer ramp in the Callovian period were gradually submerged, and the inner ramp—
mid-ramp gradually developed into an edged shelf-type carbonate platform. The water
body in the outer zone is highly energetic, and high-energy shoals or reef–shoal complexes
were developed. The top of the reservoir starts at a depth of about 2300 m. The main pro-
duction zones are XVac, XVp, and XVm. The main rock types are various limestones,
where the average matrix porosity is 11.1% and the geometric mean of permeability is 53
mD. The reservoir space can be summarized into three types: pore, vug, and fracture. The
reservoir quality varies significantly vertically and laterally due to diﬀerent depositional
settings and diagenesis.
Figure 2. Schematic
Figure 2. Schematic of
of the
the workflow
workflow presented
presentedin
inthis
thiswork.
work.
3. Data and Methodology
3.1. Data
3.1.ofData
The schematic the workflow used in this work is shown in Figure 2.
In this study, the 270 m coring data of 3 wells in the Callovian–Oxfordian formation
were In this mainly
used, study, the 270 m coring
including data of
the routine 3 wells
core in the
analysis Callovian–Oxfordian
data formation
of 956 samples, core photos,
weresections,
thin used, mainly including
and scanning the routine
electron core analysis
microscope data of 3data of In
wells. 956addition,
samples,petrophysical
core photos,
well-log data, including gamma-ray (GR), sonic (DT), resistivity (RT and RXO), and density
(RHOB) logs, were available for rock-type classification, especially in the intervals with
poor core data or without core data.

3.2. Methods
3.2.1. Rock Types
Rock typing has a wide variety of applications, such as the prediction of high mud-loss
intervals, potential production zones, and locating perforations. There are many methods
to classify rock types; in this study, we use Winland r35 [12], Pittman equations [13], and
the FZI [14] method. A detailed method of rock classification can be found in the related
literature. It can be seen from Figure 3 that the Callovian–Oxfordian formation in the study
area can be divided into 7 rock types (DRT 1–DRT 7). The corresponding rock types are
Figure 2. Schematic of the workflow presented in this work.

3.1. Data
In this study, the 270 m coring data of 3 wells in the Callovian–Oxfordian formation
3.2. Methods
3.2. Methods
3.2.1. Rock Types
3.2.1. Rock Types
Rock typing has a wide variety of applications, such as the prediction of high mud-
Rock typingpotential
loss intervals, has a wide variety ofzones,
production applications, such asperforations.
and locating the prediction of high
There are mud-
many
loss intervals, potential production zones, and locating perforations. There
methods to classify rock types; in this study, we use Winland r35 [12], Pittman equations are many
Sustainability 2023, 15, 8868 4 of 15
methods
[13], andtotheclassify
FZI [14]rock types;Aindetailed
method. this study, we use
method Winland
of rock r35 [12], Pittman
classification equations
can be found in the
[13], andliterature.
related the FZI [14] It method. A detailed
can be seen method
from Figure of rock
3 that classification can be found
the Callovian–Oxfordian in thein
formation
related literature.
the study area canIt can be seen into
be divided from7 Figure 3 that(DRT
rock types the Callovian–Oxfordian formationrock
1–DRT 7). The corresponding in
wackstone
the study with
area microporosity,
can be divided mud-dominated
into 7 rock types packstone,
(DRT 1–DRT grainstone
7). The
types are wackstone with microporosity, mud-dominated packstone, grainstone with with some
correspondingseparate-
rock
vug
types
some pore
are space, grainstone,
wackstone
separate-vug grain-dominated
withspace,
pore microporosity, packstone, wackstone
grainstone,mud-dominated
grain-dominated withgrainstone
packstone,
packstone, microfractures,
wackstonewith with
and
some mudstone with
separate-vug microfractures,
pore space, respectively.
grainstone, The microscopic
grain-dominated photos
packstone,
microfractures, and mudstone with microfractures, respectively. The microscopic of differentphotos
wackstone rock
with
types are shown
microfractures,
of different rock in
and Figure
are 4.
mudstone
types Statistics of the4.porosity
withinmicrofractures,
shown Figure and permeability
respectively.
Statistics of the of permeability
different
The microscopic
porosity and rockof
photos
types are
ofdifferent shown
differentrock in
rocktypes Table
typesare 1.
areshown
shownininTable
Figure
1. 4. Statistics of the porosity and permeability of
different rock types are shown in Table 1.

Figure3.3.Porosity
Figure Porosityand
andpermeability
permeabilitycross-plots
cross-plotsofofdifferent
diﬀerentrock
rocktypes
typesidentified
identifiedbybyFZI.
FZI.
Figure 3. Porosity and permeability cross-plots of diﬀerent rock types identified by FZI.

Figure 4. Photomicrographs of 7 rock types identified in the Callovian–Oxfordian stage.

Figure 4.
Figure 4. Photomicrographs of 77 rock
Photomicrographs of rock types
types identified
identified in
in the
the Callovian–Oxfordian
Callovian–Oxfordian stage.
stage.

Table 1. Porosity, permeability, and lithology by rock type.

Median Porosity Median Permeability

Rock Types Lithology
(%) (mD)
DRT1 4.18 0.002 Wackstone with microporosity
DRT2 8.60 0.300 Mud-dominated packstone
Grainstone with some
DRT3 11.90 5.100
separate-vug pore space
DRT4 12.00 30.300 Grainstone
DRT5 1.68 1.750 Grain-dominated packstone
DRT6 1.00 1.840 Wackstone with microfracture
DRT7 0.38 0.490 Mudstone with microfracture

3.2.2. Data Preprocessing

The data preprocessing consists of three main phases: data collection, data cleaning
and feature selection, correlation, and normalization.
Sustainability 2023, 15, 8868 5 of 15

(1) Data collection

Having the right data is essential in research work to ensure its success. The authors
collected different rock types (DRTS) and corresponding logging data from 3 wells. The
log data included laterolog deep (RT), laterolog shallow (RXO), acoustic log (DT), which
reflects sedimentation and diagenesis, and gamma log (GR), which reflects sedimenta-
tion. The statistical characteristics of the collected data are shown in Table 2, with the
structured document containing 1093 rows and 6 columns representing rock types and
features, respectively.

Table 2. Statistical distribution of the log data set.

DT GR RHOB RT RXO
(us/ft) (gAPI) (g/cm3 ) (ohm·m) (ohm·m)
Number of values 1093.00 1093.00 1093.00 1093.00 1093.00
Number of missing 2.00 2.00 2.00 2.00 2.00
Min value 48.86 5.29 1.57 4.51 3.53
Max value 81.70 42.94 2.67 72,207.00 618.60
Mode 55.08 8.76 2.41 26.86 10.24
Arithmetic mean 61.83 16.25 2.38 372.28 56.98
Geometric mean 61.41 14.74 2.38 52.46 22.87
Median 60.87 15.01 2.39 33.46 15.92
Average deviation 6.17 5.87 0.07 566.01 65.06
Standard deviation 7.35 7.04 0.10 3307.82 103.48
Variance 54.05 49.58 0.01 10,941,600.00 10,707.70
Skewness 0.39 0.59 −1.82 17.44 2.91
Kurtosis −0.73 −0.30 8.88 325.68 8.34
Q1 [10%] 52.79 7.64 2.27 15.81 7.31
Q2 [25%] 55.55 10.58 2.33 22.55 9.68
Q3 [50%] 60.87 15.01 2.39 33.46 15.92
Q4 [75%] 67.03 21.50 2.44 79.79 37.78
Q5 [90%] 72.31 26.29 2.48 488.06 176.45

It can be seen from Table 3 that the GR value of different types of rocks is low and
changes little, and the RHOB value also does not change much. The DT value of DRT 3
and DRT 4 is larger (greater than 60 gAPI) than that of other rock types, reflecting the
characteristics of high porosity, while DRT 6 and DRT 7 have high resistivity (RT and RXO)
values, which reflect the compact characteristics of these two rocks.

Table 3. Average values of logging parameters for different rock types.

Rock Types GR RHOB DT RT RXO

(gAPI) (g/cm3 ) (us/ft) (ohm·m) (ohm·m)
DRT1 16.70 2.41 54.50 50.70 41.00
DRT2 17.00 2.41 59.50 30.30 17.90
DRT3 11.30 2.39 62.70 38.60 17.10
DRT4 13.00 2.34 63.80 90.90 30.10
DRT5 16.30 2.32 59.30 262.10 91.10
DRT6 16.80 2.29 54.50 650.00 208.90
DRT7 17.20 2.31 57.60 786.00 311.50

It can be seen from the star-plot of average logging values of different rock types
(Figure 5) that it is difficult to use one or several logging values to classify rock types, which
further illustrates the necessity of building other models (such as machine learning) to
predict rock types.
DRT6 16.80 2.29 54.50 650.00 208.90
DRT7 17.20 2.31 57.60 786.00 311.50

It can be seen from the star-plot of average logging values of diﬀerent rock types
(Figure 5) that it is diﬃcult to use one or several logging values to classify rock types,
Sustainability 2023, 15, 8868 6 of 15
which further illustrates the necessity of building other models (such as machine learning)
to predict rock types.

Figure 5.
Figure 5. Star-plots
Star-plots of
of log
log mean
mean values
values for
for different
diﬀerent rock
rock types.
types.

cleaning and
(2) Data cleaning and feature
feature selection
selection
Data cleaning
Data cleaningisisthe theprocess
processofofdetecting
detectingand andremoving
removingnoisy
noisy data
data (erroneous,
(erroneous, incon-
inconsis-
sistent,
tent, andand duplicate
duplicate data)data)
from from datasets.
datasets. Erroneous
Erroneous datadata mainly
mainly results
results fromfromerrorserrors
in wellin
welldata
log log (especially
data (especially
density density data)is and
data) and is typically
typically causedcaused by borehole
by borehole enlargementenlargement
during
during
the the drilling
drilling process.process. In this erroneous
In this study, study, erroneous
data is data is mainly
mainly identified
identified through through sta-
statistical
tistical analysis
analysis methods methods (e.g., box-plot
(e.g., box-plot method). method).
Duplicate Duplicate data mainly
data mainly originates originates from
from differ-
diﬀerent
ent rock types
rock types or porosity
or porosity and permeability
and permeability valuesvalues
at the at the depth.
same same depth. In addition,
In addition, some
columns in the in
some columns initial dataset
the initial are empty,
dataset and the
are empty, authors
and analyzed
the authors the “missingness”
analyzed the “missingness”in the
data
in theset,data
which
set,represents the percentage
which represents of the totalofnumber
the percentage of entries
the total number forofany variable
entries for that
any
is missing.
variable The
that is missing
missing.values can either
The missing be predicted
values can eitherusing the otherusing
be predicted variablesthe or removed.
other varia-
The
blesmissingness
or removed.of well-logging
The missingness variables used in this
of well-logging study used
variables is shown in Figure
in this study is6,shown
in which in
the X-axis represents the well-logging variable and the Y-axis represents
Figure 6, in which the X-axis represents the well-logging variable and the Y-axis repre- the missingness
Sustainability 2023, 15, x FOR PEER REVIEW
expressed as a percentage.
sents the missingness Since, the
expressed as adegree of missingness
percentage. Since, theisdegree
very low (<0.4%) in this
of missingness is7 data
of 16
very
set,
lowthe rowsin
(<0.4%) with
thismissing
data set,values werewith
the rows removed.
missing values were removed.

Figure 6.
Figure 6. Missingness
Missingness in
in different
diﬀerent variables
variables used
used in
in this
this study.
study.

Outliers were
Outliers were removed
removed mainly through the histogram method, the box-plot method,
and Rosner’s
and Rosner’s test [15].
[15]. Histograms
Histograms are useful
useful to
to provide
provide information
information on on the
the distribution
distribution
of
of values
values for
for each
each feature;
feature; they
they can
can be
be used
used toto determine
determine the the distribution,
distribution, center,
center, and
and
skewness
skewnessof ofaadataset
datasetand
anddetect
detectoutliers
outlierstherein.
therein.From
Fromthethe
frequency
frequencyhistograms
histogramsof various
of vari-
ous parameters (Figure 7), it can be seen that RT and RXO data follow a skewed distribu-
tion, and ROHB data basically follow a normal distribution. A few outliers are shown as
black circles in the figure.
Figure 6. Missingness in diﬀerent variables used in this study.

Outliers were removed mainly through the histogram method, the box-plot method,
Sustainability 2023, 15, 8868 and Rosner’s test [15]. Histograms are useful to provide information on the distribution 7 of 15
of values for each feature; they can be used to determine the distribution, center, and
skewness of a dataset and detect outliers therein. From the frequency histograms of vari-
ous parameters
parameters (Figure
(Figure 7), it 7),
canitbe
can be seen
seen thatand
that RT RT RXO
and RXO
data data follow
follow a skewed
a skewed distribu-
distribution,
tion, and ROHB data basically follow a normal distribution. A few outliers are
and ROHB data basically follow a normal distribution. A few outliers are shown as black shown as
black circles in the
circles in the figure. ﬁgure.

Figure7.7.Histograms
Figure Histogramsofoffeatures
features(log
(logparameters).
parameters).

Box plots are widely used to describe the distribution of values along an axis based on
Box plots are widely used to describe the distribution of values along an axis based
the five-number summary: minimum, first quartile, median, third quartile, and maximum
on the ﬁve-number summary: minimum, ﬁrst quartile, median, third quartile, and maxi-
(Figure 8). This visual method allows the reviewer to better understand the distribution
mum (Figure 8). This visual method allows the reviewer to better understand the distri-
and locate the outliers. The median marks the midpoint of the data and is shown by the line
bution and locate the outliers. The median marks the midpoint of the data and is shown
that divides the narrow box into two. The median of the data is usually skewed towards
by the line that divides the narrow box into two. The median of the data is usually skewed
the top or bottom of the narrow box, which means that the data are usually denser on
towards
the narrow theside.
top or
Two bottom
of theofmore
the narrow
extreme box, which means
examples are RTthat
andthe
RXO.dataInare
theusually
samplesdenser
that
on the narrow side. Two of the more extreme examples are RT and
the authors took, half of the samples had values between 30 and 50 ohm·m, which is a RXO. In the samples
that the authors
relatively took, half
dense range. of the
The box samples
plot had avalues
represents between
left-skewed 30 and 50 ohm·m,
distribution. The valueswhich
thatis
a relatively dense range. The box plot represents a left-skewed distribution.
are greater than the upper limit or lesser than the lower limit will be the outliers that should The values
that
be are greater
further lookedthan
intothe
as upper limit carry
they might or lesser than
extra the lower limit
information. Mostwill be thedo
features outliers
not havethat
should be
outliers, and further looked
only the RHOB intovalues
as theyofmight
some carry
sample extra information.
points Most
are less than 2.0features do not
g/m3 . These
Sustainability 2023, 15, x FOR PEER REVIEW
have outliers, and only the RHOB values of some sample points are less than 2.08 g/m
of 16
3.
values are outliers resulting from the distortion of density data caused by borehole collapse
These values
during are outliers
the drilling process.resulting from the distortion of density data caused by borehole
collapse during the drilling process.

Figure8.8.Box
Figure Boxplot
plotof
offeatures
features(log
(logparameters).
parameters).

Consideringthe
Considering thefact
factthat
thatthis
thisstudy
studyinvolves
involvesaalarge
largenumber
numberof ofsamples,
samples,thetheauthors
authors
used the Rosner test function to detect the outliers [16]. The function performsthe
used the Rosner test function to detect the outliers [16]. The function performs theRosner
Rosner
generalized extreme
generalized extreme studentized
studentized deviate
deviate test
test to
to identify
identify potential
potential outliers
outliers in
in aa data
data set,
set,
assumingthe
assuming thedata
datawithout
withoutanyanyoutliers
outlierscomes
comesfrom
fromaanormal
normal(Gaussian)
(Gaussian)distribution.
distribution.
(3) Correlation
By understanding the correlation between diﬀerent parameters, appropriate features
can be selected to build models. Ideally, features that provide a clear relationship to the
output while avoiding too many similar features that would present duplicate infor-
Figure 8. Box plot of features (log parameters).

Sustainability 2023, 15, 8868

Considering the fact that this study involves a large number of samples, the authors
8 of 15
used the Rosner test function to detect the outliers [16]. The function performs the Rosner
generalized extreme studentized deviate test to identify potential outliers in a data set,
assuming the data without any outliers comes from a normal (Gaussian) distribution.
(3)Correlation
(3) Correlation
By understanding the correlation between different parameters, appropriate features
By understanding the correlation between diﬀerent parameters, appropriate features
can be selected to build models. Ideally, features that provide a clear relationship to the
can be selected to build models. Ideally, features that provide a clear relationship to the
output while avoiding too many similar features that would present duplicate information
output while avoiding too many similar features that would present duplicate infor-
should be selected. In order to determine if parameters are linearly correlated with each
mation should be selected. In order to determine if parameters are linearly correlated with
other, the Pearson correlation coefficient was used to calculate the correlation between
each other, the Pearson correlation coeﬃcient was used to calculate the correlation be-
various parameters; the calculation formula is as follows [17]:
tween various parameters; the calculation formula is as follows [17]:

11 ∑∑x ∑∑y ((𝑥

x − x )( y − y) 𝑦)
𝑥)(𝑦
𝑟r== n − 1 (( S x Sy
)
) (1)
(1)
𝑛 1 𝑆 𝑆
− −
where 𝑛n is the number of paired data;
where data; 𝑥̅x and
and𝑦y are
arethe
thesample
samplemeans;
means;and and 𝑆Sy
and𝑆Sx and
are
arethe
thesample
samplestandard
standarddeviations
deviationsofof allall the𝑥 xvalues
the valuesand the𝑦yvalues,
andallallthe values,respectively.
respectively.
The
The coeﬃcient
coefficient value
value can
can range
range between
between −1.00−1.00andand 1.00.
1.00. AAnegative
negativevaluevalue indicates
indicates the
the
relationshipbetween
relationship betweenthe thevariables
variablesis is negatively
negatively correlated,
correlated, which
which means
means as one
as one value
value in-
increases,
creases, thethe other
other decreases.
decreases. ViceVice versa,
versa, a positive
a positive valuevalue
tellstells us that
us that the relationship
the relationship be-
between the variables is positively correlated, which means that as one value
tween the variables is positively correlated, which means that as one value increases, the increases, the
otheralso
other alsoincreases.
increases.AsAsshown
shownininFigure
Figure9,9,the theparameters
parametersare arepoorly
poorlycorrelated,
correlated,and
andonly
only
theRXO
the RXOandandDTDTparameters
parameters have
have aa strong
strong negative
negative correlation
correlation (r (r is −0.45).
is −0.45).

Correlationof
Figure9.9.Correlation
Figure offeatures
features(log
(logparameters).
parameters).

(4)Normalization
(4) Normalization
To meet the needs of some machine learning algorithms (such as KNN), the data needs
to be normalized to eliminate bias. There are several techniques to scale or normalize the
data. The standard scaler expressed by Equation (2) was used for this study. For any given
set of data, xi
x − mean( x )
xscaled_i = i (2)
StdDev( x )

3.2.3. Machine Learning

Machine learning is a process that allows a computer to learn from data without
being explicitly programmed, where an algorithm (often a black box) is used to infer
the underlying input/output relationship from the data [18]. There are various machine
learning algorithms, but they are generally categorized into supervised and unsupervised
learning. Supervised algorithms learn from labeled data, while unsupervised methods
automatically mine or explore for patterns based on similarities. The optimal algorithm
among four supervised learning classifiers (KNN, MLP, RF, and GBM) was selected through
a comparative performance analysis and used to predict rock types.
(1) Random Forest (RF)
underlying input/output relationship from the data [18]. There are various machine learn-
ing algorithms, but they are generally categorized into supervised and unsupervised
learning. Supervised algorithms learn from labeled data, while unsupervised methods au-
tomatically mine or explore for patterns based on similarities. The optimal algorithm
Sustainability 2023, 15, 8868 among four supervised learning classifiers (KNN, MLP, RF, and GBM) was selected 9 of 15
through a comparative performance analysis and used to predict rock types.
(1) Random Forest (RF)
The
The random
random forest
forest method
method is is an ensemble learning
an ensemble learning method
method basedbased onon decision
decision tree
tree
learning
learning [19]. The goal of decision tree learning is to create a model that predicts the
[19]. The goal of decision tree learning is to create a model that predicts the value
value
of
of aa target
target variable
variable based
based onon several
several input
input variables
variables by by discretizing
discretizing the the multidimensional
multidimensional
sample space into uniform blocks and using the average value value within
within each blockblock as the
predictive value. The The disadvantage
disadvantage of of decision
decision tree
tree learning
learning isis that,
that,for
forcomplex
complexproblems,
problems,
the tree tends to grow excessively,
excessively, resulting
resulting in in overfitting.
overfitting. The random forest method
solves the problem of overfitting by creating a large number of deep decision trees [20]. In
each tree, a random subset of the input attributes (log variables) is used to split the tree at
any node. This This randomization
randomization across
across multiple trees (random forest) avoids the overfitting
problem associated
associatedwith
withsingle
singledecision
decision trees
trees by by averaging
averaging the the prediction
prediction results
results of allof all
trees.
trees. Furthermore,
Furthermore, the relative
the relative importance
importance of eachof each
inputinput feature
feature can becanranked
be ranked
in the in random
the ran-
forestforest
dom model. Larger
model. importance
Larger means
importance that athat
means decision on theon
a decision basis
the of thatofspecific
basis input
that specific
can result
input in greater
can result homogeneity
in greater in the subtrees.
homogeneity Typically,
in the subtrees. nodes atnodes
Typically, the topatofthethetop
decision
of the
tree havetree
decision higher
haveimportance. Figure 10
higher importance. shows10that
Figure RT is
shows theRT
that most important
is the of the five
most important of
logging
the parameters
five logging for rock for
parameters classification.
rock classification.

Figure
Figure 10.
10. The
The plot
plot of
of feature
feature importance.
importance.

The random forest method can obtain the optimal result and avoid overfitting overfitting by
adjusting
adjusting the maximum
maximum treetree depth,
depth, the
the percentage
percentage of
of features
features used
used in
in each tree, and the
minimum sample size in a leaf node. Figure Figure 11a shows
shows the optimal
optimal number of parameters
for splitting at any node, which should be 11.
(2) Gradient Boosting Machine (GBM)
Both GBM and the random forest method belong to the broad class of tree-based
classification techniques.
classification techniques.AAseries
seriesofof weak
weak learners
learners is initially
is initially generated,
generated, each
each of which
of which fits
fits the negative gradient of the loss function of the previously superimposed model, so
that the cumulative loss of the model after the addition of the weak learner decreases
in the direction of the negative gradient. Then, all learners are linearly combined using
different weights to enable the learners with excellent performance to be reused. The major
advantage of the GBM algorithm is that it does not require standardization or normalization
of features when different types of data are used; it is not sensitive to missing data; and it
features high nonlinearity and good interpretability for the model.
Optimizable hyperparameters in the GBM algorithm include the number of trees, the
minimum number of data points in the leaf nodes, the interaction depth specified for the
maximum depth of each tree, and the number of variables (or predictors) for splitting at
each node [21]. The larger the number of trees, the larger the tree depth, and the higher the
accuracy. The smaller the number of observations at leaf nodes, the higher the accuracy.
When there are more than 800 trees and the maximum tree depth is 15, the complexity of
the model will increase greatly, but the improvement in accuracy is negligible. Therefore,
simpler models are preferred to avoid overfitting. The optimal hyperparameters selected
for this study are as follows: the number of trees (estimators) is 172 (Figure 11b), the
maximum tree depth is 3, the minimum number of samples for a leaf node is 1, the number
of features to be split is 0.2, and the number of random states (random seeds) is 89.
(3) K-Nearest Neighbor (KNN)
Sustainability 2023, 15, 8868 10 of 15

KNN is a nonparametric regression and classification technique that uses a predefined

number of nearest neighbors to determine the new value (for regression) or new label (for
classification) of new observations [22,23]. It usually uses the Euclidean distance to measure
the distance between two points or elements. To prevent the weights of attributes with
larger initial values (such as RT and RXO in this study) from exceeding those of attributes
with smaller initial values (such as RHOB in this study), each value needs to be normalized
or standardized before the weights of attributes are calculated.
The tuning hyperparameter for the KNN technique is the number of the nearest
neighbors K that can be evaluated by a trial-and-error approach. It can be seen from
Figure 11c that, when K is greater than 40, the accuracy of the model will decrease as the
number of neighbors increases. Therefore, the optimal number of neighbors is 40.
(4) Multilayer-Perceptron Neural Network (MLP)
Multilayer-perceptron neural networks are fully connected feed-forward networks,
which are best applied to problems where the input data and output data are well defined,
yet the process that relates the input to the output is extremely complex [24,25]. A neural
network usually consists of multiple layers; each layer has several neurons, and the neurons
in one layer are connected to all neurons in adjacent layers. Each neuron receives one or
more input signals (such as well-logging variables considered herein), and the input signals
are multiplied by corresponding weights to generate output signals (such as rock types).
The relationship between the independent variable x and the dependent variable y can be
expressed as:
n
y ( x ) = f ( ∑ i =1 wi x i ) (3)
The w weights allow each of the n inputs (denoted by xi ) to contribute a greater or
lesser amount to the sum of input signals. For the activation function f ( x ), the net sum is
used, and the resulting signal y( x ) is the output.
The main adjustable parameters in the MLP algorithm are the number of layers and
the number of neurons (or nodes) in each layer. Errors can be minimized by optimizing the
weights. The optimal parameters are as follows: Alpha is 0.0001, bet_1 is 0.9, and bet_2
is 0.999. An MLP is optimal when it consists of three hidden layers and the number of
Sustainability 2023, 15, x FOR PEER REVIEW 11 of 16
neurons in the third hidden layer is 14 (Figure 11d).

Figure Hyperparameter
11.Hyperparameter
Figure 11. tuningtuning for different
for diﬀerent supervised supervised learning
learning (a) the (a) of
Estimators theRFEstimators
is 11. of RF is 11.
(b) the Estimators of GBM is 172. (c) the K of KNN is 40. (d) the number of neurons in the
(b) the Estimators of GBM is 172. (c) the K of KNN is 40. (d) the number of neurons in the thirdthird
hidden layer is 14.
hidden layer is 14.
3.3. K-Fold Cross-Validation
Classifiers for lithology identification were constructed using KNN, GBM, random
forest, and MLP based on well log data. The log parameters selected for predicting the
rock types were GR, RT, DT, RXO, and RHOB. A total of 75% of the data was used for
training, and the other 25% was used for testing. A 10-fold cross-validation was performed
on the training data to prevent overfitting. In 10-fold cross-validation, the training data
were randomly subdivided into 10 parts; the model was trained on 9 parts and then vali-
Sustainability 2023, 15, 8868 11 of 15

3.3. K-Fold Cross-Validation

Classifiers for lithology identification were constructed using KNN, GBM, random
forest, and MLP based on well log data. The log parameters selected for predicting the rock
types were GR, RT, DT, RXO, and RHOB. A total of 75% of the data was used for training,
and the other 25% was used for testing. A 10-fold cross-validation was performed on the
training data to prevent overfitting. In 10-fold cross-validation, the training data were
randomly subdivided into 10 parts; the model was trained on 9 parts and then validated
on the remaining 1 part. This process was repeated multiple times for each machine
learning technique. Only those models were averaged to give the final model that provides
good results on the validation data. Figure 11 shows the results for hyperparameter
tuning, and Table 4 summarizes the optimal values of hyperparameters for different
supervised learning techniques. Table 5 summarizes the cross-validation accuracy for
different supervised learnings.

Table 4. Summary of optimal hyperparameters for different supervised learnings.

Methods Optimal Hyperparameter Values

KNN Number of neighbors = 40
MLP neural network Layer1: units = 15; Layer2: units = 20; Layer3: units = 14
Random forest N-Estimators = 11
N-Estimators = 172, maximum tree depth = 3, n.minobsinnode = 1
GBM
(minimum number of observations in the leaf nodes = 1)

Table 5. Summary of cross-validation accuracy for different supervised learnings.

Machine Learning Cross Validation Cross Validation

Algorithms Accuracy (%) Standard Deviation (%)
GBM 67.86 3.22
MLP 67.01 2.37
KNN 67.03 0.10
Random forest 66.88 2.69

4. Evaluation and Application of Machine Learning

4.1. Model Accuracy and Machine Learning Results
Table 6 summarizes the different accuracy metrics on the test data set for different
supervised learning techniques. The area under the curve (AUC) represents the area under
the receiver operating characteristic (ROC) curve and is a useful metric to evaluate the
performance of any classification model [26]. The accuracy metric represents the proportion
of the test data set predicted correctly (expressed as a percentage). It can be seen from Table 6
that the four supervised learning techniques have achieved prediction results, and their
accuracy levels are higher than 70%. The GBM has achieved the highest accuracy and largest
AUC value, indicating that it is the best one among the four supervised learning techniques
in terms of performance. The model accuracy has reached 79.25% on the test set.

Table 6. Different accuracy metrics on the test data set for different supervised learnings.

Machine Learning Algorithms AUC Model Accuracy (%)

GBM 0.83 79.25
MLP 0.78 73.94
KNN 0.75 70.85
Random forest 0.74 70.40

Figure 12 shows the results of a comparison between the actual rock types (Actual
Rock types) of core samples from Well A (which was not modeled during this study)
and the rock types predicted by various supervised learning techniques (different colors
Sustainability 2023, 15, 8868 12 of 15

represent different rock types). GBM_Rock represents rock types predicted by GBM using
the log data. MLP_Rock, KNN_Rock, and Rand Forest_Rock represent the results predicted
using MLP, KNN, and random forest, respectively. It is evident that the random forest
Sustainability 2023, 15, x FOR PEER REVIEW
technique does not predict as well as other supervised learning techniques. The13 of 16
visual
results in Figure 12 further corroborate the quantitative accuracy metrics shown in Table 6.

Figure 12.
Figure 12. The
Theplot
plotofofactual
actualrock
rock types
types and
and thethe types
types predicted
predicted by different
by different machine
machine learning
learning tech-
techniques.
niques.
4.2. Importance of Predictors and Model Interpretation
4.2. Importance
Predictionofmodels
Predictors
canand Model Interpretation
be interpreted by quantitatively analyzing the importance of
predictors (well-logging variables) to the models. This is helpful
Prediction models can be interpreted by quantitatively in decoding
analyzing the “black
the importance of
box” predictions and makes the model interpretable. The main parameter
predictors (well-logging variables) to the models. This is helpful in decoding the is the SHapley
“black
Additive exPlanations
box” predictions (SHAP)the
and makes values,
modelwhich are calculated
interpretable. for each
The main combination
parameter is theof predic-
SHapley
tor (log variables) and cluster (rock types). Mathematically, they represent
Additive exPlanations (SHAP) values, which are calculated for each combination of pre-the average of
the marginal
dictor contributions
(log variables) across(rock
and cluster all permutations [27]. Typically,
types). Mathematically, they arepresent
higher SHAP value
the average
for a predictor/cluster combination suggests that the chosen log variable is important
of the marginal contributions across all permutations [27]. Typically, a higher SHAP value to
identify the cluster. Because SHAP is model-agnostic, any machine-learning model
for a predictor/cluster combination suggests that the chosen log variable is important to can be
analyzed to derive
identify the cluster.input/output
Because SHAP relationships.
is model-agnostic, any machine-learning model can be
Figure 13a shows a variable-importance plot that lists the most significant variables in
analyzed to derive input/output relationships.
descending order, which provides a global interpretation of the classification and shows
Figure 13a shows a variable-importance plot that lists the most significant variables
the average impact on model-output magnitude. In Figure 13a, the X-axis represents
in descending order, which provides a global interpretation of the classification and
the average value of the SHAP absolute value, which reflects the average effect on the
shows the average impact on model-output magnitude. In Figure 13a, the X-axis repre-
magnitude of the output, and the Y-axis represents the well-logging variables used to
sents the average value of the SHAP absolute value, which reflects the average effect on
identify rock types. The plot shows that RT, RXO, and DT are the three most important
the magnitude of the output, and the Y-axis represents the well-logging variables used to
variables to define rock types in this study.
identify rock types. The plot shows that RT, RXO, and DT are the three most important
variables to define rock types in this study.
Figure 13b shows the SHAP values for Cluster 3 (Rock type 4) and different log vari-
ables; the different points represent the different observations (i.e., depths in the data set).
The color in the plot represents whether the log variable has a high or low value for that
observation. The X-axis shows the Shapley values; the larger the Shapley value, the greater
the impact on cluster prediction. For any variable, such as RHOB, the SHAP values corre-
of Rock type 4. In summary, Cluster 3 (rock type 4) is characterized by low GR values, low
RHOB values, high DT values, and medium-high RXO values, which is consistent with
the rocks in Cluster 3 being grainstones with low GR values, low RHOB values, high DT
values, and low RT values. This method is helpful in the local interpretation of classifica-
tion models. Such analysis provides a way to interpret classification results without con-
Sustainability 2023, 15, 8868 13 of 15
sidering model selection, and the application of SHAP values in petroleum engineering
provides a method for the global and local interpretation of classification models.

(a)

(b)
Figure 13. (a) VariableFigure
importance
13. (a)plot. (b) Sharp
Variable plot for
importance Rock
plot. (b)type 4. plot for Rock type 4.
Sharp

5. Conclusions Figure 13b shows the SHAP values for Cluster 3 (Rock type 4) and different log
variables; the different
This paper presents a promising andpoints represent
interpretable the different
machine observations
learning approach that (i.e., depths in the data
can identify variousset).
typesThe
ofcolor
rocksinbased
the plot represents
on well whether
log data. the logof
The purpose variable has was
this study a high or low value for
to improve geologicalthatinsights
observation.
and theThe X-axis shows
accuracy of wellthe
logShapley values;through
interpretation the larger the Shapley value, the
accu-
greater
rate identification of the impact
rock types. on clustermethod
The proposed prediction.
alsoFor any variable,
provides valuablesuch as RHOB, the SHAP values
references
for the optimizationcorresponding to different
of well trajectory and the RHOB dataselection
optimal points range from slightly
of intervals to be negative
perfo- to larger positive
values. The points with larger positive
rated. The conclusions drawn from this study are detailed below. SHAP values have a strong influence on Rock type 4,
and these points are associated with low (colored blue) values of features, suggesting that
(1) Based on core data and the FZI method, the Callovian–Oxfordian formation in the
low RHOB values are a key characteristic of Rock type 4. Similarly, it can be determined
study area can be divided into seven rock types.
through analysis that low GR values and high DT values are also typical features of Rock
(2) The results of this study show that the rock types in uncored wells can be accurately
type 4. In summary, Cluster 3 (rock type 4) is characterized by low GR values, low RHOB
classified by core data using machine learning and well log data. Accurate classifica-
values, high DT values, and medium-high RXO values, which is consistent with the rocks
tion of rocks can greatly improve the accuracy of well log interpretation and the reli-
in Cluster 3 being grainstones with low GR values, low RHOB values, high DT values, and
ability of research
low results withThis
RT values. respect to sedimentary
method is helpful inmicrofacies.
the local interpretation of classification models.
(3) Four machine Such learning algorithms were evaluated, including
analysis provides a way to interpret classification KNN, results
GBM, random
without considering model
forest, and MLP.selection, and the application of SHAP values in petroleumthe
Based on the cross-validation and evaluation results, GBM has provides a method
engineering
been selected for
for the identification
the global and local of interpretation
rock types in oftheclassification
study area. The accuracy of
models.
this algorithm for lithology identification can reach 79%.
(4) In this study, SHAP values were used to interpret “black box” (machine learning)
5. Conclusions
models, which demonstrate
This paperhigh robustness
presents and practicability
a promising and provide
and interpretable an eﬀec-
machine learning approach that
tive means of global and local interpretation for rock classification models
can identify various types of rocks based on well log data. The purpose based on of this study was
machine learning.
to improve geological insights and the accuracy of well log interpretation through accurate
identification of rock types. The proposed method also provides valuable references for the
optimization of well trajectory and the optimal selection of intervals to be perforated. The
conclusions drawn from this study are detailed below.
(1) Based on core data and the FZI method, the Callovian–Oxfordian formation in the
study area can be divided into seven rock types.
(2) The results of this study show that the rock types in uncored wells can be accurately
classified by core data using machine learning and well log data. Accurate classifi-
cation of rocks can greatly improve the accuracy of well log interpretation and the
reliability of research results with respect to sedimentary microfacies.
Sustainability 2023, 15, 8868 14 of 15

(3) Four machine learning algorithms were evaluated, including KNN, GBM, random
forest, and MLP. Based on the cross-validation and evaluation results, the GBM has
been selected for the identification of rock types in the study area. The accuracy of
this algorithm for lithology identification can reach 79%.
(4) In this study, SHAP values were used to interpret “black box” (machine learning)
models, which demonstrate high robustness and practicability and provide an effec-
tive means of global and local interpretation for rock classification models based on
machine learning.
(5) The results of this study suggested that Rock type 4 (grainstones) are the best reser-
voir rocks in the study area. These rocks are characterized by high porosity, high
permeability, low GR values, low RHOB values, high DT values, low RT values, and
low RXO values.

Author Contributions: Y.X.: Conceptualization, Methodology, Validation, Investigation, Writing—Original

Draft, Visualization, Data Curation. H.Y.: Methodology, Validation, Writing—Original Draft, Visualization.
W.Y.: Methodology, Validation, Writing—Original Draft, Visualization. All authors have read and agreed to
the published version of the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Restrictions apply to the availability of these data.
Acknowledgments: The authors would like to thank the reviewers for their helpful and constructive
comments and suggestions that greatly contributed to improving the final version of this paper.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Brendon Hall. Facies classification using machine learning. Lead. Edge 2016, 35, 906–909. [CrossRef]
2. Nishitsuji, Y.; Exley, R. Elastic impedance-based facies classification using support vector machine and deep learning. Geophys.
Prospect. 2019, 67, 1040–1054. [CrossRef]
3. Xiao, Y.; Wang, Z.; Zhou, Z.; Wei, Z.; Qu, K.; Wang, X.; Wang, R. Lithology classification of acidic volcanic rocks based on
parameter-optimized Ada Boost algorithm. Acta Pet. Sin. 2019, 40, 457–467.
4. Valentin, M.B.; Bom, C.R.; Coelho, J.M.; Correia, M.D.; De Albuquerque, M.P.; de Albuquerque, M.P.; Faria, E.L. A deep residual
convolutional neural network for automatic lithological facies identification in Brazilian pre-salt oilfield wellbore image logs.
J. Pet. Sci. Eng. 2019, 179, 474–503. [CrossRef]
5. Ning, D. An Improved Semi Supervised Clustering of Given Density and Its Application in Lithology Identification; China University of
Geosciences: Beijing, China, 2018.
6. Ju, W.; Han, X.H.; Zhi, L.F. A lithology identification method in Es4 reservoir of xin 176 block with bayes stepwise discriminant
method. Comput. Tech. Geophys. Geochem. Explor. 2012, 34, 576–581.
7. Duan, Y.; Wang, Y.; Sun, Q. Application of selective ensemble learning model in lithology-porosity prediction. Sci. Technol. Eng.
2020, 20, 1001–1008.
8. Ma, L.; Xiao, H.; Tao, J.; Su, Z. Intelligent lithology classification method based on GBDT algorithm. Pet. Geol. Recovery Effic. 2022,
29, 21–29.
9. Tang, J.; Fan, B.; Xiao, L.; Tian, S.; Zhang, F.; Zhang, L.; Weitz, D. A New Ensemble Machine Learning Framework for Searching
Sweet Spots in Shale Reservoirs. SPE J. 2021, 26, 482–497. [CrossRef]
10. Zhao, X.; Jin, F.; Liu, X.; Zhang, Z.; Cong, Z.; Li, Z.; Tang, J. Numerical study of fracture dynamics in different shale fabric facies
by integrating machine learning and 3-D lattice method: A case from Cangdong Sag, Bohai Bay basin, China. J. Pet. Sci. Eng.
2022, 218, 110861. [CrossRef]
11. Ulmishek, G.F. Petroleum Geology and Resources of the Amu Darya Basin, Turkmenistan, Uzbekistan, Afghanistan and Iran; USGS:
Reston, VA, USA, 2004; pp. 1–38.
12. Kolodzie, S. Analysis of pore throat size and use of the Waxman-Smits equation to determine OOIP in Spindle field, Colorado. In
Proceedings of the SPE Annual Technical Conference and Exhibition, Dallas, TX, USA, 21–24 September 1980. SPE-9382-MS.
13. Pittman, E.D. Relationship of porosity and permeability to various parameters derived from mercury injection-capillary pressure
curves for sandstone. AAPG Bull. 1992, 76, 191–198.
Sustainability 2023, 15, 8868 15 of 15

14. Amaefule, J.O.; Altunbay, M.; Tiab, D.; Kersey, D.G.; Keelan, D.K. Enhanced reservoir description using core and log data to
identify hydraulic flow units and predict permeability in un-cored intervals/wells. In Proceedings of the SPE Annual Technical
Conference and Exhibition, Houston, TX, USA, 3–6 October 1993.
15. Tang, Q. DPS Data Processing System—Experimental Design, Statistical Analysis and Data Mining, 2nd ed.; Science Press: Beijing,
China, 2010.
16. Barnett, V.; Lewis, T. Outliers in Statistical Data, 3rd ed.; John Wiley & Sons: Chichester, UK, 1995.
17. Hu, L.; Gao, W.; Zhao, K.; Zhang, P.; Wang, F. Feature Selection Considering Two Types of Feature Relevancy and Feature
Interdependency. Expert Syst. Appl. 2018, 93, 423–434. [CrossRef]
18. Breiman, L. Arcing the Edge; Technical Report 486; Statistics Department, University of California, Berkeley: Berkeley, CA,
USA, 1997.
19. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
20. Kuhn, S.; Cracknell, M.J.; Reading, A.M. Lithological mapping in the Central African copper belt using random forests and
clustering: Strategies for optimised results. Ore Geol. Rev. 2019, 112, 103015. [CrossRef]
21. Kuhn, M. Building Predictive Models in R Using the caret Package. J. Stat. Softw. 2008, 28, 1–26. [CrossRef]
22. Altman, N.S. An Introduction to Kernel and Nearest-Neighbor Nonparametric Regression. Am. Stat. 1992, 46, 175–185.
23. James, G.; Witten, D.; Hastie, T.; Tibshirani, R. An Introduction to Statistical Learning with Applications in R; Springer: New York, NY,
USA, 2013.
24. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998,
86, 2278–2324. [CrossRef]
25. Nielson, M.A. Neural Networks and Deep Learning; Determination Press: San Francisco, CA, USA, 2015.
26. Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [CrossRef]
27. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. In Proceedings of the 31st Conference on Neural
Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.

Petrel Ibooks Jan 2017 PDF
100% (2)
Petrel Ibooks Jan 2017 PDF
86 pages
IC - Introduction To Petroleum Geoscience
100% (1)
IC - Introduction To Petroleum Geoscience
323 pages
Petroleum Geology
100% (3)
Petroleum Geology
55 pages
Spe Competency Matrix
No ratings yet
Spe Competency Matrix
5 pages
Rock Typing 1572604212
No ratings yet
Rock Typing 1572604212
41 pages
Gas Lift Valve Mechanics
100% (3)
Gas Lift Valve Mechanics
22 pages
M5 Log Upscaling
100% (3)
M5 Log Upscaling
16 pages
Basic Well Logging - CHAPTER 2
100% (2)
Basic Well Logging - CHAPTER 2
57 pages
Novel Approach To Quantifying Deepwater Laminated Sequences Using Integrated Evaluation of LWD Real-Time Shear, Porosity, Azimuthal Density and High-Resolution Propagation Resistivity
No ratings yet
Novel Approach To Quantifying Deepwater Laminated Sequences Using Integrated Evaluation of LWD Real-Time Shear, Porosity, Azimuthal Density and High-Resolution Propagation Resistivity
16 pages
Coal Sampling Procedure - Nopw
100% (1)
Coal Sampling Procedure - Nopw
16 pages
Reservoir Description: Subsurface Data From Geological and Engineering Sources
100% (2)
Reservoir Description: Subsurface Data From Geological and Engineering Sources
35 pages
Integrated Rock Typing and Flow Unit Iran
No ratings yet
Integrated Rock Typing and Flow Unit Iran
38 pages
Petrel Introduction
No ratings yet
Petrel Introduction
25 pages
TC Okq 924 PDF
No ratings yet
TC Okq 924 PDF
213 pages
PVD Training Course Portfolio 2014
0% (1)
PVD Training Course Portfolio 2014
16 pages
Falloff Test Whats
100% (1)
Falloff Test Whats
80 pages
Leveraging Machine Learning For Lithology Discrimination
No ratings yet
Leveraging Machine Learning For Lithology Discrimination
18 pages
Comparison of Different Machine Learning Algorithms
No ratings yet
Comparison of Different Machine Learning Algorithms
13 pages
Mineral Resource Estimation Using A Combination of Drilling and IP-Rs Data Using Statistical and Cokriging
100% (1)
Mineral Resource Estimation Using A Combination of Drilling and IP-Rs Data Using Statistical and Cokriging
20 pages
Well Logging Tools and Techniques
No ratings yet
Well Logging Tools and Techniques
295 pages
RRT
No ratings yet
RRT
15 pages
ID 008 Uncertainty in Well Log Analyses
No ratings yet
ID 008 Uncertainty in Well Log Analyses
7 pages
Introduction To Petroleum Engineering - Lecture 13 Final - Formation Evaluation - Well Logging
No ratings yet
Introduction To Petroleum Engineering - Lecture 13 Final - Formation Evaluation - Well Logging
30 pages
Resistivity Logs PDF
No ratings yet
Resistivity Logs PDF
28 pages
No - Uis Inspera 78834918 47156456
No ratings yet
No - Uis Inspera 78834918 47156456
194 pages
Xu 2021
No ratings yet
Xu 2021
54 pages
SLB Brouchure
No ratings yet
SLB Brouchure
3 pages
Methodology For San Narciso Water District
No ratings yet
Methodology For San Narciso Water District
69 pages
Waiting Set Up Time Cement
No ratings yet
Waiting Set Up Time Cement
7 pages
1 s2.0 S0098300419309239 Main
No ratings yet
1 s2.0 S0098300419309239 Main
13 pages
Petrological Controls On The Engineering Propertie
No ratings yet
Petrological Controls On The Engineering Propertie
26 pages
Rock Classification From Field Image Patches Analyzed Using A Deep Convolutional Neural Network
No ratings yet
Rock Classification From Field Image Patches Analyzed Using A Deep Convolutional Neural Network
17 pages
Eng 04 00139
No ratings yet
Eng 04 00139
25 pages
Dynamic Rock Type Characterization Using Artificial Neural Networks in Hamra Quartzites Reservoir: A Multidisciplinary Approach
No ratings yet
Dynamic Rock Type Characterization Using Artificial Neural Networks in Hamra Quartzites Reservoir: A Multidisciplinary Approach
8 pages
Reservoir Insights Enabled by Machine Learning Technology A Supervised Machine Learning Method For Probabilistic Rock Type Prediction
No ratings yet
Reservoir Insights Enabled by Machine Learning Technology A Supervised Machine Learning Method For Probabilistic Rock Type Prediction
33 pages
Reservoir Rock Typing For Optimum Permeability Pre
No ratings yet
Reservoir Rock Typing For Optimum Permeability Pre
22 pages
Data-Driven Machine Learning
No ratings yet
Data-Driven Machine Learning
22 pages
Eng 05 00012
No ratings yet
Eng 05 00012
29 pages
A GUIDE TO ROCK CORE LOGGING Part 1
No ratings yet
A GUIDE TO ROCK CORE LOGGING Part 1
19 pages
Identifying Payable Cluster Distributions
No ratings yet
Identifying Payable Cluster Distributions
22 pages
A Review On Advancements in Lithological Mapping
No ratings yet
A Review On Advancements in Lithological Mapping
17 pages
Important Map
No ratings yet
Important Map
14 pages
1 s2.0 S294989102300828X Main
No ratings yet
1 s2.0 S294989102300828X Main
20 pages
Processes
No ratings yet
Processes
19 pages
Powerlog: Powerlog - Well Log Analysis
No ratings yet
Powerlog: Powerlog - Well Log Analysis
4 pages
Article Diagraphie Main
No ratings yet
Article Diagraphie Main
18 pages
Prediction of Reservoir Brittleness From Geophysical Logs Using Machine Learning Algorithms
No ratings yet
Prediction of Reservoir Brittleness From Geophysical Logs Using Machine Learning Algorithms
16 pages
Li 2018
No ratings yet
Li 2018
14 pages
Dr. Fadhil Luctures-6
No ratings yet
Dr. Fadhil Luctures-6
12 pages
1 s2.0 S0920410518302201 Main
No ratings yet
1 s2.0 S0920410518302201 Main
18 pages
IJOGST Volume7 Issue4 Pages13-35
No ratings yet
IJOGST Volume7 Issue4 Pages13-35
24 pages
The Application of Fuzzy Logic and Genetic Algorithm
No ratings yet
The Application of Fuzzy Logic and Genetic Algorithm
24 pages
Two-Step Clustering For Mineral Prospectivity Mapping A Case Study From The Northeastern Edge of The Jiaolai Basin, China
No ratings yet
Two-Step Clustering For Mineral Prospectivity Mapping A Case Study From The Northeastern Edge of The Jiaolai Basin, China
20 pages
Identification of Lithology From Well Log Data Usi
No ratings yet
Identification of Lithology From Well Log Data Usi
10 pages
Arma Igs 2024 0665
No ratings yet
Arma Igs 2024 0665
16 pages
Clustring Analysis Carbonate
No ratings yet
Clustring Analysis Carbonate
13 pages
RG Cross Disciplinary Machinelearning MAIN
No ratings yet
RG Cross Disciplinary Machinelearning MAIN
21 pages
Data Load Into PETREL
No ratings yet
Data Load Into PETREL
2 pages
Classification of Rock Facies Using Deep Convoluti
No ratings yet
Classification of Rock Facies Using Deep Convoluti
18 pages
1 s2.0 S0920410518304960 Main
No ratings yet
1 s2.0 S0920410518304960 Main
11 pages
Reservoir Zonation
No ratings yet
Reservoir Zonation
14 pages
Seismic Inversion With Deep Learning
No ratings yet
Seismic Inversion With Deep Learning
15 pages
Clustring
No ratings yet
Clustring
11 pages
1 s2.0 S0920410517308094 Main
No ratings yet
1 s2.0 S0920410517308094 Main
12 pages
Characterization of Flow Units Rock and
No ratings yet
Characterization of Flow Units Rock and
14 pages
5 Methods Used in Iran
No ratings yet
5 Methods Used in Iran
11 pages
1 s2.0 S0098300419304686 Main
No ratings yet
1 s2.0 S0098300419304686 Main
14 pages
Energies 16 06559
No ratings yet
Energies 16 06559
19 pages
1 s2.0 S0098300413003002 Main
No ratings yet
1 s2.0 S0098300413003002 Main
9 pages
Applsci 13 11315
No ratings yet
Applsci 13 11315
15 pages
A Lithology Identification Approach Based On Machine Learning With Evolutionary Parameter Tuning
No ratings yet
A Lithology Identification Approach Based On Machine Learning With Evolutionary Parameter Tuning
5 pages
Minerals: An Enhanced Rock Mineral Recognition Method Integrating A Deep Learning Model and Clustering Algorithm
No ratings yet
Minerals: An Enhanced Rock Mineral Recognition Method Integrating A Deep Learning Model and Clustering Algorithm
17 pages
Evaluating StackingC and Ensemble Models For Enhanced - 2024 - Journal of Geoche
No ratings yet
Evaluating StackingC and Ensemble Models For Enhanced - 2024 - Journal of Geoche
14 pages
1 s2.0 S2590197422000222 Main
No ratings yet
1 s2.0 S2590197422000222 Main
7 pages
Geom+IA - Artificial Intelligence Model For Predicting Geomechanical
No ratings yet
Geom+IA - Artificial Intelligence Model For Predicting Geomechanical
12 pages
Thin Reservoir Identification Based On Logging Interpretation
No ratings yet
Thin Reservoir Identification Based On Logging Interpretation
12 pages
Cappillary DRT
No ratings yet
Cappillary DRT
12 pages
Automated Lithology Extraction From Core Photographs
No ratings yet
Automated Lithology Extraction From Core Photographs
7 pages
Stacked Vector Multi Source Lithologic Classificatio - 2021 - Remote Sensing App
No ratings yet
Stacked Vector Multi Source Lithologic Classificatio - 2021 - Remote Sensing App
9 pages
(2020) Multiple Point Geostatistical Simulation With Adaptive Filter Derived From Neural Network For Sedimentary Facies Classification
No ratings yet
(2020) Multiple Point Geostatistical Simulation With Adaptive Filter Derived From Neural Network For Sedimentary Facies Classification
11 pages
Ali Farman - CV
No ratings yet
Ali Farman - CV
3 pages
Qian Li (2021) - Prediction of Rock Abrasivity and Hardness From Mineral Composition
No ratings yet
Qian Li (2021) - Prediction of Rock Abrasivity and Hardness From Mineral Composition
13 pages
Yousifnajeeb
No ratings yet
Yousifnajeeb
18 pages
Spe 197307 MS
No ratings yet
Spe 197307 MS
10 pages
Paper SQP Dan SQs
No ratings yet
Paper SQP Dan SQs
8 pages
Critical Review Paper Sample
No ratings yet
Critical Review Paper Sample
3 pages
Jge11 4 011
No ratings yet
Jge11 4 011
7 pages
Journal of Petroleum Science and Engineering: Hong Tang, Christopher D. White
No ratings yet
Journal of Petroleum Science and Engineering: Hong Tang, Christopher D. White
6 pages
CV B.-Mukherjee
No ratings yet
CV B.-Mukherjee
5 pages
A Machine Learning Approach To Facies Classification Using Well Logs
No ratings yet
A Machine Learning Approach To Facies Classification Using Well Logs
6 pages
Pnge 450
No ratings yet
Pnge 450
4 pages
SEG2017 Application of Machine Learn
No ratings yet
SEG2017 Application of Machine Learn
5 pages
Formation Resistivity
No ratings yet
Formation Resistivity
5 pages
Chapter 1
No ratings yet
Chapter 1
3 pages
Amanda Travitzky Resume
No ratings yet
Amanda Travitzky Resume
2 pages
Facies Classification
No ratings yet
Facies Classification
1 page
Geothermal Well History
No ratings yet
Geothermal Well History
3 pages
Advances in Computed Tomography for Geomaterials: GeoX 2010
From Everand
Advances in Computed Tomography for Geomaterials: GeoX 2010
Khalid A. Alshibli
No ratings yet

Machin Well Log

Uploaded by

Machin Well Log

Uploaded by

sustainability

Academic Editors: Jizhou Tang,

Sustainability 2023, 15, 8868. https://doi.org/10.3390/su15118868 https://www.mdpi.com/journal/sustainability

Figure 4. Photomicrographs of 7 rock types identified in the Callovian–Oxfordian stage.

Table 1. Porosity, permeability, and lithology by rock type.

Median Porosity Median Permeability

3.2.2. Data Preprocessing

(1) Data collection

Table 2. Statistical distribution of the log data set.

Table 3. Average values of logging parameters for different rock types.

Rock Types GR RHOB DT RT RXO

Sustainability 2023, 15, 8868

11 ∑∑x ∑∑y ((𝑥

3.2.3. Machine Learning

KNN is a nonparametric regression and classification technique that uses a predefined

3.3. K-Fold Cross-Validation

Table 4. Summary of optimal hyperparameters for different supervised learnings.

Methods Optimal Hyperparameter Values

Table 5. Summary of cross-validation accuracy for different supervised learnings.

Machine Learning Cross Validation Cross Validation

4. Evaluation and Application of Machine Learning

Machine Learning Algorithms AUC Model Accuracy (%)

Author Contributions: Y.X.: Conceptualization, Methodology, Validation, Investigation, Writing—Original

You might also like