
Journal of Petroleum Science and Engineering 215 (2022) 110610
Contents lists available at ScienceDirect
Journal of Petroleum Science and Engineering

journal homepage: www.elsevier.com/locate/petrol
Application of machine learning in the identification of fluvial-lacustrine lithofacies from well logs: A case study from Sichuan Basin, China
Dongyu Zheng a, b, Mingcai Hou a, b, Anqing Chen a, b, *, Hanting Zhong a, b, Zhe Qi b, Qiang Ren a, b,
Jiachun You a, Huiyong Wang c, Chao Ma a, b, **
a State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation, Chengdu University of Technology, Chengdu, 610051, China
b Key Laboratory of Deep-time Geography and Environment Reconstruction and Applications, MNR & Institute of Sedimentary Geology, Chengdu University of Technology, Chengdu, 610059, China
c Sinopec Exploration & Production Research Institute, Beijing, 100083, China
ARTICLE INFO

Keywords: Machine learning; Fluvial-lacustrine lithofacies; XGBoost; Resampling; Well log; Sichuan Basin

ABSTRACT

Lithofacies identification is critical for forecasting sweet spots in hydrocarbon exploration. Well logs are widely used in lithofacies identification because they are petrophysical measurements of subsurface stratigraphy that reflect lithological successions and depositional processes. Traditional lithofacies identification from well logs is manual work that is time-consuming and prone to bias, so an automated, bias-free method is in demand. To this end, we created a lithofacies dataset of eleven wells with well log records and lithofacies descriptions that were interpreted manually from facies analysis of drilling cutting descriptions and well logs. We then developed machine learning models trained on this lithofacies dataset from the fluvial-lacustrine Upper Triassic Xujiahe and Lower Jurassic Ziliujing formations in the Yuanba Area, northern Sichuan Basin, southwestern China. By employing extreme gradient boosting and resampling algorithms, the resulting machine learning model is efficient and outperforms support vector machine and multiple-layer perceptron classifiers, as indicated by its highest accuracy and F1-score (0.90), its highest AUC (0.94), and its shortest training time. Moreover, the results suggest that resampling is necessary for lithofacies identification with an imbalanced dataset, and that a combined oversampling and undersampling method is better than a single resampling method. This study presents a successful application of machine learning to fluvial-lacustrine lithofacies identification from well logs and suggests the great potential of machine learning in subsurface hydrocarbon exploration.
Credit author statement

DYZ, analysis, software, writing original draft & editing; MCH, conceptualization, supervision, funding acquisition; AQC, conceptualization, data preparation, analysis, writing & editing; HTZ, data preparation, conceptualization; ZQ, data preparation; QR, data preparation; JCY, data preparation; HYW, data preparation; CM, conceptualization, writing & editing, funding acquisition. All authors read and approved the manuscript.

1. Introduction

A lithofacies is a combination of rocks formed under the same depositional conditions, and it embodies abundant information about those conditions. Knowledge of lithofacies is imperative for predicting lithology distributions and correlating stratigraphic units when only limited data are available (Allen, 1975; Miall, 1995), and it is critical for reconstructing the palaeogeography of the ancient Earth and targeting sweet spots in hydrocarbon exploration (e.g., Horne et al., 1978; Catuneanu, 2006; Nielsen and Schovsbo, 2011; Laya and Tucker, 2012;
* Corresponding author. State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation, Institute of Sedimentary Geology, Chengdu University of
Technology, Chengdu, 610051, China.
** Corresponding author. State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation, Institute of Sedimentary Geology, Chengdu University of
Technology, Chengdu, 610051, China.
E-mail addresses: [email protected] (D. Zheng), [email protected] (M. Hou), [email protected] (A. Chen), [email protected] (H. Zhong),
[email protected] (Z. Qi), [email protected] (Q. Ren), [email protected] (J. You), [email protected] (H. Wang), [email protected]
(C. Ma).
https://doi.org/10.1016/j.petrol.2022.110610
Received 30 December 2021; Received in revised form 4 May 2022; Accepted 6 May 2022
Available online 10 May 2022
0920-4105/© 2022 Published by Elsevier B.V.
Zhu et al., 2014; Chen et al., 2020; Zheng and Yang, 2020; Zheng and Wu, 2021).

Well logs are ubiquitous in subsurface exploration, and they are normally continuous records sampled over uninterrupted sections. Because well logs directly measure the petrophysical characteristics of subsurface rocks, they can reflect lithological, textural, and structural changes, as well as the stacking patterns of lithology, which are critical to understanding lithofacies (Selley, 1976; Rider, 1990; Nazeer et al., 2016). Therefore, well logs facilitate the spatial and temporal correlation of subsurface stratigraphy and are widely used in oil and gas reservoir prediction (Delfiner et al., 1987; Lim et al., 1997; Asquith and Krygowski, 2004; Tan et al., 2015; Lai et al., 2018; Zheng et al., 2021).

Despite the common use of well logs in lithofacies identification, there are two major limitations. First, lithofacies are largely interpreted from gamma-ray logs, whereas the remaining well logs only play a supporting role (Allen, 1975; Rider, 1990; Cant, 1992; Asquith and Krygowski, 2004). Detailed lithofacies interpretation requires interpreting multiple well logs simultaneously; however, manual work struggles to handle multiple well logs and may neglect abundant useful information (Rider, 1990; Radwan, 2021). Second, lithofacies identification from well logs requires considerable effort from experienced interpreters, which increases cost and reduces efficiency. Today, deep subsurface exploration requires large volumes of geological data to reconstruct detailed paleogeographic settings (Wang et al., 2021). A fast and efficient interpretation method is therefore necessary, and machine learning is a promising solution that can help researchers extract useful information and gain new insights from rapidly growing datasets (Jordan and Mitchell, 2015).

Research on applying machine learning algorithms to lithofacies identification from well logs has been widely conducted over the past three decades. These methods include multi-dimensional analyses, support vector machines, k-nearest neighbors, and artificial neural networks and their variants (e.g., Baldwin et al., 1990; Rogers et al., 1992; Bhatt and Helle, 2002; Dubois et al., 2007; Hall, 2016; Bestagini et al., 2017; Al-Mudhafar, 2017; Bize-Forest et al., 2018). Recent attempts at facies recognition compared the capability of artificial neural networks, support vector machines, and random forests (Deng et al., 2019), and used the Brier score to estimate the performance of machine learning models (Feng, 2021). However, these applications achieved limited overall accuracy because they are incapable of solving generalized problems, and they require extensive hyper-parameter tuning and computation (Jordan and Mitchell, 2015). Additionally, an imbalanced lithofacies dataset also reduces the accuracy of machine learning algorithms (Chawla, 2009; Longadge et al., 2013). Although imbalanced datasets occur in real exploration projects, their potential influence and the relevant solutions were not discussed in previous publications.

To offer a more accurate and efficient machine learning application that can work on projects with imbalanced lithofacies, we collected a dataset of eleven wells with a total thickness of 13,800 m from the Upper Triassic Xujiahe and Lower Jurassic Ziliujing formations in the Sichuan Basin. Detailed lithofacies descriptions were interpreted manually from facies analysis of drilling cutting descriptions and electrofacies of well logs. Based on this dataset, we then compared support vector machine (SVM), multiple-layer perceptron (MLP), and extreme gradient boosting (XGBoost) classifiers combined with over- and undersampling algorithms. The XGBoost classifier with the combined over- and undersampling algorithms obtained the best performance. The machine learning model in this study overcomes common problems of previous lithofacies identification methods, such as low prediction accuracy and the inability to handle the imbalanced datasets of real subsurface projects. Our results indicate that machine learning algorithms can provide reliable, efficient, and bias-free lithofacies identification, which has great potential to facilitate hydrocarbon exploration.

2. Data

2.1. Geological background of the areas of the studied wells

The selected intervals of well logs are from the Upper Triassic Xujiahe to Lower Jurassic Ziliujing formations in the Yuanba Area of the Sichuan Basin. The Sichuan Basin, with an area of 180,000 km², is one of the largest petroliferous basins in China. It is flanked by the Longmen Mountains to the west, the Qinling Orogenic Belt to the north, the Xuefeng Mountains to the east, and the Kangdian High Land to the south (Fig. 1; Meng et al., 2005). The Sichuan Basin experienced three major stages of tectonic evolution and was a foreland basin from the Late Triassic to the Late Cretaceous (Liu et al., 2018).

The Yuanba Area, in the northern part of the Sichuan Basin, is a medium-to-giant gas and oil field in which the Xujiahe and Ziliujing formations were deposited in a fluvial-lacustrine system; these formations are target intervals for tight sand gas, with more than 1000 × 10⁸ m³ of gas reserves (Fig. 1; Ma et al., 2010; Zheng et al., 2011; Guo et al., 2013). The Xujiahe Formation is subdivided into five members (T3x1-T3x5 from bottom to top). The T3x1, T3x3, and T3x5 members consist of siltstone, siltstone with interbedded fine-grained sandstone, mudstone, and mudstone with sandy or silty interbeds; the T3x2 and T3x4 members are lithic arkose and feldspathic litharenite (Zhang et al., 2016; Li and He, 2014). The Ziliujing Formation is subdivided into the Zhenzhuchong, Dongyuemiao, Ma'anshan, and Da'anzhai members from bottom to top (Li and He, 2014). Coarse-grained rocks mainly occur in the Zhenzhuchong Member, whereas fine-grained rocks occur in the Dongyuemiao, Ma'anshan, and Da'anzhai members (Li and He, 2014).

2.2. Selected well log types and well log preprocessing

In this study, eleven wells from the Yuanba Area, with a total thickness of over 13,800 m and complete Xujiahe and Ziliujing formations, were studied (Fig. 1). Eight types of well logs were selected: caliper log (CAL), gamma-ray log (GR), gamma-ray without uranium log (KTH), deep-investigation dual laterolog resistivity log (RD), shallow-investigation dual laterolog resistivity log (RS), compensated neutron log (CNL), density log (DEN), and acoustic log (AC). Data preprocessing was performed before lithofacies identification to avoid the influences of depth offsets, malfunctioning well log detectors, and differences in the value ranges of the various well log types. The well logs were recorded every 0.125 m, and a total of 109,894 valid records of well log values were obtained after data preprocessing (Fig. 2; Appendix).

The data preprocessing procedures include:

(1) Depth calibration. Depth calibration of well logs is required to obtain accurate lithofacies interpretations because well logs and cores/cuttings usually have depth offsets. As mudstone/shale has higher GR values than sandstone/conglomerate, the GR log was used to calibrate the depth by shifting the well logs to match the intervals of marker beds.
(2) Removal of invalid values. The raw well log data contain values such as −999, −9999, or 0. These values do not reflect the real conditions of the subsurface rock formations; instead, they are likely caused by malfunctioning well log detectors. Therefore, these invalid values were removed.
(3) Data standardization. To avoid the influences caused by large differences in the value ranges of well logs, the raw dataset was standardized before training the machine learning models. The procedure is as follows:

$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i \tag{1a}$$
Fig. 1. Geological maps. a) Sketch map of China; the yellow highlighted area is the Sichuan Basin. b) Tectonic divisions of the Sichuan Basin (adapted from Ma, 2008); the red square is the Yuanba Area shown in Fig. 1c. c) Location map of the studied wells in the Yuanba Area; blue dots are wells with interpreted lithofacies that were used to implement the machine learning models. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
$$\sigma = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\mu\right)^2} \tag{1b}$$

$$x_{i,\mathrm{scaled}} = \frac{x_i-\mu}{\sigma} \tag{1c}$$

where $n$ is the number of samples (109,894 in this study), $x_i$ is the well log value of the ith sample, $\mu$ is the mean, $\sigma$ is the standard deviation, and $x_{i,\mathrm{scaled}}$ is the standardized well log value of the ith sample.

2.3. The interpreted lithofacies

Lithofacies of the eleven wells were interpreted based on facies analysis of drilling cutting/core descriptions against the electrofacies of well logs. The Xujiahe and Ziliujing formations were deposited in braided and meandering river, fan and fluvial delta, and lacustrine depositional environments (Zheng et al., 2011). In this study, we further classified the Xujiahe and Ziliujing formations into nine major lithofacies based on cutting descriptions and well log interpretations. These lithofacies were interpreted from the lithologies, sedimentary textures and structures, and stacking patterns of lithology in the cutting descriptions, and, where no cuttings are available, from the shapes of the log curves (Figs. 3 and 4; Table 1). Delta plain, delta front, and prodelta were not differentiated in this study. Instead, the interdistributary channels were incorporated into the channel subfacies because of their similar lithological and log curve features, and the remaining parts of the deltas were merged into the mouth bar subfacies because they share a coarsening-upward lithology. The nine subfacies used in this study are:

Braided channel conglomerate and sandstone lithofacies (BCCS): consists of conglomerate, conglomeratic sandstone, sandstone, and trace amounts of mudstone. The log curve is cylindrical (Cant, 1992; Fig. 4a), suggesting low mud content and no clear trend of grain size change. Labels of braided streams account for 11% of the total lithofacies (Fig. 5).

Meandering channel sandstone lithofacies (MCS): consists of sandstone, mudstone, and trace amounts of conglomerate and conglomeratic sandstone. The log feature of MCS is similar to that of BCCS, being represented by a cylindrical shape (Fig. 4a). To distinguish MCS from BCCS, intervals of fluvial channel lithofacies with more mud were classified as MCS, while intervals with less mud were classified as BCCS. Labels of MCS account for 18% of the total lithofacies.

Longitudinal/transverse bar sandstone lithofacies (LTBS): this lithofacies lies in the upper part of braided stream channels and is typified by cross-bedded sandstones, conglomeratic sandstones, and sandy mudstones. The log curve is bell-shaped with an overall fining-upward trend (Fig. 4b). Labels of this lithofacies account for 15% of the total lithofacies.

Point bar sandstone and mudstone lithofacies (PBSM): this lithofacies lies in the upper part of meandering stream channels, with typical cross-bedded sandstone and mudstone. Its log curves are also bell-shaped (Fig. 4b). Compared with LTBS, point bars have finer grain sizes and, accordingly, higher gamma-ray values. Labels of PBSM account for 13% of the total lithofacies.

Alluvial plain sandstone and mudstone lithofacies (APSM): to distinguish the river plain lithofacies of braided streams from those of meandering streams, the alluvial plain and floodplain lithofacies are classified separately. APSM consists of sandstones, sandy mudstones, mudstone, and coal beds, with irregular log curves of high gamma values (Fig. 4c), indicating high contents of muddy components with no clear change in mud content. Labels of APSM account for 8% of the total lithofacies.

Floodplain mudstone lithofacies (FPM): FPM consists of sandy mudstones, mudstones, and coal beds and features irregular log curves. Compared with APSM, FPM contains more muddy components and therefore has higher gamma-ray values. Labels of FPM account for 17% of the total lithofacies.

Crevasse splay sandstone and mudstone lithofacies (CSSM): the crevasse splay lithology includes sandstone, siltstone, and mudstone, with funnel-shaped log curves (Fig. 4d) suggesting a coarsening-upward trend. Labels of CSSM account for 1% of the total lithofacies.

Mouth bar sandstone and mudstone lithofacies (MBSM): a mouth bar develops where the mouth of a river meets a standing body of water. The main lithology includes sandstones, siltstones, and
mudstones. The log curve is funnel-shaped, indicating an overall coarsening-upward trend (Fig. 4d). Labels of MBSM account for 8% of the total lithofacies.

Shallow lake sandstone and mudstone lithofacies (SLSM): includes lithofacies from the littoral to sublittoral zones of lacustrine environments. The main lithology includes sandstones, sandy mudstones, and mudstone. The log curve is irregular (Fig. 4e), indicating no clear grain size changes. Labels of SLSM account for 9% of the total lithofacies.

3. Methodology

3.1. The implementation of machine learning models

The machine learning in this study is a supervised learning task that uses the interpreted dataset to make future predictions. In this study, we selected eight wells for training, one well for validation, and two wells for testing. The three datasets serve different functions: the training dataset is used to create the machine learning model, the validation dataset is used to tune the hyper-parameters, and the test dataset is used to evaluate the model's accuracy (Goodfellow et al., 2016). This study adopted three classifiers, support vector machine (SVM), multiple-layer perceptron (MLP), and extreme gradient boosting (XGBoost), to make lithofacies predictions. The grid search method was applied to find the best hyper-parameters for each model (see Table 2 for the tuned hyper-parameters). Because XGBoost outperforms the SVM and MLP classifiers (see Results), the SVM and MLP classifiers are not explained in this study; details can be found in Vapnik (1998) and Hinton and Osindero.
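The well-wise split and the grid search described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: the well names `W1`-`W11`, the helpers `split_wells` and `grid_search`, the parameter keys, and the toy scoring function are all hypothetical. In a real workflow the scoring callback would train a classifier on the training wells and return its accuracy on the validation well.

```python
from itertools import product
from typing import Callable, Dict, List, Sequence, Tuple


def split_wells(wells: Sequence[str], n_train: int = 8, n_val: int = 1,
                n_test: int = 2) -> Tuple[List[str], List[str], List[str]]:
    """Assign whole wells (not individual depth samples) to each subset.

    Hypothetical helper; the default sizes follow the 8/1/2 split in the text.
    """
    if n_train + n_val + n_test != len(wells):
        raise ValueError("subset sizes must add up to the number of wells")
    train = list(wells[:n_train])
    val = list(wells[n_train:n_train + n_val])
    test = list(wells[n_train + n_val:])
    return train, val, test


def grid_search(param_grid: Dict[str, Sequence],
                score_on_validation: Callable[[Dict], float]) -> Tuple[Dict, float]:
    """Try every hyper-parameter combination and keep the best validation score."""
    names = list(param_grid)
    best_params: Dict = {}
    best_score = float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_on_validation(params)
        if score > best_score:  # ties keep the first combination tried
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    wells = [f"W{i}" for i in range(1, 12)]  # hypothetical names for the eleven wells
    train_wells, val_wells, test_wells = split_wells(wells)

    # Candidate tree depths and learning rates taken from Table 2.
    grid = {"max_depth": [3, 4, 5, 8, 10, 12, 15, 17, 20],
            "learning_rate": [0.01, 0.03, 0.10, 0.30]}

    # Toy stand-in for "train on train_wells, score on val_wells".
    def toy_score(params: Dict) -> float:
        return -abs(params["max_depth"] - 17) - params["learning_rate"]

    best, score = grid_search(grid, toy_score)
    print(train_wells, val_wells, test_wells)
    print(best, score)
```

One plausible benefit of splitting by well rather than by depth sample is that neighboring samples, only 0.125 m apart and therefore strongly correlated, cannot end up on both sides of the split.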
Fig. 2. Crossplot of well logs with labels of lithofacies. See text for abbreviations of well logs and lithofacies.
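The invalid-value removal and standardization steps of Section 2.2 (Eqs. 1a-1c) can be sketched with the Python standard library. This is a hedged illustration, not the authors' pipeline: it assumes each depth record is a dictionary keyed by the eight log mnemonics, and that a whole record is dropped when any of its logs holds one of the sentinel values −999, −9999, or 0. The helper names `drop_invalid` and `standardize` are hypothetical.

```python
from statistics import mean, stdev
from typing import Dict, List

# Log mnemonics used in the study (Section 2.2).
LOGS = ["CAL", "GR", "KTH", "RD", "RS", "CNL", "DEN", "AC"]
# Sentinel readings treated as invalid in the text; assumed here to flag a whole record.
INVALID_VALUES = {-999.0, -9999.0, 0.0}

Record = Dict[str, float]


def drop_invalid(records: List[Record]) -> List[Record]:
    """Keep only depth records in which every log holds a valid reading."""
    return [r for r in records if all(r[log] not in INVALID_VALUES for log in LOGS)]


def standardize(records: List[Record]) -> List[Record]:
    """Apply Eqs. (1a)-(1c) to each log type separately."""
    params = {}
    for log in LOGS:
        column = [r[log] for r in records]
        # mean() is Eq. (1a); stdev() divides by n - 1, matching Eq. (1b).
        params[log] = (mean(column), stdev(column))
    # Eq. (1c): subtract the mean and divide by the standard deviation.
    return [{log: (r[log] - params[log][0]) / params[log][1] for log in LOGS}
            for r in records]


if __name__ == "__main__":
    raw = [
        {log: 1.0 for log in LOGS},
        {log: 3.0 for log in LOGS},
        {**{log: 2.0 for log in LOGS}, "GR": -999.0},  # hypothetical invalid GR reading
    ]
    clean = drop_invalid(raw)
    scaled = standardize(clean)
    print(len(clean), scaled[0]["GR"], scaled[1]["GR"])
```

`statistics.stdev` raises `StatisticsError` for fewer than two records, and a constant log would give a zero standard deviation, so a production pipeline would need to guard against both cases.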
Fig. 3. Well log curves and interpreted lithofacies. (a) One selected well with eight well log curves, lithology, and interpreted lithofacies. In the lithology legend, yellow is mudstone, green is sandstone, and red is conglomerate. Lithofacies codes: 0-BCCS; 1-MCS; 2-LTBS; 3-PBSM; 4-APSM; 5-FPM; 6-CSSM; 7-MBSM; 8-SLSM. (b) The interpreted lithofacies for the remaining ten wells. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
XGBoost is an ensemble machine learning algorithm based on parallel tree boosting (also known as gradient boosted decision trees, GBDT) that can handle big datasets efficiently (Chen and Guestrin, 2016). XGBoost combines the gradient boosting approach with an efficient optimization method; consequently, it runs more than ten times faster than other popular algorithms on a single machine and can obtain optimal results with little effort in hyper-parameter tuning (Chen and Guestrin, 2016). The XGBoost classifier is a supervised learning model that consists of a set of decision trees. Suppose a dataset $D$ contains $n$ examples with $m$ features, $D = \{(x_i, y_i)\}$, $x_i \in \mathbb{R}^m$, $y_i \in \mathbb{R}$. The output of a single tree and of an ensemble of $K$ trees is

$$f(x) = w_{q(x)} \tag{2}$$

$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i) \tag{3}$$

where $x$ is the input well log data, $\hat{y}_i$ is the predicted lithofacies of the ith sample, and $w_{q(x)}$ is the score of the leaf $q(x)$ to which $x$ is assigned.

The objective function to be optimized at iteration $t$ is

$$J^{(t)} = \sum_{i=1}^{n} L\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \sum_{i=1}^{t} \Omega(f_i) \tag{4}$$

where the first term, $L$, is the loss function that measures the deviation between the prediction $\hat{y}$ and the true value $y$. In this study, the multiclass log loss is selected as the loss function and is defined as

$$L = -\frac{1}{N}\sum_{i=1}^{N}\sum_{j=1}^{M} y_{ij}\ln\left(p_{ij}\right) \tag{5}$$

where $N$ is the number of samples, $M$ is the number of classes, $y_{ij}$ is 1 if the ith sample belongs to the jth class and 0 otherwise, and $p_{ij}$ is the probability, predicted by the classifier, that the ith sample belongs to the jth class.

The second term is the regularization function, which controls the complexity of the model and helps avoid overfitting. In XGBoost, the regularization function is defined as

$$\Omega(f) = \gamma T + \frac{1}{2}\lambda\sum_{j=1}^{T} w_j^2 \tag{6}$$

where $T$ is the number of leaves, $\gamma$ is the pseudo-regularization hyper-parameter (the complexity cost per leaf), $\lambda$ is the L2 regularization coefficient, and $w_j$ is the weight of the jth leaf.

3.2. Data resampling, validation, and model evaluation

The data resampling algorithms include oversampling and undersampling, which are performed to remove the influences caused by the
Fig. 4. Typical well log shapes of lithofacies in a fluvial-lacustrine system, including a) cylindrical, b) bell, c) irregular, d) funnel shape, and e) symmetrical shapes.
Adapted from Cant (1992).
Table 1
Typical characteristics of the studied lithofacies.

| Lithofacies | Lithology | Sedimentary textures and structures | Textural stacking pattern | Well log shapes |
|---|---|---|---|---|
| Braided channel conglomerate and sandstone lithofacies (BCCS) | Conglomerate, conglomeratic sandstone, sandstone | Low-relief erosion surface, imbricated structure, cross bedding | Fining-upward | Low gamma, high porosity, cylindrical |
| Transverse/longitudinal bar sandstone lithofacies (LTBS) | Sandstone dominant, conglomerate | Cross bedding, gravel sheet, imbricated structure, parallel bedding, sand bedding, rhombohedral gravel mat body, reacting surface structure | Fining-upward | Low gamma, high porosity, bell |
| Meandering channel sandstone lithofacies (MCS) | Sandstone, conglomerate | High-relief erosion surface, imbricated structure, cross bedding | Fining-upward | Low gamma, high porosity, cylindrical |
| Point bar sandstone and mudstone lithofacies (PBSM) | Sandstone, siltstone, mudstone | Scour surface, lateral accretion structure, cross bedding, parallel bedding | Fining-upward | Low gamma, high porosity, bell |
| Floodplain mudstone lithofacies (FPM) | Mudstone, siltstone, carbonaceous mudstone, coal | Horizontal bedding, mud crack, ripple mark | No trend | High gamma, high porosity, irregular |
| Alluvial plain sandstone and mudstone lithofacies (APSM) | Sandstone, mudstone, siltstone, carbonaceous mudstone, coal | Ripple mark, exposure marks, horizontal bedding, mud crack | No trend | High gamma, high porosity, irregular |
| Crevasse splay sandstone and mudstone lithofacies (CSSM) | Siltstone, fine-grained sandstone | Ripple lamination, scour surface | Coarsening-upward | High gamma, high porosity, funnel |
| Mouth bar sandstone and mudstone lithofacies (MBSM) | Sandstone, siltstone, mudstone | Climbing ripple, small-scale festoon-type cross bedding | Coarsening-upward | High gamma, high porosity, funnel |
| Shallow lake sandstone and mudstone lithofacies (SLSM) | Sandstone, siltstone, mudstone | Bi-directional cross bedding, horizontal bedding, ripple lamination, ripple mark | Coarsening-upward then fining-upward | High gamma, high porosity, symmetrical |
imbalanced raw dataset by creating a balanced dataset. Oversampling methods create synthetic samples to increase the proportions of rare classes, whereas undersampling methods remove samples to decrease the proportions of abundant classes. The Synthetic Minority Oversampling Technique (SMOTE; Chawla et al., 2002) and the Neighborhood Cleaning Rule (NCR; Laurikkala, 2001) were selected as the oversampling and undersampling methods, respectively. Additionally, a combined SMOTE and NCR method was also applied in this study. The performance of these three data resampling methods was compared, and the best method is identified in Section 5.1.

To create the most accurate machine learning models and protect against overfitting, the dataset was split into training, validation, and test datasets. For comparative analysis of model performance, accuracy, F1-score, and the area under the curve (AUC) were selected as evaluation metrics. Accuracy is defined as:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{7}$$

The F1-score is defined as:

$$F_1 = \frac{2TP}{2TP + FP + FN} \tag{8}$$

where TP is true positive, TN is true negative, FP is false positive, and FN is false negative. Both accuracy and F1-score take values between 0 and 1; the higher the value, the better the model. To visualize the predicted results, the normalized confusion matrix was used (Fig. 6a).

AUC measures the area under the Receiver Operating Characteristic (ROC) curve, in which the x-axis is the false positive rate (FPR) and the y-axis is the true positive rate (TPR; Fig. 6b). The ROC curve of a perfect classifier would pass through the top-left corner of the ROC graph, where TPR is 1 and FPR is 0. TPR and FPR are defined as:

$$TPR = \frac{TP}{TP + FN} \tag{9a}$$

$$FPR = \frac{FP}{TN + FP} \tag{9b}$$

AUC is then defined as:

$$AUC = \int_{0}^{1} TPR(x)\,dx \tag{9c}$$

where $x$ is the FPR.

Fig. 5. Lithofacies distribution of the studied dataset. See text for abbreviations.

Table 2
Hyper-parameters for grid search.

| Hyper-parameter | MLP | SVM | XGBoost |
|---|---|---|---|
| Hidden layer numbers | 2, 3, 4, 5, 6, 7, 8 | - | - |
| Neuron numbers | 10, 20, 50, 100 | - | - |
| Learning rate | 0.001, 0.01, 0.03, 0.10, 0.30 | - | 0.01, 0.03, 0.10, 0.30 |
| Kernel | - | Linear, poly, sigmoid | - |
| Degree | - | 3, 5, 7, 9 | - |
| Tree depth | - | - | 3, 4, 5, 8, 10, 12, 15, 17, 20 |

The optimum hyper-parameters were selected for machine learning training and bolded in this table.

4. Results

The result of each model was obtained from a fine-tuned optimal model using the grid search method (Table 2). Additionally, each model adopted data resampling methods to improve its performance. For comparative analysis, accuracy, F1-score, and AUC were used to evaluate model performance.

4.1. SVM performance

The optimal performance of the SVM classifier was obtained from the model with a seven-degree polynomial kernel function. Overall, the SVM failed to identify lithofacies. The SVM classifier obtained an accuracy, F1-score, and AUC of 0.41, 0.37, and 0.61 on the training dataset, and the same values (0.41, 0.37, and 0.61) on the test dataset (Table 3; Figs. 7 and 9). The limited variation between performance on the training and test datasets suggests that the SVM classifier was unable to extract the complicated relationships in the dataset of well logs and lithofacies. In the normalized confusion matrix of the test dataset, the APSM column is dark blue, suggesting that most lithofacies were mislabeled as APSM (Fig. 8a). Moreover, training the SVM classifier took 359.50 s (Table 3).

The data resampling methods improved the performance of the SVM classifier. With the SMOTE (oversampling) reprocessed dataset, the SVM classifier increased its test accuracy from 0.41 to 0.42, its F1-score from 0.37 to 0.40, and its AUC from 0.61 to 0.63. The NCR
Fig. 6. Explanation graph of the (a) normalized confusion matrix and (b) AUC. (a) True positives and true negatives are the correct predictions aligned along the diagonal; by contrast, false negatives and false positives are the incorrect predictions located in the remaining cells of the confusion matrix. (b) The blue line represents the ROC curve, and the gray area represents the AUC. TP, true positive; TN, true negative; FP, false positive; FN, false negative; TPR, true positive rate; FPR, false positive rate. See equations (9a)-(9c) for more explanation. (For interpretation of the references to color in this figure legend, the reader is referred to the Web version of this article.)
Table 3
Evaluation metrics and training time of SVM.

| Resampling | Training Acc | Training F1 | Training AUC | Validation Acc | Validation F1 | Validation AUC | Test Acc | Test F1 | Test AUC |
|---|---|---|---|---|---|---|---|---|---|
| Original | 0.41 | 0.37 | 0.61 | 0.41 | 0.37 | 0.72 | 0.41 | 0.37 | 0.61 |
| SMOTE | 0.43 | 0.41 | 0.62 | 0.43 | 0.41 | 0.63 | 0.42 | 0.40 | 0.63 |
| NCR | 0.41 | 0.40 | 0.63 | 0.41 | 0.39 | 0.64 | 0.41 | 0.39 | 0.64 |
| SMOTE + NCR | 0.45 | 0.41 | 0.74 | 0.44 | 0.40 | 0.74 | 0.42 | 0.44 | 0.75 |

Time: 359.50 s.
Acc is accuracy; F1 is F1-score; Time is the training time of machine learning models.
(undersampling) method also improved the performance of the classifier: the F1-score increased from 0.37 to 0.39 and the AUC from 0.61 to 0.64. The SVM classifier with the combined SMOTE and NCR method achieved the best performance, as indicated by the highest F1-score and AUC of 0.44 and 0.75, respectively (Figs. 7, 8b and 9; Table 3).

4.2. MLP performance

The best performance of the MLP classifier was obtained from a deep neural network with five hidden layers and 100 neurons in each hidden layer. The MLP classifier obtained an accuracy, F1-score, and AUC of 0.78, 0.78, and 0.89 on the training dataset, and 0.71, 0.71, and 0.85 on the test dataset (Table 4). The MLP classifier performed well on most lithofacies except LTBS and SLSM, whose accuracies were below 0.80 (Fig. 8c). Moreover, training the MLP classifier took 469.98 s (Table 4).

The data resampling methods also improved the performance of the MLP classifier. The SMOTE method improved the test accuracy of the MLP classifier from 0.71 to 0.80, the F1-score from 0.71 to 0.80, and the AUC from 0.85 to 0.91. The NCR method improved the accuracy to 0.79 and the F1-score to 0.78, with an AUC of 0.83. The MLP classifier with the combined SMOTE and NCR method achieved its highest accuracy of 0.82 and highest F1-score of 0.82 (Figs. 7, 8d and 9; Table 4).

4.3. XGBoost performance

The optimal performance of the XGBoost classifier was obtained from the model with a tree depth of 17. The average accuracy and F1-score were both 0.80. The XGBoost classifier reduced its loss in the initial 200 iterations and gradually reached its lowest losses for the training and validation datasets, indicating that the XGBoost model was successfully implemented and could converge quickly to optimal performance (Fig. 10). Overall, XGBoost identified lithofacies accurately: more than 80% of all lithofacies were classified correctly, and over 82% of BCCS, MCS, APSM, FPM, and SLSM were predicted correctly. XGBoost classified CSSM and MBSM less accurately: 22% of CSSM samples were mislabeled as MBSM, and 17% of MBSM samples were mislabeled as SLSM (Figs. 7 and 8e).

The data resampling methods improved the performance of the XGBoost classifier. Similar to the original XGBoost model, XGBoost trained on the resampled dataset sharply reduced its loss in the initial 200 iterations and gradually reached its lowest loss value at approximately 250 iterations. Moreover, compared with the original XGBoost, the losses on the training and validation datasets decreased: the loss of the original XGBoost was 0.29, whereas that of the XGBoost model with the combined SMOTE and NCR resampling was only 0.18. The resampling methods also improved the accuracy of the XGBoost model. The model with SMOTE improved the accuracy to 0.88 and the F1-score to 0.88; the model with NCR improved the accuracy to 0.86 and the F1-score to 0.86; and the model with the combined SMOTE and NCR method was the most accurate, with the highest accuracy of 0.90 and the highest F1-score of 0.90 (Figs. 7, 8f and 9; Table 5).

Fig. 7. Accuracies, F1-scores, and AUC of the original machine learning classifiers and classifiers with resampling algorithms.

5. Discussions

5.1. Resampling is necessary for lithofacies identifications

The lithofacies distribution in large river-delta systems is imbalanced because of changes in sea level or lake level driven by tectonic activity and climate change. Consequently, stratigraphic sections can accumulate thick progradational or retrogradational successions, creating imbalanced lithofacies distributions (Catuneanu, 2006). Moreover, well drilling sites are selected manually, which can further increase lithofacies imbalance.

In this study, the lithofacies distribution is highly imbalanced (Fig. 5). For example, CSSM, which makes up only 1% of all lithofacies, is distinctly less common than the other lithofacies. Without data resampling, such as oversampling or undersampling, the machine learning models automatically placed more weight on the more abundant lithofacies, which improves performance on those lithofacies at the expense of performance on the less abundant ones. Resampling the raw dataset improved the performance of the XGBoost, SVM, and MLP classifiers. All three resampling methods (SMOTE, NCR, and the combined SMOTE and NCR method) improved the performance of the machine learning models, and the combined method performed best. The SMOTE method creates a balanced dataset by generating synthetic data; by contrast, the NCR method creates a balanced dataset by reducing the proportions of the over-represented classes. Of these two methods, SMOTE was more effective, as indicated by its greater improvement in accuracy, F1-score, and AUC. Because SMOTE maintains or even increases the original dataset size, the information in the original dataset is retained after SMOTE processing. Conversely, information may be lost after NCR processing because the NCR method will remove
validation datasets after 250 iterations of weight updates, thus
8
Fig. 8. Normalized confusion matrixes of SVM model performances with a) the original dataset and b) the SMOTE and NCR resampled dataset; normalized confusion
matrixes of MLP model performances with c) the original dataset and d) the SMOTE and NCR resampled dataset; Normalized confusion matrixes of XGBoost model
performances with e) the original dataset and f) the SMOTE and NCR resampled dataset. The scores were achieved on the test dataset; see text for abbreviations.
parts of the over-sampled classes. In addition to the single resample the accuracy and F1-score were both 0.90. Therefore, resampling was
method, the combined method using SMOTE and NCR was the most significant before training the machine learning model when the orig
effective in improving models’ performances. All machine learning inal dataset was imbalanced, and the combined method of oversampling
models with the combined method achieved the best performances. The and undersampling was better than a single method.
improvements were distinctive, especially for XGBoost, as suggested by
9
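The two resampling strategies contrasted in Section 5.1 (SMOTE adds synthetic samples, NCR deletes noisy ones) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `smote` follows the interpolation idea of Chawla et al. (2002), `ncr` is a simplified neighbourhood cleaning rule after Laurikkala (2001) with three nearest neighbours and Euclidean distance, and the function names, default `k` values, toy data, and the order "SMOTE first, then NCR" for the combined method are all assumptions. In practice a library such as imbalanced-learn (`SMOTE`, `NeighbourhoodCleaningRule`) would normally be used, applied to the training split only.

```python
import math
import random
from collections import Counter


def _neighbours(X, i, pool, k):
    """Indices of the k samples in `pool` nearest to X[i] (Euclidean), excluding i itself."""
    return sorted((j for j in pool if j != i), key=lambda j: math.dist(X[i], X[j]))[:k]


def smote(X, y, k=5, seed=0):
    """Oversample every class to the size of the largest class (SMOTE-style interpolation)."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, n in counts.items():
        if n == target or n < 2:
            continue  # already the largest class, or no neighbour to interpolate towards
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(members)
            j = rng.choice(_neighbours(X, i, members, k))
            gap = rng.random()  # random point on the segment from X[i] to X[j]
            X_out.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
            y_out.append(label)
    return X_out, y_out


def ncr(X, y, k=3):
    """Simplified neighbourhood cleaning rule: undersample by deleting noisy samples.

    Rule 1 (edited nearest neighbours): drop a non-minority sample outvoted by its k neighbours.
    Rule 2: if a minority sample is outvoted, drop its neighbours from the other classes
    (only classes holding at least half as many samples as the minority class).
    """
    counts = Counter(y)
    minority = min(counts, key=counts.get)
    everyone = range(len(y))
    remove = set()
    for i in everyone:
        neigh = _neighbours(X, i, everyone, k)
        voted = Counter(y[j] for j in neigh).most_common(1)[0][0]
        if voted == y[i]:
            continue
        if y[i] != minority:
            remove.add(i)
        else:
            remove.update(j for j in neigh
                          if y[j] != minority and counts[y[j]] >= 0.5 * counts[minority])
    keep = [i for i in everyone if i not in remove]
    return [X[i] for i in keep], [y[i] for i in keep]


# Combined method on a toy two-class dataset (assumed order: oversample, then clean).
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0.2, 0.8), (0.8, 0.2), (0.3, 0.3),
     (10, 10), (10, 11), (11, 10)]
y = ["a"] * 8 + ["b"] * 3
X_bal, y_bal = ncr(*smote(X, y, k=2))
print(Counter(y_bal))  # Counter({'a': 8, 'b': 8})
```

Interpolating only between same-class neighbours is what lets SMOTE enlarge a minority class without discarding any original data, whereas `ncr` only ever removes samples; this mirrors the information-retention argument made above for why SMOTE alone outperformed NCR alone.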
Fig. 9. ROC curves of machine learning models with resampling algorithms. See text for abbreviations.

Fig. 10. Loss curves of the XGBoost model with the original and resampled datasets. The loss used is the log loss from the Python library scikit-learn (Pedregosa et al., 2011).

Table 4
Evaluation metrics and training time of MLP.
               Training            Validation          Test
Resampling     Acc   F1    AUC     Acc   F1    AUC     Acc   F1    AUC
Original       0.78  0.78  0.89    0.79  0.78  0.89    0.71  0.71  0.85
SMOTE          0.87  0.87  0.94    0.87  0.87  0.94    0.80  0.80  0.91
NCR            0.83  0.83  0.86    0.83  0.83  0.87    0.79  0.78  0.83
SMOTE + NCR    0.86  0.86  0.90    0.86  0.86  0.89    0.82  0.82  0.87
Training time: 469.98 s.
See Table 3 for abbreviations.

Table 5
Evaluation metrics and training time of XGBoost.
               Training            Validation          Test
Resampling     Acc   F1    AUC     Acc   F1    AUC     Acc   F1    AUC
Original       0.96  0.96  0.99    0.81  0.81  0.92    0.80  0.79  0.91
SMOTE          0.98  0.98  0.99    0.88  0.88  0.92    0.87  0.87  0.92
NCR            0.97  0.97  0.99    0.86  0.86  0.94    0.85  0.85  0.94
SMOTE + NCR    0.98  0.98  0.99    0.90  0.90  0.93    0.90  0.90  0.94
Training time: 165.03 s.
See Table 3 for abbreviations.

5.2. Comparisons, improvements, and future applications of machine learning models in well log-lithofacies predictions

In this study, XGBoost outperformed the SVM and MLP models, as indicated by its higher accuracy and lower computational cost. XGBoost achieved an accuracy of 0.80 with the raw dataset and its highest accuracy of 0.90 with the resampled dataset, indicating that 90% of the lithofacies were correctly identified by the XGBoost classifier. By contrast, the SVM and MLP classifiers failed to provide accurate identifications. The SVM had an accuracy of 0.40 with both the raw and resampled datasets, suggesting that it was unable to extract the relationships between well logs and lithofacies. The MLP classifier performed better than the SVM classifier, but the accuracy, F1-score, and AUC of the best-trained MLP were still lower than those of the XGBoost model. In theory, the MLP classifier should improve its performance as hidden layers are added. However, a deep neural network with too many hidden layers may overfit the training dataset or fail to converge because of vanishing gradients (Hanin, 2018). In this study, networks with two to eight hidden layers were investigated; the grid search results indicated that five hidden layers were optimal and that additional hidden layers did not improve the model's performance. Beyond accuracy, the SVM and MLP also have greater computational costs: both classifiers required more than twice the training time of the XGBoost classifier (Tables 3–5). XGBoost is a scalable tree-boosting algorithm with improved parallel computation and gradient convergence. Therefore, XGBoost was the optimal algorithm for lithofacies identification in this study.
Although the XGBoost classifier performs well overall on lithofacies identification, improvements are still necessary. In this study, several facies were merged to simplify the input dataset. For example, the interdistributary channel lithofacies of deltas were labeled as fluvial channel lithofacies because they usually present similar lithological and well log characteristics. However, this issue should be overcome for more accurate lithofacies identification in the future.
Moreover, the current XGBoost model is too sensitive to well log changes, which may cause over-interpretation of lithofacies. For example, spikes exist in the lithofacies predictions (see the Appendix for abrupt lithofacies changes within thin intervals). Slight variations in well log values may be interpreted as lithofacies changes, although such variations can be induced by heterogeneities of the subsurface rock formations rather than by lithofacies changes. These variations should be ignored, but they are captured by the current machine learning models.
To overcome the over-sensitivity of XGBoost, future work could train the models with suitable encoders and decoders to extract both low-frequency and high-frequency information, such as lithological stacking patterns and rapid lithological changes, respectively (Goodfellow et al., 2016). An attention mechanism is also suggested to suppress noise in the well log data (Vaswani et al., 2017). Additionally, more subsurface data, such as core and seismic data, can be added to the machine learning models to extract more critical information for lithofacies identification. It is promising to obtain a more accurate and more robust machine learning model for lithofacies identification with the advantage of large datasets and advanced machine learning algorithms. More importantly, any lithofacies predicted by machine learning are required to be
manually checked based on domain knowledge. To address the challenges posed by the tremendous heterogeneity of lithofacies and well logs, manual intervention informed by the geological background is likely one of the most reliable and efficient ways to make accurate lithofacies predictions under current circumstances.
Successful applications of machine learning in lithofacies identification are significant for subsurface exploration and palaeogeographic reconstruction. Subsurface exploration provides important natural resources for society. Palaeogeographic reconstructions are critical to understanding Earth's evolution and can provide insights into fields such as paleoclimatology, plate tectonics, and geodynamics (Cao et al., 2017; Wang et al., 2021). Lithofacies identification is a fundamental step in both subsurface exploration and palaeogeographic reconstruction. By incorporating datasets from longer periods and larger areas, the machine learning model could be used to understand basinal palaeogeographic evolution, locate target intervals in petroleum systems, refine palaeogeographic maps (e.g., Golonka, 2007; Scotese, 2021), or even provide insights into the palaeogeographic reconstruction of the Earth.

6. Conclusions

To investigate the feasibility of machine learning for lithofacies identification from well logs, three machine learning algorithms and three resampling methods were adopted in this study. The machine learning models were trained using the dataset of well logs and interpreted lithofacies of the Upper Triassic Xujiahe and Lower Jurassic Ziliujing formations in the northern Sichuan Basin, southwestern China.
The results of this study indicate that the XGBoost algorithm provides accurate lithofacies identifications and outperforms the SVM and MLP methods. The XGBoost model achieved an accuracy and F1-score of 0.80 and an AUC of 0.91, indicating that 80% of the total lithofacies were predicted correctly. By contrast, the SVM and MLP classifiers failed to provide accurate identifications. The accuracies of the SVM and MLP models are 0.41 and 0.71, the F1-scores are 0.37 and 0.71, and the AUCs are 0.61 and 0.85, respectively. Moreover, XGBoost required less than half of the training time of the SVM and MLP classifiers.
Additionally, resampling methods are effective in improving the performance of machine learning models, as suggested by the improvements in accuracies and F1-scores. The original dataset of well logs and interpreted lithofacies was resampled using SMOTE, NCR, and the combined method of SMOTE and NCR. All three resampling methods enhanced the accuracies, F1-scores, and AUC, and the improvements using the combined method were the greatest. With the dataset resampled using the combined method, the accuracy and F1-score of the XGBoost model increased from 0.80 to 0.90; the accuracies and F1-scores of the SVM and MLP models also increased by about 0.1.
This study indicates that machine learning is a reliable and efficient method to identify fluvial-lacustrine lithofacies in the Sichuan Basin. Suggested future work includes adopting suitable autoencoders that can detect both low-frequency and high-frequency responses of well logs to lithofacies changes, and incorporating more types of subsurface data. It is promising to create a machine learning model that provides accurate lithofacies identification from basin scale to global scale, offering insights into the paleogeographic conditions of the Earth and helping forecast the sweet spots of hydrocarbon exploration.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We thank the Deep-time Digital Earth (DDE) program for supporting our project. We thank Sinopec Petroleum Exploration and Production Research Institute for providing the well data. We thank Xinbing Wang and Jie Ouyang of Shanghai Jiaotong University, and Youyuan Que of Chengdu University of Technology for their help. We thank four anonymous reviewers for their constructive comments. This work was financially supported by the National Natural Science Foundation of China (Grant Nos. 42050104, 42050102, and 41888101), the Everest Scientific Research Program of Chengdu University of Technology (Grant No. 2020ZF11402), and the Open Fund (PLC20211102) of the State Key Laboratory of Oil and Gas Reservoir Geology and Exploitation of Chengdu University of Technology.
Appendix
Depth CAL GR KTH RS RD CNL DEN AC LF Pred_LF
1 6.15 80.4291 49.02506 1882.66084 2465.33467 23.5858 2.56983 70.07048 2 2
2 10.0826 35.64093 14.30196 1882.66084 2465.33467 4.83032 2.57616 56.32379 2 2
3 12 86.65779 59.61331 99.999 99.999 31.65261 2.37405 92.7531 2 2
4 9.28946 80.30837 36.75566 187.02668 187.02668 13.65303 2.71189 61.53134 2 2
5 13.32 71.3935 54.37662 199.998 199.998 17.16059 2.65742 62.88762 2 2
6 14.43079 35.40132 13.35704 704.97986 804.39217 6.18639 2.77384 55.84355 2 2
7 9.85388 73.13059 47.98425 99.999 99.999 13.29438 2.80054 62.22012 2 2
8 10.05728 55.38639 38.25445 99.999 99.999 15.45227 2.74936 69.66574 2 2
9 9.73919 56.06553 58.79256 99.999 99.999 12.65269 2.75482 62.52381 2 2
10 6.06 100.8231 68.80709 99.999 99.999 47.49839 2.49556 97.12797 2 2
11 10.08589 58.42329 38.48767 192.69204 321.91488 13.13388 2.41146 57.53466 2 2
12 13.72588 89.52375 62.38312 99.999 99.999 20.55217 2.69988 65.75 2 2
13 13.20228 55.70792 30.81312 99.999 99.999 13.44889 2.7537 60.80453 2 2
14 10.45222 74.46941 51.44168 88.27183 88.27183 14.61185 2.66517 59.3116 2 2
15 9.1995 44.39254 20.98508 99.999 99.999 7.54824 2.7366 58.49 2 2
16 12.78 39.36052 19.77631 4103.47807 2859.62791 4.66071 2.51005 52.78241 2 2
17 12.27 48.6906 19.32181 99.999 99.999 6.80516 2.62306 61.02837 2 2
18 10.77 38.60045 12.45034 433.17927 366.58814 6.91412 2.71335 51.9635 2 2
19 13.31011 69.77045 49.1659 167.04363 134.08927 19.03915 2.66255 71.20625 2 4
20 9.94626 70.44776 53.79971 118.06229 118.06229 9.74228 2.73455 56.30065 2 2
21 13.35975 70.95368 49.88742 155.82662 155.82662 19.19767 2.66879 68.88938 2 2
22 12.145 74.79503 56.1 96.66767 96.66767 18.06006 2.5532 62.32161 2 2
23 13.64614 71.04507 47.36288 162.37231 162.37231 14.74628 2.69725 63.06861 2 2
24 9.42 40.65059 30.45176 299.997 299.997 4.57074 2.76597 54.86043 2 2
25 14.00749 35.16926 15.17297 2915.05231 2584.19722 8.55317 2.70538 48.6658 2 2
26 13.29374 61.80037 39.26222 99.999 99.999 13.47583 2.685 59.93317 2 2
27 12.48 74.82366 51.57546 99.999 99.999 26.80518 2.70822 74.26067 0 5
28 13.52263 72.33448 52.11395 147.54306 147.54306 13.16199 2.67685 59.04712 0 0
29 13.1131 77.35493 60.97536 99.999 99.999 16.69163 2.76716 60.4819 0 0
30 13.56004 50.96991 34.76947 160.02765 160.02765 8.14161 2.65801 56.55477 0 0
31 13.11 84.29359 66.33217 99.999 99.999 24.26622 2.66474 64.41744 0 0
32 13.03882 62.3871 30.57635 99.999 99.999 17.56594 2.74466 62.11369 0 0
33 13.53 50.51701 37.68691 199.998 199.998 13.86955 2.68186 56.31707 0 0
34 9.69619 42.70314 30.17063 386.2439 293.12095 4.29452 2.74159 54.73833 0 0
35 13.87067 55.03557 32.33222 477.40086 677.39886 13.71742 2.64387 58.93368 0 0
36 17.05901 89.58663 24.97327 99.999 99.999 38.37181 2.33176 89.27911 0 0
37 13.63739 79.19363 62.00666 99.999 99.999 17.12112 2.68696 66.43867 0 0
38 13.3272 54.65521 36.95284 124.80128 62.40064 19.22281 2.5872 68.62668 0 0
39 6.12 89.13534 77.85554 99.999 99.999 23.76417 2.3544 75.96704 0 0
40 12.68143 74.9143 56.2607 99.999 99.999 17.80214 2.73029 63.56572 0 0
41 9.59593 39.98655 29.19138 1127.11506 927.11706 3.56366 2.77 52.63016 0 0
42 13.88413 42.75392 32.38891 599.994 743.90605 9.31298 2.70273 54.87865 0 0
43 12.50992 75.97506 43.57431 99.999 99.999 16.01698 2.73099 64.81347 0 0
44 13.54276 84.21612 59.87126 99.999 99.999 18.15722 2.69203 66.45849 0 0
45 15.69962 43.47771 20.03145 15070.24024 12873.94945 4.23225 2.64422 56.48689 0 0
46 13.06862 78.76302 50.97203 27.33078 27.33078 21.75736 2.70179 64.52019 0 0
47 13.64229 35.91432 16.45716 625.70569 625.70569 6.22844 2.69931 54.88828 0 0
48 13.46733 71.15701 51.9 99.999 99.999 13.94734 2.67866 59.92978 0 0
49 5.5049 51.87701 36.10859 163.80265 453.36543 10.75605 2.68183 60.24462 0 0
50 9.17274 57.81287 16.27258 4177.39077 3783.84255 3.1312 2.66184 54.64 0 0
51 9.72385 41.97666 29.13149 287.15745 712.83255 8.88159 2.46477 53.35124 0 0
52 12.49677 63.37091 41.01614 99.999 99.999 16.58491 2.7 68.6525 0 0
53 9.35423 65.57315 43.51745 199.998 199.998 10.93681 2.74038 58.57884 0 0
54 12 75.97904 51.71946 99.999 99.999 16.15374 2.69949 64.77054 0 0
55 13.19818 65.98618 43.87597 99.999 99.999 20.88095 2.428 85.02613 0 0
56 9.90819 58.52673 38.97744 99.999 175.76448 16.12269 2.42758 60.77671 0 0
57 12.24 50.36434 35.79666 99.999 99.999 10.30863 2.6759 63.29823 0 0
58 13.61618 77.70637 59.21489 74.46864 74.46864 24.29768 2.69787 64.47916 0 5
59 6.09 86.57905 68.93239 99.999 99.999 27.80837 2.52761 81.49065 0 0
60 6.12 73.38744 53.49645 99.999 91.07347 14.95044 2.68018 61.54255 0 0
61 12.71458 62.57712 41.70334 99.999 199.998 21.01412 2.61258 67.71134 0 0
62 13.66802 50.45989 36.48022 99.999 99.999 11.56243 2.52599 64.98124 0 0
63 9.82569 66.23537 56.82174 99.999 38.07573 9.03938 2.70124 57.45856 3 3
64 9.98323 63.45241 44.62338 399.996 688.70326 10.1086 2.36128 56.35209 3 3
65 10.22533 65.93668 45.31662 199.998 219.25928 7.82607 2.76653 55.00019 3 1
66 9.47478 57.47589 34.89779 1499.985 1399.986 3.74807 2.61557 61.82358 3 3
67 10.8719 63.54559 35.8861 99.999 99.999 11.86697 2.69441 57.21736 3 3
68 9.39 58.80805 66.60007 573.67603 547.35806 10.80001 2.78737 56.1511 3 3
69 10.03947 94.67094 72.90798 99.999 99.999 33.09994 2.44347 82.62622 3 3
70 10.785 94.50009 76.49991 99.999 99.999 28.84247 2.527 92.31477 3 3
71 13.53704 75.80309 50.93827 41.15097 99.999 20.23325 2.69212 59.77787 3 3
72 15.54873 90.38726 59.90174 14.543 99.999 21.70106 2.48734 64.14006 3 3
73 9.15144 74.72295 40.16397 99.999 99.999 11.5498 2.7379 62.33472 3 3
74 10.71529 46.84422 30.82067 463.21744 463.21744 5.0785 2.7005 53.1 3 3
75 13.29198 87.49949 60.53033 99.999 99.999 29.42866 2.6308 76.1492 4 4
76 14.70087 42.77217 24.72783 299.997 299.997 12.78245 2.53502 67.02224 4 4
77 12 79.31985 69.32978 99.999 99.999 19.33295 2.62609 65.35395 4 4
78 14.34665 97.27278 60.55603 99.999 99.999 32.17405 2.62983 57.29517 4 4
79 13.26083 73.18753 53.72078 100.55313 99.999 14.92869 2.68223 62.66269 4 4
80 9.93 32.70556 16.62223 1983.31056 1535.1682 1.44732 2.68522 49.75722 4 4
81 9.33288 98.80163 22.05429 99.999 99.999 23.33444 2.48203 84.99506 4 4
82 13.34554 53.07768 33.5714 285.11399 199.998 16.54795 2.65689 58.99089 4 4
83 15.64188 71.98863 44.26137 99.999 99.999 27.11187 2.56974 72.31267 4 4
84 15.80576 71.27078 44.22716 99.999 99.999 24.54789 2.6457 65.95476 4 4
85 13.05 77.4886 63.2215 99.999 99.999 15.95597 2.69626 66.17191 4 4
86 13.35 69.52344 42.61225 26.52976 26.52976 23.71541 2.71886 70.54735 4 4
87 10.04773 36.46016 20.89414 2954.65745 2854.65845 4.18604 2.60785 56.22898 4 4
88 13.07512 78.87925 64.68841 99.999 99.999 16.07388 2.6719 65.93422 4 4
89 9.87 39.74325 15.20375 1536.56595 1135.07089 2.57286 2.70599 51.74141 4 4
90 12.48691 49.41777 26.76845 453.9112 553.9102 14.10634 2.69738 54.1498 4 4
91 13.02742 72.15618 46.568 4.1214 4.1214 13.44518 2.73476 68.3356 4 4
92 12.78965 58.85087 31.50779 333.90732 399.996 7.66949 2.65454 58.27 4 4
93 10.02 56.87008 24.16221 642.43635 561.6229 9.73868 2.48719 59.25552 4 4
94 9.27 71.07789 49.26583 399.996 399.996 8.4257 2.72688 58.78241 6 6
95 10.56724 58.36049 40.72238 399.996 399.996 8.10792 2.52883 55.16346 6 6
96 9.96265 56.58071 55.69639 99.999 99.999 9.7 2.46406 52.73776 6 6
97 9.25895 69.46597 53.39212 131.58129 131.58129 11.76998 2.69379 59.90259 6 6
98 9.27 65.1377 64.07522 691.67442 591.67542 7.96629 2.726 56.95915 6 6
99 9.65086 53.69202 25.95 4548.31112 2343.69061 10.05106 2.47452 70.30298 6 6
100 9.72 81.59174 37.39748 198.6962 198.6962 12.33516 2.50642 59.09248 6 6
101 9.42 74.55976 65.65118 198.6962 198.6962 13.97995 2.73571 60.9057 6 6
102 9.38829 76.38865 51.96873 99.999 99.999 14.05915 2.73069 59.01487 6 6
103 9.35793 76.44656 53.05508 99.999 99.999 13.29152 2.72572 58.88324 6 6
104 9.96796 57.81753 67.05 99.999 99.999 17.36889 2.34439 58.09848 6 6
105 9.27 70.05 62.43926 678.00324 578.00424 8.28445 2.71468 57.66932 6 6
106 10.70344 61.47396 44.7873 326.36319 326.36319 7.34577 2.59516 55.68199 6 6
107 9.32134 85.46444 53.16792 99.999 99.999 13.92995 2.69379 60.65255 6 6
108 9.40954 77.72389 56.68615 99.999 99.999 13.21939 2.72251 62.40148 6 6
109 10.96533 63.31779 44.52095 299.997 299.997 7.10487 2.57869 55.85533 6 6
110 9.27 67.60488 53.30631 383.09105 322.06003 9.50646 2.70631 59.07152 6 6
111 10.09754 76.35 53.36085 99.999 99.999 20.45106 2.2766 60.70699 6 6
112 9.39327 76.31583 56.68292 99.999 99.999 13.78405 2.71711 59.19791 6 6
113 12.84864 46.27384 37.0217 470.14977 570.14877 11.78853 2.66882 55.48774 5 5
114 11.16724 95.91946 70.53093 93.38387 93.38387 30.05367 2.5277 75.6346 5 5
115 13.4215 51.85269 32.36495 230.13788 260.27777 18.72633 2.60679 56.77215 5 0
116 13.37673 60.35273 36.45818 99.999 99.999 19.70321 2.61848 65.94065 5 5
117 11.07 79.7206 60.01764 194.11692 194.11692 14.96411 2.708 58.39294 5 5
118 12.57388 78.92154 59.54354 98.56287 99.999 19.87046 2.68414 65.98896 5 5
119 15.51743 78.76194 37.43761 99.999 99.999 42.05229 2.44861 64.52286 5 5
120 11.04 70.93989 54.85562 365.91385 365.91385 10.11451 2.66477 58.71 5 5
121 11.04 71.375 54.625 398.60696 398.60696 9.99556 2.66025 58.70542 5 5
122 16.69902 89.97855 49.01109 99.999 99.999 26.59561 2.56673 65.86293 5 5
123 16.80973 75.36695 49.25495 99.999 99.999 27.7442 2.28474 98.02516 5 5
124 13.5483 79.55849 49.89801 99.999 99.999 25.79649 2.65165 72.14906 5 5
125 12.49773 68.34545 36.7807 70.45324 70.45324 22.75169 2.68418 69.96749 5 5
126 12.76622 98.61191 64.51744 99.999 99.999 43.43385 2.14086 101.48913 5 5
127 11.97 63.76013 56.7 99.999 99.999 9.18231 2.5911 56.27886 5 5
128 14.92267 48.88144 43.99008 99.999 99.999 21.97273 2.70261 64.40471 5 5
129 15.09661 81.3763 63.35104 99.999 99.999 27.0126 2.54108 80.60191 5 5
130 12.55722 60.21529 43.0486 99.999 99.999 19.39 2.65056 60.44151 5 5
131 15.95176 77.3118 51.38557 99.999 99.999 20.58787 2.64227 61.94958 5 5
132 12.57071 49.12815 28.60916 99.999 99.999 15.44103 2.70499 62.09782 5 0
133 15.71573 69.93845 30.2386 115.30098 30.60395 18.78213 2.59765 59.75168 5 5
134 12.53888 86.43136 65.17051 99.999 99.999 23.57 2.61669 70.71889 5 5
135 12.48451 69.18495 54.33275 99.999 99.999 15.81643 2.74107 60.22437 5 5
136 12.57 69.02216 32.49162 99.999 99.999 28.01908 2.60937 75.25281 5 5
137 12.09249 84.63438 60.11146 99.999 99.999 25.81602 2.60288 69.38305 5 5
138 12.55378 78.79668 59.88753 99.999 99.999 25.96826 2.63865 77.37409 5 5
See text for abbreviations of well logs. LF, lithofacies; Pred_LF, lithofacies predicted by the XGBoost model. 0-BCCS; 1-MCS; 2-LTBS; 3-PBSM; 4-APSM; 5-FPM; 6-CSSM; 7-MBSM; 8-SLSM.
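The LF and Pred_LF columns above can be scored with the same kinds of metrics used in the evaluation (accuracy, F1-score, and row-normalized confusion matrices). The plain-Python sketch below is illustrative only: the labels are encoded as runs read off the table (132 of the 138 rows agree), the support-weighted F1 average is an assumption because the averaging scheme is not stated here, and since the Appendix appears to list only a subset of samples, its scores need not match Tables 4 and 5. The helper names are hypothetical.

```python
from collections import Counter

# Lithofacies codes from the note under the Appendix table.
NAMES = {0: "BCCS", 1: "MCS", 2: "LTBS", 3: "PBSM", 4: "APSM",
         5: "FPM", 6: "CSSM", 7: "MBSM", 8: "SLSM"}

# LF column for rows 1-138 of the Appendix table, written as runs of equal labels.
lf = [2] * 26 + [0] * 36 + [3] * 12 + [4] * 19 + [6] * 19 + [5] * 26
# Pred_LF equals LF except in these rows (row number -> predicted code).
disagreements = {19: 4, 27: 5, 58: 5, 65: 1, 115: 0, 132: 0}
pred = [disagreements.get(row, code) for row, code in enumerate(lf, start=1)]


def normalized_row(true, predicted, label):
    """One row of a row-normalized confusion matrix: where samples of `label` ended up."""
    hits = Counter(p for t, p in zip(true, predicted) if t == label)
    total = sum(hits.values())
    return {p: n / total for p, n in hits.items()}


def f1_for(true, predicted, label):
    """Per-class F1 = 2TP / (2TP + FP + FN); 0.0 when the class never occurs."""
    tp = sum(t == p == label for t, p in zip(true, predicted))
    fp = sum(p == label != t for t, p in zip(true, predicted))
    fn = sum(t == label != p for t, p in zip(true, predicted))
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0


def weighted_f1(true, predicted):
    """Support-weighted mean of per-class F1 over the true classes (assumed averaging)."""
    support = Counter(true)
    return sum(n * f1_for(true, predicted, c) for c, n in support.items()) / len(true)


accuracy = sum(t == p for t, p in zip(lf, pred)) / len(lf)
print(f"accuracy = {accuracy:.3f}")  # 132 of 138 rows agree -> 0.957
print(f"weighted F1 = {weighted_f1(lf, pred):.3f}")
for code in sorted(set(lf)):
    row = normalized_row(lf, pred, code)
    print(NAMES[code], {NAMES[p]: round(share, 2) for p, share in sorted(row.items())})
```

Each printed row corresponds to one row of a normalized confusion matrix of the kind shown in Fig. 8; for example, 25 of the 26 LTBS rows are predicted as LTBS and one as APSM.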
References

Al-Mudhafar, W.J., 2017. Integrating well log interpretations for lithofacies classification and permeability modeling through advanced machine learning algorithms. J. Pet. Explor. Prod. Technol. 7, 1023–1033.
Allen, D.R., 1975. Identification of sediments: their depositional environments and degree of compaction from well logs. In: Chilingarian, G.V., Wolf, K.H. (Eds.), Compaction of Coarse-Grained Sediments, Developments in Sedimentology. Elsevier, New York, pp. 349–402.
Asquith, G., Krygowski, D., 2004. Basic Well Log Analysis. AAPG Memoirs (AAPG Special Volumes).
Baldwin, J.L., Bateman, R.M., Wheatley, C.L., 1990. Application of a neural network to the problem of mineral identification from well logs. The Log Analyst.
Bestagini, P., et al., 2017. A machine learning approach to facies classification using well logs. SEG Technical Program Expanded Abstracts, pp. 2137–2142.
Bhatt, A., Helle, H.B., n.d. Determination of Facies from Well Logs Using Modular Neural Networks.
Bize-Forest, N., Lima, L., Baines, V., Boyd, A., Abbots, F., Barnett, A., 2018. Using Machine-Learning for Depositional Facies Prediction in a Complex Carbonate Reservoir.
Cant, D.J., 1992. Subsurface facies analysis. In: Walker, R.G., James, N.P. (Eds.), Facies Models: Response to Sea Level Changes. Geological Association, pp. 27–45.
Cao, W., Zahirovic, S., Flament, N., Williams, S., Golonka, J., Müller, R.D., 2017. Improving Global Paleogeography since the Late Paleozoic Using Paleobiology, pp. 5425–5439.
Catuneanu, O., 2006. Principles of Sequence Stratigraphy. Elsevier.
Chawla, N.V., 2009. Data mining for imbalanced datasets: an overview. In: Data Mining and Knowledge Discovery Handbook, pp. 875–886.
Chawla, N.V., Bowyer, K.W., Hall, L.O., Kegelmeyer, W.P., 2002. SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res. 16, 321–357.
Chen, A., Zou, H., Ogg, J.G., Yang, S., Hou, M., Jiang, X., Xu, S., Zhang, X., 2020. Source-to-sink of Late Carboniferous Ordos Basin: constraints on crustal accretion margins converting to orogenic belts bounding the North China Block. Geosci. Front. 11, 2031–2052.
Chen, T., Guestrin, C., 2016. XGBoost: a scalable tree boosting system. In: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Association for Computing Machinery, pp. 785–794.
Delfiner, P., Peyret, O., Serra, O., 1987. Automatic determination of lithology from well logs. SPE Form. Eval. 2 (3), 303–310.
Deng, T., Xu, C., Jobe, D., Xu, R., 2019. A comparative study of three supervised machine-learning algorithms for classifying carbonate vuggy facies in the Kansas Arbuckle Formation. J. Form. Eval. Reserv. Descr. 60, 838–853.
Dubois, M., 2007. Comparison of four approaches to a rock facies classification problem. Comput. Geosci. 33 (5), 599–617.
Feng, R., 2021. Improving uncertainty analysis in well log classification by machine learning with a scaling algorithm. J. Petrol. Sci. Eng. 196.
Golonka, J., 2007. Late Triassic and Early Jurassic palaeogeography of the world. Palaeogeogr. Palaeoclimatol. Palaeoecol. 244, 297–307.
Goodfellow, I., Bengio, Y., Courville, A., 2016. Deep Learning. MIT Press.
Guo, T., 2013. Evaluation of highly thermally mature shale-gas reservoirs in complex structural parts of the Sichuan Basin. J. Earth Sci. 24 (6), 863–873.
Hall, B., 2016. Facies Classification Using Machine Learning. Lead. Edge.
Hanin, B., 2018. Which neural net architectures give rise to exploding and vanishing gradients? In: Advances in Neural Information Processing Systems, pp. 580–589.
Hinton, G.E., Osindero, S., Teh, Y.-W., n.d. A Fast Learning Algorithm for Deep Belief Nets.
Horne, J.C., Ferm, J.C., Caruccio, F.T., Baganz, B.P., 1978. Depositional models in coal exploration and mine planning in Appalachian regions. Am. Assoc. Petrol. Geol. Bull. 62, 2379–2411.
Jordan, M.I., Mitchell, T.M., 2015. Machine learning: trends, perspectives, and prospects. Science.
Lai, J., Wang, G., Wang, S., Cao, J., Li, M., Pang, X., Zhou, Z., Fan, X., Dai, Q., Yang, L., 2018. Review of diagenetic facies in tight sandstones: diagenesis, diagenetic minerals, and prediction via well logs. Earth Sci. Rev. 185, 234–258.
Laurikkala, J., 2001. Improving identification of difficult small classes by balancing class distribution. In: Conference on Artificial Intelligence in Medicine in Europe. Springer, Berlin, Heidelberg, pp. 63–66.
Laya, J., Tucker, M., 2012. Facies analysis and depositional environments of Permian carbonates of the Venezuelan Andes: palaeogeographic implications for Northern Gondwana. Palaeogeogr. Palaeoclimatol. Palaeoecol. 331, 1–26.
Li, Y., He, D., 2014. Evolution of tectonic-depositional environment and prototype basins of the Early Jurassic in Sichuan Basin and adjacent areas. Acta Pet. Sin. 35, 219–232.
Lim, J.-S., Kang, J.M., Kim, J., 1997. Multivariate statistical analysis for automatic electrofacies determination from well log measurements. SPE.
Liu, S., Deng, B., Jansa, L., Li, Z., Sun, W., Wang, G., Luo, Z., Yong, Z., 2018. Multi-stage basin development and hydrocarbon accumulations: a review of the Sichuan Basin at eastern margin of the Tibetan Plateau. J. Earth Sci. 29, 307–325.
Longadge, M.R., Snehlata, M., Dongre, S., Latesh Malik, D., 2013. Class Imbalance Problem in Data Mining: Review. International Journal of Computer Science and Network.
Ma, Y., et al., 2008. Petroleum geology of the Puguang sour gas field in the Sichuan Basin, SW China. Mar. Petrol. Geol. 25 (4–5), 357–370.
Ma, Y.S., Cai, X.Y., Zhao, P.R., Luo, Y., Zhang, X.F., 2010. Distribution and further exploration of the large-medium sized gas fields in Sichuan Basin. Acta Pet. Sin. 31, 347–354.
Meng, Q.R., Wang, E., Hu, J.-M., 2005. Mesozoic sedimentary evolution of the northwest Sichuan basin: implication for continued clockwise rotation of the South China block. Geol. Soc. Am. Bull. 117, 396–410.
Miall, A.D., 1995. Whither stratigraphy? Sediment. Geol. 100, 5–20.
Nazeer, A., et al., 2016. Sedimentary facies interpretation of Gamma Ray (GR) log as basic well logs in Central and Lower Indus Basin of Pakistan. Geodesy Geodyn. 7 (6), 432–443.
Nielsen, A., Schovsbo, N., 2011. The Lower Cambrian of Scandinavia: depositional environment, sequence stratigraphy and palaeogeography. Earth Sci. Rev. 107 (3–4), 207–310.
Pedregosa, F., et al., 2011. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830.
Radwan, A.E., 2021. Modeling the depositional environment of the sandstone reservoir in the Middle Miocene Sidri member, Badri field, Gulf of Suez basin, Egypt: integration of gamma-ray log patterns and petrographic characteristics of lithology. Nat. Resour. Res. 30, 431–449.
Rider, M.H., 1990. Gamma-ray log shape used as a facies indicator: critical analysis of an oversimplified methodology. Geol. Soc. Spec. Publ. 48, 27–37.
Rogers, S.J., Fang, J.H., Karr, C.L., Stanley, D.A., 1992. Determination of lithology from well logs using a neural network. Am. Assoc. Petrol. Geol. Bull. 76, 731–739.
Scotese, C., 2021. An atlas of Phanerozoic paleogeographic maps: the seas come in and the seas go out. Annu. Rev. Earth Planet Sci. 49, 679–728.
Selley, R.C., 1976. Subsurface environmental analysis of North Sea sediments. Am. Assoc. Petrol. Geol. Bull. 60, 184–195.
Tan, M., Song, X., Yang, X., Wu, Q., 2015. Support-vector-regression machine technology for total organic carbon content prediction from wireline logs in organic shale: a comparative study. J. Nat. Gas Sci. Eng. 26, 792–802.
Vapnik, V., 1998. Statistical Learning Theory. Wiley, New York.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I., 2017. Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008.
Wang, C., Hazen, R.M., Cheng, Q., Stephenson, M.H., Zhou, C., Fox, P., Shen, S., Oberhänsli, R., Hou, Z., Ma, X., Feng, Z., Fan, J., Ma, C., Hu, X., Luo, B., Wang, J., Schiffries, C.M., 2021. The Deep-Time Digital Earth program: data-driven discovery in geosciences. Natl. Sci. Rev. 8, 2021.
Zhang, L., et al., 2016. Lithologic characteristics and diagenesis of the Upper Triassic Xujiahe formation, Yuanba area, northeastern Sichuan Basin. J. Nat. Gas Sci. Eng. 35, 1320–1335.
Zheng, D., Wu, S., Hou, M., 2021. Fully connected deep network: an improved method to predict TOC of shale reservoirs from well logs. Mar. Petrol. Geol. 105205.
Zheng, D.Y., Wu, S.X., 2021. Principal component analysis of textural characteristics of fluvio-lacustrine sandstones and controlling factors of sandstone textures. Geol. Mag. 158 (10), 1847–1861.
Zheng, D.Y., Yang, W., 2020. Provenance of upper Permian-lowermost Triassic sandstones, Wutonggou low-order cycle, Bogda Mountains, NW China: implications on the unroofing history of the eastern north Tianshan Suture. J. Palaeogeogr. 9.
Zheng, R., Dai, Z., Luo, Q., Wang, X., Lei, G., Jiang, H., Hu, C., 2011. Sedimentary system of the upper Triassic Xujiahe Formation in the Sichuan foreland basin. Nat. Gas. Ind. 31, 16–24.
Zhu, H., et al., 2014. Three-dimensional facies architecture analysis using sequence stratigraphy and seismic sedimentology: example from the Paleogene Dongying Formation in the BZ3-1 block of the Bozhong Sag, Bohai Bay Basin, China. Mar. Petrol. Geol. 51, 20–33.