Data-Driven Machine Learning
Muhammad Ali, Peimin Zhu, Ma Huolin, Ren Jiang, Hao Zhang, Umar Ashraf
& Wakeel Hussain
To cite this article: Muhammad Ali, Peimin Zhu, Ma Huolin, Ren Jiang, Hao Zhang, Umar
Ashraf & Wakeel Hussain (18 Oct 2024): Data-driven machine learning approaches for precise
lithofacies identification in complex geological environments, Geo-spatial Information Science,
DOI: 10.1080/10095020.2024.2405635
1. Introduction
Lithofacies classification is the foundation of geological surveys, using logging data to assign lithofacies types to rock samples (Dubois, Bohling, and Chakrabarti 2007). Lithology is of great significance in reservoir evaluation. Different lithofacies have specific ranges of pore and permeability changes, and pore and permeability data can be fitted better according to the lithofacies result (Chang, Kopaska, and Chen 2002). The study of sedimentary microfacies is crucial for accurate reservoir prediction, especially in scenarios with limited coring well data. Precise lithofacies identification using logging data becomes crucial for predicting tight sandstone formations (Ali et al. 2023a). The inherent challenges of tight reservoirs, characterized by matrix microporosity, low permeability, complex layering, and strong heterogeneity, highlight the difficulty of achieving high-precision lithofacies identification based on the classification of rock structure components. The complex layering and strong heterogeneity further compound the difficulty of accurately classifying lithofacies based solely on the classification of rock structure components (Bloch, Lander, and Bonnell 2002; Lai et al. 2018b; Zhang, Ambrose, and Xie 2021). In this context, the utilization of traditional methods that heavily rely on extensive core sample images for lithofacies prediction has proven challenging due to limited data availability. Therefore, a conventional approach plays a crucial role in lithofacies identification to obtain information about lithofacies from tight reservoirs (Liu et al. 2020; Lyu et al. 2019; Valentín et al. 2019; Zhou et al. 2016). Several traditional techniques have been proposed to classify lithofacies beyond core samples. One such
method is conventional logging identification, which includes the structural characteristic parameter method (Feng et al. 2018; Li et al. 2020), the curve overlap method (Lai et al. 2020), etc. However, these methods often depend more on the experience and knowledge of interpreters and generally exhibit low accuracy. The second approach involves special logging identification methods, such as the use of imaging logging to identify lithofacies. However, these methods are challenging to apply widely due to their high cost. To address this issue, the conventional method enables more efficient and accurate reservoir lithofacies classification using efficient and reliable machine learning methods in complex reservoirs for exploration, development, and production.
In recent years, with the focus on leveraging artificial intelligence for subsurface characterization in the oil and gas industry, there has been notable progress (Antariksa, Muammar, and Lee 2022; Song et al. 2021; Valentín et al. 2019). Chawshin et al. (2021) developed a convolutional neural network (CNN) that utilizes 2D core CT scan image slices as input to perform automatic lithofacies prediction. Kim (2022) developed an integrated approach for lithofacies classification in the Eagle Ford shale and the Austin Chalk, addressing challenges in defining geological heterogeneity in these unconventional reservoirs. They utilized core samples, thin sections, SEM images, and wireline logs, including natural gamma (GR), deep resistivity (LLD), sonic (DT), and density (RHOB) logs. A CNN model was trained to classify four lithofacies based on various geological features, providing a data-driven method for reservoir characterization. Zhen et al. (2023) utilized classical boosting machine learning algorithms to identify deep-water submarine fan lithofacies types in a West African oilfield. By addressing sample imbalance through oversampling techniques and optimizing hyperparameters with a genetic algorithm, the proposed MAHAKIL-GA-GBDT algorithm achieved a high accuracy of 0.986. Dixit, McColgan, and Kusler (2020) applied machine learning algorithms to predict rock facies in the Umiat Oil Field of Alaska. Utilizing limited core data and mineralogical information, they identified five sand reservoir lithofacies in the Lower Member Grandstand. The integration of machine learning algorithms, including self-organizing maps, with wireline log data resulted in successful facies predictions, particularly validated in nearby uncored wells using observed seismic data. Geng et al. (2020) proposed a learning-based non-linear regression method, support vector machine regression (SVR), for accurate modeling of the radiometric transforming relation in relative radiometric normalization (RRN) of coarse-resolution data. They conducted experiments on synthetic and real data, comparing SVR with other methods such as linear regression, artificial neural network (ANN), and random forest (RF), demonstrating the effectiveness of SVR in enhancing RRN performance. Al-Qaness et al. (2022) utilized a model, AOOBL-ANFIS, to enhance the adaptive neuro-fuzzy inference system (ANFIS) for oil production estimation. They optimized ANFIS parameters using a modified aquila optimizer (AO) with the opposition-based learning (OBL) technique. The AOOBL-ANFIS model outperformed classic ANFIS, other modified ANFIS models, and time series forecasting methods in terms of performance metrics and computational time. Pei et al. (2023) introduced a deep learning-based algorithm called FCN-Attention for classifying line-of-sight (LOS) and non-line-of-sight (NLOS) propagation in ultra-wideband (UWB) Location-Based Services. FCN-Attention utilizes a fully convolutional network (FCN) and a self-attention mechanism to improve feature extraction and description, achieving high classification accuracies on various datasets and outperforming existing algorithms. Alzubaidi et al. (2021) introduced a CNN-based method that used core images to predict lithology automatically and quickly, but the method did not perform well in the subdivision of rock types. Zhang et al. (2021) used convolutional neural networks to build a deep learning model for lithofacies identification from core images. This work effectively provided the first-glance analysis of core data; however, the generalization of the model needed to be improved. Although these methods greatly reduced the identification time, they still required many core sample images for network training, and labeling the samples was also a challenge. Therefore, relatively low-cost well logs instead of core samples were used for lithofacies identification.
Log data is widely used in lithofacies identification and evaluation due to its high vertical resolution and good continuity (Lai et al. 2018a). The composition and structure of the reservoir will lead to different lithofacies being divided, and the corresponding logging response will also be different (Hemmesch et al. 2014; Ozkan et al. 2011). Therefore, Bhattacharya, Carr, and Pal (2016) input five one-dimensional logs and other derived parameters into the lithofacies model using three machine learning algorithms, such as ANN, and proved that lithofacies identification could be modeled in that way. Similarly, Wu et al. (2020) selected deep resistivity (RT), spontaneous potential (SP), natural gamma (GR), sonic (DT), compensated neutron (CNL), and density (DEN) to summarize the logging response characteristics of five lithofacies based on the experimental results of core composition analysis,
and successfully predicted the distribution of each lithofacies in a single well. He et al. (2016) optimized the identification model constructed by DEN, AC, RT, and other logs through the comparison of core observation and X-ray diffraction, and qualitatively identified lithofacies through the intersection diagram. Compared with GR, DEN, DT, and other one-dimensional logs, resistivity images can directly observe formation changes and identify lithofacies boundaries. Their appearance improves the accuracy of lithofacies identification. Nishitsuji and Exley (2019) address a key challenge in the energy industry by optimizing deep-learning architectures, with a focus on labor-intensive hyperparameter optimization. Using Optuna, a global optimizer, the study fine-tunes parameters for an extended long short-term memory model in predicting lithological facies. Although the macro difference with and without Optuna is minor, the results indicate notable commercial impacts, particularly in scenarios with small yet challenging targets. Ying and Bao-Zhi (2011) used the support vector machine (SVM) algorithm to process conventional logs such as natural gamma, photoelectric absorption cross-section index, etc. They explained it with the help of micro-resistivity images and finally made a better analysis of the volcanic lithofacies. However, the SVM is difficult to apply to large-scale training samples, and the neural network easily falls into a local optimum (LeCun, Bengio, and Hinton 2015). Therefore, Yu et al. (2021) established a lithology identification and classification model using the gradient boosting decision tree (GBDT) ensemble learning algorithm. The model correctly identified the lithofacies of the volcanic rocks using core and FMI-calibrated conventional logs as input. On this basis, to further improve the efficiency and accuracy of identification, research has been carried out. Lan et al. (2021) proposed a semi-supervised learning strategy for conventional log data based on positive and unlabeled machine learning, which only marked limited log samples, and successfully obtained five carbonate logging lithofacies, but the accuracy of the results needed to be improved. However, most of the above lithofacies identification methods require certain a priori judgment results for guidance. This kind of method is greatly influenced by manual subjectivity and has low precision and a huge workload. Therefore, it is necessary to identify lithofacies automatically. Tian et al. (2016) used the multi-resolution graph-based clustering (MRGC) method to automatically cluster the logs of the Amu Darya basin without prior knowledge, and finally obtained different lithofacies. Chai et al. (2009) designed an automatic lithofacies classification method for sedimentary facies of reef-shoal reservoirs. These researchers once again demonstrated the trend toward designing automatic lithofacies classification and identification methods.
Therefore, the objective of this paper is to introduce a comprehensive approach for lithofacies identification in tight sandstone formations using machine learning techniques. Given the challenges associated with limited core data, our focus is on leveraging the efficiency, speed, and accuracy of machine learning methods to address the complexities of tight reservoirs. Specifically, we have chosen four widely used techniques: self-organizing map (SOM), multi-resolution graph-based clustering (MRGC), K-nearest neighbor (KNN), and artificial neural network (ANN). In this study, the Lower Goru formation in the Kadanwari block of the central Indus Basin serves as a testbed. Two core sample wells and two identification and verification wells have been selected for a thorough comparative analysis of lithofacies identification. By showcasing the effectiveness of the chosen machine learning techniques in this real-world scenario, we aim to establish their applicability and reliability in enhancing lithofacies prediction. Ultimately, the culmination of this research will involve applying the truncated Gaussian simulation technique to create a facies model based on the lithofacies identified through the machine learning approaches. This model will not only contribute to a deeper understanding of lithofacies distribution in the studied reservoir but will also assist in identifying prospective areas with potential for tight sandstone formations. Through this research, we seek to establish a robust methodology for efficient and accurate lithofacies prediction in challenging geological settings. In summary, our study differentiates itself from previous approaches through its emphasis on log data, the careful selection of diverse machine learning techniques, and the rigorous validation of results in a specific geological context. This innovative combination contributes to the advancement of lithofacies identification methodologies, offering a more robust and applicable solution for reservoir prediction in tight sandstone formations.

2. Geological setting and data analysis

In this section, we provide a comprehensive overview of the geological setting and data analysis for the Kadanwari and Sawan gas field in the Lower Goru formation within the Central Indus Basin, Pakistan (Figure 1(a)). The section is structured as follows.

2.1. Geographical and geological description

The study area encompasses the Kadanwari and Sawan gas field, focusing on the conventional sands in D, E, F, and the tight G sand layer. This region, situated in the Central Indus Basin, is characterized by a complex geometrically progradational sequence environment, formed during three significant tectonic events (Ahmad and Chaudhry 2002; Ali et al. 2019, 2020; Ashraf et al. 2019). The structural configuration
Figure 1. (a) Geographical position of the study area, (b) lithological composition within the study region, and (c) the sedimentology model.
of the field, shaped by these tectonic events, has a significant impact on the reservoir characteristics (Figure 1(a)). The Lower Goru sands have been divided into seven sand-bearing intervals (Figure 1(b)) from bottom B-Sand to top H-Sand (Ahmad and Chaudhry 2002). The primary producing sands in the area are E-Sand and G-Sand, while D-Sand and F-Sand have also yielded production from select wells (Ali et al. 2023b; Ali et al. 2019, 2020). In Kadanwari, E-Sand, the main producer, is characterized as a conventional reservoir, forming an elongate body trending SW-NE parallel to the paleo shoreline of the Early Cretaceous time. However, B, C, D, G, and H exhibit the tight characteristics. G-Sand has been productive post-hydraulic fracturing, and F-Sand exhibits hot sand characteristics in the field (Ashraf et al. 2019, 2020).

2.2. Deltaic system characteristics

The Kadanwari and Sawan (from C to H layers) of the Lower Goru represent a clastic delta system characterized by a river-dominant regime with additional wave and tidal transformations. River dynamics leave their mark on both sand-prone “proximal” and fine-grained “distal” facies. Proximal facies exhibit cross-bedded medium to coarse sandstones, while distal facies are typified by hummocky cross-lamination, associated with hyperpycnal flow during massive seasonal storms
and floods (Valzania et al. 2011). Distinctive variations in the size and shape of delta lobes deposited at different stages are evident (Figure 1(c)).

2.3. Data description and analysis

The logging and core data used in this paper are from four wells, K-15, K-14, K-13, and S-10, in the Lower Goru formation of the Kadanwari and Sawan gas field block in the central Indus Basin. Two coring wells (Well K-15 and Well K-14) were selected as core sample wells, and the other two wells (Well S-10 and Well K-13) were selected as identification effect verification wells. We will utilize the selected well logs such as sonic (DT), density (RHOB), neutron (NPHI), deep lateral resistivity (LLD), photoelectric factor (PEF), spontaneous potential (SP), and natural gamma log (GR) for the input logging curve.

3. Methodology

The procedure of utilizing machine learning algorithms to predict lithofacies in wells where core facies data does not exist, based on raw well log data from training wells, comprises multiple stages, as depicted in Figure 2. It is essential to emphasize that, prior to training and employing the models to predict lithofacies in new wells, various critical measures need to be undertaken for data preprocessing, exploration, and sample preparation. These phases involve eliminating erroneous data, incorporating additional features and clustering, scaling and normalizing the data, creating sequential samples, and dividing the dataset into subsets designated for training and testing. These actions are instrumental in ensuring the excellence of the data samples that will subsequently be inputted into the machine learning models, ultimately leading to high-quality predictions.

3.1. Data cleaning

Due to instrument errors or recording errors, some outliers are inevitably generated in the logging data. When the model is sensitive to outliers, these outliers tend to negatively affect the results (Ashraf et al. 2024a, 2024b, 2024c; Valzania et al. 2011). The logging data basically conforms to the normal distribution (Zheng et al. 2021). To detect outliers, we
use the Pauta criterion (Li, Wen, and Wang 2016). Figure 3 shows that, according to the Pauta criterion, the confidence probability of judging gross error is 99.7%, which is based on three times the standard deviation. If a value falls outside this confidence interval, it is not a random error but a gross error. The outliers in this paper are filled with Lagrange interpolation.

3.2. Machine learning algorithms

In the course of this research, we strategically identified and selected four machine learning algorithms, taking into account the specific characteristics of the study area scenario and the constraints posed by limited core data availability. The chosen machine learning algorithms encompass SOM, KNN, MRGC, and ANN. This strategic selection not only addresses the complexities of the geological study area but also ensures a comprehensive and effective approach to our analysis.

3.2.1. Self-organizing map (SOM)
SOM is a type of ANN proposed by Teuvo Kohonen, a professor at Helsinki University in Finland, in 1981. Therefore, it is also known as the Kohonen algorithm (Kohonen 1991). The SOM is an unsupervised training neural network. By introducing the concept of a neighborhood function, it achieves self-organizing and unsupervised learning. In other words, all neurons are placed on a topology determined in advance according to prior knowledge (Bhattacharya, Carr, and Pal 2016; Cai and Chen 2022; Hussain et al. 2022; Wang et al. 2020). The introduction of the neighborhood function restricts SOM training, ensuring that the training does not fall into a local minimum. It adopts a two-dimensional SOM structure and consists of an input layer and a competition layer. The dimension of the input layer is consistent with the dimension of the input sample vector. The nodes of the competition layer are generally distributed in a two-dimensional array. A node in the competition layer represents a neuron, and each neuron is connected through lateral inhibition. The input layer and the competition layer are connected through full connection (Figure 4).
One key advantage of SOM is its ability to perform clustering and dimensionality reduction simultaneously. This makes it particularly useful for visualizing high-dimensional data in a lower-dimensional space. The competitive learning process, where neurons compete to be activated, allows SOM to identify patterns and relationships within the data. Additionally, SOM has been applied in various fields, such as image recognition, data mining, and feature extraction, showcasing its versatility in solving complex problems across different domains (Ali et al. 2022; Ali et al. 2023).
The specific steps of SOM algorithm implementation are as follows:
Network initialization: initialize the weights W, etc.
Randomly select an input vector $x_i$ from the input samples.
Calculate the distance between $x_i$ and each competition-layer neuron j, and find the neuron g with the smallest distance from $x_i$.
Adjust the weights of neuron g and of the neurons included in its neighborhood $N(T)$ according to the first step.
Here, $w_{ij}$ represents the weight between input layer node i and competition layer node j, and η represents the learning rate, which generally decreases with the number of evolutions.

3.2.2. Multi-resolution graph-based clustering (MRGC)
The multi-resolution graph-based clustering (MRGC) technique was proposed by Ye et al. (2000). MRGC is a multi-dimensional dot matrix pattern recognition method based on a nonparametric
Figure 4. The basic architecture of the self-organizing map (SOM). The input x is fully connected to the array of map nodes which is
most often and also in this illustration two-dimensional. Each map node is visualized as a circle on the grid.
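The SOM training steps listed in Section 3.2.1 (initialization, random sample selection, winner search, neighborhood update) can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the update rule w ← w + η·h·(x − w) is the standard Kohonen form, and the Gaussian neighborhood, the linear decay schedules, and all function and parameter names (`train_som`, `best_matching_unit`, `lr0`, `radius0`) are assumptions introduced here.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the map node whose weights are closest to x."""
    best, best_d = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            d = sum((xi - wi) ** 2 for xi, wi in zip(x, w))
            if d < best_d:
                best, best_d = (r, c), d
    return best


def train_som(samples, rows, cols, epochs=50, lr0=0.5, radius0=None, seed=0):
    """Train a small 2-D SOM; returns weights[row][col] as lists of floats."""
    rng = random.Random(seed)
    dim = len(samples[0])
    if radius0 is None:
        radius0 = max(rows, cols) / 2.0
    # Step 1: network initialization with random weights in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    total_steps = epochs * len(samples)
    for step in range(total_steps):
        # Step 2: randomly select an input vector.
        x = rng.choice(samples)
        # Step 3: find the winning neuron g (smallest distance to x).
        g = best_matching_unit(weights, x)
        # Assumed schedules: learning rate and radius shrink linearly.
        frac = step / total_steps
        lr = lr0 * (1.0 - frac)
        radius = max(radius0 * (1.0 - frac), 0.5)
        # Step 4: pull g and its grid neighbors toward x.
        for r in range(rows):
            for c in range(cols):
                grid_d2 = (r - g[0]) ** 2 + (c - g[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * radius ** 2))
                w = weights[r][c]
                for i in range(dim):
                    w[i] += lr * h * (x[i] - w[i])
    return weights
```

After training on normalized log vectors, `best_matching_unit` assigns each depth sample to a map node, and groups of nodes can then be interpreted as log facies.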
be obtained based on the calculation of KRI, in which KRI can be obtained by the following formula:

For instance, as depicted in Figure 5(a), the middle data point, which has a PI value of 0.9, along with the remaining neighboring points, the majority of which have PI values lower than 0.9, constructs an attractive set. Due to its independence from every other data point, $x_1$ is referred to as a “free attractor”. A similar situation occurs with the data point that has a PI value of 0.9, which is the middle point of Figure 5(b). As an outcome, the distance that separates $x_2$ and $x_1$ is greater than that of the majority of the points inside the dataset, indicating the attraction of $x_1$; more specifically, the KRI computation criterion is satisfied only by $x_2$, which has a PI value of 0.95. Equation 8 has been used to determine the KRI of the point $x_1$.

3.2.3. K-Nearest neighbor (KNN)
K-nearest neighbor (KNN) is one of the simplest mathematical classification and recognition algorithms based on the supervised machine learning technique (Soucy and Mineau 2001). In KNN, each sample can be represented by its nearest K neighbors, and new samples can be directly classified based on the pre-classification of the data set without undergoing a learning and training process (Bezdek, Chuah, and Leep 1986; Villegas et al. 2017). The neighbor of a sample to be classified is an object that has already been correctly classified; the category of the sample to be classified is determined according to the category of the nearest one or several samples. Therefore, the KNN method is suitable for classification problems with overlapping sample sets to be classified or overlapping class domains because it is not affected by outliers and its algorithm is simple and direct. It can also classify when the sample size and the number of features are small, but the number of samples per class is required to be balanced (Figure 6).
Moreover, KNN is a non-parametric and instance-based learning algorithm, meaning it does not make assumptions about the underlying data distribution and relies on the specific instances in the training set for making predictions. This characteristic makes KNN particularly useful in scenarios where the decision boundaries are complex and not easily defined by a simple mathematical function. The algorithm’s simplicity and flexibility, however, come at the cost of computational efficiency, especially as the size of the dataset grows. Despite its computational challenges, KNN remains a popular choice in various applications such as pattern recognition, image classification, and
recommendation systems, showcasing its versatility in solving diverse problems.
The specific steps of KNN algorithm implementation are as follows:
It is assumed that there are C classes $k_1, k_2, k_3, \ldots, k_C$, and each class $k_i$ has $P_i$ ($i = 1, 2, 3, \ldots, C$) samples labeled with that class. The discriminant function of the class $k_i$ is defined as:

$$g_i(x) = \min_{k} \lVert x - x_i^{k} \rVert, \quad k = 1, 2, \ldots, P_i \tag{9}$$

where i denotes the class $k_i$ and k denotes the k-th of the $P_i$ samples of class $k_i$.
According to Equation (9), the decision rule can be written as: if $g_j(x) = \min_i g_i(x)$, then the decision is $x \in k_j$. This decision-making technique is called the nearest neighbor technique; that is, for the unknown sample x, as long as the Euclidean distances between x and the $n = \sum_{i=1}^{m} n_i$ samples of known class are compared, x is assigned the same class as its nearest sample, and the class of sample x can be determined.

output, facilitating the learning process and enhancing the network’s predictive capabilities.
Assume a neural network with one hidden layer, an input layer of n nodes, and an output layer of m nodes. Here, $b_j$ indicates the output of the hidden layer, $\theta_j$ is the threshold of the hidden layer, $\theta_k$ is the threshold of the output layer, $f_1$ is the transfer function of the hidden layer, $f_2$ is the transfer function of the output layer, $w_{ij}$ are the input-layer-to-hidden-layer weights, and $w_{jk}$ are the hidden-layer-to-output-layer weights. The output of the network is denoted by $y_k$, while the output of the jth neuron of the hidden layer is denoted by $t_k$.

The output $y_k$ of the output layer is calculated as:
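The forward pass of the single-hidden-layer network described with the symbols $w_{ij}$, $w_{jk}$, $\theta_j$, $\theta_k$, $f_1$, and $f_2$ can be sketched as follows. Since the equations themselves are not reproduced here, the sketch assumes the common convention in which each threshold is subtracted from the weighted sum, i.e. $b_j = f_1(\sum_i w_{ij} x_i - \theta_j)$ and $y_k = f_2(\sum_j w_{jk} b_j - \theta_k)$. The sigmoid transfer function and the names `forward`, `w_ih`, and `w_ho` are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Logistic transfer function (an assumed choice for f1 and f2)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_ih, theta_h, w_ho, theta_o, f1=sigmoid, f2=sigmoid):
    """Forward pass of a network with one hidden layer.

    x        input vector of length n
    w_ih     w_ih[i][j]: weight from input node i to hidden node j
    theta_h  thresholds of the hidden nodes
    w_ho     w_ho[j][k]: weight from hidden node j to output node k
    theta_o  thresholds of the output nodes
    Returns (b, y): hidden-layer outputs and output-layer outputs.
    """
    # Hidden layer: b_j = f1(sum_i w_ij * x_i - theta_j)
    b = [f1(sum(w_ih[i][j] * x[i] for i in range(len(x))) - theta_h[j])
         for j in range(len(theta_h))]
    # Output layer: y_k = f2(sum_j w_jk * b_j - theta_k)
    y = [f2(sum(w_ho[j][k] * b[j] for j in range(len(b))) - theta_o[k])
         for k in range(len(theta_o))]
    return b, y
```

For lithofacies classification, the input vector would hold the normalized log readings at one depth, and the output node with the largest $y_k$ would give the predicted class.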
iterated K times, and the average error of all iterations is reported, allowing each sample in the dataset to be tested. Integrating K-fold cross-validation with an exhaustive grid search technique aids hyperparameter tuning and helps select the best-performing model configuration. This approach not only evaluates the model’s predictive ability but also addresses overfitting concerns during the training process. In the context of our model development, we have incorporated five folds in the K-fold cross-validation process (Figure 8). This comprehensive approach contributes to a more thorough assessment of the model’s generalization capabilities. Typical data splits are 70% training and 30% testing.

looking at all the features together, this method takes each feature separately and identifies the relationship of that feature with the target feature (Abellana and Lao 2023). Pearson’s correlation is a statistical approach that measures the strength of a linear relationship between two variables (a and b). The method attempts to draw the line of best fit through the data, with values ranging from −1 to +1. A value of 0 indicates no linear relationship between the variables. Values between −1 and 0 indicate a negative relationship, and values between 0 and +1 indicate a positive correlation.
Figure 8. Visualization of the cross-validation process involving random subsampling. The initial dataset undergoes random
partitioning, creating a training set for model development and a testing set for validation purposes.
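The combination of K-fold cross-validation with an exhaustive grid search can be illustrated with a small pure-Python sketch. It uses a KNN classifier as the model being tuned and searches over the number of neighbors; the five-fold default follows the text, while the helper names (`knn_predict`, `k_fold_indices`, `cv_error`, `grid_search`), the candidate grid, and the majority-vote rule are assumptions made for illustration rather than the authors' configuration.

```python
import math
import random
from collections import Counter


def knn_predict(train_x, train_y, x, k):
    """Classify x by majority vote among its k nearest training samples."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def k_fold_indices(n, n_folds, seed=0):
    """Shuffle the indices 0..n-1 and deal them into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::n_folds] for f in range(n_folds)]


def cv_error(xs, ys, k, n_folds=5, seed=0):
    """Average misclassification rate over the folds for a given k."""
    errors = []
    for held_out in k_fold_indices(len(xs), n_folds, seed):
        held = set(held_out)
        tr_x = [xs[i] for i in range(len(xs)) if i not in held]
        tr_y = [ys[i] for i in range(len(xs)) if i not in held]
        wrong = sum(knn_predict(tr_x, tr_y, xs[i], k) != ys[i]
                    for i in held_out)
        errors.append(wrong / len(held_out))
    return sum(errors) / len(errors)


def grid_search(xs, ys, k_values, n_folds=5):
    """Evaluate every candidate k and return the best one with all scores."""
    scores = {k: cv_error(xs, ys, k, n_folds) for k in k_values}
    best_k = min(scores, key=scores.get)
    return best_k, scores
```

Each sample is held out exactly once, so the reported error averages over the whole dataset, which is the property the text relies on when it says that every sample gets tested.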
Figure 9. (a) Weight feature importance scores and (b) heatmap of correlation features for lithofacies. After ranking, the top five features retained by the feature selection method were carried forward to construct the lithofacies models.
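The per-feature Pearson correlation used for this kind of ranking can be sketched as below. This is a minimal illustration with hypothetical names (`pearson_r`, `rank_features`); ranking by the absolute value of the coefficient is an assumption, chosen because a strong negative correlation is as informative as a strong positive one.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def rank_features(features, target):
    """Rank features by |r| with the target, strongest first.

    features maps a feature name (e.g. a log mnemonic) to its values.
    """
    scores = {name: pearson_r(vals, target) for name, vals in features.items()}
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

The coefficient only captures linear dependence, so a feature with a low score may still carry nonlinear information about the target.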
(RHOB) (7.70). In contrast, the photoelectric factor (PEF), spontaneous potential (SP), and caliper (CAL) inputs were ranked as the least impactful and relevant input features, with influence factors of 6 and <6, respectively. Therefore, it can be suggested that the top five ranked inputs are the most relevant features.

3.2.6. Measuring prediction accuracy
Evaluation metrics are required to estimate the performance of the model (Arkalgud, McDonald, and Brackenridge 2021). The difference between the actual logs and pseudo logs can be calculated with these evaluation metrics, of which several exist for regression. The following simple method has been employed:

used a robust scaling technique, referred to here as the z-score method, that removes the median and scales the data according to the interquartile range (Equation 15). The effect of feature scaling can be clearly shown using kernel density pair plots. Figure 10 displays the kernel density pair plots generated for all features after applying the z-score method. After normalization, all the features have been scaled to a range of 0 and 1 (Figure 10a,b). The RMSE value of the model after normalization is lower than that of the model before normalization (Figure 10c,d). The selected hyperparameters based on the optimization routine are a max depth of 12, a learning rate of 0.05, and a minimum child weight of 8.
Figure 10. Comparison of statistical characteristics of feature curves (a) before and (b) after normalization, and RMSE values during hyperparameter optimization of the models (c) before and (d) after normalization; the RMSE is lower after normalization.
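Scaling that removes the median and divides by the interquartile range, as described in the text, can be sketched with the Python standard library. The function name `robust_scale` and the choice of the "inclusive" quartile method are assumptions. Note that this transform centers the data on zero and does not by itself bound values to a fixed interval such as [0, 1]; a min-max step would be needed for that.

```python
import statistics


def robust_scale(values):
    """Scale values as (v - median) / IQR.

    Quartiles use the 'inclusive' method (linear interpolation between
    data points), which is an assumed convention.
    """
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    if iqr == 0:
        raise ValueError("interquartile range is zero; cannot scale")
    return [(v - median) / iqr for v in values]
```

Because the median and interquartile range are barely moved by a few extreme readings, a single spike in a log curve leaves the scaled values of the remaining samples essentially unchanged, which is the usual motivation for preferring this transform over mean and standard deviation scaling.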
accuracy of lithology identification. The purpose of The method of log facies analysis is to establish the
lithological identification is to analyze the energy of relationship between log facies and rock facies and
the water body in the sedimentary period by using transform log facies into rock facies to realize rock
lithological information, to meet the demand for fine facies identification. The relationship between log
research on sedimentary microfacies. Therefore, con facies and rock facies is established based on core
sidering the demand for sedimentary microfacies calibration. Due to the complexity of the rock electri
research and the recognition accuracy, the lithofacies cal relationship, log facies, and rock facies often can
is divided into six categories (Table 1). Because sandstone and fine sandstone are both relatively coarse-grained, they are difficult to distinguish in terms of logging characteristics, and because both fine-grained sand and granular sand reflect a high-energy sedimentary environment, they are collectively classified as granular sand facies. The granular sand facies represent the high-energy sedimentary environment, the siltstone facies represent the medium-energy sedimentary environment, and the clay or shale facies represent the low-energy sedimentary environment. Dividing the lithofacies into six types not only meets the production demand but also ensures high-precision lithofacies identification.
5. Logging facies analysis method
A log facies is a set of logging features and their combinations that can reflect the sedimentary characteristics and distinguish this sediment from other sediments. Rock facies refers to the total geological environment in which rocks are formed, including temperature, climate, stratum, age, etc. It is a rock or rock combination formed in a certain sedimentary environment. Each type of lithofacies has different physical properties, such as high or low porosity, high or low radioactivity, oil-, gas-, or water-bearing character, and fracture development. As a result, the same lithofacies may have multiple log characteristics, so one rock facies may correspond to multiple log facies.
The number of log facies can be set according to actual needs. Generally, the number of log facies is larger than the actual number of rock facies. The larger the number of log facies, the more complex the corresponding relationship with the rock facies. According to the literature, it is more appropriate for the number of log facies to be set to 1 to 3 times the number of rock facies (Al Hasan et al. 2023). The rock facies in the study area are divided into 6 categories, and the number of log facies is set to 10 categories.
not be in complete one-to-one correspondence. It is necessary to determine the corresponding relationship by referring to the probability of the different rock facies corresponding to each log facies. The probability is equal to the cumulative thickness of a given rock facies within a log facies, expressed as a percentage of the total thickness of that log facies. Taking the SOM method of K-14 well as an example, the total thickness of log facies 5 is 19 m, and the thickness of the good sand reservoir part corresponding to log facies 5 is 16 m, so the probability of log facies 5 corresponding to good sand reservoir facies is 83.8%. The probability of the corresponding medium sand reservoir facies is 16.2%, and the probability of the corresponding shale and other facies is 0%. It is therefore considered that log facies 5 corresponds to the good sand reservoir facies (Table 2). Similarly, based on the core calibration probability percentages, a good sand reservoir corresponds to log facies 5 and log facies 9. If the probability obtained in the same way for other rock facies is greater than 70%, the corresponding relationship between log facies and rock facies can be established.
6. Results and discussion
In this section, we discuss the results of the selected machine learning models and the accuracy of predicting lithofacies in a geometrically progradational sequence environment. Based on the consistency table, lithofacies are divided into six divisions, as shown in Figure 12. The relationship between lithofacies and logging characteristics is analyzed in Figure 12, which presents cross plots of scattered points for various parameters. It can be observed that the natural gamma-ray vs deep resistivity log cross plot provides a good distinction for lithology. Conversely, the acoustic vs deep resistivity log cross plot exhibits a relatively weak discrimination ability for facies. Different lithological categories overlap significantly and cannot be effectively distinguished by logging data alone in this complex lithological and depositional environment.
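The thickness-based calibration probability defined in Section 5 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `calibration_probabilities`, the interval tuples, and the facies labels are hypothetical and are not taken from the study data.

```python
from collections import defaultdict


def calibration_probabilities(intervals):
    """Return {log_facies: {rock_facies: probability in %}}.

    `intervals` is an iterable of (log_facies, rock_facies, thickness_m)
    tuples taken from cored sections (a hypothetical input layout).
    """
    # Accumulate the cored thickness of each rock facies per log facies.
    thickness = defaultdict(lambda: defaultdict(float))
    for log_facies, rock_facies, thickness_m in intervals:
        thickness[log_facies][rock_facies] += thickness_m

    # Convert cumulative thickness to a percentage of the log facies total.
    probabilities = {}
    for log_facies, per_rock in thickness.items():
        total = sum(per_rock.values())
        probabilities[log_facies] = {
            rock: 100.0 * t / total for rock, t in per_rock.items()
        }
    return probabilities


# Hypothetical cored intervals, for illustration only.
intervals = [
    (5, "good sand", 10.0),
    (5, "medium sand", 4.0),
    (5, "good sand", 6.0),
    (7, "shale", 3.0),
]
probs = calibration_probabilities(intervals)
print(probs[5])
```

In this made-up example, log facies 5 has 16 m of good sand out of 20 m in total, so its probability of corresponding to good sand is 80%.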
Table 2. Probability percentages indicating the relationship between log facies and corresponding rock facies based on core
calibration data. Bold values highlight significant probabilities supporting the identification of specific rock facies.
               Core Facies 1  Core Facies 2  Core Facies 3  Core Facies 4  Core Facies 5  Core Facies 6
Log Facies 1   0              0              59.3           22.1           18.6           0
Log Facies 2   0              0              70.2           29.8           0              0
Log Facies 3   0              17.4           1.4            81.3           0              0
Log Facies 4   0              0              83.5           10.1           6.3            0
Log Facies 5   0              0              0              0              83.8           16.2
Log Facies 6   1.5            83.1           1.5            0              13.8           0
Log Facies 7   100            0              0              0              0              0
Log Facies 8   45.4           33             0              0              21.6           0
Log Facies 9   100            0              0              0              0              0
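As an illustration of how the 70% rule from Section 5 could be applied to Table 2, the following minimal Python sketch assigns each log facies to its most probable core facies. The probabilities are copied from Table 2; the function name `assign_rock_facies` and the choice to leave a log facies unassigned (`None`) when no probability exceeds the threshold are assumptions made for this example, since the text does not state how such cases are handled.

```python
# Probabilities (%) from Table 2: rows are log facies 1-9,
# columns are core facies 1-6.
TABLE_2 = {
    1: [0, 0, 59.3, 22.1, 18.6, 0],
    2: [0, 0, 70.2, 29.8, 0, 0],
    3: [0, 17.4, 1.4, 81.3, 0, 0],
    4: [0, 0, 83.5, 10.1, 6.3, 0],
    5: [0, 0, 0, 0, 83.8, 16.2],
    6: [1.5, 83.1, 1.5, 0, 13.8, 0],
    7: [100, 0, 0, 0, 0, 0],
    8: [45.4, 33, 0, 0, 21.6, 0],
    9: [100, 0, 0, 0, 0, 0],
}


def assign_rock_facies(table, threshold=70.0):
    """Map each log facies to the core facies with the highest probability.

    A log facies is left unassigned (None) when its highest probability
    does not exceed `threshold`; this fallback is an assumption.
    """
    assignment = {}
    for log_facies, probs in table.items():
        best = max(range(len(probs)), key=probs.__getitem__)
        assignment[log_facies] = best + 1 if probs[best] > threshold else None
    return assignment


assignment = assign_rock_facies(TABLE_2)
for log_facies, core_facies in assignment.items():
    print(f"Log facies {log_facies} -> core facies {core_facies}")
```

With the Table 2 values, log facies 1 and 8 stay unassigned under this rule because their highest probabilities (59.3% and 45.4%) do not exceed 70%.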
Recognizing the limitations of conventional logging data in such a complex setting, we leverage the power of diverse machine learning methodologies. The ensemble includes SOM, KNN, MRGC, and ANN. These techniques collectively aim to discern, compare, and enhance the recognition effects, providing a robust framework for unraveling the intricacies of lithofacies in the face of challenging geological and depositional conditions.
Both SOM and MRGC belong to cluster analysis methods. The process of identifying lithofacies by cluster analysis is to first determine the number of log facies clusters; then cluster to obtain the log facies, calibrate them against core data from cored wells, establish the corresponding relationship between log facies and lithofacies, and finally convert the log facies into lithofacies to realize lithofacies identification in non-cored wells and non-cored sections. The correspondence between log facies and rock facies is the key to the identification effect. This corresponding relationship can be determined based on the existing logging theoretical basis. Therefore, even if the number of core samples is small, cluster analysis can be performed and lithofacies can still be identified.
The key to cluster analysis is log facies analysis, where log facies are translated into geological facies. In the process of transformation, there may be multiple log facies corresponding to one rock facies. For example, Figure 13 shows changes in the neutron, density, and acoustic log characteristics in the pay zone, indicating that the porosity log response changes from low to high. Taking the MRGC method as an example, the high porosity pay zone corresponds to log facies 8, the medium porosity medium pay zone corresponds to log facies 9, and low porosity non-pay corresponds to log facies 1.
Figure 13 provides a comprehensive visual representation of the transformative process applied to the log facies of the K-15 well after core calibration. The utilization of both SOM and MRGC cluster analysis techniques enables the conversion of log facies colors into corresponding lithologies, establishing a crucial link between the acquired well data and its geological significance. After carefully reviewing Figure 14, it becomes apparent that the lithologies derived from the log facies transformation align remarkably well with the lithology profile extracted from the actual core samples. This alignment underscores the effectiveness and reliability of the applied methods in accurately capturing the geological characteristics of the subsurface formations. Specifically, employing the SOM method reveals that the coring section of the K-15 well spans a thickness of 9.8 m. Notably, the coincidence thickness, where lithofacies identification matches with the actual lithology, extends to 24.5 m. Despite the relatively lower recognition coincidence rate of 40%, the SOM method provides valuable
Figure 13. The log facies correlated with core calibration by SOM and MRGC of K-15 well.
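The coincidence thickness and coincidence rate used to compare the clustering methods against the core can be sketched as follows. This is a minimal illustration that assumes predicted and core lithofacies are sampled at a regular depth interval over the cored section; the function name `coincidence_rate`, the labels, and the 0.5 m sampling interval are hypothetical and do not come from the K-15 data.

```python
def coincidence_rate(predicted, core, sample_interval_m):
    """Return (coincidence thickness, cored thickness, rate in %).

    `predicted` and `core` hold one lithofacies label per depth sample
    over the cored section, at a constant sample interval (assumed).
    """
    if len(predicted) != len(core):
        raise ValueError("predicted and core must have the same length")
    if not core:
        raise ValueError("at least one depth sample is required")

    # Count depth samples where the predicted facies matches the core.
    matched = sum(p == c for p, c in zip(predicted, core))
    coincidence_thickness = matched * sample_interval_m
    cored_thickness = len(core) * sample_interval_m
    return coincidence_thickness, cored_thickness, 100.0 * matched / len(core)


# Hypothetical labels for five depth samples, 0.5 m apart.
core = ["sand", "sand", "silt", "shale", "shale"]
predicted = ["sand", "silt", "silt", "shale", "sand"]
matched_m, cored_m, rate = coincidence_rate(predicted, core, 0.5)
print(f"{matched_m} m of {cored_m} m matched ({rate:.1f}%)")
```

By construction, the coincidence thickness in this sketch can never exceed the cored thickness, so the rate always lies between 0% and 100%.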
insights into the lithological composition of the well. In contrast, the MRGC method demonstrates a higher level of accuracy in lithofacies identification. The coincidence thickness under the MRGC approach is measured at 10.8 m, with an impressive identification coincidence rate of 90.7%. This signifies a robust capability of MRGC in precisely characterizing lithological transitions and patterns within the K-15 well. Figure 14 serves as a crucial visual aid, offering a side-by-side comparison of the transformed log facies using
SOM and MRGC methods against the lithology profile derived from actual core samples. The consistent alignment observed in the lithological interpretations underscores the reliability of the chosen methodologies in converting log data into meaningful geological information, thus enhancing our understanding of the subsurface geological conditions in the vicinity of the K-15 well.
Similarly, we take another well, K-14, as an example, where sufficient core data were not available; therefore, we combined the core data from the nearby K-15 well with the K-14 well to identify and compare the actual rock facies with the log facies in this well. We employed multiple machine learning models in this scenario. Figure 15 visualizes the predictive results of the four different machine learning models for K-14 well. The evaluation results are displayed in log tracks 10 to 13, which show the performance of each model more intuitively. Tracks 11 and 13 of Figure 15 demonstrate that the MRGC and KNN models exhibit a notable recognition effect for various lithologies, surpassing the other methods, ANN and SOM. The distinct classification performance of MRGC and KNN in these tracks underscores their efficacy in handling diverse lithological variations within the well. Notably, Figure 15 highlights that the ANN and SOM models show comparatively lower effectiveness, particularly in scenarios involving imbalanced classification of lithologies. Further analysis of Figure 15 makes it evident that the MRGC algorithms not only showcase a good classification effect for each lithology but also demonstrate robustness in handling complex geological variations. The visualization in Figure 15 allows us to discern the detailed performance differences among the machine learning models, guiding us toward a more informed selection of models based on their suitability for specific lithological contexts. In summary, the detailed examination of Figure 15 underscores the significance of employing MRGC and KNN models for effective lithology prediction and classification, particularly in instances where data imbalances pose challenges for other machine learning algorithms. The results from Figures 14 and 15 highlight the effectiveness of the MRGC algorithm in accurately classifying lithologies and its robust performance in handling complex geological variations within the study area. Therefore, based on these observations, MRGC emerges as the most suitable method for lithofacies identification in our study. Consequently, we utilized the MRGC algorithms to construct a facies model in the next step.
The next step is to develop the lithofacies model. We have used the truncated Gaussian simulation (TGS) technique in this work on a G5 layer at the Kadanwari gas field. The TGS method is employed to build a facies model that can reduce the characterization uncertainties and, prospectively, provide a better definition of the area. The TGS approach was first established to render
Figure 15. Comparison of lithofacies identification results with the different machine learning approaches.
Figure 16. Illustration of the sequential model building based on lithofacies (a) estimated facies proportions (b) section across the
chronostratigraphic sublayer of G distributions. (c) WS-se section across the TGS facies correlated with the well log information.
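The truncation step at the heart of TGS, in which a Gaussian value selects a lithotype once thresholds have been set from the facies proportions, can be illustrated with the Python sketch below. It is deliberately simplified and is not the workflow used in this study: it assumes stationary global proportions and draws independent standard normal values, whereas a real TGS uses a spatially correlated Gaussian random function and proportions that may vary in space. The proportions and function names are hypothetical, and all proportions are assumed to be strictly positive.

```python
import random
from bisect import bisect_right
from itertools import accumulate
from statistics import NormalDist


def facies_thresholds(proportions):
    """Gaussian thresholds that reproduce the given facies proportions."""
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    # Cumulative proportions, excluding the final value of 1.0.
    cumulative = list(accumulate(proportions))[:-1]
    std_normal = NormalDist()
    return [std_normal.inv_cdf(p) for p in cumulative]


def truncate(gaussian_value, thresholds):
    """Return the index of the lithotype selected by a Gaussian value."""
    return bisect_right(thresholds, gaussian_value)


# Hypothetical global proportions for three ordered lithotypes.
proportions = [0.2, 0.5, 0.3]
thresholds = facies_thresholds(proportions)

# Independent draws stand in for the correlated Gaussian field.
rng = random.Random(0)
samples = [truncate(rng.gauss(0.0, 1.0), thresholds) for _ in range(20000)]
observed = [samples.count(k) / len(samples) for k in range(len(proportions))]
print(thresholds, observed)
```

Because the thresholds are the standard normal quantiles of the cumulative proportions, the simulated lithotype frequencies approach the target proportions as the number of samples grows.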
stochastic images of sedimentary geology specific to fluvial-deltaic environments (Beucher et al. 1999; López and Aldana 2007). The fundamental purpose of the TGS model was to substitute poor geological calculations with multiple Gaussian distributions based on random function for commonly used geostatistical simulations. The simulation of a lithotype relies on the value of the Gaussian random function: once the thresholds have been determined, this value selects the lithotype. The TGS model is normally employed in the sedimentary environment. After undergoing a geometric modification that flattens one or more chronostratigraphic markers, the simulation is run in a “working simulation grid.” The facies proportions are shown as vertical proportion curves. These curves can differ spatially.
After that, the MRGC technique was utilized to propagate the lithofacies information obtained from the cored wells to the non-cored wells. The lithofacies were then modeled employing TGS constrained to depositional facies based on the geological information and chronostratigraphic markers (Figure 16b). Figure 16a shows the global proportion curves for the data set. Figure 16c shows the cross sections of one simulation for the same data set. This simulation has been performed by using proportions that vary in space.
The results obtained for each of the facies of the main G5 layer follow the geological information and previous studies in the area and honor the well data. It is important to notice the good lateral correlation obtained with the TGS method. As can be observed in Figure 16c, there is an excellent agreement between the sedimentary facies interpreted from the well logs and the electro-facies obtained from the TGS electro-facies volume.
7. Conclusions
This research presents a comprehensive study of machine learning approaches for lithofacies identification in complex geological environments with high dimensions, focusing specifically on the G5 layer of the Lower Goru formation in the Kadanwari gas field, Central Indus Basin, Pakistan. By systematically comparing and analyzing the practical application outcomes of four distinct methods (SOM, MRGC, ANN, and KNN) and incorporating TGS analysis for reservoir characterization, this research not only addresses the challenges associated with applying machine learning to well log data but also establishes a robust framework with global implications. The findings underscore the strengths of unsupervised learning methods in lithofacies relationship establishment, particularly highlighting SOM’s fault tolerance networks and MRGC’s efficacy in handling domain intersection with multiple lithofacies classifications. Similarly, supervised learning methods, namely ANN and KNN, demonstrate their respective adaptability and problem-solving capacities. In real-world scenarios, such as the complex litho-electric relationships of fluvio-deltaic environments with limited core data, the study recommends the KNN and MRGC methods. The integration of TGS analysis following precise lithofacies identification enhances reservoir characterization, providing a facies volume that aids in defining prospective areas. Notably, the TGS approach outperforms previous non-linear techniques, offering conclusive results and establishing
a strong lateral correlation between identified well logging lithofacies and TGS lithofacies. Beyond the Kadanwari gas field, this research provides valuable insights and a practical reference for sedimentary lithofacies logging identification in diverse formations and study areas, contributing significantly to the global interest in advancing the application of machine learning techniques in geological and reservoir studies. The proposed workflow ensures consistency, reliability, and efficiency in results, ultimately saving time and effort in data processing and interpretation on a broader scientific scale.
Umar Ashraf received his Ph.D. in geophysics from the Institute of Geophysics and Geomatics, China University of Geosciences (Wuhan), China. He is currently an assistant professor of geophysics at Yunnan University. He focuses on geophysical methods for subsurface exploration and seismic data analysis.
Wakeel Hussain received his M.S. degree in oil and gas engineering from the China University of Geosciences (Wuhan), China. He is currently pursuing a Ph.D. degree at the China University of Geosciences (Wuhan). His research interests include petroleum geology, well-log analysis, and facies classification.
Disclosure statement
No potential conflict of interest was reported by the author(s).
Funding
This research was funded by financial support from the National Natural Science Foundation of China [Grant Nos. 41774145, 72243011] and China’s National Key R&D Program [Grant No. 2023YFB4104200].
ORCID
Muhammad Ali http://orcid.org/0000-0001-9795-1117
Peimin Zhu http://orcid.org/0000-0003-1613-9261
Ren Jiang http://orcid.org/0000-0002-6750-2297
Hao Zhang http://orcid.org/0000-0001-7845-3489
Umar Ashraf http://orcid.org/0000-0003-2402-3605
Wakeel Hussain http://orcid.org/0009-0007-3582-3612
Data availability statement
The data supporting the findings of this study can be obtained from the corresponding author upon reasonable request.
Ali, M., R. Jiang, H. Ma, H. Pan, K. Abbas, U. Ashraf, and J. Ullah. 2021. “Machine Learning - A Novel Approach of Well Logs Similarity Based on Synchronization Measures to Predict Shear Sonic Logs.” Journal of Petroleum Science & Engineering 203:108602. https://doi.org/10.1016/j.petrol.2021.108602.
Ali, M., M. J. Khan, M. Ali, and S. Iftikhar. 2019. “Petrophysical Analysis of Well Logs for Reservoir Evaluation: A Case Study of “Kadanwari” Gas Field, Middle Indus Basin, Pakistan.” Arabian Journal of Geosciences 12 (6): 215. https://doi.org/10.1007/s12517-019-4389-x.
Ali, M., H. Ma, H. Pan, U. Ashraf, and R. Jiang. 2020. “Building a Rock Physics Model for the Formation Evaluation of the Lower Goru Sand Reservoir of the Southern Indus Basin in Pakistan.” Journal of Petroleum Science & Engineering 194:107461. https://doi.org/10.1016/j.petrol.2020.107461.
Ali, M., P. Zhu, R. Jiang, H. Ma, M. Ehsan, W. Hussain, H. Zhang, U. Ashraf, and J. Ullaah. 2023b. “Reservoir Characterization Through Comprehensive Modeling of Elastic Logs Prediction in Heterogeneous Rocks Using Unsupervised Clustering and Class-Based Ensemble Machine Learning.” Applied Soft Computing 148:110843. https://doi.org/10.1016/j.asoc.2023.110843.
Ali, N., J. Chen, X. Fu, W. Hussain, M. Ali, S. M. Iqbal, A. Anees, M. Hussain, M. Rashid, and H. V. Thanh. 2023. “Classification of Reservoir Quality Using Unsupervised Machine Learning and Cluster Analysis: Example from Kadanwari Gas Field, SE Pakistan.” Geosystems and Geoenvironment 2 (1): 100123. https://doi.org/10.1016/j.geogeo.2022.100123.
Al-Qaness, M. A. A., A. A. Ewees, H. Fan, A. M. AlRassas, and M. Abd Elaziz. 2022. “Modified Aquila Optimizer for Forecasting Oil Production.” Geo-Spatial Information Science 25 (4): 519–535. https://doi.org/10.1080/10095020.2022.2068385.
Alzubaidi, F., P. Mostaghimi, P. Swietojanski, S. R. Clark, and R. T. Armstrong. 2021. “Automated Lithology Classification from Drill Core Images Using Convolutional Neural Networks.” Journal of Petroleum Science & Engineering 197:107933. https://doi.org/10.1016/j.petrol.2020.107933.
Antariksa, G., R. Muammar, and J. Lee. 2022. “Performance Evaluation of Machine Learning-Based Classification with Rock-Physics Analysis of Geological Lithofacies in Tarakan Basin, Indonesia.” Journal of Petroleum Science & Engineering 208:109250. https://doi.org/10.1016/j.petrol.2021.109250.
Arkalgud, R., A. McDonald, and R. Brackenridge. 2021. “Automated Selection of Inputs for Log Prediction Models Using a New Feature Selection Method.” Paper presented at the SPWLA 62nd Annual Logging Symposium, Virtual Event, May 17–20. https://doi.org/10.30632/SPWLA-2021-0091.
Ashraf, U., W. Shi, H. Zhang, A. Anees, R. Jiang, M. Ali, H. N. Mangi, and X. Zhang. 2024c. “Reservoir Rock Typing Assessment in a Coal-Tight Sand-Based Heterogeneous Geological Formation Through Advanced AI Methods.” Scientific Reports 14 (1): 5659.
Ashraf, U., H. Zhang, A. Anees, M. Ali, H. N. Mangi, and X. Zhang. 2024a. “An Ensemble-Based Strategy for Robust Predictive Volcanic Rock Typing Efficiency on a Global Scale: A Novel Workflow Driven by Big Data Analytics.” Science of the Total Environment 173425. https://doi.org/10.1016/j.scitotenv.2024.173425.
Ashraf, U., H. Zhang, A. Anees, M. Ali, X. Zhang, S. Abbasi, and H. Nasir. 2020. “Controls on Reservoir Heterogeneity of a Shallow-Marine Reservoir in Sawan Gas Field, SE Pakistan: Implications for Reservoir Quality Prediction Using Acoustic Impedance Inversion.” Water 12 (11): 2972. https://doi.org/10.3390/w12112972.
Ashraf, U., H. Zhang, A. Anees, H. N. Mangi, M. Ali, X. Zhang, M. Imraz, et al. 2021. “A Core Logging, Machine Learning and Geostatistical Modeling Interactive Approach for Subsurface Imaging of Lenticular Geobodies in a Clastic Depositional System, SE Pakistan.” Natural Resources Research 30 (3): 2807–2830. https://doi.org/10.1007/s11053-021-09849-x.
Ashraf, U., H. Zhang, H. V. Thanh, A. Anees, M. Ali, Z. Duan, H. N. Mangi, and X. Zhang. 2024b. “A Robust Strategy of Geophysical Logging for Predicting Payable Lithofacies to Forecast Sweet Spots Using Digital Intelligence Paradigms in a Heterogeneous Gas Field.” Natural Resources Research: 1–22.
Ashraf, U., P. Zhu, Q. Yasin, A. Anees, M. Imraz, H. N. Mangi, and S. Shakeel. 2019. “Classification of Reservoir Facies Using Well Log and 3D Seismic Attributes for Prospect Evaluation and Field Development: A Case Study of Sawan Gas Field, Pakistan.” Journal of Petroleum Science & Engineering 175:338–351. https://doi.org/10.1016/j.petrol.2018.12.060.
Balmer, M., R. Weibel, and H. Huang. 2021. “Value of Incorporating Geospatial Information into the Prediction of On-Street Parking Occupancy – A Case Study.” Geo-Spatial Information Science 24 (3): 438–457. https://doi.org/10.1080/10095020.2021.1937337.
Beucher, H., F. Fournier, B. Doligez, and J. Rozanski. 1999. “Using 3D Seismic-Derived Information in Lithofacies Simulations. A Case Study.” Paper presented at the SPE Annual Technical Conference and Exhibition, Houston, Texas, October 3–6. https://doi.org/10.2118/56736-MS.
Bezdek, J. C., S. K. Chuah, and D. Leep. 1986. “Generalized K-Nearest Neighbor Rules.” Fuzzy Sets and Systems 18 (3): 237–256. https://doi.org/10.1016/0165-0114(86)90004-7.
Bhattacharya, S., T. R. Carr, and M. Pal. 2016. “Comparison of Supervised and Unsupervised Approaches for Mudstone Lithofacies Classification: Case Studies from the Bakken and Mahantango-Marcellus Shale, USA.” Journal of Natural Gas Science & Engineering 33:1119–1133. https://doi.org/10.1016/j.jngse.2016.04.055.
Bloch, S., R. H. Lander, and L. Bonnell. 2002. “Anomalously High Porosity and Permeability in Deeply Buried Sandstone Reservoirs: Origin and Predictability.” AAPG Bulletin 86 (2): 301–328. https://doi.org/10.1306/61EEDABC-173E-11D7-8645000102C1865D.
Cai, J., and Y. Chen. 2022. “A Novel Unsupervised Deep Learning Method for the Generalization of Urban Form.” Geo-Spatial Information Science 25 (4): 568–587. https://doi.org/10.1080/10095020.2022.2068384.
Chai, H., N. Li, C. Xiao, X. Liu, D. Li, C. Wang, and D. Wu. 2009. “Automatic Discrimination of Sedimentary Facies and Lithologies in Reef-Bank Reservoirs Using Borehole Image Logs.” Applied Geophysics 6 (1): 17–29. https://doi.org/10.1007/s11770-009-0011-4.
Chang, H. C., D. C. Kopaska, and H. C. Chen. 2002. “Identification of Lithofacies Using Kohonen Self-Organizing Maps.” Computers & Geosciences 28 (2): 223–229. https://doi.org/10.1016/S0098-3004(01)00067-X.
Chawshin, K., A. Gonzalez, C. F. Berg, D. Varagnolo, Z. Heidari, and O. Lopez. 2021. “Classifying Lithofacies
from Textural Features in Whole Core CT-Scan Images.” SPE Reservoir Evaluation & Engineering 24 (2): 341–357. https://doi.org/10.2118/205354-PA.
Dixit, N., P. McColgan, and K. Kusler. 2020. “Machine Learning-Based Probabilistic Lithofacies Prediction from Conventional Well Logs: A Case from the Umiat Oil Field of Alaska.” Energies 13 (18): 4862. https://doi.org/10.3390/en13184862.
Dubois, M. K., G. C. Bohling, and S. Chakrabarti. 2007. “Comparison of Four Approaches to a Rock Facies Classification Problem.” Computers & Geosciences 33 (5): 599–617. https://doi.org/10.1016/j.cageo.2006.08.011.
Feng, C., W. Shi, Y. Hu, and X. Zhao. 2018. “Depositional Environments and Petrofacies of X–XII Sand Groups of K2qn3 Formation, Daqingzijing Area, Songliao Basin, China.” Journal of Petroleum Exploration and Production Technology 8 (2): 363–374. https://doi.org/10.1007/s13202-017-0400-9.
Geng, J., W. Gan, J. Xu, R. Yang, and S. Wang. 2020. “Support Vector Machine Regression (SVR)-Based Nonlinear Modeling of Radiometric Transforming Relation for the Coarse-Resolution Data-Referenced Relative Radiometric Normalization (RRN).” Geo-Spatial Information Science 23 (3): 237–247. https://doi.org/10.1080/10095020.2020.1785958.
Guresen, E., and G. Kayakutlu. 2011. “Definition of Artificial Neural Networks with Comparison to Other Networks.” Procedia Computer Science 3:426–433. https://doi.org/10.1016/j.procs.2010.12.071.
Guyon, I., and A. Elisseeff. 2003. “An Introduction to Variable and Feature Selection.” Journal of Machine Learning Research 3:1157–1182. https://doi.org/10.1162/153244303322753616.
He, J., W. Ding, Z. Jiang, A. Li, R. Wang, and Y. Sun. 2016. “Logging Identification and Characteristic Analysis of the Lacustrine Organic-Rich Shale Lithofacies: A Case Study from the Es3L Shale in the Jiyang Depression, Bohai Bay Basin, Eastern China.” Journal of Petroleum Science & Engineering 145:238–255. https://doi.org/10.1016/j.petrol.2016.05.017.
Hemmesch, N. T., N. B. Harris, C. A. Mnich, and D. Selby. 2014. “A Sequence-Stratigraphic Framework for the Upper Devonian Woodford Shale, Permian Basin, West Texas.” AAPG Bulletin 98 (1): 23–47. https://doi.org/10.1306/05221312077.
Hussain, M., S. Liu, U. Ashraf, M. Ali, W. Hussain, N. Ali, and A. Anees. 2022. “Application of Machine Learning for Lithofacies Prediction and Cluster Analysis Approach to Identify Rock Type.” Energies 15 (12): 4501. https://doi.org/10.3390/en15124501.
Ioffe, S., and C. Szegedy. 2015. “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift.” Paper presented at the Proceedings of the 32nd International Conference on Machine Learning, Lille, France, 448–456. https://doi.org/10.48550/arXiv.1502.03167.
Kim, J. 2022. “Lithofacies Classification Integrating Conventional Approaches and Machine Learning Technique.” Journal of Natural Gas Science & Engineering 100:104500. https://doi.org/10.1016/j.jngse.2022.104500.
Kohonen, T. 1991. “Self-Organizing Maps: Optimization Approaches.” Paper presented at the Proceedings of the 1991 International Conference on Artificial Neural Networks, 981–990. Espoo, Finland, June 24–28. https://doi.org/10.1016/B978-0-444-89178-5.50003-8.
Lai, J., X. Fan, B. Liu, X. Pang, S. Zhu, W. Xie, and G. Wang. 2020. “Qualitative and Quantitative Prediction of Diagenetic Facies via Well Logs.” Marine & Petroleum Geology 120:104486. https://doi.org/10.1016/j.marpetgeo.2020.104486.
Lai, J., G. Wang, S. Wang, J. Cao, M. Li, X. Pang, C. Han, et al. 2018a. “A Review on the Applications of Image Logs in Structural Analysis and Sedimentary Characterization.” Marine & Petroleum Geology 95:139–166. https://doi.org/10.1016/j.marpetgeo.2018.04.020.
Lai, J., G. Wang, S. Wang, J. Cao, M. Li, X. Pang, Z. Zhou, et al. 2018b. “Review of Diagenetic Facies in Tight Sandstones: Diagenesis, Diagenetic Minerals, and Prediction via Well Logs.” Earth-Science Reviews 185:234–258. https://doi.org/10.1016/j.earscirev.2018.06.009.
Lan, X., C. Zou, Z. Kang, and X. Wu. 2021. “Log Facies Identification in Carbonate Reservoirs Using Multiclass Semi-Supervised Learning Strategy.” Fuel 302:121145. https://doi.org/10.1016/j.fuel.2021.121145.
LeCun, Y., Y. Bengio, and G. Hinton. 2015. “Deep Learning.” Nature 521 (7553): 436–444. https://doi.org/10.1038/nature14539.
Li, L., Z. Wen, and Z. Wang. 2016. “Outlier Detection and Correction During the Process of Groundwater Lever Monitoring Base on Pauta Criterion with Self-Learning and Smooth Processing.” Paper presented at the AsiaSim SCS AutumnSim, 643. Springer, Singapore.
Li, S., H. He, R. Hao, H. Chen, H. Bie, and P. Liu. 2020. “Depositional Regimes and Reservoir Architecture Characterization of Alluvial Fans of Karamay Oilfield in Junggar Basin, Western China.” Journal of Petroleum Science & Engineering 186:106730. https://doi.org/10.1016/j.petrol.2019.106730.
Li, Y., T. Li, and H. Liu. 2017. “Recent Advances in Feature Selection and Its Applications.” Knowledge and Information Systems 53 (3): 551–577. https://doi.org/10.1007/s10115-017-1059-8.
Liu, J., Z. Liu, K. Xiao, Y. Huang, and W. Jin. 2020. “Characterization of Favorable Lithofacies in Tight Sandstone Reservoirs and Its Significance for Gas Exploration and Exploitation: A Case Study of the 2nd Member of Triassic Xujiahe Formation in the Xinchang Area, Sichuan Basin.” Petroleum Exploration and Development 47 (6): 1194–1205. https://doi.org/10.1016/S1876-3804(20)60129-5.
López, M., and M. Aldana. 2007. “Facies Recognition Using Wavelet Based Fractal Analysis and Waveform Classifier at the Oritupano-A Field, Venezuela.” Nonlinear Processes in Geophysics 14 (4): 325–335. https://doi.org/10.5194/npg-14-325-2007.
Lyu, Q., S. Luo, Y. Guan, J. Fu, X. Niu, L. Xu, S. Feng, and S. Li. 2019. “A New Method of Lithologic Identification and Distribution Characteristics of Fine-Grained Sediments: A Case Study in Southwest of Ordos Basin, China.” Open Geosciences 11 (1): 17–28. https://doi.org/10.1515/geo-2019-0002.
Nishitsuji, Y., and R. Exley. 2019. “Elastic Impedance Based Facies Classification Using Support Vector Machine and Deep Learning.” Geophysical Prospecting 67 (4): 1040–1054. https://doi.org/10.1111/1365-2478.12682.
Onalo, D., S. Adedigba, F. Khan, L. A. James, and S. Butt. 2018. “Data Driven Model for Sonic Well Log Prediction.” Journal of Petroleum Science & Engineering 170:1022–1037. https://doi.org/10.1016/j.petrol.2018.06.072.
Ozkan, A., S. Cumella, K. Milliken, and S. Laubach. 2011. “Prediction of Lithofacies and Reservoir Quality Using Well Logs, Late Cretaceous Williams Fork Formation, Mamm Creek Field, Piceance Basin, Colorado.” AAPG Bulletin 95 (10): 1699–1723. https://doi.org/10.1306/01191109143.
Pal, S. C., D. Ruidas, A. Saha, A. R. M. T. Islam, and I. Chowdhuri. 2022. “Application of Novel Data-Mining Technique Based Nitrate Concentration Susceptibility Prediction Approach for Coastal Aquifers in India.” Journal of Cleaner Production 346:131205. https://doi.org/10.1016/j.jclepro.2022.131205.
Pei, Y., R. Chen, D. Li, X. Xiao, and X. Zheng. 2023. “FCN-Attention: A Deep Learning UWB NLOS/LOS Classification Algorithm Using Fully Convolution Neural Network with Self-Attention Mechanism.” Geo-Spatial Information Science 27 (4): 1162–1181. https://doi.org/10.1080/10095020.2023.2178334.
Ruidas, D., S. C. Pal, A. R. M. Towfiqul, and A. Saha. 2023. “Hydrogeochemical Evaluation of Groundwater Aquifers and Associated Health Hazard Risk Mapping Using Ensemble Data Driven Model in a Water Scares Plateau Region of Eastern India.” Exposure and Health 15 (1): 113–131. https://doi.org/10.1007/s12403-022-00480-6.
Shi, X., Y. Cui, X. Guo, H. Yang, R. Chen, T. Li, R. Li, J. Wang, R. Wang, and L. Meng. 2017. “Logging Facies Classification and Permeability Evaluation: Multi-Resolution Graph Based Clustering.” Paper presented at the SPE Annual Technical Conference and Exhibition. https://doi.org/10.2118/187030-MS.
Song, L., Z. Liu, C. Li, C. Ning, Y. Hu, Y. Wang, F. Hong, et al. 2021. “Prediction and Analysis of Geomechanical Properties of Jimusaer Shale Using a Machine Learning Approach.” Paper presented at the SPWLA 62nd Annual Logging Symposium, May 2021, Virtual Event. https://doi.org/10.30632/SPWLA-2021-0089
Vienna, Austria, May 23–26. https://doi.org/10.2118/143001-ms.
Villegas, G., W. Liao, R. Criollo, W. Philips, and D. Ochoa. 2017. “Detection of Leaf Structures in Close-Range Hyperspectral Images Using Morphological Fusion.” Geo-Spatial Information Science 20 (4): 325–332. https://doi.org/10.1080/10095020.2017.1399673.
Wang, Z., D. Gao, X. Lei, D. Wang, and J. Gao. 2020. “Machine Learning-Based Seismic Spectral Attribute Analysis to Delineate a Tight-Sand Reservoir in the Sulige Gas Field of Central Ordos Basin, Western China.” Marine & Petroleum Geology 113:104136. https://doi.org/10.1016/j.marpetgeo.2019.104136.
Wu, D., S. Liu, H. Chen, L. Lin, Y. Yu, C. Xu, and B. Pan. 2020. “Investigation and Prediction of Diagenetic Facies Using Well Logs in Tight Gas Reservoirs: Evidences from the Xu-2 Member in the Xinchang Structural Belt of the Western Sichuan Basin, Western China.” Journal of Petroleum Science & Engineering 192:107326. https://doi.org/10.1016/j.petrol.2020.107326.
Ye, S.-J., and P. Rabiller. 2000. “A New Tool for Electro-Facies Analysis: Multi-Resolution Graph-Based Clustering.” Paper presented at the SPWLA 41st Annual Logging Symposium, Dallas, Texas, June 2000.
Ying, Z., and P. Bao-Zhi. 2011. “The Application of SVM and FMI to the Lithologic Identification of Volcanic Rocks.” Geophysical and Geochemical Exploration 35 (5): 634–633. https://doi.org/10.1007/s12583-011-0163-z.
Yu, Z., Z. Wang, F. Zeng, P. Song, B. A. Baffour, P. Wang, W. Wang, and L. Li. 2021. “Volcanic Lithology Identification Based on Parameter-Optimized GBDT Algorithm: A Case Study in the Jilin Oilfield, Songliao Basin, NE China.” Journal of Applied Geophysics 194:104443. https://doi.org/10.1016/j.jappgeo.2021.104443.
Soucy, P., and G. W. Mineau. 2001. “A Simple KNN Zhang, J., W. Ambrose, and W. Xie. 2021. “Applying
Algorithm for Text Categorization.” Paper presented at Convolutional Neural Networks to Identify Lithofacies
the Proceedings 2001 IEEE International Conference on of Large-N Cores from the Permian Basin and Gulf of
Data Mining, 647–648. San Jose, CA, USA. Mexico: The Importance of the Quantity and Quality of
Tian, Y., H. Xu, X. Y. Zhang, H. J. Wang, T. C. Guo, Training Data.” Marine & Petroleum Geology 133:105307.
L. J. Zhang, and X. L. Gong. 2016. “Multi-Resolution https://doi.org/10.1016/j.marpetgeo.2021.105307.
Graph-Based Clustering Analysis for Lithofacies Zhen, Y., Y. Xiao, X. Zhao, X. Lu, J. Fang, J. Kang, and
Identification from Well Log Data: Case Study of L. Liu. 2023. “Identifying Lithofacies Types by Boosting
Intraplatform Bank Gas Fields, Amu Darya Basin.” Algorithm and Resampling Technique: A Case Study of
Applied Geophysics 13 (4): 598–607. https://doi.org/10. Deep-Water Submarine Fans in an Oil Field in West
1007/s11770-016-0588-3. Africa.” Petroleum Science and Technology: 1–24.
Valentín, M. B., C. R. Bom, J. M. Coelho, M. D. Correia, https://doi.org/10.1080/10916466.2023.2256787.
M. P. de Albuquerque, M. P. de Albuquerque, and Zheng, W., F. Tian, Q. Di, W. Xin, F. Cheng, and X. Shan.
E. L. Faria. 2019. “A Deep Residual Convolutional 2021. “Electrofacies Classification of Deeply Buried
Neural Network for Automatic Lithological Facies Carbonate Strata Using Machine Learning Methods:
Identification in Brazilian Pre-Salt Oilfield Wellbore A Case Study on Ordovician Paleokarst Reservoirs in
Image Logs.” Journal of Petroleum Science & Tarim Basin.” Marine & Petroleum Geology 123:104720.
Engineering 179:474–503. https://doi.org/10.1016/j.pet https://doi.org/10.1016/j.marpetgeo.2020.104720.
rol.2019.04.030. Zhou, Z., G. Wang, Y. Ran, J. Lai, Y. Cui, and X. Zhao. 2016.
Valzania, S., M. Kfoury, M. Grandis, A. Valdisturlo, “A Logging Identification Method of Tight Oil Reservoir
G. Fanello, L. Guerra, S. Heikal, A. Kashif, and Lithology and Lithofacies: A Case from Chang7 Member
A. Sultan. 2011. “Kadanwari Field: A Tight Gas of Triassic Yanchang Formation in Heshui Area, Ordos
Reservoir Study and a Successful Pilot Well Give New Basin, NW China.” Petroleum Exploration and
Life to an Exploited Field.” Paper presented at the SPE Development 43 (1): 65–73. https://doi.org/10.1016/
EUROPEC/EAGE Annual Conference and Exhibition, S1876-3804(16)30007-6.