
SPE-191400-MS

Data-Driven In-Situ Geomechanical Characterization in Shale Reservoirs

Hao Li, Jiabo He, and Siddharth Misra, University of Oklahoma

Copyright 2018, Society of Petroleum Engineers

This paper was prepared for presentation at the 2018 SPE Annual Technical Conference and Exhibition held in Dallas, Texas, 24-26 September 2018.

This paper was selected for presentation by an SPE program committee following review of information contained in an abstract submitted by the author(s). Contents
of the paper have not been reviewed by the Society of Petroleum Engineers and are subject to correction by the author(s). The material does not necessarily reflect
any position of the Society of Petroleum Engineers, its officers, or members. Electronic reproduction, distribution, or storage of any part of this paper without the written
consent of the Society of Petroleum Engineers is prohibited. Permission to reproduce in print is restricted to an abstract of not more than 300 words; illustrations may
not be copied. The abstract must contain conspicuous acknowledgment of SPE copyright.

Abstract
Compressional and shear travel-time logs (DTC and DTS) acquired using sonic logging tools are crucial for subsurface geomechanical characterization. In this study, 13 'easy-to-acquire' conventional logs were processed using 6 shallow learning models, namely ordinary least squares (OLS), partial least squares (PLS), elastic net (EN), LASSO, multivariate adaptive regression splines (MARS), and artificial neural network (ANN), to synthesize DTC and DTS logs. Among the 6 models, ANN outperforms the others with R2 of 0.87 and 0.85 for the synthesis of DTC and DTS logs, respectively. The 6 shallow learning models are trained and tested with 8481 data points acquired from a 4240-feet depth interval of a shale reservoir in Well 1, and the trained models are deployed in Well 2 for blind testing against 2920 data points from a 1460-feet depth interval. Following that, 5 clustering algorithms are applied to the 13 'easy-to-acquire' logs to identify clusters and compare them with the prediction performance of the shallow learning models used for log synthesis. A dimensionality reduction algorithm is used to visualize the characteristics of each clustering algorithm. Hierarchical clustering, DBSCAN, and self-organizing map (SOM) algorithms are sensitive to outliers and did not effectively differentiate the input data into consistent clusters. The Gaussian mixture model differentiates the various formations well, but the resulting clusters do not correlate strongly with the prediction performance of the log-synthesis models. Clusters identified using the K-means method correlate strongly with the prediction performance of the shallow learning models. By combining the predictive shallow learning models for log synthesis with the K-means clustering algorithm, we propose a reliable workflow that synthesizes the DTC and DTS logs and generates a reliability indicator for the predicted logs to help a user better understand the performance of the shallow learning models during deployment.

Introduction
Sonic logging tools transmit compressional and shear waves through the formation to obtain formation
matrix and fluid information. Compressional waves travel through both the rock matrix and fluid, while
shear waves only travel through the matrix. The wave traveltime depends on the elastic properties and
moduli of the rock as well as the composition and microstructure of the formation. Compressional and shear
travel time logs (DTC and DTS) can be computed from the waveforms recorded at the receiver. Sonic logs
contain critical geomechanical information for unconventional reservoir characterization. The difference
and variation in the DTC and DTS contain information about formation porosity, rock brittleness, and Young's modulus, to name a few. However, sonic logs may not always be available due to financial or operational constraints. This study aims to develop a workflow to synthesize both DTC and DTS logs from 'easy-to-acquire' conventional well logs and simultaneously generate a reliability indicator for the synthesized logs.
Well logging is essential for the oil and gas industry to understand in-situ subsurface petrophysical and geomechanical properties. Certain well logs, like gamma ray (GR), resistivity, density, and neutron, are considered 'easy-to-acquire' conventional well logs and are deployed in most wells. Other well logs, like nuclear magnetic resonance (NMR), dielectric dispersion, elemental spectroscopy, and sonic, are deployed in a limited number of wells. Easy-to-acquire well logs can be processed using statistical and machine learning methods to synthesize the well logs that are not frequently acquired in each well. Researchers have explored the possibility of synthesizing certain 'hard-to-acquire' well logs under data constraints (e.g., Tariq et al., 2016; Li et al., 2017a; Li et al., 2017b).
Machine learning algorithms, like ANN and fuzzy logic, are widely applied to prediction tasks in the oil and gas industry. The capacity of these models can be easily adjusted according to the complexity of the problem. Several studies have implemented machine learning techniques to determine sonic logs from other well logs. ANNs, Adaptive Neuro-Fuzzy Inference System (ANFIS), and Support Vector Machines (SVM) were used to predict both compressional and shear sonic travel times from GR, bulk density, and neutron porosity (Elkatatny et al., 2016); that study achieved a correlation coefficient of 0.99 when tested with field data. In another study, shear wave velocity (the reciprocal of DTS) was predicted using an intelligent system that combined fuzzy logic, neuro-fuzzy, and ANN algorithms; the mean squared error during the testing stage was around 0.05 (Rezaee et al., 2007). That model shows stable performance but can only predict the shear wave. A similar study applied a committee machine with intelligent systems to predict sonic travel time from conventional well logs (Asoodeh et al., 2012). Apart from machine learning algorithms, other studies predicted DTS or DTC using empirical equations (Iverson et al., 1988; Greenberg et al., 1992), empirical correlations (Maleki et al., 2014), or self-developed models (Keys et al., 2002). Other researchers predicted DTS and DTC in thin beds using petrophysical properties instead of raw conventional logs (Baines et al., 2008).
For the problem of predicting DTC and DTS logs from conventional logs, our study aims to test the performance of shallow learning models. A complex predictive model with a large number of parameters is not necessary for the regression problem investigated in this study. The 6 shallow learning models, namely Ordinary Least Squares (OLS), Partial Least Squares (PLS), Least Absolute Shrinkage and Selection Operator (LASSO), ElasticNet, Multivariate Adaptive Regression Splines (MARS), and ANN, are implemented in this study. The first four are linear regression models, and the last two are nonlinear regression models. The 13 'easy-to-acquire' conventional logs together with the DTC and DTS logs acquired in two wells are used for this study. Currently, such prediction/synthesis results are not accompanied by a corresponding reliability of the predictions. Our study shows that the prediction performance of the machine learning models for log synthesis can be quantified in terms of a reliability indicator generated using a clustering algorithm aimed at identifying group patterns in the dataset based on the similarity between data points. Clustering algorithms have been widely applied for the identification of lithofacies (Moghaddas et al., 2017; Shi et al., 2017), the clustering of wells (Qin et al., 2017), and formation evaluation (Jain et al., 2015). In this study, we applied 5 clustering algorithms to generate the reliability indicators that accompany the synthesized logs. The 5 clustering algorithms are K-means, Gaussian Mixture, Hierarchical clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and self-organizing map (SOM). Our clustering results together with dimensionality reduction techniques facilitate clear visualization of the characteristics of each clustering algorithm. The results of K-means clustering are closely correlated with the prediction performance of the ANN model.
In this paper, 6 shallow learning models and 5 clustering algorithms process 13 'easy-to-acquire' conventional logs to synthesize the compressional and shear travel times along with an indicator of the reliability of the log synthesis. The ANN model has the best prediction accuracy among the 6 prediction models, with an R2 of 0.85. The K-means clustering algorithm has good differentiation ability, and the resultant group numbers are closely related to the ANN prediction accuracy. By combining the shallow-learning ANN model and the K-means clustering algorithm, we developed a prediction workflow that can synthesize the compressional and shear travel-time logs and simultaneously generate a reliability indicator for the prediction results. This study will enable engineers and geoscientists to obtain improved geomechanical characterization when a sonic logging tool is not available due to operational or financial constraints. Importantly, the reliability of the prediction results can also be computed to facilitate the use of the synthesized logs.

Data Preparation and Preprocessing


Data preparation
Well logging data used in this study were acquired from two wells. In Well 1, 8481 well log data points were acquired from a 4240-feet depth interval. In Well 2, 2920 data points were acquired from a 1460-feet depth interval. Gamma ray log (GR), caliper log (DCAL), density porosity log (DPHZ), neutron porosity log (NPOR), photoelectric factor log (PEFZ), bulk density log (RHOZ), synthetic lithology log, and laterolog resistivity logs at 6 depths of investigation (RLA0, RLA1, RLA2, RLA3, RLA4, RLA5) are selected as the 13 easy-to-acquire conventional logs (shown in Figure 1, Tracks 2-6) that are fed into the 6 predictive models and the 5 clustering algorithms. The 4240-feet formation has 13 lithologies, and the synthetic lithology log uses numbers from 1 to 13 to indicate the lithology. DTC and DTS logs are the outputs of the models (shown in Figure 1, Track 7).

Figure 1—Track 1 is depth, Track 2 contains gamma ray and caliper logs, Track 3 contains density porosity and neutron
porosity logs, Track 4 contains formation photoelectric factor and bulk density logs, Track 5 is laterolog resistivity
logs at shallow depths of investigation (RLA0, RLA1, RLA2), Track 6 is laterolog resistivity logs at deep depths of
investigation (RLA3, RLA4, RLA5), and Track 7 contains DTC and DTS logs for a 200-feet section of the formation.

Data preprocessing
Data preprocessing aims to facilitate the training process by appropriately transforming and normalizing the entire dataset. Preprocessing is necessary before applying machine learning algorithms in order to identify outliers and to normalize the various logs to an equivalent range. Normalization ensures fast convergence of the learning process and is performed using Equation 1:

$$y'_i = \frac{y_i - y_{\min}}{y_{\max} - y_{\min}} \qquad (1)$$

where $y_i$ is the original value of a log response at depth $i$, $y'_i$ is the normalized value of the log response at that depth, and $y_{\min}$ and $y_{\max}$ are the minimum and maximum values of that log response over the processed interval.
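A minimal sketch of this preprocessing step, assuming min-max normalization to the [0, 1] range and the logs held in a pandas DataFrame (the file and column names are illustrative, not from the paper):

    import pandas as pd

    def min_max_normalize(logs: pd.DataFrame) -> pd.DataFrame:
        """Min-max normalize each log column to [0, 1], per Equation 1."""
        return (logs - logs.min()) / (logs.max() - logs.min())

    # Hypothetical usage: 'well1_logs.csv' would hold the 13 input logs.
    # logs = pd.read_csv("well1_logs.csv")
    # logs_normalized = min_max_normalize(logs)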

Measurement of prediction performance


The coefficient of determination (R2) is used to compare the prediction performance of all the models. It is formulated as

$$R_j^2 = 1 - \frac{RSS_j}{TSS_j} \qquad (2)$$

where

$$RSS_j = \sum_{i=1}^{n} \left( y^m_{i,j} - y^p_{i,j} \right)^2 \qquad (3)$$

and

$$TSS_j = \sum_{i=1}^{n} \left( y^m_{i,j} - \bar{y}^m_j \right)^2 \qquad (4)$$

where n is the number of depths for which prediction needs to be performed, j = 1 indicates the DTC log and j = 2 indicates the DTS log, i represents the depth, $y^p_{i,j}$ is the sonic log j predicted at depth i, $y^m_{i,j}$ is the sonic log j measured at depth i, and $\bar{y}^m_j$ is the mean of sonic log j measured at all depths for which training or testing is being performed. RSS_j is the sum of squares of the residuals, and TSS_j is the total sum of squares, which is proportional to the variance of the corresponding sonic log j.
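Equations 2 through 4 translate directly into code; the sketch below computes R2 for one sonic log (equivalent to scikit-learn's r2_score):

    import numpy as np

    def r_squared(y_measured: np.ndarray, y_predicted: np.ndarray) -> float:
        """R^2 of one sonic log j, following Equations 2-4."""
        rss = np.sum((y_measured - y_predicted) ** 2)        # Equation 3
        tss = np.sum((y_measured - y_measured.mean()) ** 2)  # Equation 4
        return 1.0 - rss / tss                               # Equation 2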

METHODOLOGY
Six simple prediction models are used to synthesize DTS and DTC logs by processing 13 ‘easy-to-acquire’
logs. The 6 shallow learning models can capture the hidden relationships between the 13 input logs and
the 2 output sonic logs.

Ordinary Least Squares (OLS) model


The OLS model is one of the simplest statistical regression models; it fits the data by minimizing the Sum of Squared Errors (SSE) between the modeled and measured data. The OLS model assumes the output y_i is a linear combination of the input values x_ip and an error term ε_i, formulated as

$$y_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i \qquad (5)$$

where i represents a specific depth out of the total depth samples available for model training and p is the number of input logs available for training. The model fits the dataset by minimizing the SSE, expressed as

$$SSE = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 \qquad (6)$$

where $\hat{y}_i$ is the predicted output of the model. In this study, the input data x_ip are the 13 raw logs at depth i and the output data y_i are the sonic logs. OLS predictions are adversely affected by outliers, noise in the data, and correlations among the inputs.
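As a sketch, the OLS fit of Equations 5 and 6 can be reproduced with scikit-learn's LinearRegression; the random arrays below are placeholders for the normalized logs:

    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.random.rand(8481, 13)  # placeholder: 13 input logs in Well 1
    y = np.random.rand(8481, 2)   # placeholder: DTC and DTS logs

    ols = LinearRegression()      # minimizes the SSE of Equation 6
    ols.fit(X, y)
    y_hat = ols.predict(X)        # synthesized DTC and DTS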

Partial Least Squares (PLS) model


The PLS model aims to find the correlations between the input data and the output data by constructing latent structures. Input and output logs are decomposed into their latent structures. The latent structure corresponding to the most variation in the output is extracted and then explained using a latent structure in the input space.
New latent structures are combined with the original variables to form components. The number of components m (no more than the number of inputs) is chosen to maximally summarize the covariance with the outputs. The determination of m builds a PLS model with fewer inputs than the OLS model, which is suited to cases with high correlations among the inputs. The removal of redundant inputs saves computational time for training the model while simplifying the model's structure. In applying the PLS model, the most important parameter is the number of components to be generated. In our study, the smallest m with the best prediction performance is used to build the model. In the PLS model, sonic logs are generated from the constructed components, which are combinations of latent structures of the original inputs. We select the number of components by testing a range of values and monitoring the change in model performance. The best performance of the model occurs when the number of components equals 13, which indicates an absence of correlated inputs.
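The component-selection loop can be sketched with scikit-learn's PLSRegression and 5-fold cross validation; the exact selection procedure is not specified in the paper, so this is one plausible implementation:

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import cross_val_score

    X = np.random.rand(8481, 13)  # placeholder input logs
    y = np.random.rand(8481, 2)   # placeholder DTC and DTS logs

    best_m, best_score = 1, -np.inf
    for m in range(1, 14):        # m cannot exceed the 13 inputs
        score = cross_val_score(PLSRegression(n_components=m), X, y,
                                scoring="r2", cv=5).mean()
        if score > best_score:
            best_m, best_score = m, score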

Least Absolute Shrinkage and Selection Operator (LASSO) model


LASSO is a linear regression model that combines the SSE with an L1-norm penalty that promotes sparsity of the coefficient vector. The objective function of LASSO is expressed as

$$\min_{w} \; \frac{1}{2n} \lVert Xw - Y \rVert_2^2 + \alpha \lVert w \rVert_1 \qquad (7)$$

where w is the coefficient vector, X is the matrix of measured inputs, Y is the measured output vector, n is the number of depth samples in the training dataset, and α is the penalty parameter that balances the importance of the SSE term against the regularization term, which is the L1 norm of the coefficient vector. As α increases, the regularization term forces the coefficient vector to become sparser. The α parameter is optimized by testing a range of values and comparing the prediction performance of the model. We select α = 4.83, for which the prediction accuracy of the model is the best. R2 for the DTC and DTS predictions is 0.79 and 0.75, respectively. For α = 4.83, the coefficients for each input log are listed in Table 1. Coefficients of 6 out of the 13 input logs are zero in the LASSO model. The logs with a coefficient of 0 are less important to the desired log synthesis than the logs with non-zero coefficients; they may contain redundant and unrepresentative information that is not essential for the LASSO model predictions.

Table 1—Estimates of coefficients βq in the LASSO model for α = 4.83.

    Lithology  GR    DCAL   DPHZ  NPOR   PEFZ  RHOZ    RLA0  RLA1  RLA2   RLA3   RLA4  RLA5
    0.31       0.06  −1.29  0.00  41.63  0.60  −55.51  0.00  0.00  −1.01  −1.05  0.00  0.00
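The fit and coefficient inspection behind Table 1 can be sketched with scikit-learn's Lasso, whose objective has the form of Equation 7 (placeholder arrays; shown for a single output log):

    import numpy as np
    from sklearn.linear_model import Lasso

    X = np.random.rand(8481, 13)  # placeholder input logs
    y = np.random.rand(8481)      # placeholder sonic log (e.g., DTC)

    log_names = ["Lithology", "GR", "DCAL", "DPHZ", "NPOR", "PEFZ", "RHOZ",
                 "RLA0", "RLA1", "RLA2", "RLA3", "RLA4", "RLA5"]

    lasso = Lasso(alpha=4.83)     # penalty chosen by scanning a range of values
    lasso.fit(X, y)
    for name, coef in zip(log_names, lasso.coef_):
        print(f"{name:>9s} {coef:8.2f}")  # zeros mark the less informative logs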

ElasticNet model
The ElasticNet model is a linear regression model suitable for high-dimensional data; it combines both the L1 norm and the L2 norm in the penalty term. Unlike the LASSO model, the ElasticNet model preserves certain groups of correlated input logs rather than discarding highly correlated inputs. The objective function of the ElasticNet model is defined as

$$\min_{w} \; \frac{1}{2n} \lVert Xw - Y \rVert_2^2 + \alpha_1 \lVert w \rVert_1 + \frac{\alpha_2}{2} \lVert w \rVert_2^2 \qquad (8)$$

The ElasticNet model is preferable to the LASSO model for high-dimensional data with highly correlated variables, when certain groups of correlated variables cannot be neglected. The two penalty parameters α1 and α2 are determined through optimization to be 4.8 and 0.1, respectively.
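scikit-learn's ElasticNet uses an (alpha, l1_ratio) parameterization rather than (α1, α2); under the Equation 8 form above, the two map as sketched below. The mapping is our inference, not stated in the paper:

    import numpy as np
    from sklearn.linear_model import ElasticNet

    X = np.random.rand(8481, 13)  # placeholder input logs
    y = np.random.rand(8481)      # placeholder sonic log

    a1, a2 = 4.8, 0.1             # the paper's two penalty parameters
    enet = ElasticNet(alpha=a1 + a2, l1_ratio=a1 / (a1 + a2))
    enet.fit(X, y)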
Multivariate Adaptive Regression Splines (MARS) model


The MARS model uses multiple linear regression models across the input space. By fitting the data using multiple linear regressions, the model can capture the nonlinearity of the dataset. The model is a weighted sum of basis functions, formulated as

$$\hat{y}_i = \sum_{q} \alpha_q B_q(x_{qi}) \qquad (9)$$

where x_qi is the value of the q-th input log x_q at the i-th depth point, B_q(x_qi) is a basis function, and α_q is the coefficient of B_q. The basis function can take many different forms, but it is most commonly a hinge function. A hinge function is a linear function within a range: max(0, x_qi − C_q) or max(0, C_q − x_qi). Hinge functions partition the input into different sections using different knots C_q. In our study, we use 21 terms to partition the input logs. The model generates DTC and DTS with R2 of 0.85 and 0.83, respectively. Three inputs, namely RLA0, RLA4, and RLA5, have high correlations with the other resistivity inputs (RLA1, RLA2, and RLA3) and are not used in the MARS model.
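Dedicated MARS implementations exist (e.g., the third-party py-earth package); the hinge-function building block itself is simple to sketch, here with a hypothetical knot and coefficients:

    import numpy as np

    def hinge_pair(x: np.ndarray, knot: float):
        """The mirrored hinge functions max(0, x - C) and max(0, C - x)."""
        return np.maximum(0.0, x - knot), np.maximum(0.0, knot - x)

    # A MARS prediction is a weighted sum of such terms (Equation 9), e.g.
    # for one normalized input log with knot C_q = 0.5:
    x_q = np.linspace(0.0, 1.0, 11)
    b_plus, b_minus = hinge_pair(x_q, 0.5)
    y_hat = 1.2 * b_plus - 0.7 * b_minus  # alpha_q values are hypothetical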

Artificial Neural Network (ANN) model


ANN is a widely used machine learning model suitable for both linear and nonlinear regression. A neural network comprises an input layer, an output layer, and a few hidden layers. The capacity of the neural network to fit data can be adjusted by increasing or decreasing the number of hidden layers and the number of neurons in each hidden layer. Each hidden layer contains neurons, made of parameters (weights and biases), that perform matrix computations on the signals computed in the previous layer. The activation function in each layer adds nonlinearity to the computation. In our case, there are 13 input logs and 2 output logs to be synthesized, namely the DTC and DTS logs, so the dimensions of the input and output layers are 13 and 2, respectively. We use two hidden layers in the ANN model, with 9 neurons in the first hidden layer and 5 in the second. The neural network implemented in our study uses conjugate-gradient backpropagation to update the parameters of the neurons.
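A sketch of this architecture using scikit-learn's MLPRegressor; scikit-learn does not offer conjugate-gradient backpropagation, so the L-BFGS solver is substituted here, and the tanh activation is an assumption:

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    X = np.random.rand(8481, 13)  # placeholder: 13 input logs
    y = np.random.rand(8481, 2)   # placeholder: DTC and DTS logs

    # 13 inputs -> hidden layers of 9 and 5 neurons -> 2 outputs
    ann = MLPRegressor(hidden_layer_sizes=(9, 5), activation="tanh",
                       solver="lbfgs", max_iter=2000)
    ann.fit(X, y)
    dtc_dts_pred = ann.predict(X)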

CASE STUDY
Prediction results of six models
The above-mentioned 6 shallow learning models are trained and tested on a 4240-feet depth interval in Well 1 with 8481 data points, and deployed for blind testing on a 1460-feet depth interval in Well 2 with 2920 data points. During the training and testing phases, the models are trained with 80% of randomly selected data from Well 1 and tested with the remaining 20%. The performance of log synthesis is evaluated in terms of R2. The log-synthesis results for Wells 1 and 2 are shown in Table 2. OLS and PLS exhibit similar performances during training and testing but not during deployment (blind testing). LASSO and ElasticNet have similar performances during training, testing, and blind testing. Among the 6 models, ANN performs the best, with R2 of 0.85 during training and testing and 0.84 during blind testing, whereas LASSO and ElasticNet perform the worst, with an R2 of 0.76 during blind testing. Cross validation was performed to ensure the robustness of the model predictions. As shown in Table 2, when the trained models are deployed in Well 2, all models exhibit a slight decrease in prediction accuracy. ANN has the best performance during deployment. The accuracy of the DTC and DTS logs synthesized using the ANN model is shown in Figure 2, where the measured and synthesized sonic logs are compared across 300 randomly selected depth samples from Well 2.
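The train/test/blind-test protocol can be sketched as follows, with placeholder arrays standing in for the Well 1 and Well 2 data:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    X_well1, y_well1 = np.random.rand(8481, 13), np.random.rand(8481, 2)
    X_well2, y_well2 = np.random.rand(2920, 13), np.random.rand(2920, 2)

    # 80/20 random split in Well 1; Well 2 is held out for blind testing.
    X_tr, X_te, y_tr, y_te = train_test_split(X_well1, y_well1,
                                              test_size=0.2, random_state=0)
    ann = MLPRegressor(hidden_layer_sizes=(9, 5), solver="lbfgs",
                       max_iter=2000).fit(X_tr, y_tr)
    r2_test = ann.score(X_te, y_te)         # testing in Well 1
    r2_blind = ann.score(X_well2, y_well2)  # blind testing in Well 2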
Table 2—Prediction performances in terms of R2 for the six models trained and tested in Well 1 and deployed in Well 2.

    Well     Log   OLS     PLS     LASSO   ElasticNet   MARS    ANN
    Well 1   DTC   0.830   0.830   0.791   0.791        0.847   0.870
    Well 1   DTS   0.803   0.803   0.756   0.753        0.831   0.848
    Well 2   DTC   0.804   0.790   0.778   0.774        0.816   0.850
    Well 2   DTS   0.794   0.769   0.763   0.755        0.806   0.840

Figure 2—Comparison of measured (dashed) and synthesized (solid) DTC and DTS logs in Well 2, when
the ANN model is trained and tested in Well 1 and deployed in Well 2 to synthesize the DTC and DTS logs.

Comparison of prediction performances of six models in Well 1


In this section, Relative Error (RE) is used to evaluate the prediction performance of the machine learning models for log synthesis. RE for a log synthesis is formulated as

$$RE_i = \frac{\lvert P_i - M_i \rvert}{M_i} \qquad (10)$$

where P_i is the predicted value and M_i is the measured value of either the DTS or the DTC log at depth i. RE values are first calculated individually for the DTC and DTS logs; the two RE values are then averaged at each depth to represent the overall prediction performance of a shallow learning model. The averaged RE at each depth is further averaged over 10-ft depth intervals to reduce the effects of noise, borehole rugosity, and thin layers. The averaged RE better describes the overall performance of a model in formations with different lithologies.
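A sketch of the RE computation of Equation 10 with the 10-ft window averaging; the 20-samples-per-10-ft figure is inferred from the reported sampling (8481 points over 4240 ft, i.e., about 2 samples/ft) and is therefore an assumption:

    import numpy as np

    def relative_error(pred: np.ndarray, meas: np.ndarray) -> np.ndarray:
        """Per-depth RE (Equation 10), averaged over the DTC and DTS logs."""
        re = np.abs(pred - meas) / meas  # shape (n_depths, 2)
        return re.mean(axis=1)           # average DTC and DTS at each depth

    def window_average(re: np.ndarray, samples_per_window: int = 20):
        """Average RE over 10-ft windows (~2 samples/ft assumed)."""
        n = len(re) // samples_per_window * samples_per_window
        return re[:n].reshape(-1, samples_per_window).mean(axis=1)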
The first 6 columns in Figure 3 show the averaged RE of the six models in Well 1. Whiter colors represent lower RE, and darker colors represent higher RE, which is inversely related to prediction performance. The 6 models exhibit very similar patterns of RE over the 4240-feet depth interval, as shown in the 6 columns. All 6 models perform poorly in the upper-middle part of the selected formation (around 1250 ft to 1800 ft below the top of the formation under investigation). Possibly, the zone of poor performance has physical properties that are very different from the rest of the formation, or its logs have very distinct statistical features. In the following sections, clustering algorithms process the 'easy-to-acquire' logs to identify clusters that exhibit high correlation with the relative errors in the DTC and DTS logs synthesized using the 6 machine learning models, with a focus on the ANN model.
Figure 3—The first 6 columns compare the REs of the synthesized DTS and DTC logs generated using the six models for the 4240-feet depth interval in Well 1, where whiter intervals represent zones in which the learning models exhibit good prediction performance. The last 5 columns show the results of the different clustering algorithms.

CLUSTERING ANALYSIS OF THE PERFORMANCE


In this section, we describe the implementation of 5 clustering methods that process the 'easy-to-acquire' input logs to differentiate the various formations in a well into distinct groups (depth intervals) based on similarities in the input logs. The goal is to generate clusters/groups and assess their correlations with the accuracy of the machine learning models for log synthesis as applied to the training and testing datasets. In doing so, the clustering-generated group numbers exhibit strong correlation with the relative error of the synthesized logs and can serve as a reliability indicator for the prediction performance of the machine learning models for log synthesis.
Clustering algorithms aim to find relationships in a dataset in an unsupervised manner. The goal of clustering is to group the data into a number of clusters such that the data points in the same cluster share the most similarity. Clustering provides a direct way to explore and visualize a dataset. In this study, we implement 5 clustering methods: centroid-based K-means, distribution-based Gaussian Mixture, Hierarchical clustering, Density-Based Spatial Clustering of Applications with Noise (DBSCAN), and self-organizing map (SOM) clustering. Different clustering methods are suited to different types of datasets.
The K-means clustering algorithm starts by randomly setting the center of each group. Data points are clustered around the nearest centers, and new centers are iteratively recalculated based on the data points in each cluster. The Gaussian Mixture clustering model assumes the dataset is generated from a mixture of Gaussian distributions. The Hierarchical clustering model clusters the dataset by merging or splitting data hierarchically based on certain similarities. DBSCAN clusters the dataset based on the density of the data points: the algorithm groups points with many neighbors into one cluster and recognizes points with few neighbors as outliers. It requires the user to define the neighborhood distance and the minimum number of neighbors needed to identify a cluster. The last algorithm, SOM, utilizes neural networks to perform unsupervised learning on the dataset. It is an unsupervised dimensionality reduction algorithm that projects high-dimensional data into 2 dimensions while preserving the original similarity. Here, we first apply the SOM projection, then use K-means to cluster the dimensionality-reduced data into groups.
Correlation analysis
We first applied the clustering algorithms to all the input logs, but the clustering results were unreliable. The high dimensionality and high nonlinearity of the full set of input logs resulted in complex relationships that the clustering algorithms could not resolve into reliable groups. We then selected three logs, namely DPHZ, NPOR, and RHOZ, for the clustering because these three logs exhibited strong correlations with the prediction accuracy of the machine learning models in Well 1 (Figure 4).

Figure 4—Correlation plots between relative error in log synthesis using ANN and (a) DPHZ, (b) NPOR, and (c) RHOZ.

K-means clustering
K-means clustering requires us to manually set the number of clusters. The formations are clustered into 3 groups, such that the groups correspond to good, intermediate, and bad prediction performance. Figure 5 shows the inertia, i.e., the within-cluster sum-of-squares criterion, with respect to the number of clusters. The turning point of the plot occurs at approximately 3 clusters; beyond that, the rate of decrease in inertia diminishes with each additional cluster. For purposes of comparison, we also use 3 clusters for the other clustering algorithms.

Figure 5—Inertia with respect to the number of clusters in K-means clustering to identify the optimal number of clusters.
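The inertia scan behind Figure 5 can be sketched with scikit-learn's KMeans applied to the three selected logs (placeholder array):

    import numpy as np
    from sklearn.cluster import KMeans

    X3 = np.random.rand(8481, 3)  # placeholder: DPHZ, NPOR, RHOZ

    inertias = []
    for k in range(1, 10):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X3)
        inertias.append(km.inertia_)  # within-cluster sum of squares
    # The 'elbow' where inertia flattens (k = 3 here) sets the cluster count.
    kmeans_labels = KMeans(n_clusters=3, n_init=10,
                           random_state=0).fit_predict(X3)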

The K-means clustering results are acquired across the 4240-feet formation in Well 1. Similar to the calculation of RE for the synthesized logs, we average the group number every 50 ft to eliminate the effects of noise and outliers. The averaged group number is compared against the averaged RE across the formation in Figure 6. The K-means clustering results are also shown in column 7 of Figure 3, which compares the group pattern of the K-means clustering results with the relative error of the log synthesis. Figure 6 shows a strong correlation between the averaged group number and the prediction accuracy, with a Pearson correlation coefficient of 0.76. The correlation indicates a relationship between the groups obtained from the clustering and the prediction accuracy of the ANN-based log-synthesis model. Zones with similar properties are clustered into one group, and each group exhibits similar log-synthesis prediction performance. Figure 6 shows that if a zone is clustered into group 2, the prediction performance is very likely to be lower than that of the other groups. Since the numeric value of a group label is meaningless, we permuted the three group numbers to find the assignment that best correlates the averaged group number with the relative error.
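Because the raw group labels are arbitrary, the permutation step can be sketched as an exhaustive search over the 3! relabelings, scoring each against the relative error with the Pearson correlation coefficient (the paper applies this after the 50-ft averaging):

    import itertools
    import numpy as np
    from scipy.stats import pearsonr

    def best_label_permutation(groups: np.ndarray, rel_err: np.ndarray):
        """Relabel the 3 K-means groups (integer labels 0-2) to maximize
        the Pearson correlation with the relative error."""
        best_labels, best_r = groups, -np.inf
        for perm in itertools.permutations(range(3)):
            relabeled = np.array(perm)[groups]  # map group g -> perm[g]
            r, _ = pearsonr(relabeled, rel_err)
            if r > best_r:
                best_labels, best_r = relabeled, r
        return best_labels, best_r  # the paper reports r = 0.76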

Figure 6—Comparison between averaged K-means clustering group number and averaged prediction relative error.

Gaussian Mixture

Figure 7—Visualization of averaged group number versus averaged prediction relative error, with linear regression and 95% confidence interval.

The Gaussian mixture clustering algorithm assumes the data in a specific cluster are generated from a specific Gaussian distribution. For a dataset with multiple dimensions, the Gaussian mixture model fits the dataset by parameterizing the weight φ_i of each cluster, along with the mean μ_i and covariance Σ_i of each cluster, where i is the cluster number. If there are K clusters in the dataset, the Gaussian mixture model fits the dataset by optimizing the following sum of Gaussian distributions:

$$p(x) = \sum_{i=1}^{K} \phi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i) \qquad (11)$$

where x is the data point vector, K is the number of clusters, μ_i is the mean of a cluster, Σ_i is the covariance matrix, φ_i is the component weight, and N denotes the Gaussian distribution. The sum of the weights of all clusters equals 1. After fitting the data with multiple Gaussian distributions, the results can be used to assign each data point to a cluster.
Using multiple Gaussian distributions to fit the dataset is reasonable for clean data with no noise. The well logs used in this study contain noise and uncertainties, which may result in a high variance for each cluster. The clusters may overlap with each other and become hard for the algorithm to differentiate into distinct clusters. From Figure 7a, we can see that the averaged group number correlates loosely with the relative error, with a Pearson correlation coefficient of -0.22. The points scatter all over the plot and show no obvious correlation. In Figure 3, we can see that the Gaussian mixture model identified several layers; however, the pattern learned by the model is completely different from the patterns of relative errors of the shallow learning models used for log synthesis.
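A sketch of the mixture fit of Equation 11 using scikit-learn's GaussianMixture on the three selected logs (placeholder array):

    import numpy as np
    from sklearn.mixture import GaussianMixture

    X3 = np.random.rand(8481, 3)  # placeholder: DPHZ, NPOR, RHOZ

    gmm = GaussianMixture(n_components=3, covariance_type="full",
                          random_state=0)
    gmm_labels = gmm.fit_predict(X3)    # most likely Gaussian per depth
    phi, mu = gmm.weights_, gmm.means_  # the phi_i and mu_i of Equation 11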

Hierarchical clustering
Our application of the Hierarchical clustering algorithm starts with every data point as its own cluster; clusters are then repeatedly merged based on their similarity until the target number of clusters is reached. The similarity of clusters is evaluated based on the sum of squared differences, and the merging process forms a hierarchical tree of clusters. The hierarchical clustering algorithm does not sufficiently differentiate the input dataset: in Figure 7, most of the averaged group numbers are located near group 0, which means the method clusters most of the formation data points into one group. The cluster number is negatively correlated with the relative error. The results shown in Figure 3 and Figure 7 demonstrate that the hierarchical clustering algorithm does not differentiate the formations as expected. A sketch of this merging process follows.
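A sketch of the agglomerative merging described above, assuming Ward linkage (which merges the pair of clusters giving the smallest increase in the within-cluster sum of squared differences):

    import numpy as np
    from sklearn.cluster import AgglomerativeClustering

    X3 = np.random.rand(8481, 3)  # placeholder: DPHZ, NPOR, RHOZ

    hier = AgglomerativeClustering(n_clusters=3, linkage="ward")
    hier_labels = hier.fit_predict(X3)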

DBSCAN clustering
DBSCAN is a density-based clustering method. Unlike K-means clustering, the DBSCAN method does not require the user to manually define the number of clusters. Instead, it requires the minimum number of neighbors and the neighborhood distance used to identify neighbors. Within the specified distance, the algorithm counts the number of neighbors; if that number exceeds the minimum value, DBSCAN identifies the group of data points as a cluster. Since high-dimensional well logs cannot be easily visualized, we tried different combinations of the parameters to obtain 3 clusters. We set the minimum number of neighbors to 100 and the neighborhood distance to 10. The results show that the DBSCAN clustering method identifies a large number of data points as outliers, which are assigned to group -1. By averaging the results in the same way as for K-means, we found that the DBSCAN algorithm does not successfully identify the patterns in the input logs. Most of the formations are clustered into group -1, i.e., outliers, or into group 0 (Figure 3).
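A sketch with the parameter values used in this study (neighborhood distance 10, minimum 100 neighbors); note that eps is expressed in the units of the clustered logs:

    import numpy as np
    from sklearn.cluster import DBSCAN

    X3 = np.random.rand(8481, 3)  # placeholder: DPHZ, NPOR, RHOZ

    db = DBSCAN(eps=10, min_samples=100)
    db_labels = db.fit_predict(X3)  # label -1 marks outliers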

SOM clustering
SOM is not strictly a clustering algorithm; it is a dimensionality reduction algorithm mostly used for high-dimensional data visualization. Based on its ability to reduce dimensionality while retaining the high-dimensional relationships, the SOM clustering algorithm is constructed by first applying the SOM dimensionality reduction and then performing K-means clustering. In our study, the SOM has a dimension of 50 by 50. The SOM was initialized with random weight vectors. During training, the weight vectors are updated based on the similarity between the weight vectors and the input vectors, evaluated using Euclidean distance. The clustering method is essentially a K-means clustering performed on the SOM mapping result. The mapping may alter information in the original dataset. The result of SOM clustering does not have a strong correlation with the prediction relative error of the 6 models, as can be seen in both Figure 3 and Figure 7.
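A sketch of the two-stage SOM-then-K-means procedure, assuming the third-party MiniSom package for the 50-by-50 map (training length and learning parameters are illustrative):

    import numpy as np
    from minisom import MiniSom
    from sklearn.cluster import KMeans

    X3 = np.random.rand(8481, 3)  # placeholder: DPHZ, NPOR, RHOZ

    som = MiniSom(50, 50, X3.shape[1], sigma=1.0, learning_rate=0.5,
                  random_seed=0)
    som.random_weights_init(X3)
    som.train_random(X3, 5000)

    # Project each depth onto its best-matching unit, then cluster the
    # 2-D map coordinates with K-means.
    bmu = np.array([som.winner(row) for row in X3])
    som_labels = KMeans(n_clusters=3, n_init=10,
                        random_state=0).fit_predict(bmu)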

Comparison of clustering results


To better visualize the results of each clustering method, we use the t-Distributed Stochastic Neighbor Embedding (t-SNE) dimensionality reduction algorithm to project the input logs into 2 dimensions. Dimensionality reduction enables us to plot the input logs as points while preserving their relationships in the high-dimensional space. This visualization technique helps us compare the characteristics of the applied clustering algorithms.
t-SNE is one of the most effective algorithms for nonlinear dimensionality reduction. Its basic principle is to compare the similarity between two data points and construct a probability distribution based on that similarity. When projecting the data points into the lower-dimensional space, the t-SNE algorithm constructs a similar probability distribution and minimizes the difference between the two distributions. If a data point is similar to another in the high-dimensional space, it is very likely to be picked as a neighbor in the lower-dimensional space. The algorithm has been applied to visualize datasets such as images, text, and music based on similarity.
To apply t-SNE, the user needs to define the perplexity and the number of training steps. Perplexity relates to the number of neighbors considered; it usually ranges from 5 to 50 and needs to be larger for larger datasets. In our study, we tested a range of values and selected a perplexity of 100 and 5000 training steps. The results are shown in Figure 8.
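A sketch of the projection behind Figure 8, using scikit-learn's TSNE with the selected perplexity and number of steps (the n_iter argument is renamed max_iter in newer scikit-learn releases):

    import numpy as np
    from sklearn.manifold import TSNE

    X3 = np.random.rand(8481, 3)  # placeholder: DPHZ, NPOR, RHOZ

    tsne = TSNE(n_components=2, perplexity=100, n_iter=5000, random_state=0)
    embedding = tsne.fit_transform(X3)  # (n, 2) coordinates for Figure 8
    # Color 'embedding' by relative error, lithology, or cluster labels
    # to reproduce the seven panels of Figure 8.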

Figure 8—Visualization of the input logs using the t-SNE dimensionality reduction algorithm, colored with the clustering results. Each point represents a layer; points near each other share more similarity. Red circles denote formations with a high relative error when applying the 6 predictive models.
Figure 8 shows the results of the dimensionality reduction. It has 7 subplots; each uses the same manifold from t-SNE but is colored with different information, such as the prediction relative error, the lithology, and the cluster numbers obtained from the various clustering methods. Each point on the plots represents a formation layer. The t-SNE algorithm projects the input logs of a layer to a point on the plot, and input logs that are similar to each other are projected as neighbors. On the plots, the points are divided into several blocks; data points in the same block share the most similarity. The shapes of the blocks are arbitrary and may change when the algorithm is run with different parameters. The values on the x-axis and y-axis have no physical meaning.
Figure 8a is colored with the relative error of the log synthesis performed using the ANN prediction model. In the figure, formations with higher relative errors are concentrated in the two blocks highlighted by the red circles. Compared with Figure 8b, the data points with low prediction accuracy are mostly from formations 5 and 8; for the other layers, the prediction relative errors are mostly lower than 0.1. Comparing Figures 8a and 8b, the prediction relative error has a pattern similar to that of the lithology. Figures 8c-g are colored with the group numbers computed using the different clustering algorithms; comparing the clustering results in Figures 8c-g reveals the characteristics of each clustering algorithm. Figures 8e-g are very similar to each other: Hierarchical clustering, DBSCAN, and SOM generate very similar clustering results. Most of the data are clustered into one group, and the rest of the data are split between the other two groups. The data points in the two smaller clusters lie in the upper-middle position of the plots. Interestingly, the DBSCAN algorithm identifies the data points in that position as outliers. Hierarchical clustering, DBSCAN, and SOM are sensitive to outliers; these algorithms may identify the outliers as one cluster and the normal data as another.
The K-means and Gaussian Mixture clustering results are shown in Figures 8c and 8d, respectively. Both algorithms differentiate the input logs well. The Gaussian Mixture clustering results are closely related to the t-SNE dimensionality reduction results: different clusters in the Gaussian Mixture model coincide with different blocks on the plots, possibly because both algorithms rely on probability distributions to model the data. However, the results from the Gaussian Mixture model do not share a pattern with the lithology or the relative error. The K-means clustering algorithm has the best correlation with the prediction relative error, which is also confirmed by the analysis in the K-means clustering section above. The data points that have a high relative error under the prediction models (denoted by the red circles) are assigned by the K-means clustering algorithm to group 2. Moreover, the K-means clustering results have a pattern very close to that of the lithology plot in Figure 8b.

CONCLUSION
This study applied 6 shallow machine learning models to synthesize the compressional and shear travel-time logs by processing 13 conventional easy-to-acquire logs in a shale reservoir. All 6 models achieve good prediction accuracy; the R-squared for log synthesis using the ANN model is 0.85 in the first well, and the ANN model performs the best among the 6 models. The models were deployed on a second well to validate their robustness during blind testing, and the R-squared for log synthesis using the ANN model is 0.85 in the second well. The 6 models show similar distributions of relative errors for the synthesized compressional and shear travel-time logs, and in a few formations all 6 shallow learning models exhibit high relative error. For the purpose of generating a reliability indicator to accompany the synthesized logs when deploying these shallow learning models in formations where the desired logs are absent, we applied 5 clustering methods to cluster the formations using the easy-to-acquire conventional input logs. Among the 5 clustering algorithms, the centroid-based K-means clustering algorithm outperforms the others in generating a reliability indicator that exhibits strong correlation with the relative errors in log synthesis. The Hierarchical clustering, DBSCAN, and SOM clustering algorithms are sensitive to outliers and fail to generate the reliability indicator. The Gaussian Mixture model can differentiate the formations into robust clusters, but the clusters do not correlate with the relative errors. Most formations with a prediction relative error higher than 0.3 are clustered into group 2 by K-means clustering. By applying predictive models, like ANN, we can improve geomechanical characterization under data constraints by predicting the DTC and DTS logs. At the same time, K-means clustering gives an indication of the reliability of the prediction results generated using the ANN model. The proposed procedure provides a data-driven approach to solve characterization problems under data constraints using the redundant information in well logs.

REFERENCES
Asoodeh, M., and Bagheripour, P. 2012. Prediction of compressional, shear, and Stoneley wave velocities from
conventional well log data using a committee machine with intelligent systems. Rock Mechanics and Rock Engineering
45 (1): 45-63.
Baines, V., Bootle, R., Pritchard, T., Macintyre, H., and Lovell, M. 2008. Predicting Shear and Compressional Velocities
in Thin Beds. Presented at the 49th Annual Logging Symposium.
Elkatatny, S.M., Zeeshan, T., Mahmoud, M., Abdulazeez, A., and Mohamed, I.M. 2016. Application of Artificial
Intelligent Techniques to Determine Sonic Time from Well Logs. Presented at the 50th U.S. Rock Mechanics/
Geomechanics Symposium, Houston, Texas, 2016/6/26/.
Greenberg, M., and Castagna, J. 1992. Shear-wave velocity estimation in porous rocks: Theoretical formulation,
preliminary verification and applications. Geophysical Prospecting 40 (2): 195-209.
Iverson, W.P., and Walker, J.N. 1988. In SEG Technical Program Expanded Abstracts 1988, 111-113. Society of
Exploration Geophysicists.
Jain, V., Gzara, K., Makarychev, G., Minh, C.C., and Heliot, D. 2015. Maximizing Information through Data Driven
Analytics in Petrophysical Evaluation of Well Logs. Presented at the SPE Annual Technical Conference and Exhibition,
Houston, Texas, USA, 2015/9/28/.
Keys, R.G., and Xu, S. 2002. An approximation for the Xu-White velocity model. Geophysics 67 (5): 1406-1414.
Li, H., and Misra, S. 2017a. Prediction of subsurface NMR T2 distribution from formation-mineral composition using
variational autoencoder. SEG Technical Program Expanded Abstracts 2017: 3350-3354.
Li, H., and Misra, S. 2017b. Prediction of Subsurface NMR T2 Distributions in a Shale Petroleum System Using
Variational Autoencoder-Based Neural Networks. IEEE Geoscience and Remote Sensing Letters PP (99): 1-3.
Maleki, S., Moradzadeh, A., Riabi, R.G., Gholami, R., and Sadeghzadeh, F. 2014. Prediction of shear wave velocity using
empirical correlations and artificial intelligence methods. NRIAG Journal of Astronomy and Geophysics 3 (1): 70-81.
Moghaddas, H., Habibnia, B., Ghasemalaskari, M.K., and Moallemi, S.A. 2017. Lithofacies classification based on
multiresolution graph-based clustering using image log in South Pars gas field. Presented at the 2017 SEG International
Exposition and Annual Meeting, Houston, Texas, 2017/10/23/.
Qin, X., Xu, Y., Yan, H., and Han, D.-h. 2017. Unsupervised well clustering: Pattern recognition in overpressure
mechanisms. Presented at the 2017 SEG International Exposition and Annual Meeting.
Rezaee, M.R., Ilkhchi, A.K., and Barabadi, A. 2007. Prediction of shear wave velocity from petrophysical data utilizing
intelligent systems: An example from a sandstone reservoir of Carnarvon Basin, Australia. Journal of Petroleum
Science and Engineering 55 (3-4): 201-212.
Shi, X., Cui, Y., Guo, X., Yang, H., Chen, R., Li, T., Li, R., Wang, R., Wang, J., and Meng, L. 2017. Logging Facies
Classification and Permeability Evaluation: Multi-Resolution Graph Based Clustering. Presented at the SPE Annual
Technical Conference and Exhibition, San Antonio, Texas, USA, 2017/10/9/.
Tariq, Z., Elkatatny, S., Mahmoud, M., and Abdulraheem, A. 2016. A New Artificial Intelligence Based Empirical
Correlation to Predict Sonic Travel Time. Presented at the International Petroleum Technology Conference, Bangkok,
Thailand, 2016/11/12/.
