Adaptive Multi-Resolution Graph-Based Clustering Algorithm For Electrofacies - Wu Hongliang
Manuscript received by the Editor September 28, 2019; revised manuscript received February 07, 2020.
*This work was sponsored by the Science and Technology Project of CNPC (No. 2018D-5010-16 and 2019D-3808).
1. Research Institute of Petroleum Exploration & Development, PetroChina, Beijing 100083, China.
2. School of Computer Science and Technology, North China University of Technology, Beijing 100144, China.
3. College of Software, Beihang University, Beijing 100191, China.
♦Corresponding Author: Wang Hua-Feng (Email: [email protected])
©2020 The Editorial Department of APPLIED GEOPHYSICS. All rights reserved.
not achieved good results in practical applications because of slow learning speed and low generalization capacity. Since then, Ye (Ye, 2000) proposed a multi-resolution graph-based clustering method (MRGC), and this algorithm has been built into several lithofacies analysis software packages (Sutadiwirya, 2008; Pabakhsh, 2012; Khoshbakht and Mohammadnia, 2000; Nouri-Taleghani, 2015; Tian, 2016). Compared with traditional clustering algorithms (TCAs) such as the Self-Organizing Map (SOM) (Kohonen, 1990; Tian, 2016), Dynamic Clustering (DYN) (Diday, 1971; Mourot, 1993), Ascendant Hierarchical Clustering (AHC) (Lance and Williams, 1967; Lukasová, 1979), and the Artificial Neural Network (ANN) (Tang, 2011), MRGC (Ye, 2000) has proved to be a better option, because the TCAs suffer from the following limitations: (1) they need to know the number of clusters beforehand; (2) they are sensitive to initial conditions and to variations of parameter values; (3) they are not robust to data variation in practical applications (Mourot and Bousghiri, 1993).

For all clustering methods, determining the optimal number of clusters is one of the most important tasks. MRGC is characterized by a nonparametric K-nearest-neighbor approach and a graph data representation. In practice, MRGC is a good tool for analyzing the structure of complex data and for partitioning natural data groups of different shapes, sizes, and densities. However, the large amount of calculation in the MRGC algorithm itself, together with the use of the KNN method as a key step in the MRGC propagation prediction stage, seriously reduces the overall computational efficiency of the algorithm. As mentioned earlier, MRGC also relies on initialized parameter values, which ultimately leads to unstable analysis results.

In summary, the BP neural network method and the MRGC method each have advantages and disadvantages, and there is a certain complementarity between the two. In this paper, inspired by MRGC and BP neural networks, we propose a graph-based adaptive multi-resolution clustering analysis method (AMRGC). Compared with MRGC, we make two main improvements: (1) a light kernel representative index (LKRI) is used instead of the KRI of the MRGC algorithm; (2) a BP network (MLP) is used instead of KNN when predicting the distribution of new data (also called the propagation stage). Consequently, the unstable results of the traditional K-nearest-neighbor algorithm caused by random initialization parameters are effectively avoided.

Graph-based adaptive multi-resolution cluster analysis method

In general, MRGC consists of at least five steps (Ye, 2000): (1) calculating the Neighbor Index (NI), which helps estimate the probability density function (PDF); (2) calculating the KNN attractions based on the NI values obtained in the first step; (3) calculating the Kernel Representative Index (KRI), which is necessary for generating a proper range when determining the number of clusters; (4) merging the initial small groups into final clusters by the Nearest Neighbors' Attraction Power; (5) predicting the distribution of the new data. As shown in Figure 1, the difference between the traditional MRGC and AMRGC mainly lies in (i) calculating the Light-Kernel Representative Index (LKRI) and (ii) using an MLP based on a BP network to predict the distribution of new data.
Data normalization

In order to eliminate the inconsistency of dimensions among the original log curves and the fluctuation of their value ranges, the acquired raw log data are usually cleaned and normalized prior to further processing (Dodge, 2003). Practically, the raw log data are normalized as the first step of the process shown in Figure 1, using the following formulas.
Fig. 1 Workflow of MRGC and AMRGC: data normalization, calculating the Neighbor Index (NI), forming KNN attraction sets, calculating the Kernel Representative Index (KRI, MRGC) or the Light-Kernel Representative Index (L-KRI, AMRGC), merging groups to form final clusters, and predicting the distribution of new data with KNN (MRGC) or MLP (AMRGC).
$$\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij} \quad (j = 1, 2, \ldots, m), \qquad (1)$$

$$\sigma_j^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_{ij} - \mu_j)^2 \quad (j = 1, 2, \ldots, m), \qquad (2)$$

$$\mu = (\mu_1, \mu_2, \ldots, \mu_m), \qquad (3)$$

$$\sigma^2 = (\sigma_1^2, \sigma_2^2, \ldots, \sigma_m^2), \qquad (4)$$

where $\tilde{x}_i$ represents the vector of $x_i$ after normalization, such that

$$\tilde{x}_{ij} = \frac{x_{ij} - \mu_j}{\sigma_j^2} \quad (j = 1, 2, \ldots, m), \qquad (5)$$

$$\tilde{x}_i = (\tilde{x}_{i1}, \tilde{x}_{i2}, \ldots, \tilde{x}_{ij}, \ldots, \tilde{x}_{im}). \qquad (6)$$

Thus, the Euclidean distance between any two data points $(\tilde{x}_i, \tilde{x}_j)$ is

$$D(\tilde{x}_i, \tilde{x}_j) = \sqrt{\sum_{k=1}^{m} (\tilde{x}_{ik} - \tilde{x}_{jk})^2}. \qquad (7)$$
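As a minimal NumPy sketch of equations (1)–(7), the snippet below normalizes an n × m matrix of log readings and evaluates the distance between two normalized samples; the function names are ours, and we follow equation (5) in dividing by σ_j² (a conventional z-score would divide by σ_j instead).

```python
import numpy as np

def normalize_logs(X):
    """Column-wise normalization of equations (1)-(6).

    X is an (n, m) array: n depth samples, m log curves
    (e.g. GR, DEN, CNL, AC, RD).
    """
    mu = X.mean(axis=0)                # equation (1)/(3): per-curve means
    sigma2 = X.var(axis=0, ddof=1)     # equation (2)/(4): unbiased variances
    return (X - mu) / sigma2           # equation (5), dividing by sigma_j^2 as written

def euclidean_distance(xi, xj):
    """Equation (7): Euclidean distance between two normalized samples."""
    return np.sqrt(np.sum((xi - xj) ** 2))
```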
Next, we define a matrix $U=(u_{ij})_{n\times n}$, with $U[i][j]=u_{ij}$ being the index of the $j$th nearest neighbor (NN) of measurement vector $x_i$ according to $D(\tilde{x}_i, \tilde{x}_j)$; that is, $x_i$'s $j$th nearest neighbor is $x_{u_{ij}}$. For convenience, $x_i$ is considered to be the $n$th nearest neighbor of itself. Once $U$ is determined, a sorting algorithm can be applied to obtain the rank matrix $W=(w_{ij})_{n\times n}$, where $w_{ij}$ means $W[i][j]$ and indicates how relevant $x_i$ is to its $j$th nearest neighbor. Meanwhile, a companion matrix $V=(v_{ij})_{n\times n}$ is initialized with $v_j=j$ $(j=1, 2, \ldots, n)$, so that $V[i][j]$ records the rank of $x_j$ among the nearest neighbors of $x_i$.

Fig. 2 Illustration of the relationship among U, V, and W.

As shown in Figure 2, the 1NN of $x_1$ is $x_5$, so we get $U[1][1]=5$, which means the 1NN of $x_1$ is $x_5$, and $V[1][5]=1$, which means $x_5$ is the 1NN of $x_1$. Since $x_1$ is $x_5$'s 5NN, we have $W[1][1]=5$, which means that $x_1$ is the 5NN of its own 1NN. In general, $W[i][j]$ can be obtained from the matrices $U$ and $V$ as $W[i][j] = V[U[i][j]][i]$.
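A small NumPy sketch of the three matrices, under our reading of the definitions above, is given below; indexing is 0-based and the point itself simply occupies rank 0 of its own neighbor list (the paper instead treats it as its own n-th neighbor), so this is bookkeeping in spirit rather than a literal transcription.

```python
import numpy as np

def neighbor_rank_matrices(X):
    """Build U, V, W from normalized samples X (shape n x m).

    U[i, j] : index of the (j+1)-th nearest neighbor of x_i
    V[i, k] : rank of x_k among the neighbors of x_i
    W[i, j] : rank of x_i among the neighbors of its (j+1)-th nearest
              neighbor, i.e. W[i, j] = V[U[i, j], i] as stated in the text.
    """
    n = X.shape[0]
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))  # pairwise distances, eq. (7)
    U = np.argsort(D, axis=1)                  # row i: neighbors of x_i, nearest first
    V = np.empty_like(U)
    rows = np.arange(n)[:, None]
    V[rows, U] = np.arange(n)[None, :]         # inverse permutation of each row of U
    W = V[U, rows]                             # W[i, j] = V[U[i, j], i]
    return U, V, W
```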
Calculation of the neighbor index (NI)

Let the measurement of a given point $x$ be an element of a set of measurement points $D = \{x_1, x_2, x_3, \ldots, x_n\}$, and let point $y$ be measurement point $x$'s $n$th nearest neighbor (NN) in the set of measurement points $S$, with $n \le K$. The "limited rank" $\tau_n$ of measurement point $x$ with respect to its $n$th NN $y$ is defined as

$$\tau_c(\tilde{x}_i) = \exp(-m/\alpha), \qquad (8)$$

where $m$ is the rank of $x_i$ with respect to its $c$th NN (the entry $W[i][c]$ introduced above), $\alpha$ is greater than zero, and $c \le n-1$. It is noted that $\alpha$ is insensitive to the size of the data set and may be initialized once for all data sets. In practice, $\alpha$ is set to 10.

In this paper, the limited rank $\tau_i$ is defined for calculating $x_i$'s K nearest neighbors. The sum of the limited ranks for each point $x$ is expressed as

$$s(\tilde{x}_i) = \sum_{c=1}^{n-1} \tau_c(\tilde{x}_i). \qquad (9)$$

Secondly, the smallest value $S_{\min}$ and the largest value $S_{\max}$ of the rank sums are calculated by

$$S_{\min}(\tilde{x}) = \operatorname{Min}\{s(\tilde{x}_i)\}, \quad (i = 1, 2, \ldots, n-1), \qquad (10)$$

$$S_{\max}(\tilde{x}) = \operatorname{Max}\{s(\tilde{x}_i)\}, \quad (i = 1, 2, \ldots, n-1). \qquad (11)$$

Thus, the NI calculation formula can be expressed as
$$NI(\tilde{x}_i) = \frac{s(\tilde{x}_i) - S_{\min}(\tilde{x})}{S_{\max}(\tilde{x}) - S_{\min}(\tilde{x})}, \quad (i = 1, 2, \ldots, n-1). \qquad (12)$$
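Putting equations (8)–(12) together, one possible NumPy realization reuses the rank matrix W from the sketch above, with α = 10 as stated in the text; whether the rank sum runs over all n−1 neighbors or only the K nearest is our reading of the definition.

```python
import numpy as np

def neighbor_index(W, alpha=10.0, K=None):
    """Neighbor Index of equations (8)-(12).

    W[i, c] is the rank of x_i among the neighbors of its c-th nearest
    neighbor.  tau_c(x_i) = exp(-W[i, c] / alpha) is the limited rank,
    s(x_i) its sum over the neighbors considered, and NI rescales s
    linearly to the interval [0, 1].
    """
    n = W.shape[0]
    last = n if K is None else min(K + 1, n)
    tau = np.exp(-W[:, 1:last] / alpha)     # eq. (8); column 0 is the point itself
    s = tau.sum(axis=1)                     # eq. (9)
    s_min, s_max = s.min(), s.max()         # eqs. (10)-(11)
    return (s - s_min) / (s_max - s_min)    # eq. (12)
```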
The K nearest neighbors' attraction calculation method

After calculating the NI of all objects, we can use a multidimensional KNN point-to-point attraction algorithm. It helps us calculate the adherence attraction for every measurement point and determine the centers of the small data groups. $Attr_{x_i}(x_j)$ is the attraction exerted on $x_i$ by its $K$th nearest neighbor $x_j$, which can be calculated by

$$Attr_{x_i}(x_j) = NI(x_j)\, V_{x_i}(x_j) - NI(x_i). \qquad (13)$$

It is acknowledged that good data clustering results can be obtained even with a large amount of data and a high data dimension when the K value ranges from 4 to 12 (Hastie, 2009). Meanwhile, the probability of misclassification with a K value in this interval is close to its minimum. For this reason, K is set to 4 in the following experiments.

The adherence function $V_{x_i}(x_j)$ is defined as

$$V_{x_i}(x_j) = \begin{cases} 1 & \text{if } x_j \text{ is one of the } K \text{ nearest neighbor points of } x_i, \\ 0 & \text{otherwise.} \end{cases} \qquad (14)$$

When $x_j$ yields the maximum of $Attr_{x_i}(x_j)$, $x_i$ is attracted to this nearest neighbor $x_j$. If none of the $x_j$ among the K nearest neighbors of $x_i$ attracts $x_i$, then $x_i$ is not adhered to any other point. We denote the maximum of $Attr_{x_i}(x_j)$ as $Attr_{x_i}$, that is,

$$Attr_{x_i} = \operatorname{Max}\{Attr_{x_i}(x_j)\}. \qquad (15)$$

A large $Attr_{x_i}$ means that $x_i$ is most likely in the valley zone. At the very start, we determine the situation of every object in this way, and then the attraction sets are formed. We obtain the attraction sets $S=\{S_i\}$ $(i=1, 2, \ldots, N)$, where $N$ is the number of kernel points; these are the initial groups that help cluster the data.
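A compact sketch of the attraction step (equations (13)–(15)) and of forming the initial attraction sets S_i follows, reusing U and the NI values from the previous sketches; the chain-following rule used to assemble the sets is our interpretation of the text, and K = 4 matches the value adopted for the experiments.

```python
import numpy as np

def attraction_sets(U, NI, K=4):
    """Adhere each point to its most attractive K-nearest neighbor
    (equations (13)-(15)) and group points by the kernel they reach.

    Returns (kernels, labels): the indices of the free attractor points
    (not attracted by any neighbor) and, for every point, the kernel
    index of the initial group S_i it falls into.
    """
    n = len(NI)
    parent = np.full(n, -1)                  # -1 marks a free attractor
    for i in range(n):
        neigh = U[i, 1:K + 1]                # K nearest neighbors of x_i
        attr = NI[neigh] - NI[i]             # eq. (13) with V_{x_i}(x_j) = 1
        j = int(np.argmax(attr))             # eq. (15): strongest attraction
        if attr[j] > 0:                      # only higher-NI neighbors attract
            parent[i] = neigh[j]
    kernels = np.flatnonzero(parent == -1)
    labels = np.empty(n, dtype=int)
    for i in range(n):
        p = i
        while parent[p] != -1:               # follow the adherence chain to a kernel
            p = parent[p]
        labels[i] = p
    return kernels, labels
```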
Calculating the light kernel representative index (LKRI)

In order to further determine the cluster cores, we need to calculate the KRI of each kernel point $x_i$ $(i=1, 2, \ldots, n)$. Let $x_j$ be the nearest neighbor of $x_i$ with $NI(x_j) > NI(x_i)$, and let $H(x_i, x_j) = h$, which means that $x_j$ is the $h$th NN of $x_i$. Thus, the KRI of $x_i$ can be calculated as follows:

$$KRI(x_i) = NI(x_i)^a\, H(x_i, x_j)^b\, D(x_i, x_j)^c, \qquad (16)$$

where $a$, $b$, and $c$ weight the corresponding terms $NI(x_i)$, $H(x_i, x_j)$, and $D(x_i, x_j)$, under the condition that $a = b = c = 1$.

Compared to MRGC, AMRGC only needs to calculate the KRI value according to equation (16) for the previously obtained free attractor nodes. In contrast, MRGC has to do the calculation for all the data points. Therefore, we call the new index LKRI, which means that AMRGC has a smaller computational load. In the experimental section, the benefits of replacing KRI with LKRI will be discussed in detail.

On the other hand, the reason why this algorithm can obtain suitable kernel point candidates is found by analyzing NI: points with higher NI values are more likely to become the centers of clusters. An NI value can be viewed as an attraction, and a higher NI value means that the selected point is more attractive to other data points and thus more likely to be near the center of a cluster.
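Equation (16), evaluated only on the free attractor nodes as the LKRI prescribes, could be sketched as follows; D is the pairwise distance matrix of equation (7), and the fallback for the single highest-NI point, which has no higher-NI neighbor, is a convention of ours.

```python
import numpy as np

def lkri(kernels, U, NI, D):
    """Light Kernel Representative Index: equation (16) with a = b = c = 1,
    computed only for the free attractor nodes.

    For each candidate x_i we look for its nearest neighbor x_j with
    NI(x_j) > NI(x_i); H is the neighbor rank h of that x_j and D[i, j]
    the distance to it.
    """
    n = len(NI)
    scores = {}
    for i in kernels:
        score = None
        for h in range(1, n):                 # walk neighbors from nearest to farthest
            j = U[i, h]
            if NI[j] > NI[i]:
                score = NI[i] * h * D[i, j]   # NI^a * H^b * D^c with a = b = c = 1
                break
        if score is None:                     # densest point has no higher-NI neighbor
            score = NI[i] * n * D[i].max()
        scores[i] = score
    return scores

# Kernel candidates sorted by LKRI in descending order (cf. Figure 5):
# ranked = sorted(scores, key=scores.get, reverse=True)
```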
A free attractor node is a point that has no neighbor with higher NI in its vicinity. This also means that a point whose NI value is larger than that of a known point tends to be a long way from that known point. That is, the free attractor nodes chosen by thresholding the NI values of the given points have a high probability of being kernel point candidates. Later experiments also show that, compared with the results of previous kernel selection algorithms, AMRGC is good at selecting the cluster center points. Not only is its clustering result quite consistent with that of the traditional MRGC when few center points need to be determined, but it also outperforms the traditional MRGC when many center points occur. In short, the new algorithm can greatly reduce the amount of computation while ensuring that the correct cluster centers are chosen.
At each layer, we first compute the total input $z$ for each unit, which is a weighted sum of the outputs of the units in the layer below. Then a non-linear function $f(\cdot)$ is applied to $z$ to get the output of the unit. For simplicity, the bias terms are omitted. The non-linear function used in the network is the rectified linear unit (ReLU), $f(z) = \operatorname{Max}(0, z)$. The output $y_l$ is the class-label vector predefined for AMRGC, so that $y_l$ can be expressed as

$$y_l = f(z_l) = f\Big(\sum_m \omega_{ml} y_m\Big) = f\Big(\sum_m \omega_{ml} f(z_m)\Big) = f\Big(\sum_m \omega_{ml} f\Big(\sum_j \omega_{jm} y_j\Big)\Big) = f\Big(\sum_m \omega_{ml} f\Big(\sum_j \omega_{jm} f(z_j)\Big)\Big) = f\Big(\sum_m \omega_{ml} f\Big(\sum_j \omega_{jm} f\Big(\sum_i \omega_{ij} x_i\Big)\Big)\Big). \qquad (20)$$

In order to determine the values of $\omega_{ml}$, $\omega_{jm}$, and $\omega_{ij}$ in equation (20), the backward propagation algorithm is used. Figure 4b shows the equations used for the backward propagation. The output error between the output and the target value can be described by equation (21), which we also call the loss function:

$$E = \frac{1}{2}\|t - y_L\|_2^2 = \frac{1}{2}\sum_{l} \Big(t_l - f\Big(\sum_m \omega_{ml} f\Big(\sum_j \omega_{jm} f\Big(\sum_i \omega_{ij} x_i\Big)\Big)\Big)\Big)^2, \quad (m \in H_2,\; j \in H_1,\; i \in \text{input},\; l \in \text{output}), \qquad (21)$$

where $\omega_{ml}$, $\omega_{jm}$, and $\omega_{ij}$ are as shown in Figure 4. Once $\partial E/\partial z_m$ is known, the error derivative for the weight $\omega_{jm}$ on the connection from unit $j$ in the layer below is $y_j \cdot \partial E/\partial z_m$. Then we can determine the relationship between $E$ and the input vectors by using the chain rule of derivatives.
Fig. 4 Propagation algorithm workflow. (a) The four-layer forward propagation perceptron (input units, hidden units H1, hidden units H2, output units). (b) The four-layer backward propagation perceptron.
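The forward pass of equation (20), the loss of equation (21), and one chain-rule update of the kind described above can be written compactly in NumPy; the layer sizes, the learning rate, and the single-sample treatment are illustrative choices of ours rather than values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(0.0, z)                      # f(z) = Max(0, z)

# Four-layer perceptron: input -> H1 -> H2 -> output, biases omitted as in
# the text.  Sizes are illustrative (five log curves in, five classes out).
w_ij = rng.normal(scale=0.1, size=(5, 16))          # input -> hidden H1
w_jm = rng.normal(scale=0.1, size=(16, 16))         # H1 -> hidden H2
w_ml = rng.normal(scale=0.1, size=(16, 5))          # H2 -> output

def forward(x):
    """Equation (20): y_l = f(sum_m w_ml f(sum_j w_jm f(sum_i w_ij x_i)))."""
    y_j = relu(x @ w_ij)
    y_m = relu(y_j @ w_jm)
    y_l = relu(y_m @ w_ml)
    return y_l, y_j, y_m

def loss(x, t):
    """Equation (21): E = 1/2 * ||t - y_L||^2."""
    y_l, _, _ = forward(x)
    return 0.5 * np.sum((t - y_l) ** 2)

def gradient_step(x, t, lr=0.01):
    """One backward pass (the chain rule of Figure 4b) and a gradient update."""
    global w_ij, w_jm, w_ml
    y_l, y_j, y_m = forward(x)
    d_zl = -(t - y_l) * (y_l > 0)                   # dE/dz_l
    d_zm = (d_zl @ w_ml.T) * (y_m > 0)              # dE/dz_m
    d_zj = (d_zm @ w_jm.T) * (y_j > 0)              # dE/dz_j
    w_ml -= lr * np.outer(y_m, d_zl)                # dE/dw_ml = y_m * dE/dz_l
    w_jm -= lr * np.outer(y_j, d_zm)                # dE/dw_jm = y_j * dE/dz_m, as in the text
    w_ij -= lr * np.outer(x, d_zj)
```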
When we have new objects to train on, we can use the gradient descent algorithm to determine the unknown weight parameters and then predict the test objects with the corresponding weights. In recent years, neural network technology has made breakthrough progress (Hinton, 2006) and received great attention. As part of our contribution, the newly proposed method combines the neural network with the clustering method. Specifically, the method extracts features from the training data with the neural network and then uses the trained network for the propagation prediction described in Section 3.

The parameter K in this experiment was set to 4, and the merging parameter K' was set to 8. In the experiment, we randomly selected a group of acquired logging data consisting of 840 data points. After sorting by LKRI in descending order, the result is shown in Figure 5. The points are arranged in descending order of LKRI value, and the free attraction points at the front are more likely to become kernel points; these are retained
in the subsequent fusion process. The dark blue circled area indicates the number of corresponding point sets obtained by selecting different numbers of free attraction points, and each ellipse or circle represents a set of points formed by clustering.

In this experiment, five-dimensional data consisting of natural gamma ray (GR), lithology density (DEN), compensated neutron (CNL), sonic interval transit time (AC), and deep resistivity (RD) were used for the clustering experiments. Finally, the two-dimensional data of GR and CNL are drawn and displayed; in the figure, the abscissa is GR and the ordinate is CNL. It can be seen from these experiments that the K value has little effect on the clustering results and that the clustering results are relatively stable, which also indicates the robustness of AMRGC. The effect of the K value on the clustering results is shown in Figure 6. According to the clustering results in Figure 6, it can be concluded that the initial value of K has little effect on the final clustering results. The colored balls in Figure 6 represent the different clusters obtained after the data points are processed by the clustering algorithm, and the colors represent different categories. The lithofacies label represented by each color needs to be obtained after further processing.
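One way to quantify the visual observation of Figure 6 is to compare the partitions produced with different K values directly; the sketch below does so with the adjusted Rand index, which is our suggestion and not part of the original workflow, reusing the helper functions sketched earlier in this section and a stand-in data matrix.

```python
from itertools import combinations

import numpy as np
from sklearn.metrics import adjusted_rand_score

def cluster_with_k(X, K):
    """Compose the earlier sketches: rank matrices -> NI -> attraction sets."""
    U, V, W = neighbor_rank_matrices(X)
    NI = neighbor_index(W, alpha=10.0, K=K)
    _, labels = attraction_sets(U, NI, K=K)
    return labels

X = np.random.default_rng(1).normal(size=(200, 5))   # stand-in for the 840 normalized samples
labelings = {K: cluster_with_k(X, K) for K in (5, 6, 7, 8)}

# Pairwise agreement between the four runs; values close to 1 would support
# the observation that the initial K has little effect on the result.
for (k1, l1), (k2, l2) in combinations(labelings.items(), 2):
    print(f"K={k1} vs K={k2}: ARI = {adjusted_rand_score(l1, l2):.3f}")
```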
Fig. 6 The effect of the K value on the clustering results in the GR (API)–CNL (%) crossplot: (a) K=5, (b) K=6, (c) K=7, (d) K=8.
Through the LKRI values, we can obtain multiple important indicators from the data points, which correspond to the best cluster numbers at different resolutions. After obtaining the optimal number of clustering centers, we can still adjust the clusters manually. As for computing resource consumption, the AMRGC algorithm only needs to calculate the KRI value of the free attractor nodes (in this case, about 40 data points), while the traditional MRGC algorithm must calculate the KRI value of all data points (in this case, 840 data points). This indicates that the computational cost of the new KRI calculation method (LKRI) is reduced to roughly 1/16 of what MRGC usually requires. In fact, as the amount of data increases, the ratio of all data points to free attractor nodes increases rapidly (see Figure 7).

Fig. 7 The comparison of the calculation consumption (number of KRI/LKRI operations versus sample number) between AMRGC and MRGC.

Therefore, the AMRGC algorithm can greatly reduce the amount of calculation. The experimental results show that the AMRGC algorithm still produces valid results although the amount of calculation decreases dramatically. In other words, the AMRGC algorithm can obtain the same kernel points as the MRGC algorithm, and can even obtain better clustering results. Figure 8 illustrates the clustering results for different numbers of clusters, namely 3, 5, 7, and 10 (the results of both the traditional MRGC and our proposed AMRGC algorithm are listed). According to the results, we can observe that when the number of clusters is small, LKRI and KRI choose the same kernels and produce the same clustering results. Only when the number of clusters is large (for example, 10 in this experiment) does LKRI select kernel points different from those selected by KRI, and the clustering results will certainly differ. The experimental results also show that the kernel points selected by the LKRI method can yield the same or even better clustering results.

In general, log interpretation experts establish the ground truth of lithofacies by referring to the information provided by coring wells. Based on the coring well data, the number of clusters was determined. According to the recommendations of experts, in the following experiments we used the clustering
model to classify the lithofacies. In addition, in order to compare the clustering effect with other algorithms, we also set the number of clusters of the SOM, AHC, and DYN algorithms to 5 (all of these methods need to know the number of clusters in advance). Figures 9 and 10 show the final clustering performance of each method.
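For the baselines that need the cluster count up front, a minimal scikit-learn sketch is shown below; k-means is used here only as a rough stand-in for dynamic clustering, a SOM would require a separate library, and the stand-in data matrix is ours, so none of this claims to reproduce the reference implementations.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Stand-in for the (n_samples, 5) matrix of normalized GR, DEN, CNL, AC, RD readings.
X = np.random.default_rng(0).normal(size=(840, 5))

ahc_labels = AgglomerativeClustering(n_clusters=5, linkage="ward").fit_predict(X)   # AHC baseline
dyn_labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)         # rough DYN stand-in
```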
Fig. 8 Clustering results of AMRGC (left) and MRGC (right) in the GR (API)–CNL (%) crossplot for cluster numbers N = 3, 5, 7, and 10.
As shown in Figure 10, the AMRGC algorithm performs better than the other methods (such as SOM, AHC, and DYN) in well facies prediction. The legends of Figure 8 and Figure 10 are further explained in Figure 11. The point number column in Fig. 11 gives the number of sample points for each class in the clustering result; one point is one target, and each target has the two-dimensional attribute values GR and CNL.

Fig. 11 The correspondence of the facies types and the AMRGC clustering results (facies, cluster label, lithology, point number, and two attribute values).
Facies 1    5    dolomite limestone       290    26.0    5.1
Facies 2    4    argillaceous dolomite     58    46.0    4.2
Facies 3    3    dolomite-bearing         130    63.6    6.4

In order to compare AMRGC quantitatively with other unsupervised clustering algorithms, we divided each well section of the experiment into 10 small
sections to calculate the coincidence rate separately. The coincidence rate is calculated by comparing the summed length over which the predicted facies agrees with the actual facies with the total length of the actual lithofacies, and the final degree of overall consistency is measured by the average over the sections. From Figure 10 and Table 1, it can be seen that the AHC algorithm easily confuses two of the lithology types, which seriously lowers its coincidence rate. The AMRGC algorithm has a higher coincidence rate, and therefore a better discrimination ability for most lithologies.

Tab. 1 The conformity ratios given by the different methods
Algorithm    Conformity Ratio
AHC          30.2%
DYN          77.5%
SOM          81.2%
AMRGC        84.2%
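The conformity ratio of Table 1 could be computed along the following lines; the per-section matching-length formulation is our interpretation of the description above, assuming regularly sampled, depth-indexed facies labels.

```python
import numpy as np

def coincidence_rate(depth, true_facies, pred_facies, n_sections=10):
    """Fraction of logged length on which the predicted facies matches the
    core-derived facies, computed per sub-section and then averaged.

    depth, true_facies and pred_facies are equal-length arrays sampled at a
    (roughly) constant depth step.
    """
    step = np.median(np.diff(depth))                       # sampling interval
    sections = np.array_split(np.arange(len(depth)), n_sections)
    rates = []
    for idx in sections:
        matched_length = np.sum(true_facies[idx] == pred_facies[idx]) * step
        total_length = len(idx) * step
        rates.append(matched_length / total_length)
    return float(np.mean(rates))
```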
Propagation prediction

In the experiment on propagation prediction, a four-layer perceptual neural network was used. The specific hyperparameters used in the experiment are shown in Table 2. The specific steps are as follows: first, randomly select 80% of the 840 data points and apply the AMRGC algorithm to obtain the clustering result; then, use the obtained output as labels for training the MLP; finally, apply the trained model to predict the remaining 20% of the data points. The experimental results show that when the accuracy on the training data reaches 87.5%, the MLP obtains better and more stable prediction ability than LPA. The results are shown in Figure 13. Next, for comparison, we use 20% of the data set to train the MLP and use the resulting model to predict the labels of the remaining 80% of the data points. However, because the amount of training data in this case is much smaller than the amount of test data, the model cannot generalize well to the test data. In this case, using semi-supervised LPA generally yields better results than using a supervised multi-layer perceptual neural network.
Fig. 12 Comparison of prediction results based on label propagation and MLP propagation. (a) LPA based on KNN. (b) MLP based on BP.
However, in the general case, that is, provided a sufficient amount of data is available for training, it is not difficult to find that a multi-layer perceptual neural network that has iterated 500 times has better propagation performance than LPA. The corresponding comparison results are shown in Figure 14.

In addition, we also compared the new method with traditional unsupervised clustering methods. Specifically, we compared the performance of AMRGC with other traditional clustering algorithms on various indicators (as shown in Table 3). The specific indicators include whether prior knowledge is needed, the processing efficiency on large-scale data, and the characteristics of the propagation stage.
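An end-to-end sketch of the propagation experiment described above (80% of the points labeled by AMRGC, an MLP limited to 500 iterations, and LPA as the comparison) might look as follows with scikit-learn; the hidden-layer sizes, solver defaults, and stand-in data are ours.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.semi_supervised import LabelPropagation

# Stand-ins for the 840 normalized log samples and the cluster labels that
# the AMRGC steps sketched earlier would produce.
rng = np.random.default_rng(2)
X = rng.normal(size=(840, 5))
amrgc_labels = (X[:, 0] > 0).astype(int) + 2 * (X[:, 1] > 0)

X_train, X_test, y_train, y_test = train_test_split(
    X, amrgc_labels, train_size=0.8, random_state=0)

# Four-layer perceptron (two hidden layers), trained for at most 500 iterations.
mlp = MLPClassifier(hidden_layer_sizes=(16, 16), activation="relu",
                    max_iter=500, random_state=0)
mlp.fit(X_train, y_train)
print("MLP propagation accuracy:", mlp.score(X_test, y_test))

# Comparison: label propagation over a KNN graph, as in the LPA baseline.
lpa = LabelPropagation(kernel="knn", n_neighbors=4)
lpa.fit(X_train, y_train)
print("LPA propagation accuracy:", lpa.score(X_test, y_test))
```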
Fig. 13 Results of cross validation. (a) LPA based on KNN. (b) MLP based on BP.
Appendix